Extracting Information from Classics Scholarly Texts (Romanello)

From The Digital Classicist Wiki

Latest revision as of 17:21, 4 April 2017

Title

Structured and Unstructured: Extracting Information from Classics Scholarly Texts

Author

  • Matteo Romanello

Dates

  • Started: 2010?
  • Awarded: 2015

Supervisors

  • Willard McCarty (Centre for Computing in the Humanities, King's College London)
  • Jonathan Ginzburg (Department of Computer Science, King's College London)

Abstract

The project is an ongoing computational-linguistic and text-analytic study of how the language and structure of explicitly encoded data sources can be used to help mine the texts of unencoded corpora.

The two corpora currently under consideration contain, respectively, OCRed journal papers and working papers about Classical (Greek and Latin) texts.

The project aims to show how, and with what gain in accuracy, information extracted from structured data sources can be used to automatically extract information from an unstructured corpus. The extracted information is meant to provide semantic access to the corpus itself.
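
The general idea of letting structured data guide extraction from unstructured text can be illustrated with a deliberately simple sketch. This page does not describe the project's actual pipeline, so the following is not the author's method. It assumes a small gazetteer of author and work abbreviations, imagined as having been derived from an explicitly encoded source. Here it is simply hard-coded, and its entries, names and locator pattern are all hypothetical. The gazetteer is compiled into a regular expression that finds canonical references in running (for example OCRed) prose.

```python
import re

# Hypothetical gazetteer: abbreviation -> normalised author/work label.
# In the approach described above, such a list would come from a structured
# source; here it is hard-coded for illustration only.
GAZETTEER = {
    "Hom. Il.": "Homer, Iliad",
    "Hom. Od.": "Homer, Odyssey",
    "Thuc.": "Thucydides",
    "Verg. Aen.": "Virgil, Aeneid",
}

# Simplified (assumed) passage locator: "2.40.1", "1.1", "1.1-10", ...
LOCATOR = r"\d+(?:\.\d+)*(?:-\d+(?:\.\d+)*)?"


def build_pattern(gazetteer):
    # Try longer abbreviations first, so that an abbreviation which is a
    # prefix of another one cannot shadow the longer match.
    names = sorted(gazetteer, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # (?<!\w) keeps an abbreviation from matching inside a longer word.
    return re.compile(rf"(?<!\w)(?P<work>{alternation})\s+(?P<locator>{LOCATOR})")


def extract_references(text, gazetteer=GAZETTEER):
    """Return (surface form, normalised work, locator) for each match."""
    pattern = build_pattern(gazetteer)
    return [
        (m.group("work"), gazetteer[m.group("work")], m.group("locator"))
        for m in pattern.finditer(text)
    ]


sample = "As Thuc. 2.40.1 puts it, and compare Hom. Il. 1.1-10 with Verg. Aen. 1.1."
for ref in extract_references(sample):
    print(ref)
```

A pure dictionary lookup like this only finds what the structured source already lists. It serves as a baseline, and the accuracy gain the abstract asks about would be measured against it. Abbreviations missing from the gazetteer (for example "Pl. Rep.") are simply not found.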

Presentations

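  • Poster presentation at the Arts and Humanities Week 2009, King's College London: poster (http://www.slideshare.net/56k/extracting-information-from-classics-scholarly-texts)
  • Presentation at the PhD Seminar (CCH/KCL): slides (http://www.slideshare.net/56k/stuctured-vs-unstructured-extracting-information-from-classics-scholarly-texts)
  • Presentation at the EIRI – CCH Conference on the Digitization in the Humanities at Keio University (Tokyo): slides (http://www.slideshare.net/56k/romanello-tokyo)
  • Poster presentation at the DH2010 conference (http://dh2010.cch.kcl.ac.uk): poster (http://www.slideshare.net/56k/structured-and-unstructured-extracting-information-from-classics-scholarly-texts), HTML abstract (http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-803.html), PDF abstract (http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/pdf/ab-803.pdf)
  • Presentation at the Digital Classicist seminar, June 25, 2010: abstract (http://www.digitalclassicist.org/wip/wip2010-04mr.html), audio (http://www.digitalclassicist.org/wip/wip2010-04mr.mp3), slides (http://www.digitalclassicist.org/wip/wip2010-04mr.pdf)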

Material

  • The bibliographic material I am collecting for the literature review is shared on a separate wiki page: http://wiki.digitalclassicist.org/Digital_Classics_Bibliography
  • The first piece of software I developed for the project, released under an open license, is CRefEx, a canonical references extractor: code (http://github.com/mromanello/CRefEx), wiki (http://wiki.github.com/mromanello/CRefEx/). Please join the development of this tool if you are interested in this topic!
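
Once a canonical reference has been found, its passage locator still has to be turned into structured data. The sketch below is not CRefEx's code or API, which this page does not describe. It is a minimal, hypothetical illustration of that step. It assumes purely numeric, dot-separated citation levels and the common abbreviated-range convention in which "1.1-10" means 1.1 to 1.10.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PassageRange:
    """Hypothetical structured form of a canonical passage locator."""
    start: Tuple[int, ...]
    end: Optional[Tuple[int, ...]] = None  # None for a single passage


def parse_locator(locator: str) -> PassageRange:
    # Split an optional range such as "1.1-10" into its two halves.
    start_text, sep, end_text = locator.partition("-")
    start = tuple(int(part) for part in start_text.split("."))
    if not sep:
        return PassageRange(start)
    end_parts = tuple(int(part) for part in end_text.split("."))
    # An abbreviated range end ("1.1-10") inherits the missing higher
    # citation levels from the start of the range ("1.10").
    inherited = start[: max(0, len(start) - len(end_parts))]
    return PassageRange(start, inherited + end_parts)


for loc in ("2.40.1", "1.1-10", "1.1-2.5"):
    print(loc, "->", parse_locator(loc))
```

Locators that are not purely numeric, such as Stephanus pages ("514a"), fall outside this simplified assumption and would raise a `ValueError`. A real extractor would need a richer grammar for them.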