Open Greek and Latin project
* [https://github.com/OpenGreekAndLatin]
Revision as of 16:37, 1 July 2014
The Open Greek and Latin project is one of the efforts of the Open Philology Project of the Humboldt Chair of Digital Humanities at the University of Leipzig. Its ultimate goal is to represent every source text produced in Classical Greek or Latin from antiquity through the present, including texts preserved in the manuscript tradition as well as on inscriptions, papyri, ostraca and other written artifacts. Over the next five years, we will focus on converting as much Greek and Latin as possible from scanned printed books into an open, dynamic corpus, continuously augmented and improved by a combination of automated processes and human contributions of many kinds. The focus on Greek and Latin reflects both the belief that we have an obligation to disseminate European cultural heritage and the observation that recent advances in OCR technology for Greek and Latin make these intertwined languages ready for large-scale work.
The Open Greek and Latin project aims to provide at least one version of every Greek and Latin source produced during antiquity (through c. 600 CE) and a growing collection from the vast body of post-classical Greek and Latin that still survives. Perhaps 150 million words of Greek and Latin, preserved in manuscripts, on stone, on papyrus or on other writing surfaces, survive from antiquity. Analysis of 10,000 books in Latin, downloaded from Archive.org, identified more than 200 million words of post-classical Latin. With 70,000 public domain books listed in the Hathi Trust as being in Ancient Greek or Latin, the amount of Greek and Latin already available will almost certainly exceed 1 billion words.
Where existing corpora of Greek and Latin have generally included one edition of a work, the Open Greek and Latin corpus is designed to manage multiple versions of a work and to represent its complete textual history: every manuscript, every papyrus fragment, and every printed edition is a version within the history of a text.
In the short run, this involves using OCR technology optimized for Classical Greek and Latin to create an open corpus that is reasonably comprehensive for the c. 100 million words produced through c. 600 CE and that begins to make available the billions of words of Greek and Latin that were produced after 600 CE and still survive.