OCR for ancient Greek: Difference between revisions
Edits by RyanBaumann: add Antigrapheus; add Kraken
Revision as of 13:36, 12 July 2019
Tools and advice for the Optical Character Recognition (OCR) of Ancient Greek
- Ancient Greek OCR provides downloads and instructions for OCR using the Tesseract engine. Works on Windows, Linux, OSX & Android.
- Antigrapheus allows you to use the Ancient Greek OCR training file above to OCR documents in a web browser, using Tesseract.js.
- Bruce Robertson has created "Rigaudon", "a complete suite of scripts, python code and data required for producing polytonic Greek OCR"
  - Rigaudon GitHub page
  - Lace: Greek OCR collects results of OCR processing with Rigaudon on public domain texts
  - Initial reports on preliminary results of a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
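- A number of people have produced training files for specific Greek fonts for the Kraken OCR engine (http://kraken.re/):
  - Greek Cursive, from an edition of John Chrysostom's works by Henry Savile: https://github.com/pharos-alexandria/kraken-ocr-greek_cursive
  - Greek from an edition of Theodorus Gaza's Attic paraphrase of the Iliad: https://github.com/ryanfb/kraken-gaza-iliad
  - Greek models in the Kraken models repo: https://github.com/mittagessen/kraken-models (these are in the legacy pyrnn model format and may not work with the latest version of Kraken; see https://github.com/mittagessen/kraken/issues/118)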
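- The Gamera toolkit (http://gamera.informatik.hsnr.de/) for analysing and scanning complex texts includes some experiments with polytonic Greek
- Federico Boschetti did some earlier experimentation with adapting and training Google's OCR engine Tesseract (http://code.google.com/p/tesseract-ocr/) on ancient Greek texts: http://www.himeros.eu/ (related paper: http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf)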
- The commercial OCR software Anagnostis (€585) can handle ancient Greek, though apparently poorly
- ABBYY FineReader can be made to work with ancient Greek with extensive training
- Google Docs can now run OCR on uploaded documents in a variety of languages. Specifying "Greek" and uploading a PDF gives some results (images seem not to work). Quality is roughly on the level of Google Books OCR of printed ancient Greek.
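Quality judgements like "poorly" or "about on the level of Google Books" in the list above can be made more concrete by scoring an engine's output against a hand-corrected transcription. The following is only a minimal sketch of one common measure, character error rate (edit distance divided by reference length). The function names are hypothetical and not part of any tool listed here, and normalizing both texts to Unicode NFC first is an assumption, chosen so that precomposed and combining forms of polytonic characters compare as equal.

```python
import unicodedata


def normalize(text: str) -> str:
    """Return text in Unicode NFC (assumed here as the comparison form)."""
    return unicodedata.normalize("NFC", text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance over reference length, counted in NFC code points."""
    ref = normalize(reference)
    hyp = normalize(ocr_output)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hand-corrected word vs. an OCR reading that dropped the circumflex.
    print(character_error_rate("\u03bc\u1fc6\u03bd\u03b9\u03bd",
                               "\u03bc\u03b7\u03bd\u03b9\u03bd"))
```

Because the comparison is made on NFC code points, a wrong or missing accent or breathing counts as one substituted character, the same weight as a wrong letter. Whether that weighting suits a given project is a design choice rather than a fixed rule.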
Alternatives
- AccessTEI is a manual text-keying service for members of the TEI that can handle ancient Greek