OCR for ancient Greek

Tools and advice for the Optical Character Recognition (OCR) of Ancient Greek

Ancient Greek OCR provides downloads and instructions for OCR using the Tesseract engine. It works on Windows, Linux, OS X and Android.
Antigrapheus allows you to use the Ancient Greek OCR training file above to OCR documents in a web browser, using Tesseract.js.
Bruce Robertson has created Rigaudon, "a complete suite of scripts, python code and data required for producing polytonic Greek OCR":
- Rigaudon GitHub page
- Lace: Greek OCR collects results of OCR processing with Rigaudon on public domain texts
- Initial reports on preliminary results of a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
A number of people have produced training files for specific Greek fonts in the Kraken OCR engine:
- Greek Cursive, from an edition of John Chrysostom's works by Henry Savile
- Greek from an edition of Theodorus Gaza's Attic paraphrase of the Iliad
- Greek models in the Kraken models repo (these are in the legacy pyrnn model format and may not work with the latest version of Kraken; see this issue)
The Gamera toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek
Federico Boschetti experimented earlier with adapting and training Google's OCR engine Tesseract for ancient Greek texts: http://www.himeros.eu/ (related paper)
ABBYY FineReader can be made to work with ancient Greek with extensive training
Google Docs can now run OCR on uploaded documents in a variety of languages; you can get some results by specifying "Greek" and uploading a PDF (images seem not to work). Quality is roughly on the level of Google Books OCR of printed ancient Greek.
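
For the Tesseract route listed above, a minimal sketch of driving the command-line engine from Python might look like the following. It assumes that the tesseract executable is installed and on the PATH, and that an Ancient Greek training file is available under the language code "grc"; check the name of the file you actually downloaded. The function names build_tesseract_command and ocr_page are hypothetical helpers for illustration, not part of any of the projects above.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="grc"):
    """Return the argument list for a Tesseract run on one page image."""
    # "-l" selects the training file; "grc" is an assumed language code.
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_page(image_path, output_base, lang="grc"):
    """Run Tesseract on one image and return the path of the text it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt")
```

Calling ocr_page("page-001.png", "page-001") would then be expected to write page-001.txt next to the script, provided the training file is in Tesseract's tessdata directory.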

Alternatives

AccessTEI is a manual text-keying service for members of the TEI; it can handle ancient Greek.
