OCR for ancient Greek: Difference between revisions

From The Digital Classicist Wiki
(Added Tesseract tool, reordered)
(8 intermediate revisions by 5 users not shown)
* [http://code.google.com/p/tesseract-ocr/ Tesseract] is an ongoing Google open source project for OCR.
==Tools and advice for the Optical Character Recognition (OCR) of Ancient Greek==
 
* [http://ancientgreekocr.org Ancient Greek OCR] provides downloads and instructions for OCR using the [http://code.google.com/p/tesseract-ocr Tesseract] engine. It works on Windows, Linux, OS X and Android.
* [https://dcthree.github.io/antigrapheus/ Antigrapheus] allows you to use the Ancient Greek OCR training file above to OCR documents in a web browser, using Tesseract.js.
* Bruce Robertson has created "Rigaudon", "a complete suite of scripts, python code and data required for producing polytonic Greek OCR"
** [https://github.com/brobertson/rigaudon Rigaudon GitHub page]
** [[Lace: Greek OCR]] collects results of OCR processing with Rigaudon on public domain texts
** Initial reports on preliminary results of a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
* A number of people have produced training files for specific Greek fonts in the [http://kraken.re/ Kraken] OCR engine:
** [https://github.com/pharos-alexandria/kraken-ocr-greek_cursive Greek Cursive, from an edition of John Chrysostom's works by Henry Savile]
** [https://github.com/ryanfb/kraken-gaza-iliad Greek from an edition of Theodorus Gaza's Attic paraphrase of the Iliad]
** [https://github.com/mittagessen/kraken-models Greek models in the Kraken models repo] (these are in the legacy pyrnn model format and may not work with the latest version of Kraken, see [https://github.com/mittagessen/kraken/issues/118 this issue])
* The [http://gamera.informatik.hsnr.de/ Gamera] toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek
* Bruce Robertson reports on some preliminary results of a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
* Federico Boschetti did some earlier experimentation with adapting/training Google's OCR engine [http://code.google.com/p/tesseract-ocr/ tesseract] to ancient Greek texts: http://www.himeros.eu/ ([http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf related paper])
* Federico Boschetti has been experimenting with adapting/training Google's OCR engine [http://code.google.com/p/tesseract-ocr/ tesseract] to ancient Greek texts: http://www.himeros.eu/ ([http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf related paper])
* The commercial OCR software [http://www.ideatech-online.com/index.php?option=com_content&task=view&id=23&Itemid=27 Anagnostis] (€585) can handle ancient Greek, though apparently poorly
* [http://finereader.abbyy.com/ ABBYY FineReader] can be made to work with ancient Greek with extensive training
* Google Docs can run [http://googledocs.blogspot.com/2011/02/optical-character-recognition-ocr-in-34.html OCR on uploaded documents in a variety of languages]; specifying "Greek" and uploading a PDF gives some results (images seem not to work). Quality is roughly on the level of Google Books OCR of printed ancient Greek.
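
Several of the tools above rely on the Tesseract engine. As a rough illustration only, the following Python sketch shows how a single page image might be passed to the Tesseract command line. It assumes that the <code>tesseract</code> executable is on the PATH and that Ancient Greek training data is installed under the language code <code>grc</code>; the helper names <code>build_tesseract_command</code> and <code>ocr_page</code> are hypothetical and do not come from any of the projects listed here.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="grc"):
    """Return the argument list for one Tesseract run.

    Assumption: "grc" is the language code under which the Ancient Greek
    training data is installed. Tesseract appends ".txt" to output_base.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_page(image_path, lang="grc"):
    """Run Tesseract on one page image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    image_path = Path(image_path)
    # Drop only the final extension, e.g. "scan.v2.png" -> "scan.v2".
    output_base = image_path.with_suffix("")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang), check=True
    )
    # Build the output name by appending, so dots in the base name survive.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling <code>ocr_page("page.png")</code> would then write <code>page.txt</code> next to the image and return its contents. Keeping the command construction in its own function makes it easy to check the arguments without running the engine.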


===Alternatives===


* [http://accesstei.apexcovantage.com/ AccessTEI] is a service for members of the TEI for manual keying of texts which can handle ancient Greek


==External links==
 
* [https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1005&L=DIGITALCLASSICIST&F=&S=&P=2180 Discussion of ancient Greek OCR software on Digital Classicist mailing list]
* [http://www.odl.ox.ac.uk/papers/OCRFeasibility_final.pdf Deciding whether Optical Character Recognition is feasible, Simon Tanner (KDCS), 2004]


[[category:FAQ]]
[[category:Tools]]
[[category:OCR]]

Revision as of 17:49, 6 August 2019

Tools and advice for the Optical Character Recognition (OCR) of Ancient Greek

Alternatives

  • AccessTEI is a service for members of the TEI for manual keying of texts which can handle ancient Greek