OCR for ancient Greek: Difference between revisions

From The Digital Classicist Wiki
(Add ancientgreekocr.org tesseract option)
(add/update Bruce Robertson's work)
* [http://ancientgreekocr.org Ancient Greek OCR] provides downloads and instructions for OCR using the [http://code.google.com/p/tesseract-ocr Tesseract] engine. It works on Windows, Linux, OS X and Android.
* Bruce Robertson has created "Rigaudon", "a complete suite of scripts, python code and data required for producing polytonic Greek OCR"
** [https://github.com/brobertson/rigaudon Rigaudon GitHub page]
** [http://heml.mta.ca/lace Lace: Greek OCR] collects results of OCR processing with Rigaudon on public domain texts
** Preliminary results of a survey of techniques: http://www.heml.org/RobertsonGreekOCR/
* The [http://gamera.informatik.hsnr.de/ Gamera] toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek
* Federico Boschetti experimented earlier with adapting and training Google's OCR engine [http://code.google.com/p/tesseract-ocr/ Tesseract] for ancient Greek texts: http://www.himeros.eu/ ([http://www.perseus.tufts.edu/~ababeu/ecdl2009-preprint.pdf related paper])
* The commercial OCR software [http://www.ideatech-online.com/index.php?option=com_content&task=view&id=23&Itemid=27 Anagnostis] (€585) can handle ancient Greek, though apparently poorly

Revision as of 16:17, 1 July 2014

  • Ancient Greek OCR provides downloads and instructions for OCR using the Tesseract engine. It works on Windows, Linux, OS X and Android.
  • Bruce Robertson has created "Rigaudon", "a complete suite of scripts, python code and data required for producing polytonic Greek OCR"
  • The Gamera toolkit for analysing and scanning complex texts includes some experiments with polytonic Greek
  • Federico Boschetti experimented earlier with adapting and training Google's OCR engine Tesseract for ancient Greek texts: http://www.himeros.eu/ (related paper)
  • The commercial OCR software Anagnostis (€585) can handle ancient Greek, though apparently poorly
  • ABBYY FineReader can be made to work with ancient Greek, given extensive training
  • Google Docs can now run OCR on uploaded documents in a variety of languages. Specifying "Greek" and uploading a PDF gives some results (image uploads do not seem to work). Quality is roughly on a par with Google Books OCR of printed ancient Greek.
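
For the Tesseract-based options listed above, a run on a single page image might be scripted roughly as follows. This is a minimal sketch, not the documented workflow of any of the projects named here: it assumes the `tesseract` executable is on the PATH and that Ancient Greek training data is installed under the language code `grc`. The function names and file names are hypothetical. The NFC normalisation step is an addition of this sketch, included because polytonic output can mix precomposed and combining forms of the same accented letter.

```python
import subprocess
import unicodedata
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="grc"):
    """Return the argument list for one Tesseract run.

    Assumption: Ancient Greek training data is installed as `grc`.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def normalize_polytonic(text):
    """Compose base letters and combining accents into Unicode NFC form."""
    return unicodedata.normalize("NFC", text)


def ocr_page(image_path, output_base, lang="grc"):
    """Run Tesseract on one image and return the normalised text.

    Tesseract writes its plain-text result to <output_base>.txt.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    raw = Path(output_base + ".txt").read_text(encoding="utf-8")
    return normalize_polytonic(raw)
```

Calling `ocr_page("page.png", "page")` would then write `page.txt` and return its contents, provided both assumptions above hold on the machine in question.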

Alternatives

  • AccessTEI is a manual text-keying service for members of the TEI that can handle ancient Greek

External links