Deep learning-based OCR for Greek paleographic manuscripts (Platanou)

From The Digital Classicist Wiki

Available

Title

  • Deep Learning-based OCR for Greek paleographic manuscripts

Status: completed (defended in December 2021)

Author

Paraskevi Platanou

Supervisors: Papaioannou Georgios, Pavlopoulos Ioannis - Department of Informatics, Athens University of Economics and Business.

Abstract

Today classicists have access to a great number of digital tools, which in turn open up possibilities for further study and new research goals. In this thesis we explore the idea that old Greek handwriting can be made machine-readable, so that researchers can study the target material quickly and efficiently. Previous studies have shown that Optical Character Recognition (OCR) models are capable of attaining good accuracy rates. However, achieving high-accuracy OCR results for Greek manuscripts is still considered a major challenge. The overall aim of this thesis is to examine the efficiency of OCR software for reading old manuscripts and to train a deep learning model for this task. To this end, we study and use digitized images of Greek manuscripts from the Oxford University Bodleian Library. In particular, we follow steps that include image preprocessing, transcription and programming. Our ambition is to overcome the many challenges we face from one step to the next, bearing in mind that Greek handwritten characters are in themselves challenging for machine reading, and to develop OCR models using deep learning methods in order to render old Greek handwriting machine-readable.

Presentations

  • Newcastle PGF Seminar Series, 13 April 2022
  • UCL Lyceum Classics Community Seminar, 9 March 2022