Deucalion and Pie lemmatizers

Available

 * Pie: https://github.com/emanjavacas/pie
 * Deucalion (with LASLA data): https://github.com/chartes/deucalion-model-lasla

Authors

 * Enrique Manjavacas
 * Mike Kestemont
 * Thibault Clérice

Description
Pie is a language-independent lemmatizer implemented in Python and built for "variation-rich languages", a category that includes Latin. It is a deep learning tool that can be trained and retrained on data in TSV format, and it can be trained jointly on morphology, POS and lemmatization tasks. As of 2019, it appears to be among the state-of-the-art lemmatizers in terms of results.
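
To make the TSV training format more concrete, here is a minimal sketch that writes a few annotated Latin tokens to a tab-separated file and reads them back. The column names (`token`, `lemma`, `POS`, `morph`), the tag values and the file name are assumptions made for this example; the columns Pie actually expects depend on its training configuration, so check the Pie documentation before reusing this layout.

```python
import os
import tempfile

# Assumed column layout: one row per token, one column per annotation task.
COLUMNS = ["token", "lemma", "POS", "morph"]

# Hypothetical sample rows; the tag values are purely illustrative.
ROWS = [
    {"token": "Arma", "lemma": "arma", "POS": "NOUN",
     "morph": "Case=Acc|Numb=Pl"},
    {"token": "uirum", "lemma": "uir", "POS": "NOUN",
     "morph": "Case=Acc|Numb=Sg"},
    {"token": "cano", "lemma": "cano", "POS": "VERB",
     "morph": "Mood=Ind|Tense=Pres|Voice=Act|Person=1|Numb=Sg"},
]


def write_tsv(path, rows, columns=COLUMNS):
    """Write a header line, then one tab-separated line per token."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\t".join(columns) + "\n")
        for row in rows:
            handle.write("\t".join(row[col] for col in columns) + "\n")


def read_tsv(path):
    """Read the file back into a list of dicts keyed by the header names."""
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().rstrip("\n").split("\t")
        return [
            dict(zip(header, line.rstrip("\n").split("\t")))
            for line in handle
            if line.strip()
        ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample_path = os.path.join(folder, "train.tsv")  # hypothetical name
        write_tsv(sample_path, ROWS)
        for row in read_tsv(sample_path):
            print(row["token"], "->", row["lemma"])
```

Keeping one token per line and one task per column means that a new task can be added by appending a column, without touching the existing ones.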

Deucalion
Deucalion is:

 * a model for the lemmatizer Pie (.tar file on GitHub)
 * a web application that can easily be deployed to run a lemmatization service. It runs on Python 3 and Flask
 * a Docker image that makes running the service even simpler
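
As a rough idea of how a client could talk to such a lemmatization service, the sketch below builds an HTTP request and parses a TSV answer using only the Python standard library. Everything about the interface is an assumption made for illustration: the local address, the `/api/` route, the `data` form field and the TSV response with a header line are hypothetical and should be checked against the deployed application.

```python
import urllib.parse
import urllib.request

# Assumption: a local deployment and a hypothetical route.
SERVICE_URL = "http://localhost:5000/api/"


def build_request(text, url=SERVICE_URL):
    """Build a POST request carrying the text in an assumed 'data' field."""
    body = urllib.parse.urlencode({"data": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_tsv(payload):
    """Turn a TSV payload (header line, then one line per token) into dicts."""
    lines = [line for line in payload.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def lemmatize(text, url=SERVICE_URL):
    """Send the text to the service and return one dict per token.

    This call needs a running service at `url`; it is not exercised here.
    """
    with urllib.request.urlopen(build_request(text, url)) as response:
        return parse_tsv(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demonstration on a hand-written payload in the assumed format.
    sample = "token\tlemma\nArma\tarma\nuirum\tuir\n"
    for row in parse_tsv(sample):
        print(row["token"], "->", row["lemma"])
```

Separating request building and response parsing from the network call keeps both steps testable without a running server.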

In terms of statistics, the model was trained on a corpus of around 1.3 million tokens (June 2019). The full accuracy figures are given in the information folder of the image; the main ones are:

 * Lemmatization: 97.52 %
 * Part-of-speech: 96.55 %
 * Morphology:
   * Voice: 99.18 %
   * Mood: 98.36 %
   * Degree: 98.30 %
   * Number: 97.88 %
   * Person: 99.18 %
   * Tense: 98.75 %
   * Tense: 93.74 %
   * Gender: 97.27 % (note that not all words are annotated for gender in the LASLA data; nouns in particular are not)
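
For readers who want to compute comparable figures on their own data, here is a minimal sketch of per-task token accuracy. It is an illustration only: the source does not state how the scores above were computed, and skipping tokens without a gold annotation (marked here with a hypothetical `_` placeholder) is an assumption suggested by the note on gender.

```python
def accuracy(gold, predicted, unannotated=None):
    """Return the percentage of tokens whose prediction matches the gold label.

    Tokens whose gold label equals `unannotated` are left out of the count
    (an assumption for tasks that are only partly annotated).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = [(g, p) for g, p in zip(gold, predicted) if g != unannotated]
    if not pairs:
        raise ValueError("no annotated tokens to evaluate")
    correct = sum(1 for g, p in pairs if g == p)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the LASLA corpus.
    gold_lemmas = ["arma", "uir", "cano", "Troia"]
    pred_lemmas = ["arma", "uir", "cano", "Troianus"]
    print("Lemmatization:", accuracy(gold_lemmas, pred_lemmas))

    # "_" marks tokens without a gender annotation in this toy example.
    gold_gender = ["_", "Masc", "_", "Fem"]
    pred_gender = ["Neut", "Masc", "_", "Masc"]
    print("Gender:", accuracy(gold_gender, pred_gender, unannotated="_"))
```

Whether unannotated tokens are skipped or counted changes the score noticeably for sparsely annotated tasks, so the convention should be stated alongside any reported number.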

A hosted version is available at the École des Chartes.