Deucalion and Pie lemmatizers: Difference between revisions

Revision as of 12:40, 22 September 2020

Available

Pie: https://github.com/emanjavacas/pie
Latin Model: https://github.com/PonteIneptique/latin-lasla-models
Pie-Extended: https://github.com/hipster-philology/nlp-pie-taggers

Author

Enrique Manjavacas
Mike Kestemont
Thibault Clérice

Description

Pie is a language-independent lemmatizer implemented in Python and built for "variation-rich languages", including Latin. It is a deep learning tool that can be trained and retrained with data in TSV format. As of 2019, it appears to be among the state-of-the-art lemmatizers in terms of results. It can be trained jointly on morphology, POS and lemmatization tasks.
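To make the TSV training data more concrete, here is a minimal sketch of reading one token per row with one column per task. The column names (`token`, `lemma`, `pos`, `morph`), the tag values and the sample rows are assumptions made up for illustration. They are not taken from Pie's documentation or from the LASLA data.

```python
import csv
import io

# Hypothetical sample: header and annotations are illustrative assumptions,
# not the actual layout required by Pie or used in the LASLA corpus.
SAMPLE_TSV = (
    "token\tlemma\tpos\tmorph\n"
    "arma\tarma\tNOUN\tCase=Acc|Number=Plur\n"
    "uirum\tuir\tNOUN\tCase=Acc|Number=Sing\n"
    "cano\tcano\tVERB\tMood=Ind|Tense=Pres\n"
)


def read_annotations(handle):
    """Read tab-separated annotations into a list of dicts, one per token."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


def task_columns(rows, tasks=("lemma", "pos", "morph")):
    """Group target values by task, as a jointly trained model would see them."""
    return {task: [row[task] for row in rows] for task in tasks}


rows = read_annotations(io.StringIO(SAMPLE_TSV))
targets = task_columns(rows)
print(targets["lemma"])
```

Keeping each task in its own column is what lets a single file supply the lemmatization, POS and morphology targets together.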

Pie Extended

Pie-Extended is an extension built on top of Pie to ease its use as a tagger: it handles model downloading, tokenization, and pre- and post-processing. It requires Python > 3.6 and just enough knowledge to install Python libraries and use a command-line interface.
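As a rough illustration of the command-line workflow, the sketch below only assembles the commands and prints them, without running anything. The package name `pie-extended`, the `download` and `tag` subcommands, and the model name `lasla` are assumptions based on the project's README as recalled here, so check them against the repository before use. `build_commands` and `my_text.txt` are hypothetical names introduced for this example.

```python
import shlex


def build_commands(model, input_path):
    """Return the assumed install, download and tag commands as argument lists."""
    return [
        ["pip", "install", "pie-extended"],          # assumed package name
        ["pie-extended", "download", model],         # assumed subcommand: fetch the model
        ["pie-extended", "tag", model, input_path],  # assumed subcommand: tag a text file
    ]


# Print the commands in copy-pasteable shell form; nothing is executed here.
for command in build_commands("lasla", "my_text.txt"):
    print(shlex.join(command))
```

Building argument lists rather than a single string keeps file names with spaces safe if the commands are later passed to `subprocess.run`.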

Deucalion (now Flask Pie)

Flask-Pie (previously known as Deucalion) provides adapters to serve Pie models over HTTP.

Bibliography

D. Longrée, C. Philippart de Foy & G. Purnelle. « Structures phrastiques et analyse automatique des données morphosyntaxiques : le projet LatSynt », in S. Bolasco, I. Chiari & L. Giuliano (eds), Statistical Analysis of Textual Data, Proceedings of 10th International Conference Journées d'Analyse statistique des Données Textuelles, 9-11 June 2010, Sapienza University of Rome, Rome, LED, pp. 433-442.
D. Longrée & C. Poudat, « New Ways of Lemmatizing and Tagging Classical and post-Classical Latin: the LATLEM project of the LASLA », in P. Anreiter & M. Kienpointner (éd.), Proceedings of the 15th International Colloquium on Latin Linguistics, (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 683-694.
D. Longrée, C. Philippart de Foy & G. Purnelle, « Subordinate clause boundaries and word order in Latin: the contribution of the L.A.S.L.A. syntactic parser project LatSynt », in P. Anreiter & M. Kienpointner (éd.), Proceedings of the 15th International Colloquium on Latin Linguistics, (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 673-681.
D. Longrée & C. Poudat, « Variations langagières et annotation morphosyntaxique du latin classique », TAL, 50 – n° 2/2009, Special issue on "Natural Language Processing and Ancient Languages", pp. 129-148.
Enrique Manjavacas & Mike Kestemont. (2019, January 17). emanjavacas/pie v0.1.3 (Version v0.1.3). Zenodo. http://doi.org/10.5281/zenodo.2542537
Thibault Clérice. (2019, February 1). chartes/deucalion-model-lasla: LASLA Latin Lemmatizer - Alpha (Version 0.0.1). Zenodo. http://doi.org/10.5281/zenodo.2554847

@@ Line 2: / Line 2: @@
 * Pie: https://github.com/emanjavacas/pie
-* Deucalion (with LASLA data): https://github.com/chartes/deucalion-model-lasla
+* Latin Model: https://github.com/PonteIneptique/latin-lasla-models
+* Pie-Extended: https://github.com/hipster-philology/nlp-pie-taggers
 == Author ==
@@ Line 14: / Line 15: @@
 '''Pie''' is a language independant lemmatizer implemented in python and built for "variation-rich languages" which includes Latin. It's a deep learning tool that can be trained and retrained with data in TSV format. As of 2019, it seems to be one of the state-of-the-art lemmatizers in terms of results. It can be trained jointly on morphology, POS and lemmatization tasks.
-=== Deucalion ===
+=== Pie Extended ===
-Deucalion is :
+Pie-Extended an extension built on top of Pie to ease its use as a tagger: it handles downloading of models, tokenization and post-/pre-processing. It requires python > 3.6 and just enough knowledge about installing libraries in Python as well as using a Command Line Interface.
-* a model for the lemmatizer Pie ([https://github.com/chartes/deucalion-model-lasla/blob/master/lemma.split-morph.tar .tar file on github])
+=== Deucalion (now Flask Pie) ===
-* a web-application that can be easily deployed for running a lemmatization service. It runs on Python3 and flask
-* a [https://hub.docker.com/r/ponteineptique/deucalion-model-lasla Docker Image ] that makes running it even simpler
-In terms of statistics, the corpus was trained over around 1.3 million tokens (June 2019). The accuracy are described in the [https://github.com/chartes/deucalion-model-lasla/tree/master/information information] folder of the image but we can note the following accuracies:
+Flask-Pie (previously known as Deucalion) provides adapters to server Pie models over HTTP servers.
-* Lemmatization : 97,52 %
-* Part-Of-Speech: 96.55 %
-* Morphology
-** Voice : 99.18 %
-** Mood : 98.36 %
-** Degree : 98.30 %
-** Number : 97.88 %
-** Person : 99.18 %
-** Tense : 98.75 %
-** Tense : 93.74 %
-** Gender : 97.27 % (Note that not all words were annotated in genders in the LASLA data, specifically not the nouns)
-A version is hosted at [https://dev.chartes.psl.eu/deucalion/models/lasla/ the École des Chartes]
 == Bibliography ==
