Deucalion and Pie lemmatizers: Difference between revisions

Latest revision as of 16:23, 28 May 2023

Available

Pie: https://github.com/emanjavacas/pie
Latin Model: https://github.com/PonteIneptique/latin-lasla-models
Pie-Extended: https://github.com/hipster-philology/nlp-pie-taggers
Deucalion, a web interface for Flask-Pie: https://dh.chartes.psl.eu/deucalion/ (Ancient Greek, Latin, Old French, Modern French, Early Modern French and Middle Dutch)

Author

Enrique Manjavacas
Mike Kestemont
Thibault Clérice

Description

Pie is a language-independent lemmatizer implemented in Python and built for "variation-rich languages" such as Latin. It is a deep learning tool that can be trained and retrained on data in TSV format. As of 2019, it appeared to be among the state-of-the-art lemmatizers in terms of accuracy. It can be trained jointly on lemmatization, part-of-speech (POS) tagging and morphological analysis.
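As an illustration of the kind of tab-separated data a multi-task lemmatizer is trained on, the following minimal sketch writes and reads one token per line, with one column per task. The column names (token, lemma, pos, morph), the tag values and the Latin sample are assumptions made for this example only; the exact header and columns Pie expects are set in its own configuration and documentation.

```python
import csv
import io

# Hypothetical column layout: one token per row, one column per task.
# Pie's actual expected header and column names may differ.
COLUMNS = ["token", "lemma", "pos", "morph"]

# Illustrative Latin rows; the POS and morphology tags are made up for the example.
ROWS = [
    {"token": "Gallia", "lemma": "Gallia", "pos": "PROPN", "morph": "Case=Nom|Number=Sing"},
    {"token": "est", "lemma": "sum", "pos": "VERB", "morph": "Mood=Ind|Number=Sing|Person=3|Tense=Pres"},
    {"token": "omnis", "lemma": "omnis", "pos": "ADJ", "morph": "Case=Nom|Gender=Fem|Number=Sing"},
]


def write_tsv(rows, columns=COLUMNS):
    """Serialize rows to a TSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_tsv(text):
    """Parse a TSV string back into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def task_targets(rows, task):
    """Extract the target sequence for a single task, e.g. 'lemma' or 'pos'."""
    return [row[task] for row in rows]


if __name__ == "__main__":
    tsv = write_tsv(ROWS)
    print(tsv)
    parsed = read_tsv(tsv)
    print(task_targets(parsed, "lemma"))  # ['Gallia', 'sum', 'omnis']
```

Keeping every task in its own column of the same file is what makes joint training possible: each row supplies the targets for lemmatization, POS tagging and morphology at once.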

Pie Extended

Pie-Extended is an extension built on top of Pie that makes it easier to use as a tagger: it handles model downloads, tokenization, and pre- and post-processing. It requires Python > 3.6 and basic familiarity with installing Python libraries and using a command-line interface.
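To make that division of labour concrete, here is a toy sketch of the kind of pipeline a tool like Pie-Extended wraps around a model: normalize the input, tokenize it, tag each token, then map the results back onto the original word forms. Everything here is hypothetical: the j/v normalization rule, the regex tokenizer and the dictionary standing in for a trained Pie model. It neither uses nor mirrors Pie-Extended's actual API; see the project's README for its real command-line usage.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical normalization: fold j/v to i/u, a convention some Latin
# corpora follow. Pie-Extended's real Latin processing may differ.
NORMALIZATION = str.maketrans({"j": "i", "v": "u", "J": "I", "V": "U"})

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical lexicon standing in for a trained model.
LEXICON: Dict[str, str] = {"est": "sum", "diuisa": "diuido", "partes": "pars"}


def preprocess(text: str) -> str:
    """Normalize spelling before tagging (character-for-character, so tokens stay aligned)."""
    return text.translate(NORMALIZATION)


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def toy_lemmatizer(token: str) -> str:
    """Stand-in for a Pie model: lexicon lookup, falling back to the token itself."""
    return LEXICON.get(token.lower(), token)


def tag(text: str, model: Callable[[str], str] = toy_lemmatizer) -> List[Tuple[str, str]]:
    """Run the whole pipeline and return (original form, lemma) pairs."""
    original_tokens = tokenize(text)
    normalized_tokens = tokenize(preprocess(text))
    # Post-processing: report each lemma against the form as it appeared in the input.
    return [(orig, model(norm)) for orig, norm in zip(original_tokens, normalized_tokens)]


if __name__ == "__main__":
    for form, lemma in tag("Gallia est omnis divisa in partes tres."):
        print(f"{form}\t{lemma}")
```

Because the normalization maps single letters to single letters, the original and normalized token lists line up one-to-one, which is what lets the post-processing step restore the user's spelling.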

Deucalion (now Flask-Pie)

Flask-Pie (previously known as Deucalion) provides adapters for serving Pie models over HTTP.
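The general pattern of exposing a lemmatizer over HTTP can be sketched with only the Python standard library. The code below is a hypothetical WSGI application (Flask applications are also WSGI applications), not Flask-Pie's code: the `data` query parameter, the tab-separated response, the whitespace tokenization and the `lemmatize` stand-in are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a loaded Pie model.
LEXICON: Dict[str, str] = {"est": "sum", "partes": "pars"}


def lemmatize(tokens: List[str]) -> List[str]:
    """Toy lemmatizer: lexicon lookup, falling back to the token itself."""
    return [LEXICON.get(token.lower(), token) for token in tokens]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app: GET /?data=<text> returns one 'token<TAB>lemma' line per token."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("data", [""])[0]
    if not text:
        start_response("400 Bad Request", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"missing 'data' parameter\n"]
    tokens = text.split()  # naive whitespace tokenization for the sketch
    lines = ["token\tlemma"] + [f"{tok}\t{lem}" for tok, lem in zip(tokens, lemmatize(tokens))]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/tab-separated-values; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Serve locally, then try e.g. http://localhost:8000/?data=Gallia+est+omnis
    with make_server("localhost", 8000, app) as server:
        server.serve_forever()
```

Loading the model once at module level and reusing it across requests is the main point of the adapter pattern: the expensive model load happens at startup, not per request.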

Bibliography

D. Longrée, C. Philippart de Foy & G. Purnelle, « Structures phrastiques et analyse automatique des données morphosyntaxiques : le projet LatSynt », in S. Bolasco, I. Chiari & L. Giuliano (eds), Statistical Analysis of Textual Data, Proceedings of the 10th International Conference Journées d'Analyse statistique des Données Textuelles, 9-11 June 2010, Sapienza University of Rome, Rome, LED, pp. 433-442.
D. Longrée & C. Poudat, « New Ways of Lemmatizing and Tagging Classical and post-Classical Latin: the LATLEM project of the LASLA », in P. Anreiter & M. Kienpointner (eds), Proceedings of the 15th International Colloquium on Latin Linguistics (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 683-694.
D. Longrée, C. Philippart de Foy & G. Purnelle, « Subordinate clause boundaries and word order in Latin: the contribution of the L.A.S.L.A. syntactic parser project LatSynt », in P. Anreiter & M. Kienpointner (eds), Proceedings of the 15th International Colloquium on Latin Linguistics (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 673-681.
D. Longrée & C. Poudat, « Variations langagières et annotation morphosyntaxique du latin classique », TAL, vol. 50, no. 2, 2009, special issue on "Natural Language Processing and Ancient Languages", pp. 129-148.
Enrique Manjavacas & Mike Kestemont. (2019, January 17). emanjavacas/pie v0.1.3 (Version v0.1.3). Zenodo. http://doi.org/10.5281/zenodo.2542537
Thibault Clérice. (2019, February 1). chartes/deucalion-model-lasla: LASLA Latin Lemmatizer - Alpha (Version 0.0.1). Zenodo. http://doi.org/10.5281/zenodo.2554847
