Deucalion and Pie lemmatizers

From The Digital Classicist Wiki
Latest revision as of 16:23, 28 May 2023

Categories: lemmatisation, tools, programming, linguistics

Available

  • Latin Model: https://github.com/PonteIneptique/latin-lasla-models
  • Pie-Extended: https://github.com/hipster-philology/nlp-pie-taggers
  • Deucalion, a Web interface for Flask-Pie: https://dh.chartes.psl.eu/deucalion/ (Ancient Greek and Latin, as well as Old French, Modern French, Early Modern French, and Middle Dutch)

Author

  • Enrique Manjavacas
  • Mike Kestemont
  • Thibault Clérice

Description

Pie is a language-independent lemmatizer implemented in Python and built for "variation-rich languages", which include Latin. It is a deep learning tool that can be trained and retrained with data in TSV format. As of 2019, it appears to be among the state-of-the-art lemmatizers in terms of results. It can be trained jointly on morphology, POS, and lemmatization tasks.
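To make the TSV training format more concrete, here is a minimal sketch in Python that writes and re-reads a small annotated sample using only the standard library. The column names (token, lemma, pos, morph), the sample annotations, and the helper names write_tsv and read_tsv are assumptions chosen for illustration; the header and tagset Pie expects depend on your own configuration and corpus.

```python
import csv
import io

# Hypothetical column layout: one token per row, one column per task.
COLUMNS = ["token", "lemma", "pos", "morph"]

# Illustrative annotations only; a real corpus would follow its own tagset.
SAMPLE_ROWS = [
    {"token": "puella", "lemma": "puella", "pos": "NOUN", "morph": "Case=Nom|Numb=Sing"},
    {"token": "rosam", "lemma": "rosa", "pos": "NOUN", "morph": "Case=Acc|Numb=Sing"},
    {"token": "amat", "lemma": "amo", "pos": "VERB", "morph": "Numb=Sing|Person=3"},
]


def write_tsv(rows):
    """Serialize annotated tokens to a tab-separated string with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_tsv(text):
    """Parse a tab-separated string back into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


tsv_text = write_tsv(SAMPLE_ROWS)
print(tsv_text)
```

Keeping one column per task mirrors the joint training on morphology, POS, and lemmatization described above: each task reads its labels from its own column.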

Pie Extended

Pie-Extended is an extension built on top of Pie to ease its use as a tagger: it handles model downloading, tokenization, and pre- and post-processing. It requires Python > 3.6 and just enough knowledge to install Python libraries and use a command-line interface.
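As a rough illustration of the command-line workflow, the sketch below builds the two commands one would typically run (download a model, then tag a file) and only executes them if the tool is installed. The executable name pie-extended, the download and tag subcommands, and the model name lasla are assumptions based on the project README and should be checked against the installed version; build_commands and run_tagger are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def build_commands(model: str, text_path: str) -> List[List[str]]:
    """Return the assumed Pie-Extended commands: fetch a model, then tag a file."""
    return [
        ["pie-extended", "download", model],
        ["pie-extended", "tag", model, text_path],
    ]


def run_tagger(model: str, text_path: str) -> bool:
    """Run the commands in order; return False if the CLI is not installed."""
    if shutil.which("pie-extended") is None:
        return False
    for command in build_commands(model, text_path):
        # check=True raises CalledProcessError if a step fails.
        subprocess.run(command, check=True)
    return True


# Show the commands without executing anything.
for command in build_commands("lasla", "my_text.txt"):
    print(" ".join(command))
```

Calling run_tagger("lasla", "my_text.txt") would then download the model on first use and tag the file, assuming the subcommands above match your installation.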

Deucalion (now Flask-Pie)

Flask-Pie (previously known as Deucalion) provides adapters to serve Pie models over HTTP.
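The general idea of serving a tagger over HTTP can be sketched with the Python standard library alone. This is not Flask-Pie's actual API: the data query parameter, the JSON response shape, and the stub_tagger function (which merely lowercases tokens instead of calling a trained Pie model) are all assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def stub_tagger(text):
    """Placeholder tagger: a real service would call a trained Pie model here."""
    return [{"token": token, "lemma": token.lower()} for token in text.split()]


def app(environ, start_response):
    """Minimal WSGI application: tag the text passed in the 'data' parameter."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("data", [""])[0]
    body = json.dumps(stub_tagger(text), ensure_ascii=False).encode("utf-8")
    headers = [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Start a local development server (not called automatically)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

After calling serve(), a request such as http://127.0.0.1:8000/?data=Arma+virum would return the stub annotations as JSON.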

Bibliography

  • D. Longrée, C. Philippart de Foy & G. Purnelle. « Structures phrastiques et analyse automatique des données morphosyntaxiques : le projet LatSynt », in S. Bolasco, I. Chiari & L. Giuliano (eds), Statistical Analysis of Textual Data, Proceedings of 10th International Conference Journées d'Analyse statistique des Données Textuelles, 9-11 June 2010, Sapienza University of Rome, Rome, LED, pp. 433-442.
  • D. Longrée & C. Poudat, « New Ways of Lemmatizing and Tagging Classical and post-Classical Latin: the LATLEM project of the LASLA », in P. Anreiter & M. Kienpointner (éd.), Proceedings of the 15th International Colloquium on Latin Linguistics, (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 683-694.
  • D. Longrée, C. Philippart de Foy & G. Purnelle, « Subordinate clause boundaries and word order in Latin: the contribution of the L.A.S.L.A. syntactic parser project LatSynt », in P. Anreiter & M. Kienpointner (éd.), Proceedings of the 15th International Colloquium on Latin Linguistics, (Innsbrucker Beiträge zur Sprachwissenschaft), Innsbruck, 2010, pp. 673-681.
  • D. Longrée & C. Poudat, « Variations langagières et annotation morphosyntaxique du latin classique », TAL, 50 – n° 2/2009, Special issue on "Natural Language Processing and Ancient Languages", pp. 129-148.
  • Enrique Manjavacas & Mike Kestemont. (2019, January 17). emanjavacas/pie v0.1.3 (Version v0.1.3). Zenodo. http://doi.org/10.5281/zenodo.2542537
  • Thibault Clérice. (2019, February 1). chartes/deucalion-model-lasla: LASLA Latin Lemmatizer - Alpha (Version 0.0.1). Zenodo. http://doi.org/10.5281/zenodo.2554847