AGILe lemmatizer for Ancient Greek

From The Digital Classicist Wiki

Revision as of 17:22, 21 November 2023

Available

Authors

Creator of the first version:

  • Jasper Bos

People permanently involved:

  • Evelien de Graaf
  • Silvia Stopponi
  • Saskia Peels-Matthey

Description

AGILe is the first lemmatizer trained on Ancient Greek inscriptions (de Graaf et al., 2022). It is based on the Stanza lemmatizer (https://stanfordnlp.github.io/stanza/lemma.html; Qi et al., 2020). The core of its training data is the Collection of Greek Ritual Norms (CGRN) corpus, complemented with data from the PROIEL corpus to increase the amount of training data.
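Stanza's lemmatizer combines a dictionary-based lemmatizer with a neural seq2seq model. The sketch below illustrates only the dictionary half: it is trained on (form, UPOS, lemma) triples such as those in the CGRN and PROIEL annotations, then looks a word up by form and part of speech, falling back to the form alone. The class name `DictLemmatizer`, the toy Greek examples, and the identity fallback (a stand-in for the neural model) are all hypothetical and do not reproduce AGILe's actual code or data.

```python
import unicodedata
from collections import Counter, defaultdict


def nfc(s: str) -> str:
    # Normalise Greek text so precomposed and combining accents compare equal.
    return unicodedata.normalize("NFC", s)


class DictLemmatizer:
    """Hypothetical toy sketch of a dictionary-based lemmatizer."""

    def __init__(self) -> None:
        self.by_word_pos = defaultdict(Counter)  # (form, upos) -> lemma counts
        self.by_word = defaultdict(Counter)      # form -> lemma counts

    def train(self, triples) -> None:
        # triples: iterable of (form, upos, lemma), e.g. from a treebank.
        for form, upos, lemma in triples:
            form, lemma = nfc(form), nfc(lemma)
            self.by_word_pos[(form, upos)][lemma] += 1
            self.by_word[form][lemma] += 1

    def lemmatize(self, form: str, upos: str) -> str:
        form = nfc(form)
        # 1) Most frequent lemma seen with this exact form and POS tag.
        if (form, upos) in self.by_word_pos:
            return self.by_word_pos[(form, upos)].most_common(1)[0][0]
        # 2) Otherwise, most frequent lemma seen with this form at all.
        if form in self.by_word:
            return self.by_word[form].most_common(1)[0][0]
        # 3) Placeholder: a real system would query a neural model here.
        return form


# Hypothetical toy data standing in for inscription and treebank annotations.
inscription_like = [("θύειν", "VERB", "θύω"), ("θεοῖς", "NOUN", "θεός")]
treebank_like = [("ἱερέα", "NOUN", "ἱερεύς"), ("βοῦν", "NOUN", "βοῦς")]

lem = DictLemmatizer()
lem.train(inscription_like + treebank_like)
print(lem.lemmatize("θεοῖς", "NOUN"))  # θεός
```

Pooling the two sources in a single `train` call mirrors, in miniature, the idea of supplementing a small inscription corpus with a larger one to increase coverage.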

AGILe achieved an accuracy of 85.1% on the CGRN test set; that is, it correctly lemmatized 85.1% of the word forms in that set.
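Lemmatization accuracy, as described above, is the number of word forms whose predicted lemma matches the gold lemma, divided by the total number of word forms. A minimal sketch follows; the function name `lemma_accuracy` and the 1,000-token example are hypothetical and do not reflect the real size of the CGRN test set.

```python
def lemma_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of tokens whose predicted lemma equals the gold lemma."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        raise ValueError("empty test set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical example: 851 correct lemmas out of 1,000 word forms.
gold = ["θεός"] * 1000
pred = ["θεός"] * 851 + ["θεῖος"] * 149
print(f"{lemma_accuracy(gold, pred):.1%}")  # 85.1%
```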


References

de Graaf, E., Stopponi, S., Bos, J., Peels-Matthey, S., & Nissim, M. (2022, June). AGILe: The first lemmatizer for Ancient Greek inscriptions. In The 13th Conference on Language Resources and Evaluation (pp. 5334–5344). European Language Resources Association (ELRA). Available: https://aclanthology.org/2022.lrec-1.571/

Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 101–108). Association for Computational Linguistics. Available: https://aclanthology.org/2020.acl-demos.14.pdf