AGILe lemmatizer for Ancient Greek
Latest revision as of 14:15, 10 July 2025
Available
Authors
Developer of the first version:
- Jasper Bos
People permanently involved:
- Evelien de Graaf
- Silvia Stopponi
- Saskia Peels-Matthey
Description
AGILe is the first lemmatizer trained on Ancient Greek inscriptions. It is based on the Stanza lemmatizer by Qi et al. (2020). The core of its training data is the Collection of Greek Ritual Norms (CGRN) corpus, supplemented with data from the PROIEL corpus to enlarge the training set.
AGILe achieved an accuracy of 85.1% on the CGRN test set; that is, it correctly lemmatized 85.1% of the word forms in that set.
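The accuracy figure is the share of word forms whose predicted lemma matches the reference lemma. The following is a minimal sketch of that calculation, assuming exact string comparison; the function name `lemma_accuracy` and the toy lemma lists are illustrative only and are not taken from the AGILe evaluation.

```python
def lemma_accuracy(predicted, gold):
    """Return the fraction of positions where the predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy data (hypothetical): three of the four predictions match the gold lemmas.
gold = ["ὁ", "θεός", "θύω", "καί"]
predicted = ["ὁ", "θεός", "θύω", "κατά"]

print(f"{lemma_accuracy(predicted, gold):.1%}")  # prints "75.0%"
```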
AGILe accepts as input a string of text and returns a Stanza Document object.
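Because the result is a Stanza Document, lemmas can be read by iterating over its sentences and their words. The sketch below assumes only the standard Stanza attributes `sentences`, `words`, `text` and `lemma`. The helper name `extract_lemmas` is hypothetical, and the stand-in document is hand-built for illustration so that the example runs without AGILe or Stanza installed; it is not actual AGILe output.

```python
from types import SimpleNamespace


def extract_lemmas(doc):
    """Return (word form, lemma) pairs from a Stanza-style Document."""
    return [
        (word.text, word.lemma)
        for sentence in doc.sentences
        for word in sentence.words
    ]


# Hypothetical stand-in that mimics the shape of a Stanza Document:
# a list of sentences, each holding a list of words with .text and .lemma.
sample_doc = SimpleNamespace(
    sentences=[
        SimpleNamespace(
            words=[
                SimpleNamespace(text="τοῖς", lemma="ὁ"),
                SimpleNamespace(text="θεοῖς", lemma="θεός"),
            ]
        )
    ]
)

for form, lemma in extract_lemmas(sample_doc):
    print(form, lemma)
```

With a real Document returned by AGILe, the same helper should apply unchanged, since it relies only on those four attributes.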
References
de Graaf, E., Stopponi, S., Bos, J., Peels-Matthey, S., & Nissim, M. (2022, June). AGILe: The first lemmatizer for Ancient Greek inscriptions. In The 13th Conference on Language Resources and Evaluation (pp. 5334–5344). European Language Resources Association (ELRA). Available: https://aclanthology.org/2022.lrec-1.571/
Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 101–108). Association for Computational Linguistics. Available: https://aclanthology.org/2020.acl-demos.14.pdf