LiLa: Linking Latin
Revision as of 18:20, 1 July 2019
Available
- https://lila-erc.eu
Director
- Marco Passarotti
Description
The LiLa: Linking Latin project (2018-2023) is building a Linked Data Knowledge Base of Linguistic Resources and Natural Language Processing (NLP) tools for Latin. LiLa collects and connects both existing and newly generated (meta)data. The former are mostly linguistic resources (corpora, lexica, ontologies, dictionaries, thesauri) and NLP tools (tokenisers, lemmatisers, PoS-taggers, morphological analysers and dependency parsers) for Latin, which are currently available from different providers under different licences. As for newly generated (meta)data, LiLa assesses a set of selected linguistic resources by expanding their lexical and/or textual coverage.
In particular, LiLa (a) enhances a large number of Latin texts with PoS-tagging and lemmatisation, (b) harmonises the annotation of the three Universal Dependencies treebanks for Latin, (c) improves the lexical coverage of the Latin WordNet and of the valency lexicon Latin-Vallex, and (d) expands the textual coverage of the Index Thomisticus Treebank. Furthermore, LiLa builds a set of newly trained models for PoS-tagging and lemmatisation, and works on developing and testing the best-performing NLP pipeline for these tasks.
Connections between datasets are represented as edges, each labelled with a value (metadata) drawn from a restricted set defined by a vocabulary of knowledge description.
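The idea of connecting datasets through edges whose labels come from a restricted vocabulary can be illustrated with a minimal Python sketch. It is not LiLa's implementation: the label names (`hasLemma`, `hasPartOfSpeech`, `describes`), the identifiers such as `corpus:token_1`, and the `KnowledgeGraph` class are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary of edge labels. These names are
# illustrative assumptions and are not taken from LiLa's actual vocabulary.
ALLOWED_LABELS = frozenset({"hasLemma", "hasPartOfSpeech", "describes"})


@dataclass(frozen=True)
class Edge:
    """A labelled connection (source, label, target) between two items."""
    source: str
    label: str
    target: str


class KnowledgeGraph:
    """Stores edges and rejects any label outside the restricted vocabulary."""

    def __init__(self, allowed_labels):
        self.allowed_labels = frozenset(allowed_labels)
        self.edges = set()

    def connect(self, source, label, target):
        # Only labels from the vocabulary may be used to link items.
        if label not in self.allowed_labels:
            raise ValueError(f"label {label!r} is not in the vocabulary")
        edge = Edge(source, label, target)
        self.edges.add(edge)
        return edge

    def neighbours(self, source, label):
        # All targets reachable from `source` through edges with `label`.
        return sorted(
            e.target
            for e in self.edges
            if e.source == source and e.label == label
        )


graph = KnowledgeGraph(ALLOWED_LABELS)
# Hypothetical identifiers: a corpus token and a dictionary entry
# both point to the same lemma node.
graph.connect("corpus:token_1", "hasLemma", "lemma:rosa")
graph.connect("dictionary:entry_42", "describes", "lemma:rosa")
print(graph.neighbours("corpus:token_1", "hasLemma"))
```

Restricting the labels at insertion time is one simple way to keep connections between independently produced resources comparable; in a Linked Data setting the same role would normally be played by the properties of a shared ontology.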
LiLa meets the so-called FAIR Guiding Principles for scientific data management and stewardship, which state that scholarly data must be Findable, Accessible, Interoperable and Reusable.