Index Thomisticus Treebank

From The Digital Classicist Wiki

Available

Description

The Index Thomisticus is a pioneering project in Computational Linguistics, Humanities Computing and Digital Humanities. Begun by Father Roberto Busa SJ in the second half of the 1940s, the Index Thomisticus is a corpus containing the opera omnia (in Latin) of Thomas Aquinas (118 texts) as well as 61 texts by other authors related to Thomas, for a total of approximately 11 million words, morphologically tagged and lemmatized by hand. In the early 1970s, Busa began planning a second project aimed at both the morphosyntactic disambiguation of the Index Thomisticus lemmatization and the syntactic annotation of its sentences. Today, these tasks are carried out by the Index Thomisticus Treebank, a dependency-based, syntactically annotated corpus built upon the texts of the Index Thomisticus corpus. The annotation style of the treebank follows the guidelines developed in Prague for the so-called 'analytical' layer of annotation of the Prague Dependency Treebank for Czech. A Universal Dependencies (UD; https://universaldependencies.org/) version of the Index Thomisticus Treebank is also available.
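As a rough illustration of what a dependency annotation looks like in practice, the following minimal Python sketch reads a sentence in CoNLL-U, the tab-separated format used for Universal Dependencies treebanks, and lists its head-dependent relations. The sample sentence and its analysis are invented for illustration and are not taken from the Index Thomisticus Treebank; the function names `parse_conllu` and `dependency_edges` are likewise hypothetical helpers, not part of any project tooling.

```python
# Toy CoNLL-U fragment (invented for illustration; NOT taken from the treebank).
SAMPLE = """\
# text = Deus est bonus
1\tDeus\tdeus\tNOUN\t_\t_\t3\tnsubj\t_\t_
2\test\tsum\tAUX\t_\t_\t3\tcop\t_\t_
3\tbonus\tbonus\tADJ\t_\t_\t0\troot\t_\t_
"""

# The ten columns defined by the CoNLL-U format, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment/metadata line
        token = dict(zip(FIELDS, line.split("\t")))
        if not token["id"].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def dependency_edges(sentence):
    """Return (head form, relation, dependent form) triples; head 0 is ROOT."""
    forms = {token["id"]: token["form"] for token in sentence}
    forms[0] = "ROOT"
    return [(forms[t["head"]], t["deprel"], t["form"]) for t in sentence]


if __name__ == "__main__":
    for head, relation, dependent in dependency_edges(parse_conllu(SAMPLE)[0]):
        print(f"{head} --{relation}--> {dependent}")
```

The same loop should work on any well-formed CoNLL-U file read with `open(path, encoding="utf-8").read()`; the relation labels seen there depend on the annotation scheme of the release being read.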

Beyond the Index Thomisticus Treebank, the project also includes:

1. a semantically/pragmatically annotated portion of the Latin Dependency Treebank (with the same annotation style used for the Index Thomisticus Treebank), which features texts by authors of the Classical era;
2. a syntactically-based valency lexicon (IT-VaLex) automatically induced from the syntactic layer of annotation of the Index Thomisticus Treebank;
3. a semantically-based valency lexicon (VALLEX) built in close connection with the semantic/pragmatic annotation of both the Latin Dependency Treebank and the Index Thomisticus Treebank.