Computational Corpus Annotation for Quantitative Analysis of Latin Lexical Semantics
Available
Principal Investigator
- Barbara McGillivray
Description
Taken from the project website (accessed 2026-05-15):
Understanding language crucially requires capturing words’ meanings, but these are not directly observable. Words’ meanings change over time and vary by register, genre, style, social and geographic factors. Our knowledge of semantic variation and change in historical languages is largely based on qualitative evidence from dictionaries and small-scale studies. Large quantitative studies are not possible yet because they require high-quality data with rich semantic annotation indicating the meaning of each word’s usage. Since this is time-consuming and complex, we lack large-scale quantitative accounts of semantic variation and change over long time spans. Recent computational methods in historical word sense disambiguation allow us, in principle, to automate semantic annotation. Hence, large-scale quantitative semantic analyses are now within reach. Latin has one of the longest recorded histories, an unprecedented set of tools and digital corpora covering over two thousand years and is a key part of Europe’s cultural heritage. This context places Latin in an excellent position to lead the way in quantitative historical semantics. Uniquely integrating computational methods in a novel corpus annotation system to analyse Latin words’ meaning quantitatively at scale, COALA can transform the way historical lexical semantics research is done. The impact spans multiple fields: in corpus linguistics, addressing open challenges for consistent sense annotation at scale for a historical language; in computational semantics, advancing state-of-the-art methods as a reliable basis for lexical semantics research; in Latin and historical semantics, answering open questions on how polysemy varies by text genre, how words in the same lexical field change their meaning, and how the timing of semantic innovations relates to lasting changes. Our analysis will also be the first extensive empirical semantic investigation of Latin’s status as a fossilised language throughout its history.