My research is a linguistic study (phonetic, morphological, syntactic and lexical) of the approximately 4,000 Latin inscriptions from the Roman province of Dacia. Its aim is to contribute to a deeper understanding of how Latin evolved in this eastern province of the multilingual Roman Empire. To carry out this work efficiently and accurately, I have started building a corpus of the inscriptions, using computational methods for encoding, concordancing, searching and statistical analysis, and for deriving electronic editions of the corpus.
The inscriptions are encoded in TEI XML, and XSL stylesheets transform the files into full-text editions at each of the different encoding levels. The encoding follows the standards set out in The Menota Handbook (Guidelines for the electronic encoding of Medieval Nordic primary sources), ed. Odd Einar Haugen (see:).
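As a rough illustration of what such a transformation does, the sketch below uses Python's standard `xml.etree.ElementTree` in place of XSL to extract one level at a time from an encoded word list. It assumes a Menota-style layout in which each `<w>` holds a `<choice>` with `<me:facs>`, `<me:dipl>` and `<me:norm>` readings, and in which expanded letters are wrapped in `<ex>`. The Menota namespace URI, the sample words and the helpers `flatten` and `render_level` are assumptions made for illustration, not the project's actual stylesheets.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: the TEI one is standard; the Menota one is assumed here.
TEI = "http://www.tei-c.org/ns/1.0"
ME = "http://www.menota.org/ns/1.0"
NS = {"tei": TEI, "me": ME}

# Illustrative sample (the funerary formula D(is) M(anibus)), not corpus data.
SAMPLE = f"""<text xmlns="{TEI}" xmlns:me="{ME}">
  <body><p>
    <w><choice><me:facs>D</me:facs><me:dipl>D<ex>is</ex></me:dipl><me:norm>Dis</me:norm></choice></w>
    <w><choice><me:facs>M</me:facs><me:dipl>M<ex>anibus</ex></me:dipl><me:norm>Manibus</me:norm></choice></w>
  </p></body>
</text>"""


def flatten(elem: ET.Element) -> str:
    """Concatenate an element's text, wrapping expansions (<ex>) in parentheses."""
    parts = [elem.text or ""]
    for child in elem:
        inner = flatten(child)
        if child.tag == f"{{{TEI}}}ex":
            inner = f"({inner})"
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


def render_level(xml_text: str, level: str) -> str:
    """Hypothetical helper: running text of one level ('facs', 'dipl' or 'norm')."""
    root = ET.fromstring(xml_text)
    words = []
    for w in root.iter(f"{{{TEI}}}w"):
        reading = w.find(f"tei:choice/me:{level}", NS)
        if reading is not None:
            words.append(flatten(reading))
    return " ".join(words)


for level in ("facs", "dipl", "norm"):
    print(f"{level}: {render_level(SAMPLE, level)}")
```

Rendering `<ex>` as parentheses is only one display convention for expansions; a stylesheet could just as well set them in italics.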
The corpus is searched through a web-based interface to the Corpus Workbench (IMS, Stuttgart), developed by Dr. Paul Meurer of Aksis (The Department of Culture, Language and Information Technology, University of Bergen).
Each XML file consists of two parts: the header and the text. The header holds the information about the inscription (editor, publication, recording, bibliography, date and so on), while the <text> part contains only the original Latin text. The inscriptions are encoded on three levels: facsimile, where the text is represented exactly as it appears on the ancient object; diplomatic, which gives the editor's reading of the text, with abbreviations expanded and the expansions marked by special signs; and normalised, where each word is written in accordance with standard grammatical rules, so that the text corresponds to a literary version.
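To make the header/text division concrete, the sketch below reads one file with Python's standard `xml.etree.ElementTree` and returns the header metadata separately from the Latin text. The element names used (`<teiHeader>`, `<titleStmt>`, `<editor>`, `<bibl>`, `<creation>/<date>`, `<ab>`) are ordinary TEI elements chosen as an assumption; the corpus files may place this information elsewhere, and `split_file`, `clean` and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}

# Hypothetical file layout; all field values are placeholders.
SAMPLE = f"""<TEI xmlns="{TEI}">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample inscription</title>
        <editor>Editor name</editor>
      </titleStmt>
      <publicationStmt><p>Test file</p></publicationStmt>
      <sourceDesc><bibl>Bibliographic reference</bibl></sourceDesc>
    </fileDesc>
    <profileDesc><creation><date>Date of the inscription</date></creation></profileDesc>
  </teiHeader>
  <text><body><ab>D M</ab></body></text>
</TEI>"""

HEADER_FIELDS = ("title", "editor", "bibl", "date")


def clean(elem):
    """Collapse an element's text content to single-spaced plain text."""
    return " ".join("".join(elem.itertext()).split())


def split_file(xml_text):
    """Return (metadata dict from <teiHeader>, plain Latin text from <text>)."""
    root = ET.fromstring(xml_text)
    header = root.find("tei:teiHeader", NS)
    metadata = {}
    for field in HEADER_FIELDS:
        # First matching element anywhere in the header, or None if absent.
        elem = header.find(f".//tei:{field}", NS)
        metadata[field] = clean(elem) if elem is not None else None
    latin = clean(root.find("tei:text", NS))
    return metadata, latin


metadata, latin = split_file(SAMPLE)
print(metadata)
print(latin)
```

Keeping the metadata lookup separate from the text extraction means the three encoding levels inside `<text>` can be processed without touching the header.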
Based on this corpus, I will compile a word concordance in order to study how the Latin of the inscriptions deviates from classical Latin and to analyse the peculiarities of Vulgar Latin.
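A concordance of this kind can be sketched in a few lines. The example below assumes that each text has already been reduced to (attested form, normalised form) pairs, for instance by reading the diplomatic and normalised levels with expansion marks removed, and builds a keyword-in-context (KWIC) index keyed by the normalised form. Entries whose attested spelling differs from the normalised one are then listed as candidate deviations. The sample tokens are invented, with spellings of the type often cited for inscriptional Latin (bixit, coiugi, posuet), and are not data from the Dacian corpus; `concordance` and `deviations` are hypothetical helpers.

```python
from collections import defaultdict

# Invented (attested, normalised) token pairs; not drawn from the corpus.
TEXTS = {
    "sample-1": [("Dis", "Dis"), ("Manibus", "Manibus"), ("bixit", "vixit"),
                 ("annos", "annos"), ("XXX", "XXX")],
    "sample-2": [("coiugi", "coniugi"), ("bene", "bene"), ("merenti", "merenti"),
                 ("posuet", "posuit")],
    "sample-3": [("vixit", "vixit"), ("annos", "annos"), ("XL", "XL")],
}


def concordance(texts, width=2):
    """Map each normalised form to KWIC hits: (text id, left, attested form, right)."""
    index = defaultdict(list)
    for text_id, tokens in texts.items():
        attested = [form for form, _ in tokens]
        for i, (form, norm) in enumerate(tokens):
            left = " ".join(attested[max(0, i - width):i])
            right = " ".join(attested[i + 1:i + 1 + width])
            index[norm].append((text_id, left, form, right))
    return index


def deviations(index):
    """Keep only hits whose attested spelling differs from the normalised form."""
    result = {}
    for norm, hits in index.items():
        differing = [hit for hit in hits if hit[2] != norm]
        if differing:
            result[norm] = differing
    return result


for norm, hits in sorted(deviations(concordance(TEXTS)).items()):
    for text_id, left, form, right in hits:
        print(f"{norm:<8} {text_id}: {left} [{form}] {right}")
```

Keying the index on the normalised form groups every attestation of a word together regardless of its spelling, so regular and deviant forms (here vixit and bixit) can be compared side by side.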