Translation alignment

Description
Translation Alignment (TA) is a task in Natural Language Processing. It consists of establishing correspondences between texts in different languages, that is, determining which parts of a source text correspond to which parts of a second text. When performed across two languages, it is called bilingual alignment; when performed across more than two languages, it is called multilingual alignment.

TA can be performed at various levels of granularity: from entire books to single chapters or sections, down to sentences and individual words. A set of texts aligned at some level is called a parallel text or parallel corpus. The output of a TA pipeline is a list of pairs of items (words, sentences, or larger chunks), often called Translation Pairs (TPs).
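As an illustration of that output format, the sketch below stores word-level translation pairs built from index links between two tokenized texts. The class name `TranslationPair`, the helper `pairs_from_links`, and the hand-made links are assumptions made for this example; they do not come from any particular alignment tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TranslationPair:
    """One aligned unit: tokens on the source side and on the target side."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]


def pairs_from_links(src_tokens: List[str],
                     tgt_tokens: List[str],
                     links: Iterable[Tuple[int, int]]) -> List[TranslationPair]:
    """Turn (source_index, target_index) links into word-level pairs."""
    return [TranslationPair((src_tokens[i],), (tgt_tokens[j],))
            for i, j in sorted(links)]


# Illustrative example: the opening words of the Iliad and an English rendering.
src = ["μῆνιν", "ἄειδε", "θεά"]
tgt = ["sing", "goddess", "the", "wrath"]
links = {(0, 3), (1, 0), (2, 1)}  # hand-made links; "the" is left unaligned

pairs = pairs_from_links(src, tgt, links)
for pair in pairs:
    print(pair.source, "<->", pair.target)
```

The same structure works at coarser levels of granularity if each side holds a whole sentence or section instead of a single token.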

Parallel corpora in modern and ancient languages are used for a variety of purposes, including training machine translation models, automatic bilingual lexicon extraction, corpus linguistics analysis, translation history research, language learning, and cross-lingual annotation projection.

Methods
TA can be performed automatically, semi-automatically, or entirely manually through annotation. Several computational methods exist. The earliest, such as the IBM Models introduced by Brown et al. (1993), date from the 1990s and are based on statistical lexical models. Later, Och and Ney (2003) introduced GIZA++, which was considered the state of the art in the field until the advent of neural and transformer-based models.
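To give a sense of how a statistical lexical model works, here is a minimal sketch in the spirit of IBM Model 1: lexical translation probabilities are estimated with expectation-maximization from sentence-aligned text, and each target word is then linked to its most probable source word. It is a simplification (no NULL word, no smoothing, a three-sentence toy corpus) and not a reimplementation of GIZA++; the function names `train_ibm1` and `align` are assumptions made for this example.

```python
from collections import defaultdict


def train_ibm1(bitext, iterations=10):
    """Estimate t(tgt_word | src_word) with EM.

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    Simplification: no NULL source word is added.
    """
    tgt_vocab = {f for _, fs in bitext for f in fs}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (tgt, src) pairs
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words.
        for es, fs in bitext:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def align(es, fs, t):
    """Link each target word to its most probable source word."""
    links = []
    for j, f in enumerate(fs):
        i = max(range(len(es)), key=lambda k: t[(f, es[k])])
        links.append((i, j))
    return links


# Toy German-English corpus, for illustration only.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(bitext)
print(align(["ein", "Buch"], ["a", "book"], t))
```

Because "das" co-occurs with "the" in two sentences and "Buch" with "book" in two sentences, EM gradually shifts probability mass toward those pairs, which in turn disambiguates "Haus"/"house" and "ein"/"a".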

Transformer-based models for TA exploit multilingual contextualized language models to create accurate alignments, using various types of data for fine-tuning. A recent automatic model for TA in ancient languages was developed by Yousef et al. (2022) for Ancient Greek and modern translations; it is based on two multilingual contextualized language models, mBERT and XLM-R.
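One common way to turn contextual embeddings into word alignments is to compute a similarity matrix between source and target token vectors and keep the links on which both directions agree (mutual argmax). The sketch below shows only that extraction step, under stated assumptions: the three-dimensional vectors are hand-written toy values standing in for embeddings that a real system would obtain from a model such as mBERT or XLM-R, and the function name `mutual_argmax_links` is hypothetical. It is not claimed to be the exact procedure of Yousef et al. (2022).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mutual_argmax_links(src_vecs, tgt_vecs):
    """Keep (i, j) only if j is the best target for i and i the best source for j."""
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    n_src, n_tgt = len(src_vecs), len(tgt_vecs)
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy vectors (assumed values, not real model output).
src_tokens = ["μῆνιν", "ἄειδε", "θεά"]
src_vecs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
tgt_tokens = ["sing", "goddess", "the", "wrath"]
tgt_vecs = [[0.0, 0.9, 0.1], [0.2, 0.0, 0.9], [0.3, 0.3, 0.2], [0.9, 0.1, 0.0]]

for i, j in mutual_argmax_links(src_vecs, tgt_vecs):
    print(src_tokens[i], "<->", tgt_tokens[j])
```

Requiring agreement in both directions trades recall for precision: a token with no clear counterpart, such as the article "the" here, stays unaligned instead of being forced onto its nearest neighbour.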

Software

 * Ugarit (manual alignment editor): http://ugarit.ialigner.com/
 * Ugarit Automatic Aligner: http://ugarit-aligner.com/
 * Alpheios: https://alpheios.net/

Alignment Guidelines and Gold Standards

 * General resource: alignment guidelines available for ancient and modern languages: https://ugarit.ialigner.com/guidelines.php
 * Alignment Gold Standard and Guidelines for Ancient Greek into English, Latin and Portuguese: https://github.com/UgaritAlignment/Alignment-Gold-Standards

Training

 * Translation and Text Alignment. Chiara Palladino, Farnoosh Shamsian, Maia Shukhoshvili. Sunoikisis Digital Classics, Fall 2021, Nov 18 2021. https://github.com/SunoikisisDC/SunoikisisDC-2021-2022/wiki/Translation-and-Text-Alignment
 * We want to learn all languages! Applications of translation alignment in digital environments. Chiara Palladino & Tariq Yousef. Digital Classicist London Seminar 2021, June 25 2021. https://www.youtube.com/live/R2Ms6yAMZss?feature=share
 * Translation Technologies. Franziska Naether & Chiara Palladino. Sunoikisis Digital Classics, Fall 2020, Nov 26 2020. https://github.com/SunoikisisDC/SunoikisisDC-2020-2021/wiki/8-Translation-Technologies
 * Digital classics and learning Greek in Iran. Farnoosh Shamsian. Sunoikisis Digital Classics, Summer 2020, May 14 2020. https://github.com/SunoikisisDC/SunoikisisDC-2019-2020/wiki/Summer2020-Session-6