Text Reuse

Text Reuse Panel at DH 2014
“Rethinking Text Reuse as Digital Classicists” is the title of a panel session which will be held at the 2014 Digital Humanities Conference (DH 2014, Lausanne, 10 July 2014, 09:00-10:30).

Text reuse – the meaningful reiteration of text, usually beyond the simple repetition of common language – is a broad concept that can naturally be understood at different levels and studied in a large variety of contexts. This panel will gather researchers from different projects focussing on text reuse in the field of Digital Classics with the aim of discussing the possible approaches to and understandings of the notion. It will also bring together current efforts and lay the ground for further research.

This page was created to prepare for the event, but aims more generally to foster information sharing and further exploration of the topic.

Participants
Conveners
 * Aurélien Berra (Université Paris-Ouest & EHESS)
 * Matteo Romanello (German Archaeological Institute & King’s College London)
 * Alexandra Trachsel (University of Hamburg)

Invited participants
 * Monica Berti (University of Leipzig)
 * Chris Forstall [Neil Coffee] (University at Buffalo, SUNY)
 * Annette Geßner (University of Leipzig)
 * Charlotte Tupman (King’s College London)

Papers

 * Introduction
 * Monica Berti's talk
 * Annette Geßner, "Text-Mining-Approaches to find Text Re-Use"

Links

 * Description of the panel on the conference website
 * Collection of panel-related tweets on Storify

Description of the Panel
Why rethink text reuse?

Text reuse is the meaningful reiteration of text, usually beyond the simple repetition of common language. Such a broad concept can naturally be understood at different levels and studied in a large variety of contexts. This diversity of approaches is to some extent explained by the fact that the phenomenon exists in almost all disciplines of the Humanities, and is crucial in those which focus on texts.

At one end of this spectrum we find the methods developed by computational linguistics. Research projects in this field study text reuse through automatic analyses within large corpora that often come from widely different backgrounds. The approaches range from the automatic detection of allusions and intertextual phenomena, for example in historical texts, to the detection of plagiarism in modern ones [1][2][3][4]. At the other end, the concept also designates a core scholarly activity, connected to most of the “scholarly primitives” [5] – this meta-level being obviously our own practice, and having its roots in Antiquity. Furthermore, any kind of citation constitutes an indirect way of transmitting knowledge, either consciously or unconsciously, as well as a rhetorical or narrative device allowing an author to communicate with his audience beyond the level of the linguistic content. As a result, this notion shows how deeply intertwined objectivity and subjectivity are when one handles texts.

Digital approaches often aim at highlighting or defining these complex links between an initial statement and its multiple occurrences (often translations) in later contexts. Indeed, especially when the text reuse of ancient elements in corpora of more recent texts is studied, the fact that the statements are given in translation is an important issue and introduces an additional difficulty. This, however, is not a completely new problem. It can be observed each time that two cultures meet and borrow elements from each other’s cultural heritage. A further notion of text reuse is reached when not only the interconnections between the different reuses of a given textual element are investigated, but also the connections between the contexts in which they occur, whether in the form of unabbreviated quotations or as references within a more conventional citation system.

This panel proposes to gather researchers from different projects focusing on text reuse in order to create an inventory of the possible approaches to and understandings of the notion. Our objective is to highlight the historical dimension of the phenomenon and, ultimately, find some common features that could lead to a more systematic study. Texts are data indeed, but text reuse provides an excellent demonstration that they must also, and at the same time, be studied as intentional, sophisticated and reflexive cultural products. The emergence of Digital Classics, and of Digital Humanities in general, is an occasion to rethink text reuse and work towards the integration of – or at least foster dialogue and interconnection between – various perspectives.

Studying text reuse in Digital Classics

A panel on text reuse at the Digital Humanities 2014 Conference seems a very timely initiative, because several projects are currently addressing the question and developing new tools to deal with its different aspects.

The Perseids platform [6][7] can be mentioned first. As a project of the Perseus Digital Library [8], it aims at creating a collaborative online environment for the edition of a great variety of ancient documents, with particular attention to the requirements of editing fragmentarily preserved sources (especially if they are transmitted through quotations) – a specific case of text reuse [9][10][11][12]. Indeed, current digital libraries, like the Perseus Digital Library or the Thesaurus Linguae Graecae, started with the wholly preserved ancient texts and deal with fragmentary works as if they were independent entities at the same level as the others. This clearly creates conceptual difficulties, since we only have indirect access to most of the fragmentarily preserved works: some parts of a lost initial work have been reused in the form of quotations in later texts. This reuse may have left some traces in the rewording of the quotation, and therefore it is essential to keep the link to the context in which a given passage has been embedded when editing fragmentarily preserved texts.

One way of addressing this issue has been explored by the Sharing Ancient Wisdoms project [13]. The project’s goal was to provide digital editions of several texts belonging to the so-called tradition of wisdom literature, by analysing the quoted sayings or proverbs and creating an ontology for describing their diverse relationships [14]. Still another approach must be chosen when the focus is shifted from the edition of a text with many quotations in it, such as those dealt with in the SAWS project, to the edition of a set of quotations that come from different source texts, but belong to one lost work, as is currently being explored in Alexandra Trachsel’s research on Demetrios of Scepsis.

In a complementary fashion, the study of single works of considerable size as webs of quotations should enable us to deal better with the reflexive dimension of encyclopaedic writings. Such a perspective is being built in the Digital Athenaeus project, which will explore the combination of digital and philological means of analysis in the preparation of a new edition of the Deipnosophists – a complex literary construction which sets scholarly discussions and pastimes in the context of an Imperial symposium and thus bears witness to the dynamics of text reuse [15].

Further projects, such as Tesserae [16] or Eumaios [17], move beyond the concept of quotation and focus on more hidden or less acknowledged forms of intertextuality. Tesserae, in particular, is devised to help scholars find previously unexplored intertextual parallels by means of automatic text reuse detection [18]. This work has employed small benchmark sets of recognised parallels against which search techniques are measured and methods are improved. But a large and systematic repertory of already studied loci paralleli is something from which a tool like Tesserae will benefit immensely, and one that can be built, to a large extent automatically, by extracting from the literature the text passages that have already been studied in relation to one another. These parallels are usually signalled in journal articles and other types of secondary sources by means of canonical citations, whose automatic extraction from large corpora of unstructured texts, such as those of JSTOR or the Internet Archive, is a topic that is currently being explored [19][20].
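The idea of measuring a search technique against a benchmark set of recognised parallels can be illustrated with a minimal sketch. It is not the implementation of Tesserae or of any other project named here: the matching rule (two lines count as a candidate parallel when they share at least two word forms), the function names `tokens`, `find_parallels` and `recall`, and the English toy lines are all assumptions made for illustration. A real system would typically work on lemmata rather than raw word forms and would rank its results.

```python
from itertools import product


def tokens(line):
    """Lowercase a line and return the set of its alphabetic word forms."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in line)
    return set(cleaned.split())


def find_parallels(source, target, min_shared=2):
    """Return (i, j, shared_words) for each source/target line pair
    that shares at least `min_shared` word forms (hypothetical rule)."""
    hits = []
    for (i, s), (j, t) in product(enumerate(source), enumerate(target)):
        shared = tokens(s) & tokens(t)
        if len(shared) >= min_shared:
            hits.append((i, j, sorted(shared)))
    return hits


def recall(hits, benchmark):
    """Proportion of benchmark (i, j) pairs recovered by the search."""
    found = {(i, j) for i, j, _ in hits}
    return len(found & benchmark) / len(benchmark) if benchmark else 0.0


# Toy data, invented for this sketch only.
source = ["sing of arms and the man", "the wrath of achilles"]
target = ["arms and a man I sing", "a quiet harbour at dusk"]

# Hypothetical benchmark: line pairs that scholars have already recognised.
benchmark = {(0, 0), (1, 1)}

hits = find_parallels(source, target)
print(hits)
print(recall(hits, benchmark))
```

With such a benchmark in hand, changing the matching rule (for example raising `min_shared`, or comparing lemmata) and recomputing recall gives a simple way to compare search techniques. A larger repertory of already studied parallels would make the comparison more reliable.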

The identification and extraction of text reuse is central in eTRACES [21], a project which has recently developed a tool named GERTRUDE (Göttingen E-Research Text-Re-Use for Digital Editions). Working on extremely heterogeneous corpora and primarily on German literature written between 1500 and 1900, it actually reflects on and solves similar problems.

All these projects, though they have the concept of text reuse in common, can be distinguished either by the type of corpus they use (texts from Antiquity, German literature, modern scholarly writings) or by their starting point (working on source texts where quotations are preserved, establishing relationships between different works in which the same textual elements occur, or focusing on quoted or reused elements). However, they have accumulated a great amount of knowledge on how to deal with the multiple forms of this cultural practice. The panel therefore aims at bringing together these efforts and should allow each of the projects to benefit from the expertise of the others, so that the solutions already found may be discussed and in the hope that our desiderata may lay the ground for further research.

Practical organisation of the panel

Besides the conveners, who will introduce and moderate the discussion, the panel will involve four speakers. After a brief presentation of the participants and of the main issues of the topic (10 minutes), short talks will be given by the four panel participants, illustrating different aspects of text reuse (40 minutes). The remaining time will be devoted to a discussion among all the participants and will be focused on the challenges and desiderata for further projects dealing with text reuse, in Digital Classics and beyond this field (40 minutes).

References

1. Bamman, D. & Crane, G., 2008. The logic and discovery of textual allusion. In Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008), Marrakesh.

2. Büchler, M., 2013. Informationstechnische Aspekte des Historical Text Re-use. PhD Thesis, Universität Leipzig. Retrieved from http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-108515.

3. Bamman, D. & Crane, G., 2009. Discovering Multilingual Text Reuse in Literary Texts. Available at http://www.perseus.tufts.edu/publications/2009-Bamman.pdf.

4. Lee, J., 2007. A Computational Model of Text Reuse in Ancient Literary Texts. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 472–479. Prague, Czech Republic: Association for Computational Linguistics. Retrieved from http://acl.ldc.upenn.edu/P/P07/P07-1060.pdf.

5. Unsworth, J., 2000. Scholarly Primitives: What Methods Do Humanities Researchers Have in Common, and How Might Our Tools Reflect This? Formal methods, experimental practice. King’s College, London. Available at http://people.brandeis.edu/~unsworth/Kings.5-00/primitives.htm.

6. Almas, B. & Berti, M., 2013. Perseids Collaborative Platform for Annotating Text Re-Uses of Fragmentary Authors. In F. Tomasi & F. Vitali, eds. DH-Case 2013. Available at http://dx.doi.org/10.1145/2517978.2517986.

7. Perseids. A collaborative editing platform for source documents in Classics, http://sites.tufts.edu/perseids/.

8. Perseus Digital Library, http://www.perseus.tufts.edu (Accessed on November 1, 2013).

9. Romanello, M., Berti, M., Boschetti, F., Babeu, A. & Crane, G., 2009. Rethinking Critical Editions of Fragmentary Texts by Ontologies. In S. Mornati, ed., Rethinking Electronic Publishing: Innovation in Communication Paradigms and Technologies - Proceedings of the 13th International Conference on Electronic Publishing held in Milano, Italy 10-12 June 2009, pp. 155–174.

10. Romanello, M., 2011. The Digital Critical Edition of Fragments: Theoretical Problems and Technical Solution. In P. Kurras Cotticelli, ed., Linguistica e Filologia Digitale: Aspetti e Progetti, pp. 147–155. Alessandria: Edizioni dell’Orso. Retrieved from http://eprints.rclis.org/handle/10760/15592.

11. Trachsel, A., 2012. Collecting Fragments Today: What Status Will a Fragment Have in the Era of Digital Philology? In C. Clivaz, J. Meizoz, F. Vallotton, & J. Verheyden, eds., Lire demain – Reading Tomorrow, pp. 415–429 (ebook). Lausanne: Presses polytechniques et universitaires romandes.

12. Berti, M., 2013. Collecting Quotations by Topic: Degrees of Preservation and Transtextual Relations among Genres. Ancient Society, 43, pp. 269–288.

13. Sharing Ancient Wisdoms, http://www.ancientwisdoms.ac.uk/ (Accessed on November 1, 2013).

14. Dunn, S., Hedges, M., Jordanous, A., Lawrence, K. F., Roueché, C., Tupman, C. & Wakelnig, E., 2012. Sharing Ancient Wisdoms: Developing Structures for Tracking Cultural Dynamics by Linking Moral and Philosophical Anthologies with their Source and Recipient Texts. In Digital Humanities Conference, Hamburg, Germany. Available in the Book of Abstracts at http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/, pp. 176–179.

15. Romanello, M., & Berra, A., 2011. The Critical Step in Open Content Greek: Towards a Digital Edition of Athenaeus. In TEI Members Meeting, Würzburg, Germany. Available in the Book of Abstracts at http://www.zde.uni-wuerzburg.de/tei_mm_2011, pp. 43-47, and http://philologia.hypotheses.org/512.

16. Tesserae, http://tesserae.caset.buffalo.edu/ (Accessed on November 1, 2013).

17. Eumaios: a collaborative website for Early Greek epic, http://panini.northwestern.edu/AnaServer?eumaios+0+frame.anv (Accessed on November 1, 2013).

18. Coffee, N., Koenig, J.-P., Poornima, S., Forstall, C. W., Ossewaarde, R., & Jacobson, S. L., 2013. The Tesserae Project: Intertextual Analysis of Latin Poetry. Literary and Linguistic Computing, 28(2), pp. 221–228. DOI: 10.1093/llc/fqs033.

19. Romanello, M., 2013. Creating an Annotated Corpus for Extracting Canonical Citations from Classics-Related Texts by Using Active Annotation. In A. Gelbukh, ed., Computational Linguistics and Intelligent Text Processing. 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I. Springer, Berlin Heidelberg, pp. 60–76.

20. Romanello, M., Boschetti, F. & Crane, G., 2009. Citations in the Digital Library of Classics: Extracting Canonical References by Using Conditional Random Fields. In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries. Morristown, NJ, USA: Association for Computational Linguistics, pp. 80–87.

21. eTRACES, http://etraces.e-humanities.net/ (Accessed on November 1, 2013).