Citation in digital scholarship

This page provides examples and offers best practices for making citations in digital scholarship. The term 'citation' is meant very generally: the encoding of a reference to an external entity in support of, as an illustration of, or otherwise in relation to a work of digital scholarship. Scholars cite a wide range of resources, including primary texts, contemporary scholarship, museum objects, people, places, and many other entities and categories of information.

Some preliminaries:


 * The page is divided into categories of evidence with options for citing specific instances.
 * The examples emphasize practical solutions that are available now, while leaving room for suggestions about future directions.
 * An XML context (TEI, XHTML, or possibly another vocabulary) is assumed.
 * Discussion of this page can take place on the Digital Classicist Discussion List.
 * While this page does assert categories, those are also up for discussion. What is the theoretical and practical difference between a "primary source" and "secondary scholarship"? It is reasonable to cite the 9th-century scholar Photius as both.
 * It is not the goal of this page to suggest that "Citation in digital scholarship" is a simple problem.
 * For all the examples below, but particularly for sites that create stable IDs (e.g. Pleiades), one concern is finding a generic, interoperable, author-friendly convention for referring to those resources in ways that the sites themselves will recognize: "If you make a reference to Pleiades, how does Pleiades know that you've done so?"

Matteo Romanello (2008) has described the goals of a linking system as follows:


 * open-ended: it should be possible to link and retrieve other resources related to a given author or work as soon as they appear on the Web. Each link would be resolved into an open-ended, and therefore potentially infinite, number of on-line resources;
 * interoperable: it should guarantee the reuse of data and interoperability among web applications that use different communication protocols and interfaces;
 * semantic and language-neutral: such a linking system should make it possible to identify each author, work, and edition of a work by a unique identifier rather than by a language-dependent name. If an author is unambiguously identified, the names of that author written in different languages can be mapped to that unique identifier. But the reverse is impossible.
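The language-neutral requirement can be illustrated as a lookup in which every name form points to one identifier. This is a minimal sketch, not part of Romanello's proposal: the dictionary, the function names, and the use of CTS-style textgroup identifiers are assumptions chosen for the example.

```python
# Hypothetical name-to-identifier table. The CTS-style textgroup IDs are
# used purely for illustration; any stable, language-neutral ID would do.
NAME_TO_ID = {
    "Herodotus": "urn:cts:greekLit:tlg0016",  # English / Latin form
    "Herodot": "urn:cts:greekLit:tlg0016",    # German form
    "Hérodote": "urn:cts:greekLit:tlg0016",   # French form
    "Homer": "urn:cts:greekLit:tlg0012",
}

def same_author(name_a, name_b):
    """Two names denote the same author iff they map to the same identifier."""
    return NAME_TO_ID[name_a] == NAME_TO_ID[name_b]

def names_for(author_id):
    """Use the identifier as a pivot to collect every known name form."""
    return sorted(name for name, ident in NAME_TO_ID.items() if ident == author_id)

print(same_author("Herodot", "Hérodote"))
print(names_for("urn:cts:greekLit:tlg0016"))
```

The identifier is what makes the mapping work: comparing the strings "Herodot" and "Hérodote" directly would not reveal that they name the same author.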

1. Plain-text citations
Sample text: "Herodotus (1.78) describes Babylon as the strongest and most famous city in Assyria. It is likely that this city was subsequently the mint from which Alexander issued a series of coins depicting eastern warriors on the obverse and an elephant on the reverse (e.g. ANS 1995.51.68). See discussion by Martin Price (1991)."

Is it possible to establish a robust convention that allows unambiguous, machine-recognizable linking to the cited text, to Alexander, to Babylon, to a description of the coin in the collection of the American Numismatic Society, and to the article "Circulation at Babylon in 323 B.C."?

2. Indicating the Presence of a Citation (@*="citation")
HTML: &lt;span class="citation">Herodotus (1.78)&lt;/span>

TEI:

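In both usages, the XPath expression "//*[@*='citation']" selects every element that carries an attribute whose value is exactly 'citation', and so yields the set of all citations in a text. This is robust as long as the value is exactly 'citation'; in HTML, where 'class' may hold several space-separated values (e.g. class="citation other"), a token test such as contains(concat(' ', normalize-space(@class), ' '), ' citation ') is needed.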
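A minimal Python sketch of the selection, assuming well-formed XHTML input. The standard library's ElementTree supports only a limited XPath subset without the @* wildcard, so the sketch iterates over elements instead; a library with full XPath 1.0 support, such as lxml, could evaluate the expression directly. The function names and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

def find_citations(root):
    """Equivalent of //*[@*='citation']: every element (root included) with
    at least one attribute whose value is exactly 'citation'."""
    return [el for el in root.iter() if "citation" in el.attrib.values()]

def find_citations_by_token(root):
    """Looser variant: also matches class="citation other" by splitting
    the space-separated class list into tokens."""
    return [el for el in root.iter() if "citation" in el.get("class", "").split()]

sample = """<p xmlns="http://www.w3.org/1999/xhtml">
<span class="citation">Herodotus (1.78)</span> describes Babylon...
<span class="citation other">ANS 1995.51.68</span>
</p>"""

root = ET.fromstring(sample)
print([el.text for el in find_citations(root)])           # exact-match rule
print([el.text for el in find_citations_by_token(root)])  # token rule
```

The exact-match rule finds only the first span; the token rule finds both.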

3. Normalizing the plain text citation
HTML: &lt;span class="citation" lang="en" title="Herodotus Histories 1.78">Herodotus (1.78)&lt;/span>

TEI:

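Normalization helps tools that automatically recognize plain-text citations: they can read the normalized form from 'title' instead of parsing the abbreviated form shown to the reader.

If the value of the 'title' attribute would be identical to the text content of the element it is attached to, the attribute can be left out.

Note: in the HTML5 specification, an element without @title inherits the value of the nearest ancestor that has @title. That inheritance should not apply to citations: a citation without its own @title should be read as having its own text content as its normalized form.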
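The two rules in this section (omit @title when it would repeat the text, and never inherit an ancestor's @title) can be sketched as one small function. This is an illustration assuming XHTML input; the function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

def normalized_form(el):
    """Normalized citation text: the element's own @title if present,
    otherwise its text content. Deliberately does NOT fall back to an
    ancestor's @title, unlike HTML's advisory-title inheritance."""
    title = el.get("title")
    if title is not None:
        return title
    return "".join(el.itertext()).strip()

sample = """<div xmlns="http://www.w3.org/1999/xhtml" title="Chapter on Babylon">
<span class="citation" lang="en" title="Herodotus Histories 1.78">Herodotus (1.78)</span>
<span class="citation" lang="en">Homer, Iliad 2.345</span>
</div>"""

root = ET.fromstring(sample)
citations = [el for el in root.iter() if "citation" in el.get("class", "").split()]
print([normalized_form(el) for el in citations])
```

The second span has no @title, so its normalized form is its own text, not the enclosing div's "Chapter on Babylon".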

4. Be explicit about language
Both "Herodotus Histories 1.78" and "Hdt. 1.78" can be considered English representations of the citation of that text. The German equivalent of the first is "Herodot Historien 1.78"; the Latin, still with Arabic numerals, is "Herodotus Historiae 1.78". If the language of the citation is the same as that of its prose context, the citation does not need further markup. In some disciplines it is common practice to cite the title of a work in its original language or in a widely accepted academic language, such as Latin titles for Greek works in Classics.

HTML: Herodotus (Historiae 1.78) describes...

The 'lang' attribute specifies the language of the element to which it is attached. It does not directly specify the language of the 'title' attribute. Therefore the value of 'title' must be written in the same language that 'lang' declares.
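One way to keep @lang and @title consistent is to choose the normalized title from a table keyed by the same language code that goes into @lang. The three title forms below come from the text above; the table, the function name, and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

# Normalized work titles by language code (forms taken from the text above;
# the table itself is an illustrative assumption).
HERODOTUS_TITLES = {
    "en": "Herodotus Histories",
    "de": "Herodot Historien",
    "la": "Herodotus Historiae",
}

def citation_span(display_text, passage, lang):
    """Build a citation span whose @title is in the language declared by @lang."""
    span = ET.Element("span", {
        "class": "citation",
        "lang": lang,
        "title": f"{HERODOTUS_TITLES[lang]} {passage}",
    })
    span.text = display_text
    return ET.tostring(span, encoding="unicode")

# The Latin-title case: display text and @title are both Latin.
print(citation_span("Historiae 1.78", "1.78", "la"))
```

Because the same code selects both the @lang value and the @title string, the two cannot drift apart.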

5. Deriving a Machine-Recognizable "Canonical Reference"
HTML: Herodotus (1.178)

TEI:

Linking to Perseus provides access to the text. The URI in @href does not meet the requirement of being a persistent unique identifier for that chunk of text. A naming scheme that implements the Canonical Text Services (CTS) protocol or something similar is needed.
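A sketch of moving from the normalized title toward a persistent identifier, assuming CTS URNs of the form urn:cts:namespace:textgroup.work:passage. The work ID tlg0016.tlg001 for Herodotus' Histories follows the TLG-based numbering used by CTS implementations, but it should be checked against an actual CTS catalogue. The class, functions, and lookup table are hypothetical, and the parser does not validate passage syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CtsUrn:
    namespace: str  # e.g. "greekLit"
    work: str       # textgroup.work[.version], e.g. "tlg0016.tlg001"
    passage: str    # e.g. "1.78"; empty when the URN names a whole work

    def __str__(self):
        base = f"urn:cts:{self.namespace}:{self.work}"
        return f"{base}:{self.passage}" if self.passage else base

def parse_cts_urn(urn):
    """Split a CTS URN into its components (simplified sketch)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0].lower() != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    return CtsUrn(parts[2], parts[3], parts[4] if len(parts) == 5 else "")

# Hypothetical lookup from normalized titles (see section 3) to CTS work IDs.
WORK_IDS = {"Herodotus Histories": ("greekLit", "tlg0016.tlg001")}

def title_to_urn(title):
    """'Herodotus Histories 1.78' -> CtsUrn for that passage."""
    work_title, _, passage = title.rpartition(" ")
    namespace, work = WORK_IDS[work_title]
    return CtsUrn(namespace, work, passage)

urn = title_to_urn("Herodotus Histories 1.78")
print(urn)                             # urn:cts:greekLit:tlg0016.tlg001:1.78
print(parse_cts_urn(str(urn)) == urn)  # True
```

Unlike a Perseus viewer URL, the URN names the passage itself rather than one site's presentation of it.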

More complete markup
Extrapolating from the truncated steps above gives the following markup for the sample text:

HTML:

Notes: The reference to the M. Price article is insufficient.

TEI: to come.

Adding other markup schemes to conformant citations
The 'class="citation" title=""' HTML pattern is designed so that it can easily be combined with other markup schemes. The global 'class' attribute in HTML is a space-separated list, so other, unrelated values can be present without interfering with the identification of an element as a citation. The global 'title' attribute is directly suitable for the role envisioned here, so it should not clash with other conforming uses.

Content creators may choose to add further markup. Links to guidelines for doing so are listed here:
 * Citations with added RDFa

Ancient Mediterranean Primary Texts
Classics has well-established abbreviations for ancient authors and works. They are neither complete nor unambiguous, but they are well established.
 * Plain text: "Hom. Il. 2.345", "Homer, Iliad 2.345"
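Expanding an abbreviated citation into its fuller form is essentially a table lookup plus a pattern match. The sketch below is illustrative only: the abbreviation tables are tiny hypothetical samples, the regular expression covers just the "Author. Work. passage" and "Author. passage" shapes, and real abbreviation lists are much larger and, as noted above, not unambiguous.

```python
import re

# Hypothetical abbreviation tables (tiny samples for illustration).
AUTHORS = {"Hom.": "Homer", "Hdt.": "Herodotus"}
WORKS = {("Hom.", "Il."): "Iliad", ("Hom.", "Od."): "Odyssey"}
SOLE_WORKS = {"Hdt.": "Histories"}  # authors usually cited without a work title

ABBREV = re.compile(
    r"(?P<author>\S+\.)\s+"               # author abbreviation, e.g. "Hom."
    r"(?:(?P<work>[^\d\s]\S*\.)\s+)?"     # optional work abbreviation, e.g. "Il."
    r"(?P<passage>\d+(?:\.\d+)*)"         # passage, e.g. "2.345"
)

def expand(citation):
    """Expand e.g. 'Hom. Il. 2.345' to 'Homer, Iliad 2.345'."""
    m = ABBREV.fullmatch(citation.strip())
    if m is None:
        raise ValueError(f"unrecognized citation: {citation!r}")
    author = m["author"]
    work = WORKS[(author, m["work"])] if m["work"] else SOLE_WORKS[author]
    return f"{AUTHORS[author]}, {work} {m['passage']}"

print(expand("Hom. Il. 2.345"))
print(expand("Hdt. 1.78"))
```

An unknown abbreviation raises an error (ValueError or KeyError) rather than guessing, which matters given how ambiguous the conventions can be.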

The following examples illustrate that the same text can appear in different places.
 * HTML: Hom. Il. 2.345

This example does not address the presence and/or capabilities of the Canonical Text Services (CTS) protocol and URN scheme under development at the Center for Hellenic Studies.

Geographic Entities
Within the Ancient Mediterranean, the Pleiades Project is establishing short URLs as identifiers for geographic entities (but see the project's own discussion for details). GeoNames (geonames.org) provides identifiers for places worldwide.
 * HTML: Ephesus
 * HTML: Samarkand
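Part of answering "how does Pleiades know that you've done so?" is being able to recognize which links in a document point at a given gazetteer. The sketch below classifies a link target by host and path. It assumes Pleiades place URIs of the form https://pleiades.stoa.org/places/<number> and GeoNames URIs of the form https://www.geonames.org/<number>/...; the function name is hypothetical and the place numbers in the example are arbitrary placeholders.

```python
import re
from urllib.parse import urlparse

def gazetteer_id(href):
    """Return (gazetteer, id) for a Pleiades or GeoNames place URI, else None."""
    url = urlparse(href)
    host = url.netloc.lower()
    if host == "pleiades.stoa.org":
        m = re.fullmatch(r"/places/(\d+)/?", url.path)
        return ("pleiades", m.group(1)) if m else None
    if host in ("geonames.org", "www.geonames.org", "sws.geonames.org"):
        m = re.match(r"/(\d+)(?:/|$)", url.path)
        return ("geonames", m.group(1)) if m else None
    return None

# Placeholder IDs for illustration only.
links = [
    "https://pleiades.stoa.org/places/123456",
    "https://www.geonames.org/654321/some-place.html",
    "https://example.org/not-a-gazetteer",
]
print([gazetteer_id(h) for h in links])
```

A gazetteer, or a harvester acting on its behalf, could run this kind of check over crawled citations to find pages that refer to its places.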

Bibliographic Data
WorldCat is one option, though there may be licensing issues.
 * HTML: Yang, X., National Gallery of Art (U.S.), Museum of Fine Arts, Houston., & Asian Art Museum of San Francisco. (1999). The golden age of Chinese archaeology: Celebrated discoveries from the People's Republic of China. Washington: National Gallery of Art

What is the relationship between citing a work and citing its bibliographic record? Is that a necessary distinction?

Museum Objects
Or, more generally, any catalogued object with a stable ID?

HTML: ANS 1968.34.40.
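A plain-text accession number like this one can be turned into a link mechanically. The sketch assumes that the American Numismatic Society's collection database exposes objects at http://numismatics.org/collection/<accession number>; that URI pattern, the regular expression, and the function name are assumptions to be checked against the ANS site.

```python
import re

# Assumed URI pattern for objects in the ANS collection database.
ANS_BASE = "http://numismatics.org/collection/"

# "ANS" followed by a dotted accession number such as 1995.51.68.
ANS_CITATION = re.compile(r"\bANS (\d+(?:\.\d+)+)")

def ans_links(text):
    """Return (citation text, assumed object URI) for each ANS citation in text."""
    return [(m.group(0), ANS_BASE + m.group(1)) for m in ANS_CITATION.finditer(text)]

print(ans_links("an elephant on the reverse (e.g. ANS 1995.51.68). Compare ANS 1968.34.40."))
```

The pattern requires a digit after each dot, so sentence-final punctuation is not swallowed into the accession number.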

Egyptian Papyri
The sites http://papyri.info and http://trismegistos.org (e.g. http://www.trismegistos.org/tm/detail.php?tm=23) are islands of stability here.

HTML: TM23
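Because Trismegistos numbers map onto a predictable URL, a citation such as "TM23" can be resolved mechanically. This sketch reuses the detail-page URL pattern from the example above; the function name and the tolerance for an optional space ("TM 23") are assumptions.

```python
import re

# URL pattern taken from the Trismegistos example above.
TM_DETAIL = "http://www.trismegistos.org/tm/detail.php?tm={}"

def tm_url(citation):
    """Resolve a 'TM<number>' or 'TM <number>' citation to its detail page."""
    m = re.fullmatch(r"TM\s*(\d+)", citation.strip())
    if m is None:
        raise ValueError(f"not a Trismegistos citation: {citation!r}")
    return TM_DETAIL.format(m.group(1))

print(tm_url("TM23"))  # http://www.trismegistos.org/tm/detail.php?tm=23
```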