Citation in digital scholarship

Introduction
This page suggests best practices for making citations in digital scholarship and documents a set of conventions intended to promote greater interoperability. It also points to tools for identifying, processing, and presenting citations in server-side and client-side environments. Additionally, it highlights resources that are creating stable URLs relevant to digital scholarship, with a focus on humanities disciplines. The term 'citation' is meant very generally as the encoding of a reference to an external entity in support of, as illustration of, or otherwise in relationship to a work of digital scholarship. Scholars cite primary texts, contemporary scholarship, museum objects, people, places, and a wide range of other entities and categories of information.

This effort takes as its starting point that the conventions described for citations should:


 * 1) Be automatically parsable. Automatic agents should be able to recognize that a citation is being made, and to identify what is being cited.
 * 2) Encourage reuse of existing naming schemes. A consistently applied convention should allow distinct and independent citations of the same entity to be recognized by third parties. For all the examples below, but particularly for sites creating stable IDs (e.g. Pleiades), the concern is for a generic, interoperable, author-friendly convention for referring to those resources in ways that the sites themselves will recognize. "If you make a reference to Pleiades, how does Pleiades know that you've done so?"
 * 3) Support user interaction. Client-side operations, such as "show me a map of all geographic entities in a document" can be facilitated by a robust citation convention.
 * 4) Recognize that various standards already exist and not take unnecessary steps to interfere with the deployment of those standards.

The above list is based upon previous scholarship in the field of digital citation.

Recommended Convention (@*="citation" paired with URI)
For xml-based documents, the following conventions are recommended.
 * Citations should be human readable.
 * Citations must be indicated by a containing element that has an attribute whose value is 'citation'. In (x)html that attribute is the 'class' global attribute. The only role of this attribute is to identify a span of text that is amenable to automatic processing to identify, describe or make actionable to a user the cited resource.
 * Citations should be qualified by an attribute giving an unabbreviated plain text version of the citation. This is unnecessary when the citation itself is not abbreviated.
 * The language of the citation should be indicated if it is distinct from the language of the host document.
 * Citations should link to stable online resources that make available, are surrogates for, or otherwise define the cited entity.
 * Citations may use an existing standard to indicate the nature of the entity being cited and to describe the relationship of the online resource being linked to the underlying concept that online resource describes.

Simple Examples

 * Herodotus (1.78)
 * &lt;span class="citation">Briant, P., Boucharlat, R., & Réseau international d'études et de recherches achéménides. (2005). L'archéologie de l'empire achéménide. Paris: de Boccard. (Worldcat record)
 * Ephesus

Other examples with more markup

 * Ephesus
 * Here the citation is to a 'http://purl.org/dc/terms/Location' that is defined at the URL 'http://pleiades.stoa.org/places/599612'.


 * Herodotus (1.78)
 * The @typeof will produce an RDFa triple indicating the resource at the URI is a Dublin Core Text.


 * &lt;span typeof="dc:Text" class="citation">Briant, P., Boucharlat, R., and Réseau international d'études et de recherches achéménides. (2005). L'archéologie de l'empire achéménide. Paris: de Boccard. (Worldcat record)
 * The reference is to a text whose definition is at 'http://www.worldcat.org...'. Dublin Core makes no distinction between "primary source" text and "secondary" text. Other ontologies do.
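
The typing statements described in these examples can be sketched in a few lines of Python. This is only an illustration, not an RDFa processor: it assumes the subject URI has already been identified (which attribute supplies the subject depends on the RDFa version and on the other attributes present), and it assumes that the 'dc' prefix is bound to 'http://purl.org/dc/terms/', as the Location example above suggests. The names PREFIXES and type_triple are hypothetical.

```python
# URI of the rdf:type property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed prefix binding; a real document would declare its own prefixes.
PREFIXES = {"dc": "http://purl.org/dc/terms/"}


def type_triple(subject_uri, typeof, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dc:Text' and return a (subject, predicate, object) triple."""
    prefix, _, local = typeof.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return (subject_uri, RDF_TYPE, prefixes[prefix] + local)


triple = type_triple("http://pleiades.stoa.org/places/599612", "dc:Location")
```

Here the triple states that the resource at the Pleiades URL is a 'http://purl.org/dc/terms/Location'.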

Use Cases
The '@*="citation"' pattern can be used in the following circumstances.


 * To distinguish those links which contribute to the intellectual argument of a document from those that implement a user-interface or indicate the immediate publishing environment of a document. For example, a link to the homepage of a website hosting a document should not be marked with 'citation'.


 * By authors of (x)html documents that are the archival version of digital scholarship.


 * As a presentation target for documents that are stored in a non-html format but presented as such on the web.

Preliminary Notes

 * An xml environment, with examples implemented in (x)html and tei, is assumed.
 * While this page does assert categories, those are also up for discussion. What is the theoretical and practical difference between a "primary source" and "secondary scholarship"? It is reasonable to cite the 9th-century scholar Photius as both.

1. Plain-text citations
Sample text: ''Herodotus (1.78) describes Babylon as the strongest and most famous city in Assyria. It is likely that this city was subsequently the mint from which Alexander issued a series of coins depicting eastern warriors on the obverse and an elephant on the reverse (e.g. ANS 1995.51.68). See discussion by Martin Price (1991).''

Is it possible to establish a robust convention that allows unambiguous machine-recognizable linking to the cited text, to Alexander, to Babylon, to a description of the coin in the collection of the American Numismatic Society and to the article "Circulation at Babylon in 323 B.C."?

2. Indicating the Presence of a Citation (@*="citation")
HTML: &lt;span class="citation">Herodotus (1.78)&lt;/span>

TEI:

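In both these usages, the XPath selector "//*[@*='citation']" selects every element that has an attribute whose value is exactly 'citation', so a single expression collects all the citations in a text regardless of which element or attribute carries the marker.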
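
As a rough illustration of what the selector does, the following Python sketch collects the same elements. The standard library's ElementTree implements only a subset of XPath, so the sketch emulates "//*[@*='citation']" by iterating over every element and checking its attribute values. The sample document is invented for the example, and the 'ref' element with type="citation" is a hypothetical stand-in for non-HTML markup, not this page's TEI example.

```python
import xml.etree.ElementTree as ET

# Invented sample: one HTML-style citation, one hypothetical XML-style
# citation, and one navigation link that is not a citation.
SAMPLE = """<div>
  <p><span class="citation">Herodotus (1.78)</span> describes Babylon.</p>
  <p><ref type="citation">Herodotus (1.78)</ref></p>
  <p><a href="/about">About this site</a></p>
</div>"""


def find_citations(root):
    """Return every element with some attribute whose value is exactly 'citation'."""
    return [el for el in root.iter() if "citation" in el.attrib.values()]


root = ET.fromstring(SAMPLE)
citations = find_citations(root)
```

The navigation link is skipped because none of its attribute values is 'citation'.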

3. Normalizing the plain text citation
HTML: &lt;span class="citation" lang="en" title="Herodotus Histories 1.78">Herodotus (1.78)&lt;/span>

TEI:

Normalization will assist tools that can automatically recognize plain text citations.

If the value of the 'title' attribute would be identical to the text representation of the element it is attached to, it can be left out.

Note: in the HTML5 spec, elements without @title inherit the value from any ancestor that has @title. That should not happen in the case of a citation.
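
A minimal sketch of how a tool might read the normalized form, assuming well-formed markup that Python's ElementTree can parse. The helper name normalized_citation is hypothetical. It consults only the citation element's own 'title' and never an ancestor's, and it falls back to the element's text when 'title' is left out.

```python
import xml.etree.ElementTree as ET


def normalized_citation(el):
    """Prefer the element's own 'title'; otherwise fall back to its text content."""
    title = el.get("title")
    if title:
        return title.strip()
    # No title: the visible text is taken to be the unabbreviated citation.
    return " ".join("".join(el.itertext()).split())


abbreviated = ET.fromstring(
    '<span class="citation" lang="en" title="Herodotus Histories 1.78">Herodotus (1.78)</span>'
)
unabbreviated = ET.fromstring('<span class="citation">Herodotus Histories 1.78</span>')

forms = [normalized_citation(abbreviated), normalized_citation(unabbreviated)]
```

Both elements yield the same normalized string, which is what allows independent citations of the same passage to be matched.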

4. Be explicit about language
Both "Herodotus Histories 1.78" and "Hdt. 1.78" can be considered English representations of the citation of that text. The German equivalent of the first is "Herodot Historien 1.78", the Latin - still with Arabic numerals - is "Herodotus Historiae 1.78". If the language of the citation is the same as its prose context, it is not necessary to further mark up the citation. It is common practice in some disciplines to cite the title of a work in its original language or in a widely accepted academic language, such as Latin titles for Greek works in Classics.

HTML: Herodotus (Historiae 1.78) describes...

The 'lang' attribute specifies the language of the element to which it is attached. It does not directly specify the language of the 'title' attribute. Therefore, the language of the 'title' value must be the same as the language of the element's text.
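
The following Python sketch shows one way a tool could determine the language of each citation: use the 'lang' on the citation element if present, otherwise the nearest ancestor's. The markup is an assumption made for the example (in particular the lang="la" on the Latin citation), and citation_language is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed markup: an English host document containing one Latin citation
# and one citation that shares the language of its context.
DOC = ET.fromstring(
    '<html lang="en"><body><p>'
    '<span class="citation" lang="la" title="Herodotus Historiae 1.78">'
    'Herodotus (Historiae 1.78)</span> describes ... '
    '<span class="citation" title="Herodotus Histories 1.78">Herodotus (1.78)</span>'
    '</p></body></html>'
)

# ElementTree elements do not know their parents, so build a lookup table.
parents = {child: parent for parent in DOC.iter() for child in parent}


def citation_language(el, parents):
    """Walk from the element up to the root and return the first 'lang' found."""
    node = el
    while node is not None:
        if node.get("lang"):
            return node.get("lang")
        node = parents.get(node)
    return None


languages = [(el.get("title"), citation_language(el, parents)) for el in DOC.iter("span")]
```

The second citation carries no 'lang' of its own and so is read as English, the language of the host document.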

5. Choosing a URL
Ideally, citations in digital scholarship are paired with a link to an online resource available at a persistent URI that has clear semantics. Such URIs do not always exist, which is one reason to put a plain-text reference in the 'title' attribute.

HTML: Herodotus (1.78)

TEI:

HTML: Babylon
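
To illustrate the pairing of a citation with a URI, here is a small Python sketch. It assumes, for the sake of the example, that a linked citation is an 'a' element carrying both class="citation" and 'href', and that a citation without a stable URI is a 'span' with only a 'title'. The Pleiades URL is the one quoted for Ephesus earlier on this page; describe is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed markup: one citation with a URI, one with only a plain-text title.
FRAGMENT = ET.fromstring(
    "<p>"
    '<a class="citation" href="http://pleiades.stoa.org/places/599612">Ephesus</a> and '
    '<span class="citation" title="Herodotus Histories 1.78">Herodotus (1.78)</span>'
    "</p>"
)


def describe(el):
    """Summarize a citation as its visible text, normalized label, and URI (if any)."""
    text = "".join(el.itertext())
    return {"text": text, "label": el.get("title", text), "uri": el.get("href")}


records = [describe(el) for el in FRAGMENT.iter() if el.get("class") == "citation"]
```

When no URI is available, the record still carries the plain-text label, so the citation remains identifiable.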

More complete markup
Extrapolating from the truncated steps above gives the following markup for the sample text:

HTML:

Notes: The reference to the M. Price article is insufficient.

TEI: to come.

Adding other markup schemes to conformant citations
The 'class="citation" title="<normalized plain text citation>"' html pattern is designed so that it can be easily used with other markup schemes. The global 'class' attribute in html is a space separated list so that other, unrelated values can be present without interfering with the identification of an element as a citation. The global 'title' attribute is directly suitable for the role envisioned here so shouldn't clash with other conforming uses.
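
Because 'class' is a space-separated list, a tool should test for the 'citation' token rather than compare the whole attribute value. The Python sketch below shows the token test; is_citation is a hypothetical helper name and the extra class names are invented. Note that an exact-match selector such as "//*[@*='citation']" would not match class="geo citation"; in XPath 1.0 the usual idiom for a token test is contains(concat(' ', normalize-space(@class), ' '), ' citation ').

```python
def is_citation(class_value):
    """True when 'citation' is one of the whitespace-separated class tokens."""
    return "citation" in (class_value or "").split()


checks = {
    "citation": is_citation("citation"),
    "geo citation place": is_citation("geo citation place"),  # other values do not interfere
    "citations": is_citation("citations"),                    # a different token
    "": is_citation(None),                                    # no class attribute at all
}
```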

Content-creators may choose to add in additional markup. Links to guidelines for doing so are listed here.

OpenURL/Coins/Zotero

 * Citations with COINS

RDFa

 * Citations with added RDFa

Ancient Mediterranean Primary Texts
"Classics" has well-established abbreviations for primary texts. They are neither complete nor unambiguous, but they are well established.
 * Plain text: "Hom. Il. 2.345", "Homer, Iliad 2.345"
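
A minimal sketch of expanding such abbreviations into the unabbreviated form suitable for a 'title' attribute. The lookup table contains only the two abbreviations that appear on this page; a real table would be much larger and, as noted, would still be incomplete and sometimes ambiguous. The names ABBREVIATIONS and expand are hypothetical.

```python
# Illustrative table built from the examples on this page only.
ABBREVIATIONS = {
    "Hom. Il.": "Homer, Iliad",
    "Hdt.": "Herodotus Histories",
}


def expand(citation):
    """Replace a known leading abbreviation; return unknown citations unchanged."""
    for short, full in ABBREVIATIONS.items():
        if citation.startswith(short + " "):
            return full + citation[len(short):]
    return citation


expanded = [expand("Hom. Il. 2.345"), expand("Hdt. 1.78"), expand("Homer, Iliad 2.345")]
```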

The following examples illustrate that the same text can appear in different places.
 * HTML: Hom. Il. 2.345

This example does not address the presence and/or capabilities of the Canonical Text Services (CTS) protocol and URN scheme under development at the Center for Hellenic Studies.

Geographic Entities
Within the Ancient Mediterranean, the Pleiades Project is establishing short URLs as identifiers for geographic entities (but see their own discussion for details). Geonames.org is a worldwide list of identifiers.
 * HTML: Ephesus
 * HTML: Samarkand
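
The "show me a map of all geographic entities in a document" operation could begin with a filter like the Python sketch below, which keeps only citations whose link points at a gazetteer. The set of host names is an assumption made for illustration, as is the markup (an 'a' element with class="citation" and 'href'). The two URLs are ones quoted elsewhere on this page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed host names for the gazetteers mentioned in this section.
GAZETTEER_HOSTS = {"pleiades.stoa.org", "geonames.org", "www.geonames.org"}

DOC = ET.fromstring(
    "<p>"
    '<a class="citation" href="http://pleiades.stoa.org/places/599612">Ephesus</a> '
    '<a class="citation" href="http://www.trismegistos.org/tm/detail.php?tm=23">TM23</a>'
    "</p>"
)


def geographic_citations(root):
    """Return (text, URI) pairs for citations that link to a known gazetteer."""
    found = []
    for el in root.iter():
        if "citation" not in el.get("class", "").split():
            continue
        href = el.get("href")
        if href and urlparse(href).hostname in GAZETTEER_HOSTS:
            found.append(("".join(el.itertext()), href))
    return found


places = geographic_citations(DOC)
```

Only the Pleiades citation is returned; the papyrus citation is a citation but not a geographic one.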

Bibliographic Data
WorldCat records are one candidate, but there may be licensing issues.
 * HTML: Yang, X., National Gallery of Art (U.S.), Museum of Fine Arts, Houston., & Asian Art Museum of San Francisco. (1999). The golden age of Chinese archaeology: Celebrated discoveries from the People's Republic of China. Washington: National Gallery of Art

What is the relationship between citing a work and citing its bibliographic record? Is that a necessary distinction?

Museum Objects
Or any cataloged object with a stable ID?

HTML: ANS 1968.34.40.

Egyptian Papyri
The sites http://papyri.info and http://trismegistos.org (e.g. http://www.trismegistos.org/tm/detail.php?tm=23 ) are islands of stability here.

HTML: TM23
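
A sketch of deriving a link from a plain-text Trismegistos reference, assuming that a citation of the form "TM23" corresponds to the 'tm=23' detail page quoted above. The URL pattern is taken from that single example and may not hold for every record; trismegistos_url is a hypothetical helper name.

```python
import re

# Pattern taken from the example URL on this page.
TM_DETAIL = "http://www.trismegistos.org/tm/detail.php?tm={number}"


def trismegistos_url(citation):
    """Return the detail-page URL for a citation like 'TM23', or None if it does not match."""
    match = re.fullmatch(r"\s*TM\s*(\d+)\s*", citation)
    if match is None:
        return None
    return TM_DETAIL.format(number=int(match.group(1)))


urls = [trismegistos_url("TM23"), trismegistos_url("TM 23"), trismegistos_url("ANS 1968.34.40")]
```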