Citation in digital scholarship

This page suggests best practices for citation in digital scholarship and documents a set of conventions intended to promote greater interoperability. It also points to tools for identifying, processing, and presenting citations in server-side and client-side environments, and highlights resources that are creating stable URLs relevant to digital scholarship, with a focus on humanities disciplines.

The phrase 'Citation in Digital Scholarship' is meant very generally: the encoding of a reference to an external entity in support of, as illustration of, or otherwise in relationship to a work of digitally available scholarship. Scholars cite a wide range of entities and categories of information, including primary texts, contemporary scholarship, museum objects, people, and places. A set of robust and straightforward conventions that allow for local extension will make the growing number of links between scholarly works easier to recognize.

In its simplest form, the convention adopted here is to wrap a citation in an element that has an attribute with the value 'citation'. A title attribute, or equivalent, can be used to give a plain-text, unabbreviated form of the citation. In HTML that can mean:


 * Alcock (1991)
 * Further example: Herodotus (1.78)
 * Example with added RDFa: Ephesus

Requirements
This effort takes as its starting point that the conventions described should:


 * 1) Be automatically parsable. Automatic agents should be able to recognize that a citation is being made, and to identify what is being cited.
 * 2) Encourage reuse of existing naming schemes. A consistently applied convention should allow distinct and independent citations of the same entity to be recognized by third parties. For all the examples below, but particularly for sites creating stable IDs (e.g. Pleiades), the concern is to find a generic, interoperable, author-friendly convention for referring to those resources in ways that the sites themselves will recognize. "If you make a reference to Pleiades, how does Pleiades know that you've done so?"
 * 3) Support user interaction. Client-side operations, such as "show me a map of all geographic entities in a document" can be facilitated by a robust citation convention.
 * 4) Recognize that various standards already exist and not take unnecessary steps to interfere with the deployment of those standards.

The above list is based upon previous scholarship in the field of digital citation.
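Requirement 2 asks that independent citations of the same entity be recognizable by third parties. The following is a minimal Python sketch of what a site such as Pleiades might do with harvested links: group (citing document, cited URI) pairs by a lightly normalized URI. The normalization rules (lowercase scheme and host, drop any fragment and trailing slash) and the document names are assumptions for illustration; a real service would need rules suited to its own URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_uri(uri: str) -> str:
    """Make trivially different spellings of one URI compare equal.

    Assumption: lowercasing scheme and host and dropping the fragment and any
    trailing slash does not change which resource is meant.
    """
    parts = urlsplit(uri.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def group_by_target(citations):
    """Map each normalized cited URI to the set of documents citing it.

    `citations` is an iterable of (citing_document, cited_uri) pairs.
    """
    groups = defaultdict(set)
    for source, uri in citations:
        groups[normalize_uri(uri)].add(source)
    return dict(groups)


# Hypothetical harvest: two documents cite Ephesus with slightly different URLs.
harvest = [
    ("doc-a.html", "http://pleiades.stoa.org/places/599612"),
    ("doc-b.html", "HTTP://Pleiades.stoa.org/places/599612/"),
]
print(group_by_target(harvest))
```

Both citations end up under the single key http://pleiades.stoa.org/places/599612, which is the kind of recognition the requirement describes.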

Recommended Convention (@*="citation" paired with URI)
For xml-based documents, the following conventions are recommended:
 * Citations should be human readable.
 * Citations must be indicated by a containing element that has an attribute whose value is 'citation'. In (x)html that attribute is the global 'class' attribute. Its only role is to mark a span of text as amenable to automatic processing, so that software can identify or describe the cited resource, or make it actionable for a user.
 * Citations should be qualified by an attribute giving an unabbreviated plain text version of the citation. This is unnecessary when the citation itself is not abbreviated.
 * The language of the citation should be indicated if it is distinct from the language of the host document.
 * Citations should link to stable online resources that make available, are surrogates for, or otherwise define the cited entity.
 * Citations may use an existing standard, such as microdata, microformats, or RDFa, to indicate the nature of the entity being cited and to describe the relationship between the linked online resource and the underlying concept it describes. The specific patterns suggested for each of these standards are the topic of ongoing community discussion.
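Several of these recommendations can be checked mechanically. The following Python sketch, assuming un-namespaced (x)html fragments parsed with the standard library's ElementTree, reports warnings for one citation element. The function name and warning texts are illustrative, not part of the convention. It cannot tell whether a citation is abbreviated, so a missing title is only flagged, and the language recommendation is not checked because it depends on the host document.

```python
import xml.etree.ElementTree as ET


def check_citation(el: ET.Element) -> list:
    """Return a list of warnings for one citation element (empty if none).

    Heuristic sketch: it cannot tell whether a citation is abbreviated, so a
    missing @title is reported as a note rather than treated as an error.
    Assumes un-namespaced (x)html fragments.
    """
    warnings = []
    if "citation" not in el.get("class", "").split():
        warnings.append("not marked with class='citation'")
    if not "".join(el.itertext()).strip():
        warnings.append("not human readable: no text content")
    if el.get("title") is None:
        warnings.append("no @title: acceptable only if the text is unabbreviated")
    # el.iter("a") includes el itself when el is an <a>, so this covers
    # both <a class="citation" href=...> and <span class="citation"><a href=...>.
    if not any(a.get("href") for a in el.iter("a")):
        warnings.append("no link to an online resource")
    return warnings


# example.org is a placeholder, not a real resource for Herodotus.
good = ET.fromstring(
    '<a class="citation" title="Herodotus Histories 1.78" '
    'href="http://example.org/hdt/1.78">Herodotus (1.78)</a>'
)
bare = ET.fromstring('<span class="citation">Hdt. 1.78</span>')
print(check_citation(good))
print(check_citation(bare))
```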

Simple Examples

 * Herodotus (1.78)
 * Briant, P., Boucharlat, R., & Réseau international d'études et de recherches achéménides. (2005). L'archéologie de l'empire achéménide. Paris: de Boccard. (WorldCat record)
 * Ephesus

Other examples with more markup
Given the following mapping between prefixes and URLs:
 * 'dc' = 'http://purl.org/dc/terms/'
 * 'skos' = 'http://www.w3.org/2004/02/skos/core#'

the following examples add RDF to conformant citations:


 * Herodotus (1.78)
 * The @typeof will produce an RDFa triple indicating the resource at the URI is a Dublin Core Text.


 * Ephesus
 * Here the citation is to a 'http://purl.org/dc/terms/Location' that is defined at the URL 'http://pleiades.stoa.org/places/599612'.
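The prefixed names used in these examples expand against the mapping given above. The following Python sketch shows that expansion and the resulting type statement for the Ephesus example. It is a simplification that ignores RDFa's rules for choosing a subject; the function names are hypothetical.

```python
# Prefix mapping given in the text above.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_curie(curie: str, prefixes=PREFIXES) -> str:
    """Expand a prefixed name such as 'dc:Location' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + local


def type_triple(resource: str, typeof: str):
    """(subject, predicate, object) stating that `resource` has type `typeof`.

    Simplified: ignores RDFa's rules for choosing the subject.
    """
    return (resource, RDF_TYPE, expand_curie(typeof))


print(type_triple("http://pleiades.stoa.org/places/599612", "dc:Location"))
```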

Addition of RDF to conformant citations is discussed in more detail on the page Citations with added RDFa.

Use Cases
The '@*="citation"' pattern can be used in the following circumstances.


 * To distinguish those links which contribute to the intellectual argument of a document from those that implement a user-interface or indicate the immediate publishing environment of a document. For example, a link to the homepage of a website hosting a document should not be marked with 'citation'.


 * By authors of (x)html documents that are the archival version of digital scholarship.


 * As a presentation target for documents that are stored in a non-html format but presented as such on the web.
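The first use case, separating links that contribute to the argument from navigation and other interface links, can be sketched in Python with the standard library's ElementTree. The sketch assumes well-formed, un-namespaced (x)html and treats any link inside (or on) an element whose class includes 'citation' as a citation; the function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def partition_links(root: ET.Element):
    """Split href values into (citation links, other links)."""
    cited = set()  # Element objects hash by identity
    for el in root.iter():
        if "citation" in el.get("class", "").split():
            cited.update(el.iter("a"))  # includes el itself if it is an <a>
    citations, other = [], []
    for a in root.iter("a"):
        href = a.get("href")
        if href:
            (citations if a in cited else other).append(href)
    return citations, other


doc = ET.fromstring(
    '<div><a href="/">Home</a>'
    '<p><span class="citation" title="Herodotus Histories 1.78">'
    '<a href="http://example.org/hdt/1.78">Herodotus (1.78)</a></span></p></div>'
)
print(partition_links(doc))
```

The link to the site's homepage lands in the second list, matching the guidance that such links should not be marked with 'citation'.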

Preliminary Notes

 * An xml environment, with examples implemented in (x)html and TEI, is assumed.
 * While this page does assert categories, those are also up for discussion. What is the theoretical and practical difference between a "primary source" and "secondary scholarship"? It is reasonable to cite the 9th century scholar Photius as both.

1. Plain-text citations
Sample text: ''Herodotus (1.78) describes Babylon as the strongest and most famous city in Assyria. It is likely that this city was subsequently the mint from which Alexander issued a series of coins depicting eastern warriors on the obverse and an elephant on the reverse (e.g. ANS 1995.51.68). See discussion by Martin Price (1991).''

Is it possible to establish a robust convention that allows unambiguous machine-recognizable linking to the cited text, to Alexander, to Babylon, to a description of the coin in the collection of the American Numismatic Society and to the article "Circulation at Babylon in 323 B.C."?
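Recognizing citations in unmarked text is fragile. The following Python sketch applies a deliberately naive, hypothetical pattern (a capitalized word followed by numbers in parentheses) to the sample text. It finds a passage reference and an author-date reference without being able to tell them apart, and it misses the museum number, Babylon, and Alexander entirely. That gap is what explicit markup is meant to close.

```python
import re

SAMPLE = (
    "Herodotus (1.78) describes Babylon as the strongest and most famous city in Assyria. "
    "It is likely that this city was subsequently the mint from which Alexander issued a series "
    "of coins depicting eastern warriors on the obverse and an elephant on the reverse "
    "(e.g. ANS 1995.51.68). See discussion by Martin Price (1991)."
)

# Naive heuristic: Capitalized word, space, then digits (optionally dotted) in parentheses.
NAIVE = re.compile(r"[A-Z][a-z]+ \(\d+(?:\.\d+)*\)")

print(NAIVE.findall(SAMPLE))  # ['Herodotus (1.78)', 'Price (1991)']
```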

2. Indicating the Presence of a Citation (@*="citation")
HTML: <span class="citation">Herodotus (1.78)</span>

TEI:

In both usages, the XPath expression "//*[@*='citation']" selects every citation in a text. This is robust because it depends neither on the element name nor on which attribute carries the value 'citation'.
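Python's standard ElementTree module supports only a limited XPath subset that, as far as we know, has no wildcard attribute test, so the following sketch expresses the same selection as a loop. The TEI fragment is hypothetical: which TEI element and attribute carry the value 'citation' is an assumption here, not a recommendation of this page.

```python
import xml.etree.ElementTree as ET


def find_citations(root: ET.Element) -> list:
    """Equivalent of the XPath //*[@*='citation']: every element (root included)
    with at least one attribute whose whole value is exactly 'citation'."""
    return [el for el in root.iter() if "citation" in el.attrib.values()]


html = ET.fromstring('<p><span class="citation">Herodotus (1.78)</span> describes Babylon.</p>')
# Hypothetical TEI fragment; the choice of <bibl type="citation"> is an assumption.
tei = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '<bibl type="citation">Herodotus (1.78)</bibl> describes Babylon.</p>'
)
for root in (html, tei):
    for el in find_citations(root):
        print(el.tag, "".join(el.itertext()))
```

Because the test is on the whole attribute value, an element with class="citation vcard" is not selected; documents that combine 'citation' with other class values need a token-based test.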

3. Normalizing the plain text citation
HTML: <span class="citation" lang="en" title="Herodotus Histories 1.78">Herodotus (1.78)</span>

TEI:

Normalization will assist tools that can automatically recognize plain text citations.

If the value of the 'title' attribute would be identical to the text representation of the element it is attached to, it can be left out.

Note: in HTML5, an element without @title is treated as having the @title of its nearest ancestor that has one. That inheritance should not apply to a citation: a citation without its own @title is normalized from its text alone.
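The normalization rule can be sketched in a few lines of Python (standard library ElementTree, un-namespaced markup assumed): use the element's own title when present, otherwise its visible text, and never fall back to an ancestor's title. The function name and the surrounding div's title are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalized_citation(el: ET.Element) -> str:
    """Plain-text form of a citation: its own @title if present, else its text.

    Only the element's own @title is consulted. An ancestor's @title, which
    HTML5 would treat as applying to the element, is deliberately ignored.
    """
    title = el.get("title")
    if title and title.strip():
        return " ".join(title.split())
    # Collapse runs of whitespace in the visible text.
    return " ".join("".join(el.itertext()).split())


doc = ET.fromstring(
    '<div title="Notes on Book 1">'
    '<span class="citation" title="Herodotus Histories 1.78">Herodotus (1.78)</span> '
    '<span class="citation">Homer, Iliad 2.345</span>'
    "</div>"
)
for span in doc.iter("span"):
    print(normalized_citation(span))
```

The second citation is unabbreviated, so it needs no title, and the div's "Notes on Book 1" does not leak into it.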

4. Be explicit about language
Both "Herodotus Histories 1.78" and "Hdt. 1.78" can be considered English representations of the citation of that text. The German equivalent of the first is "Herodot Historien 1.78"; the Latin, still with Arabic numerals, is "Herodotus Historiae 1.78". If the language of the citation is the same as that of its prose context, the citation needs no further markup. In some disciplines it is common practice to cite the title of a work in its original language or in a widely accepted academic language, such as Latin titles for Greek works in Classics.

HTML: Herodotus (Historiae 1.78</a>) describes...

The 'lang' attribute specifies the language of the element's content and, in HTML, also of any of the element's attributes that contain text, including 'title'. Because a single 'lang' value covers both, the visible citation and its 'title' must be in the same language.
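Since language is inherited, a processor needs the nearest declared language, not just the element's own attribute. The following Python sketch (standard library ElementTree, which has no parent pointers, so a child-to-parent map is built first) looks for lang or xml:lang on the element and then on its ancestors. Treating empty values as absent is a simplification of the HTML rules, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # xml:lang as ElementTree names it


def effective_lang(root: ET.Element, target: ET.Element, default=None):
    """Language applying to `target` and its @title: the nearest lang or xml:lang
    on the element itself or on an ancestor. Empty values are treated as absent,
    which is a simplification of the HTML rules."""
    parent = {child: p for p in root.iter() for child in p}
    el = target
    while el is not None:
        lang = el.get("lang") or el.get(XML_LANG)
        if lang:
            return lang
        el = parent.get(el)
    return default


doc = ET.fromstring(
    '<p lang="en">Herodotus (<span class="citation" lang="la" '
    'title="Herodotus Historiae 1.78">Historiae 1.78</span>) describes Babylon; see also '
    '<span class="citation" title="Herodotus Histories 1.78">Hdt. 1.78</span>.</p>'
)
latin, english = doc.findall("span")
print(effective_lang(doc, latin), effective_lang(doc, english))
```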

5. Choosing a URL
Ideally, citations in digital scholarship are paired with a link to an online resource available at a persistent URI that has clear semantics. Such URIs do not always exist, which is one reason to put a plain-text reference in the 'title' attribute.

HTML: Herodotus (1.78)</a>

TEI:

HTML: Babylon</a>
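Pairing a citation with a URI when one exists, and falling back on the plain-text title when it does not, can be captured in a simple record. The following Python sketch assumes well-formed, un-namespaced (x)html; the Citation record and function name are hypothetical, and the Ephesus URL is the Pleiades address given earlier on this page.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    text: str           # what the reader sees
    plain: str          # normalized plain text: @title, else the visible text
    uri: Optional[str]  # persistent URI, when one exists


def extract_citations(root: ET.Element) -> list:
    """One Citation record per element whose class includes 'citation'."""
    records = []
    for el in root.iter():
        if "citation" not in el.get("class", "").split():
            continue
        text = " ".join("".join(el.itertext()).split())
        hrefs = [a.get("href") for a in el.iter("a") if a.get("href")]
        records.append(Citation(text, el.get("title") or text, hrefs[0] if hrefs else None))
    return records


doc = ET.fromstring(
    '<p><a class="citation" href="http://pleiades.stoa.org/places/599612">Ephesus</a> and '
    '<span class="citation" title="Herodotus Histories 1.78">Hdt. 1.78</span></p>'
)
for record in extract_citations(doc):
    print(record)
```

The second record has no URI, but its 'title' still gives downstream tools an unambiguous plain-text reference.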

Adding other markup schemes to conformant citations
The html pattern 'class="citation" title="<normalized plain text citation>"' is designed to be easy to combine with other markup schemes. The global 'class' attribute in html is a space-separated list, so other, unrelated values can be present without interfering with the identification of an element as a citation. The global 'title' attribute is directly suited to the role envisioned here, so it should not clash with other conforming uses.
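Because 'class' is a space-separated list, recognizing a citation in html means testing for the 'citation' token rather than for the whole attribute value. The following Python sketch contrasts the two tests; the function names are hypothetical, and the XPath 1.0 expression in the docstring is the usual idiom for a token test.

```python
import xml.etree.ElementTree as ET


def has_class_token(el: ET.Element, token: str = "citation") -> bool:
    """True if `token` is one of the whitespace-separated values of @class.

    XPath 1.0 equivalent:
        //*[contains(concat(' ', normalize-space(@class), ' '), ' citation ')]
    """
    return token in el.get("class", "").split()


def exact_match(el: ET.Element, value: str = "citation") -> bool:
    """The stricter test behind //*[@*='citation']: some attribute equals `value`."""
    return value in el.attrib.values()


combined = ET.fromstring(
    '<span class="citation vcard" title="Herodotus Histories 1.78">Herodotus (1.78)</span>'
)
print(has_class_token(combined), exact_match(combined))  # True False
```

The token test also avoids false positives: class="citations" does not contain the token 'citation'.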

Content creators may choose to add further markup. Links to guidelines for doing so are listed here.

OpenURL/COinS/Zotero

 * Citations with COINS

CTS + Microformats

 * Citations with CTS and Microformats

RDFa

 * Citations with added RDFa

Canonical Works Knowledge Base (CWKB)

 * CWKB

Categories of resources that can be cited
Note: the page Current practice in citation has been started.

Ancient Mediterranean Primary Texts
Classics has well-established abbreviations for primary texts. They are neither complete nor unambiguous, but they are well established.
 * Plain text: "Hom. Il. 2.345", "Homer, Iliad 2.345"

The following examples illustrate that the same text can appear in different places.
 * HTML: Hom. Il. 2.345</a>

This example does not address the presence and/or capabilities of the Canonical Text Services (CTS) protocol and URN scheme under development at the Center for Hellenic Studies.
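Because abbreviated and full forms such as "Hom. Il. 2.345" and "Homer, Iliad 2.345" refer to the same passage, tools often expand abbreviations against a lookup table. The following Python sketch uses hypothetical, deliberately tiny tables and a simple author/work/passage pattern; real abbreviation lists are far larger and, as noted above, neither complete nor unambiguous.

```python
import re

# Hypothetical, deliberately tiny lookup tables.
AUTHORS = {"Hom.": "Homer", "Hdt.": "Herodotus"}
WORKS = {("Homer", "Il."): "Iliad", ("Homer", "Od."): "Odyssey"}

# author, optional work, then a dotted passage number such as 2.345
REF = re.compile(r"^(?P<author>\S+)\s+(?:(?P<work>\S+)\s+)?(?P<passage>\d+(?:\.\d+)*)$")


def expand(ref: str) -> str:
    """Expand an abbreviated reference into 'Author, Work passage' form."""
    m = REF.match(ref.strip())
    if not m:
        raise ValueError(f"unrecognised reference: {ref!r}")
    raw = m["author"].rstrip(",")  # accept already-expanded "Homer, Iliad ..."
    author = AUTHORS.get(raw, raw)
    if m["work"]:
        work = WORKS.get((author, m["work"]), m["work"])
        return f"{author}, {work} {m['passage']}"
    return f"{author} {m['passage']}"


for ref in ("Hom. Il. 2.345", "Homer, Iliad 2.345", "Hdt. 1.78"):
    print(ref, "->", expand(ref))
```

Both forms of the Iliad reference normalize to the same string, which is what lets independent citations be matched.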

Geographic Entities
Within the Ancient Mediterranean, the Pleiades Project is establishing short URLs as identifiers for geographic entities (but see their own discussion for details). GeoNames (geonames.org) provides identifiers for places worldwide.
 * HTML: Ephesus</a>
 * HTML: Samarkand</a>
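Linking place citations to gazetteers supports client-side operations such as "show me a map of all geographic entities in a document". The following Python sketch selects citations whose link points at a known gazetteer host. The host list is an assumption to be extended locally (pleiades.stoa.org comes from this page; the geonames.org hostnames are assumed), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hosts treated as gazetteers; an assumption, to be extended locally.
GAZETTEER_HOSTS = {"pleiades.stoa.org", "www.geonames.org", "sws.geonames.org"}


def geographic_citations(root: ET.Element) -> list:
    """(label, uri) pairs for citations whose link points at a known gazetteer."""
    found = []
    for el in root.iter():
        if "citation" not in el.get("class", "").split():
            continue
        for a in el.iter("a"):  # includes el itself if it is an <a>
            href = a.get("href", "")
            if urlsplit(href).hostname in GAZETTEER_HOSTS:
                label = " ".join("".join(el.itertext()).split())
                found.append((label, href))
    return found


doc = ET.fromstring(
    '<p><a class="citation" href="http://pleiades.stoa.org/places/599612">Ephesus</a> and '
    '<a class="citation" href="http://www.trismegistos.org/tm/detail.php?tm=23">TM23</a></p>'
)
print(geographic_citations(doc))
```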

Bibliographic Data
WorldCat is one option, though there may be licensing issues.
 * HTML: Yang, X., National Gallery of Art (U.S.), Museum of Fine Arts, Houston., & Asian Art Museum of San Francisco. (1999). The golden age of Chinese archaeology: Celebrated discoveries from the People's Republic of China. Washington: National Gallery of Art</a>

What is the relationship between citing a work and citing its bibliographic record? Is that a necessary distinction?

Museum Objects
Or any catalogued object with a stable ID?

HTML: ANS 1968.34.40</a>.

Egyptian Papyri
The sites http://papyri.info and http://trismegistos.org (e.g. http://www.trismegistos.org/tm/detail.php?tm=23 ) are islands of stability here.

HTML: TM23</a>
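Stable URLs usually embed a site-local identifier that can be pulled out for matching or display. The following Python sketch handles the two URL shapes shown on this page (a Pleiades place path and a Trismegistos query parameter). The patterns are inferred from those example URLs and may not cover every form a site uses; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def local_id(uri: str):
    """Return (site, identifier) for the URL shapes shown on this page, else None."""
    parts = urlsplit(uri)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    segments = parts.path.split("/")  # "/places/599612" -> ["", "places", "599612"]
    if host == "pleiades.stoa.org" and len(segments) > 2 and segments[1] == "places" and segments[2]:
        return ("pleiades", segments[2])
    if host.endswith("trismegistos.org"):
        tm = parse_qs(parts.query).get("tm")
        if tm:
            return ("trismegistos", tm[0])
    return None


print(local_id("http://pleiades.stoa.org/places/599612"))
print(local_id("http://www.trismegistos.org/tm/detail.php?tm=23"))
```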

Tools

 * citations.js is an early-stage JavaScript library that parses 'class="citation"' markup: http://github.com/sfsheath/citations-js
 * http://github.com/mromanello/CTS_dev/ hosts code related to the Canonical Text Services protocol.
 * http://github.com/mromanello/CRefEx hosts CRefEx, a canonical references extractor written in Python.

References and Further Reading

 * Heath 2010. Sebastian Heath, 'Diversity and Reuse of Digital Resources for Ancient Mediterranean Material Culture.' In G. Bodard and S. Mahony, eds., Digital Research in the Study of Classical Antiquity. Farnham, UK: Ashgate, 2010, pp. 35-52. http://hdl.handle.net/2451/29797


 * Isaksen 2009. Leif Isaksen ‘Pandora’s Box: The Future of Cultural Heritage on the World Wide Web’ http://leifuss.files.wordpress.com/2009/04/pandorasboxrev1.pdf.


 * Romanello 2007. Matteo Romanello, "A semantic linking system for canonical references to electronic corpora," in International Conference on Electronic Corpora of Ancient Languages : proceedings of the international conference, Prague, November 16-17, 2007, P. Zemanek, Ed., Prague, 2007, pp. 107-120. [Online]. Available: http://eprints.rclis.org/16239/1/Romanello2008.pdf


 * Romanello 2008. Matteo Romanello. "A Semantic Linking Framework to Provide Critical Value-Added Services for E-Journals on Classics." ELPUB 2008: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing: http://elpub.scix.net/data/works/att/401_elpub2008.content.pdf.


 * Smith 2009. Neel Smith. “Citation in Classical Studies”, Digital Humanities Quarterly Winter 2009 Volume 3 Number 1. http://digitalhumanities.org/dhq/vol/3/1/000028/000028.html