OSCE Scaife Paper

Open Source Critical Editions Workshop at King's College London, September 22, 2006

=Tools for Collaborative Editing (some thoughts by Ross Scaife and Dot Porter)=

Introduction to the Concept
The Wikipedia entry on "collaborative editor" defines the term quite simply: "A collaborative editor allows simultaneous editing of the same document or video by different participants using different computers." Electronic editions have become steadily more popular over the past decade. Libraries and museums have led the charge, followed by increasing numbers of scholars, both individuals and groups, who form the basis of an active community of electronic editors. As this community grows, so does the need for tools suited to the types of editions that people and institutions are actually creating. Humanists involved in collaborative editing projects generally have three specific needs. First, scholars need to be able to build editions encompassing text, images, and annotations; the annotations are usually expressed in the Extensible Markup Language (XML), the de facto standard for encoding electronic editions in the humanities and the mode of expression of the Text Encoding Initiative (TEI). Second, software needs access control and version management systems that allow several editors to collaborate on an edition with different levels of access and without fear that one editor might inadvertently overwrite another's work. Finally, software needs to be accessible: it should be designed to encourage collaborative work among individuals who are geographically dispersed, and to encourage electronic editing by the many accomplished humanities scholars who are familiar with basic computer tools (word processors, web browsers, etc.) but who may be put off by regular XML editing software.

Good collaborative editing software will foster the creation of scholarly works by forging partnerships between individuals and institutions, enabling them to share resources, both physical resources (in the form of texts and images) and intellectual (in the form of subject knowledge and editing experience). Software released under an Open-Source license will especially promote cooperation among smaller institutions that might not have the resources to purchase expensive software. Such software could even become a significant resource not only for scholars, but also for teachers and students, potentially encouraging collaborative projects between schools around the world.

Maintaining an edition with multiple editors contributing to the same document requires a significant amount of work. Editors must be careful not to overwrite changes made by others, for example by coordinating the process so that no two editors work on a file at the same time. Word processing software such as Microsoft Word includes a "Track Changes" tool, which enables users to work collaboratively; however, though the resulting files are suitable for printing, they are not encoded in a standard acceptable for electronic editions. With the increasing scale and scope of electronic editions, the need for a collaborative editing process rooted in accepted standards, and for software to support this process, is even stronger.

How can collaborative editing software help classicists? Give a few real-life examples

 * This page was initially produced and edited by two individuals in Writely (a CNET review compares other AJAX-based word processors)
 * Readily editable by one or more people, like a wiki. Unlike a wiki, Writely feels like a regular word processor, a pared-down MS Word.
 * numerous output formats (html rtf doc odf pdf) to suit a variety of publication/access needs (both print and online publication)
 * provides a view of the history of a document's revisions over time, which helps to show the relative contributions of collaborators over time.
 * documents can be shown to select viewers or made public
 * Similar to Writely, LiveDocuments promises synchronization of Microsoft Office Documents, allowing for collaborative editing/writing in a context familiar to most scholars
 * There is no server requirement; editors need not log on to a central server
 * "LiveDocuments promises Office collaboration without a server"
 * Classics context: note the ideas about a communal text, a personal text, and the text of a given MS presented 11 years ago by the Vergil Project (never implemented, unfortunately)
 * Communal text: "users will participate in the "establishment" of a text that will never reach final form. Here is how it will work. All the texts at this site include a critical apparatus of variant readings, conjectural emendations, and so forth. Because this information is presented on-line, it is possible for interested users to select the readings that they prefer -- to vote, in effect, for the reading that they think should appear in a given passage. These votes can then be tabulated, and the reading receiving the most votes will appear in the Communal Text. Those who consult this version of the text must therefore do so on the understanding that it does not represent the final judgment of any single editorial expert, but the aggregate opinion of the community of users of the site, and that it is subject to change at any moment."
 * Personal text: "Through this menu item users can record their preferences and use them to establish the text that they habitually consult. Of course, it will be possible to use this feature in other ways as well. Someone who wanted to use this site but felt the need of a little extra editorial authority might simply enter into his or her text whatever readings are printed by his or her favorite editor. On the other hand, a group of scholars interested in constructing a text for some specific purpose might use this resource collaboratively. So might a class on Vergil or on textual criticism. No doubt other applications will be thought of as well."
 * Text of a particular manuscript: "Through this feature it will be possible to see the text as it appears in any of the manuscripts whose readings have been entered into the database. If one were interested in the Palatinus, for example, a diplomatic transcript of that manuscript would [appear] (with secondary readings and corrections available via hypertext links). In some cases images are available as well, and we hope eventually to provide facsimiles of all the mss in the database."
 * Suda On Line is another oldie-but-goodie with strengths and weaknesses
 * The Virtual Humanities Lab at Brown University has been developing a system for collaborative annotation of literary texts. The published guidelines for annotation are simple, and the software is accessed through a regular web browser.
 * Compare the proposed Homer Multitext:

{quote}"An ideal edition of Homer would encompass the full historical reality of the Homeric textual tradition as it evolved through time, from the pre-Classical era well into the medieval. Our attempt to create such an edition is already underway. Instead of choosing between variants and plus verses in an attempt to recover the ipsissima verba of Homer, we propose to include them in a multitext edition that embraces the fluidity of the textual traditions of the Iliad and Odyssey. The ideal format for this multitext edition of Homer is not a traditional printed text but an electronic, web-based edition. Unlimited in its ability to handle complex sets of variants, an electronic multitext offers critical readers of Homer the opportunity to consider many historical Iliads and Odysseys from the standpoint of many different sources of transmission, and so also allows the user to recover both a more accurate and more accessible picture of the fluidity of the tradition in the earliest stages of textuality." {quote}


 * EDUCE: Ideally, this project needs a strategy for imposing editorial control over the resulting documents in a process that involves establishing the texts, encoding them with standard TEI-XML markup using newly available Open Source software tools, and then publishing the transcripts side-by-side with their associated images following Open Access protocols.

Collaborative Editors
Different types of collaborative editors (see Appendix for list of editors)


 * synchronous vs. asynchronous. Synchronous editors work in "real-time": changes made by one editor are immediately visible to other editors. Asynchronous editors (including Writely, MediaWiki, and version management systems) either synchronize working versions automatically after the fact or (in the case of version management systems) require users to update changes manually.
 * text-only editing vs. image-based editing. Text-only editors, including most XML and word processing programs, focus solely on the editing of the text. Image-based editors (including the EPPT and the University of Victoria Image Markup Tool) provide simple methods for either incorporating images into editions, or building textual annotations onto images.
 * XML editors vs. text-only editors. XML editors, for example oXygen or XMetaL, provide support for building XML annotations into texts. The better editors also support related XML technologies: XPath searching, XSLT development for transformation, and DTD or schema development for validation.

Problems with collaborative editing

Version control: Wiki, Subversion

Administration for collaborative editing has two main issues: version management and access control. Version management deals with the problems of simultaneous editing. When a user makes changes to a document, we must be prepared to combine those changes with other changes by editors working on the same document. Furthermore, it may be necessary to obtain an earlier version of a document for reference, or even to reverse part of a series of changes while leaving other edits in place. A version management system tracks the branching revisions of a document as it is updated by a number of individuals.
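
A minimal sketch of the bookkeeping described above, written in Python for illustration only. It assumes a hypothetical in-memory `History` class in which every revision records its parent, so that parallel edits form branches and earlier versions remain retrievable. The class, its method names, and the sample text are assumptions made for this sketch, not features of any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Revision:
    rev_id: int
    parent: Optional[int]  # None for the initial revision
    editor: str
    text: str


class History:
    """Hypothetical in-memory store of branching document revisions."""

    def __init__(self, initial_text: str, editor: str) -> None:
        self.revisions: Dict[int, Revision] = {}
        self._next_id = 0
        self.commit(None, editor, initial_text)

    def commit(self, parent: Optional[int], editor: str, text: str) -> int:
        """Record a new revision derived from `parent` and return its id."""
        rev_id = self._next_id
        self.revisions[rev_id] = Revision(rev_id, parent, editor, text)
        self._next_id += 1
        return rev_id

    def ancestors(self, rev_id: int) -> List[int]:
        """Return rev_id followed by all of its ancestors, newest first."""
        chain: List[int] = []
        current: Optional[int] = rev_id
        while current is not None:
            chain.append(current)
            current = self.revisions[current].parent
        return chain

    def common_ancestor(self, a: int, b: int) -> int:
        """Find the most recent revision from which both a and b descend."""
        seen = set(self.ancestors(a))
        for rev in self.ancestors(b):
            if rev in seen:
                return rev
        raise ValueError("revisions share no history")


history = History("arma virumque cano", editor="curator")        # revision 0
a = history.commit(0, "editor_a", "Arma virumque cano")           # branch 1
b = history.commit(0, "editor_b", "arma virumque cano,")          # parallel branch
c = history.commit(a, "editor_a", "Arma virumque cano, Troiae")   # continues branch 1
```

Here revisions `a` and `b` both descend from revision 0, so `history.common_ancestor(c, b)` identifies the point from which the two editors' parallel changes would have to be combined, and `history.revisions[0].text` still returns the earliest version for reference.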

Access control sets limits on the documents an editor can modify (coarse-grained access control), and the types of changes he or she can make to those documents (fine-grained access control). Such a system allows a project administrator to delegate editing responsibilities in a controlled manner. Consider, for example, two scholars with different specialized knowledge who are collaborating on an editing project. One scholar studies language, and is responsible for editing the linguistic aspects of a particular text. Another scholar specializes in manuscript studies, and is responsible for describing aspects of the text within the context of a specific manuscript - the scribal handwriting, condition of the manuscript, etc. The document curator, then, can grant the textual editor access to update sets of markup for describing the language of the text, but not for describing information such as scribal handwriting and condition of the manuscript. Likewise, the manuscript scholar would have access to modify sets of markup for describing the manuscript, but not the language of the text. On the other hand, neither of these scholars would be able to modify administrative markup such as the document's headers. Fine-grained access control allows the administrator to enable both scholars to work simultaneously within their domains of expertise without compromising the integrity and control of the editorial process. The document curator or project coordinator creates a set of rules that specify the "shape" of modifications particular users are allowed to make. Then, when a user attempts to modify part of the document, those access control rules are compared to that part of a document; much like a key in a lock, if the "shape" of the rule matches the document, the lock opens and the change is permitted to go through.
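
The lock-and-key comparison can be sketched as follows. This Python example is an illustration under stated assumptions, not a description of an existing system: the rule table `RULES`, the function `may_modify`, the editor names, and the choice of TEI element names (`w`, `damage`, and so on, used here without namespaces) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: each editor may modify only the listed element names.
RULES = {
    "linguist": {"w", "foreign"},
    "codicologist": {"handShift", "damage"},
}

# Hypothetical protected regions: nothing inside these elements may be modified.
PROTECTED_ANCESTORS = {"teiHeader"}


def path_to(root, target):
    """Return the list of tags from root down to target, or None if absent."""
    if root is target:
        return [root.tag]
    for child in root:
        sub = path_to(child, target)
        if sub is not None:
            return [root.tag] + sub
    return None


def may_modify(user, root, target):
    """Check whether the 'shape' of the user's rule matches the target element."""
    path = path_to(root, target)
    if path is None:
        return False
    if PROTECTED_ANCESTORS & set(path):
        return False
    return target.tag in RULES.get(user, set())


doc = ET.fromstring(
    "<TEI><teiHeader><title>Sample</title></teiHeader>"
    "<text><body><l><w>arma</w> <damage>vir</damage></l></body></text></TEI>"
)
word = doc.find(".//w")
damage = doc.find(".//damage")
title = doc.find(".//title")
```

With these rules, `may_modify("linguist", doc, word)` permits the change, `may_modify("linguist", doc, damage)` refuses it, and neither editor can touch `title` because it sits inside the header. A real system would need richer rules (attributes, kinds of change, groups of users); matching on element names alone is the simplest possible "shape".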

Source code management (SCM) systems such as CVS and SVN have shown their ability to assist in collaborative maintenance of computer source code. SCMs allow programmers to maintain parallel branches of their source code, merging sets of changes from one branch to another. However, SCMs take a line-oriented approach to revision management; while this is ideal for computer source code, it is not well suited to XML documents, where modifications usually follow the document's hierarchical structure. Furthermore, merging conflicting changes can be a complex process, and conflicts often must be dealt with before users can commit their changes to a central repository. Finally, SCM systems support primarily coarse-grained access control, so that permission to modify part of a document implies permission to modify the entire document; fine-grained access control affords much more flexibility in organizing a collaborative editing project.
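
The mismatch between line-oriented revision management and hierarchical documents can be illustrated with a small Python sketch. It uses the standard `difflib` and `xml.etree.ElementTree` modules; the helper names `line_changes` and `canonical` and the sample verse are assumptions made for this illustration. The whitespace normalization is a deliberate simplification: in real TEI mixed content, whitespace can be significant.

```python
import difflib
import xml.etree.ElementTree as ET

# The same markup, once on a single line and once re-wrapped by an editor.
version_a = '<l n="1"><w>arma</w> <w>virumque</w> <w>cano</w></l>'
version_b = '<l n="1">\n  <w>arma</w>\n  <w>virumque</w>\n  <w>cano</w>\n</l>'


def line_changes(a, b):
    """Lines reported as added or removed by a line-oriented diff."""
    diff = list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))
    # Skip the two file-header lines, then keep only additions and removals.
    return [ln for ln in diff[2:] if ln.startswith(("+", "-"))]


def norm(s):
    """Collapse runs of whitespace; treat missing text as empty."""
    return " ".join((s or "").split())


def canonical(elem):
    """Reduce an element to a nested tuple that ignores layout differences."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        norm(elem.text),
        tuple((canonical(child), norm(child.tail)) for child in elem),
    )


same_tree = canonical(ET.fromstring(version_a)) == canonical(ET.fromstring(version_b))
reported = line_changes(version_a, version_b)
```

The line-oriented comparison reports every line as changed (`reported` is non-empty), while the structural comparison finds the two versions equivalent (`same_tree` is true). A tool that tracks revisions along the element hierarchy would not flag this re-wrapping as an edit at all.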

Editing needs are also not fully served by content management systems such as the open-source MediaWiki. This system, which underlies the highly successful Wikipedia collaborative encyclopedia project, has demonstrated its ability to handle collaborative editing at a massive scale. Support for access control, however, is quite limited, given the open-editing model of Wikipedia. While supervisors can "lock" documents to prevent them from being modified, it is difficult to limit access in a more complex fashion. Furthermore, although such systems typically support version management, the revisions of a document are treated as following a linear sequence. Such a model does not adequately capture the complexities of parallel changes, where an editor may modify a document unaware of changes being made by another editor to the same document.
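
The limitation of a strictly linear revision sequence can be sketched in a few lines of Python. The `LinearHistory` class and `EditConflict` exception below are hypothetical and do not describe the internals of MediaWiki or any other system; they only show what happens when two editors start from the same revision.

```python
class EditConflict(Exception):
    """Raised when an edit is based on a revision that is no longer the latest."""


class LinearHistory:
    """Hypothetical document history that allows only one line of revisions."""

    def __init__(self, text):
        self.revisions = [text]

    def head(self):
        """Index of the most recent revision."""
        return len(self.revisions) - 1

    def submit(self, base, text):
        """Accept an edit only if it was made against the latest revision."""
        if base != self.head():
            raise EditConflict(
                f"edit based on revision {base}, but latest is {self.head()}"
            )
        self.revisions.append(text)
        return self.head()


page = LinearHistory("Original entry.")
base = page.head()  # both editors open revision 0
page.submit(base, "Entry revised by editor A.")
try:
    page.submit(base, "Entry revised by editor B.")
    conflict = False
except EditConflict:
    conflict = True
```

Editor B's work is rejected because the model has no place for a second revision derived from revision 0. A branching history would instead record both edits as siblings and defer the question of how to combine them.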

None of the existing systems is designed for a highly collaborative environment with large numbers of concurrent changes and with constant revision tracking. The "perfect" system would combine the version-tracking features of SCM, the scalability of collaborative content management systems, and the security and flexibility of fine-grained access control.

Finding valid metrics for apportioning scholarly credit
Few collaborative projects prominently describe their methods for crediting participation. For one good example, see the Tibetan and Himalayan Digital Library.


 * We need to harness the self-interest of scholars.
 * Collaborative work, however, is often incremental (with many small contributions over time). MediaWiki, with its version management system, does provide a way to track the contributions of individuals over time.
 * peer-assessment may be feasible in some cases (as with assessments of Amazon reviews' helpfulness) but often the number of people involved may be very small, in a field like ours.
 * SOL counts users' contributions as translators and editors but cannot provide any qualitative measure, so one person who provided only a single entry that is of very high quality may seem to have done little
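
The weakness of purely quantitative credit can be shown with a small Python sketch. The revision log, the editor names, and the `tally` function are invented for illustration and are not drawn from SOL or any other project.

```python
from collections import defaultdict

# Hypothetical revision log: (editor, number of characters added or changed).
log = [
    ("editor_a", 12),
    ("editor_a", 8),
    ("editor_a", 5),
    ("editor_b", 950),
]


def tally(entries):
    """Count edits and total characters contributed by each editor."""
    totals = defaultdict(lambda: {"edits": 0, "chars": 0})
    for editor, chars in entries:
        totals[editor]["edits"] += 1
        totals[editor]["chars"] += chars
    return dict(totals)


credit = tally(log)
```

Ranked by number of edits, `editor_a` appears to have done three times as much as `editor_b`; ranked by characters, the order reverses. Neither figure says anything about the quality of the single large contribution, which is the gap a peer-assessment measure would have to fill.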

Conclusions? Future Directions?
Web-based software would enable collaboration on image- and text-based electronic editions over the Internet, allowing geographically dispersed groups of humanists to work together on editions encompassing text, image, and annotations. Even the most tech-savvy humanist working in seclusion is familiar with the dangers of editing electronic files; it is far too easy to copy older versions of files over newer ones, or to accidentally overwrite text through a careless cut and paste. Multiple editors collaborating on the same project require even more coordination and effort to avoid accidental loss of information. Software can reduce this risk by supporting management of the complex array of document versions that arise during the collaborative editing process, and by implementing fine-grained access control to documents. Version management would record the history of editors' changes to the electronic edition, allowing for both internal and public review of the status and progress of an electronic edition project. Fine-grained access control would allow project coordinators to delegate editing tasks to individual editors or groups, by limiting modifications to individual parts of a document and its markup. A convenient and flexible interface, running through a standard Internet browser, would allow the coordinator to easily define access-control policies. Tools should take advantage of accepted standards such as the Extensible Markup Language (XML) and the Text Encoding Initiative (TEI), as well as more subject-specific tools such as Epigraphic Documents in TEI XML (EpiDoc) and the Canonical Text Services (CTS) protocol. The community of researchers in the Humanities, and in Classics in particular, would be well served by a platform that provides the following functionalities:


 * 1) Users in diverse locations can simultaneously edit the same document, using a familiar web browser interface.
 * 2) The automatically managed history of editorial changes allows for merging and/or reverting selected changes without causing version conflicts.
 * 3) Coordinators can add the full advantage of collaboration to works-in-progress by importing existing sources without changing schemas or markup.
 * 4) The use of CTS enables uniform citations to electronic editions.
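
As an illustration of item 4, the sketch below splits a citation into its parts. It assumes the colon-delimited CTS URN notation (`urn:cts:` followed by a namespace, a work identifier, and an optional passage reference), which is not spelled out in this paper; the `parse_cts_urn` helper and the `CtsUrn` type are hypothetical names chosen for this example.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    work: str
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, work identifier, and passage reference."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    # The passage component is optional; an empty one is treated as absent.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace=parts[2], work=parts[3], passage=passage)


citation = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
```

Because every edition that adopts the same scheme names the work and the passage in the same way, a reference such as `citation.work` plus `citation.passage` can be resolved against any of them, which is what makes the citations uniform.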

Appendix: Overview of scholarship

 * "Will Wikipedia Mean the End Of Traditional Encyclopedias?" dialogue between Jimmy Wales and Dale Hoiberg, Wall Street Journal Online, September 12, 2006, URL:
 * "Britannica versus Wikipedia heads to the WSJ," by Ken Fisher. Arstechnica, September 12, 2006, URL:
 * "The Wiki That Edited Me," by Ryan Singel. Wired News, September 7, 2006, URL:
 * "Puppy smoothies: Improving the reliability of open, collaborative wikis," by Tom Cross. First Monday, volume 11, number 9 (September 2006), URL:
 * "7 Things You Should Know About Collaborative Editing," EDUCAUSE
 * "Undoing Actions in Collaborative Work,"
 * "A Framework for Undoing Actions in Collaborative Systems,"
 * "Fault-Tolerant Computing in Real-Time Collaborative Editing Systems"
 * "Access Control in Collaborative Systems"
 * "A Model for Semi-(a)Synchronous Collaborative Editing"
 * "A Multimedia Desktop Collaboration System"
 * "A Proposed Model and Functionality Definition for a Collaborative Editing and Conferencing System"
 * "A Survey of Experiences of Collaborative Writing," pp. 87-112, In: Mike Sharples (Ed.), Computer Supported Collaborative Writing, Computer Supported Cooperative Work, Springer-Verlag, London, UK, 1993, ISBN 3540197826
 * "Atomic Data Abstractions in a Distributed Collaborative Editing System"
 * "CoDoc: Multi-mode Collaboration over Documents" http://dret.net/biblio/reference/ign04 Engineering Library QA76.758 .C33 2004
 * "Design and Implementation of a Distributed Program for Collaborative Editing"
 * "Designing a Distributed Collaborative Environment"
 * "Flexible Diff-ing in a Collaborative Writing System" (Math Sciences Library HD66 .C563 1992)
 * "Using Web Annotations for Asynchronous Collaboration Around Documents," pp. 309-318, In: David G. Durand (Ed.), Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, ACM Press, Philadelphia, Pennsylvania, December 2000, ISBN 1-58113-222-0. Engineering Library QA75.5 C65 2000
 * The Wiki Way: Collaboration and Sharing on the Internet:
 * "The Collaborative Multi-User Editor Project Iris"


Resources:

See http://en.wikipedia.org/wiki/Collaborative_software for a good general discussion of collaborative software and for a definition of "computer-supported cooperative work".

Existing Tools
synchronous:

SubEthaEdit (MacOSX):


 * (review)

ACE (platform independent):

Gobby (Linux, Windows, MacOSX):

MoonEdit (Linux, Windows, FreeBSD):

TeNDaX:

Chalk: http://blog.chalk.it/

GroupSketch (a tool for synchronous collaborative sketching):

GROVE, "a textual multi-user outlining tool": Ellis, C., Gibbs, S. and Rein, G. (1990). Design and use of a group editor. In Cockton (Ed.), Engineering for Human-Computer Interaction. North-Holland.

ShrEdit, "a multi-user text editor": L.J. McGuffin, and G.M. Olson: "ShrEdit: a shared electronic workspace," CSMIL Technical Report #45, The University of Michigan, 1992.

DistEdit, "a toolkit for implementing distributed group editors": Knister, M.J. and Prakash, A. (1990): "DistEdit: A Distributed Toolkit for Supporting Multiple Group Editors," Proceedings of CSCW '90, ACM 1990 Conference on Computer Supported Cooperative Work, Los Angeles, 1990.

asynchronous:

Writely:

DocSynch:

And, of course, Wiki:

Backend

WebDAV (Web-based Distributed Authoring and Versioning; a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote web servers):

IETF Delta-V Working Group (This working group will define extensions to HTTP and the WebDAV Distributed Authoring Protocol necessary to enable distributed Web authoring tools to perform, in an interoperable manner, versioning and configuration management of Web resources):

MATE (Multilevel Annotation Tools Engineering; aims to facilitate re-use of language resources by addressing the problems of creating, acquiring, and maintaining language corpora):

Plone: A user-friendly and powerful open source Content Management System ("ideal as an intranet and extranet server, as a document publishing system, a portal server and as a groupware tool for collaboration between separately located entities"; supports XML):

Plone is built using...

Zope (Z Object Publishing Environment; an open source application server for building content management systems, intranets, portals, and custom applications; Zope also supports XML (see http://www.zope.org/Members/haqa/XMLKit)):