<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://wiki.digitalclassicist.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=NotisToufexis</id>
	<title>The Digital Classicist Wiki - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.digitalclassicist.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=NotisToufexis"/>
	<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/Special:Contributions/NotisToufexis"/>
	<updated>2026-06-27T07:00:26Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.1</generator>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Grammar_of_Medieval_Greek&amp;diff=7398</id>
		<title>Grammar of Medieval Greek</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Grammar_of_Medieval_Greek&amp;diff=7398"/>
		<updated>2016-11-01T16:41:28Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Available ===&lt;br /&gt;
&lt;br /&gt;
* http://www.mml.cam.ac.uk/greek/research&lt;br /&gt;
&lt;br /&gt;
The Grammar will be published in book form by Cambridge University Press. Publication is due in 2017.&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
The main aim of the project is to provide a comprehensive description of the Greek language between 1100 and 1700. These dates are chosen because texts in the vernacular become available in significant quantity only in the 12th century, and, although there is no obvious point at which to locate the end of the &amp;quot;medieval&amp;quot; period, by the 18th century important cultural and political changes are afoot. The period constitutes a coherent whole in terms of the development of the Greek vernacular. The analysis will be based on as wide a corpus of vernacular texts as possible, including non-literary sources (documents, letters etc.) which have been largely ignored in past studies of Medieval Greek. In certain cases, early medieval texts (5th-11th century) will be taken into account, mainly to illuminate points of historical evolution or the earliest dating of phenomena.&lt;br /&gt;
&lt;br /&gt;
[[Category:Projects]]&lt;br /&gt;
[[Category:XML]]&lt;br /&gt;
[[category:Byzantine]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Grammar_of_Medieval_Greek&amp;diff=7397</id>
		<title>Grammar of Medieval Greek</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Grammar_of_Medieval_Greek&amp;diff=7397"/>
		<updated>2016-11-01T16:38:43Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Available ===&lt;br /&gt;
&lt;br /&gt;
* http://www.mml.cam.ac.uk/greek/research&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
The main aim of the project is to provide a comprehensive description of the Greek language between 1100 and 1700. These dates are chosen because texts in the vernacular become available in significant quantity only in the 12th century, and, although there is no obvious point at which to locate the end of the &amp;quot;medieval&amp;quot; period, by the 18th century important cultural and political changes are afoot. The period constitutes a coherent whole in terms of the development of the Greek vernacular. The analysis will be based on as wide a corpus of vernacular texts as possible, including non-literary sources (documents, letters etc.) which have been largely ignored in past studies of Medieval Greek. In certain cases, early medieval texts (5th-11th century) will be taken into account, mainly to illuminate points of historical evolution or the earliest dating of phenomena.&lt;br /&gt;
&lt;br /&gt;
[[Category:Projects]]&lt;br /&gt;
[[Category:XML]]&lt;br /&gt;
[[category:Byzantine]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=User:NotisToufexis&amp;diff=2856</id>
		<title>User:NotisToufexis</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=User:NotisToufexis&amp;diff=2856"/>
		<updated>2010-06-03T18:33:35Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* http://www.toufexis.info&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Alpheios&amp;diff=2855</id>
		<title>Alpheios</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Alpheios&amp;diff=2855"/>
		<updated>2010-06-03T18:33:03Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Alpheios Tools ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://alpheios.net Alpheios Tools]&lt;br /&gt;
&lt;br /&gt;
Alpheios makes software for reading and learning languages. In this release we support Latin and Ancient Greek texts. Support for Arabic and Chinese is currently under development and in the future we hope to support additional languages. Please see Resources Under Development for more information.&lt;br /&gt;
&lt;br /&gt;
The Alpheios tools can be istalled as a Firefox extension [http://alpheios.net/content/installation (installation instructions here)] and can be used with every Greek (in Unicode) and Latin text found online. The tools offer access to standard dictionaries, morphological analysis and much more.&lt;br /&gt;
&lt;br /&gt;
[[category:Tools]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=User:NotisToufexis&amp;diff=2854</id>
		<title>User:NotisToufexis</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=User:NotisToufexis&amp;diff=2854"/>
		<updated>2010-06-03T18:32:39Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* http://www.toufexis.info&lt;br /&gt;
&lt;br /&gt;
[[Alpheios Tools]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=User:NotisToufexis&amp;diff=2853</id>
		<title>User:NotisToufexis</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=User:NotisToufexis&amp;diff=2853"/>
		<updated>2010-06-03T18:32:09Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* http://www.toufexis.info&lt;br /&gt;
&lt;br /&gt;
[[Alpheios]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Help:Editing&amp;diff=2852</id>
		<title>Help:Editing</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Help:Editing&amp;diff=2852"/>
		<updated>2010-06-03T18:31:39Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Help:Editing&amp;diff=2851</id>
		<title>Help:Editing</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Help:Editing&amp;diff=2851"/>
		<updated>2010-06-03T18:30:25Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Alpheios Tools ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://alpheios.net Alpheios Tools]&lt;br /&gt;
&lt;br /&gt;
Alpheios makes software for reading and learning languages. In this release we support Latin and Ancient Greek texts. Support for Arabic and Chinese is currently under development and in the future we hope to support additional languages. Please see Resources Under Development for more information.&lt;br /&gt;
&lt;br /&gt;
The Alpheios tools can be istalled as a Firefox extension [http://alpheios.net/content/installation (installation instructions here)] and can be used with every Greek (in Unicode) and Latin text found online. The tools offer access to standard dictionaries, morphological analysis and much more.&lt;br /&gt;
&lt;br /&gt;
[[category:Tools]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Diogenes&amp;diff=2149</id>
		<title>Diogenes</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Diogenes&amp;diff=2149"/>
		<updated>2007-08-24T08:48:18Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Download ===&lt;br /&gt;
&lt;br /&gt;
* http://www.dur.ac.uk/p.j.heslin/Software/Diogenes/index.php&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
Diogenes is a tool for searching and browsing the databases of ancient texts, primarily in Latin and Greek, that are published by the [http://www.tlg.uci.edu/ Thesaurus Linguae Graecae] and the [http://www.packhum.org/ Packard Humanities Institute]. It is free software: you are encouraged to modify, improve, and redistribute it under the terms of the [http://www.dur.ac.uk/p.j.heslin/Software/Diogenes/license.php GNU General Public license].&lt;br /&gt;
&lt;br /&gt;
The goal of this software package is to provide a free, transparent and flexible interface to the classical databases on CD-Rom in the PHI format, which include the TLG, the PHI corpus of Latin texts up to AD 200, the Duke Documentary Papyri collection, and the PHI-sponsored corpora of ancient inscriptions.&lt;br /&gt;
&lt;br /&gt;
* The latest version of Diogenes is much easier to install for Mac (OS X 10.3 or higher), Windows, and Linux.&lt;br /&gt;
&lt;br /&gt;
[[Category:Tools]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Concording_Greek_and_Latin_texts&amp;diff=2148</id>
		<title>Concording Greek and Latin texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Concording_Greek_and_Latin_texts&amp;diff=2148"/>
		<updated>2007-08-24T08:45:40Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Concording a Greek text */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Concording a Latin text==&lt;br /&gt;
A nice, free tool for quick concordances of one's own collection of Latin texts (or a single text) is [http://www.textworld.com/scp/ Simple Concordance Program] by Alan Reed.  Easy to learn, easy to use.  In SCP you can design your own alphabets, so you can also use it, e. g., for Greek texts in Betacode.&lt;br /&gt;
&lt;br /&gt;
If you need a repository of Latin texts to concord, a bit more carefully proofread texts than those at the Latin Library can be found at the [http://neptune.fltr.ucl.ac.be/corpora/ Itinera electronica], courtesy of the Universite Catholique de Louvain, Belgium.  Itinera electronica have their own [http://neptune.fltr.ucl.ac.be/corpora/corpora.htm online concording facilities] as well.&lt;br /&gt;
&lt;br /&gt;
Another valuable tool (if you bring your own texts to it) is [http://portal.tapor.ca/portal/portal TAPoR], Text Analysis Portal for Research (a project based at McMaster University, and consisting of a network of six of the leading Humanities computing centres in Canada).&lt;br /&gt;
&lt;br /&gt;
==Concording a Greek text==&lt;br /&gt;
&lt;br /&gt;
A nice, free and cross-platform tool for concording Greek texts encoded in Unicode is Laurence Anthony's AntConc, available under http://www.antlab.sci.waseda.ac.jp/software.html. &lt;br /&gt;
&lt;br /&gt;
[[category:FAQ]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Concording_Greek_and_Latin_texts&amp;diff=2147</id>
		<title>Concording Greek and Latin texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Concording_Greek_and_Latin_texts&amp;diff=2147"/>
		<updated>2007-08-24T08:43:41Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Concording a Greek text */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Concording a Latin text==&lt;br /&gt;
A nice, free tool for quick concordances of one's own collection of Latin texts (or a single text) is [http://www.textworld.com/scp/ Simple Concordance Program] by Alan Reed.  Easy to learn, easy to use.  In SCP you can design your own alphabets, so you can also use it, e. g., for Greek texts in Betacode.&lt;br /&gt;
&lt;br /&gt;
If you need a repository of Latin texts to concord, a bit more carefully proofread texts than those at the Latin Library can be found at the [http://neptune.fltr.ucl.ac.be/corpora/ Itinera electronica], courtesy of the Universite Catholique de Louvain, Belgium.  Itinera electronica have their own [http://neptune.fltr.ucl.ac.be/corpora/corpora.htm online concording facilities] as well.&lt;br /&gt;
&lt;br /&gt;
Another valuable tool (if you bring your own texts to it) is [http://portal.tapor.ca/portal/portal TAPoR], Text Analysis Portal for Research (a project based at McMaster University, and consisting of a network of six of the leading Humanities computing centres in Canada).&lt;br /&gt;
&lt;br /&gt;
==Concording a Greek text==&lt;br /&gt;
&lt;br /&gt;
A nice, free and cross-platform tool for concording Greek texts encoded in Unicode is Laurence Anthony's AntConc, available under http://www.antlab.sci.waseda.ac.jp/software.html. This is more of a tool for analysis and does not allow for a concordance or an index to be printed directly from the software.&lt;br /&gt;
&lt;br /&gt;
[[category:FAQ]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Concording_Greek_and_Latin_texts&amp;diff=2146</id>
		<title>Concording Greek and Latin texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Concording_Greek_and_Latin_texts&amp;diff=2146"/>
		<updated>2007-08-24T08:42:06Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Concording a Greek text */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Concording a Latin text==&lt;br /&gt;
A nice, free tool for quick concordances of one's own collection of Latin texts (or a single text) is [http://www.textworld.com/scp/ Simple Concordance Program] by Alan Reed.  Easy to learn, easy to use.  In SCP you can design your own alphabets, so you can also use it, e. g., for Greek texts in Betacode.&lt;br /&gt;
&lt;br /&gt;
If you need a repository of Latin texts to concord, a bit more carefully proofread texts than those at the Latin Library can be found at the [http://neptune.fltr.ucl.ac.be/corpora/ Itinera electronica], courtesy of the Universite Catholique de Louvain, Belgium.  Itinera electronica have their own [http://neptune.fltr.ucl.ac.be/corpora/corpora.htm online concording facilities] as well.&lt;br /&gt;
&lt;br /&gt;
Another valuable tool (if you bring your own texts to it) is [http://portal.tapor.ca/portal/portal TAPoR], Text Analysis Portal for Research (a project based at McMaster University, and consisting of a network of six of the leading Humanities computing centres in Canada).&lt;br /&gt;
&lt;br /&gt;
==Concording a Greek text==&lt;br /&gt;
&lt;br /&gt;
A nice, free and cross-platform tool for concording Greek texts encoded in Unicode is Laurence Anthony's AntConc, available under http://www.antlab.sci.waseda.ac.jp/software.html.&lt;br /&gt;
&lt;br /&gt;
[[category:FAQ]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Digiclass:Members&amp;diff=2031</id>
		<title>Digiclass:Members</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Digiclass:Members&amp;diff=2031"/>
		<updated>2007-02-07T10:32:21Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Administrators ===&lt;br /&gt;
&lt;br /&gt;
The following are the administrators of this Wiki space: contact any of the below to have your membership confirmed.&lt;br /&gt;
&lt;br /&gt;
* [[User:GabrielBodard|Gabriel Bodard]]&lt;br /&gt;
* [http://www.methodsnetwork.ac.uk/network/stuart.html Stuart Dunn]&lt;br /&gt;
* [[User:JuanGarces|Juan Garcés]]&lt;br /&gt;
* [http://www.cch.kcl.ac.uk/legacy/tmp/profiles/sm.htm Simon Mahony]&lt;br /&gt;
&lt;br /&gt;
=== Wiki Editors ===&lt;br /&gt;
&lt;br /&gt;
All are welcome to join the Digitalclasicist Wiki as editors and help us build the FAQ and other documents. Contact any of the administrators above to apply for an account.&lt;br /&gt;
&lt;br /&gt;
=== Partner Institutions ===&lt;br /&gt;
&lt;br /&gt;
The Digitalclassicist is proud to list among its partner institutions the following bodies:&lt;br /&gt;
&lt;br /&gt;
* [[Alliance of Digital Humanities Organisations]]&lt;br /&gt;
* [[Ancient World Mapping Center]]&lt;br /&gt;
* [[Centre for Computing in the Humanities]], King's College London&lt;br /&gt;
* [[Center for Hellenic Studies]], Washington DC&lt;br /&gt;
* [[Centre for the Study of Ancient Documents]], Oxford&lt;br /&gt;
* [[Digital Medievalist]]&lt;br /&gt;
* [[Humanist List]]&lt;br /&gt;
* [[ICT in Arts and Humanities Research]]&lt;br /&gt;
* [[Institute for Advanced Technology in the Humanities]], Virginia&lt;br /&gt;
* [[Perseus Digital Library]]&lt;br /&gt;
* [[Stoa Consortium]]&lt;br /&gt;
&lt;br /&gt;
=== Full Digital Classicist Community ===&lt;br /&gt;
&lt;br /&gt;
This is a full list of the membership of the Digitalclassicist community, who serve in both editorial and advisory capacity. (Users with wiki accounts are highlighted in this list.)&lt;br /&gt;
&lt;br /&gt;
* James Aitken&lt;br /&gt;
* Deborah Anderson&lt;br /&gt;
* Paul Arthur&lt;br /&gt;
* Rodney Ast&lt;br /&gt;
* Richard Beacham&lt;br /&gt;
* Eugenia Beu-Dachin&lt;br /&gt;
* Christopher Blackwell&lt;br /&gt;
* [[User:GabrielBodard|Gabriel Bodard]]&lt;br /&gt;
* John Bodel&lt;br /&gt;
* Hugh Bowden&lt;br /&gt;
* Alan Bowman&lt;br /&gt;
* Marjorie Burghart&lt;br /&gt;
* Luca Cardin&lt;br /&gt;
* [[User:HughCayless|Hugh Cayless]]&lt;br /&gt;
* Arianna Ciula&lt;br /&gt;
* Greg Crane&lt;br /&gt;
* Charles Crowther&lt;br /&gt;
* James Cummings&lt;br /&gt;
* Alexander Czmiel&lt;br /&gt;
* Jason Davies&lt;br /&gt;
* Antonio de Freitas&lt;br /&gt;
* Helma Dik&lt;br /&gt;
* [[User:StuartDunn|Stuart Dunn]]&lt;br /&gt;
* [[User:TomElliott|Tom Elliott]]&lt;br /&gt;
* [[User:SarahFinlayson|Sarah Finlayson]]&lt;br /&gt;
* Tim Finney&lt;br /&gt;
* Bruce Fraser&lt;br /&gt;
* Michael Fraser&lt;br /&gt;
* Bernie Frischer&lt;br /&gt;
* Brian Fuchs&lt;br /&gt;
* Michael Fulford&lt;br /&gt;
* Daniele Fusi&lt;br /&gt;
* [[User:JuanGarces|Juan Garcés]]&lt;br /&gt;
* Ioannis Georganas&lt;br /&gt;
* Sebastian Heath&lt;br /&gt;
* Matthäus Heil&lt;br /&gt;
* Peter Heslin&lt;br /&gt;
* Damian Hippisley&lt;br /&gt;
* Laval Hunsucker&lt;br /&gt;
* Dolores Iorizzo&lt;br /&gt;
* Leif Isaksen&lt;br /&gt;
* Michael Jeffries&lt;br /&gt;
* Charles Jones&lt;br /&gt;
* [[User:NevenJovanovic|Neven Jovanović]]&lt;br /&gt;
* Ahuvia Kahane&lt;br /&gt;
* Ruth Kirkham&lt;br /&gt;
* Kalle Korhonen&lt;br /&gt;
* Nathan Lea&lt;br /&gt;
* Eleonora Litta&lt;br /&gt;
* [[User:SimonMahony|Simon Mahony]]&lt;br /&gt;
* Elaine Matthews&lt;br /&gt;
* Willard McCarty&lt;br /&gt;
* Paolo Monella&lt;br /&gt;
* Martin Mueller&lt;br /&gt;
* Orla Mulholland&lt;br /&gt;
* Riona Naidu&lt;br /&gt;
* Dirk Obbink&lt;br /&gt;
* James J. O'Donnell&lt;br /&gt;
* Espen Ore&lt;br /&gt;
* Silvio Panciera&lt;br /&gt;
* Artemis Papakostouli&lt;br /&gt;
* John Pearce&lt;br /&gt;
* Ivana Petrovic&lt;br /&gt;
* Modignani Picozzi&lt;br /&gt;
* Dominic Rathbone&lt;br /&gt;
* Daniel Riaño&lt;br /&gt;
* Andrea Rotstein&lt;br /&gt;
* Charlotte Roueché&lt;br /&gt;
* Ian Ruffell&lt;br /&gt;
* Jeff Rydberg-Cox&lt;br /&gt;
* Ross Scaife&lt;br /&gt;
* R. W. Sharples&lt;br /&gt;
* Janice Siegel&lt;br /&gt;
* Amy Smith&lt;br /&gt;
* Neel Smith&lt;br /&gt;
* Robin Smith&lt;br /&gt;
* [[user:JoshSosin|Joshua Sosin]]&lt;br /&gt;
* Joe Tebben&lt;br /&gt;
* Melissa Terras&lt;br /&gt;
* [http://www.toufexis.info Notis Toufexis]&lt;br /&gt;
* [[User:CharlotteTupman|Charlotte Tupman]]&lt;br /&gt;
* Hafed Walda&lt;br /&gt;
* Sue Willetts&lt;br /&gt;
* Andrew Wilson&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Digiclass:Members&amp;diff=2030</id>
		<title>Digiclass:Members</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Digiclass:Members&amp;diff=2030"/>
		<updated>2007-02-07T10:31:52Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Administrators ===&lt;br /&gt;
&lt;br /&gt;
The following are the administrators of this Wiki space: contact any of the below to have your membership confirmed.&lt;br /&gt;
&lt;br /&gt;
* [[User:GabrielBodard|Gabriel Bodard]]&lt;br /&gt;
* [http://www.methodsnetwork.ac.uk/network/stuart.html Stuart Dunn]&lt;br /&gt;
* [[User:JuanGarces|Juan Garcés]]&lt;br /&gt;
* [http://www.cch.kcl.ac.uk/legacy/tmp/profiles/sm.htm Simon Mahony]&lt;br /&gt;
&lt;br /&gt;
=== Wiki Editors ===&lt;br /&gt;
&lt;br /&gt;
All are welcome to join the Digitalclasicist Wiki as editors and help us build the FAQ and other documents. Contact any of the administrators above to apply for an account.&lt;br /&gt;
&lt;br /&gt;
=== Partner Institutions ===&lt;br /&gt;
&lt;br /&gt;
The Digitalclassicist is proud to list among its partner institutions the following bodies:&lt;br /&gt;
&lt;br /&gt;
* [[Alliance of Digital Humanities Organisations]]&lt;br /&gt;
* [[Ancient World Mapping Center]]&lt;br /&gt;
* [[Centre for Computing in the Humanities]], King's College London&lt;br /&gt;
* [[Center for Hellenic Studies]], Washington DC&lt;br /&gt;
* [[Centre for the Study of Ancient Documents]], Oxford&lt;br /&gt;
* [[Digital Medievalist]]&lt;br /&gt;
* [[Humanist List]]&lt;br /&gt;
* [[ICT in Arts and Humanities Research]]&lt;br /&gt;
* [[Institute for Advanced Technology in the Humanities]], Virginia&lt;br /&gt;
* [[Perseus Digital Library]]&lt;br /&gt;
* [[Stoa Consortium]]&lt;br /&gt;
&lt;br /&gt;
=== Full Digital Classicist Community ===&lt;br /&gt;
&lt;br /&gt;
This is a full list of the membership of the Digitalclassicist community, who serve in both editorial and advisory capacity. (Users with wiki accounts are highlighted in this list.)&lt;br /&gt;
&lt;br /&gt;
* James Aitken&lt;br /&gt;
* Deborah Anderson&lt;br /&gt;
* Paul Arthur&lt;br /&gt;
* Rodney Ast&lt;br /&gt;
* Richard Beacham&lt;br /&gt;
* Eugenia Beu-Dachin&lt;br /&gt;
* Christopher Blackwell&lt;br /&gt;
* [[User:GabrielBodard|Gabriel Bodard]]&lt;br /&gt;
* John Bodel&lt;br /&gt;
* Hugh Bowden&lt;br /&gt;
* Alan Bowman&lt;br /&gt;
* Marjorie Burghart&lt;br /&gt;
* Luca Cardin&lt;br /&gt;
* [[User:HughCayless|Hugh Cayless]]&lt;br /&gt;
* Arianna Ciula&lt;br /&gt;
* Greg Crane&lt;br /&gt;
* Charles Crowther&lt;br /&gt;
* James Cummings&lt;br /&gt;
* Alexander Czmiel&lt;br /&gt;
* Jason Davies&lt;br /&gt;
* Antonio de Freitas&lt;br /&gt;
* Helma Dik&lt;br /&gt;
* [[User:StuartDunn|Stuart Dunn]]&lt;br /&gt;
* [[User:TomElliott|Tom Elliott]]&lt;br /&gt;
* [[User:SarahFinlayson|Sarah Finlayson]]&lt;br /&gt;
* Tim Finney&lt;br /&gt;
* Bruce Fraser&lt;br /&gt;
* Michael Fraser&lt;br /&gt;
* Bernie Frischer&lt;br /&gt;
* Brian Fuchs&lt;br /&gt;
* Michael Fulford&lt;br /&gt;
* Daniele Fusi&lt;br /&gt;
* [[User:JuanGarces|Juan Garcés]]&lt;br /&gt;
* Ioannis Georganas&lt;br /&gt;
* Sebastian Heath&lt;br /&gt;
* Matthäus Heil&lt;br /&gt;
* Peter Heslin&lt;br /&gt;
* Damian Hippisley&lt;br /&gt;
* Laval Hunsucker&lt;br /&gt;
* Dolores Iorizzo&lt;br /&gt;
* Leif Isaksen&lt;br /&gt;
* Michael Jeffries&lt;br /&gt;
* Charles Jones&lt;br /&gt;
* [[User:NevenJovanovic|Neven Jovanović]]&lt;br /&gt;
* Ahuvia Kahane&lt;br /&gt;
* Ruth Kirkham&lt;br /&gt;
* Kalle Korhonen&lt;br /&gt;
* Nathan Lea&lt;br /&gt;
* Eleonora Litta&lt;br /&gt;
* [[User:SimonMahony|Simon Mahony]]&lt;br /&gt;
* Elaine Matthews&lt;br /&gt;
* Willard McCarty&lt;br /&gt;
* Paolo Monella&lt;br /&gt;
* Martin Mueller&lt;br /&gt;
* Orla Mulholland&lt;br /&gt;
* Riona Naidu&lt;br /&gt;
* Dirk Obbink&lt;br /&gt;
* James J. O'Donnell&lt;br /&gt;
* Espen Ore&lt;br /&gt;
* Silvio Panciera&lt;br /&gt;
* Artemis Papakostouli&lt;br /&gt;
* John Pearce&lt;br /&gt;
* Ivana Petrovic&lt;br /&gt;
* Modignani Picozzi&lt;br /&gt;
* Dominic Rathbone&lt;br /&gt;
* Daniel Riaño&lt;br /&gt;
* Andrea Rotstein&lt;br /&gt;
* Charlotte Roueché&lt;br /&gt;
* Ian Ruffell&lt;br /&gt;
* Jeff Rydberg-Cox&lt;br /&gt;
* Ross Scaife&lt;br /&gt;
* R. W. Sharples&lt;br /&gt;
* Janice Siegel&lt;br /&gt;
* Amy Smith&lt;br /&gt;
* Neel Smith&lt;br /&gt;
* Robin Smith&lt;br /&gt;
* [[user:JoshSosin|Joshua Sosin]]&lt;br /&gt;
* Joe Tebben&lt;br /&gt;
* Melissa Terras&lt;br /&gt;
* [[http://www.toufexis.info|Notis Toufexis]]&lt;br /&gt;
* [[User:CharlotteTupman|Charlotte Tupman]]&lt;br /&gt;
* Hafed Walda&lt;br /&gt;
* Sue Willetts&lt;br /&gt;
* Andrew Wilson&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2027</id>
		<title>OSCE Choudhury Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2027"/>
		<updated>2007-01-29T15:29:18Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Position Paper on Licensing/Legal Matters=&lt;br /&gt;
&lt;br /&gt;
==Sayeed Choudhury&amp;lt;br&amp;gt;Library Digital Programs, Sheridan Libraries, Johns Hopkins University==&lt;br /&gt;
&lt;br /&gt;
In their position paper, Stuart Dunn and Tobias Blanke raise an interesting and relevant question regarding digital texts: &amp;amp;quot;How can such texts fit in existing library and information (infra)structures?  Will these need to be rethought?&amp;amp;quot;  Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, has stated that libraries are built upon three pillars - collections, services and infrastructure.  Arguably, collections have represented the most important element in the print world, with services and infrastructure supporting the collections.  In the digital world, these elements are becoming blurred. It may be appropriate to assert that the ~~principles~~ by which libraries (and archives and museums) have operated remain valid, but the ~~practices~~ need to be reconsidered.  Not surprisingly, libraries are facing new challenges, and opportunities, with the development of infrastructure to support digital collections and services.&lt;br /&gt;
&lt;br /&gt;
At the heart of this infrastructure development effort is the repository.  There are many defintions for repository, but for the purpose of this discussion, the most useful one is offered by Cliff Lynch who stated a &amp;amp;quot;repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.&amp;amp;quot; (http://www.arl.org/newsltr/226/ir.html)&lt;br /&gt;
&lt;br /&gt;
The emphasis on both services and preservation is particularly noteworthy.  From a preservation perspective, it is important to note that both open standards and open source augment our abiilty to support digital preservation. (http://www.ils.unc.edu/callee/oss_preservation.htm).  From the service perspective, other position papers have raised several interesting (potential) needs or uses for digital texts.  One theme becomes patently clear in reading these papers: scholars will not only need access to view digital texts, but will also need the ability to download (en masse), manipulate, transform and repurpose digital texts.  The collaborative editing envsiaged by Ross Scaife and Dot Porter would be difficult without fully open access to digital texts.  The type of markup described by Gabriel Bodard almost certainly requires complete access to digital texts. Greg Crane has often discussed the possibilities for machine translation, language modeling and document analysis with large corpora of digitized texts (http://www.dlib.org/dlib/march06/crane/03crane.html).&lt;br /&gt;
&lt;br /&gt;
These ideas raise very important questions.  Are the libraries involved with Google Book Search (http://books.google.com/) providing only part of the solution?  Even more disconcerting is the idea that these libraries, though well-intentioned, may even inhibit the ability of scholars to work with digital texts in a manner that supports new scholarship.  Will Google work with the scholarly community to build tools and services, or only consider commercial opportunities?  Understandably, libraries, including those working with the Open Content Alliance (http://www.opencontentalliance.org/), consider whether to digitize books already available through Google Book Search in an effort to avoid duplicative efforts.  However, it's important to consider both the collections and services aspects.&lt;br /&gt;
&lt;br /&gt;
Repository development obviously entails a high degree of technology work, but repositories, particularly institutional respositories should respond to a policy and legal framework.  From a technological perspective, it is optimal to develop an unconstrained, open system that can be constrained or modified according to local policy or legal frameworks; it is difficult, if not impossible, to move in the other direction.  The e-Science community has noted that it is important to consider openness even in terms of the data.  The SPARC Open Data (http://www.arl.org/sparc/opendata/) states: &amp;amp;quot;Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.&amp;amp;quot; That is, even the most open system may not support preservation or scholarly needs if the data is constrained through proprietary formats or legal restrictions. &lt;br /&gt;
&lt;br /&gt;
With these observation in mind, it seems obvious that the scholarly community should adopt, even push, for completely open standards and open access for digital texts.  Such openness offers the greatest potential for the type of digital environment envisaged through the other position papers.&lt;br /&gt;
&lt;br /&gt;
However, it is important to note that the inter-relationships between technology, policy and organizational roles that has been defined in the print world is also becoming blurred.  When a monograph was published, there was a reasonable degree of understanding regarding how a scholar would send this monograph to a publisher, which would seek revenue through sales, but also agree that libraries could offer the book without cost - under certain conditions - to the scholarly community.  With digital publications, this process and role definition is being established, sometimes with controversy.  The US National Endowment for the Humaniites has announced new guidelines for their Scholarly Editions Grants (http://www.neh.gov/grants/guidelines/editions.html) that states a preference for projects that offer digitized works online throgh open access.  This announcement has raised some concerns among scholars and University Presses regarding business models and rights clearance (http://insidehighered.com/news/2006/09/18/documents).  &lt;br /&gt;
&lt;br /&gt;
Finally, what implications arise from open data in terms of the reward structure for scholars?  Will freely available online digital texts be viewed with the same level of rigor or reputation as those &amp;amp;quot;validated&amp;amp;quot; through publishers, peer review, or other means for assessment?  Libraries are eager to serve scholarly needs in the digital age, ideally with an open policy and legal framework.  It is important, however, to address the corresponding implications of such arrangements in terms of organization roles, business models, and reward structures.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Scaife_Paper&amp;diff=2026</id>
		<title>OSCE Scaife Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Scaife_Paper&amp;diff=2026"/>
		<updated>2007-01-29T15:28:26Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Open Source Critical Editions&lt;br /&gt;
Workshop at Kings College London&lt;br /&gt;
September 22, 2006&lt;br /&gt;
&lt;br /&gt;
=Tools for Collaborative Editing (some thoughts by Ross Scaife and Dot Porter)=&lt;br /&gt;
&lt;br /&gt;
==Introduction to the Concept==&lt;br /&gt;
&lt;br /&gt;
The Wikipedia entry on &amp;quot;collaborative editor&amp;quot; defines the term quite simply: &amp;quot;A collaborative editor allows simultaneous editing of the same document or video by different participants using different computers.&amp;quot; ([http://en.wikipedia.org/wiki/Collaborative_real-time_editor]) Electronic editions have become steadily more popular over the past decade. Libraries and museums have led the charge, followed by increasing numbers of scholars, both individuals and groups, who form the basis of an active community of electronic editors. As this community grows, so does the need for tools suitable to the types of editions that people and institutions are actually creating. Generally, there are three specific needs of humanists involved in collaborative editing projects. Scholars need to be able to build editions encompassing text, images, and annotations, the latter usually using the Extensible Markup Language (XML), the de facto standard for encoding electronic editions in the humanities, and the mode of expression of the Text Encoding Initiative (TEI). Second, software needs to have access control and version management systems that will allow several different editors to collaborate on an edition with different levels of access and without fear that one editor might inadvertently overwrite another's work. Finally, accessibility. Software needs to be designed in such a way that it will encourage collaborative work among individuals who are geographically dispersed, and may encourage electronic editing by those many accomplished humanities scholars who are familiar with basic computer tools (word processors, web browsers, etc.) but who may be put off by regular XML editing software.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Good collaborative editing software will foster the creation of scholarly works by forging partnerships between individuals and institutions, enabling them to share resources, both physical resources (in the form of texts and images) and intellectual (in the form of subject knowledge and editing experience). Software released under an Open-Source license will especially promote cooperation among smaller institutions that might not have the resources to purchase expensive software. Such software could even become a significant resource not only for scholars, but also for teachers and students, potentially encouraging collaborative projects between schools around the world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Maintaining an edition with multiple editors contributing to the same document requires a significant amount of work. Editors must be careful not to overwrite changes made by others, for example by coordinating the process so that no two editors work on a file at the same time. Word processing software such as Microsoft Word includes a tool for &amp;quot;Tracking Changes&amp;quot;, which enables users to work collaboratively; however, though the resulting files are suitable for printing, they are not encoded in a standard acceptable for electronic editions. With the increasing scale and scope of electronic editions, the need for a collaborative editing process rooted in accepted standards, and software to support this process, is even stronger.&lt;br /&gt;
&lt;br /&gt;
==How can collaborative editing software help classicists? Give a few real-life examples==&lt;br /&gt;
&lt;br /&gt;
* This page was initially produced and edited by two individuals in Writely ([http://www.writely.com/]) (Cnet review ([http://reviews.cnet.com/4520-9239_7-6627472.html?tag=cnetfd.ld3]) compares other AJAX'ed word processors)&lt;br /&gt;
** readily editable by one or more people, like a wiki. Unlike a wiki, Writely feels like regular word processor, a pared-down MS Word.&lt;br /&gt;
** numerous output formats (html rtf doc odf pdf) to suit a variety of publication/access needs (both print and online publication)&lt;br /&gt;
** provides a view of the history of a document's revisions over time, which helps to show the relative contributions of collaborators over time.&lt;br /&gt;
** documents can be shown to select viewers or made public&lt;br /&gt;
* Similar to Writely, LiveDocuments promises synchronization of Microsoft Office Documents, allowing for collaborative editing/writing in a context familiar to most scholars&lt;br /&gt;
** There is no server requirement, editors need not log on to a central server&lt;br /&gt;
** &amp;quot;LiveDocuments promises Office collaboration without a server&amp;quot; [http://arstechnica.com/news.ars/post/20060908-7701.html]&lt;br /&gt;
* Classics context: note the ideas about a communal text, a personal text, and the text of a given MS presented 11 years ago by the Vergil Project (never implemented, unfortunately)&lt;br /&gt;
** Communal text: &amp;quot;users will participate in the &amp;quot;establishment&amp;quot; of a text that will never reach final form. Here is how it will work. All the texts at this site include a critical apparatus of variant readings, conjectural emendations, and so forth. Because this information is presented on-line, it is possible for interested users to select the readings that they prefer -- to vote, in effect, for the reading that they think should appear in a given passage. These votes can then be tabulated, and the reading receiving the most votes will appear in the Communal Text. Those who consult this version of the text must therefore do so on the understanding that it does not represent the final judgment of any single editorial expert, but the aggregate opinion of the community of users of the site, and that it is subject to change at any moment.&amp;quot;&lt;br /&gt;
** Personal text: &amp;quot;Through this menu item users can record their preferences and use them to establish the text that they habitually consult. Of course, it will be possible to use this feature in other ways as well. Someone who wanted to use this site but felt the need of a little extra editorial authority might simply enter into his or her text whatever readings are printed by his or her favorite editor. On the other hand, a group of scholars interested in constructing a text for some specific purpose might use this resource collaboratively. So might a class on Vergil or on textual criticism. No doubt other applications will be thought of as well.&lt;br /&gt;
** Text of a particular manuscript: &amp;quot;Through this feature it will be possible to see the text as it appears in any of the manuscripts whose readings have been entered into the database. If one were interested in the Palatinus, for example, a diplomatic transcript of that manuscript would (with secondary readings and corrections available via hypertext links). In some cases images are available as well, and we hope eventually to provide facsimiles of all the mss in the database.&amp;quot;&lt;br /&gt;
* Suda On Line is another oldie-but-goodie with strengths and weaknesses ([http://www.stoa.org/sol/])&lt;br /&gt;
* Virtual Humanities Lab at Brown University has been developing a system for collaborative annotation of literary texts. The guidelines for annotation (published here: [http://golf.services.brown.edu/projects/VHL/help/guidelines_annot.pdf]) are simple, the software is accessed through a regular web browser.&lt;br /&gt;
* Compare the proposed Homer Multitext:&lt;br /&gt;
&lt;br /&gt;
{quote}&amp;quot;An ideal edition of Homer would encompass the full historical reality of the Homeric textual tradition as it evolved through time, from the pre-Classical era well into the medieval. Our attempt to create such an edition is already underway. Instead of choosing between variants and plus verses in an attempt to recover the ipsissima verba of Homer, we propose to include them in a multitext edition that embraces the fluidity of the textual traditions of the Iliad and Odyssey. The ideal format for this multitext edition of Homer is not a traditional printed text but an electronic, web-based edition. Unlimited in its ability to handle complex sets of variants, an electronic multitext offers critical readers of Homer the opportunity to consider many historical Iliads and Odysseys from the standpoint of many different sources of transmission, and so also allows the user to recover both a more accurate and more accessible picture of the fluidity of the tradition in the earliest stages of textuality.&amp;quot;&lt;br /&gt;
{quote}&lt;br /&gt;
&lt;br /&gt;
* EDUCE: Ideally, this project needs a strategy for imposing editorial control over the resulting documents in a process that involves establishing the texts, encoding them with standard TEI-XML markup using newly available Open Source software tools, and then publishing the transcripts side-by-side with their associated images following Open Access protocols.&lt;br /&gt;
&lt;br /&gt;
==Collaborative Editors==&lt;br /&gt;
&lt;br /&gt;
Different types of collaborative editors (see Appendix for list of editors)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* synchronous vs. asynchronous. Synchronous editors work in &amp;quot;real-time&amp;quot;. Changes made by one editor are immediately visible to other editors. Asynchronous editors (including Writely, MediaWiki, and version management systems) synchronize working versions either automatically after-the-fact, or (in the case of version management systems) require users to update changes manually.&lt;br /&gt;
* text-only editing vs. image-based editing. Text-only editors, including most XML and word processing programs, focus solely on the editing of the text. Image-based editors (including the EPPT and the University of Victoria Image Markup Tool) provide simple methods for either incorporating images into editions, or building textual annotations onto images.&lt;br /&gt;
* XML editors vs. text-only editors. XML editors, for example oXygen or XMetal, provide support for building XML annotations into texts. The better editors include various other XML support: XPath searching, XSLT development for translation, DTD or schema development for validation.&lt;br /&gt;
* Problems with collaborative editing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Version control: Wiki, Subversion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Administration for collaborative editing has two main issues: version management and access control. Version management deals with the problems of simultaneous editing. When a user makes changes to a document, we must be prepared to combine those changes with other changes by editors working on the same document. Furthermore, it may be necessary to obtain an earlier version of a document for reference, or even to reverse part of a series of changes while leaving other edits in place. A version management system tracks the branching revisions of a document as it is updated by a number of individuals.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Access control sets limits on the documents an editor can modify (coarse-grained access control), and the types of changes he or she can make to those documents (fine-grained access control). Such a system allows a project administrator to delegate editing responsibilities in a controlled manner. Consider, for example, two scholars with different specialized knowledge who are collaborating on an editing project. One scholar studies language, and is responsible for editing the linguistic aspects of a particular text. Another scholar specializes in manuscript studies, and is responsible for describing aspects of the text within the context of a specific manuscript - the scribal handwriting, condition of the manuscript, etc. The document curator, then, can grant the textual editor access to update sets of markup for describing the language of the text, but not for describing information such as scribal handwriting and condition of the manuscript. Likewise, the manuscript scholar would have access to modify sets of markup for describing the manuscript, but not the language of the text. On the other hand, neither of these scholars would be able to modify administrative markup such as the document's headers. Fine-grained access control allows the administrator to enable both scholars to work simultaneously within their domains of expertise without compromising the integrity and control of the editorial process. The document curator or project coordinator creates a set of rules that specify the &amp;quot;shape&amp;quot; of modifications particular users are allowed to make. Then, when a user attempts to modify part of the document, those access control rules are compared to that part of a document; much like a key in a lock, if the &amp;quot;shape&amp;quot; of the rule matches the document, the lock opens and the change is permitted to go through.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Source code management (SCM) systems such as CVS and SVN have shown their ability to assist in collaborative maintenance of computer source code. SCMs allow programmers to maintain parallel branches of their source code, merging sets of changes from one branch to another. However, SCMs take a line-oriented approach to revision management; while this is ideal for computer source code, is not well suited to XML documents, where modifications usually follow the document's hierarchical structure. Furthermore, merging conflicting changes can be a complex process, and often must be dealt with before a user can commit their changes to a central repository. Finally, SCM systems support primarily coarse-grained access control, so that permission to modify part of a document implies permission to modify the entire document; fine-grained access control affords much more flexibility in organizing a collaborative editing project.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Editing needs are also not fully served by content management systems such as the open-source MediaWiki. This system, which underlies the highly successful Wikipedia collaborative encyclopedia project, has demonstrated its ability to handle collaborative editing at a massive scale. Support for access control, however, is quite limited, given the open-editing model of Wikipedia. While supervisors can &amp;quot;lock&amp;quot; documents to prevent them from being modified, it is difficult to limit access in a more complex fashion. Furthermore, although such systems typically support version management, the revisions of a document are treated as following a linear sequence. Such a model does not adequately capture the complexities of parallel changes, where an editor may modify a document unaware of changes being made by another editor to the same document.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
None of the existing systems is designed for a highly collaborative environment with large numbers of concurrent changes and with constant revision tracking. The &amp;quot;perfect&amp;quot; system would combine the version-tracking features of SCM, the scalability of collaborative content management systems, and the security and flexibility of fine-grained access control.&lt;br /&gt;
&lt;br /&gt;
==Finding valid metrics for apportioning scholarly credit==&lt;br /&gt;
&lt;br /&gt;
Few collaborative projects are prominantly describing their methods for crediting participation. For one good example, see the Tibetan and Himalayan Digital Library ([http://www.thdl.org/xml/showEssay.php?xml=/intro/participation.xml&amp;amp;amp;l=d1e650]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* we need to harness the self-interest of scholars:&lt;br /&gt;
** but collaborative work is often incremental (with many small contributions over time). MediaWiki, with its version management system, does provide a way to track the contributions of individuals over time.&lt;br /&gt;
** peer-assessment may be feasible in some cases (as with assessments of Amazon reviews' helpfulness) but often the number of people involved may be very small, in a field like ours.&lt;br /&gt;
** SOL counts users' contributions as translators and editors but cannot provide any qualitative measure, so one person who provided only a single entry that is of very high quality may seem to have done little&lt;br /&gt;
&lt;br /&gt;
==Conclusions? Future Directions?==&lt;br /&gt;
&lt;br /&gt;
Web-based software would enable collaboration on image- and text-based electronic editions over the Internet, enabling geographically dispersed groups of humanists to collaborate on editions encompassing text, image, and annotations. Even the most tech-savvy humanist working in seclusion is familiar with the dangers of editing electronic files; it is far too easy to copy older versions of files over newer ones, or to accidentally overwrite text through a careless cut and paste. Multiple editors collaborating on the same project require even more coordination and effort to avoid the chance of accidental loss of information. Support management of the complex array of document versions that arise during the collaborative editing process, and by implementing fine-grained access control to documents. Version management would record the history of editors' changes to the electronic edition, allowing for both internal and public review of the status and progress of an electronic edition project. Fine-grained access control would allow project coordinators to delegate editing tasks to individual editors or groups, by limiting modifications to individual parts of a document and its markup. A convenient and flexible interface, running through a standard Internet browser, would allow the coordinator to easily define access-control policies. Tools should take advantage of accepted standards such as the Extensible Markup Language (XML) and the Text Encoding Initiative (TEI), as well as more subject-specific tools such as Epigraphic Documents in TEI XML (EpiDoc) and the Canonical Text Services (CTS) protocol. The community of researchers in the Humanities and Classics in particular would be well-served with a platform that provides the following functionalities:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# Users in diverse locations can simultaneously edit the same document, using a familiar web browser interface.&lt;br /&gt;
# The automatically managed history of editorial changes allows for merging and/or reverting selected changes without causing version conflicts.&lt;br /&gt;
# Coordinators can add the full advantage of collaboration to works-in-progress by importing existing sources without changing schemas or markup.&lt;br /&gt;
# The use of CTS enables uniform citations to electronic editions.&lt;br /&gt;
&lt;br /&gt;
==Appendix: Overview of scholarship==&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Will Wikipedia Mean the End Of Traditional Encyclopedias?&amp;quot; dialogue between Jimmy Wales and Dale Hoiberg, ~~Wall Street Journal Online~~, September 12, 2006, URL: [http://online.wsj.com/public/article/SB115756239753455284-A4hdSU1xZOC9Y9PFhJZV16jFlLM_20070911.html]&lt;br /&gt;
* &amp;quot;Britannica versus Wikipedia heads to the WSJ,&amp;quot; by Ken Fisher. ~~Arstechnica~~, September 12, 2006, URL: [http://arstechnica.com/news.ars/post/20060912-7726.html]&lt;br /&gt;
* &amp;quot;The Wiki That Edited Me,&amp;quot; by Ryan Singel. ~~Wired News~~, September 7, 2006, URL: [http://www.wired.com/news/technology/0,71737-0.html?tw=rss.index]&lt;br /&gt;
* &amp;quot;Puppy smoothies: Improving the reliability of open, collaborative wikis,&amp;quot; by Tom Cross. ~~First Monday~~, volume 11, number 9 (September 2006), URL: [http://firstmonday.org/issues/issue11_9/cross/index.html]&lt;br /&gt;
* &amp;quot;7 Things you should Know about Collaborative Editing,&amp;quot; EDUCAUSE [http://www.educause.edu/content.asp?page_id=666&amp;amp;amp;ID=ELI7009&amp;amp;amp;bhcp=1]&lt;br /&gt;
* &amp;quot;Undoing Actions in Collaborative Work,&amp;quot; [http://www.eecs.umich.edu/~aprakash/papers/prakash-knister-cscw92.pdf]&lt;br /&gt;
* &amp;quot;A Framework for Undoing Actions in Collaborative Systems,&amp;quot; [http://www.eecs.umich.edu/~aprakash/papers/undo-tochi94.pdf]&lt;br /&gt;
* &amp;quot;Fault-Tolerant Computing in Real-Time Collaborative Editing Systems&amp;quot; [http://www.cse.unl.edu/~xqin/research/ftrce.html]&lt;br /&gt;
* &amp;quot;Access Control in Collaborative Systems&amp;quot; [http://portal.acm.org/citation.cfm?id=1057977.1057979]&lt;br /&gt;
* &amp;quot;A Model for Semi-(a)Synchronous Collaborative Editing&amp;quot; [http://dret.net/biblio/reference/min93]&lt;br /&gt;
* &amp;quot;A Multimedia Desktop Collaboration System&amp;quot; [http://dret.net/biblio/reference/che92b]&lt;br /&gt;
* &amp;quot;A Proposed Model and Functionality Definition for a Collaborative Editing and Conferencing System&amp;quot; [http://dret.net/biblio/reference/lub90b]&lt;br /&gt;
* &amp;quot;A Survey of Experiences of Collaborative Writing,&amp;quot; pp. 87-112, In: Computer Supported Collaborative Writing, Mike Sharples (Ed.), Computer Supported Cooperative Work, Springer-Verlag, London, UK, Computer Supported Cooperative Work, 1993, ISBN 3540197826 [http://dret.net/biblio/reference/bec93]&lt;br /&gt;
* &amp;quot;Atomic Data Abstractions in a Distributed Collaborative Editing System&amp;quot; [http://dret.net/biblio/reference/gre85]&lt;br /&gt;
* &amp;quot;CoDoc: Multi-mode Collaboration over Documents&amp;quot; http://dret.net/biblio/reference/ign04 Engineering Library QA76.758 .C33 2004&lt;br /&gt;
* &amp;quot;Design and Implementation of a Distributed Program for Collaborative Editing&amp;quot; [http://dret.net/biblio/reference/sel86]&lt;br /&gt;
* &amp;quot;Designing a Distributed Collaborative Environment&amp;quot; [http://dret.net/biblio/reference/che92]&lt;br /&gt;
* &amp;quot;Flexible Diff-ing in a Collaborative Writing System&amp;quot; (Math Sciences Library HD66 .C563 1992) [http://dret.net/biblio/reference/neu92]&lt;br /&gt;
* &amp;quot;Using Web Annotations for Asynchronous Collaboration Around Documents,&amp;quot; pp. 309-318, In: David G. Durand (Ed.), Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, ACM Press, Philadelphia, Pennsylvania, December 2000 , ISBN 1-58113-222-0. [http://dret.net/biblio/reference/cad00] Engineering Library QA75.5 C65 2000&lt;br /&gt;
* The Wiki Way: Collaboration and Sharing on the Internet: [http://dret.net/biblio/reference/leu01]&lt;br /&gt;
* &amp;quot;The Collaborative Multi-User Editor Project Iris&amp;quot; [http://www11.informatik.tu-muenchen.de/publications/pdf/Koch1995.pdf]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*Resources:*&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See http://en.wikipedia.org/wiki/Collaborative_software for a good general discussion of collaborative software in general and [http://en.wikipedia.org/wiki/CSCW] for a definition of &amp;quot;computer-supported cooperative work&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Existing Tools==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''synchronous'' (see [http://en.wikipedia.org/wiki/Collaborative_real-time_editor]):&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
SubEthaEdit (MacOSX): [http://www.codingmonkeys.de/subethaedit/collaborate.html]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [http://www.macdevcenter.com/pub/a/mac/2003/12/02/rendezvous.html] (review)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ACE (platform independent): [http://ace.iserver.ch/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gobby (Linux, Windows, MacOSX): [http://gobby.0x539.de/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MoonEdit (Linux, Windows, FreeBSD): [http://moonedit.com/index.html.en]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
TeNDaX: [http://www.tendax.net/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chalk: http://blog.chalk.it/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
GroupSketch (a tool for synchronous collaborative sketching): [http://grouplab.cpsc.ucalgary.ca/papers/1992/92-GroupSketch-Video.CSCW/groupsketchvideo.pdf]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
GROVE, &amp;quot;a textual multi-user outlining tool&amp;quot;: Ellis, C., Gibbs, S. and Rein, G. (1990). Design and use of a group editor. In Cockton (Ed.), Engineering for Human-Computer Interaction. North-Holland.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ShrEdit, &amp;quot;a multi-user text editor&amp;quot;: L.J. McGuffin, and G.M. Olson: &amp;quot;ShrEdit: a shared electronic workspace,&amp;quot; CSMIL Technical Report #45, The University of Michigan, 1992.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
DistEdit, &amp;quot;a toolkit for implementing distributed group editors&amp;quot;: (Knister, M.J and Prakash, A. (1990): &amp;quot;DistEdit: A Distributed Toolkit for Supporting Multiple Group 'Editors&amp;quot;, Proceedings of CSCW '90, ACM 1990 Conference on Computer Supported Cooperative Work, Los Angeles, 1990)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''asynchronous:''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Writely: [http://www.writely.com/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
DocSynch: [http://docsynch.sourceforge.net/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And, of course, Wiki: [http://www.wiki.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Backend''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
WebDAV (Web-based Distributed Authoring and Versioning; a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote web servers): [http://www.webdav.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
IETF Delta-V Working Group (This working group will define extensions to HTTP and the WebDAV Distributed Authoring Protocol necessary to enable distributed Web authoring tools to perform, in an interoperable manner, versioning and configuration management of Web resources): [http://www.webdav.org/deltav/deltav-charter.html]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MATE (Multilevel Annotation Tools Engineering; aims to facilitate re-use of language resources by addressing the problems of creating, acquiring, and maintaining language corpora): [http://mate.nis.sdu.dk/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Plone: A user-friendly and powerful open source Content Management System (&amp;quot;ideal as an intranet and extranet server, as a document publishing system, a portal server and as a groupware tool for collaboration between separately located entities.&amp;quot;; supports XML (see [http://plone.org/documentation/tutorial/xml-in-plone-with-marshall/?searchterm=XML] and [http://pyxml.sourceforge.net/topics/] for more general Python-XML)): [http://plone.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''Plone is built using...''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zope (Z Object Publishing Environment; an open source application server for building content management systems, intranets, portals, and custom applications; Zope also supports XML (see [http://www.zope.org/Members/karl/ParsedXML/ParsedXML and http://www.zope.org/Members/haqa/XMLKit])): [http://www.zope.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2025</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2025"/>
		<updated>2007-01-29T15:26:39Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
=Open Content Scholarly Sources=&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Next Generation Editions=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Summary==&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
* ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
* ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
* ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
* ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
==Open Content Editions==&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
==Next Steps==&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
==Choice of source texts==&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
==Sources for Images of Print Editions==&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
==Components of next generation electronic editions==&lt;br /&gt;
&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
==Editorial Conventions==&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2024</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2024"/>
		<updated>2007-01-29T15:25:46Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
=Open Content Scholarly Sources=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Next Generation Editions=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Summary==&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
* ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
* ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
* ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
* ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
==Open Content Editions==&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
==Next Steps==&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
==Choice of source texts==&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
==Sources for Images of Print Editions==&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
==Components of next generation electronic editions==&lt;br /&gt;
&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
==Editorial Conventions==&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2023</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2023"/>
		<updated>2007-01-29T15:24:00Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
=Open Content Scholarly Sources=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
:* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
:* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
:* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
:* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
:* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Next Generation Editions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Summary=&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
: * ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
: * ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
: * ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
: * ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
=Open Content Editions=&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
=Next Steps=&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
====Choice of source texts====&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
====Sources for Images of Print Editions====&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
====Components of next generation electronic editions====&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
====Editorial Conventions====&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2022</id>
		<title>OSCE Dunn Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2022"/>
		<updated>2007-01-29T15:22:48Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=e-Science and the critical edition: a discussion paper=&lt;br /&gt;
&lt;br /&gt;
==Stuart Dunn and Tobias Blanke&amp;lt;br&amp;gt;Arts and Humanities e-Science Support Centre, King's College London==&lt;br /&gt;
&lt;br /&gt;
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.&lt;br /&gt;
&lt;br /&gt;
Several e-Science initiatives in the UK are promoting to advance research work in virtual spaces with advanced computing - in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communications in 'virtual organizations' are developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&amp;amp;H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings is combined with the ability to share instantly digital items among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities due to improved technologies complement new highly collaborative research.&lt;br /&gt;
&lt;br /&gt;
E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be everything from processing power, data, or expertise that researchers can share. This networked infrastructure allows a culture of collaboration, in which new forms of collaboration can emerge, and new and advanced methodologies can be explored.&lt;br /&gt;
&lt;br /&gt;
A key to the success of e-Science is the provision of shared access to research facilities and therefore to provide answers to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were collocated. Digital knowledge objects shall be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships as between server to server and programme to programme and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&amp;amp;H) researchers are going to be doing over the next ten years; and will fundamentally alter their relationship with the resources they use. &lt;br /&gt;
&lt;br /&gt;
Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, and an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate of the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.&lt;br /&gt;
&lt;br /&gt;
'''Some key questions are:'''&lt;br /&gt;
&lt;br /&gt;
* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?&lt;br /&gt;
* Do users need such editions? Will they ever trust them?&lt;br /&gt;
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all? &lt;br /&gt;
* How should version control be maintained?&lt;br /&gt;
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?&lt;br /&gt;
* What kind of peer-review process needs to be in place? &lt;br /&gt;
* How should cataloguing, referencing and citation of such documents be approached?&lt;br /&gt;
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Deckers_Paper&amp;diff=2021</id>
		<title>OSCE Deckers Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Deckers_Paper&amp;diff=2021"/>
		<updated>2007-01-29T15:19:37Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Some thoughts on authority and peer review=&lt;br /&gt;
&lt;br /&gt;
==Daniel Deckers==&lt;br /&gt;
&lt;br /&gt;
''After months of thorough research, a scholar adds a surprising new reading to a digital edition. The next day, a frequent user of the platform where this edition resides wishes to contribute in a small way and corrects what he believes is an obvious error in the data entry (just why didn't he spot it before?).''&lt;br /&gt;
&lt;br /&gt;
While this scenario is arguably an exaggeration, and the particular case may&lt;br /&gt;
seem to be one of the more easily averted problems with an open digital&lt;br /&gt;
edition, it is an example of what worries we will have to address to win&lt;br /&gt;
acceptance for such editions.&lt;br /&gt;
&lt;br /&gt;
Initially, there will probably be two main approaches to create a corpus of&lt;br /&gt;
open digital editions in our field. One will be to convert existing (older?)&lt;br /&gt;
editions into an electronic version. Assuming unambiguous encoding&lt;br /&gt;
guidelines are adopted, the obvious main tenet for quality assurance is&lt;br /&gt;
minimising the number of errors of transmission from the printed text to the&lt;br /&gt;
electronic one. As with other electronic texts created from existing printed&lt;br /&gt;
books, rather simple, wiki-like approaches might probably work, and there is&lt;br /&gt;
going to be little contention as the printed text would be a decisive&lt;br /&gt;
reference in these cases (but see below).&lt;br /&gt;
&lt;br /&gt;
The other will be to create newly prepared critical editions as electronic&lt;br /&gt;
editions in the first place. For this, a traditional group of scholarly&lt;br /&gt;
editors could just use their established routine, creating a more or less&lt;br /&gt;
authoritative edition as the case may be, foregoing possible advantages of&lt;br /&gt;
the digital medium, yet ensuring a high standard of research is reflected in&lt;br /&gt;
the edition. In this case, we are unlikely to see substantial improvement by&lt;br /&gt;
the casual reader, and the number of specialists who could make meaningful&lt;br /&gt;
additions right from the publication date is going to be rather small.&lt;br /&gt;
&lt;br /&gt;
However, as we aim to have editions that evolve, and as we are not going to&lt;br /&gt;
contend ourselves with just duplicating what could be done in the era of the&lt;br /&gt;
printed book, the most interesting questions will be raised by how we can&lt;br /&gt;
integrate more casual contributors efficiently on various levels without&lt;br /&gt;
sacrificing quality and scholarly integrity of our output.&lt;br /&gt;
&lt;br /&gt;
Let's try to classify possible contributions to an evolving digital edition:&lt;br /&gt;
&lt;br /&gt;
* correction of typographical errors (which might have been either introduced in transfering material to the digital medium or have already existed in a previous printed edition)&lt;br /&gt;
&lt;br /&gt;
* correction of errors in interpreting previous materials (e.g. incorrect encoding of an older apparatus, misattribution, etc.)&lt;br /&gt;
&lt;br /&gt;
* new readings from a manuscript not previously included in the edition&lt;br /&gt;
&lt;br /&gt;
* new additional materials (commentary, translations, etc.)&lt;br /&gt;
&lt;br /&gt;
* updates or corrections to existing digital material (improved readings, new conjectures, more accurate commentary, more elegant translations, etc.)&lt;br /&gt;
&lt;br /&gt;
For all of these categories but the first, we have the problem that anything&lt;br /&gt;
based on interpretation is open to same, and that some of the work required&lt;br /&gt;
presupposes access to and familiarity with source materials that will not&lt;br /&gt;
necessarily all be freely available, and thus the results not easily&lt;br /&gt;
verifiable even by those select few with the necessary skills.&lt;br /&gt;
&lt;br /&gt;
While we would want to have corrections contributed by all willing to, and&lt;br /&gt;
new materials by those able to, we would want updates to the existing&lt;br /&gt;
material from those qualified to make the necessary judgements.&lt;br /&gt;
&lt;br /&gt;
We cannot, therefore, use a completely egalitarian approach, i.e. the&lt;br /&gt;
unadulterated wiki principle. While this might help with typographical&lt;br /&gt;
errors (though would you trust to this when your crucial new theory could be&lt;br /&gt;
thwarted by a spelling difference?), its use in the other areas would have&lt;br /&gt;
to presuppose that every contributor would use extreme restraint in only&lt;br /&gt;
adding and changing things within his specialised competence. Even if we&lt;br /&gt;
added a voting system (as suggested by previous speakers) to judge changes&lt;br /&gt;
(or the likelihood of particular variant readings, as has been suggested), I&lt;br /&gt;
doubt that mere numbers will be suitable in arriving at the right&lt;br /&gt;
conclusions.&lt;br /&gt;
&lt;br /&gt;
Must we therefore limit ourselves to a strictly hierarchical approach and&lt;br /&gt;
install an editorial team to supervise our corpus of digital editions? While&lt;br /&gt;
such a team would have to be a cross between a group of scholarly editors&lt;br /&gt;
such as found in large-scale editorial enterprises at academies of sciences&lt;br /&gt;
and a board of editors as with a scholarly journal, and it could conceivably&lt;br /&gt;
be rather efficient when the corpus is limited to a very specific field of&lt;br /&gt;
ancient authors, such a solution is hardly envisionable if we really aim to&lt;br /&gt;
have a large number of editions ranging from the most Classical texts to the&lt;br /&gt;
Byzantine.&lt;br /&gt;
&lt;br /&gt;
What then are the instruments to assure quality and yet go beyond&lt;br /&gt;
traditional structures? One answer lies in the encoding itself: Since we aim&lt;br /&gt;
to encode variants, we can also encode modern day variants, i.e. encode the&lt;br /&gt;
opinions of several modern contributors. The question of quality then also&lt;br /&gt;
becomes one of authoritative version(s). There are two ways of backing the&lt;br /&gt;
quality of encoded alternating interpretations (or readings, corrections&lt;br /&gt;
etc.), one in the authority of their contributor(s), the other in the&lt;br /&gt;
authority of those that support them. In both cases, the authority can be&lt;br /&gt;
either extrinsic, e.g. rest on the esteem a scholar or group of scholars is&lt;br /&gt;
held in, or intrinsic, i.e. based on some system of evaluating the&lt;br /&gt;
trustworthiness of other contributions by a particular individual.&lt;br /&gt;
&lt;br /&gt;
It would appear what we need is a system to map the authoritativeness of&lt;br /&gt;
experts in our fields onto a user management system. While very finegrained&lt;br /&gt;
approaches are theoretically possible, I cannot see that more than a rather&lt;br /&gt;
simple classification of which user is an expert in which particular&lt;br /&gt;
subfields would meet with acceptance. This, combined with a system of&lt;br /&gt;
letting users evaluate the usefulness of contributions (i.e. a &amp;quot;voting&lt;br /&gt;
system&amp;quot;) that weighs the votes according to authority, and possibly feeds&lt;br /&gt;
back to the authority value for the contributor, might already be enough&lt;br /&gt;
checks and balances to keep our editions useable. Only time will show.&lt;br /&gt;
&lt;br /&gt;
Since we do need to ensure that quotes from our editions can be referenced&lt;br /&gt;
unambiguously, we can also easily have the option of &amp;quot;freezing&amp;quot; certain&lt;br /&gt;
versions of them. Among such milestone versions we'd obviously have the&lt;br /&gt;
original edition, whether a transcription of a printed one or a newly&lt;br /&gt;
created one, as well as those that are the result of planned addition of new&lt;br /&gt;
materials in the context of specific projects. Moreover, to account for more&lt;br /&gt;
accidental improvements through various small contributions, there could&lt;br /&gt;
also be periodic milestones. So as not to have these be mere snapshots, we&lt;br /&gt;
must presuppose, again, some kind of editorial team that decides what&lt;br /&gt;
accumulated materials should be committed to this new authoritative version.&lt;br /&gt;
&lt;br /&gt;
This latter approach could of course be combined with voting mechanisms and&lt;br /&gt;
the authority system, and thus perhaps automated to a degree over time. Add&lt;br /&gt;
discussion forums and similar to the platform for our corpus of editions,&lt;br /&gt;
and changes that are found to be potentially dubious (or potentially&lt;br /&gt;
benefical) might be marked for discussion and then in some way of vote&lt;br /&gt;
decided upon for the next milestone of an edition.&lt;br /&gt;
&lt;br /&gt;
While most of these thoughts may seem to presuppose a single repository and&lt;br /&gt;
platform in a single place, in fact I believe this can and will be extended&lt;br /&gt;
to much more distributed systems. We do, however, need a more or less&lt;br /&gt;
universal system of judgement of authority (though concurrent systems are&lt;br /&gt;
envisionable, and I leave the interesting effects this would have on&lt;br /&gt;
definitive versions on texts to the reader). At the same time, we will need&lt;br /&gt;
ways of ascertaining that texts with the information on contributors are&lt;br /&gt;
intact and integral. I propose both requirements (one on the user, the other&lt;br /&gt;
on the server side) might be met by systems of credentials based on somewhat&lt;br /&gt;
similar principles as the web of trust used for public-key cryptography.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Deckers_Paper&amp;diff=2020</id>
		<title>OSCE Deckers Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Deckers_Paper&amp;diff=2020"/>
		<updated>2007-01-29T15:19:09Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Some thoughts on authority and peer review=&lt;br /&gt;
&lt;br /&gt;
==Daniel Deckers==&lt;br /&gt;
&lt;br /&gt;
''After months of thorough research, a scholar adds a surprising new reading&lt;br /&gt;
to a digital edition. The next day, a frequent user of the platform where&lt;br /&gt;
this edition resides wishes to contribute in a small way and corrects what&lt;br /&gt;
he believes is an obvious error in the data entry (just why didn't he spot&lt;br /&gt;
it before?).''&lt;br /&gt;
&lt;br /&gt;
While this scenario is arguably an exaggeration, and the particular case may&lt;br /&gt;
seem to be one of the more easily averted problems with an open digital&lt;br /&gt;
edition, it is an example of what worries we will have to address to win&lt;br /&gt;
acceptance for such editions.&lt;br /&gt;
&lt;br /&gt;
Initially, there will probably be two main approaches to create a corpus of&lt;br /&gt;
open digital editions in our field. One will be to convert existing (older?)&lt;br /&gt;
editions into an electronic version. Assuming unambiguous encoding&lt;br /&gt;
guidelines are adopted, the obvious main tenet for quality assurance is&lt;br /&gt;
minimising the number of errors of transmission from the printed text to the&lt;br /&gt;
electronic one. As with other electronic texts created from existing printed&lt;br /&gt;
books, rather simple, wiki-like approaches might probably work, and there is&lt;br /&gt;
going to be little contention as the printed text would be a decisive&lt;br /&gt;
reference in these cases (but see below).&lt;br /&gt;
&lt;br /&gt;
The other will be to create newly prepared critical editions as electronic&lt;br /&gt;
editions in the first place. For this, a traditional group of scholarly&lt;br /&gt;
editors could just use their established routine, creating a more or less&lt;br /&gt;
authoritative edition as the case may be, foregoing possible advantages of&lt;br /&gt;
the digital medium, yet ensuring a high standard of research is reflected in&lt;br /&gt;
the edition. In this case, we are unlikely to see substantial improvement by&lt;br /&gt;
the casual reader, and the number of specialists who could make meaningful&lt;br /&gt;
additions right from the publication date is going to be rather small.&lt;br /&gt;
&lt;br /&gt;
However, as we aim to have editions that evolve, and as we are not going to&lt;br /&gt;
contend ourselves with just duplicating what could be done in the era of the&lt;br /&gt;
printed book, the most interesting questions will be raised by how we can&lt;br /&gt;
integrate more casual contributors efficiently on various levels without&lt;br /&gt;
sacrificing quality and scholarly integrity of our output.&lt;br /&gt;
&lt;br /&gt;
Let's try to classify possible contributions to an evolving digital edition:&lt;br /&gt;
&lt;br /&gt;
* correction of typographical errors (which might have been either introduced in transfering material to the digital medium or have already existed in a previous printed edition)&lt;br /&gt;
&lt;br /&gt;
* correction of errors in interpreting previous materials (e.g. incorrect encoding of an older apparatus, misattribution, etc.)&lt;br /&gt;
&lt;br /&gt;
* new readings from a manuscript not previously included in the edition&lt;br /&gt;
&lt;br /&gt;
* new additional materials (commentary, translations, etc.)&lt;br /&gt;
&lt;br /&gt;
* updates or corrections to existing digital material (improved readings, new conjectures, more accurate commentary, more elegant translations, etc.)&lt;br /&gt;
&lt;br /&gt;
For all of these categories but the first, we have the problem that anything&lt;br /&gt;
based on interpretation is open to same, and that some of the work required&lt;br /&gt;
presupposes access to and familiarity with source materials that will not&lt;br /&gt;
necessarily all be freely available, and thus the results not easily&lt;br /&gt;
verifiable even by those select few with the necessary skills.&lt;br /&gt;
&lt;br /&gt;
While we would want to have corrections contributed by all willing to, and&lt;br /&gt;
new materials by those able to, we would want updates to the existing&lt;br /&gt;
material from those qualified to make the necessary judgements.&lt;br /&gt;
&lt;br /&gt;
We cannot, therefore, use a completely egalitarian approach, i.e. the&lt;br /&gt;
unadulterated wiki principle. While this might help with typographical&lt;br /&gt;
errors (though would you trust to this when your crucial new theory could be&lt;br /&gt;
thwarted by a spelling difference?), its use in the other areas would have&lt;br /&gt;
to presuppose that every contributor would use extreme restraint in only&lt;br /&gt;
adding and changing things within his specialised competence. Even if we&lt;br /&gt;
added a voting system (as suggested by previous speakers) to judge changes&lt;br /&gt;
(or the likelihood of particular variant readings, as has been suggested), I&lt;br /&gt;
doubt that mere numbers will be suitable in arriving at the right&lt;br /&gt;
conclusions.&lt;br /&gt;
&lt;br /&gt;
Must we therefore limit ourselves to a strictly hierarchical approach and&lt;br /&gt;
install an editorial team to supervise our corpus of digital editions? While&lt;br /&gt;
such a team would have to be a cross between a group of scholarly editors&lt;br /&gt;
such as found in large-scale editorial enterprises at academies of sciences&lt;br /&gt;
and a board of editors as with a scholarly journal, and it could conceivably&lt;br /&gt;
be rather efficient when the corpus is limited to a very specific field of&lt;br /&gt;
ancient authors, such a solution is hardly envisionable if we really aim to&lt;br /&gt;
have a large number of editions ranging from the most Classical texts to the&lt;br /&gt;
Byzantine.&lt;br /&gt;
&lt;br /&gt;
What then are the instruments to assure quality and yet go beyond&lt;br /&gt;
traditional structures? One answer lies in the encoding itself: Since we aim&lt;br /&gt;
to encode variants, we can also encode modern day variants, i.e. encode the&lt;br /&gt;
opinions of several modern contributors. The question of quality then also&lt;br /&gt;
becomes one of authoritative version(s). There are two ways of backing the&lt;br /&gt;
quality of encoded alternating interpretations (or readings, corrections&lt;br /&gt;
etc.), one in the authority of their contributor(s), the other in the&lt;br /&gt;
authority of those that support them. In both cases, the authority can be&lt;br /&gt;
either extrinsic, e.g. rest on the esteem a scholar or group of scholars is&lt;br /&gt;
held in, or intrinsic, i.e. based on some system of evaluating the&lt;br /&gt;
trustworthiness of other contributions by a particular individual.&lt;br /&gt;
&lt;br /&gt;
It would appear what we need is a system to map the authoritativeness of&lt;br /&gt;
experts in our fields onto a user management system. While very finegrained&lt;br /&gt;
approaches are theoretically possible, I cannot see that more than a rather&lt;br /&gt;
simple classification of which user is an expert in which particular&lt;br /&gt;
subfields would meet with acceptance. This, combined with a system of&lt;br /&gt;
letting users evaluate the usefulness of contributions (i.e. a &amp;quot;voting&lt;br /&gt;
system&amp;quot;) that weighs the votes according to authority, and possibly feeds&lt;br /&gt;
back to the authority value for the contributor, might already be enough&lt;br /&gt;
checks and balances to keep our editions useable. Only time will show.&lt;br /&gt;
&lt;br /&gt;
Since we do need to ensure that quotes from our editions can be referenced&lt;br /&gt;
unambiguously, we can also easily have the option of &amp;quot;freezing&amp;quot; certain&lt;br /&gt;
versions of them. Among such milestone versions we'd obviously have the&lt;br /&gt;
original edition, whether a transcription of a printed one or a newly&lt;br /&gt;
created one, as well as those that are the result of planned addition of new&lt;br /&gt;
materials in the context of specific projects. Moreover, to account for more&lt;br /&gt;
accidental improvements through various small contributions, there could&lt;br /&gt;
also be periodic milestones. So as not to have these be mere snapshots, we&lt;br /&gt;
must presuppose, again, some kind of editorial team that decides what&lt;br /&gt;
accumulated materials should be committed to this new authoritative version.&lt;br /&gt;
&lt;br /&gt;
This latter approach could of course be combined with voting mechanisms and&lt;br /&gt;
the authority system, and thus perhaps automated to a degree over time. Add&lt;br /&gt;
discussion forums and similar to the platform for our corpus of editions,&lt;br /&gt;
and changes that are found to be potentially dubious (or potentially&lt;br /&gt;
benefical) might be marked for discussion and then in some way of vote&lt;br /&gt;
decided upon for the next milestone of an edition.&lt;br /&gt;
&lt;br /&gt;
While most of these thoughts may seem to presuppose a single repository and&lt;br /&gt;
platform in a single place, in fact I believe this can and will be extended&lt;br /&gt;
to much more distributed systems. We do, however, need a more or less&lt;br /&gt;
universal system of judgement of authority (though concurrent systems are&lt;br /&gt;
envisionable, and I leave the interesting effects this would have on&lt;br /&gt;
definitive versions on texts to the reader). At the same time, we will need&lt;br /&gt;
ways of ascertaining that texts with the information on contributors are&lt;br /&gt;
intact and integral. I propose both requirements (one on the user, the other&lt;br /&gt;
on the server side) might be met by systems of credentials based on somewhat&lt;br /&gt;
similar principles as the web of trust used for public-key cryptography.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Deckers_Paper&amp;diff=2019</id>
		<title>OSCE Deckers Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Deckers_Paper&amp;diff=2019"/>
		<updated>2007-01-29T15:15:20Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Some thoughts on authority and peer review=&lt;br /&gt;
&lt;br /&gt;
==Daniel Deckers==&lt;br /&gt;
&lt;br /&gt;
~~After months of thorough research, a scholar adds a surprising new reading&lt;br /&gt;
to a digital edition. The next day, a frequent user of the platform where&lt;br /&gt;
this edition resides wishes to contribute in a small way and corrects what&lt;br /&gt;
he believes is an obvious error in the data entry (just why didn't he spot&lt;br /&gt;
it before?).~~&lt;br /&gt;
&lt;br /&gt;
While this scenario is arguably an exaggeration, and the particular case may&lt;br /&gt;
seem to be one of the more easily averted problems with an open digital&lt;br /&gt;
edition, it is an example of what worries we will have to address to win&lt;br /&gt;
acceptance for such editions.&lt;br /&gt;
&lt;br /&gt;
Initially, there will probably be two main approaches to create a corpus of&lt;br /&gt;
open digital editions in our field. One will be to convert existing (older?)&lt;br /&gt;
editions into an electronic version. Assuming unambiguous encoding&lt;br /&gt;
guidelines are adopted, the obvious main tenet for quality assurance is&lt;br /&gt;
minimising the number of errors of transmission from the printed text to the&lt;br /&gt;
electronic one. As with other electronic texts created from existing printed&lt;br /&gt;
books, rather simple, wiki-like approaches might probably work, and there is&lt;br /&gt;
going to be little contention as the printed text would be a decisive&lt;br /&gt;
reference in these cases (but see below).&lt;br /&gt;
&lt;br /&gt;
The other will be to create newly prepared critical editions as electronic&lt;br /&gt;
editions in the first place. For this, a traditional group of scholarly&lt;br /&gt;
editors could just use their established routine, creating a more or less&lt;br /&gt;
authoritative edition as the case may be, foregoing possible advantages of&lt;br /&gt;
the digital medium, yet ensuring a high standard of research is reflected in&lt;br /&gt;
the edition. In this case, we are unlikely to see substantial improvement by&lt;br /&gt;
the casual reader, and the number of specialists who could make meaningful&lt;br /&gt;
additions right from the publication date is going to be rather small.&lt;br /&gt;
&lt;br /&gt;
However, as we aim to have editions that evolve, and as we are not going to&lt;br /&gt;
contend ourselves with just duplicating what could be done in the era of the&lt;br /&gt;
printed book, the most interesting questions will be raised by how we can&lt;br /&gt;
integrate more casual contributors efficiently on various levels without&lt;br /&gt;
sacrificing quality and scholarly integrity of our output.&lt;br /&gt;
&lt;br /&gt;
Let's try to classify possible contributions to an evolving digital edition:&lt;br /&gt;
&lt;br /&gt;
- correction of typographical errors (which might have been either introduced in transfering material to the digital medium or have already existed in a previous printed edition)&lt;br /&gt;
&lt;br /&gt;
- correction of errors in interpreting previous materials (e.g. incorrect encoding of an older apparatus, misattribution, etc.)&lt;br /&gt;
&lt;br /&gt;
- new readings from a manuscript not previously included in the edition&lt;br /&gt;
&lt;br /&gt;
- new additional materials (commentary, translations, etc.)&lt;br /&gt;
&lt;br /&gt;
- updates or corrections to existing digital material (improved readings, new conjectures, more accurate commentary, more elegant translations, etc.)&lt;br /&gt;
&lt;br /&gt;
For all of these categories but the first, we have the problem that anything&lt;br /&gt;
based on interpretation is open to same, and that some of the work required&lt;br /&gt;
presupposes access to and familiarity with source materials that will not&lt;br /&gt;
necessarily all be freely available, and thus the results not easily&lt;br /&gt;
verifiable even by those select few with the necessary skills.&lt;br /&gt;
&lt;br /&gt;
While we would want to have corrections contributed by all willing to, and&lt;br /&gt;
new materials by those able to, we would want updates to the existing&lt;br /&gt;
material from those qualified to make the necessary judgements.&lt;br /&gt;
&lt;br /&gt;
We cannot, therefore, use a completely egalitarian approach, i.e. the&lt;br /&gt;
unadulterated wiki principle. While this might help with typographical&lt;br /&gt;
errors (though would you trust to this when your crucial new theory could be&lt;br /&gt;
thwarted by a spelling difference?), its use in the other areas would have&lt;br /&gt;
to presuppose that every contributor would use extreme restraint in only&lt;br /&gt;
adding and changing things within his specialised competence. Even if we&lt;br /&gt;
added a voting system (as suggested by previous speakers) to judge changes&lt;br /&gt;
(or the likelihood of particular variant readings, as has been suggested), I&lt;br /&gt;
doubt that mere numbers will be suitable in arriving at the right&lt;br /&gt;
conclusions.&lt;br /&gt;
&lt;br /&gt;
Must we therefore limit ourselves to a strictly hierarchical approach and&lt;br /&gt;
install an editorial team to supervise our corpus of digital editions? While&lt;br /&gt;
such a team would have to be a cross between a group of scholarly editors&lt;br /&gt;
such as found in large-scale editorial enterprises at academies of sciences&lt;br /&gt;
and a board of editors as with a scholarly journal, and it could conceivably&lt;br /&gt;
be rather efficient when the corpus is limited to a very specific field of&lt;br /&gt;
ancient authors, such a solution is hardly envisionable if we really aim to&lt;br /&gt;
have a large number of editions ranging from the most Classical texts to the&lt;br /&gt;
Byzantine.&lt;br /&gt;
&lt;br /&gt;
What then are the instruments to assure quality and yet go beyond&lt;br /&gt;
traditional structures? One answer lies in the encoding itself: Since we aim&lt;br /&gt;
to encode variants, we can also encode modern day variants, i.e. encode the&lt;br /&gt;
opinions of several modern contributors. The question of quality then also&lt;br /&gt;
becomes one of authoritative version(s). There are two ways of backing the&lt;br /&gt;
quality of encoded alternating interpretations (or readings, corrections&lt;br /&gt;
etc.), one in the authority of their contributor(s), the other in the&lt;br /&gt;
authority of those that support them. In both cases, the authority can be&lt;br /&gt;
either extrinsic, e.g. rest on the esteem a scholar or group of scholars is&lt;br /&gt;
held in, or intrinsic, i.e. based on some system of evaluating the&lt;br /&gt;
trustworthiness of other contributions by a particular individual.&lt;br /&gt;
&lt;br /&gt;
It would appear what we need is a system to map the authoritativeness of&lt;br /&gt;
experts in our fields onto a user management system. While very finegrained&lt;br /&gt;
approaches are theoretically possible, I cannot see that more than a rather&lt;br /&gt;
simple classification of which user is an expert in which particular&lt;br /&gt;
subfields would meet with acceptance. This, combined with a system of&lt;br /&gt;
letting users evaluate the usefulness of contributions (i.e. a &amp;quot;voting&lt;br /&gt;
system&amp;quot;) that weighs the votes according to authority, and possibly feeds&lt;br /&gt;
back to the authority value for the contributor, might already be enough&lt;br /&gt;
checks and balances to keep our editions useable. Only time will show.&lt;br /&gt;
&lt;br /&gt;
Since we do need to ensure that quotes from our editions can be referenced&lt;br /&gt;
unambiguously, we can also easily have the option of &amp;quot;freezing&amp;quot; certain&lt;br /&gt;
versions of them. Among such milestone versions we'd obviously have the&lt;br /&gt;
original edition, whether a transcription of a printed one or a newly&lt;br /&gt;
created one, as well as those that are the result of planned addition of new&lt;br /&gt;
materials in the context of specific projects. Moreover, to account for more&lt;br /&gt;
accidental improvements through various small contributions, there could&lt;br /&gt;
also be periodic milestones. So as not to have these be mere snapshots, we&lt;br /&gt;
must presuppose, again, some kind of editorial team that decides what&lt;br /&gt;
accumulated materials should be committed to this new authoritative version.&lt;br /&gt;
&lt;br /&gt;
This latter approach could of course be combined with voting mechanisms and&lt;br /&gt;
the authority system, and thus perhaps automated to a degree over time. Add&lt;br /&gt;
discussion forums and similar to the platform for our corpus of editions,&lt;br /&gt;
and changes that are found to be potentially dubious (or potentially&lt;br /&gt;
benefical) might be marked for discussion and then in some way of vote&lt;br /&gt;
decided upon for the next milestone of an edition.&lt;br /&gt;
&lt;br /&gt;
While most of these thoughts may seem to presuppose a single repository and&lt;br /&gt;
platform in a single place, in fact I believe this can and will be extended&lt;br /&gt;
to much more distributed systems. We do, however, need a more or less&lt;br /&gt;
universal system of judgement of authority (though concurrent systems are&lt;br /&gt;
envisionable, and I leave the interesting effects this would have on&lt;br /&gt;
definitive versions on texts to the reader). At the same time, we will need&lt;br /&gt;
ways of ascertaining that texts with the information on contributors are&lt;br /&gt;
intact and integral. I propose both requirements (one on the user, the other&lt;br /&gt;
on the server side) might be met by systems of credentials based on somewhat&lt;br /&gt;
similar principles as the web of trust used for public-key cryptography.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Garces_Response&amp;diff=2018</id>
		<title>OSCE Garces Response</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Garces_Response&amp;diff=2018"/>
		<updated>2007-01-29T15:14:18Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Response to Neel Smith=&lt;br /&gt;
&lt;br /&gt;
It is easy to agree with Neel Smith's proposed method of moving from a conceptual model to the analysis of its distinctive features, and to then translate the resulting functional requirements into technical ones. Equally agreable is the simplicity of his proposal: reliance on well-established technical protocols - HTTP as the transport mechanism, XML for service requests and replies - in order to provide basic citational functionality - on different levels of granularity - and indexing services. The crux of his presentation, it seems to me, is, at least for our purposes, his advocacy of an agreement on &amp;amp;quot;the meaning of standard values&amp;amp;quot;, i.e. an ontology to serve as the shared matrix of our disciplinary discourse venturing into the digital era.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What I would like to do in my brief response is to briefly raise two of the more complex issues relating to his proposal by simply asking two sets of questions:&lt;br /&gt;
&lt;br /&gt;
===Time===&lt;br /&gt;
&lt;br /&gt;
Neel Smith recognises the necessity of publications - particularly scholarly publications - to adhere to both &amp;amp;quot;permanence and citability&amp;amp;quot; - the latter requiring the former. Yet, is it not one of the advantages of digital publications to they can be revised easily - corrected, even developed - without the wait for the edition to be out of print? And is this feature of impermanence not even more desirable in a collaborative framework, making the resource richer as it is moulded by subsequent generations of collaborators? How, then, do we refer to a work-in-progress in an authoritative way, how do we point at a moving object? &lt;br /&gt;
&lt;br /&gt;
===Essence===&lt;br /&gt;
&lt;br /&gt;
In addition to the complex web of entities created by the publication of manuscripts and printed works, what ~~new entities~~ is the digital medium supplementing and how should we deal with them?&lt;br /&gt;
&lt;br /&gt;
Beyond a mere transcription of a 'text', we can now have texts intimately interwoven with layers of markup - some encoding the same features, but differing, others adding further features, and still others combining any number of them. Should we distinguish these from the base text and/or from each other?&lt;br /&gt;
&lt;br /&gt;
We can have furthermore, a plurality of 'editions' coming out of the same repository, catering for different interpretive needs. Again: should we - and if so: how? - distinguish archival and presentational layers?&lt;br /&gt;
&lt;br /&gt;
Concerning his concrete proposal a further related question must be asked: does a hierarchical model cater for all these - old and new - categories?&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Garces_Response&amp;diff=2017</id>
		<title>OSCE Garces Response</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Garces_Response&amp;diff=2017"/>
		<updated>2007-01-29T15:13:33Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Response to Neel Smith=&lt;br /&gt;
&lt;br /&gt;
It is easy to agree with Neel Smith's proposed method of moving from a conceptual model to the analysis of its distinctive features, and to then translate the resulting functional requirements into technical ones. Equally agreable is the simplicity of his proposal: reliance on well-established technical protocols - HTTP as the transport mechanism, XML for service requests and replies - in order to provide basic citational functionality - on different levels of granularity - and indexing services. The crux of his presentation, it seems to me, is, at least for our purposes, his advocacy of an agreement on &amp;amp;quot;the meaning of standard values&amp;amp;quot;, i.e. an ontology to serve as the shared matrix of our disciplinary discourse venturing into the digital era.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What I would like to do in my brief response is to briefly raise two of the more complex issues relating to his proposal by simply asking two sets of questions:&lt;br /&gt;
&lt;br /&gt;
===Time===&lt;br /&gt;
&lt;br /&gt;
Neel Smith recognises the necessity of publications - particularly scholarly publications - to adhere to both &amp;amp;quot;permanence and citability&amp;amp;quot; - the latter requiring the former. Yet, is it not one of the advantages of digital publications to they can be revised easily - corrected, even developed - without the wait for the edition to be out of print? And is this feature of impermanence not even more desirable in a collaborative framework, making the resource richer as it is moulded by subsequent generations of collaborators? How, then, do we refer to a work-in-progress in an authoritative way, how do we point at a moving object? &lt;br /&gt;
&lt;br /&gt;
===Essence===&lt;br /&gt;
&lt;br /&gt;
In addition to the complex web of entities created by the publication of manuscripts and printed works, what ~~new entities~~ is the digital medium supplementing and how should we deal with them?&lt;br /&gt;
&lt;br /&gt;
Beyond a mere transcription of a 'text', we can now have texts intimately interwoven with layers of markup - some encoding the same features, but differing, others adding further features, and still others combining any number of them. Should we distinguish these from the base text and/or from each other?&lt;br /&gt;
&lt;br /&gt;
We can have furthermore, a plurality of 'editions' coming out of the same repository, catering for different interpretive needs. Again: should we - and if so: how? - distinguish archival and presentational layers?&lt;br /&gt;
&lt;br /&gt;
Concerning his concrete proposal a further related question must be asked: does a hierarchical model cater for all these - old and new - categories?&lt;br /&gt;
&lt;br /&gt;
[OSCE index&amp;gt;Main.osce] | [OSCE programme&amp;gt;programme]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Smith_Paper&amp;diff=2016</id>
		<title>OSCE Smith Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Smith_Paper&amp;diff=2016"/>
		<updated>2007-01-29T15:12:10Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Neel Smith, College of the Holy Cross:  OCSE position paper&lt;br /&gt;
&lt;br /&gt;
=An architecture for a distributed library incorporating open-source critical editions=&lt;br /&gt;
&lt;br /&gt;
In this position paper, I outline recent work at the Center for Hellenic Studies (Washington, D.C.) on a suite of protocols for creating a distributed library of interoperable scholarly resources.  In the opening section, I provide some background to our approach.  In the following section, I describe the service stack we are currently testing in collaboration with the Perseus project.  At our meeting in London, I hope to use my introductory time to illustrate the ideas presented here with a couple of concrete examples of applications.&lt;br /&gt;
&lt;br /&gt;
==Background: digital publications==&lt;br /&gt;
&lt;br /&gt;
Designing a technical architecture for scholarly publication is the last link in a logical chain.   We must first  define what we mean by “publication,”  identify its distinctive features, and translate those into functional requirements.  Functional requirements in turn can be expressed as technical requirements, and we can then choose an architecture that satisfies those requirements. Here I summarize very briefly views on those topics I have spelled out more fully in a paper entitled &amp;quot;[Digital publication for digital libraries&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/digitalpub].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
In the scholarly world, publication serves as the *permanent record of reference* for scholarly work.  In any medium therefore, scholarly publications must be designed for both *permanence* and *citability*.&lt;br /&gt;
&lt;br /&gt;
I would translate these defining characteristics of scholarly publication into at least three functional requirements:&lt;br /&gt;
&lt;br /&gt;
* it must be identically replicable&lt;br /&gt;
* it must be alienated from its author&lt;br /&gt;
* it must be citable in a fixed version&lt;br /&gt;
&lt;br /&gt;
We could rephrase these functional requirements by defining the form of  scholarly published works &lt;br /&gt;
as *works possessing an explicitly identified edition and explicitly identified citation scheme, &lt;br /&gt;
that can be irrevocably and identically replicated*. &lt;br /&gt;
&lt;br /&gt;
In  &amp;quot;[Digital publication for digital libraries&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/digitalpub],&amp;quot;  I develop arguments for a list of technical specifications that are necessary to satisfy this understanding of digital publication.  Rather than repeat those in detail here, I wish simply to underscore that a digital publication has to capture the *functionality* rather than the appearance of a scholarly work.  Beyond identifying appropriate ways to represent an open-source critical edition (e.g., recommended applications of TEI encoding to a document), then, we need to develop an infrastructure for working with critical editions in the broader context of a distirbuted and interoperating digital library.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Architecture:  digital libraries.==&lt;br /&gt;
The natural architecture permitting interactions among potentially distributed objects is a suite of network services following defined protocols.   In much of our work defining services for scholarly work, we have been influenced by the pioneering work of the Open Geospatial Consortium developing service protocols to enable distributed GIS operation.  (See the [http://www.opengeospatial.org/&amp;gt;Open Geospatial Consortium home page].)&lt;br /&gt;
&lt;br /&gt;
Our initial goal is to work with the most fundamental kinds of services to provide functionality that other services can in turn build on.  A structured “diff” service describing differences in the structure and content of two XML fragments, for example, might be layered on top of an elementary retrieval service that abstracts the problem of retrieving text passages from canonical references.  The structured diff service in turn might serve as a base for a higher-order service statistically summarizing or analyzing differences in two pieces of text.&lt;br /&gt;
&lt;br /&gt;
Part of the attraction of the service model is its technical simplicity, since protocols for scholarly services can be layered on top of well established technical protocols:  HTTP as the transport mechanism, XML for service requests and replies.  Part of the attraction, too, is that this hierarchical model corresponds to a scholarly ideal:  it simultaneously allows for high-level abstraction of complexity, while ensuring the transparency of supporting or underlying functionality.&lt;br /&gt;
&lt;br /&gt;
===Fundamental services===&lt;br /&gt;
While we can easily imagine interesting, complex services we might like to have as easily available as an internet access point, I would argue that the most fundamental services for scholarly publication are those supporting the *simple identification and retrieval of fundamental objects with stable, location-independent references* --  services, in other words, that directly support our view of publication as a permanent and citable record.&lt;br /&gt;
&lt;br /&gt;
For many kinds of material we refer to, citation is comparatively straightforward.  We often work with collections of discrete objects cited simply by a unique identifier:  an “author-year” label to identify one entry in a bibliographic list, a museum inventory number to identify a specific archaeological artifact, a catalog number to identify a listing in a collection like Erbse's ~~scholia vetera~~ of the ~~Iliad~~.  Even when we refer to specific properties of an object (the author property of a bibliographic entry, the die axis of a coin, Erbse's source attribution of a scholiastic comment ...), we continue to cite the object as a discrete entity.  One fundamental service we need then is a service for identification and retrieval of discrete entities in a collection.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Texts present a different challenge.  In the first place, the entities we refer to with textual citations are not simple discrete objects, as librarians attempting to catalog texts are aware.  The Functional Requirements for Bibliographic Records (FRBR) describes a hierarchical model for texts, from the notional work, to the expression of that work in some version, to the manifestation of a version in some concrete form, to an individual item.  (A good introduction to FRBR is the U.S. Library of Congress' page [What is FRBR?&amp;gt;http://www.loc.gov/cds/FRBR.html]).   Classicists and biblical scholars have long implied a similar but not identical abstraction of notional work from particular versions in their use of version-independent, canonical reference systems.  One difference is that classicists' citation practice normally associates texts in groups or corpora that may or may not appear in documentary components of FRBR;  another is that FRBR's “manifestation” distinguishes different reproductions of a given expression (such as identical printings of a given edition) that may not be significant for scholarly citation.&lt;br /&gt;
&lt;br /&gt;
FRBR, of course, as a cataloging model does not address citation, and a second problem texts present is that we must allow for continuous citation.  Canonical citation schemes are often hierarchical (e.g., book/chapter/section of a prose work);  our service must support citation to this level of granularity, and beyond that should allow citation of subsections of text for a specific version.&lt;br /&gt;
&lt;br /&gt;
A second fundamental service, then, is a service for identifying texts and retrieving textual references in accordance with the semantics of citation practice traditional in fields like classics or biblical studies.&lt;br /&gt;
&lt;br /&gt;
To make these two methods of identifying and retrieving citable objects useful together in a distributed library, we can define a third basic service:  indexing information to either form of citation.  An index of personal names in a text, for example, might literally index strings with forms of names to a text reference, but it might also, more usefully for many purposes, index identifiers in a prosopographic collection to textual references.  The identifier could both disambiguate superficial strings of characters in the text, and provide a key to the prosopographic collection.&lt;br /&gt;
&lt;br /&gt;
At CHS, we have drafted standards for these three services, and have implemented each as a java servlet.  For more detailed information, see this page on &amp;quot;[Fundamental services for scholarly reference&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/tic].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
===Ancillary services and standards===&lt;br /&gt;
As the abstract in the conference program indicates, to create an effectively interoperable network of resources, we need to agree not only on service protocols, but on the meaning of standard *values* that can be used in the framework of the protocol.  Having an agreed-upon system for finding what texts a service offers, discovering their citation schemes, and requesting sections of the text in that scheme will not help us to interoperate if we can't agree on how to identify Herodotus' ~~Histories~~, or an inscription from Aphrodisias.  To support the three fundamental services previously described, we have also developed ancillary services and standards to address these issues.&lt;br /&gt;
&lt;br /&gt;
Texts cited by canonical reference are a comparatively stable set of resources.  Technically, we need a simple service that resolves some kind of query string to standard identifiers, comparable to the [uBio service&amp;gt;http://www.ubio.org/] that scientists can use to automatically search for standard taxonomic identifiers for species.  In contrast to uBio, however, our service must be able to support a hierarchical scheme of identifiers so that we can refer to texts at the level of works, versions (such as a specific translation or edition) or individual exemplars.  To fill this technical gap, we have developed a hierarchical Registry service (see [fuller information with links&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/registry]).&lt;br /&gt;
&lt;br /&gt;
Institutionally, we need to find appropriate custodians to manage these authority lists for given domains.  CHS has taken responsibility for maintaining a Registry service for identifiers of Greek literary works;  the Aphrodisias project would be a logical choice to assume responsibility for assigning identifiers to inscriptions from Aphrodisias. (Whether choosing to administer a service directly, or to take editorial responsibility for material served elsewhere is not important.)  The internet's DNS system offers a good analogy to what we might ultimately develop:  the equivalent of a root server or servers is being run at CHS as a Registry of authoritative registries for given domains or corpora;  individual registries in turn may be disseminated so that an actual application might consult a local copy of the registry information to resolve a reference.&lt;br /&gt;
&lt;br /&gt;
In contrast to canonically cited texts, collections of discrete objects may be created so freely that a comparable system of registries would be unrealistically burdensome.  What authority should I register my collection with if I, as an individual scholar, create a database of results of my work, and want to expose it to the world using a Collections Service?  I am the only authority responsible for defining the unique identifiers in my collection, so I need a namespace of my own within which  I can freely manage my collection's IDs.   This is very similar to the problem that authors of XML document structures face, and we are adopting a very similar solution.  Just as XML namespaces utilize the same mechanism used for URLs to provide unique namespaces to anyone creating a new XML structure, so we use that structure to provide unique *data namespaces*.  At CHS, a Collection of data about digital images is given unique identifiers from the data namespace chs.harvard.edu/datans/images;  the Perseus project could, for example, use a data namespace like perseus.tufts.edu/images/namespaces, and if both collections have an image with the same ID, they can be correctly resolved.&lt;br /&gt;
&lt;br /&gt;
We need to consider one further important difference between reference by unique ID and the kind of canonical reference we use for texts.  Unique IDs can be represented by simple strings of characters;  the semantics of a reference within a hierarchical citation scheme to a text in a FRBR-like hierarchy cannnot.  We have therefore proposed a syntax for a notation scheme with explicit semantics, following the requirements of the IETF's URN system.  These Canonical Text Services URNs make it possible to reduce the complexity of a reference like “First occurrence of the string 'cano' in line 1 of book 1 of Vergil's ~~Aeneid~~” to a flat string that can then be used by any application that understands CTS-URNs.   (For more information and links, see [CTS URNs&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/cts-urn].)&lt;br /&gt;
&lt;br /&gt;
For an overview of CHS work on these topics, see &amp;quot;[Ancillary services suppporting scholarly reference&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/ancillary].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Composite objects and the TICI stack==&lt;br /&gt;
&lt;br /&gt;
An extraordinary range of scholarly citation can be handled through the simple mechanisms of Collection Services, and Canonical Text Services, while indexing using Reference Index Services enables a complex web of associations to be built on top these citation mechanisms.  We want to incorporate spatial manipulation into our stack of services, but for the present are very happy to let others, including the Open Geospatial Consortium, take the lead in this area.  In the summer of 2006, we began to build the first examples of compound objects, adding to the simple identification and retrieval of Collections and Canonical Texts, more specialized manipulation for binary images.&lt;br /&gt;
&lt;br /&gt;
Image Procesing Services perform operations such as scaling an image, selecting a subsection of it, or altering its brightness and contrast.   (See “[Image Processing Services&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/images].”)  By itself, an image processing service is of little use;  it really becomes valuable only in association with some other information.  Collections services already provide a ready means of working with metadata about each image;  Reference Index Services make it possible to associate binary image identifiers with objects in other collections, or with texts.  An index of, say, page images to CTS URNs could define the relation between a text and images of pages in a specific edition;  a CTS instance could provide access to an XML text, while a related Image Processing Service could work with the image data.&lt;br /&gt;
&lt;br /&gt;
At CHS, the result in the fall of 2006 is a stack of four principle interrelated services: Texts, Indexes, Collections and Images, that together provide a sufficient infrastructure for a surprising range of scholarly publications.  We have been closely collaborating with the Perseus project over the last several months to test these services, and build end-user applications on top of them.  Text browsing and reading applications work simultaneously with CHS implementations of  Canonical Text Services in Washington, D.C., at Holy Cross College in Worcester, Massachusetts, and at Furman University in Greenville, S.C., as well as with an independent implementation using completely different back-end technology at the Perseus project at Tufts University.&lt;br /&gt;
&lt;br /&gt;
For more information, see &amp;quot;[An overview of services for composite objects&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/composites]&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Current work: Scenarios==&lt;br /&gt;
&lt;br /&gt;
Even as small a set of services as the TICI stack allows for very complex networks of information, and it is becoming increasingly apparent that we need to plan now for a further dimension to our work:  a means of making machine-parseable statements about the relations among these resources.&lt;br /&gt;
&lt;br /&gt;
In September, 2006, we have begun work on a simple XML schema for inventorying and describing the relations among stable, citable resources anywhere in the TICI stack.  These inventories, which we are provisionally calling “Scenarios,” are in a sense a digital extension of bibliography:  they add to the  static lists of print bibliography a specification of how resources relate to each other.  Scenarios are declarative or descriptive, not functional:  applications may use the information in a Scenario as they choose, but as a print bibliography ideally catalogs resources needed to read a print publication, Scenarios catalog resources needed to read a digital publication. &lt;br /&gt;
&lt;br /&gt;
A simple text reader can, for example, list a single resource with a CTS URN referring to a passage in a text;  in  this instance, the Scenario amounts to a simple bookmark.  But a text reader that filters the text with information from an index might overlay links on the words of a Latin text to a morphological index.  Its Scenario can specify how the text resource and index relate.  An even more sophisticated reader might in turn associate the lemma with other morphological data;  this could appear as a Collection in the application's Scenario.&lt;br /&gt;
&lt;br /&gt;
Our work on Scenarios is very preliminary at this point, but illustrates a number of themes that are relevant to the broader topic of this conference:  the leverage we can obtain from building on openly available resources, the ways very simple, even minimal resources can in their complex interrelations lead to  sophisticated scholarly productions, and the ease of interoperation that is possible when we can work with common protocols and standards.&lt;br /&gt;
&lt;br /&gt;
'''More information'''&lt;br /&gt;
* Documentation of technical work at CHS, ~~[Digital Incunabula&amp;gt;http://chs75.harvard.edu/projects/diginc/home]~~&lt;br /&gt;
* &amp;quot;Update blog&amp;quot; with syndicated feeds for [announcements and updates&amp;gt;https://chs76.harvard.edu/weblog/neel/] from the CHS Technical Working Group&lt;br /&gt;
&lt;br /&gt;
'''License'''&lt;br /&gt;
(c) Neel Smith 2006 &lt;br /&gt;
Distributed under the [Creative Commons Attribution-Share-alike license v. 2.5&amp;gt;http://creativecommons.org/licenses/by-sa/2.5/]&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2015</id>
		<title>OSCE Choudhury Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2015"/>
		<updated>2007-01-29T15:10:06Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Position Paper on Licensing/Legal Matters=&lt;br /&gt;
&lt;br /&gt;
==Sayeed Choudhury==&lt;br /&gt;
==Library Digital Programs, Sheridan Libraries, Johns Hopkins University==&lt;br /&gt;
&lt;br /&gt;
In their position paper, Stuart Dunn and Tobias Blanke raise an interesting and relevant question regarding digital texts: &amp;amp;quot;How can such texts fit in existing library and information (infra)structures?  Will these need to be rethought?&amp;amp;quot;  Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, has stated that libraries are built upon three pillars - collections, services and infrastructure.  Arguably, collections have represented the most important element in the print world, with services and infrastructure supporting the collections.  In the digital world, these elements are becoming blurred. It may be appropriate to assert that the ~~principles~~ by which libraries (and archives and museums) have operated remain valid, but the ~~practices~~ need to be reconsidered.  Not surprisingly, libraries are facing new challenges, and opportunities, with the development of infrastructure to support digital collections and services.&lt;br /&gt;
&lt;br /&gt;
At the heart of this infrastructure development effort is the repository.  There are many defintions for repository, but for the purpose of this discussion, the most useful one is offered by Cliff Lynch who stated a &amp;amp;quot;repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.&amp;amp;quot; (http://www.arl.org/newsltr/226/ir.html)&lt;br /&gt;
&lt;br /&gt;
The emphasis on both services and preservation is particularly noteworthy.  From a preservation perspective, it is important to note that both open standards and open source augment our abiilty to support digital preservation. (http://www.ils.unc.edu/callee/oss_preservation.htm).  From the service perspective, other position papers have raised several interesting (potential) needs or uses for digital texts.  One theme becomes patently clear in reading these papers: scholars will not only need access to view digital texts, but will also need the ability to download (en masse), manipulate, transform and repurpose digital texts.  The collaborative editing envsiaged by Ross Scaife and Dot Porter would be difficult without fully open access to digital texts.  The type of markup described by Gabriel Bodard almost certainly requires complete access to digital texts. Greg Crane has often discussed the possibilities for machine translation, language modeling and document analysis with large corpora of digitized texts (http://www.dlib.org/dlib/march06/crane/03crane.html).&lt;br /&gt;
&lt;br /&gt;
These ideas raise very important questions.  Are the libraries involved with Google Book Search (http://books.google.com/) providing only part of the solution?  Even more disconcerting is the idea that these libraries, though well-intentioned, may even inhibit the ability of scholars to work with digital texts in a manner that supports new scholarship.  Will Google work with the scholarly community to build tools and services, or only consider commercial opportunities?  Understandably, libraries, including those working with the Open Content Alliance (http://www.opencontentalliance.org/), consider whether to digitize books already available through Google Book Search in an effort to avoid duplicative efforts.  However, it's important to consider both the collections and services aspects.&lt;br /&gt;
&lt;br /&gt;
Repository development obviously entails a high degree of technology work, but repositories, particularly institutional respositories should respond to a policy and legal framework.  From a technological perspective, it is optimal to develop an unconstrained, open system that can be constrained or modified according to local policy or legal frameworks; it is difficult, if not impossible, to move in the other direction.  The e-Science community has noted that it is important to consider openness even in terms of the data.  The SPARC Open Data (http://www.arl.org/sparc/opendata/) states: &amp;amp;quot;Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.&amp;amp;quot; That is, even the most open system may not support preservation or scholarly needs if the data is constrained through proprietary formats or legal restrictions. &lt;br /&gt;
&lt;br /&gt;
With these observation in mind, it seems obvious that the scholarly community should adopt, even push, for completely open standards and open access for digital texts.  Such openness offers the greatest potential for the type of digital environment envisaged through the other position papers.&lt;br /&gt;
&lt;br /&gt;
However, it is important to note that the inter-relationships between technology, policy and organizational roles that has been defined in the print world is also becoming blurred.  When a monograph was published, there was a reasonable degree of understanding regarding how a scholar would send this monograph to a publisher, which would seek revenue through sales, but also agree that libraries could offer the book without cost - under certain conditions - to the scholarly community.  With digital publications, this process and role definition is being established, sometimes with controversy.  The US National Endowment for the Humaniites has announced new guidelines for their Scholarly Editions Grants (http://www.neh.gov/grants/guidelines/editions.html) that states a preference for projects that offer digitized works online throgh open access.  This announcement has raised some concerns among scholars and University Presses regarding business models and rights clearance (http://insidehighered.com/news/2006/09/18/documents).  &lt;br /&gt;
&lt;br /&gt;
Finally, what implications arise from open data in terms of the reward structure for scholars?  Will freely available online digital texts be viewed with the same level of rigor or reputation as those &amp;amp;quot;validated&amp;amp;quot; through publishers, peer review, or other means for assessment?  Libraries are eager to serve scholarly needs in the digital age, ideally with an open policy and legal framework.  It is important, however, to address the corresponding implications of such arrangements in terms of organization roles, business models, and reward structures.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2014</id>
		<title>OSCE Choudhury Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2014"/>
		<updated>2007-01-29T15:09:41Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Position Paper on Licensing/Legal Matters=&lt;br /&gt;
&lt;br /&gt;
'''Sayeed Choudhury&lt;br /&gt;
Library Digital Programs, Sheridan Libraries, Johns Hopkins University'''&lt;br /&gt;
&lt;br /&gt;
In their position paper, Stuart Dunn and Tobias Blanke raise an interesting and relevant question regarding digital texts: &amp;amp;quot;How can such texts fit in existing library and information (infra)structures?  Will these need to be rethought?&amp;amp;quot;  Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, has stated that libraries are built upon three pillars - collections, services and infrastructure.  Arguably, collections have represented the most important element in the print world, with services and infrastructure supporting the collections.  In the digital world, these elements are becoming blurred. It may be appropriate to assert that the ~~principles~~ by which libraries (and archives and museums) have operated remain valid, but the ~~practices~~ need to be reconsidered.  Not surprisingly, libraries are facing new challenges, and opportunities, with the development of infrastructure to support digital collections and services.&lt;br /&gt;
&lt;br /&gt;
At the heart of this infrastructure development effort is the repository.  There are many defintions for repository, but for the purpose of this discussion, the most useful one is offered by Cliff Lynch who stated a &amp;amp;quot;repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.&amp;amp;quot; (http://www.arl.org/newsltr/226/ir.html)&lt;br /&gt;
&lt;br /&gt;
The emphasis on both services and preservation is particularly noteworthy.  From a preservation perspective, it is important to note that both open standards and open source augment our abiilty to support digital preservation. (http://www.ils.unc.edu/callee/oss_preservation.htm).  From the service perspective, other position papers have raised several interesting (potential) needs or uses for digital texts.  One theme becomes patently clear in reading these papers: scholars will not only need access to view digital texts, but will also need the ability to download (en masse), manipulate, transform and repurpose digital texts.  The collaborative editing envsiaged by Ross Scaife and Dot Porter would be difficult without fully open access to digital texts.  The type of markup described by Gabriel Bodard almost certainly requires complete access to digital texts. Greg Crane has often discussed the possibilities for machine translation, language modeling and document analysis with large corpora of digitized texts (http://www.dlib.org/dlib/march06/crane/03crane.html).&lt;br /&gt;
&lt;br /&gt;
These ideas raise very important questions.  Are the libraries involved with Google Book Search (http://books.google.com/) providing only part of the solution?  Even more disconcerting is the idea that these libraries, though well-intentioned, may even inhibit the ability of scholars to work with digital texts in a manner that supports new scholarship.  Will Google work with the scholarly community to build tools and services, or only consider commercial opportunities?  Understandably, libraries, including those working with the Open Content Alliance (http://www.opencontentalliance.org/), consider whether to digitize books already available through Google Book Search in an effort to avoid duplicative efforts.  However, it's important to consider both the collections and services aspects.&lt;br /&gt;
&lt;br /&gt;
Repository development obviously entails a high degree of technology work, but repositories, particularly institutional respositories should respond to a policy and legal framework.  From a technological perspective, it is optimal to develop an unconstrained, open system that can be constrained or modified according to local policy or legal frameworks; it is difficult, if not impossible, to move in the other direction.  The e-Science community has noted that it is important to consider openness even in terms of the data.  The SPARC Open Data (http://www.arl.org/sparc/opendata/) states: &amp;amp;quot;Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.&amp;amp;quot; That is, even the most open system may not support preservation or scholarly needs if the data is constrained through proprietary formats or legal restrictions. &lt;br /&gt;
&lt;br /&gt;
With these observation in mind, it seems obvious that the scholarly community should adopt, even push, for completely open standards and open access for digital texts.  Such openness offers the greatest potential for the type of digital environment envisaged through the other position papers.&lt;br /&gt;
&lt;br /&gt;
However, it is important to note that the inter-relationships between technology, policy and organizational roles that has been defined in the print world is also becoming blurred.  When a monograph was published, there was a reasonable degree of understanding regarding how a scholar would send this monograph to a publisher, which would seek revenue through sales, but also agree that libraries could offer the book without cost - under certain conditions - to the scholarly community.  With digital publications, this process and role definition is being established, sometimes with controversy.  The US National Endowment for the Humaniites has announced new guidelines for their Scholarly Editions Grants (http://www.neh.gov/grants/guidelines/editions.html) that states a preference for projects that offer digitized works online throgh open access.  This announcement has raised some concerns among scholars and University Presses regarding business models and rights clearance (http://insidehighered.com/news/2006/09/18/documents).  &lt;br /&gt;
&lt;br /&gt;
Finally, what implications arise from open data in terms of the reward structure for scholars?  Will freely available online digital texts be viewed with the same level of rigor or reputation as those &amp;amp;quot;validated&amp;amp;quot; through publishers, peer review, or other means for assessment?  Libraries are eager to serve scholarly needs in the digital age, ideally with an open policy and legal framework.  It is important, however, to address the corresponding implications of such arrangements in terms of organization roles, business models, and reward structures.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2013</id>
		<title>OSCE Choudhury Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2013"/>
		<updated>2007-01-29T15:08:50Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Position Paper on Licensing/Legal Matters===&lt;br /&gt;
&lt;br /&gt;
==Sayeed Choudhury==&lt;br /&gt;
&lt;br /&gt;
=Library Digital Programs, Sheridan Libraries, Johns Hopkins University=&lt;br /&gt;
&lt;br /&gt;
In their position paper, Stuart Dunn and Tobias Blanke raise an interesting and relevant question regarding digital texts: &amp;amp;quot;How can such texts fit in existing library and information (infra)structures?  Will these need to be rethought?&amp;amp;quot;  Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, has stated that libraries are built upon three pillars - collections, services and infrastructure.  Arguably, collections have represented the most important element in the print world, with services and infrastructure supporting the collections.  In the digital world, these elements are becoming blurred. It may be appropriate to assert that the ~~principles~~ by which libraries (and archives and museums) have operated remain valid, but the ~~practices~~ need to be reconsidered.  Not surprisingly, libraries are facing new challenges, and opportunities, with the development of infrastructure to support digital collections and services.&lt;br /&gt;
&lt;br /&gt;
At the heart of this infrastructure development effort is the repository.  There are many defintions for repository, but for the purpose of this discussion, the most useful one is offered by Cliff Lynch who stated a &amp;amp;quot;repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.&amp;amp;quot; (http://www.arl.org/newsltr/226/ir.html)&lt;br /&gt;
&lt;br /&gt;
The emphasis on both services and preservation is particularly noteworthy.  From a preservation perspective, it is important to note that both open standards and open source augment our abiilty to support digital preservation. (http://www.ils.unc.edu/callee/oss_preservation.htm).  From the service perspective, other position papers have raised several interesting (potential) needs or uses for digital texts.  One theme becomes patently clear in reading these papers: scholars will not only need access to view digital texts, but will also need the ability to download (en masse), manipulate, transform and repurpose digital texts.  The collaborative editing envsiaged by Ross Scaife and Dot Porter would be difficult without fully open access to digital texts.  The type of markup described by Gabriel Bodard almost certainly requires complete access to digital texts. Greg Crane has often discussed the possibilities for machine translation, language modeling and document analysis with large corpora of digitized texts (http://www.dlib.org/dlib/march06/crane/03crane.html).&lt;br /&gt;
&lt;br /&gt;
These ideas raise very important questions.  Are the libraries involved with Google Book Search (http://books.google.com/) providing only part of the solution?  Even more disconcerting is the idea that these libraries, though well-intentioned, may even inhibit the ability of scholars to work with digital texts in a manner that supports new scholarship.  Will Google work with the scholarly community to build tools and services, or only consider commercial opportunities?  Understandably, libraries, including those working with the Open Content Alliance (http://www.opencontentalliance.org/), consider whether to digitize books already available through Google Book Search in an effort to avoid duplicative efforts.  However, it's important to consider both the collections and services aspects.&lt;br /&gt;
&lt;br /&gt;
Repository development obviously entails a high degree of technology work, but repositories, particularly institutional respositories should respond to a policy and legal framework.  From a technological perspective, it is optimal to develop an unconstrained, open system that can be constrained or modified according to local policy or legal frameworks; it is difficult, if not impossible, to move in the other direction.  The e-Science community has noted that it is important to consider openness even in terms of the data.  The SPARC Open Data (http://www.arl.org/sparc/opendata/) states: &amp;amp;quot;Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.&amp;amp;quot; That is, even the most open system may not support preservation or scholarly needs if the data is constrained through proprietary formats or legal restrictions. &lt;br /&gt;
&lt;br /&gt;
With these observation in mind, it seems obvious that the scholarly community should adopt, even push, for completely open standards and open access for digital texts.  Such openness offers the greatest potential for the type of digital environment envisaged through the other position papers.&lt;br /&gt;
&lt;br /&gt;
However, it is important to note that the inter-relationships between technology, policy and organizational roles that has been defined in the print world is also becoming blurred.  When a monograph was published, there was a reasonable degree of understanding regarding how a scholar would send this monograph to a publisher, which would seek revenue through sales, but also agree that libraries could offer the book without cost - under certain conditions - to the scholarly community.  With digital publications, this process and role definition is being established, sometimes with controversy.  The US National Endowment for the Humaniites has announced new guidelines for their Scholarly Editions Grants (http://www.neh.gov/grants/guidelines/editions.html) that states a preference for projects that offer digitized works online throgh open access.  This announcement has raised some concerns among scholars and University Presses regarding business models and rights clearance (http://insidehighered.com/news/2006/09/18/documents).  &lt;br /&gt;
&lt;br /&gt;
Finally, what implications arise from open data in terms of the reward structure for scholars?  Will freely available online digital texts be viewed with the same level of rigor or reputation as those &amp;amp;quot;validated&amp;amp;quot; through publishers, peer review, or other means for assessment?  Libraries are eager to serve scholarly needs in the digital age, ideally with an open policy and legal framework.  It is important, however, to address the corresponding implications of such arrangements in terms of organization roles, business models, and reward structures.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Scaife_Paper&amp;diff=2012</id>
		<title>OSCE Scaife Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Scaife_Paper&amp;diff=2012"/>
		<updated>2007-01-29T15:06:25Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Open Source Critical Editions&lt;br /&gt;
Workshop at Kings College London&lt;br /&gt;
September 22, 2006&lt;br /&gt;
&lt;br /&gt;
===Tools for Collaborative Editing (some thoughts by Ross Scaife and Dot Porter)===&lt;br /&gt;
&lt;br /&gt;
==Introduction to the Concept==&lt;br /&gt;
&lt;br /&gt;
The Wikipedia entry on &amp;quot;collaborative editor&amp;quot; defines the term quite simply: &amp;quot;A collaborative editor allows simultaneous editing of the same document or video by different participants using different computers.&amp;quot; ([http://en.wikipedia.org/wiki/Collaborative_real-time_editor]) Electronic editions have become steadily more popular over the past decade. Libraries and museums have led the charge, followed by increasing numbers of scholars, both individuals and groups, who form the basis of an active community of electronic editors. As this community grows, so does the need for tools suitable to the types of editions that people and institutions are actually creating. Generally, there are three specific needs of humanists involved in collaborative editing projects. Scholars need to be able to build editions encompassing text, images, and annotations, the latter usually using the Extensible Markup Language (XML), the de facto standard for encoding electronic editions in the humanities, and the mode of expression of the Text Encoding Initiative (TEI). Second, software needs to have access control and version management systems that will allow several different editors to collaborate on an edition with different levels of access and without fear that one editor might inadvertently overwrite another's work. Finally, accessibility. Software needs to be designed in such a way that it will encourage collaborative work among individuals who are geographically dispersed, and may encourage electronic editing by those many accomplished humanities scholars who are familiar with basic computer tools (word processors, web browsers, etc.) but who may be put off by regular XML editing software.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Good collaborative editing software will foster the creation of scholarly works by forging partnerships between individuals and institutions, enabling them to share resources, both physical resources (in the form of texts and images) and intellectual (in the form of subject knowledge and editing experience). Software released under an Open-Source license will especially promote cooperation among smaller institutions that might not have the resources to purchase expensive software. Such software could even become a significant resource not only for scholars, but also for teachers and students, potentially encouraging collaborative projects between schools around the world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Maintaining an edition with multiple editors contributing to the same document requires a significant amount of work. Editors must be careful not to overwrite changes made by others, for example by coordinating the process so that no two editors work on a file at the same time. Word processing software such as Microsoft Word includes a tool for &amp;quot;Tracking Changes&amp;quot;, which enables users to work collaboratively; however, though the resulting files are suitable for printing, they are not encoded in a standard acceptable for electronic editions. With the increasing scale and scope of electronic editions, the need for a collaborative editing process rooted in accepted standards, and software to support this process, is even stronger.&lt;br /&gt;
&lt;br /&gt;
==How can collaborative editing software help classicists? Give a few real-life examples==&lt;br /&gt;
&lt;br /&gt;
* This page was initially produced and edited by two individuals in Writely ([http://www.writely.com/]) (Cnet review ([http://reviews.cnet.com/4520-9239_7-6627472.html?tag=cnetfd.ld3]) compares other AJAX'ed word processors)&lt;br /&gt;
** readily editable by one or more people, like a wiki. Unlike a wiki, Writely feels like regular word processor, a pared-down MS Word.&lt;br /&gt;
** numerous output formats (html rtf doc odf pdf) to suit a variety of publication/access needs (both print and online publication)&lt;br /&gt;
** provides a view of the history of a document's revisions over time, which helps to show the relative contributions of collaborators over time.&lt;br /&gt;
** documents can be shown to select viewers or made public&lt;br /&gt;
* Similar to Writely, LiveDocuments promises synchronization of Microsoft Office Documents, allowing for collaborative editing/writing in a context familiar to most scholars&lt;br /&gt;
** There is no server requirement, editors need not log on to a central server&lt;br /&gt;
** &amp;quot;LiveDocuments promises Office collaboration without a server&amp;quot; [http://arstechnica.com/news.ars/post/20060908-7701.html]&lt;br /&gt;
* Classics context: note the ideas about a communal text, a personal text, and the text of a given MS presented 11 years ago by the Vergil Project (never implemented, unfortunately)&lt;br /&gt;
** Communal text: &amp;quot;users will participate in the &amp;quot;establishment&amp;quot; of a text that will never reach final form. Here is how it will work. All the texts at this site include a critical apparatus of variant readings, conjectural emendations, and so forth. Because this information is presented on-line, it is possible for interested users to select the readings that they prefer -- to vote, in effect, for the reading that they think should appear in a given passage. These votes can then be tabulated, and the reading receiving the most votes will appear in the Communal Text. Those who consult this version of the text must therefore do so on the understanding that it does not represent the final judgment of any single editorial expert, but the aggregate opinion of the community of users of the site, and that it is subject to change at any moment.&amp;quot;&lt;br /&gt;
** Personal text: &amp;quot;Through this menu item users can record their preferences and use them to establish the text that they habitually consult. Of course, it will be possible to use this feature in other ways as well. Someone who wanted to use this site but felt the need of a little extra editorial authority might simply enter into his or her text whatever readings are printed by his or her favorite editor. On the other hand, a group of scholars interested in constructing a text for some specific purpose might use this resource collaboratively. So might a class on Vergil or on textual criticism. No doubt other applications will be thought of as well.&lt;br /&gt;
** Text of a particular manuscript: &amp;quot;Through this feature it will be possible to see the text as it appears in any of the manuscripts whose readings have been entered into the database. If one were interested in the Palatinus, for example, a diplomatic transcript of that manuscript would (with secondary readings and corrections available via hypertext links). In some cases images are available as well, and we hope eventually to provide facsimiles of all the mss in the database.&amp;quot;&lt;br /&gt;
* Suda On Line is another oldie-but-goodie with strengths and weaknesses ([http://www.stoa.org/sol/])&lt;br /&gt;
* Virtual Humanities Lab at Brown University has been developing a system for collaborative annotation of literary texts. The guidelines for annotation (published here: [http://golf.services.brown.edu/projects/VHL/help/guidelines_annot.pdf]) are simple, the software is accessed through a regular web browser.&lt;br /&gt;
* Compare the proposed Homer Multitext:&lt;br /&gt;
&lt;br /&gt;
{quote}&amp;quot;An ideal edition of Homer would encompass the full historical reality of the Homeric textual tradition as it evolved through time, from the pre-Classical era well into the medieval. Our attempt to create such an edition is already underway. Instead of choosing between variants and plus verses in an attempt to recover the ipsissima verba of Homer, we propose to include them in a multitext edition that embraces the fluidity of the textual traditions of the Iliad and Odyssey. The ideal format for this multitext edition of Homer is not a traditional printed text but an electronic, web-based edition. Unlimited in its ability to handle complex sets of variants, an electronic multitext offers critical readers of Homer the opportunity to consider many historical Iliads and Odysseys from the standpoint of many different sources of transmission, and so also allows the user to recover both a more accurate and more accessible picture of the fluidity of the tradition in the earliest stages of textuality.&amp;quot;&lt;br /&gt;
{quote}&lt;br /&gt;
&lt;br /&gt;
* EDUCE: Ideally, this project needs a strategy for imposing editorial control over the resulting documents in a process that involves establishing the texts, encoding them with standard TEI-XML markup using newly available Open Source software tools, and then publishing the transcripts side-by-side with their associated images following Open Access protocols.&lt;br /&gt;
&lt;br /&gt;
==Collaborative Editors==&lt;br /&gt;
&lt;br /&gt;
Different types of collaborative editors (see Appendix for list of editors)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* synchronous vs. asynchronous. Synchronous editors work in &amp;quot;real-time&amp;quot;. Changes made by one editor are immediately visible to other editors. Asynchronous editors (including Writely, MediaWiki, and version management systems) synchronize working versions either automatically after-the-fact, or (in the case of version management systems) require users to update changes manually.&lt;br /&gt;
* text-only editing vs. image-based editing. Text-only editors, including most XML and word processing programs, focus solely on the editing of the text. Image-based editors (including the EPPT and the University of Victoria Image Markup Tool) provide simple methods for either incorporating images into editions, or building textual annotations onto images.&lt;br /&gt;
* XML editors vs. text-only editors. XML editors, for example oXygen or XMetal, provide support for building XML annotations into texts. The better editors include various other XML support: XPath searching, XSLT development for translation, DTD or schema development for validation.&lt;br /&gt;
* Problems with collaborative editing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Version control: Wiki, Subversion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Administration for collaborative editing has two main issues: version management and access control. Version management deals with the problems of simultaneous editing. When a user makes changes to a document, we must be prepared to combine those changes with other changes by editors working on the same document. Furthermore, it may be necessary to obtain an earlier version of a document for reference, or even to reverse part of a series of changes while leaving other edits in place. A version management system tracks the branching revisions of a document as it is updated by a number of individuals.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Access control sets limits on the documents an editor can modify (coarse-grained access control), and the types of changes he or she can make to those documents (fine-grained access control). Such a system allows a project administrator to delegate editing responsibilities in a controlled manner. Consider, for example, two scholars with different specialized knowledge who are collaborating on an editing project. One scholar studies language, and is responsible for editing the linguistic aspects of a particular text. Another scholar specializes in manuscript studies, and is responsible for describing aspects of the text within the context of a specific manuscript - the scribal handwriting, condition of the manuscript, etc. The document curator, then, can grant the textual editor access to update sets of markup for describing the language of the text, but not for describing information such as scribal handwriting and condition of the manuscript. Likewise, the manuscript scholar would have access to modify sets of markup for describing the manuscript, but not the language of the text. On the other hand, neither of these scholars would be able to modify administrative markup such as the document's headers. Fine-grained access control allows the administrator to enable both scholars to work simultaneously within their domains of expertise without compromising the integrity and control of the editorial process. The document curator or project coordinator creates a set of rules that specify the &amp;quot;shape&amp;quot; of modifications particular users are allowed to make. Then, when a user attempts to modify part of the document, those access control rules are compared to that part of a document; much like a key in a lock, if the &amp;quot;shape&amp;quot; of the rule matches the document, the lock opens and the change is permitted to go through.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Source code management (SCM) systems such as CVS and SVN have shown their ability to assist in collaborative maintenance of computer source code. SCMs allow programmers to maintain parallel branches of their source code, merging sets of changes from one branch to another. However, SCMs take a line-oriented approach to revision management; while this is ideal for computer source code, is not well suited to XML documents, where modifications usually follow the document's hierarchical structure. Furthermore, merging conflicting changes can be a complex process, and often must be dealt with before a user can commit their changes to a central repository. Finally, SCM systems support primarily coarse-grained access control, so that permission to modify part of a document implies permission to modify the entire document; fine-grained access control affords much more flexibility in organizing a collaborative editing project.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Editing needs are also not fully served by content management systems such as the open-source MediaWiki. This system, which underlies the highly successful Wikipedia collaborative encyclopedia project, has demonstrated its ability to handle collaborative editing at a massive scale. Support for access control, however, is quite limited, given the open-editing model of Wikipedia. While supervisors can &amp;quot;lock&amp;quot; documents to prevent them from being modified, it is difficult to limit access in a more complex fashion. Furthermore, although such systems typically support version management, the revisions of a document are treated as following a linear sequence. Such a model does not adequately capture the complexities of parallel changes, where an editor may modify a document unaware of changes being made by another editor to the same document.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
None of the existing systems is designed for a highly collaborative environment with large numbers of concurrent changes and with constant revision tracking. The &amp;quot;perfect&amp;quot; system would combine the version-tracking features of SCM, the scalability of collaborative content management systems, and the security and flexibility of fine-grained access control.&lt;br /&gt;
&lt;br /&gt;
==Finding valid metrics for apportioning scholarly credit==&lt;br /&gt;
&lt;br /&gt;
Few collaborative projects are prominantly describing their methods for crediting participation. For one good example, see the Tibetan and Himalayan Digital Library ([http://www.thdl.org/xml/showEssay.php?xml=/intro/participation.xml&amp;amp;amp;l=d1e650]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* we need to harness the self-interest of scholars:&lt;br /&gt;
** but collaborative work is often incremental (with many small contributions over time). MediaWiki, with its version management system, does provide a way to track the contributions of individuals over time.&lt;br /&gt;
** peer-assessment may be feasible in some cases (as with assessments of Amazon reviews' helpfulness) but often the number of people involved may be very small, in a field like ours.&lt;br /&gt;
** SOL counts users' contributions as translators and editors but cannot provide any qualitative measure, so one person who provided only a single entry that is of very high quality may seem to have done little&lt;br /&gt;
&lt;br /&gt;
==Conclusions? Future Directions?==&lt;br /&gt;
&lt;br /&gt;
Web-based software would enable collaboration on image- and text-based electronic editions over the Internet, enabling geographically dispersed groups of humanists to collaborate on editions encompassing text, image, and annotations. Even the most tech-savvy humanist working in seclusion is familiar with the dangers of editing electronic files; it is far too easy to copy older versions of files over newer ones, or to accidentally overwrite text through a careless cut and paste. Multiple editors collaborating on the same project require even more coordination and effort to avoid the chance of accidental loss of information. Support management of the complex array of document versions that arise during the collaborative editing process, and by implementing fine-grained access control to documents. Version management would record the history of editors' changes to the electronic edition, allowing for both internal and public review of the status and progress of an electronic edition project. Fine-grained access control would allow project coordinators to delegate editing tasks to individual editors or groups, by limiting modifications to individual parts of a document and its markup. A convenient and flexible interface, running through a standard Internet browser, would allow the coordinator to easily define access-control policies. Tools should take advantage of accepted standards such as the Extensible Markup Language (XML) and the Text Encoding Initiative (TEI), as well as more subject-specific tools such as Epigraphic Documents in TEI XML (EpiDoc) and the Canonical Text Services (CTS) protocol. The community of researchers in the Humanities and Classics in particular would be well-served with a platform that provides the following functionalities:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# Users in diverse locations can simultaneously edit the same document, using a familiar web browser interface.&lt;br /&gt;
# The automatically managed history of editorial changes allows for merging and/or reverting selected changes without causing version conflicts.&lt;br /&gt;
# Coordinators can add the full advantage of collaboration to works-in-progress by importing existing sources without changing schemas or markup.&lt;br /&gt;
# The use of CTS enables uniform citations to electronic editions.&lt;br /&gt;
&lt;br /&gt;
==Appendix: Overview of scholarship==&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Will Wikipedia Mean the End Of Traditional Encyclopedias?&amp;quot; dialogue between Jimmy Wales and Dale Hoiberg, ~~Wall Street Journal Online~~, September 12, 2006, URL: [http://online.wsj.com/public/article/SB115756239753455284-A4hdSU1xZOC9Y9PFhJZV16jFlLM_20070911.html]&lt;br /&gt;
* &amp;quot;Britannica versus Wikipedia heads to the WSJ,&amp;quot; by Ken Fisher. ~~Arstechnica~~, September 12, 2006, URL: [http://arstechnica.com/news.ars/post/20060912-7726.html]&lt;br /&gt;
* &amp;quot;The Wiki That Edited Me,&amp;quot; by Ryan Singel. ~~Wired News~~, September 7, 2006, URL: [http://www.wired.com/news/technology/0,71737-0.html?tw=rss.index]&lt;br /&gt;
* &amp;quot;Puppy smoothies: Improving the reliability of open, collaborative wikis,&amp;quot; by Tom Cross. ~~First Monday~~, volume 11, number 9 (September 2006), URL: [http://firstmonday.org/issues/issue11_9/cross/index.html]&lt;br /&gt;
* &amp;quot;7 Things you should Know about Collaborative Editing,&amp;quot; EDUCAUSE [http://www.educause.edu/content.asp?page_id=666&amp;amp;amp;ID=ELI7009&amp;amp;amp;bhcp=1]&lt;br /&gt;
* &amp;quot;Undoing Actions in Collaborative Work,&amp;quot; [http://www.eecs.umich.edu/~aprakash/papers/prakash-knister-cscw92.pdf]&lt;br /&gt;
* &amp;quot;A Framework for Undoing Actions in Collaborative Systems,&amp;quot; [http://www.eecs.umich.edu/~aprakash/papers/undo-tochi94.pdf]&lt;br /&gt;
* &amp;quot;Fault-Tolerant Computing in Real-Time Collaborative Editing Systems&amp;quot; [http://www.cse.unl.edu/~xqin/research/ftrce.html]&lt;br /&gt;
* &amp;quot;Access Control in Collaborative Systems&amp;quot; [http://portal.acm.org/citation.cfm?id=1057977.1057979]&lt;br /&gt;
* &amp;quot;A Model for Semi-(a)Synchronous Collaborative Editing&amp;quot; [http://dret.net/biblio/reference/min93]&lt;br /&gt;
* &amp;quot;A Multimedia Desktop Collaboration System&amp;quot; [http://dret.net/biblio/reference/che92b]&lt;br /&gt;
* &amp;quot;A Proposed Model and Functionality Definition for a Collaborative Editing and Conferencing System&amp;quot; [http://dret.net/biblio/reference/lub90b]&lt;br /&gt;
* &amp;quot;A Survey of Experiences of Collaborative Writing,&amp;quot; pp. 87-112, In: Computer Supported Collaborative Writing, Mike Sharples (Ed.), Computer Supported Cooperative Work, Springer-Verlag, London, UK, Computer Supported Cooperative Work, 1993, ISBN 3540197826 [http://dret.net/biblio/reference/bec93]&lt;br /&gt;
* &amp;quot;Atomic Data Abstractions in a Distributed Collaborative Editing System&amp;quot; [http://dret.net/biblio/reference/gre85]&lt;br /&gt;
* &amp;quot;CoDoc: Multi-mode Collaboration over Documents&amp;quot; http://dret.net/biblio/reference/ign04 Engineering Library QA76.758 .C33 2004&lt;br /&gt;
* &amp;quot;Design and Implementation of a Distributed Program for Collaborative Editing&amp;quot; [http://dret.net/biblio/reference/sel86]&lt;br /&gt;
* &amp;quot;Designing a Distributed Collaborative Environment&amp;quot; [http://dret.net/biblio/reference/che92]&lt;br /&gt;
* &amp;quot;Flexible Diff-ing in a Collaborative Writing System&amp;quot; (Math Sciences Library HD66 .C563 1992) [http://dret.net/biblio/reference/neu92]&lt;br /&gt;
* &amp;quot;Using Web Annotations for Asynchronous Collaboration Around Documents,&amp;quot; pp. 309-318, In: David G. Durand (Ed.), Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, ACM Press, Philadelphia, Pennsylvania, December 2000 , ISBN 1-58113-222-0. [http://dret.net/biblio/reference/cad00] Engineering Library QA75.5 C65 2000&lt;br /&gt;
* The Wiki Way: Collaboration and Sharing on the Internet: [http://dret.net/biblio/reference/leu01]&lt;br /&gt;
* &amp;quot;The Collaborative Multi-User Editor Project Iris&amp;quot; [http://www11.informatik.tu-muenchen.de/publications/pdf/Koch1995.pdf]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*Resources:*&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See http://en.wikipedia.org/wiki/Collaborative_software for a good general discussion of collaborative software in general and [http://en.wikipedia.org/wiki/CSCW] for a definition of &amp;quot;computer-supported cooperative work&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~Existing Tools~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~synchronous~~ (see [http://en.wikipedia.org/wiki/Collaborative_real-time_editor]):&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
SubEthaEdit (MacOSX): [http://www.codingmonkeys.de/subethaedit/collaborate.html]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [http://www.macdevcenter.com/pub/a/mac/2003/12/02/rendezvous.html] (review)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ACE (platform independent): [http://ace.iserver.ch/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gobby (Linux, Windows, MacOSX): [http://gobby.0x539.de/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MoonEdit (Linux, Windows, FreeBSD): [http://moonedit.com/index.html.en]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
TeNDaX: [http://www.tendax.net/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chalk: http://blog.chalk.it/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
GroupSketch (a tool for synchronous collaborative sketching): [http://grouplab.cpsc.ucalgary.ca/papers/1992/92-GroupSketch-Video.CSCW/groupsketchvideo.pdf]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
GROVE, &amp;quot;a textual multi-user outlining tool&amp;quot;: Ellis, C., Gibbs, S. and Rein, G. (1990). Design and use of a group editor. In Cockton (Ed.), Engineering for Human-Computer Interaction. North-Holland.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ShrEdit, &amp;quot;a multi-user text editor&amp;quot;: L.J. McGuffin, and G.M. Olson: &amp;quot;ShrEdit: a shared electronic workspace,&amp;quot; CSMIL Technical Report #45, The University of Michigan, 1992.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
DistEdit, &amp;quot;a toolkit for implementing distributed group editors&amp;quot;: (Knister, M.J and Prakash, A. (1990): &amp;quot;DistEdit: A Distributed Toolkit for Supporting Multiple Group 'Editors&amp;quot;, Proceedings of CSCW '90, ACM 1990 Conference on Computer Supported Cooperative Work, Los Angeles, 1990)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~asynchronous:~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Writely: [http://www.writely.com/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
DocSynch: [http://docsynch.sourceforge.net/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And, of course, Wiki: [http://www.wiki.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~Backend~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
WebDAV (Web-based Distributed Authoring and Versioning; a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote web servers): [http://www.webdav.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
IETF Delta-V Working Group (This working group will define extensions to HTTP and the WebDAV Distributed Authoring Protocol necessary to enable distributed Web authoring tools to perform, in an interoperable manner, versioning and configuration management of Web resources): [http://www.webdav.org/deltav/deltav-charter.html]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MATE (Multilevel Annotation Tools Engineering; aims to facilitate re-use of language resources by addressing the problems of creating, acquiring, and maintaining language corpora): [http://mate.nis.sdu.dk/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Plone: A user-friendly and powerful open source Content Management System (&amp;quot;ideal as an intranet and extranet server, as a document publishing system, a portal server and as a groupware tool for collaboration between separately located entities.&amp;quot;; supports XML (see [http://plone.org/documentation/tutorial/xml-in-plone-with-marshall/?searchterm=XML] and [http://pyxml.sourceforge.net/topics/] for more general Python-XML)): [http://plone.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~Plone is built using...~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zope (Z Object Publishing Environment; an open source application server for building content management systems, intranets, portals, and custom applications; Zope also supports XML (see [http://www.zope.org/Members/karl/ParsedXML/ParsedXML and http://www.zope.org/Members/haqa/XMLKit])): [http://www.zope.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2011</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2011"/>
		<updated>2007-01-29T15:04:33Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
==Open Content Scholarly Sources==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
:* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
:* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
:* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
:* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
:* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
:* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
:* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
:* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
:* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
:* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
:* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Next Generation Editions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Summary=&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
: * ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
: * ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
: * ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
: * ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
=Open Content Editions=&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
=Next Steps=&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
====Choice of source texts====&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
====Sources for Images of Print Editions====&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
====Components of next generation electronic editions====&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
====Editorial Conventions====&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2010</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2010"/>
		<updated>2007-01-29T15:03:57Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
== &lt;br /&gt;
Open Content Scholarly Sources ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
:* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
:* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
:* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
:* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
:* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
:* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
:* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
:* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
:* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
:* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
:* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Next Generation Editions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Summary=&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
: * ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
: * ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
: * ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
: * ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
=Open Content Editions=&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
=Next Steps=&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
====Choice of source texts====&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
====Sources for Images of Print Editions====&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
====Components of next generation electronic editions====&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
====Editorial Conventions====&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2009</id>
		<title>OSCE Dunn Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2009"/>
		<updated>2007-01-29T15:02:11Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== e-Science and the critical edition: a discussion paper ==&lt;br /&gt;
&lt;br /&gt;
'''Stuart Dunn and Tobias Blanke'''&lt;br /&gt;
&lt;br /&gt;
'''Arts and Humanities e-Science Support Centre, King's College London'''&lt;br /&gt;
&lt;br /&gt;
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.&lt;br /&gt;
&lt;br /&gt;
Several e-Science initiatives in the UK are promoting to advance research work in virtual spaces with advanced computing - in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communications in 'virtual organizations' are developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&amp;amp;H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings is combined with the ability to share instantly digital items among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities due to improved technologies complement new highly collaborative research.&lt;br /&gt;
&lt;br /&gt;
E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be everything from processing power, data, or expertise that researchers can share. This networked infrastructure allows a culture of collaboration, in which new forms of collaboration can emerge, and new and advanced methodologies can be explored.&lt;br /&gt;
&lt;br /&gt;
A key to the success of e-Science is the provision of shared access to research facilities and therefore to provide answers to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were collocated. Digital knowledge objects shall be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships as between server to server and programme to programme and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&amp;amp;H) researchers are going to be doing over the next ten years; and will fundamentally alter their relationship with the resources they use. &lt;br /&gt;
&lt;br /&gt;
Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, and an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate of the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.&lt;br /&gt;
&lt;br /&gt;
'''Some key questions are:'''&lt;br /&gt;
&lt;br /&gt;
* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?&lt;br /&gt;
* Do users need such editions? Will they ever trust them?&lt;br /&gt;
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all? &lt;br /&gt;
* How should version control be maintained?&lt;br /&gt;
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?&lt;br /&gt;
* What kind of peer-review process needs to be in place? &lt;br /&gt;
* How should cataloguing, referencing and citation of such documents be approached?&lt;br /&gt;
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2008</id>
		<title>OSCE Dunn Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2008"/>
		<updated>2007-01-29T15:01:49Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== e-Science and the critical edition: a discussion paper ==&lt;br /&gt;
&lt;br /&gt;
'''Stuart Dunn and Tobias Blanke'''&lt;br /&gt;
'''&lt;br /&gt;
Arts and Humanities e-Science Support Centre, King's College London&lt;br /&gt;
'''&lt;br /&gt;
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.&lt;br /&gt;
&lt;br /&gt;
Several e-Science initiatives in the UK are promoting to advance research work in virtual spaces with advanced computing - in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communications in 'virtual organizations' are developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&amp;amp;H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings is combined with the ability to share instantly digital items among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities due to improved technologies complement new highly collaborative research.&lt;br /&gt;
&lt;br /&gt;
E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be everything from processing power, data, or expertise that researchers can share. This networked infrastructure allows a culture of collaboration, in which new forms of collaboration can emerge, and new and advanced methodologies can be explored.&lt;br /&gt;
&lt;br /&gt;
A key to the success of e-Science is the provision of shared access to research facilities and therefore to provide answers to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were collocated. Digital knowledge objects shall be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships as between server to server and programme to programme and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&amp;amp;H) researchers are going to be doing over the next ten years; and will fundamentally alter their relationship with the resources they use. &lt;br /&gt;
&lt;br /&gt;
Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, and an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate of the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.&lt;br /&gt;
&lt;br /&gt;
'''Some key questions are:'''&lt;br /&gt;
&lt;br /&gt;
* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?&lt;br /&gt;
* Do users need such editions? Will they ever trust them?&lt;br /&gt;
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all? &lt;br /&gt;
* How should version control be maintained?&lt;br /&gt;
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?&lt;br /&gt;
* What kind of peer-review process needs to be in place? &lt;br /&gt;
* How should cataloguing, referencing and citation of such documents be approached?&lt;br /&gt;
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Smith_Paper&amp;diff=2007</id>
		<title>OSCE Smith Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Smith_Paper&amp;diff=2007"/>
		<updated>2007-01-29T14:56:46Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Neel Smith, College of the Holy Cross:  OCSE position paper&lt;br /&gt;
&lt;br /&gt;
1 An architecture for a distributed library incorporating open-source critical editions&lt;br /&gt;
&lt;br /&gt;
In this position paper, I outline recent work at the Center for Hellenic Studies (Washington, D.C.) on a suite of protocols for creating a distributed library of interoperable scholarly resources.  In the opening section, I provide some background to our approach.  In the following section, I describe the service stack we are currently testing in collaboration with the Perseus project.  At our meeting in London, I hope to use my introductory time to illustrate the ideas presented here with a couple of concrete examples of applications.&lt;br /&gt;
&lt;br /&gt;
1.1 Background: digital publications&lt;br /&gt;
&lt;br /&gt;
Designing a technical architecture for scholarly publication is the last link in a logical chain.   We must first  define what we mean by “publication,”  identify its distinctive features, and translate those into functional requirements.  Functional requirements in turn can be expressed as technical requirements, and we can then choose an architecture that satisfies those requirements. Here I summarize very briefly views on those topics I have spelled out more fully in a paper entitled &amp;quot;[Digital publication for digital libraries&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/digitalpub].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
In the scholarly world, publication serves as the *permanent record of reference* for scholarly work.  In any medium therefore, scholarly publications must be designed for both *permanence* and *citability*.&lt;br /&gt;
&lt;br /&gt;
I would translate these defining characteristics of scholarly publication into at least three functional requirements:&lt;br /&gt;
&lt;br /&gt;
* it must be identically replicable&lt;br /&gt;
* it must be alienated from its author&lt;br /&gt;
* it must be citable in a fixed version&lt;br /&gt;
&lt;br /&gt;
We could rephrase these functional requirements by defining the form of  scholarly published works &lt;br /&gt;
as *works possessing an explicitly identified edition and explicitly identified citation scheme, &lt;br /&gt;
that can be irrevocably and identically replicated*. &lt;br /&gt;
&lt;br /&gt;
In  &amp;quot;[Digital publication for digital libraries&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/digitalpub],&amp;quot;  I develop arguments for a list of technical specifications that are necessary to satisfy this understanding of digital publication.  Rather than repeat those in detail here, I wish simply to underscore that a digital publication has to capture the *functionality* rather than the appearance of a scholarly work.  Beyond identifying appropriate ways to represent an open-source critical edition (e.g., recommended applications of TEI encoding to a document), then, we need to develop an infrastructure for working with critical editions in the broader context of a distirbuted and interoperating digital library.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1.1 Architecture:  digital libraries.&lt;br /&gt;
The natural architecture permitting interactions among potentially distributed objects is a suite of network services following defined protocols.   In much of our work defining services for scholarly work, we have been influenced by the pioneering work of the Open Geospatial Consortium developing service protocols to enable distributed GIS operation.  (See the [http://www.opengeospatial.org/&amp;gt;Open Geospatial Consortium home page].)&lt;br /&gt;
&lt;br /&gt;
Our initial goal is to work with the most fundamental kinds of services to provide functionality that other services can in turn build on.  A structured “diff” service describing differences in the structure and content of two XML fragments, for example, might be layered on top of an elementary retrieval service that abstracts the problem of retrieving text passages from canonical references.  The structured diff service in turn might serve as a base for a higher-order service statistically summarizing or analyzing differences in two pieces of text.&lt;br /&gt;
&lt;br /&gt;
Part of the attraction of the service model is its technical simplicity, since protocols for scholarly services can be layered on top of well established technical protocols:  HTTP as the transport mechanism, XML for service requests and replies.  Part of the attraction, too, is that this hierarchical model corresponds to a scholarly ideal:  it simultaneously allows for high-level abstraction of complexity, while ensuring the transparency of supporting or underlying functionality.&lt;br /&gt;
&lt;br /&gt;
1.1.1 Fundamental services&lt;br /&gt;
While we can easily imagine interesting, complex services we might like to have as easily available as an internet access point, I would argue that the most fundamental services for scholarly publication are those supporting the *simple identification and retrieval of fundamental objects with stable, location-independent references* --  services, in other words, that directly support our view of publication as a permanent and citable record.&lt;br /&gt;
&lt;br /&gt;
For many kinds of material we refer to, citation is comparatively straightforward.  We often work with collections of discrete objects cited simply by a unique identifier:  an “author-year” label to identify one entry in a bibliographic list, a museum inventory number to identify a specific archaeological artifact, a catalog number to identify a listing in a collection like Erbse's ~~scholia vetera~~ of the ~~Iliad~~.  Even when we refer to specific properties of an object (the author property of a bibliographic entry, the die axis of a coin, Erbse's source attribution of a scholiastic comment ...), we continue to cite the object as a discrete entity.  One fundamental service we need then is a service for identification and retrieval of discrete entities in a collection.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Texts present a different challenge.  In the first place, the entities we refer to with textual citations are not simple discrete objects, as librarians attempting to catalog texts are aware.  The Functional Requirements for Bibliographic Records (FRBR) describes a hierarchical model for texts, from the notional work, to the expression of that work in some version, to the manifestation of a version in some concrete form, to an individual item.  (A good introduction to FRBR is the U.S. Library of Congress' page [What is FRBR?&amp;gt;http://www.loc.gov/cds/FRBR.html]).   Classicists and biblical scholars have long implied a similar but not identical abstraction of notional work from particular versions in their use of version-independent, canonical reference systems.  One difference is that classicists' citation practice normally associates texts in groups or corpora that may or may not appear in documentary components of FRBR;  another is that FRBR's “manifestation” distinguishes different reproductions of a given expression (such as identical printings of a given edition) that may not be significant for scholarly citation.&lt;br /&gt;
&lt;br /&gt;
FRBR, of course, as a cataloging model does not address citation, and a second problem texts present is that we must allow for continuous citation.  Canonical citation schemes are often hierarchical (e.g., book/chapter/section of a prose work);  our service must support citation to this level of granularity, and beyond that should allow citation of subsections of text for a specific version.&lt;br /&gt;
&lt;br /&gt;
A second fundamental service, then, is a service for identifying texts and retrieving textual references in accordance with the semantics of citation practice traditional in fields like classics or biblical studies.&lt;br /&gt;
&lt;br /&gt;
To make these two methods of identifying and retrieving citable objects useful together in a distributed library, we can define a third basic service:  indexing information to either form of citation.  An index of personal names in a text, for example, might literally index strings with forms of names to a text reference, but it might also, more usefully for many purposes, index identifiers in a prosopographic collection to textual references.  The identifier could both disambiguate superficial strings of characters in the text, and provide a key to the prosopographic collection.&lt;br /&gt;
&lt;br /&gt;
At CHS, we have drafted standards for these three services, and have implemented each as a java servlet.  For more detailed information, see this page on &amp;quot;[Fundamental services for scholarly reference&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/tic].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
1.1.1 Ancillary services and standards&lt;br /&gt;
As the abstract in the conference program indicates, to create an effectively interoperable network of resources, we need to agree not only on service protocols, but on the meaning of standard *values* that can be used in the framework of the protocol.  Having an agreed-upon system for finding what texts a service offers, discovering their citation schemes, and requesting sections of the text in that scheme will not help us to interoperate if we can't agree on how to identify Herodotus' ~~Histories~~, or an inscription from Aphrodisias.  To support the three fundamental services previously described, we have also developed ancillary services and standards to address these issues.&lt;br /&gt;
&lt;br /&gt;
Texts cited by canonical reference are a comparatively stable set of resources.  Technically, we need a simple service that resolves some kind of query string to standard identifiers, comparable to the [uBio service&amp;gt;http://www.ubio.org/] that scientists can use to automatically search for standard taxonomic identifiers for species.  In contrast to uBio, however, our service must be able to support a hierarchical scheme of identifiers so that we can refer to texts at the level of works, versions (such as a specific translation or edition) or individual exemplars.  To fill this technical gap, we have developed a hierarchical Registry service (see [fuller information with links&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/registry]).&lt;br /&gt;
&lt;br /&gt;
Institutionally, we need to find appropriate custodians to manage these authority lists for given domains.  CHS has taken responsibility for maintaining a Registry service for identifiers of Greek literary works;  the Aphrodisias project would be a logical choice to assume responsibility for assigning identifiers to inscriptions from Aphrodisias. (Whether choosing to administer a service directly, or to take editorial responsibility for material served elsewhere is not important.)  The internet's DNS system offers a good analogy to what we might ultimately develop:  the equivalent of a root server or servers is being run at CHS as a Registry of authoritative registries for given domains or corpora;  individual registries in turn may be disseminated so that an actual application might consult a local copy of the registry information to resolve a reference.&lt;br /&gt;
&lt;br /&gt;
In contrast to canonically cited texts, collections of discrete objects may be created so freely that a comparable system of registries would be unrealistically burdensome.  What authority should I register my collection with if I, as an individual scholar, create a database of results of my work, and want to expose it to the world using a Collections Service?  I am the only authority responsible for defining the unique identifiers in my collection, so I need a namespace of my own within which  I can freely manage my collection's IDs.   This is very similar to the problem that authors of XML document structures face, and we are adopting a very similar solution.  Just as XML namespaces utilize the same mechanism used for URLs to provide unique namespaces to anyone creating a new XML structure, so we use that structure to provide unique *data namespaces*.  At CHS, a Collection of data about digital images is given unique identifiers from the data namespace chs.harvard.edu/datans/images;  the Perseus project could, for example, use a data namespace like perseus.tufts.edu/images/namespaces, and if both collections have an image with the same ID, they can be correctly resolved.&lt;br /&gt;
&lt;br /&gt;
We need to consider one further important difference between reference by unique ID and the kind of canonical reference we use for texts.  Unique IDs can be represented by simple strings of characters;  the semantics of a reference within a hierarchical citation scheme to a text in a FRBR-like hierarchy cannnot.  We have therefore proposed a syntax for a notation scheme with explicit semantics, following the requirements of the IETF's URN system.  These Canonical Text Services URNs make it possible to reduce the complexity of a reference like “First occurrence of the string 'cano' in line 1 of book 1 of Vergil's ~~Aeneid~~” to a flat string that can then be used by any application that understands CTS-URNs.   (For more information and links, see [CTS URNs&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/cts-urn].)&lt;br /&gt;
&lt;br /&gt;
For an overview of CHS work on these topics, see &amp;quot;[Ancillary services suppporting scholarly reference&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/ancillary].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
1.1 Composite objects and the TICI stack &lt;br /&gt;
An extraordinary range of scholarly citation can be handled through the simple mechanisms of Collection Services, and Canonical Text Services, while indexing using Reference Index Services enables a complex web of associations to be built on top these citation mechanisms.  We want to incorporate spatial manipulation into our stack of services, but for the present are very happy to let others, including the Open Geospatial Consortium, take the lead in this area.  In the summer of 2006, we began to build the first examples of compound objects, adding to the simple identification and retrieval of Collections and Canonical Texts, more specialized manipulation for binary images.&lt;br /&gt;
&lt;br /&gt;
Image Procesing Services perform operations such as scaling an image, selecting a subsection of it, or altering its brightness and contrast.   (See “[Image Processing Services&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/images].”)  By itself, an image processing service is of little use;  it really becomes valuable only in association with some other information.  Collections services already provide a ready means of working with metadata about each image;  Reference Index Services make it possible to associate binary image identifiers with objects in other collections, or with texts.  An index of, say, page images to CTS URNs could define the relation between a text and images of pages in a specific edition;  a CTS instance could provide access to an XML text, while a related Image Processing Service could work with the image data.&lt;br /&gt;
&lt;br /&gt;
At CHS, the result in the fall of 2006 is a stack of four principle interrelated services: Texts, Indexes, Collections and Images, that together provide a sufficient infrastructure for a surprising range of scholarly publications.  We have been closely collaborating with the Perseus project over the last several months to test these services, and build end-user applications on top of them.  Text browsing and reading applications work simultaneously with CHS implementations of  Canonical Text Services in Washington, D.C., at Holy Cross College in Worcester, Massachusetts, and at Furman University in Greenville, S.C., as well as with an independent implementation using completely different back-end technology at the Perseus project at Tufts University.&lt;br /&gt;
&lt;br /&gt;
For more information, see &amp;quot;[An overview of services for composite objects&amp;gt;http://chs75.harvard.edu/projects/diginc/techpub/composites]&amp;quot;&lt;br /&gt;
&lt;br /&gt;
1.1 Current work: Scenarios&lt;br /&gt;
&lt;br /&gt;
Even as small a set of services as the TICI stack allows for very complex networks of information, and it is becoming increasingly apparent that we need to plan now for a further dimension to our work:  a means of making machine-parseable statements about the relations among these resources.&lt;br /&gt;
&lt;br /&gt;
In September, 2006, we have begun work on a simple XML schema for inventorying and describing the relations among stable, citable resources anywhere in the TICI stack.  These inventories, which we are provisionally calling “Scenarios,” are in a sense a digital extension of bibliography:  they add to the  static lists of print bibliography a specification of how resources relate to each other.  Scenarios are declarative or descriptive, not functional:  applications may use the information in a Scenario as they choose, but as a print bibliography ideally catalogs resources needed to read a print publication, Scenarios catalog resources needed to read a digital publication. &lt;br /&gt;
&lt;br /&gt;
A simple text reader can, for example, list a single resource with a CTS URN referring to a passage in a text;  in  this instance, the Scenario amounts to a simple bookmark.  But a text reader that filters the text with information from an index might overlay links on the words of a Latin text to a morphological index.  Its Scenario can specify how the text resource and index relate.  An even more sophisticated reader might in turn associate the lemma with other morphological data;  this could appear as a Collection in the application's Scenario.&lt;br /&gt;
&lt;br /&gt;
Our work on Scenarios is very preliminary at this point, but illustrates a number of themes that are relevant to the broader topic of this conference:  the leverage we can obtain from building on openly available resources, the ways very simple, even minimal resources can in their complex interrelations lead to  sophisticated scholarly productions, and the ease of interoperation that is possible when we can work with common protocols and standards.&lt;br /&gt;
&lt;br /&gt;
1 More information&lt;br /&gt;
* Documentation of technical work at CHS, ~~[Digital Incunabula&amp;gt;http://chs75.harvard.edu/projects/diginc/home]~~&lt;br /&gt;
* &amp;quot;Update blog&amp;quot; with syndicated feeds for [announcements and updates&amp;gt;https://chs76.harvard.edu/weblog/neel/] from the CHS Technical Working Group&lt;br /&gt;
&lt;br /&gt;
1 License&lt;br /&gt;
(c) Neel Smith 2006 &lt;br /&gt;
Distributed under the [Creative Commons Attribution-Share-alike license v. 2.5&amp;gt;http://creativecommons.org/licenses/by-sa/2.5/]&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2006</id>
		<title>OSCE Choudhury Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2006"/>
		<updated>2007-01-29T14:55:28Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;1 Position Paper on Licensing/Legal Matters&lt;br /&gt;
&lt;br /&gt;
1.1 Sayeed Choudhury&lt;br /&gt;
&lt;br /&gt;
1.1.1 Library Digital Programs, Sheridan Libraries, Johns Hopkins University&lt;br /&gt;
&lt;br /&gt;
In their position paper, Stuart Dunn and Tobias Blanke raise an interesting and relevant question regarding digital texts: &amp;amp;quot;How can such texts fit in existing library and information (infra)structures?  Will these need to be rethought?&amp;amp;quot;  Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, has stated that libraries are built upon three pillars - collections, services and infrastructure.  Arguably, collections have represented the most important element in the print world, with services and infrastructure supporting the collections.  In the digital world, these elements are becoming blurred. It may be appropriate to assert that the ~~principles~~ by which libraries (and archives and museums) have operated remain valid, but the ~~practices~~ need to be reconsidered.  Not surprisingly, libraries are facing new challenges, and opportunities, with the development of infrastructure to support digital collections and services.&lt;br /&gt;
&lt;br /&gt;
At the heart of this infrastructure development effort is the repository.  There are many defintions for repository, but for the purpose of this discussion, the most useful one is offered by Cliff Lynch who stated a &amp;amp;quot;repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.&amp;amp;quot; (http://www.arl.org/newsltr/226/ir.html)&lt;br /&gt;
&lt;br /&gt;
The emphasis on both services and preservation is particularly noteworthy.  From a preservation perspective, it is important to note that both open standards and open source augment our abiilty to support digital preservation. (http://www.ils.unc.edu/callee/oss_preservation.htm).  From the service perspective, other position papers have raised several interesting (potential) needs or uses for digital texts.  One theme becomes patently clear in reading these papers: scholars will not only need access to view digital texts, but will also need the ability to download (en masse), manipulate, transform and repurpose digital texts.  The collaborative editing envsiaged by Ross Scaife and Dot Porter would be difficult without fully open access to digital texts.  The type of markup described by Gabriel Bodard almost certainly requires complete access to digital texts. Greg Crane has often discussed the possibilities for machine translation, language modeling and document analysis with large corpora of digitized texts (http://www.dlib.org/dlib/march06/crane/03crane.html).&lt;br /&gt;
&lt;br /&gt;
These ideas raise very important questions.  Are the libraries involved with Google Book Search (http://books.google.com/) providing only part of the solution?  Even more disconcerting is the idea that these libraries, though well-intentioned, may even inhibit the ability of scholars to work with digital texts in a manner that supports new scholarship.  Will Google work with the scholarly community to build tools and services, or only consider commercial opportunities?  Understandably, libraries, including those working with the Open Content Alliance (http://www.opencontentalliance.org/), consider whether to digitize books already available through Google Book Search in an effort to avoid duplicative efforts.  However, it's important to consider both the collections and services aspects.&lt;br /&gt;
&lt;br /&gt;
Repository development obviously entails a high degree of technology work, but repositories, particularly institutional respositories should respond to a policy and legal framework.  From a technological perspective, it is optimal to develop an unconstrained, open system that can be constrained or modified according to local policy or legal frameworks; it is difficult, if not impossible, to move in the other direction.  The e-Science community has noted that it is important to consider openness even in terms of the data.  The SPARC Open Data (http://www.arl.org/sparc/opendata/) states: &amp;amp;quot;Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.&amp;amp;quot; That is, even the most open system may not support preservation or scholarly needs if the data is constrained through proprietary formats or legal restrictions. &lt;br /&gt;
&lt;br /&gt;
With these observation in mind, it seems obvious that the scholarly community should adopt, even push, for completely open standards and open access for digital texts.  Such openness offers the greatest potential for the type of digital environment envisaged through the other position papers.&lt;br /&gt;
&lt;br /&gt;
However, it is important to note that the inter-relationships between technology, policy and organizational roles that has been defined in the print world is also becoming blurred.  When a monograph was published, there was a reasonable degree of understanding regarding how a scholar would send this monograph to a publisher, which would seek revenue through sales, but also agree that libraries could offer the book without cost - under certain conditions - to the scholarly community.  With digital publications, this process and role definition is being established, sometimes with controversy.  The US National Endowment for the Humaniites has announced new guidelines for their Scholarly Editions Grants (http://www.neh.gov/grants/guidelines/editions.html) that states a preference for projects that offer digitized works online throgh open access.  This announcement has raised some concerns among scholars and University Presses regarding business models and rights clearance (http://insidehighered.com/news/2006/09/18/documents).  &lt;br /&gt;
&lt;br /&gt;
Finally, what implications arise from open data in terms of the reward structure for scholars?  Will freely available online digital texts be viewed with the same level of rigor or reputation as those &amp;amp;quot;validated&amp;amp;quot; through publishers, peer review, or other means for assessment?  Libraries are eager to serve scholarly needs in the digital age, ideally with an open policy and legal framework.  It is important, however, to address the corresponding implications of such arrangements in terms of organization roles, business models, and reward structures.&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2005</id>
		<title>OSCE Choudhury Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Choudhury_Paper&amp;diff=2005"/>
		<updated>2007-01-29T14:55:03Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[OSCE index&amp;gt;Main.osce] | [OSCE programme&amp;gt;programme]&lt;br /&gt;
&lt;br /&gt;
1 Position Paper on Licensing/Legal Matters&lt;br /&gt;
&lt;br /&gt;
1.1 Sayeed Choudhury&lt;br /&gt;
&lt;br /&gt;
1.1.1 Library Digital Programs, Sheridan Libraries, Johns Hopkins University&lt;br /&gt;
&lt;br /&gt;
In their position paper, Stuart Dunn and Tobias Blanke raise an interesting and relevant question regarding digital texts: &amp;amp;quot;How can such texts fit in existing library and information (infra)structures?  Will these need to be rethought?&amp;amp;quot;  Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, has stated that libraries are built upon three pillars - collections, services and infrastructure.  Arguably, collections have represented the most important element in the print world, with services and infrastructure supporting the collections.  In the digital world, these elements are becoming blurred. It may be appropriate to assert that the ~~principles~~ by which libraries (and archives and museums) have operated remain valid, but the ~~practices~~ need to be reconsidered.  Not surprisingly, libraries are facing new challenges, and opportunities, with the development of infrastructure to support digital collections and services.&lt;br /&gt;
&lt;br /&gt;
At the heart of this infrastructure development effort is the repository.  There are many defintions for repository, but for the purpose of this discussion, the most useful one is offered by Cliff Lynch who stated a &amp;amp;quot;repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.&amp;amp;quot; (http://www.arl.org/newsltr/226/ir.html)&lt;br /&gt;
&lt;br /&gt;
The emphasis on both services and preservation is particularly noteworthy.  From a preservation perspective, it is important to note that both open standards and open source augment our abiilty to support digital preservation. (http://www.ils.unc.edu/callee/oss_preservation.htm).  From the service perspective, other position papers have raised several interesting (potential) needs or uses for digital texts.  One theme becomes patently clear in reading these papers: scholars will not only need access to view digital texts, but will also need the ability to download (en masse), manipulate, transform and repurpose digital texts.  The collaborative editing envsiaged by Ross Scaife and Dot Porter would be difficult without fully open access to digital texts.  The type of markup described by Gabriel Bodard almost certainly requires complete access to digital texts. Greg Crane has often discussed the possibilities for machine translation, language modeling and document analysis with large corpora of digitized texts (http://www.dlib.org/dlib/march06/crane/03crane.html).&lt;br /&gt;
&lt;br /&gt;
These ideas raise very important questions.  Are the libraries involved with Google Book Search (http://books.google.com/) providing only part of the solution?  Even more disconcerting is the idea that these libraries, though well-intentioned, may even inhibit the ability of scholars to work with digital texts in a manner that supports new scholarship.  Will Google work with the scholarly community to build tools and services, or only consider commercial opportunities?  Understandably, libraries, including those working with the Open Content Alliance (http://www.opencontentalliance.org/), consider whether to digitize books already available through Google Book Search in an effort to avoid duplicative efforts.  However, it's important to consider both the collections and services aspects.&lt;br /&gt;
&lt;br /&gt;
Repository development obviously entails a high degree of technology work, but repositories, particularly institutional respositories should respond to a policy and legal framework.  From a technological perspective, it is optimal to develop an unconstrained, open system that can be constrained or modified according to local policy or legal frameworks; it is difficult, if not impossible, to move in the other direction.  The e-Science community has noted that it is important to consider openness even in terms of the data.  The SPARC Open Data (http://www.arl.org/sparc/opendata/) states: &amp;amp;quot;Many advocates of Open Data believe that, although there are substantial potential benefits from sharing and reusing digital data upon which scientific advances are built, today much of it is being lost or underutilized because of legal, technological and other barriers.&amp;amp;quot; That is, even the most open system may not support preservation or scholarly needs if the data is constrained through proprietary formats or legal restrictions. &lt;br /&gt;
&lt;br /&gt;
With these observation in mind, it seems obvious that the scholarly community should adopt, even push, for completely open standards and open access for digital texts.  Such openness offers the greatest potential for the type of digital environment envisaged through the other position papers.&lt;br /&gt;
&lt;br /&gt;
However, it is important to note that the inter-relationships between technology, policy and organizational roles that has been defined in the print world is also becoming blurred.  When a monograph was published, there was a reasonable degree of understanding regarding how a scholar would send this monograph to a publisher, which would seek revenue through sales, but also agree that libraries could offer the book without cost - under certain conditions - to the scholarly community.  With digital publications, this process and role definition is being established, sometimes with controversy.  The US National Endowment for the Humaniites has announced new guidelines for their Scholarly Editions Grants (http://www.neh.gov/grants/guidelines/editions.html) that states a preference for projects that offer digitized works online throgh open access.  This announcement has raised some concerns among scholars and University Presses regarding business models and rights clearance (http://insidehighered.com/news/2006/09/18/documents).  &lt;br /&gt;
&lt;br /&gt;
Finally, what implications arise from open data in terms of the reward structure for scholars?  Will freely available online digital texts be viewed with the same level of rigor or reputation as those &amp;amp;quot;validated&amp;amp;quot; through publishers, peer review, or other means for assessment?  Libraries are eager to serve scholarly needs in the digital age, ideally with an open policy and legal framework.  It is important, however, to address the corresponding implications of such arrangements in terms of organization roles, business models, and reward structures.&lt;br /&gt;
&lt;br /&gt;
[[Category:OCSE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Scaife_Paper&amp;diff=2004</id>
		<title>OSCE Scaife Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Scaife_Paper&amp;diff=2004"/>
		<updated>2007-01-29T14:53:42Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[OSCE index&amp;gt;Main.osce] | [OSCE programme&amp;gt;programme]&lt;br /&gt;
&lt;br /&gt;
1 Open Source Critical Editions&lt;br /&gt;
1 Workshop at Kings College London&lt;br /&gt;
1 September 22, 2006&lt;br /&gt;
1.1 Tools for Collaborative Editing (some thoughts by Ross Scaife and Dot Porter)&lt;br /&gt;
1.1.1 1. Introduction to the Concept&lt;br /&gt;
&lt;br /&gt;
The Wikipedia entry on &amp;quot;collaborative editor&amp;quot; defines the term quite simply: &amp;quot;A collaborative editor allows simultaneous editing of the same document or video by different participants using different computers.&amp;quot; ([http://en.wikipedia.org/wiki/Collaborative_real-time_editor]) Electronic editions have become steadily more popular over the past decade. Libraries and museums have led the charge, followed by increasing numbers of scholars, both individuals and groups, who form the basis of an active community of electronic editors. As this community grows, so does the need for tools suitable to the types of editions that people and institutions are actually creating. Generally, there are three specific needs of humanists involved in collaborative editing projects. Scholars need to be able to build editions encompassing text, images, and annotations, the latter usually using the Extensible Markup Language (XML), the de facto standard for encoding electronic editions in the humanities, and the mode of expression of the Text Encoding Initiative (TEI). Second, software needs to have access control and version management systems that will allow several different editors to collaborate on an edition with different levels of access and without fear that one editor might inadvertently overwrite another's work. Finally, accessibility. Software needs to be designed in such a way that it will encourage collaborative work among individuals who are geographically dispersed, and may encourage electronic editing by those many accomplished humanities scholars who are familiar with basic computer tools (word processors, web browsers, etc.) but who may be put off by regular XML editing software.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Good collaborative editing software will foster the creation of scholarly works by forging partnerships between individuals and institutions, enabling them to share resources, both physical resources (in the form of texts and images) and intellectual (in the form of subject knowledge and editing experience). Software released under an Open-Source license will especially promote cooperation among smaller institutions that might not have the resources to purchase expensive software. Such software could even become a significant resource not only for scholars, but also for teachers and students, potentially encouraging collaborative projects between schools around the world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Maintaining an edition with multiple editors contributing to the same document requires a significant amount of work. Editors must be careful not to overwrite changes made by others, for example by coordinating the process so that no two editors work on a file at the same time. Word processing software such as Microsoft Word includes a tool for &amp;quot;Tracking Changes&amp;quot;, which enables users to work collaboratively; however, though the resulting files are suitable for printing, they are not encoded in a standard acceptable for electronic editions. With the increasing scale and scope of electronic editions, the need for a collaborative editing process rooted in accepted standards, and software to support this process, is even stronger.&lt;br /&gt;
&lt;br /&gt;
1.1.1 2. How can collaborative editing software help classicists? Give a few real-life examples&lt;br /&gt;
&lt;br /&gt;
* This page was initially produced and edited by two individuals in Writely ([http://www.writely.com/]) (Cnet review ([http://reviews.cnet.com/4520-9239_7-6627472.html?tag=cnetfd.ld3]) compares other AJAX'ed word processors)&lt;br /&gt;
** readily editable by one or more people, like a wiki. Unlike a wiki, Writely feels like regular word processor, a pared-down MS Word.&lt;br /&gt;
** numerous output formats (html rtf doc odf pdf) to suit a variety of publication/access needs (both print and online publication)&lt;br /&gt;
** provides a view of the history of a document's revisions over time, which helps to show the relative contributions of collaborators over time.&lt;br /&gt;
** documents can be shown to select viewers or made public&lt;br /&gt;
* Similar to Writely, LiveDocuments promises synchronization of Microsoft Office Documents, allowing for collaborative editing/writing in a context familiar to most scholars&lt;br /&gt;
** There is no server requirement, editors need not log on to a central server&lt;br /&gt;
** &amp;quot;LiveDocuments promises Office collaboration without a server&amp;quot; [http://arstechnica.com/news.ars/post/20060908-7701.html]&lt;br /&gt;
* Classics context: note the ideas about a communal text, a personal text, and the text of a given MS presented 11 years ago by the Vergil Project (never implemented, unfortunately)&lt;br /&gt;
** Communal text: &amp;quot;users will participate in the &amp;quot;establishment&amp;quot; of a text that will never reach final form. Here is how it will work. All the texts at this site include a critical apparatus of variant readings, conjectural emendations, and so forth. Because this information is presented on-line, it is possible for interested users to select the readings that they prefer -- to vote, in effect, for the reading that they think should appear in a given passage. These votes can then be tabulated, and the reading receiving the most votes will appear in the Communal Text. Those who consult this version of the text must therefore do so on the understanding that it does not represent the final judgment of any single editorial expert, but the aggregate opinion of the community of users of the site, and that it is subject to change at any moment.&amp;quot;&lt;br /&gt;
** Personal text: &amp;quot;Through this menu item users can record their preferences and use them to establish the text that they habitually consult. Of course, it will be possible to use this feature in other ways as well. Someone who wanted to use this site but felt the need of a little extra editorial authority might simply enter into his or her text whatever readings are printed by his or her favorite editor. On the other hand, a group of scholars interested in constructing a text for some specific purpose might use this resource collaboratively. So might a class on Vergil or on textual criticism. No doubt other applications will be thought of as well.&lt;br /&gt;
** Text of a particular manuscript: &amp;quot;Through this feature it will be possible to see the text as it appears in any of the manuscripts whose readings have been entered into the database. If one were interested in the Palatinus, for example, a diplomatic transcript of that manuscript would (with secondary readings and corrections available via hypertext links). In some cases images are available as well, and we hope eventually to provide facsimiles of all the mss in the database.&amp;quot;&lt;br /&gt;
* Suda On Line is another oldie-but-goodie with strengths and weaknesses ([http://www.stoa.org/sol/])&lt;br /&gt;
* Virtual Humanities Lab at Brown University has been developing a system for collaborative annotation of literary texts. The guidelines for annotation (published here: [http://golf.services.brown.edu/projects/VHL/help/guidelines_annot.pdf]) are simple, the software is accessed through a regular web browser.&lt;br /&gt;
* Compare the proposed Homer Multitext:&lt;br /&gt;
&lt;br /&gt;
{quote}&amp;quot;An ideal edition of Homer would encompass the full historical reality of the Homeric textual tradition as it evolved through time, from the pre-Classical era well into the medieval. Our attempt to create such an edition is already underway. Instead of choosing between variants and plus verses in an attempt to recover the ipsissima verba of Homer, we propose to include them in a multitext edition that embraces the fluidity of the textual traditions of the Iliad and Odyssey. The ideal format for this multitext edition of Homer is not a traditional printed text but an electronic, web-based edition. Unlimited in its ability to handle complex sets of variants, an electronic multitext offers critical readers of Homer the opportunity to consider many historical Iliads and Odysseys from the standpoint of many different sources of transmission, and so also allows the user to recover both a more accurate and more accessible picture of the fluidity of the tradition in the earliest stages of textuality.&amp;quot;&lt;br /&gt;
{quote}&lt;br /&gt;
&lt;br /&gt;
* EDUCE: Ideally, this project needs a strategy for imposing editorial control over the resulting documents in a process that involves establishing the texts, encoding them with standard TEI-XML markup using newly available Open Source software tools, and then publishing the transcripts side-by-side with their associated images following Open Access protocols.&lt;br /&gt;
&lt;br /&gt;
1.1.1 3. Collaborative Editors&lt;br /&gt;
&lt;br /&gt;
Different types of collaborative editors (see Appendix for list of editors)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* synchronous vs. asynchronous. Synchronous editors work in &amp;quot;real-time&amp;quot;. Changes made by one editor are immediately visible to other editors. Asynchronous editors (including Writely, MediaWiki, and version management systems) synchronize working versions either automatically after-the-fact, or (in the case of version management systems) require users to update changes manually.&lt;br /&gt;
* text-only editing vs. image-based editing. Text-only editors, including most XML and word processing programs, focus solely on the editing of the text. Image-based editors (including the EPPT and the University of Victoria Image Markup Tool) provide simple methods for either incorporating images into editions, or building textual annotations onto images.&lt;br /&gt;
* XML editors vs. text-only editors. XML editors, for example oXygen or XMetal, provide support for building XML annotations into texts. The better editors include various other XML support: XPath searching, XSLT development for translation, DTD or schema development for validation.&lt;br /&gt;
* Problems with collaborative editing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Version control: Wiki, Subversion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Administration for collaborative editing has two main issues: version management and access control. Version management deals with the problems of simultaneous editing. When a user makes changes to a document, we must be prepared to combine those changes with other changes by editors working on the same document. Furthermore, it may be necessary to obtain an earlier version of a document for reference, or even to reverse part of a series of changes while leaving other edits in place. A version management system tracks the branching revisions of a document as it is updated by a number of individuals.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Access control sets limits on the documents an editor can modify (coarse-grained access control), and the types of changes he or she can make to those documents (fine-grained access control). Such a system allows a project administrator to delegate editing responsibilities in a controlled manner. Consider, for example, two scholars with different specialized knowledge who are collaborating on an editing project. One scholar studies language, and is responsible for editing the linguistic aspects of a particular text. Another scholar specializes in manuscript studies, and is responsible for describing aspects of the text within the context of a specific manuscript - the scribal handwriting, condition of the manuscript, etc. The document curator, then, can grant the textual editor access to update sets of markup for describing the language of the text, but not for describing information such as scribal handwriting and condition of the manuscript. Likewise, the manuscript scholar would have access to modify sets of markup for describing the manuscript, but not the language of the text. On the other hand, neither of these scholars would be able to modify administrative markup such as the document's headers. Fine-grained access control allows the administrator to enable both scholars to work simultaneously within their domains of expertise without compromising the integrity and control of the editorial process. The document curator or project coordinator creates a set of rules that specify the &amp;quot;shape&amp;quot; of modifications particular users are allowed to make. Then, when a user attempts to modify part of the document, those access control rules are compared to that part of a document; much like a key in a lock, if the &amp;quot;shape&amp;quot; of the rule matches the document, the lock opens and the change is permitted to go through.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Source code management (SCM) systems such as CVS and SVN have shown their ability to assist in collaborative maintenance of computer source code. SCMs allow programmers to maintain parallel branches of their source code, merging sets of changes from one branch to another. However, SCMs take a line-oriented approach to revision management; while this is ideal for computer source code, is not well suited to XML documents, where modifications usually follow the document's hierarchical structure. Furthermore, merging conflicting changes can be a complex process, and often must be dealt with before a user can commit their changes to a central repository. Finally, SCM systems support primarily coarse-grained access control, so that permission to modify part of a document implies permission to modify the entire document; fine-grained access control affords much more flexibility in organizing a collaborative editing project.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Editing needs are also not fully served by content management systems such as the open-source MediaWiki. This system, which underlies the highly successful Wikipedia collaborative encyclopedia project, has demonstrated its ability to handle collaborative editing at a massive scale. Support for access control, however, is quite limited, given the open-editing model of Wikipedia. While supervisors can &amp;quot;lock&amp;quot; documents to prevent them from being modified, it is difficult to limit access in a more complex fashion. Furthermore, although such systems typically support version management, the revisions of a document are treated as following a linear sequence. Such a model does not adequately capture the complexities of parallel changes, where an editor may modify a document unaware of changes being made by another editor to the same document.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
None of the existing systems is designed for a highly collaborative environment with large numbers of concurrent changes and with constant revision tracking. The &amp;quot;perfect&amp;quot; system would combine the version-tracking features of SCM, the scalability of collaborative content management systems, and the security and flexibility of fine-grained access control.&lt;br /&gt;
&lt;br /&gt;
1.1.1 4. Finding valid metrics for apportioning scholarly credit&lt;br /&gt;
&lt;br /&gt;
Few collaborative projects are prominantly describing their methods for crediting participation. For one good example, see the Tibetan and Himalayan Digital Library ([http://www.thdl.org/xml/showEssay.php?xml=/intro/participation.xml&amp;amp;amp;l=d1e650]).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* we need to harness the self-interest of scholars:&lt;br /&gt;
** but collaborative work is often incremental (with many small contributions over time). MediaWiki, with its version management system, does provide a way to track the contributions of individuals over time.&lt;br /&gt;
** peer-assessment may be feasible in some cases (as with assessments of Amazon reviews' helpfulness) but often the number of people involved may be very small, in a field like ours.&lt;br /&gt;
** SOL counts users' contributions as translators and editors but cannot provide any qualitative measure, so one person who provided only a single entry that is of very high quality may seem to have done little&lt;br /&gt;
&lt;br /&gt;
1.1.1 5. Conclusions? Future Directions?&lt;br /&gt;
&lt;br /&gt;
Web-based software would enable collaboration on image- and text-based electronic editions over the Internet, enabling geographically dispersed groups of humanists to collaborate on editions encompassing text, image, and annotations. Even the most tech-savvy humanist working in seclusion is familiar with the dangers of editing electronic files; it is far too easy to copy older versions of files over newer ones, or to accidentally overwrite text through a careless cut and paste. Multiple editors collaborating on the same project require even more coordination and effort to avoid the chance of accidental loss of information. Support management of the complex array of document versions that arise during the collaborative editing process, and by implementing fine-grained access control to documents. Version management would record the history of editors' changes to the electronic edition, allowing for both internal and public review of the status and progress of an electronic edition project. Fine-grained access control would allow project coordinators to delegate editing tasks to individual editors or groups, by limiting modifications to individual parts of a document and its markup. A convenient and flexible interface, running through a standard Internet browser, would allow the coordinator to easily define access-control policies. Tools should take advantage of accepted standards such as the Extensible Markup Language (XML) and the Text Encoding Initiative (TEI), as well as more subject-specific tools such as Epigraphic Documents in TEI XML (EpiDoc) and the Canonical Text Services (CTS) protocol. The community of researchers in the Humanities and Classics in particular would be well-served with a platform that provides the following functionalities:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# Users in diverse locations can simultaneously edit the same document, using a familiar web browser interface.&lt;br /&gt;
# The automatically managed history of editorial changes allows for merging and/or reverting selected changes without causing version conflicts.&lt;br /&gt;
# Coordinators can add the full advantage of collaboration to works-in-progress by importing existing sources without changing schemas or markup.&lt;br /&gt;
# The use of CTS enables uniform citations to electronic editions.&lt;br /&gt;
&lt;br /&gt;
1.1.1 Appendix: Overview of scholarship&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Will Wikipedia Mean the End Of Traditional Encyclopedias?&amp;quot; dialogue between Jimmy Wales and Dale Hoiberg, ~~Wall Street Journal Online~~, September 12, 2006, URL: [http://online.wsj.com/public/article/SB115756239753455284-A4hdSU1xZOC9Y9PFhJZV16jFlLM_20070911.html]&lt;br /&gt;
* &amp;quot;Britannica versus Wikipedia heads to the WSJ,&amp;quot; by Ken Fisher. ~~Arstechnica~~, September 12, 2006, URL: [http://arstechnica.com/news.ars/post/20060912-7726.html]&lt;br /&gt;
* &amp;quot;The Wiki That Edited Me,&amp;quot; by Ryan Singel. ~~Wired News~~, September 7, 2006, URL: [http://www.wired.com/news/technology/0,71737-0.html?tw=rss.index]&lt;br /&gt;
* &amp;quot;Puppy smoothies: Improving the reliability of open, collaborative wikis,&amp;quot; by Tom Cross. ~~First Monday~~, volume 11, number 9 (September 2006), URL: [http://firstmonday.org/issues/issue11_9/cross/index.html]&lt;br /&gt;
* &amp;quot;7 Things you should Know about Collaborative Editing,&amp;quot; EDUCAUSE [http://www.educause.edu/content.asp?page_id=666&amp;amp;amp;ID=ELI7009&amp;amp;amp;bhcp=1]&lt;br /&gt;
* &amp;quot;Undoing Actions in Collaborative Work,&amp;quot; [http://www.eecs.umich.edu/~aprakash/papers/prakash-knister-cscw92.pdf]&lt;br /&gt;
* &amp;quot;A Framework for Undoing Actions in Collaborative Systems,&amp;quot; [http://www.eecs.umich.edu/~aprakash/papers/undo-tochi94.pdf]&lt;br /&gt;
* &amp;quot;Fault-Tolerant Computing in Real-Time Collaborative Editing Systems&amp;quot; [http://www.cse.unl.edu/~xqin/research/ftrce.html]&lt;br /&gt;
* &amp;quot;Access Control in Collaborative Systems&amp;quot; [http://portal.acm.org/citation.cfm?id=1057977.1057979]&lt;br /&gt;
* &amp;quot;A Model for Semi-(a)Synchronous Collaborative Editing&amp;quot; [http://dret.net/biblio/reference/min93]&lt;br /&gt;
* &amp;quot;A Multimedia Desktop Collaboration System&amp;quot; [http://dret.net/biblio/reference/che92b]&lt;br /&gt;
* &amp;quot;A Proposed Model and Functionality Definition for a Collaborative Editing and Conferencing System&amp;quot; [http://dret.net/biblio/reference/lub90b]&lt;br /&gt;
* &amp;quot;A Survey of Experiences of Collaborative Writing,&amp;quot; pp. 87-112, In: Computer Supported Collaborative Writing, Mike Sharples (Ed.), Computer Supported Cooperative Work, Springer-Verlag, London, UK, Computer Supported Cooperative Work, 1993, ISBN 3540197826 [http://dret.net/biblio/reference/bec93]&lt;br /&gt;
* &amp;quot;Atomic Data Abstractions in a Distributed Collaborative Editing System&amp;quot; [http://dret.net/biblio/reference/gre85]&lt;br /&gt;
* &amp;quot;CoDoc: Multi-mode Collaboration over Documents&amp;quot; http://dret.net/biblio/reference/ign04 Engineering Library QA76.758 .C33 2004&lt;br /&gt;
* &amp;quot;Design and Implementation of a Distributed Program for Collaborative Editing&amp;quot; [http://dret.net/biblio/reference/sel86]&lt;br /&gt;
* &amp;quot;Designing a Distributed Collaborative Environment&amp;quot; [http://dret.net/biblio/reference/che92]&lt;br /&gt;
* &amp;quot;Flexible Diff-ing in a Collaborative Writing System&amp;quot; (Math Sciences Library HD66 .C563 1992) [http://dret.net/biblio/reference/neu92]&lt;br /&gt;
* &amp;quot;Using Web Annotations for Asynchronous Collaboration Around Documents,&amp;quot; pp. 309-318, In: David G. Durand (Ed.), Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, ACM Press, Philadelphia, Pennsylvania, December 2000 , ISBN 1-58113-222-0. [http://dret.net/biblio/reference/cad00] Engineering Library QA75.5 C65 2000&lt;br /&gt;
* The Wiki Way: Collaboration and Sharing on the Internet: [http://dret.net/biblio/reference/leu01]&lt;br /&gt;
* &amp;quot;The Collaborative Multi-User Editor Project Iris&amp;quot; [http://www11.informatik.tu-muenchen.de/publications/pdf/Koch1995.pdf]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*Resources:*&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See http://en.wikipedia.org/wiki/Collaborative_software for a good general discussion of collaborative software in general and [http://en.wikipedia.org/wiki/CSCW] for a definition of &amp;quot;computer-supported cooperative work&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~Existing Tools~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~synchronous~~ (see [http://en.wikipedia.org/wiki/Collaborative_real-time_editor]):&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
SubEthaEdit (MacOSX): [http://www.codingmonkeys.de/subethaedit/collaborate.html]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [http://www.macdevcenter.com/pub/a/mac/2003/12/02/rendezvous.html] (review)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ACE (platform independent): [http://ace.iserver.ch/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gobby (Linux, Windows, MacOSX): [http://gobby.0x539.de/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MoonEdit (Linux, Windows, FreeBSD): [http://moonedit.com/index.html.en]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
TeNDaX: [http://www.tendax.net/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chalk: http://blog.chalk.it/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
GroupSketch (a tool for synchronous collaborative sketching): [http://grouplab.cpsc.ucalgary.ca/papers/1992/92-GroupSketch-Video.CSCW/groupsketchvideo.pdf]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
GROVE, &amp;quot;a textual multi-user outlining tool&amp;quot;: Ellis, C., Gibbs, S. and Rein, G. (1990). Design and use of a group editor. In Cockton (Ed.), Engineering for Human-Computer Interaction. North-Holland.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ShrEdit, &amp;quot;a multi-user text editor&amp;quot;: L.J. McGuffin, and G.M. Olson: &amp;quot;ShrEdit: a shared electronic workspace,&amp;quot; CSMIL Technical Report #45, The University of Michigan, 1992.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
DistEdit, &amp;quot;a toolkit for implementing distributed group editors&amp;quot;: (Knister, M.J and Prakash, A. (1990): &amp;quot;DistEdit: A Distributed Toolkit for Supporting Multiple Group 'Editors&amp;quot;, Proceedings of CSCW '90, ACM 1990 Conference on Computer Supported Cooperative Work, Los Angeles, 1990)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~asynchronous:~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Writely: [http://www.writely.com/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
DocSynch: [http://docsynch.sourceforge.net/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And, of course, Wiki: [http://www.wiki.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~Backend~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
WebDAV (Web-based Distributed Authoring and Versioning; a set of extensions to the HTTP protocol which allows users to collaboratively edit and manage files on remote web servers): [http://www.webdav.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
IETF Delta-V Working Group (This working group will define extensions to HTTP and the WebDAV Distributed Authoring Protocol necessary to enable distributed Web authoring tools to perform, in an interoperable manner, versioning and configuration management of Web resources): [http://www.webdav.org/deltav/deltav-charter.html]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MATE (Multilevel Annotation Tools Engineering; aims to facilitate re-use of language resources by addressing the problems of creating, acquiring, and maintaining language corpora): [http://mate.nis.sdu.dk/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Plone: A user-friendly and powerful open source Content Management System (&amp;quot;ideal as an intranet and extranet server, as a document publishing system, a portal server and as a groupware tool for collaboration between separately located entities.&amp;quot;; supports XML (see [http://plone.org/documentation/tutorial/xml-in-plone-with-marshall/?searchterm=XML] and [http://pyxml.sourceforge.net/topics/] for more general Python-XML)): [http://plone.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
~~Plone is built using...~~&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zope (Z Object Publishing Environment; an open source application server for building content management systems, intranets, portals, and custom applications; Zope also supports XML (see [http://www.zope.org/Members/karl/ParsedXML/ParsedXML and http://www.zope.org/Members/haqa/XMLKit])): [http://www.zope.org/]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2003</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2003"/>
		<updated>2007-01-29T14:52:42Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
Open Content Scholarly Sources ----&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
:* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
:* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
:* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
:* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
:* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
:* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
:* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
:* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
:* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
:* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
:* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Next Generation Editions ----&lt;br /&gt;
&lt;br /&gt;
=Summary=&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
: * ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
: * ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
: * ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
: * ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
=Open Content Editions=&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
=Next Steps=&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
====Choice of source texts====&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
====Sources for Images of Print Editions====&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
====Components of next generation electronic editions====&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
====Editorial Conventions====&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2002</id>
		<title>OSCE Crane Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Crane_Paper&amp;diff=2002"/>
		<updated>2007-01-29T14:51:57Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works.  This paper outlines one strategy for starting with print editions and moving into a more purely digital stage. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are two components to this argument, both on the Perseus Development Wiki:&lt;br /&gt;
&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources&lt;br /&gt;
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions&lt;br /&gt;
&lt;br /&gt;
Open Content Scholarly Sources ----&lt;br /&gt;
&lt;br /&gt;
Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access.  The creators of the Google project and the Internet Archive have expressed a dedication to open access.  Open access also maximizes the potential audience and thus  reinforces the advertising based business model on which these internet giants have founded their library efforts.&lt;br /&gt;
&lt;br /&gt;
The funders, however, retain varying rights to their work.  Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software.  The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.&lt;br /&gt;
&lt;br /&gt;
We are therefore creating a completely open source library of core resources such as reference works and critical editions.  Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.  &lt;br /&gt;
&lt;br /&gt;
Our selection criteria differ from those of the print world.  A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change.  In a true digital library, documents can be dynamic and evolve in real time.  A recent encyclopedia will, presumably, be superior to another that is a century old.  But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.&lt;br /&gt;
&lt;br /&gt;
The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a sharalike/attribution/non-commercial Creative Commons license. It contains the following:&lt;br /&gt;
&lt;br /&gt;
:* Source texts of Greek and Latin:  We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML.  We have also digitized several hundred volumes of source texts.  These will be available as image books with searchable OCR and, where feasible, XML transcriptions.  Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.&lt;br /&gt;
&lt;br /&gt;
:* Lexica of Greek and Latin:  These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliff's Homeric Lexicon.&lt;br /&gt;
&lt;br /&gt;
:* Grammars:  These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.&lt;br /&gt;
&lt;br /&gt;
:* Commentaries:  These include scholarly editions as well as school commentaries with linguistic annotations.  Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.&lt;br /&gt;
&lt;br /&gt;
:* Tools:  These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words.  More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources,  and of potential use to third party morphological analysis systems.  All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and any developers.&lt;br /&gt;
&lt;br /&gt;
:* FRBR Catalog Records for source texts:  Large projects such as dictionaries and text corpora have developed checklists of editions which they have used.  We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin author) but provides an extensible architecture that can manage multiple editions, translations (e.g, English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), multiple citation schemes (e.g., sections vs. chapters in Cicero)..&lt;br /&gt;
&lt;br /&gt;
:* Authority lists of people, places, dictionary entries, organizations, etc.  The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places dictionary entries, organizations, etc.  While such authority lists are difficult -- experts may differ on which Sallust a particular passage designates and will never all agree on which when we have a dictionary word with two distinct meanings vs. two distinct dictionary words.  Nevertheless, all scholarly work depends upon the entries that appear in our reference works and electronic authority lists, however imperfect, are essential tools for large digital collections.&lt;br /&gt;
&lt;br /&gt;
Users include:&lt;br /&gt;
&lt;br /&gt;
:* Service providers:  we would like to see the data released useful to as many groups and in as many ways as possible.  Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environment such as Chicago's Philologic and the Canadian TAPOR project.&lt;br /&gt;
&lt;br /&gt;
:* Experts in the field:  we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.&lt;br /&gt;
&lt;br /&gt;
:* General students of the field:  we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.&lt;br /&gt;
&lt;br /&gt;
:* Advanced service developers:  we hope that developers will mine the encylopedias to drive their named entity identification systems (e.g., analyzer the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a  given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Next Generation Editions ----&lt;br /&gt;
&lt;br /&gt;
=Summary=&lt;br /&gt;
&lt;br /&gt;
We propose a new generation of primary source corpora that are:&lt;br /&gt;
&lt;br /&gt;
: * ''Permanent'':  The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.&lt;br /&gt;
&lt;br /&gt;
: * ''Openly accessible'':  Cultural heritage primary sources in the public domain should be openly accessible to all.  If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.&lt;br /&gt;
&lt;br /&gt;
: * ''Multi-versioned'':  The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.&lt;br /&gt;
&lt;br /&gt;
: * ''Paid for and maintained by academic libraries'':  While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry.  Libraries already pay for the production of digital resources by commercial, for-profit entitites, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.&lt;br /&gt;
&lt;br /&gt;
=Open Content Editions=&lt;br /&gt;
&lt;br /&gt;
The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus.  Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.&lt;br /&gt;
&lt;br /&gt;
While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.&lt;br /&gt;
&lt;br /&gt;
The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.&lt;br /&gt;
&lt;br /&gt;
Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.&lt;br /&gt;
&lt;br /&gt;
=Next Steps=&lt;br /&gt;
&lt;br /&gt;
* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emerge of digital libraries with millions of books with high quality page images.  Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.  &lt;br /&gt;
&lt;br /&gt;
* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.&lt;br /&gt;
&lt;br /&gt;
* ''Semantic markup'':  Markup should reflect meaning and not only appearence.&lt;br /&gt;
&lt;br /&gt;
* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.&lt;br /&gt;
&lt;br /&gt;
=Building a digital library of primary sources=&lt;br /&gt;
&lt;br /&gt;
The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [[http://www.thelatinlibrary.com the Latin Library]] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.&lt;br /&gt;
&lt;br /&gt;
We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such &amp;quot;image front&amp;quot; collections are quite effective:  narrative prose printed since the mid 19th century lends itself very well to commercial OCR. &lt;br /&gt;
&lt;br /&gt;
Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect.  Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors.  We want to be able to identify pieces of texts by standard citation (e.g., &amp;quot;Liv. 3.22&amp;quot; should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome. We also want text searches to be able to distinguish between primary text, textual notes and other annotations.&lt;br /&gt;
&lt;br /&gt;
The following describes an approach of adding structure to digital image books of primary sources. &lt;br /&gt;
&lt;br /&gt;
* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''':  Many classical texts are available on-line in at least one edition.  Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort:  half of the word forms in a book length document are generally unique.  By comparing sequences of unique word forms in pre-existing text and new OCR, we can align use these sequences to align two texts.  In our experiments, we have found that we can immediately align one word in ten.  We can then compare the intervening sequence (on the average nine words long) to identify variations.  Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes.  Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error.  If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes.  If we have extraneous numbers in the source texts, then these are probably citations.  Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse.  If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text.  If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes.  While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* '''Create an accurate, carefully marked up transcription of a print original''':  In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., &amp;quot;display places where M differs from P and provide a statistical analysis of how often these sources differ&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''':  The source text becomes the foundation multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably willl -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.&lt;br /&gt;
&lt;br /&gt;
====Choice of source texts====&lt;br /&gt;
&lt;br /&gt;
If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions, In this model, however, we need to balance the authority of the source text against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition. &lt;br /&gt;
&lt;br /&gt;
The two extreme cases are:&lt;br /&gt;
&lt;br /&gt;
* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''.  Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.&lt;br /&gt;
&lt;br /&gt;
* '''Editions that are are designed to accept -- and document -- new witnesses and editorial decisions'''.  In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.&lt;br /&gt;
&lt;br /&gt;
In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition than to supervise the markup of their own print editions or the word processing files from which they derive. &lt;br /&gt;
&lt;br /&gt;
====Sources for Images of Print Editions====&lt;br /&gt;
&lt;br /&gt;
* '''Local book scanning''':  A number of institutions (including Perseus) can scan limited numbers of books.  Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.&lt;br /&gt;
&lt;br /&gt;
* '''Large book scanning projects''':  There are now a number of projects that are scanning very large numbers of books.  [[http://books.google.com/ Google Print]] has begun assembling a library that will include tens of millions of books.  Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [[http://www.opencontentalliance.org/ Open Content Alliance]] constitutes a growing consortium of content providers and third party service providers.  Led by the [[http://www.archive.org Internet Archive]], the OCA has begun making high resolution image books available and is providing [[http://www.archive.org/details/texts a clearing house for related efforts]] such as the [[http://www.archive.org/details/millionbooks Million Book Project]]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need to get high volume to justify this enconomically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investement alone exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to  15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.&lt;br /&gt;
&lt;br /&gt;
In practice, editors interested in a few authors can get their source materials scanned at a variety of locations.  Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place.  The California Digital Library, which serves the UC system, and the University of Toronto were early on partners in OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [[http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005]] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.&lt;br /&gt;
&lt;br /&gt;
====Components of next generation electronic editions====&lt;br /&gt;
These editions will have the following components:&lt;br /&gt;
&lt;br /&gt;
* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point.  All such texts should, of course, have OCR generated searchable text.  If the original source texts have page numbers, then these should be encoded and citable.&lt;br /&gt;
&lt;br /&gt;
* '''A flexible editing environment which allows user  communities to improve the current document''':  Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores.  Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given &amp;quot;Salamis&amp;quot; is the Salamis in Cyprus rather than near Athens, or indicating that &amp;quot;faciam&amp;quot; is a subjunctive rather than a future, etc.).  Examples of decentralized editing environments that link transcriptions with images of the source pages include [[http://www.pgdp.net/ Distributed Proofreaders]] program of [[http://www.gutenberg.org/ Project Gutenberg]] and the [[http://www.ccel.org/help/facsim/ Digital Facsimile Editions]] of the [[http://www.ccel.org/ Christian Classics Ethereal Library]] ,&lt;br /&gt;
&lt;br /&gt;
* '''A tagged transcript of one or more print editions''':  This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idioyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to &amp;quot;page 132 line 18&amp;quot; or &amp;quot;chapter 44, line 8&amp;quot;, these expressions should be converted into active links. Textual notes should appear as simple notes and placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [[http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html | Camena project]] &lt;br /&gt;
&lt;br /&gt;
* '''Fully interpreted electronic version of the print text''':  While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.  &lt;br /&gt;
&lt;br /&gt;
* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text.  Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc.  Like editions, translations should be available both as image books so that readers can, when in doubt, consult the print originals.&lt;br /&gt;
&lt;br /&gt;
The fully interpreted electronic edition should then provide a starting for subsequent edits. The text could evolve in a variety of ways.&lt;br /&gt;
&lt;br /&gt;
* '''Systematic collations''':  Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings).  All additions must be transparent: thus, we cannot record new readings without providing their jusification.  We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.&lt;br /&gt;
&lt;br /&gt;
* '''Coordination of edition, textual notes and at least one reference translation''':  We may have multiple translations reflecting multiple editions of a given work but we should have at least one edition that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation.  Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different version.&lt;br /&gt;
&lt;br /&gt;
A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the  but the above represents a basic level of documentation towards which producers should, in our view, aim.&lt;br /&gt;
&lt;br /&gt;
====Editorial Conventions====&lt;br /&gt;
&lt;br /&gt;
* '''Changes from the source text to the transcription''':  The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.&lt;br /&gt;
&lt;br /&gt;
* '''Markup stylesheet''':  The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., &amp;lt;name type=&amp;quot;place&amp;quot;&amp;gt;Rome&amp;lt;/name&amp;gt; or &amp;lt;placeName&amp;gt;Rome&amp;lt;/placeName&amp;gt;. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time.  Perseus has evolved conventions of its own over time, but even within Perseus different projects has approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2001</id>
		<title>OSCE Dunn Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2001"/>
		<updated>2007-01-29T14:50:58Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;1 e-Science and the critical edition: a discussion paper&lt;br /&gt;
&lt;br /&gt;
1.1 Stuart Dunn and Tobias Blanke&lt;br /&gt;
&lt;br /&gt;
1.1.1 Arts and Humanities e-Science Support Centre, King's College London&lt;br /&gt;
&lt;br /&gt;
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.&lt;br /&gt;
&lt;br /&gt;
Several e-Science initiatives in the UK are promoting to advance research work in virtual spaces with advanced computing - in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communications in 'virtual organizations' are developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&amp;amp;H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings is combined with the ability to share instantly digital items among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities due to improved technologies complement new highly collaborative research.&lt;br /&gt;
&lt;br /&gt;
E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be everything from processing power, data, or expertise that researchers can share. This networked infrastructure allows a culture of collaboration, in which new forms of collaboration can emerge, and new and advanced methodologies can be explored.&lt;br /&gt;
&lt;br /&gt;
A key to the success of e-Science is the provision of shared access to research facilities and therefore to provide answers to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were collocated. Digital knowledge objects shall be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships as between server to server and programme to programme and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&amp;amp;H) researchers are going to be doing over the next ten years; and will fundamentally alter their relationship with the resources they use. &lt;br /&gt;
&lt;br /&gt;
Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, and an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate of the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.&lt;br /&gt;
&lt;br /&gt;
1.1.1 Some key questions are:&lt;br /&gt;
&lt;br /&gt;
* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?&lt;br /&gt;
* Do users need such editions? Will they ever trust them?&lt;br /&gt;
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all? &lt;br /&gt;
* How should version control be maintained?&lt;br /&gt;
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?&lt;br /&gt;
* What kind of peer-review process needs to be in place? &lt;br /&gt;
* How should cataloguing, referencing and citation of such documents be approached?&lt;br /&gt;
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?&lt;br /&gt;
&lt;br /&gt;
[[Category:OSCE]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2000</id>
		<title>OSCE Dunn Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=2000"/>
		<updated>2007-01-29T14:49:58Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
1 e-Science and the critical edition: a discussion paper&lt;br /&gt;
&lt;br /&gt;
1.1 Stuart Dunn and Tobias Blanke&lt;br /&gt;
&lt;br /&gt;
1.1.1 Arts and Humanities e-Science Support Centre, King's College London&lt;br /&gt;
&lt;br /&gt;
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.&lt;br /&gt;
&lt;br /&gt;
Several e-Science initiatives in the UK are promoting to advance research work in virtual spaces with advanced computing - in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communications in 'virtual organizations' are developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&amp;amp;H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings is combined with the ability to share instantly digital items among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities due to improved technologies complement new highly collaborative research.&lt;br /&gt;
&lt;br /&gt;
E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be everything from processing power, data, or expertise that researchers can share. This networked infrastructure allows a culture of collaboration, in which new forms of collaboration can emerge, and new and advanced methodologies can be explored.&lt;br /&gt;
&lt;br /&gt;
A key to the success of e-Science is the provision of shared access to research facilities and therefore to provide answers to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were collocated. Digital knowledge objects shall be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships as between server to server and programme to programme and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&amp;amp;H) researchers are going to be doing over the next ten years; and will fundamentally alter their relationship with the resources they use. &lt;br /&gt;
&lt;br /&gt;
Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, and an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate of the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.&lt;br /&gt;
&lt;br /&gt;
1.1.1 Some key questions are:&lt;br /&gt;
&lt;br /&gt;
* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?&lt;br /&gt;
* Do users need such editions? Will they ever trust them?&lt;br /&gt;
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all? &lt;br /&gt;
* How should version control be maintained?&lt;br /&gt;
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?&lt;br /&gt;
* What kind of peer-review process needs to be in place? &lt;br /&gt;
* How should cataloguing, referencing and citation of such documents be approached?&lt;br /&gt;
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=1999</id>
		<title>OSCE Dunn Paper</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=OSCE_Dunn_Paper&amp;diff=1999"/>
		<updated>2007-01-29T14:49:47Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[OSCE index&amp;gt;Main.osce] | [OSCE programme&amp;gt;programme]&lt;br /&gt;
&lt;br /&gt;
1 e-Science and the critical edition: a discussion paper&lt;br /&gt;
&lt;br /&gt;
1.1 Stuart Dunn and Tobias Blanke&lt;br /&gt;
&lt;br /&gt;
1.1.1 Arts and Humanities e-Science Support Centre, King's College London&lt;br /&gt;
&lt;br /&gt;
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.&lt;br /&gt;
&lt;br /&gt;
Several e-Science initiatives in the UK are promoting to advance research work in virtual spaces with advanced computing - in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communications in 'virtual organizations' are developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&amp;amp;H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings is combined with the ability to share instantly digital items among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities due to improved technologies complement new highly collaborative research.&lt;br /&gt;
&lt;br /&gt;
E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be everything from processing power, data, or expertise that researchers can share. This networked infrastructure allows a culture of collaboration, in which new forms of collaboration can emerge, and new and advanced methodologies can be explored.&lt;br /&gt;
&lt;br /&gt;
A key to the success of e-Science is the provision of shared access to research facilities and therefore to provide answers to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were collocated. Digital knowledge objects shall be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships as between server to server and programme to programme and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&amp;amp;H) researchers are going to be doing over the next ten years; and will fundamentally alter their relationship with the resources they use. &lt;br /&gt;
&lt;br /&gt;
Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, and an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate of the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.&lt;br /&gt;
&lt;br /&gt;
1.1.1 Some key questions are:&lt;br /&gt;
&lt;br /&gt;
* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?&lt;br /&gt;
* Do users need such editions? Will they ever trust them?&lt;br /&gt;
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all? &lt;br /&gt;
* How should version control be maintained?&lt;br /&gt;
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?&lt;br /&gt;
* What kind of peer-review process needs to be in place? &lt;br /&gt;
* How should cataloguing, referencing and citation of such documents be approached?&lt;br /&gt;
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Vindolanda_Tablets_Online&amp;diff=1830</id>
		<title>Vindolanda Tablets Online</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Vindolanda_Tablets_Online&amp;diff=1830"/>
		<updated>2006-11-24T15:40:53Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Vindolanda Tablets Online ===&lt;br /&gt;
&lt;br /&gt;
URL: &amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;http://vindolanda.csad.ox.ac.uk/&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
This online edition of the Vindolanda writing tablets, excavated from the Roman fort at Vindolanda in northern England, includes the following elements:&lt;br /&gt;
&lt;br /&gt;
* Tablets - a searchable online edition of the tablets (volumes I and II)&lt;br /&gt;
* Exhibition - an introduction to the tablets and their context&lt;br /&gt;
* Reference - a guide to aspects of the tabletsï¿½ content&lt;br /&gt;
* Help - navigation and using the site&lt;br /&gt;
&lt;br /&gt;
Also available are highlights from the tablets.&lt;br /&gt;
&lt;br /&gt;
The website is part of the Script, Image and the Culture of Writing in the Ancient World programme, supported by the Andrew W. Mellon Foundation. It is a collaborative project between the Centre for the Study of Ancient Documents and the Academic Computing Development Team, Oxford University.&lt;br /&gt;
&lt;br /&gt;
Scholarly publications should refer to this site as:&lt;br /&gt;
&lt;br /&gt;
Vindolanda Tablets Online &amp;lt;span class=&amp;quot;nobr&amp;quot;&amp;gt;http://vindolanda.csad.ox.ac.uk/&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Feedback: if you are using Vindolanda Tablets Online for teaching, research or general interest, please send us your comments on the site.&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Suda_Online&amp;diff=1829</id>
		<title>Suda Online</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Suda_Online&amp;diff=1829"/>
		<updated>2006-11-24T15:40:42Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Suda Online (SOL) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;http://www.stoa.org/sol/&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
Certain fundamental sources for the study of the ancient world are currently accessible only to a few specially trained researchers because they have never been provided with a sufficiently convenient interpretive apparatus or, in some cases, even translated into modern languages. The Suda On Line project attacks that inaccessibility by engaging the efforts of scholars world-wide in the translation and annotation of a substantial text that is being made available exclusively through the internet. We have chosen to begin with the Byzantine encyclopedia known as the Suda, a 10th century CE compilation of material on ancient literature, history, and biography. A massive work of about 30,000 entries, and written in sometimes dense Byzantine Greek prose, the Suda is an invaluable source for many details that would otherwise be unknown to us about Greek and Roman antiquity, as well as an important text for the study of Byzantine intellectual history.&lt;br /&gt;
&lt;br /&gt;
Begun in January of 1998, the Suda On Line (SOL) already involves the efforts of over one hundred scholars throughout the world. The goal of the project is to assemble an xml-encoded database, searchable and browsable on the web, with continuously improved annotations, bibliographies and hypertextual links to other electronic resources in addition to the core translation of entries in the Suda. Individual work becomes available on the web as soon as possible, with the minimum necessary initial proofreading and editorial oversight. A large pool of registered editors is empowered to alter and improve the materials in the database continuously as they see fit. The display of each entry includes an indication of the level of editorial scrutiny it has received. We mean to encourage the greatest possible participation in the project and the smallest possible delay in presenting a high quality resource to a wide public readership.&lt;br /&gt;
&lt;br /&gt;
Our goal is not only to provide the SOL as a useful tool for researchers, but also to explore and facilitate the modes of scholarship now made possible by open source technology and the internet: the result will be a scholarly effort that is cooperative rather than solitary, communal rather than proprietary, worldwide rather than localized, evolving rather than static. Accordingly our work aims at two concrete results: in addition to our development of the Suda On Line itself as a respectable scholarly resource, we want to make a generalized, well-documented version of our software freely available for other collaboration-minded scholars to adapt for their own purposes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span class=&amp;quot;wikilink&amp;quot;&amp;gt;[../Main/Projects.html Projects]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=POxy_Oxyrhynchus_Online&amp;diff=1828</id>
		<title>POxy Oxyrhynchus Online</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=POxy_Oxyrhynchus_Online&amp;diff=1828"/>
		<updated>2006-11-24T15:40:26Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Oxyrhynchus Papyri Project (POxy: Oxyrhynchus Online) ===&lt;br /&gt;
&lt;br /&gt;
URL: &amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;http://www.papyrology.ox.ac.uk/&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
The Oxyrhynchus Papyri Project is putting online the corpus of papyri excavated from Oxyrhynchus (Al-Bashnasa in Egypt) by Bernard Grenfell and Arthur Hunt from 1897. The Project has an online table of contents for volumes 1-70 of the Oxyrhynchus Papyri. The table of contents can be navigated by volume number or papyrus number. Digital images of the papyri are currently available from volume 47 onwards. Images are available as 150 dpi resolution for all online papyri with an increasing number also available with a resolution of 300 dpi. Each papyrus record includes location information, editorial details, and notes. The Project's Web site also includes an introduction to Oxyrhynchus and the excavations; details of how the papyri were digitized; and the online version of the exhibition, 'Oxyrhynchus: A City and its Texts' (Ashmolean, 1998).&lt;br /&gt;
&lt;br /&gt;
(source: &amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;[http://www.humbul.ac.uk/output/full2.php?id=1023 Humbul Humanities Hub]&amp;lt;/span&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Opentext&amp;diff=1827</id>
		<title>Opentext</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Opentext&amp;diff=1827"/>
		<updated>2006-11-24T15:40:10Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== OpenText.org ===&lt;br /&gt;
&lt;br /&gt;
URL: &amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;[http://www.opentext.org/ http://www.opentext.org]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
The OpenText.org project is a web-based initiative to develop annotated Greek texts and tools for their analysis. The project aims both to serve, and to collaborate with, the scholarly community. Texts are annotated with various levels of linguistic information, such as text-critical, grammatical, semantic and discourse features.&lt;br /&gt;
&lt;br /&gt;
Beginning with the New Testament, the project aims to construct a representative corpus of Hellenistic Greek to facilitate linguistic and literary research of these important documents. These texts are then annotated through the addition of linguistic and literary features (including marking morphological, syntactical and discourse elements) following a comprehensive model currently under development. The resulting texts can be viewed and searched on this site. It is hoped that interested users will collaborate in the correction and enhancement of this annotation, and become involved in the annotation process themselves.&lt;br /&gt;
&lt;br /&gt;
The key features of the project are:&lt;br /&gt;
&lt;br /&gt;
* texts annotated at distinct linguistic levels&lt;br /&gt;
* the use of an XML encoding scheme to mark-up texts&lt;br /&gt;
* an 'open' and collaborative approach to encourage the annotation and use of texts&lt;br /&gt;
* an on-line tool kit to allow searching and analysis of texts&lt;br /&gt;
* a forum to allow the exchange of ideas and to respond to requests for specific searches&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span class=&amp;quot;wikilink&amp;quot;&amp;gt;[../Main/Projects.html Projects]&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=The_Oath_in_Archaic_and_Classical_Greece&amp;diff=1826</id>
		<title>The Oath in Archaic and Classical Greece</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=The_Oath_in_Archaic_and_Classical_Greece&amp;diff=1826"/>
		<updated>2006-11-24T15:39:47Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== The Oath in Archaic and Classical Greece ===&lt;br /&gt;
&lt;br /&gt;
* 2004-2007&lt;br /&gt;
* A research project funded by the Leverhulme Trust&lt;br /&gt;
* Director: Professor A.H. Sommerstein&lt;br /&gt;
&lt;br /&gt;
The oath was an institution of fundamental importance across an enormously wide range of social interactions throughout the ancient Greek world, its binding force one of the most important contributions of religion to social stability and harmony. For this reason, oaths are uttered, prescribed, or referred to in almost every kind of literary or inscriptional text we have from archaic and classical Greece, and a comprehensive study of the subject requires a survey covering all these texts.&lt;br /&gt;
&lt;br /&gt;
The project team for &amp;quot;The Oath in Classical Greece&amp;quot; consists of Professor Sommerstein and two research fellows, Dr Andrew Bayliss and Dr Isabelle Torrance, appointed for a three-year term commencing in September 2004.&lt;br /&gt;
&lt;br /&gt;
The objectives of the project are:&lt;br /&gt;
&lt;br /&gt;
* To create a database including all references to oaths in Greek texts of all kinds from the archaic and classical periods (i.e. down to 322 BC); when complete, the database would be made publicly available via the internet.&lt;br /&gt;
* To analyse and interpret this evidence, in stages as it is collected, and present the results in seminar and conference papers, in articles and eventually in a co-authored monograph on the nature, employment and functions of oaths in archaic and classical Greek societies.&lt;br /&gt;
&lt;br /&gt;
The cutoff date of 322 BC (coinciding with the death of Aristotle, the last writings of the Attic orators, and the end of the classical Athenian democracy) was chosen because at about that date there are fundamental changes in the geographical extent of the Greek-speaking world, its ethnic and cultural composition, its political organization and the nature of the available evidence.&lt;br /&gt;
&lt;br /&gt;
There has been no comprehensive, dedicated scholarly study of the oath in ancient Greek society since Rudolf Hirzel's Der Eid (1902), and during the century since then much new evidence has become available and the study of society, ancient and modern, has been revolutionized. Information technology has now made it possible to carry out a complete survey of the evidence far faster and more efficiently than had previously been practicable, and the project is therefore centred on the creation of an electronic database, which will greatly ease the identification of significant correlations, variations and developments, and can be expected to illuminate such significant issues as the following:&lt;br /&gt;
&lt;br /&gt;
* Which ancient Greek social institutions were typically thought to require oaths (with or without additional sanctions) to ensure their proper functioning, and which were not?&lt;br /&gt;
* To what extent did oath practices vary with time or place within the Greek world?&lt;br /&gt;
* To what extent did oath practices, and the persuasive effect of an oath, vary according to the gender and/or status (e.g. citizen/foreigner, free/slave) of the swearer?&lt;br /&gt;
* Did the oath practices of the imaginary worlds created by poets differ from those of the world in which they and their audiences actually lived?&lt;br /&gt;
* Is there any evidence that might indicate whether, from the mid/ late fifth century BC, when traditional religious and ethical beliefs were being widely contested in intellectual circles, oaths came to be regarded as less securely reliable than formerly?&lt;br /&gt;
* To what extent were the brief oath-like expressions common in conversation (usually translatable as &amp;quot;yes/no, by [name of god]&amp;quot;) regarded as having the binding force of a true oath?&lt;br /&gt;
&lt;br /&gt;
The database will be founded on a corpus comprising all texts in Greek, whether inscriptional or literary, that were certainly or probably written between the introduction of alphabetic writing and 322 BC. All references (explicit or by necessary implication) to oaths and swearing will be identified, and for each such reference a record will be created. Where the reference is to an oath taken, tendered or offered on a specific occasion, or prescribed to be taken or tendered under specific circumstances, the record will comprise the following fields:&lt;br /&gt;
&lt;br /&gt;
* source reference&lt;br /&gt;
* category (literary, subliterary or inscriptional)&lt;br /&gt;
* subcategory (genre of literature, type of inscription, etc.)&lt;br /&gt;
* date of source&lt;br /&gt;
* provenance of source (if literary, this means domicile of author)&lt;br /&gt;
* whether oath is set in a historical or a fictitious context&lt;br /&gt;
* date or occasion of oath (if the passage refers to a single occasion)&lt;br /&gt;
* circumstances in which oath taken/tendered (if it was prescribed in those circumstances by law or custom)&lt;br /&gt;
* place&lt;br /&gt;
* person or authority proposing oath&lt;br /&gt;
* person(s) taking, or asked to take, oath&lt;br /&gt;
* (if oath was volunteered by swearer) person to whom addressed (&amp;quot;swearee&amp;quot;)&lt;br /&gt;
* what the swearer was asked, or offered, to affirm or promise&lt;br /&gt;
* god(s) or other powers invoked&lt;br /&gt;
* linguistic formula marking utterance as oath&lt;br /&gt;
* consequences (if any) attached to taking oath&lt;br /&gt;
* consequences (if any) attached to refusal to take oath&lt;br /&gt;
* rewards specified for keeping oath&lt;br /&gt;
* punishments specified for breaking oath&lt;br /&gt;
* special sanctifying circumstances (location, sacrifice, etc.)&lt;br /&gt;
* (if referring to a single occasion) whether oath was taken or refused&lt;br /&gt;
* (if referring to a single occasion) effect of oath on behaviour or attitudes of others&lt;br /&gt;
* (if referring to a single occasion) whether oath was kept or (disputably or indisputably) broken&lt;br /&gt;
* (if oath broken) recorded consequences, if any&lt;br /&gt;
* further remarks&lt;br /&gt;
&lt;br /&gt;
There will be an annex to the database consisting of retrospective passages in sources later than 322 BC referring to oaths taken before that date; many of these statements are undoubtedly derived from pre-322 texts, and some are of high importance, but they must be kept separate from the main database because the risk cannot be excluded that they may be, as it were, contaminated by the cultural milieu of the later author.&lt;br /&gt;
&lt;br /&gt;
The database, which will be an Access or MySQL relational database, will be created by the University of Nottingham's R&amp;amp;amp;NT (Research and New Technologies) Database Team with the assistance of the University's Humanities Technology Officer, who will also provide the project team with any training they may need to use the database, as well as monitoring and managing its development over the course of the project.&lt;br /&gt;
&lt;br /&gt;
The database will be created in stages, according to type of source, the staging being so planned that well-defined bodies of evidence would become available for analysis and interpretation fairly early in the process. Thereafter analytical and interpretative work will proceed alongside the expansion of the database.&lt;br /&gt;
&lt;br /&gt;
Once fully populated with data, the database will be provided with an interface, including URL, HTML code and PHP scripts, that will allow it to be made accessible and effectively searchable via the internet. The resulting website will be hosted by the University of Nottingham, and deposit with the Arts and Humanities Data Service will be negotiated also. This final stage in the development of the database will not only make it available to the wider scholarly community but will also greatly facilitate the process of analysis and interpretation in the later stages of the project.&lt;br /&gt;
&lt;br /&gt;
Next to the database itself, the most important outcome of the project will be a monograph, co-authored by Professor Sommerstein and the two research fellows, on the oath in archaic and classical Greek society. This will probably consist of three main parts, Part I discussing the nature and functions of oaths in the Greek world in general terms, Part II their specific uses within polis communities and in inter-state relations, Part III their exploitation in key genres of creative literature. It is hoped that a provisional version of Parts II and III will be complete by the end of the project period, but much of the writing of Part I and revision of the remainder would need to be done after the end of the period, with a target completion date of 2009.&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Leuven_Database_of_Ancient_Books&amp;diff=1825</id>
		<title>Leuven Database of Ancient Books</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Leuven_Database_of_Ancient_Books&amp;diff=1825"/>
		<updated>2006-11-24T15:39:18Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Leuven Database of Ancient Books (LDAB) ===&lt;br /&gt;
&lt;br /&gt;
URL: &amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;http://ldab.arts.kuleuven.ac.be/&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
LDAB attempts to collect the basic information on all ancient literary texts, as opposed to documents. The user can find the oldest preserved copies of each text as well as a view of the reception of ancient literature throughout the Hellenistic, Roman and Byzantine period.&lt;br /&gt;
&lt;br /&gt;
LDAB is a FileMaker 5.5 database, running on a Mac OS X 10.2 platform.&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
	<entry>
		<id>https://wiki.digitalclassicist.org/index.php?title=Epigraphic_Database_Heidelberg&amp;diff=1824</id>
		<title>Epigraphic Database Heidelberg</title>
		<link rel="alternate" type="text/html" href="https://wiki.digitalclassicist.org/index.php?title=Epigraphic_Database_Heidelberg&amp;diff=1824"/>
		<updated>2006-11-24T15:39:01Z</updated>

		<summary type="html">&lt;p&gt;NotisToufexis: /* Concept */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Epigraphische Datenbank Heidelberg (EDH) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span class=&amp;quot;wikiexternallink&amp;quot;&amp;gt;http://www.uni-heidelberg.de/institute/sonst/adw/edh/index.html&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Director: Prof. Dr. Dr. h.c. mult. Gï¿½za Alfï¿½ldy&lt;br /&gt;
&lt;br /&gt;
=== Concept ===&lt;br /&gt;
&lt;br /&gt;
(from the EDH web-site)&lt;br /&gt;
&lt;br /&gt;
The aim of the project Epigraphic Database Heidelberg (EDH) is to integrate Latin inscriptions from all parts of the Roman Empire into an extensive database. Since 2004 Greek inscriptions from the same chronological timespan are also being entered. It consists of three databases the Epigraphic database, the Epigraphic Bibliography and the Photographic Database. It exists at an international level alongside other database projects, which serve as a working tool for the swift and simple collection, viewing, supplementing and interdisciplinary analysis of epigraphic material. Furthermore it is possible to the create KWIC indices and to combine the stored information as freely as possible&lt;br /&gt;
&lt;br /&gt;
At present, the Epigraphic database contains over 36.000 inscriptions and thus includes most of the especially noteworthy inscriptions published outside the main editions. In contrast to similar projects, the database presents revised and often corrected versions. Control of this sort is above all necessary in the case of earlier publications, which do not fulfill the standards of modern textual editorial practice. Moreover, the database is not confined to the mere texts, but links them to all the available bibliographical data and information on the inscriptions proper and on the monuments or objects they are inscribed upon. Time-consuming though it is, by means of this method of working the database meets high scholarly demands.&lt;br /&gt;
&lt;br /&gt;
[[category:Projects]]&lt;/div&gt;</summary>
		<author><name>NotisToufexis</name></author>
	</entry>
</feed>