Help:Editing

2010-06-03T18:30:25Z

NotisToufexis:

Diogenes

2007-08-24T08:48:18Z

NotisToufexis: /* Description */

=== Download ===

* http://www.dur.ac.uk/p.j.heslin/Software/Diogenes/index.php

=== Description ===

Diogenes is a tool for searching and browsing the databases of ancient texts, primarily in Latin and Greek, that are published by the [http://www.tlg.uci.edu/ Thesaurus Linguae Graecae] and the [http://www.packhum.org/ Packard Humanities Institute]. It is free software: you are encouraged to modify, improve, and redistribute it under the terms of the [http://www.dur.ac.uk/p.j.heslin/Software/Diogenes/license.php GNU General Public license].

The goal of this software package is to provide a free, transparent and flexible interface to the classical databases on CD-Rom in the PHI format, which include the TLG, the PHI corpus of Latin texts up to AD 200, the Duke Documentary Papyri collection, and the PHI-sponsored corpora of ancient inscriptions.

* The latest version of Diogenes is much easier to install for Mac (OS X 10.3 or higher), Windows, and Linux.

[[Category:Tools]]

Concording Greek and Latin texts

2007-08-24T08:45:40Z

NotisToufexis: /* Concording a Greek text */

==Concording a Latin text==
A nice, free tool for quick concordances of one's own collection of Latin texts (or a single text) is [http://www.textworld.com/scp/ Simple Concordance Program] by Alan Reed. Easy to learn, easy to use. In SCP you can design your own alphabets, so you can also use it, e. g., for Greek texts in Betacode.

If you need a repository of Latin texts to concord, texts somewhat more carefully proofread than those at the Latin Library can be found at the [http://neptune.fltr.ucl.ac.be/corpora/ Itinera electronica], courtesy of the Université catholique de Louvain, Belgium. Itinera electronica have their own [http://neptune.fltr.ucl.ac.be/corpora/corpora.htm online concording facilities] as well.

Another valuable tool (if you bring your own texts to it) is [http://portal.tapor.ca/portal/portal TAPoR], Text Analysis Portal for Research (a project based at McMaster University, and consisting of a network of six of the leading Humanities computing centres in Canada).

==Concording a Greek text==

A nice, free and cross-platform tool for concording Greek texts encoded in Unicode is Laurence Anthony's AntConc, available at http://www.antlab.sci.waseda.ac.jp/software.html.

[[category:FAQ]]

Concording Greek and Latin texts

2007-08-24T08:43:41Z

NotisToufexis: /* Concording a Greek text */

NotisToufexis:

=Response to Neel Smith=

It is easy to agree with Neel Smith's proposed method of moving from a conceptual model to the analysis of its distinctive features, and to then translate the resulting functional requirements into technical ones. Equally agreeable is the simplicity of his proposal: reliance on well-established technical protocols - HTTP as the transport mechanism, XML for service requests and replies - in order to provide basic citational functionality - on different levels of granularity - and indexing services. The crux of his presentation, it seems to me, is, at least for our purposes, his advocacy of an agreement on "the meaning of standard values", i.e. an ontology to serve as the shared matrix of our disciplinary discourse venturing into the digital era.

In this brief response I would like to raise two of the more complex issues relating to his proposal by asking two sets of questions:

===Time===

Neel Smith recognises the necessity of publications - particularly scholarly publications - to adhere to both "permanence and citability" - the latter requiring the former. Yet, is it not one of the advantages of digital publications that they can be revised easily - corrected, even developed - without the wait for the edition to be out of print? And is this feature of impermanence not even more desirable in a collaborative framework, making the resource richer as it is moulded by subsequent generations of collaborators? How, then, do we refer to a work-in-progress in an authoritative way, how do we point at a moving object?

===Essence===

In addition to the complex web of entities created by the publication of manuscripts and printed works, what new entities is the digital medium supplementing and how should we deal with them?

Beyond a mere transcription of a 'text', we can now have texts intimately interwoven with layers of markup - some encoding the same features, but differing, others adding further features, and still others combining any number of them. Should we distinguish these from the base text and/or from each other?

Furthermore, we can have a plurality of 'editions' coming out of the same repository, catering for different interpretive needs. Again: should we - and if so: how? - distinguish archival and presentational layers?

Concerning his concrete proposal a further related question must be asked: does a hierarchical model cater for all these - old and new - categories?

[[Category:OSCE]]

2007-01-29T14:51:57Z

NotisToufexis:

We need a comprehensive library of initial editions, openly accessible and freely available for re-use in derivative works. This paper outlines one strategy for starting with print editions and moving into a more purely digital stage.

There are two components to this argument, both on the Perseus Development Wiki:

http://devwiki.perseus.tufts.edu/wiki/Open_Content_Scholarly_Sources
http://devwiki.perseus.tufts.edu/wiki/Next_generation_electronic_editions

Open Content Scholarly Sources ----

Google, Microsoft, Yahoo and other internet giants are now creating digital libraries designed to become more comprehensive than any academic library in human history. The current philosophy of these efforts stresses open access. The creators of the Google project and the Internet Archive have expressed a dedication to open access. Open access also maximizes the potential audience and thus reinforces the advertising based business model on which these internet giants have founded their library efforts.

The funders, however, retain varying rights to their work. Google, for example, has now made available full PDF image books of public domain documents but it asserts proprietary rights over the page images and does not allow third parties to apply their own OCR or document recognition software. The Open Content Alliance in principle encourages its partners to share everything but individual funders can impose their own restrictions on what they submit to OCA.

We are therefore creating a completely open source library of core resources such as reference works and critical editions. Our goal is to provide access to foundational information and also a foundation of materials that subsequent authors can modify, update, expand, and otherwise improve.

Our selection criteria differ from those of the print world. A print library picks the best, most up-to-date documents available, knowing that print publications can be replaced but cannot change. In a true digital library, documents can be dynamic and evolve in real time. A recent encyclopedia will, presumably, be superior to another that is a century old. But if the century-old encyclopedia can be freely updated and attracts high quality modifications, it can evolve and become more up-to-date and more authoritative than its frozen print counterpart.

The classics component of the Open Content Scholarly Library that Perseus is helping create is being made available under a share-alike/attribution/non-commercial Creative Commons license. It contains the following:

:* Source texts of Greek and Latin: We have already released c. 8.5 million words of Greek and Latin source texts in TEI-compliant XML. We have also digitized several hundred volumes of source texts. These will be available as image books with searchable OCR and, where feasible, XML transcriptions. Unlike most previous collections, this includes, where possible, multiple editions as well as traditional lists of places where on-line editions differ from editions not yet available on-line.

:* Lexica of Greek and Latin: These include major works such as the Liddell Scott Jones Greek-English Lexicon and the Lewis and Short Latin-English Lexicon as well as more specialized works such as Cunliffe's Homeric Lexicon.

:* Grammars: These include student grammars such as Smyth's Greek Grammar and Allen and Greenough's Latin Grammar as well as extensive scholarly works such as Kühner-Gerth.

:* Commentaries: These include scholarly editions as well as school commentaries with linguistic annotations. Commentaries lend themselves particularly well to electronic publication, which is optimally designed for the production, display and management of annotations.

:* Tools: These include Morpheus, the morphological analysis system developed in the late 1980s and still providing useful analyses of Greek and Latin words. More importantly, this will include the databases with c. 100,000 stems and endings, mined from many sources, and of potential use to third party morphological analysis systems. All the core tools in the Perseus Digital Library have been rewritten in Java and will be available as additions to institutional repositories such as Fedora and to any developers.

:* FRBR Catalog Records for source texts: Large projects such as dictionaries and text corpora have developed checklists of editions which they have used. We are creating a modern catalog that builds on prior work (e.g., we use the author and work numbers developed by the TLG and PHI for Greek and Latin authors) but provides an extensible architecture that can manage multiple editions, translations (e.g., English, French and German translations of an author), multiple versions of the same editions (e.g., an image book vs. a TEI transcription), and multiple citation schemes (e.g., sections vs. chapters in Cicero).

:* Authority lists of people, places, dictionary entries, organizations, etc. The reference works that we are producing lay the foundation for a comprehensive, extensible set of authority lists -- shared names with which we can uniquely identify particular people, places, dictionary entries, organizations, etc. Such authority lists are difficult to build: experts may differ on which Sallust a particular passage designates and will never all agree on when we have a dictionary word with two distinct meanings vs. two distinct dictionary words. Nevertheless, all scholarly work depends upon the entries that appear in our reference works, and electronic authority lists, however imperfect, are essential tools for large digital collections.

Users include:

:* Service providers: we would like the released data to be useful to as many groups and in as many ways as possible. Thus, we hope to see the content in Google and the Open Content Alliance as well as scholarly environments such as Chicago's Philologic and the Canadian TAPOR project.

:* Experts in the field: we hope that experts in the field will revise and extend every document that we release, with versioning systems tracking these changes and allowing experts to get the credit which they deserve for the work that they do.

:* General students of the field: we hope to see Wiki based commentaries in which non-experts working their way through a text pose and answer the questions which puzzle them.

:* Advanced service developers: we hope that developers will mine the encyclopedias to drive their named entity identification systems (e.g., analyze the articles in Smith's to determine which Alexander a particular document is discussing), sense disambiguation (e.g., which sense of a word in an on-line lexicon is in play in a given passage), machine translation (e.g., mine the parallel texts and translations and the bilingual dictionaries so that a modern machine translation system can provide Greek/English, Latin/English translations etc.).

Next Generation Editions ----

=Summary=

We propose a new generation of primary source corpora that are:

: * ''Permanent'': The texts are not leased from a commercial vendor over a period of time but are permanently accessible, with reference copies and versioning information stored in multiple institutional repositories for long term preservation as well as freely available.

: * ''Openly accessible'': Cultural heritage primary sources in the public domain should be openly accessible to all. If it is necessary to restrict access to newly digitized materials in order to secure funding, that restriction should be clearly delimited and as short as possible: e.g., those who fund digitization may have exclusive access for five years before the texts are released for universal access.

: * ''Multi-versioned'': The texts themselves can be updated, with all changes tracked in a versioning system. Alternately, the texts provide a stable foundation for standoff markup representing textual variants or advanced interpretation.

: * ''Paid for and maintained by academic libraries'': While external funding may help begin this process, library acquisition budgets are the long term source of funding for costs such as data entry. Libraries already pay for the production of digital resources by commercial, for-profit entities, which restrict access to public domain content. The same library budgets can support open access databases built on public domain source materials.

=Open Content Editions=

The Perseus Project has released TEI conformant XML texts with 55 million words of American English, 13 million words of Latin and Greek source texts, and, for most of the Greek and Latin, corresponding English translations. These texts are available under a Creative Commons non-commercial license: they must be used with attribution; changes must be shared; they cannot be used as part of a commercial corpus. Commercial entities can, however, freely design for profit services that add value to these openly accessible sources.

While these source texts can freely circulate, they will also be part of the university's permanent institutional repository, thus providing a stable, long term home that will outlast any single project or contributor.

The Greek and Latin corpus contains most of the major works of classical literature. The Perseus Latin Collection contains more than half of the classical corpus and that coverage will approach 100% over the course of 2006/2007.

Working wish lists for [[Latin_wishlist | Latin]] and [[Greek_wishlist | Greek]] are available for comment/addition.

=Next Steps=

* ''Links to page images of paper sources'': With Google Library, the Open Content Alliance and Europe's i2010 we see the emergence of digital libraries with millions of books with high quality page images. Copyright restrictions complicate these efforts but solid versions of most major authors are available in the public domain.

* ''Full coverage including apparatus, introduction, indices etc.'': Digital editions can include all information in the print text and not only the text.

* ''Semantic markup'': Markup should reflect meaning and not only appearance.

* ''Collation of multiple sources'': Semantic markup, if applied to the apparatus criticus, should result in machine actionable data, allowing users to compare multiple versions of the same text.

=Building a digital library of primary sources=

The first generation of large scale, on-line text corpora provided transcriptions of primary materials. Projects such as the TLG and the ''Packard Humanities Institute Latin CD ROM'' carefully document the copy texts on which their electronic versions depend. The provenance of texts in the extensive Latin corpus at [http://www.thelatinlibrary.com the Latin Library] is often unclear, with volunteer transcribers blending texts and leaving no trail of their changes.

We now see vast libraries with millions of digital books either in active development or in advanced stages of planning. Most, if not all, of the books now in the public domain will be available in electronic form. Rights disputes may slow digitization of the rest but Google's aggressive stance may, at worst, make publishers more open to pursuing an acceptable arrangement with Yahoo, Microsoft and others now entering this market. In this model, readers view scanned page images but search text automatically generated by OCR software. For many purposes, such "image front" collections are quite effective: narrative prose printed since the mid 19th century lends itself very well to commercial OCR.

Image books do not, however, provide the accuracy and detailed markup that users of primary sources expect. Text collections with millions of words will contain errors for some time after publication but we want to minimize these errors. We want to be able to identify pieces of text by standard citation (e.g., "Liv. 3.22" should retrieve the text of Book 3, Chapter 22 of Livy's History of Rome). We also want text searches to be able to distinguish between primary text, textual notes and other annotations.
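
As a rough illustration of retrieval by standard citation, the following Python sketch splits a citation such as "Liv. 3.22" into author, book and chapter and looks the passage up in a toy corpus. The names <code>Citation</code>, <code>parse_citation</code> and <code>retrieve</code>, the citation pattern and the corpus layout are all assumptions made for this example; they do not describe any existing Perseus interface.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    author: str   # abbreviated author name as written, e.g. "Liv."
    book: int
    chapter: int


# Hypothetical pattern: an abbreviation ending in a full stop, then book.chapter
CITATION_RE = re.compile(r"^\s*([A-Za-z]+\.)\s*(\d+)\.(\d+)\s*$")


def parse_citation(text: str) -> Citation:
    """Split a citation such as "Liv. 3.22" into author, book and chapter."""
    match = CITATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised citation: {text!r}")
    author, book, chapter = match.groups()
    return Citation(author, int(book), int(chapter))


def retrieve(corpus: dict, citation: Citation) -> str:
    """Look up a passage in a toy corpus keyed by (author, book, chapter)."""
    return corpus[(citation.author, citation.book, citation.chapter)]


# Placeholder text, not a real transcription of Livy
toy_corpus = {("Liv.", 3, 22): "[text of Livy, Book 3, Chapter 22]"}
print(retrieve(toy_corpus, parse_citation("Liv. 3.22")))
```

A real service would need a table of author abbreviations and would have to cope with works that use other citation schemes (lines, sections, Stephanus pages and so on); the sketch covers only the single two-level case mentioned above.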

The following describes an approach of adding structure to digital image books of primary sources.

* '''Collate an image-front edition with searchable, OCR generated text against other electronic editions of the same text''': Many classical texts are available on-line in at least one edition. Once we have scanned a new edition and generated text with OCR, we can collate the OCR against pre-existing electronic editions with surprisingly little effort: half of the word forms in a book length document are generally unique. By comparing sequences of unique word forms in the pre-existing text and the new OCR, we can use these sequences to align the two texts. In our experiments, we have found that we can immediately align one word in ten. We can then compare the intervening sequence (on the average nine words long) to identify variations. Variations include errors in data entry (whether in the OCR or in the pre-existing text), deliberate textual variations and non-textual elements such as headers and textual notes. Where a variation involves one or two words and we cannot generate a morphological analysis for the new words, then we probably have an error. If we can generate morphological analyses for the variants in both versions, then we probably have deliberate variations. If we have extra text at the start or end of pages, we probably have headers or notes. If we have extraneous numbers in the source texts, then these are probably citations. Even if we are working with a pre-existing text that contains errors or whose provenance is unknown, we can often use this text to determine that page 123 of edition X contains book 3, lines 33 to 57 of a given edition, thus making the OCR generated edition citable by chapter and verse. If we have an accurate pre-existing text without textual notes, we can compare the results of searching that text with searching the relevant sections of the OCR-generated text. If a word shows up in the OCR generated text but not in the pre-existing text, then we probably have a match in the textual notes. While OCR quality varies from text to text and from language to language, we can thus produce initial searches of the textual notes with relatively little effort.

* '''Create an accurate, carefully marked up transcription of a print original''': In this stage, we aim to capture every character on the printed source page and to represent the logical structure of the document: ideally, the text should be sufficiently well encoded that readers could ask to compare the readings reported by different witnesses (e.g., "display places where M differs from P and provide a statistical analysis of how often these sources differ").

* '''Create a new edition, traceable to its print original, but able to represent multiple versions representing multiple witnesses and multiple new editions''': The source text becomes the foundation for multiple new editions. Once we have a carefully constructed source text, we can generate as many variations as we like. The source may -- and probably will -- soon recede into the background but will provide a stable framework whereby we can compare all subsequent editions.
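
The collation step described in the first item above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Perseus implementation: it treats word forms that occur exactly once in both texts as anchors, keeps anchors only while their order is preserved (a simple greedy filter; a longest-increasing-subsequence pass would be more robust), and reports the stretches between anchors that differ. The function names and the sample line are chosen for illustration only.

```python
from collections import Counter


def unique_positions(words):
    """Map each word form that occurs exactly once to its position."""
    counts = Counter(words)
    return {w: i for i, w in enumerate(words) if counts[w] == 1}


def anchor_pairs(old_words, ocr_words):
    """Return (old_index, ocr_index) pairs for word forms unique in both texts.

    Pairs are kept only while the OCR index keeps increasing, a simple greedy
    filter that discards anchors that would cross an earlier one.
    """
    old_unique = unique_positions(old_words)
    ocr_unique = unique_positions(ocr_words)
    pairs = []
    last_ocr = -1
    for word, old_i in sorted(old_unique.items(), key=lambda item: item[1]):
        ocr_i = ocr_unique.get(word)
        if ocr_i is not None and ocr_i > last_ocr:
            pairs.append((old_i, ocr_i))
            last_ocr = ocr_i
    return pairs


def variations(old_words, ocr_words):
    """Yield the differing stretches of text that lie between anchors."""
    prev_old, prev_ocr = -1, -1
    end = (len(old_words), len(ocr_words))
    for old_i, ocr_i in anchor_pairs(old_words, ocr_words) + [end]:
        old_gap = old_words[prev_old + 1:old_i]
        ocr_gap = ocr_words[prev_ocr + 1:ocr_i]
        if old_gap != ocr_gap:
            yield old_gap, ocr_gap
        prev_old, prev_ocr = old_i, ocr_i


# Illustrative sample: the second line imitates an OCR or spelling variant
existing = "arma virumque cano troiae qui primus ab oris".split()
ocr_text = "arma uirumque cano troiae qui primus ab oris".split()
for old_gap, ocr_gap in variations(existing, ocr_text):
    print(old_gap, "<->", ocr_gap)
```

Each reported pair could then be passed to a morphological analyzer to decide, as described above, whether it is more likely a data entry error or a deliberate variant.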

====Choice of source texts====

If we were creating a traditional scholarly text collection, we would want the most up-to-date current editions. In this model, however, we need to balance the authority of the source texts against their ability to evolve into richer editions encoding multiple sources and editorial versions. If a serious user community exists, if it values additions to textual scholarship and if it has reasonable technical and editorial mechanisms to enhance its editions, living older texts will overtake any static edition.

The two extreme cases are:

* '''Recent editions that may be at present the most comprehensive and authoritative but cannot be augmented'''. Whether or not publishers can claim copyright to scholarly reconstructions of primary source materials, editors should certainly have the right to prepare a single version of an edition to which no one else can make changes.

* '''Editions that are designed to accept -- and document -- new witnesses and editorial decisions'''. In the simplest case, this would include careful transcriptions of public domain editions. A mature versioning environment tracks each addition and can reconstruct any given version. Versioning software analyzes new transcriptions of witnesses and editions.

In practical terms, the best accessible editions will usually be the best public domain editions, with a few editors initially offering their work. It would probably be best to use public domain editions as initial test cases and to use these to work out inevitable bugs and organizational issues. Current editors may, in any event, find it as easy to add their changes to a well-structured public domain edition as to supervise the markup of their own print editions or the word processing files from which they derive.

====Sources for Images of Print Editions====

* '''Local book scanning''': A number of institutions (including Perseus) can scan limited numbers of books. Sheet feeder scanners can process c. 1,000 pages an hour but they require that the source books be disbound. Look down scanners do not damage the source materials and are slower but they still can process 100+ pages in an hour and are useful for smaller jobs.

* '''Large book scanning projects''': There are now a number of projects that are scanning very large numbers of books. [http://books.google.com/ Google Print] has begun assembling a library that will include tens of millions of books. Google plans to make the library openly searchable and will return copies of the scanned books to the participating research libraries, but it is not clear how easily other developers will be able to get their own copies on which to apply specialized OCR and content analysis. The [http://www.opencontentalliance.org/ Open Content Alliance] constitutes a growing consortium of content providers and third party service providers. Led by the [http://www.archive.org Internet Archive], the OCA has begun making high resolution image books available and is providing [http://www.archive.org/details/texts a clearing house for related efforts] such as the [http://www.archive.org/details/millionbooks Million Book Project]. The newer robotic scanners do a very good job of turning pages -- even pausing to let one page clinging to another drop off as they turn. They seem to be able to process more than 1,000 pages an hour and thus to exceed the best throughput we have achieved running disbound pages through a sheet feeder -- very impressive. The drawback is that these robots are expensive: the most recent ones from Kirtas cost $140,000-$180,000. You need high volume to justify this economically. If you can get 1,200 pages an hour, then you might do three books an hour and 120 books a week. That would be about 6,000 books a year -- or about $30-$40 per book for the hardware investment alone, exclusive of labor and postprocessing. If you consider 100 hours/week over two years and thus 300 400-page books a week, you get to 15,000 a year and the price clearly comes down. Run that over three years with 45,000 books and the cost becomes manageable.
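
The throughput and cost estimates for robotic scanners can be reproduced with a short Python calculation. The figures for scanner price, pages per hour and book length come from the text; the 50 working weeks per year and the 40-hour week (implied by 120 books a week at three books an hour) are assumptions made for this sketch. On a straight division of price by volume, the one-year case comes out at roughly $23-$30 per book, somewhat below the $30-$40 quoted above; the difference may reflect costs or downtime not itemized in the text.

```python
PAGES_PER_BOOK = 400                 # implied by "three books an hour" at 1,200 pages
WEEKS_PER_YEAR = 50                  # assumption: roughly 50 working weeks a year
SCANNER_PRICES = (140_000, 180_000)  # US dollars, range quoted in the text


def books_scanned(pages_per_hour, hours_per_week, years):
    """Total books scanned over the period at a steady throughput."""
    books_per_week = pages_per_hour * hours_per_week / PAGES_PER_BOOK
    return books_per_week * WEEKS_PER_YEAR * years


def hardware_cost_per_book(total_books):
    """Low and high hardware cost per book, excluding labor and postprocessing."""
    return tuple(round(price / total_books, 2) for price in SCANNER_PRICES)


# (hours per week, years): 40 h/week is an assumption, 100 h/week is from the text
for hours, years in [(40, 1), (100, 1), (100, 3)]:
    total = books_scanned(1_200, hours, years)
    low, high = hardware_cost_per_book(total)
    print(f"{hours} h/week, {years} yr: {total:,.0f} books, ${low}-${high} per book")
```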

In practice, editors interested in a few authors can get their source materials scanned at a variety of locations. Larger series (such as the Patrologia Latina) are well suited to the large scale book scanning projects. The biggest problem involves getting copies of the desired books to a location where large scale scanning is taking place. The California Digital Library, which serves the UC system, and the University of Toronto were early partners in the OCA and between them would have virtually every edition of Greek or Latin texts published in the past two centuries. An [http://www.libraryjournal.com/article/CA6277402.html article in LibraryJournal from November 1, 2005] reports that the European Commission is planning a large digital library project of its own that will focus initially on the public domain.

====Components of next generation electronic editions====
These editions will have the following components:

* '''One or more baseline print editions available as image books''': At least one print edition should be available as an electronic source to which readers can refer if they feel that they have detected a data entry or formatting error. Everything necessary for representing at least one core edition in a tagged file should be available to the community. Given the demands of publishers, these may not be the most up-to-date editions of an author but they are intended as a starting point. All such texts should, of course, have OCR generated searchable text. If the original source texts have page numbers, then these should be encoded and citable.

* '''A flexible editing environment which allows user communities to improve the current document''': Electronic documents are by nature dynamic and can evolve over time. Where print editions constitute end points of a long stage of development, electronic editions can serve as starting points to on-going development. Initial tasks may focus on correcting OCR errors, adding structural markup and other basic chores. Ultimately, however, users will want to associate higher level annotations (e.g., specifying that a given "Salamis" is the Salamis in Cyprus rather than near Athens, or indicating that "faciam" is a subjunctive rather than a future, etc.). Examples of decentralized editing environments that link transcriptions with images of the source pages include the [http://www.pgdp.net/ Distributed Proofreaders] program of [http://www.gutenberg.org/ Project Gutenberg] and the [http://www.ccel.org/help/facsim/ Digital Facsimile Editions] of the [http://www.ccel.org/ Christian Classics Ethereal Library].

* '''A tagged transcript of one or more print editions''': This should include everything from the original edition, including introduction, textual notes, commentary, index, and any other materials from the source book. At this stage, the idiosyncratic line breaks of particular editions should be preserved if the textual notes, commentary or other parts of the book use these line breaks for internal citations. All citations should be tagged and activated: e.g., wherever the text refers to "page 132 line 18" or "chapter 44, line 8", these expressions should be converted into active links. Textual notes should appear as simple notes and be placed within the body of the source texts. This version serves as a temporary work space and should yield to the following stage. It should become the official representation of the original print edition. The [http://www.uni-mannheim.de/mateo/camenahtdocs/camenahist.html Camena project]

* '''Fully interpreted electronic version of the print text''': While many documents may be complete at this stage, textual notes in critical editions should be converted from human readable descriptions into machine interpretable operations. Thus, readers should be able to view the text as it appears in any given manuscript, view places where any two witnesses disagree with one another, and see analyses of how far different versions of the text differ from one another. This version of the text should become the default and replace the tagged transcript.

* '''One or more translations''': Translations should have provenance so that readers know whether or not they reflect the online version of the source text. Translations should, like the editions, include all accompanying materials including introduction, notes, appendices, indices etc. Like editions, translations should also be available as image books so that readers can, when in doubt, consult the print originals.
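
To make the idea of "machine interpretable operations" in the fully interpreted version more concrete, the following Python sketch stores apparatus entries as data and uses them to rebuild the text of a given witness and to list where two witnesses disagree. The <code>Reading</code> class, the function names and the sample readings are invented for this illustration; witnesses not mentioned in an entry are assumed to agree with the base text, and only single-word substitutions are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One entry of a critical apparatus, expressed as data.

    position  index of the affected word in the base text
    variants  maps a witness siglum (e.g. "M", "P") to the word it transmits
    """
    position: int
    variants: dict = field(default_factory=dict)


def text_of_witness(base_words, apparatus, siglum):
    """Rebuild the text as it stands in one witness."""
    words = list(base_words)
    for reading in apparatus:
        if siglum in reading.variants:
            words[reading.position] = reading.variants[siglum]
    return words


def disagreements(base_words, apparatus, first, second):
    """List positions where two witnesses differ, with both readings."""
    a = text_of_witness(base_words, apparatus, first)
    b = text_of_witness(base_words, apparatus, second)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Invented sample data: not taken from any real manuscript tradition
base = "gallia est omnis divisa in partes tres".split()
apparatus = [
    Reading(2, {"M": "omnes"}),
    Reading(5, {"M": "partis", "P": "partis"}),
]

diffs = disagreements(base, apparatus, "M", "P")
print(diffs)
print(f"M and P differ in {len(diffs)} of {len(base)} words")
```

Omissions, additions and transpositions would need a richer operation model than word-for-word substitution, but the same principle applies: once the notes are data, views per witness and simple statistics follow mechanically.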

The fully interpreted electronic edition should then provide a starting point for subsequent edits. The text could evolve in a variety of ways.

* '''Systematic collations''': Individuals may systematically collate the source text against new witnesses (e.g., manuscripts, papyri, etc.) or new editions (where editors may have derived different conclusions and printed different readings). All additions must be transparent: thus, we cannot record new readings without providing their justification. We can add new readings from manuscripts and other sources without necessarily changing the text. We cannot record different editorial decisions without encoding the source for those decisions.

* '''Coordination of edition, textual notes and at least one reference translation''': We may have multiple translations reflecting multiple editions of a given work but we should have at least one translation that reflects the content of the base edition and that can represent the different readings in the textual notes. Readers should always be able to see how (or whether) any given reading affects the main translation. Readers should thus be able to filter out those notes which do not impact upon the English and to analyze the ''aggregate impact'' of choosing one version over another. While small changes of language can have dramatic effects upon meaning, readers should be able to gauge the overall significance of different versions.

A great deal more can be done with and for any given edition: we can add (and have added) commentaries, linguistic markup, links to scholarship and other supplementary materials. At the same time, the above represents a basic level of documentation towards which producers should, in our view, aim.

====Editorial Conventions====

* '''Changes from the source text to the transcription''': The Text Encoding Initiative provides tags to record locations where editors have corrected errors in the source, expanded abbreviations, and regularized spellings.

* '''Markup stylesheet''': The Text Encoding Initiative offers a range of tags but is not universal. In some cases, we will need to extend the TEI. In other cases, the TEI allows us to represent the same information in different ways: e.g., <name type="place">Rome</name> or <placeName>Rome</placeName>. The more homogeneous editions can be, the easier it will be to search, browse and maintain them over time. Perseus has evolved conventions of its own over time, but even within Perseus different projects have approached the same problems differently. We need documentation that is more extensive and that can be updated in real time (e.g., a Wiki).
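
One way to make editions more homogeneous is to normalize equivalent encodings mechanically. The following Python sketch, using only the standard library, rewrites the <code>name type="place"</code> form as <code>placeName</code>. The direction of the normalization, the function name and the sample sentence are assumptions made for illustration; the fragment also omits the namespace that real TEI files declare.

```python
import xml.etree.ElementTree as ET


def normalize_place_names(root):
    """Rewrite <name type="place"> elements as <placeName> in place."""
    # list() takes a snapshot so that renaming tags cannot disturb the iteration
    for element in list(root.iter("name")):
        if element.get("type") == "place":
            element.tag = "placeName"
            del element.attrib["type"]
    return root


# Hypothetical fragment without namespaces; real TEI files declare one
sample = '<p>Caesar left <name type="place">Rome</name> for <placeName>Gaul</placeName>.</p>'
root = normalize_place_names(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

Which of the two forms a project settles on matters less than documenting the choice and applying it consistently.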

OSCE Dunn Paper

2007-01-29T14:50:58Z

NotisToufexis:

1 e-Science and the critical edition: a discussion paper

1.1 Stuart Dunn and Tobias Blanke

1.1.1 Arts and Humanities e-Science Support Centre, King's College London

At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, both from experiments and simulations as well as knowledge gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together witch practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.

Several e-Science initiatives in the UK are promoting research work in virtual spaces with advanced computing, in particular network technologies. Technologies and methodologies for the automation and support of research processes are being investigated. Grid technologies and methodologies address how globally distributed data resources can be used in the research process or how computational power can be shared. At the same time, new forms of scholarly communication in 'virtual organizations' are being developed. For example, the Access Grid promises tools to support structured meetings of researchers in group-to-group collaborations, a benefit which will be keenly felt by A&H researchers as they move towards larger and more formal collaborations. The advantages of direct communication in face-to-face meetings are combined with the ability to share digital items instantly among the groups. Grid technologies integrate two recent developments in research that are inseparable from each other: the new possibilities opened up by improved technologies complement new, highly collaborative research.

E-Science therefore stands for the development and deployment of a networked infrastructure and culture through which resources can be shared in a secure environment. These resources can be anything from processing power to data or expertise that researchers can share. This networked infrastructure enables a culture of collaboration in which new forms of collaboration can emerge and new, advanced methodologies can be explored.

A key to the success of e-Science is the provision of shared access to research facilities, which responds to the increasing globalisation of research. Researchers from around the world can work together and use each other's resources as if they were co-located. Digital knowledge objects will be created and (re-)used in virtual collaboration spaces. E-research is about joining things up and not purely about CPU power or computer networking. It is about pro-active relationships: server to server, programme to programme, and research practitioner to research practitioner. This global collaboration in a virtual space will be of key significance to what Arts and Humanities (A&H) researchers will be doing over the next ten years, and will fundamentally alter their relationship with the resources they use.

Critical editions provide a key example of such resources. A recent expert seminar convened at the University of Sheffield by the AHDS e-Science Scoping Survey (http://ahds.ac.uk/e-science/e-science-scoping-study.htm) debated the application of e-Science methods and technologies to the critical edition. It was considered that the concepts of the Virtual Research Environment (http://www.ahessc.ac.uk/briefing_papers/VRE_briefing_paper.html) and Virtual Organization have the potential to enable a paradigm shift from the 'traditional' model of the critical edition, whereby the text is produced by an individual researcher or small group of scholars and presented to a wider community as a static document, to an alternative whereby texts are produced and owned collaboratively by that community. In the latter case the text is produced as part of an iterative and ongoing process, under the collective influence of a group of researchers. The same principle could apply to elements of the 'digital infrastructure' on which much collaborative work relies - thesauri, dictionaries, lexica and so on. This raises complex issues of academic integrity and trust: the high-profile debate about the applicability of Wikipedia in research contexts is well known, and few would argue that a totally unfettered editorial process is appropriate. However, such methodologies have very profound implications for the way humanities research is done, and the challenge is to quantify and qualify the shades of grey between Wikipedia and the traditional critical edition model.

1.1.1 Some key questions are:

* What technologies are needed to enable the collaborative research environments required for such 'democratization' of the critical edition?
* Do users need such editions? Will they ever trust them?
* How should access to the editorial process be managed? Who decides who gets to edit the text? Should it be managed at all?
* How should version control be maintained?
* How should annotations and edits be captured, both in terms of the finished article and the workflow process?
* What kind of peer-review process needs to be in place?
* How should cataloguing, referencing and citation of such documents be approached?
* How can such texts fit in to existing library and information (infra)structures? Will these need to be rethought?

[[Category:OSCE]]

OSCE Dunn Paper

2007-01-29T14:49:58Z

NotisToufexis:

OSCE Dunn Paper

2007-01-29T14:49:47Z

NotisToufexis:

Vindolanda Tablets Online

2006-11-24T15:40:53Z

NotisToufexis: /* Description */

=== Vindolanda Tablets Online ===

URL: http://vindolanda.csad.ox.ac.uk/

=== Description ===

This online edition of the Vindolanda writing tablets, excavated from the Roman fort at Vindolanda in northern England, includes the following elements:

* Tablets - a searchable online edition of the tablets (volumes I and II)
* Exhibition - an introduction to the tablets and their context
* Reference - a guide to aspects of the tablets' content
* Help - navigation and using the site

Also available are highlights from the tablets.

The website is part of the Script, Image and the Culture of Writing in the Ancient World programme, supported by the Andrew W. Mellon Foundation. It is a collaborative project between the Centre for the Study of Ancient Documents and the Academic Computing Development Team, Oxford University.

Scholarly publications should refer to this site as:

Vindolanda Tablets Online http://vindolanda.csad.ox.ac.uk/

Feedback: if you are using Vindolanda Tablets Online for teaching, research or general interest, please send us your comments on the site.

[[category:Projects]]

Suda Online

2006-11-24T15:40:42Z

NotisToufexis: /* Description */

=== Suda Online (SOL) ===

http://www.stoa.org/sol/

=== Description ===

Certain fundamental sources for the study of the ancient world are currently accessible only to a few specially trained researchers because they have never been provided with a sufficiently convenient interpretive apparatus or, in some cases, even translated into modern languages. The Suda On Line project attacks that inaccessibility by engaging the efforts of scholars world-wide in the translation and annotation of a substantial text that is being made available exclusively through the internet. We have chosen to begin with the Byzantine encyclopedia known as the Suda, a 10th century CE compilation of material on ancient literature, history, and biography. A massive work of about 30,000 entries, and written in sometimes dense Byzantine Greek prose, the Suda is an invaluable source for many details that would otherwise be unknown to us about Greek and Roman antiquity, as well as an important text for the study of Byzantine intellectual history.

Begun in January of 1998, the Suda On Line (SOL) already involves the efforts of over one hundred scholars throughout the world. The goal of the project is to assemble an xml-encoded database, searchable and browsable on the web, with continuously improved annotations, bibliographies and hypertextual links to other electronic resources in addition to the core translation of entries in the Suda. Individual work becomes available on the web as soon as possible, with the minimum necessary initial proofreading and editorial oversight. A large pool of registered editors is empowered to alter and improve the materials in the database continuously as they see fit. The display of each entry includes an indication of the level of editorial scrutiny it has received. We mean to encourage the greatest possible participation in the project and the smallest possible delay in presenting a high quality resource to a wide public readership.

Our goal is not only to provide the SOL as a useful tool for researchers, but also to explore and facilitate the modes of scholarship now made possible by open source technology and the internet: the result will be a scholarly effort that is cooperative rather than solitary, communal rather than proprietary, worldwide rather than localized, evolving rather than static. Accordingly our work aims at two concrete results: in addition to our development of the Suda On Line itself as a respectable scholarly resource, we want to make a generalized, well-documented version of our software freely available for other collaboration-minded scholars to adapt for their own purposes.

[[category:Projects]]

POxy Oxyrhynchus Online

2006-11-24T15:40:26Z

NotisToufexis: /* Description */

=== Oxyrhynchus Papyri Project (POxy: Oxyrhynchus Online) ===

URL: http://www.papyrology.ox.ac.uk/

=== Description ===

The Oxyrhynchus Papyri Project is putting online the corpus of papyri excavated from Oxyrhynchus (Al-Bashnasa in Egypt) by Bernard Grenfell and Arthur Hunt from 1897. The Project has an online table of contents for volumes 1-70 of the Oxyrhynchus Papyri. The table of contents can be navigated by volume number or papyrus number. Digital images of the papyri are currently available from volume 47 onwards. Images are available at 150 dpi for all online papyri, with an increasing number also available at 300 dpi. Each papyrus record includes location information, editorial details, and notes. The Project's Web site also includes an introduction to Oxyrhynchus and the excavations; details of how the papyri were digitized; and the online version of the exhibition, 'Oxyrhynchus: A City and its Texts' (Ashmolean, 1998).

(source: [http://www.humbul.ac.uk/output/full2.php?id=1023 Humbul Humanities Hub])

[[category:Projects]]

Opentext

2006-11-24T15:40:10Z

NotisToufexis: /* Description */

=== OpenText.org ===

URL: [http://www.opentext.org/ http://www.opentext.org]

=== Description ===

The OpenText.org project is a web-based initiative to develop annotated Greek texts and tools for their analysis. The project aims both to serve, and to collaborate with, the scholarly community. Texts are annotated with various levels of linguistic information, such as text-critical, grammatical, semantic and discourse features.

Beginning with the New Testament, the project aims to construct a representative corpus of Hellenistic Greek to facilitate linguistic and literary research of these important documents. These texts are then annotated through the addition of linguistic and literary features (including marking morphological, syntactical and discourse elements) following a comprehensive model currently under development. The resulting texts can be viewed and searched on this site. It is hoped that interested users will collaborate in the correction and enhancement of this annotation, and become involved in the annotation process themselves.

The key features of the project are:

* texts annotated at distinct linguistic levels
* the use of an XML encoding scheme to mark-up texts
* an 'open' and collaborative approach to encourage the annotation and use of texts
* an on-line tool kit to allow searching and analysis of texts
* a forum to allow the exchange of ideas and to respond to requests for specific searches

[[category:Projects]]

The Oath in Archaic and Classical Greece

2006-11-24T15:39:47Z

NotisToufexis:

=== The Oath in Archaic and Classical Greece ===

* 2004-2007
* A research project funded by the Leverhulme Trust
* Director: Professor A.H. Sommerstein

The oath was an institution of fundamental importance across an enormously wide range of social interactions throughout the ancient Greek world, its binding force one of the most important contributions of religion to social stability and harmony. For this reason, oaths are uttered, prescribed, or referred to in almost every kind of literary or inscriptional text we have from archaic and classical Greece, and a comprehensive study of the subject requires a survey covering all these texts.

The project team for "The Oath in Classical Greece" consists of Professor Sommerstein and two research fellows, Dr Andrew Bayliss and Dr Isabelle Torrance, appointed for a three-year term commencing in September 2004.

The objectives of the project are:

* To create a database including all references to oaths in Greek texts of all kinds from the archaic and classical periods (i.e. down to 322 BC); when complete, the database will be made publicly available via the internet.
* To analyse and interpret this evidence, in stages as it is collected, and present the results in seminar and conference papers, in articles and eventually in a co-authored monograph on the nature, employment and functions of oaths in archaic and classical Greek societies.

The cutoff date of 322 BC (coinciding with the death of Aristotle, the last writings of the Attic orators, and the end of the classical Athenian democracy) was chosen because at about that date there are fundamental changes in the geographical extent of the Greek-speaking world, its ethnic and cultural composition, its political organization and the nature of the available evidence.

There has been no comprehensive, dedicated scholarly study of the oath in ancient Greek society since Rudolf Hirzel's Der Eid (1902), and during the century since then much new evidence has become available and the study of society, ancient and modern, has been revolutionized. Information technology has now made it possible to carry out a complete survey of the evidence far faster and more efficiently than had previously been practicable, and the project is therefore centred on the creation of an electronic database, which will greatly ease the identification of significant correlations, variations and developments, and can be expected to illuminate such significant issues as the following:

* Which ancient Greek social institutions were typically thought to require oaths (with or without additional sanctions) to ensure their proper functioning, and which were not?
* To what extent did oath practices vary with time or place within the Greek world?
* To what extent did oath practices, and the persuasive effect of an oath, vary according to the gender and/or status (e.g. citizen/foreigner, free/slave) of the swearer?
* Did the oath practices of the imaginary worlds created by poets differ from those of the world in which they and their audiences actually lived?
* Is there any evidence that might indicate whether, from the mid/late fifth century BC, when traditional religious and ethical beliefs were being widely contested in intellectual circles, oaths came to be regarded as less securely reliable than formerly?
* To what extent were the brief oath-like expressions common in conversation (usually translatable as "yes/no, by [name of god]") regarded as having the binding force of a true oath?

The database will be founded on a corpus comprising all texts in Greek, whether inscriptional or literary, that were certainly or probably written between the introduction of alphabetic writing and 322 BC. All references (explicit or by necessary implication) to oaths and swearing will be identified, and for each such reference a record will be created. Where the reference is to an oath taken, tendered or offered on a specific occasion, or prescribed to be taken or tendered under specific circumstances, the record will comprise the following fields:

* source reference
* category (literary, subliterary or inscriptional)
* subcategory (genre of literature, type of inscription, etc.)
* date of source
* provenance of source (if literary, this means domicile of author)
* whether oath is set in a historical or a fictitious context
* date or occasion of oath (if the passage refers to a single occasion)
* circumstances in which oath taken/tendered (if it was prescribed in those circumstances by law or custom)
* place
* person or authority proposing oath
* person(s) taking, or asked to take, oath
* (if oath was volunteered by swearer) person to whom addressed ("swearee")
* what the swearer was asked, or offered, to affirm or promise
* god(s) or other powers invoked
* linguistic formula marking utterance as oath
* consequences (if any) attached to taking oath
* consequences (if any) attached to refusal to take oath
* rewards specified for keeping oath
* punishments specified for breaking oath
* special sanctifying circumstances (location, sacrifice, etc.)
* (if referring to a single occasion) whether oath was taken or refused
* (if referring to a single occasion) effect of oath on behaviour or attitudes of others
* (if referring to a single occasion) whether oath was kept or (disputably or indisputably) broken
* (if oath broken) recorded consequences, if any
* further remarks
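
The field list above maps naturally onto a single relational table. The sketch below is only an illustration under stated assumptions: it uses Python's built-in <code>sqlite3</code> module as a stand-in for the Access or MySQL database the project describes, it covers only a subset of the fields, and the table name, column names and sample rows are hypothetical placeholders rather than project data.

```python
import sqlite3

# Hypothetical column names, abbreviated from the field list above.
# SQLite stands in for the Access or MySQL database the project describes.
SCHEMA = """
CREATE TABLE oath_reference (
    id                 INTEGER PRIMARY KEY,
    source_reference   TEXT NOT NULL,
    category           TEXT CHECK (category IN
                           ('literary', 'subliterary', 'inscriptional')),
    subcategory        TEXT,
    source_date        TEXT,
    provenance         TEXT,
    fictitious_context INTEGER,  -- 1 = fictitious, 0 = historical
    swearer            TEXT,
    gods_invoked       TEXT,
    formula            TEXT,
    oath_kept          TEXT,     -- e.g. 'kept', 'broken', 'disputed'
    remarks            TEXT
)
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_reference(conn: sqlite3.Connection, **fields) -> int:
    """Insert one record; keyword names must be trusted column names."""
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    cur = conn.execute(
        f"INSERT INTO oath_reference ({columns}) VALUES ({placeholders})",
        tuple(fields.values()),
    )
    return cur.lastrowid


def by_category(conn: sqlite3.Connection, category: str) -> list:
    """Return the source references recorded for one category."""
    rows = conn.execute(
        "SELECT source_reference FROM oath_reference "
        "WHERE category = ? ORDER BY id",
        (category,),
    )
    return [row[0] for row in rows]


# Placeholder rows, invented purely to exercise the schema.
conn = open_database()
add_reference(conn, source_reference="Placeholder source A",
              category="literary", fictitious_context=1)
add_reference(conn, source_reference="Placeholder source B",
              category="inscriptional", fictitious_context=0)
print(by_category(conn, "literary"))
```

Keeping the category values under a <code>CHECK</code> constraint is one way to make later correlations by source type dependable; the conditional fields ("if referring to a single occasion") are simply left empty where they do not apply.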

There will be an annex to the database consisting of retrospective passages in sources later than 322 BC referring to oaths taken before that date; many of these statements are undoubtedly derived from pre-322 texts, and some are of high importance, but they must be kept separate from the main database because the risk cannot be excluded that they may be, as it were, contaminated by the cultural milieu of the later author.

The database, which will be an Access or MySQL relational database, will be created by the University of Nottingham's R&NT (Research and New Technologies) Database Team with the assistance of the University's Humanities Technology Officer, who will also provide the project team with any training they may need to use the database, as well as monitoring and managing its development over the course of the project.

The database will be created in stages, according to type of source, the staging being so planned that well-defined bodies of evidence would become available for analysis and interpretation fairly early in the process. Thereafter analytical and interpretative work will proceed alongside the expansion of the database.

Once fully populated with data, the database will be provided with an interface, including URL, HTML code and PHP scripts, that will allow it to be made accessible and effectively searchable via the internet. The resulting website will be hosted by the University of Nottingham, and deposit with the Arts and Humanities Data Service will be negotiated also. This final stage in the development of the database will not only make it available to the wider scholarly community but will also greatly facilitate the process of analysis and interpretation in the later stages of the project.

Next to the database itself, the most important outcome of the project will be a monograph, co-authored by Professor Sommerstein and the two research fellows, on the oath in archaic and classical Greek society. This will probably consist of three main parts, Part I discussing the nature and functions of oaths in the Greek world in general terms, Part II their specific uses within polis communities and in inter-state relations, Part III their exploitation in key genres of creative literature. It is hoped that a provisional version of Parts II and III will be complete by the end of the project period, but much of the writing of Part I and revision of the remainder would need to be done after the end of the period, with a target completion date of 2009.

[[category:Projects]]

Leuven Database of Ancient Books

2006-11-24T15:39:18Z

NotisToufexis: /* Description */

=== Leuven Database of Ancient Books (LDAB) ===

URL: http://ldab.arts.kuleuven.ac.be/

=== Description ===

LDAB attempts to collect the basic information on all ancient literary texts, as opposed to documents. The user can find the oldest preserved copies of each text as well as a view of the reception of ancient literature throughout the Hellenistic, Roman and Byzantine periods.

LDAB is a FileMaker 5.5 database, running on a Mac OS X 10.2 platform.

[[category:Projects]]

Epigraphic Database Heidelberg

2006-11-24T15:39:01Z

NotisToufexis: /* Concept */

=== Epigraphische Datenbank Heidelberg (EDH) ===

http://www.uni-heidelberg.de/institute/sonst/adw/edh/index.html

Director: Prof. Dr. Dr. h.c. mult. Géza Alföldy

=== Concept ===

(from the EDH web-site)

The aim of the project Epigraphic Database Heidelberg (EDH) is to integrate Latin inscriptions from all parts of the Roman Empire into an extensive database. Since 2004 Greek inscriptions from the same period have also been entered. It consists of three databases: the Epigraphic Database, the Epigraphic Bibliography and the Photographic Database. It exists at an international level alongside other database projects, which serve as working tools for the swift and simple collection, viewing, supplementing and interdisciplinary analysis of epigraphic material. Furthermore, it is possible to create KWIC indices and to combine the stored information as freely as possible.
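
For readers unfamiliar with the term, a KWIC (keyword in context) index lists every occurrence of a word together with a few words on either side. How EDH builds its indices is not described here; the following is a generic sketch in which the function name <code>kwic</code>, the <code>width</code> parameter and the sample sentence are all invented for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list:
    """Return (left context, keyword, right context) for each occurrence.

    Tokens are lower-cased runs of word characters; `width` is the number
    of tokens kept on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Invented sample sentence, not taken from any inscription.
sample_text = "The oath was sworn and the oath was kept"
for left, word, right in kwic(sample_text, "oath", width=2):
    print(f"{left:>10} [{word}] {right}")
```

A production index would also need to deal with editorial sigla, abbreviations and lemmatization, none of which this sketch attempts.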

At present, the Epigraphic Database contains over 36,000 inscriptions and thus includes most of the especially noteworthy inscriptions published outside the main editions. In contrast to similar projects, the database presents revised and often corrected versions. Control of this sort is above all necessary in the case of earlier publications, which do not fulfill the standards of modern textual editorial practice. Moreover, the database is not confined to the mere texts, but links them to all the available bibliographical data and information on the inscriptions proper and on the monuments or objects they are inscribed upon. Time-consuming though it is, this method of working allows the database to meet high scholarly demands.

[[category:Projects]]