Dataset Integration Hack

From The Digital Classicist Wiki

The problem

How to integrate several distributed datasets, all Open Access and openly licensed, so that their metadata can be served through a portal from a single web service.

The datasets: Open Access Classical Data


OAI-PMH server and DC metadata. (JN, MR, JMV: more info please?)

jOAI is a Java implementation of an OAI-PMH data provider and harvester that might be used for a first proof-of-concept implementation.



Metadata will be extracted on a case-by-case basis from the source data, with additional global parameters provided from local knowledge as required. Ideally, and eventually, individual datasets would provide their own OAI service to expose this metadata. (We may try to illustrate this with IAph and IRT at some point.)
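As a rough illustration of combining extracted metadata with global parameters, the following Python sketch merges per-item fields with dataset-wide defaults. The function name `build_record`, the `GLOBAL_DEFAULTS` values and the sample item are all hypothetical and are not taken from any of the datasets.

```python
from typing import Mapping

# Hypothetical dataset-wide values supplied from local knowledge.
GLOBAL_DEFAULTS = {
    "publisher": "Example Project",
    "rights": "CC BY 4.0",
    "language": "en",
}


def build_record(extracted: Mapping[str, str], defaults: Mapping[str, str]) -> dict:
    """Combine fields extracted from one item with dataset-wide defaults.

    Non-empty extracted values take precedence; empty or missing ones
    fall back to the defaults.
    """
    record = dict(defaults)
    for key, value in extracted.items():
        if value and value.strip():
            record[key] = value.strip()
    return record


# An invented item: the title needs trimming and the rights field is empty.
sample = build_record(
    {"title": " Sample inscription ", "creator": "A. N. Editor", "rights": ""},
    GLOBAL_DEFAULTS,
)
```

Keeping the defaults in one place per dataset means the extraction script can be rerun unchanged when the local knowledge is corrected.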


Each dataset will in effect become a data provider by exposing its extracted metadata in accordance with OAI-PMH.
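To make concrete what a data provider has to answer, here is a minimal sketch of the `ListRecords` request a harvester would send. The verb and argument names (`metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH 2.0 protocol; the endpoint `https://example.org/oai`, the set name `irt` and the helper `list_records_url` are assumptions made for illustration only.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; a real provider would publish its own base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build the URL of an OAI-PMH ListRecords request."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the
        # verb alone when continuing a partial list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


first_page = list_records_url(BASE_URL)
one_dataset = list_records_url(BASE_URL, set_spec="irt")
next_page = list_records_url(BASE_URL, resumption_token="abc123")
```

If each dataset were mapped to an OAI-PMH set, as assumed here, a single server could still let harvesters request one dataset at a time.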


OAI-PMH in Dublin Core

Tag              How do we generate it?
dc:title         title of resource
dc:creator       harvest (or known?)
dc:subject       ??
dc:description   if any free prose
dc:publisher     harvest
dc:contributor   harvest if given
dc:date          harvest
dc:type          (photograph|commentary|database|linked data|other)
dc:format        filetypes?
dc:identifier    URI and/or URL?
dc:source        ??
dc:language      = modern language
dc:relation      ??
dc:coverage      ??
dc:rights        = license (in spreadsheet)
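The fifteen elements above could be serialised into an `oai_dc` record along the following lines. This is only a sketch using Python's standard `xml.etree.ElementTree`: the two namespace URIs are the standard ones for `oai_dc` and Dublin Core, while the function `to_oai_dc` and the sample values are invented for the example. It produces the metadata payload only, not the surrounding OAI-PMH response envelope.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen unqualified Dublin Core elements, in the order of the table.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_oai_dc(record: dict) -> ET.Element:
    """Turn a dict of Dublin Core values into an <oai_dc:dc> element.

    Values may be a string or a list of strings, since every element is
    repeatable; empty or missing values are skipped.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DC_ELEMENTS:
        values = record.get(name)
        if not values:
            continue
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Invented sample values, for illustration only.
element = to_oai_dc({
    "title": "Sample inscription",
    "type": ["photograph", "database"],
    "subject": "",
    "rights": "CC BY 4.0",
})
xml_text = ET.tostring(element, encoding="unicode")
```

Elements still marked `??` in the table can simply be left out of the dict until the group decides how to populate them.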

What's next?

  • Set up an OAI-PMH server.
  • Create sample metadata for each dataset (ideally by writing scripts, so that the process is reproducible).
  • Discuss the viability of CKAN for our purposes.
  • Next meeting.