G2A Web App
(From the Greek_Into_Arabic website) The G2A Web App project aims to achieve this ambitious target through an architecture based on interconnected modules. The system is built around a core of components that handle both text files and digital image files. As specific needs arise, further programs are added for managing images (enhancement, segmentation, pattern recognition, etc.) and text (natural language processing, information extraction, data mining, electronic editing, etc.). The simplified scheme behind the G2A Web Application can be represented as follows:
- The first element is adherence to internationally shared standards, so that the information managed by the G2A Web Application is interoperable with other data produced in the humanities. The standards are followed not only for primary data (texts, images, etc.) but also for secondary information, such as annotations, variants, comments and information produced by computational systems (e.g. morphological, syntactic and semantic analyses). The software development tools are entirely open source, which avoids end-user licence fees.
- The information system is entirely web-based, and its tools for producing and searching information are geared towards scholarly critical and textual editing. At present, the G2A Web Application targets specialist users. However, the structure of the system also provides for a number of operations, in particular those connected with search and query, which can be developed further to meet the needs of non-specialist users.
- The G2A Web App allows data that have been labelled and annotated collaboratively to be produced on a web server, provided that all members of the same community (e.g. mediaeval philologists, Greek papyrologists, Egyptologists, Latin epigraphists, historians and philosophers of science) agree on the same standards, as described in the first point.
- Experiments are under way to check whether the G2A Web Application meets the needs of a specific community of scholars working on ancient sources written in non-Latin alphabets (e.g. Greek, Arabic). The documents and annotations are produced in digital format and classified according to a domain ontology agreed upon by the members of the community. This semantic-conceptual structure can be reused to classify not only the documents but also parts of their content. In this way, information can be retrieved both at the level of forms (words, character strings, lemmas) and at the level of the concepts expressed in individual parts of the texts.
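The two retrieval levels described in the last point can be illustrated with a minimal Python sketch. It is not taken from the G2A code base: the ontology, the passage references, the transliterated tokens and the names `ONTOLOGY`, `Token`, `Passage`, `search_forms` and `search_concepts` are all hypothetical, chosen only to show how a shared domain ontology might support searching by form or lemma as well as by concept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical toy ontology (child concept -> parent concept).
# A real community-agreed ontology would be far richer.
ONTOLOGY: Dict[str, str] = {
    "soul": "psychology",
    "intellect": "psychology",
    "psychology": "philosophy",
    "causality": "metaphysics",
    "metaphysics": "philosophy",
}


@dataclass
class Token:
    form: str   # word as it appears in the text (transliterated here)
    lemma: str  # dictionary form assigned by an annotator or analyser


@dataclass
class Passage:
    ref: str                                        # invented passage reference
    tokens: List[Token]
    concepts: Set[str] = field(default_factory=set)  # ontology labels


def is_a(concept: str, ancestor: str) -> bool:
    """Return True if `concept` is `ancestor` or descends from it."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        current = ONTOLOGY.get(current)
    return False


def search_forms(passages: List[Passage], query: str) -> List[str]:
    """Form-level retrieval: match the query against word forms or lemmas."""
    return [
        p.ref
        for p in passages
        if any(query in (t.form, t.lemma) for t in p.tokens)
    ]


def search_concepts(passages: List[Passage], concept: str) -> List[str]:
    """Concept-level retrieval: match passages labelled with the concept
    or with any of its descendants in the ontology."""
    return [
        p.ref
        for p in passages
        if any(is_a(c, concept) for c in p.concepts)
    ]


# Invented sample data, for illustration only.
PASSAGES = [
    Passage("text-A 1.1", [Token("psyches", "psyche"), Token("ousia", "ousia")], {"soul"}),
    Passage("text-A 2.3", [Token("nou", "nous")], {"intellect"}),
    Passage("text-B 4.2", [Token("aitia", "aitia")], {"causality"}),
]

if __name__ == "__main__":
    print(search_forms(PASSAGES, "psyche"))
    print(search_concepts(PASSAGES, "psychology"))
```

In this sketch a search for the lemma `psyche` finds the passage containing the inflected form `psyches`, while a search for the broader concept `psychology` also returns the passage labelled `intellect`, because the ontology places both labels under the same parent.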
A more detailed description can be found at http://www.greekintoarabic.eu/index.php?id=5