29th May 2017 in Portorož, Slovenia
Collaboratively Producing Interoperable Ontologies and Semantically Annotated Corpora: the symogih.org project
Abstract: For the past ten years, the Digital History department (Pôle histoire numérique) of the LARHRA laboratory in Lyon has been developing the symogih.org project (Système modulaire de gestion de l’information historique), a method and a platform for collaboratively producing structured data and using them to semantically annotate TEI-encoded texts. The aim of the project is not only to connect individual historical research and data production with a collectively managed data repository, but also to interlink the platform’s data with those published by other data providers, e.g. authority files of national libraries, museums and other cultural heritage institutions, and to format them according to widespread standards such as the CIDOC-CRM. In this way the data will be available, interoperable and reusable for new research projects, both internal and external to the platform, and for the public.
In the first part of my talk, I will describe the method the symogih.org project has adopted to collaboratively develop and maintain an ontology for historical data that can be extended indefinitely according to the needs of present participants and of new research projects. Further, I’ll report on the ongoing process of refining the symogih.org ontology using the CIDOC-CRM modelling method. This process aims to develop a CRM extension for historical data that will be managed by a consortium and be open to any interested project and to further development according to the specific needs of participating projects.
In the second part, I’ll give an account of a method for semantically annotating XML-encoded texts using some basic tags and properties of the TEI standard, combining them with the flexibility and richness of an ontology for historical data. The workflow integrates the corpus analysis environment TXM for exploring the text from a linguistic perspective before annotating it semantically with the project ontology. I’ll then outline how this method makes it possible to analyse the terminology of a historical text corpus and to collaboratively manage a conceptual thesaurus.
Short bio: Francesco Beretta has been a CNRS researcher at the LARHRA laboratory in Lyon since 2005. Since 2009, he has been head of the Digital History department. A specialist in the history of the Roman Inquisition, the intellectual history of Catholicism and the history of science, he has taught at universities in Fribourg, Lausanne, Paris (EPHE, EHESS) and Lyon. In Digital Humanities, his areas of expertise are data modeling and curation, ontologies, relational databases, GIS and text encoding in XML/TEI.
The 3rd Workshop on Semantic Web for Scientific Heritage will be held in conjunction with the 14th ESWC Conference, which takes place from May 28th to June 1st in Portorož, Slovenia. It is a continuation of the SW4SH workshop series initiated at ESWC 2015, which aims to provide a leading international and interdisciplinary forum for disseminating the latest research in the field of Semantic Web for the preservation and exploitation of our scientific heritage and for the study of the history of ideas and their transmission.
Classicists and historians are interested in developing textual databases in order to gather and explore large amounts of primary source materials. For a long time, they mainly focused on text digitization and markup. Only recently have they begun to explore the possibility of transferring some analytical processes, which they previously thought incompatible with automation, to knowledge engineering systems, thus taking advantage of the growing set of tools and techniques based on the languages and standards of the Semantic Web, such as linked data, ontologies, and automated reasoning. On the other hand, Semantic Web researchers are willing to take up challenges more ambitious, in terms of anthropological complexity, than those arising in the native context of the Web, addressing meta-semantic problems of flexible, pluralist or evolutionary ontologies, source heterogeneity, and hermeneutic and rhetorical dimensions. The time has thus come for a fruitful encounter between knowledge engineers and computer-savvy historians and classicists. This encounter may be placed within the more general context of digital humanities, a research area at the intersection of computing and the humanities disciplines which is gaining ever-increasing momentum and where Linked Open Data is playing an increasingly prominent role.
The purpose of the SW4SH workshop series is to provide a forum for discussion about methodological approaches to the specificity of annotating “scientific” texts (in the wide sense of the term, including disciplines such as history, architecture, or rhetoric), and to support a collaborative reflection on possible guidelines or specific models for building historical ontologies. Iconographic data, which are also relevant to the history of science and raise similar issues, could be addressed as well and offer suggestive insights into a global methodology for diverse media. A key goal of the workshop, which focuses on research issues related to pre-modern scientific texts, is to emphasize, through specific projects and up-to-date investigation in digital humanities, the benefit of multidisciplinary research for creating and operating on relevantly structured data. One of the main interests of pre-modern historical data management lies in historical semantics and in the opportunity to jointly consider how to identify and express lexical, theoretical and material evolutions. When dealing with historical texts, a major problem is indeed to handle the discrepancy between historical and modern terminology and, in the case of massive, diachronic data, to take into account the contextual and theoretical meaning of terms and text segments.