A Formal Framework for Coupling Document Spanners with Ontologies / Lembo, Domenico; Scafoglieri, Federico. - (2019). (Paper presented at the 2nd International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), held in Cagliari, Italy) [10.1109/AIKE.2019.00036].

A Formal Framework for Coupling Document Spanners with Ontologies

LEMBO, DOMENICO; SCAFOGLIERI, FEDERICO
2019

Abstract

A significant portion of the information collected today in enterprises and organizations resides in text documents and is thus inherently unstructured. Turning it into a structured form is the aim of information extraction (IE). Depending on the approach followed, the output of an IE process can fill forms, populate relational tables, or be presented through an ontology. This last approach is particularly interesting, since ontologies may facilitate integration with other corporate and external data, and enable data management and governance at an abstract, conceptual level, as in Ontology-based Data Access (OBDA). To this end, OBDA uses declarative mappings that specify the relation between the ontology and the database to be accessed. In OBDA, however, only mappings towards relational databases have so far been considered, and how to declaratively relate the ontology to unstructured sources is still unexplored. Building on the study of document spanners for IE, in this paper we propose a new framework that allows text documents to be mapped to ontologies, in the spirit of the OBDA approach. We then investigate the problem of answering conjunctive queries (CQs) in our framework, and show that, if the ontology is specified in the lightweight Description Logic DL-Lite_R, the problem can be solved by reformulating the user query into a new spanner. Interestingly, the spanners used in the mapping and the one computed by the rewriting algorithm have the same expressiveness, and CQ answering in this case is polynomial in data complexity.
2nd International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
Information Extraction; Description Logics; Ontology-based Data Access
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File Size Format
Lembo_Postprint_A-Formal-Framework_2019.pdf

Open Access since 09/08/2020

Note: https://ieeexplore.ieee.org/document/8791691
Type: Post-print document (version after peer review, accepted for publication)
License: All rights reserved
Size: 321.96 kB
Format: Adobe PDF
Lembo_Frontespizio_A-Formal-Framework_2019.pdf

Archive managers only

Type: Other attached material
License: All rights reserved
Size: 99.33 kB
Format: Adobe PDF
Lembo_A-Formal-Framework_2019.pdf

Archive managers only

Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 395.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1276654
Citations
  • PMC: N/A
  • Scopus: 6
  • ISI: 3