A Formal Framework for Coupling Document Spanners with Ontologies / Lembo, Domenico; Scafoglieri, Federico. - (2019). (Paper presented at the 2nd International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), held in Cagliari, Italy) [10.1109/AIKE.2019.00036].
A Formal Framework for Coupling Document Spanners with Ontologies
Lembo, Domenico; Scafoglieri, Federico
2019
Abstract
A significant portion of the information collected nowadays in enterprises and organizations resides in text documents and is thus inherently unstructured. Turning it into a structured form is the aim of information extraction (IE). Depending on the approach followed, the output of an IE process can fill forms, populate relational tables, or be presented through an ontology. This last approach is particularly interesting, since ontologies may facilitate integration with other corporate and external data and enable data management and governance at an abstract, conceptual level, as in Ontology-based Data Access (OBDA). To this end, OBDA uses declarative mappings that specify the relationship between the ontology and the database to be accessed. So far, however, OBDA has only considered mappings to relational databases, and how to declaratively relate an ontology to unstructured sources remains unexplored. Building on the study of document spanners for IE, in this paper we propose a new framework for mapping text documents to ontologies in the spirit of the OBDA approach. We then investigate the problem of answering conjunctive queries (CQs) in our framework and show that, if the ontology is specified in the lightweight Description Logic DL-Lite_R, the problem can be solved by reformulating the user query into a new spanner. Interestingly, the spanners used in the mapping and the one computed by the rewriting algorithm have the same expressiveness, and CQ answering in this case is polynomial in data complexity.

File | Type | Access | License | Size | Format
---|---|---|---|---|---
Lembo_Postprint_A-Formal-Framework_2019.pdf | Post-print (version following peer review, accepted for publication) | Open Access from 9 August 2020 | All rights reserved | 321.96 kB | Adobe PDF
Lembo_Frontespizio_A-Formal-Framework_2019.pdf | Other attached material | Archive managers only | All rights reserved | 99.33 kB | Adobe PDF
Lembo_A-Formal-Framework_2019.pdf | Publisher's version (published with the publisher's layout) | Archive managers only | All rights reserved | 395.13 kB | Adobe PDF

Note (post-print): https://ieeexplore.ieee.org/document/8791691
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
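The abstract above relies on two technical notions: document spanners, which extract tuples of spans (intervals of character positions) from a text, and the reformulation of a query over a DL-Lite_R ontology. The following Python sketch illustrates both under simplifying assumptions; it is not the construction from the paper. All names (`regex_spanner`, `MAPPING`, `TBOX`, `subsumees`, `rewrite_to_spanner`), the example predicates, and the sample document are hypothetical. Python's `re` module reports only leftmost, non-overlapping matches, so `regex_spanner` is weaker than the all-matches semantics of regex formulas. The rewriting handles only atomic concept queries with inclusions between atomic concepts and unqualified existentials (no inverse roles, role inclusions, or general CQs).

```python
# Toy sketch (assumption): illustrates spanners, mappings and rewriting;
# it is NOT the rewriting algorithm from the paper.
import re
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end): 0-based, end-exclusive, as returned by re.Match.span
Spanner = Callable[[str], Set[Tuple[Span, ...]]]


def regex_spanner(pattern: str) -> Spanner:
    """Toy spanner: every match yields one tuple holding the span of each named group."""
    compiled = re.compile(pattern)
    # The named groups play the role of the spanner's variables, ordered by position.
    names = sorted(compiled.groupindex, key=compiled.groupindex.get)

    def run(doc: str) -> Set[Tuple[Span, ...]]:
        # finditer only reports leftmost, non-overlapping matches.
        return {tuple(m.span(n) for n in names) for m in compiled.finditer(doc)}

    return run


# Hypothetical mapping: each ontology predicate is populated by one or more spanners.
MAPPING: Dict[str, List[Spanner]] = {
    "Person": [regex_spanner(r"Dr\. (?P<x>[A-Z][a-z]+)")],
    "Employee": [regex_spanner(r"(?P<x>[A-Z][a-z]+) is an employee")],
    "worksFor": [regex_spanner(r"(?P<x>[A-Z][a-z]+) works for (?P<y>[A-Z][A-Za-z]+)")],
}

# Hypothetical TBox of inclusions B1 ⊑ B2 between basic concepts;
# "exists:R" encodes the unqualified existential restriction ∃R.
TBOX: List[Tuple[str, str]] = [
    ("Employee", "Person"),
    ("exists:worksFor", "Employee"),
]


def subsumees(concept: str) -> Set[str]:
    """All basic concepts B such that B ⊑* concept follows from TBOX (reflexive-transitive)."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in TBOX:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def rewrite_to_spanner(concept: str) -> Spanner:
    """Rewrite the atomic query concept(x) over TBOX, then unfold it through MAPPING."""
    spanners: List[Spanner] = []
    for basic in subsumees(concept):
        # A(x) uses A's spanners; ∃R(x) uses R's spanners projected on the first variable.
        predicate = basic.split(":", 1)[1] if basic.startswith("exists:") else basic
        spanners.extend(MAPPING.get(predicate, []))

    def run(doc: str) -> Set[Tuple[Span, ...]]:
        # Union of projections: the result is again a (unary) spanner over the document.
        return {(t[0],) for s in spanners for t in s(doc)}

    return run


doc = "Dr. Rossi works for Sapienza. Bianchi is an employee. Verdi works for Acme."
answers = sorted(doc[a:b] for ((a, b),) in rewrite_to_spanner("Person")(doc))
print(answers)  # → ['Bianchi', 'Rossi', 'Verdi']
```

In this sketch the rewritten query is evaluated by running the mapping spanners directly on the text, so no ontology assertions are materialized. Only "Rossi" is extracted as a `Person` by the mapping itself; "Bianchi" and "Verdi" are returned because the TBox makes every employee, and everyone who works for something, a person. The union of projected spanners returned by `rewrite_to_spanner` is itself a spanner, which loosely mirrors the abstract's point that query answering reduces to evaluating a new spanner.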