Research products catalogue

Ontology-based Document Spanning Systems for Information Extraction

Lembo, Domenico; Scafoglieri, Federico

2020

Abstract

Information Extraction (IE) is the task of automatically organizing, in structured form, data extracted from free-text documents. In several contexts, it is desirable for the extracted data to be organized according to an ontology, which provides a formal and conceptual representation of the domain of interest. Ontologies allow for better interpretation of the data, as well as for their semantic integration with other information, as in Ontology-based Data Access (OBDA), a popular declarative framework for data management in which an ontology is connected to a data layer through mappings. However, the data layer considered so far in OBDA has consisted essentially of relational databases, and how to declaratively couple an ontology with unstructured data sources is still unexplored. By leveraging the recent study on document spanners for rule-based IE by Fagin et al., in this paper we propose a new framework that allows text documents to be mapped to ontologies, in the spirit of OBDA. We investigate the problem of answering conjunctive queries in this framework. For ontologies specified in the Description Logics DL-Lite_R and DL-Lite_F, we show that the problem is polynomial in the size of the underlying documents. We also provide algorithms that solve query answering by rewriting the input query on the basis of the ontology and its mapping to the source documents. Through these techniques we pursue a virtual approach, similar to that typically adopted in OBDA, which allows us to answer a query without first having to populate the entire ontology. Interestingly, for DL-Lite_R both the spanners used in the mapping and the one computed by the rewriting algorithm belong to the same class of expressiveness. This also holds for DL-Lite_F, modulo some limitations on the form of the mapping. These results indicate that in these cases our framework can be easily implemented by decoupling ontology management from document access, which can be delegated to an external IE system able to compute the extraction rules we use in the mapping.
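As a rough illustration of the virtual approach described in the abstract, the toy sketch below answers an atomic query by first rewriting it with the ontology and then running only the relevant extraction rules over the document. It is not the algorithm from the paper. Everything in it is an assumption made for illustration: regular expressions with a named group stand in for document spanners (and return strings rather than spans), the ontology contains only atomic concept inclusions, and the names `MAPPING`, `INCLUSIONS`, `rewrite`, and `answer` are hypothetical.

```python
import re

# Hypothetical mapping: each ontology concept is populated by one extraction rule.
# A regex with a named group "x" stands in for a document spanner (assumption).
MAPPING = {
    "Professor": re.compile(r"Prof\. (?P<x>[A-Z][a-z]+)"),
    "Student": re.compile(r"student (?P<x>[A-Z][a-z]+)"),
}

# Hypothetical ontology: atomic concept inclusions (subconcept, superconcept),
# a very small fragment of what DL-Lite can express.
INCLUSIONS = [("Professor", "Person"), ("Student", "Person")]


def rewrite(concept):
    """Collect the concept and every concept included in it, directly or transitively."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for sub, sup in INCLUSIONS:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def answer(concept, document):
    """Answer the atomic query concept(x) without materialising the ontology."""
    answers = set()
    for c in rewrite(concept):
        rule = MAPPING.get(c)  # concepts without a rule contribute nothing
        if rule is not None:
            answers.update(m.group("x") for m in rule.finditer(document))
    return answers


doc = "Prof. Rossi met student Bianchi and Prof. Verdi."
print(sorted(answer("Person", doc)))
```

In this sketch the query for `Person` is expanded to `Professor` and `Student`, so the only work done on the document is running those two rules. No instances of `Person` are stored beforehand, which mirrors the decoupling of ontology management from document access mentioned above.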


Year of publication: 2020
Keywords: Information Extraction; ontologies; Description Logics; Computational Complexity
Type: 01 Journal publication::01a Journal article
Citation: Ontology-based Document Spanning Systems for Information Extraction / Lembo, D., Scafoglieri, F. - In: INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING. - ISSN 1793-351X. - 14:1(2020), pp. 3-26. [10.1142/S1793351X20400012]
Belongs to type: 01a Journal article

Files attached to this product

File	Access	Type	License	Size	Format
Lembo_Ontology-based_2020.pdf	archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	545.05 kB	Adobe PDF
Lembo_postprint_Ontology-based_2020.pdf	open access	Post-print (version following peer review and accepted for publication)	Creative Commons	413.39 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1476957
