Using ontologies for XML data cleaning

CATARCI, Tiziana
2005

Abstract

Real data is often affected by errors and inconsistencies. Many of these arise because schemas cannot represent a sufficiently wide range of constraints. Data cleaning is the process of identifying and possibly correcting data quality problems that affect the data. Cleaning data requires gathering knowledge about the domain to which the data refer. However, existing data cleaning techniques still access this knowledge as a fragmented collection of heterogeneous rules and ad hoc data transformations. Furthermore, data cleaning methodologies for an important class of data based on the semistructured XML data model have not yet been proposed. In this paper we introduce the OXC framework, which offers a methodology for XML data cleaning based on a uniform representation of domain knowledge through an ontology. We describe how to define XML-related data quality metrics based on our domain knowledge representation, and give a definition of various metrics related to the completeness data quality dimension. © Springer-Verlag Berlin Heidelberg 2005.
2005
OTM Confederated International Workshops
database systems; algorithms; privacy-preserving record
04 Publication in conference proceedings::04b Conference paper in volume
Using ontologies for XML data cleaning / Milano, Diego; Scannapieco, Monica; Catarci, Tiziana. - 3762 LNCS:(2005), pp. 562-571. (Paper presented at the OTM Confederated International Workshops, held in Agia Napa, Cyprus) [10.1007/11575863_75].
Files attached to this record
File: VE_2005_11573-210503.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 386.88 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/210503
Citations
  • PMC: ND
  • Scopus: 17
  • ISI: 10