Accounting for quality in data integration systems: a completeness-aware integration approach / Daraio, Cinzia; Di Leo, Simone; Scannapieco, Monica. - In: SCIENTOMETRICS. - ISSN 0138-9130. - (2022). [10.1007/s11192-022-04266-0]

Accounting for quality in data integration systems: a completeness-aware integration approach

Daraio, Cinzia; Di Leo, Simone; Scannapieco, Monica
2022

Abstract

Ensuring the quality of integrated data is one of the main problems of integrated data systems. In multi-national and historical data integration systems, where the “space” and “time” dimensions play a relevant role, it is very important to build the integration layer so that the final user accesses a layer that is, “by design”, as complete as possible. In this paper, we propose a method for accessing data in multipurpose data infrastructures, such as data integration systems, which (i) relieves the final user of the need to access single data sources and, at the same time, (ii) maximizes the amount of information available to the user at the integration layer. Our method is based on a completeness-aware integration approach that gives the user ready access to the maximum information obtainable from the integrated data system, without having to carry out a preliminary data quality analysis on each of the databases included in the system. Providing data quality information at the integrated level thus extends the functions of the individual data sources, opening the data infrastructure to additional uses. This may be a first step in moving from data infrastructures towards knowledge infrastructures. A case study on the research infrastructure for science and innovation studies shows the usefulness of the proposed approach.
Data and information quality; Data integrated system; Longitudinal data; Multinational data; Data infrastructures; Research infrastructures; Knowledge infrastructures
01 Journal publication::01a Journal article
Files attached to this item

File: Daraio_Accounting_2022.pdf
Access: open access
Note: https://link.springer.com/article/10.1007/s11192-022-04266-0
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 2.42 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1611997
Citations
  • PubMed Central: not available
  • Scopus: 5
  • Web of Science (ISI): 3