Data preparation is crucial for achieving good data management following the four foundational FAIR principles — Findability, Accessibility, Interoperability, and Reusability. Processing datasets to achieve high data (and metadata) quality is mandatory in modern applications. However, the data preparation activities that are needed to reach such levels may easily become unsustainable due to, for example, resource intensity or scalability challenges. Moreover, some preparation efforts may become unnecessary if they result in negligible improvements or duplicate actions. This paper examines the sustainability aspects of data preparation through the lens of a circular economy. Within the data landscape, this perspective encourages practices that minimize waste, extend the data life cycle, and maximize reuse in alignment with the FAIR principles. We explore these practices and their impact on selecting and configuring effective data preparation strategies to design sustainable, high-quality pipelines. To this end, we propose an evaluation model that integrates data quality metrics with sustainability parameters for human and computational tasks. Finally, we apply the model in a comparative analysis of key data preparation methods, demonstrating its effectiveness in assessing sustainability and quality trade-offs.

Sustainable Quality in Data Preparation / Pernici, Barbara; Cappiello, Cinzia; Alberto Bono, Carlo; Sancricca, Camilla; Catarci, Tiziana; Angelini, Marco; Filosa, Matteo; Palmonari, Matteo; De Paoli, Flavio; Bergamaschi, Sonia; Simonini, Giovanni; Mozzillo, Angelo; Zecchini, Luca. - In: ACM JOURNAL OF DATA AND INFORMATION QUALITY. - ISSN 1936-1955. - 17:4(2025), pp. 1-33. [10.1145/3769120]

Sustainable Quality in Data Preparation

Tiziana Catarci; Marco Angelini; Matteo Filosa; Sonia Bergamaschi
2025

Information Systems; Information Integration; Data preparation; Data quality; Sustainability
01 Journal publication::01a Journal article
Files attached to this item

File: Pernici_Sustainable-Quality_2025.pdf
Access: open access
Note: https://doi.org/10.1145/3769120
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 612.32 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1751132