An Extract, Transform, Load foundation for Biobank Data Interoperability

Martina Betti;Claudia Miele;Marialuisa Lavitrano;Matteo Pallocca
2025

Abstract

Federated Search makes data accessible while complying with privacy-preserving regulations. One of the strategic objectives of the BBMRI-ERIC biobanking infrastructure is to make high-quality samples findable via Federated Search. To join the Federated Network, a biobank must convert its local database into a Common Data Model and set up a server node on which the database is loaded and made accessible to external queries. Data conversion is often the most critical step for institutions, as many of them lack the technical expertise needed to improve the FAIRness of their data. The conversion is carried out through a local Extract, Transform, Load (ETL) process on data that are usually extracted from a Biobank Information Management System. We present a framework that converts minimal information datasets into HL7-FHIR transaction bundles, providing basic biobank interoperability and enabling biobanks to connect to the BBMRI-ERIC European Federated Platform. The toolkit consists of several Python modules that create JSON files ready to be uploaded to an internal FHIR server connected to the federated network, enabling data sharing and query execution. The tool has been successfully integrated in three BBMRI.it biobanks, allowing them to share their data correctly. More generally, by integrating the pipeline into local information systems, the tool is expected to foster data harmonization and standardization among research infrastructures. The framework is available at https://github.com/bbdataeng/a-small-fire.
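To illustrate what converting a minimal dataset into an HL7-FHIR transaction bundle can look like, the following is a minimal sketch and not the framework's actual code. The column names (`donor_id`, `sample_id`, `sex`, `material`), the function names and the identifier scheme are assumptions made for this example; the real toolkit and the BBMRI-ERIC profiles may use different fields and additional required elements.

```python
import json


def row_to_resources(row):
    """Map one flat record to a (Patient, Specimen) pair of FHIR resources.

    The input keys are hypothetical column names, not the toolkit's schema.
    """
    patient_id = f"patient-{row['donor_id']}"
    patient = {
        "resourceType": "Patient",
        "id": patient_id,
        "gender": row["sex"],
    }
    specimen = {
        "resourceType": "Specimen",
        "id": f"specimen-{row['sample_id']}",
        "subject": {"reference": f"Patient/{patient_id}"},
        "type": {"text": row["material"]},
    }
    return patient, specimen


def to_entry(resource):
    """Wrap a resource in a bundle entry that creates or updates it by id."""
    return {
        "resource": resource,
        "request": {
            "method": "PUT",
            "url": f"{resource['resourceType']}/{resource['id']}",
        },
    }


def build_transaction_bundle(rows):
    """Build a FHIR Bundle of type 'transaction' from flat records."""
    entries = []
    seen_patients = set()
    for row in rows:
        patient, specimen = row_to_resources(row)
        # A donor with several samples is emitted only once.
        if patient["id"] not in seen_patients:
            seen_patients.add(patient["id"])
            entries.append(to_entry(patient))
        entries.append(to_entry(specimen))
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Example input: two samples from the same (hypothetical) donor.
rows = [
    {"donor_id": "D1", "sample_id": "S1", "sex": "female", "material": "plasma"},
    {"donor_id": "D1", "sample_id": "S2", "sex": "female", "material": "serum"},
]
bundle = build_transaction_bundle(rows)
bundle_json = json.dumps(bundle, indent=2)
print(bundle_json)
```

The resulting JSON could then be sent as a single POST to the base URL of a FHIR server, which processes all entries of a transaction bundle as one unit. Using `PUT` with stable identifiers keeps repeated uploads idempotent, which is one possible design choice for a recurring ETL run.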
FAIR; Federated Search; Biobank Data Quality; ETL; Clinical Bioinformatics
01 Journal publication::01a Journal article
An Extract, Transform, Load foundation for Biobank Data Interoperability / Cruoglio, Antonella; Rossi, Federica; Fragnito, Davide; Palombo, Ramona; Massacci, Alice; Betti, Martina; D'Antonio, Mattia; Borsani, Massimiliano; Miele, Claudia; Ciliberto, Gennaro; Lavitrano, Marialuisa; Pallocca, Matteo. - In: ACM JOURNAL OF DATA AND INFORMATION QUALITY. - ISSN 1936-1955. - 17:4(2025). [10.1145/3769117]
Files attached to this item
File: Cruoglio_An-Extract_2025.pdf
Access: open access
Note: https://doi.org/10.1145/376911
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 748.44 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1755079