A flexible parametric approach to synthetic patients generation using health data / Cipriani, Marta; Di Rocco, Lorenzo; Puopolo, Maria; Alfò, Marco. - In: STATISTICAL METHODS & APPLICATIONS. - ISSN 1618-2510. - 34:4(2025), pp. 639-662. [10.1007/s10260-025-00800-5]

A flexible parametric approach to synthetic patients generation using health data

Cipriani, Marta (first author)
Di Rocco, Lorenzo (second author)
Puopolo, Maria (penultimate author)
Alfò, Marco (last author)
2025

Abstract

Enhancing reproducibility and data accessibility is essential to scientific research. However, ensuring data privacy while achieving these goals is challenging, especially in the medical field, where sensitive data are commonplace. One possible solution is to use synthetic data that mimic real-world datasets. This approach may help to streamline therapy evaluation and enable quicker access to innovative treatments. We propose a method based on sequential conditional regressions, as in a fully conditional specification (FCS) approach, combined with flexible parametric survival models to accurately replicate covariate patterns and survival times. To make our approach available to a wide audience, we have developed user-friendly functions in R and Python that implement it. We also provide an example application to registry data on patients affected by Creutzfeldt–Jakob disease. The results show the potential of the proposed method to mirror observed multivariate distributions and survival outcomes.
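
The idea of sequential conditional synthesis can be illustrated with a minimal sketch. This is not the authors' R or Python implementation: the variable names (sex, age, time), the toy data, and the helper functions are all hypothetical. For brevity, a log-normal accelerated failure time model fitted by least squares on uncensored data stands in for the flexible parametric survival model named in the abstract. Each variable is drawn conditionally on the variables synthesized before it, and the survival time is drawn last.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares; returns coefficients and residual std. dev."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - X1.shape[1]))
    return beta, sigma

def draw_linear(X, beta, sigma, rng):
    """Draw new responses from the fitted linear model with Gaussian noise."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta + rng.normal(0.0, sigma, size=len(X))

def synthesize(sex, age, time, n_synth, rng):
    """Sequential conditional synthesis: sex, then age | sex, then time | sex, age."""
    # Step 1: first variable drawn from its observed marginal distribution.
    syn_sex = rng.binomial(1, sex.mean(), size=n_synth).astype(float)

    # Step 2: age conditional on sex (linear regression).
    beta_a, sig_a = fit_linear(sex.reshape(-1, 1), age)
    syn_age = draw_linear(syn_sex.reshape(-1, 1), beta_a, sig_a, rng)

    # Step 3: survival time conditional on all covariates.
    # Assumption: log-normal AFT model, no censoring (a simplification).
    beta_t, sig_t = fit_linear(np.column_stack([sex, age]), np.log(time))
    syn_log_time = draw_linear(np.column_stack([syn_sex, syn_age]), beta_t, sig_t, rng)
    return syn_sex, syn_age, np.exp(syn_log_time)

# Hypothetical toy "observed" data, simulated only for this illustration.
rng = np.random.default_rng(0)
n = 500
sex = rng.binomial(1, 0.5, size=n).astype(float)
age = 60.0 + 5.0 * sex + rng.normal(0.0, 8.0, size=n)
time = np.exp(3.0 - 0.02 * age + 0.3 * sex + rng.normal(0.0, 0.5, size=n))

syn_sex, syn_age, syn_time = synthesize(sex, age, time, n_synth=500, rng=rng)
print("observed mean age:", age.mean(), "synthetic mean age:", syn_age.mean())
print("observed median time:", np.median(time), "synthetic median time:", np.median(syn_time))
```

A realistic application would also need to handle censoring, categorical covariates with more than two levels, and a more flexible baseline hazard, none of which this sketch attempts.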
Keywords: flexible parametric survival model; privacy; simulation; synthetic
Type: 01 Journal publication::01a Journal article
Files attached to this record

File: Cipriani_flexible-parametric-approach_2025.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 2.06 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1747320