Research Products Catalogue

Synthetic data is becoming an increasingly promising technology, and successful applications can improve privacy, fairness, and data democratization. While there are many methods for generating synthetic tabular data, the task remains non-trivial and unexplored in specific scenarios. One such scenario is survival data. Here, the key difficulty is censoring: for some instances, we do not know the time of the event, or whether one even occurred. Imbalances in censoring and time horizons cause generative models to exhibit three new failure modes specific to survival analysis: (1) generating too few at-risk members; (2) generating too many at-risk members; and (3) censoring too early. We formalize these failure modes and provide three new generative metrics to quantify them. Following this, we propose SurvivalGAN, a generative model that handles survival data, first by addressing the imbalance in the censoring and event horizons, and second by using a dedicated mechanism for approximating time-to-event/censoring. We evaluate this method via extensive experiments on medical datasets. SurvivalGAN outperforms multiple baselines at generating survival data. In particular, it addresses the failure modes as measured by the new metrics, and it also improves the downstream performance of survival models trained on the synthetic data.
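The abstract's failure modes are phrased in terms of censoring and "at-risk members". To make those terms concrete, here is a textbook sketch of the at-risk set and the Kaplan-Meier survival estimate for right-censored data, using only the Python standard library. It is not the SurvivalGAN model and not the paper's proposed metrics. The function names `at_risk` and `kaplan_meier` and the toy "real" and "synthetic" samples are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def at_risk(times: Sequence[float], t: float) -> int:
    """Count subjects still under observation at time t (observed time >= t)."""
    return sum(1 for ti in times if ti >= t)


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate for right-censored data.

    times[i]  : event time if events[i] == 1, otherwise the censoring time
    events[i] : 1 if the event was observed, 0 if the subject was censored
    Returns (t, S(t)) pairs at each distinct observed event time.
    """
    surv = 1.0
    curve = []
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        n = at_risk(times, t)  # subjects censored at t still count as at risk at t
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1.0 - d / n  # product-limit update
        curve.append((t, surv))
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: observed times and event indicators (0 = censored).
    real_times, real_events = [2, 3, 3, 5, 8, 10], [1, 0, 1, 1, 0, 1]
    print(kaplan_meier(real_times, real_events))

    # A hypothetical synthetic sample that censors too early leaves
    # fewer members at risk at later horizons than the real data does.
    synth_times = [2, 3, 3, 5, 4, 5]
    print(at_risk(real_times, 6), at_risk(synth_times, 6))
```

With these toy numbers, the real sample still has two members at risk at t = 6, but the synthetic sample has none. This is the kind of gap that the abstract describes as generating too few at-risk members.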

SurvivalGAN: Generating Time-to-Event Data for Survival Analysis / Norcliffe, A.; Cebere, B.; Imrie, F.; Lio, P.; van der Schaar, M. - 206:(2023), pp. 10279-10304. (International Conference on Artificial Intelligence and Statistics, Valencia, Spain).

SurvivalGAN: Generating Time-to-Event Data for Survival Analysis

Norcliffe A.; Cebere B.; Imrie F.; Lio P.; van der Schaar M.

2023



Year of publication: 2023
Conference name: International Conference on Artificial Intelligence and Statistics
Keywords: Bioinformatics; Risk assessment
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

File	Size	Format
Norcliffe_SurvivalGAN_2023.pdf (open access)	3.06 MB	Adobe PDF
Note: https://proceedings.mlr.press/v206/norcliffe23a/norcliffe23a.pdf
Type: Publisher's version (the published version, with the publisher's layout)
License: Creative Commons

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726834
