Research products catalogue

A thorough assessment of the non-IID data impact in federated learning

Jimenez-Gutierrez D. M.; Hassanzadeh M.; Anagnostopoulos A.; Chatzigiannakis I.; Vitaletti A.

2026

Abstract

Federated learning (FL) enables collaborative training of machine learning (ML) models across decentralized clients while keeping their data private. Because of this decentralized nature, FL must cope with data that are not independent and identically distributed (non-IID). This open problem has notable consequences, such as decreased model performance and longer convergence times. Despite its importance, experimental studies that systematically address all types of data heterogeneity (a.k.a. non-IIDness) remain scarce. This paper aims to fill this gap by assessing and quantifying the non-IID effect through an empirical analysis. We use the Hellinger Distance (HD) to measure differences in distribution among clients. Our study benchmarks five state-of-the-art strategies for handling non-IID data across label, feature, quantity, and spatiotemporal skews, under realistic and controlled conditions. This is the first comprehensive analysis of the spatiotemporal skew effect in FL. Our findings highlight the significant impact of label and spatiotemporal skews on FL model performance, with notable performance drops occurring at specific HD thresholds. FL performance is also heavily affected, mainly when the non-IIDness is extreme. Thus, we provide recommendations for FL research to tackle data heterogeneity effectively. Our work represents the most extensive examination of non-IIDness in FL, offering a robust foundation for future research.
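As an illustration of the Hellinger Distance mentioned in the abstract, the following is a minimal sketch of how label-distribution differences among clients could be quantified. For two discrete distributions p and q, HD(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (disjoint support). The function names and the averaging over all client pairs are assumptions made for this example; the abstract does not state how the paper aggregates HD across clients.

```python
import math
from collections import Counter
from itertools import combinations


def label_distribution(labels, classes):
    """Empirical label distribution of one client, ordered by `classes`."""
    if not labels:
        raise ValueError("a client needs at least one sample")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared_gap = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_gap / 2.0)


def mean_pairwise_hd(client_labels, classes):
    """Average HD over all client pairs (an assumed aggregation, not the paper's)."""
    dists = [label_distribution(labels, classes) for labels in client_labels]
    pairs = list(combinations(dists, 2))
    if not pairs:
        raise ValueError("at least two clients are required")
    return sum(hellinger_distance(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    classes = [0, 1, 2]
    # Hypothetical toy clients: the first pair shares the same label mix,
    # the second pair has no label in common (extreme label skew).
    balanced_clients = [[0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0]]
    skewed_clients = [[0, 0, 0, 0], [1, 1, 2, 2]]
    print("balanced:", mean_pairwise_hd(balanced_clients, classes))
    print("skewed:  ", mean_pairwise_hd(skewed_clients, classes))
```

In this toy setup, clients with identical label proportions yield a distance of 0, while clients with disjoint label sets approach the maximum of 1. The same distance could be applied to feature histograms or sample-count distributions, though that extension is likewise an assumption here.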

Full record

	Year of publication
	
				2026
			
	Keywords
	
				Data heterogeneity quantification; Federated learning; Machine learning; Non-IID data; Spatiotemporal skew
			
	Type
	
				01 Journal publication::01a Journal article
			
	Citation
	
				A thorough assessment of the non-IID data impact in federated learning / Jimenez-Gutierrez, D.M., Hassanzadeh, M., Anagnostopoulos, A., Chatzigiannakis, I., Vitaletti, A.. - In: JOURNAL OF INDUSTRIAL INFORMATION INTEGRATION. - ISSN 2452-414X. - 50:(2026). [10.1016/j.jii.2025.101052]
			

Files attached to this product

File	Size	Format
Jimenez-Gutierrez_A-thorough-assessment_2026.pdf (open access; note: https://doi.org/10.1016/j.jii.2025.101052; type: publisher's version, published with the publisher's layout; license: Creative Commons)	2.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1769952
