Catalogo dei prodotti della ricerca

Clustering problems and clustering algorithms are often overly sensitive to the presence of outliers: even a handful of points can greatly affect the structure of the optimal solution and its cost. This is why many algorithms for robust clustering problems have been formulated in recent years. These algorithms discard some points as outliers, excluding them from the clustering. However, outlier selection can be unfair: some categories of input points may be disproportionately affected by the outlier removal algorithm. We study the problem of k-clustering with fair outlier removal and provide the first approximation algorithm for well-known clustering formulations, such as k-means and k-median. We analyze this algorithm and prove that it has strong theoretical guarantees. We complement this result with an empirical evaluation showing that, while standard methods for outlier removal have a disproportionate impact across categories of input points, our algorithm equalizes the impact while retaining strong experimental performances on multiple real-world datasets. We also show how the fairness of outlier removal can influence the performance of a downstream learning task. Finally, we provide a coreset construction, which makes our algorithm scalable to very large datasets.

k-Clustering with Fair Outliers / Almanza, Matteo; Epasto, Alessandro; Panconesi, Alessandro; Re, Giuseppe. - (2022), pp. 5-15. (Intervento presentato al convegno The Fifteenth ACM International Conference on Web Search and Data Mining, WSDM 2022 tenutosi a Virtual Event, AZ, USA) [10.1145/3488560.3498485].

k-Clustering with Fair Outliers

Matteo Almanza;Alessandro Epasto;Alessandro Panconesi;Giuseppe Re

2022

Abstract

Clustering problems and clustering algorithms are often overly sensitive to the presence of outliers: even a handful of points can greatly affect the structure of the optimal solution and its cost. This is why many algorithms for robust clustering problems have been formulated in recent years. These algorithms discard some points as outliers, excluding them from the clustering. However, outlier selection can be unfair: some categories of input points may be disproportionately affected by the outlier removal algorithm. We study the problem of k-clustering with fair outlier removal and provide the first approximation algorithm for well-known clustering formulations, such as k-means and k-median. We analyze this algorithm and prove that it has strong theoretical guarantees. We complement this result with an empirical evaluation showing that, while standard methods for outlier removal have a disproportionate impact across categories of input points, our algorithm equalizes the impact while retaining strong experimental performances on multiple real-world datasets. We also show how the fairness of outlier removal can influence the performance of a downstream learning task. Finally, we provide a coreset construction, which makes our algorithm scalable to very large datasets.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2022
			
	Nome convegno
	
				The Fifteenth ACM International Conference on Web Search and Data Mining, WSDM 2022
			
	Parole chiave
	
				Clustering; outliers; fairness; k-means; disparate impact; coreset
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				k-Clustering with Fair Outliers / Almanza, Matteo; Epasto, Alessandro; Panconesi, Alessandro; Re, Giuseppe. - (2022), pp. 5-15. (Intervento presentato al  convegno The Fifteenth ACM International Conference on Web Search and Data Mining, WSDM 2022 tenutosi a Virtual Event, AZ, USA) [10.1145/3488560.3498485].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1612882

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

ND

ND

social impact