On Generalization Bounds for Projective Clustering / Bucarelli, Maria Sofia; Larsen, Matilde; Schwiegelshohn, Chris; Toftrup, Mads. - 36:(2023). (Paper presented at the conference Advances in Neural Information Processing Systems, held in New Orleans, USA).
On Generalization Bounds for Projective Clustering
Maria Sofia Bucarelli; Chris Schwiegelshohn
2023
Abstract
Given a set of points, clustering consists of finding a partition of the point set into k clusters such that each point is as close as possible to the center to which it is assigned. Most commonly, centers are points themselves, which leads to the famous k-median and k-means objectives. One may also choose centers to be j-dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of n samples P drawn independently from some unknown but fixed distribution D, how quickly does a solution computed on P converge to the optimal clustering of D? We give several near-optimal results. In particular: 1. For center-based objectives, we show a convergence rate of Õ(√(k/n)). This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the American Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for k-means and extends them to other important objectives such as k-median. 2. For subspace clustering with j-dimensional subspaces, we show a convergence rate of Õ(√(kj²/n)). These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes k-means, we show that a convergence rate of Ω(√(kj/n)) is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the American Mathematical Society 2016] are essentially optimal.
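To make the quantities in the abstract concrete, the statistical setting can be sketched as follows. This is the standard uniform-convergence formalization of clustering generalization; the notation (cost, dist, z) is ours for illustration, not taken from the paper itself.

```latex
% Sketch of the learning setup behind the stated rates (notation ours).
% P = {x_1, ..., x_n} are i.i.d. samples from the unknown distribution D,
% and C ranges over candidate solutions: k points (center-based clustering)
% or k j-dimensional subspaces (subspace/projective clustering).
\[
  \mathrm{cost}_D(C) = \mathbb{E}_{x \sim D}\Big[\min_{c \in C} \operatorname{dist}(x, c)^z\Big],
  \qquad
  \mathrm{cost}_P(C) = \frac{1}{n}\sum_{i=1}^{n} \min_{c \in C} \operatorname{dist}(x_i, c)^z,
\]
% with z = 1 for k-median and z = 2 for k-means. A convergence rate of r(n)
% means the empirical cost is uniformly close to the true cost:
\[
  \sup_{C}\, \big|\mathrm{cost}_P(C) - \mathrm{cost}_D(C)\big| \;\le\; r(n)
  \quad \text{with high probability,}
\]
% e.g. r(n) = \tilde{O}(\sqrt{k/n}) for center-based objectives and
% r(n) = \tilde{O}(\sqrt{kj^2/n}) for j-dimensional subspace clustering,
% as stated in the abstract.
```

Under this reading, the solution computed on the sample P is near-optimal for D whenever the uniform deviation above is small, which is what the abstract's convergence rates quantify.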
File | Size | Format
---|---|---
Bucarelli_On-Generalization_2023.pdf (open access) | 1.96 MB | Adobe PDF

Note: https://proceedings.neurips.cc/paper_files/paper/2023/file/e30bf4765ae6b16a87fb4d7b0b3b3dec-Paper-Conference.pdf
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.