SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering / Piccialli, Veronica; Sudoso, Antonio M.; Wiegele, Angelika. - In: INFORMS JOURNAL ON COMPUTING. - ISSN 1091-9856. - 34:4(2022), pp. 2144-2162. [10.1287/ijoc.2022.1166]

SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering

Veronica Piccialli; Antonio M. Sudoso; Angelika Wiegele
2022

Abstract

The minimum sum-of-squares clustering problem (MSSC) consists of partitioning n observations into k clusters so as to minimize the sum of squared distances from the points to the centroid of their cluster. In this paper, we propose an exact algorithm for the MSSC problem based on the branch-and-bound technique. The lower bound is computed by a cutting-plane procedure in which valid inequalities are iteratively added to the Peng–Wei semidefinite programming (SDP) relaxation. The upper bound is computed with the constrained version of k-means, whose initial centroids are extracted from the solution of the SDP relaxation. In the branch-and-bound procedure, we incorporate instance-level must-link and cannot-link constraints to express knowledge about which data points should or should not be grouped together. We manage to reduce the size of the problem at each level while preserving the structure of the SDP problem itself. The obtained results show that, to the best of our knowledge, the approach successfully solves, for the first time, real-world instances with up to 4,000 data points.
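To make the matrix formulation behind the Peng–Wei relaxation concrete, here is a minimal NumPy sketch; it is not the SOS-SDP code. It assumes the observations are the rows of X and that cluster labels take the values 0..k−1 with no empty cluster; the helper names mssc_objective, partition_matrix, matrix_objective and is_feasible are hypothetical. For any partition, Z = Σ_j (1/|C_j|) 1_{C_j} 1_{C_j}ᵀ satisfies Ze = e, tr(Z) = k, Z ≥ 0 and Z² = Z, and the MSSC objective equals tr(W(I − Z)) with W = XXᵀ. The Peng–Wei relaxation keeps the linear constraints and replaces Z² = Z with Z ⪰ 0, so its optimal value is a lower bound on the MSSC optimum. The sketch only evaluates objectives and checks feasibility; actually solving the SDP, and adding valid inequalities to it, would require an SDP solver and is not shown.

```python
import numpy as np


def mssc_objective(X, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        total += float(((pts - pts.mean(axis=0)) ** 2).sum())
    return total


def partition_matrix(labels, k):
    """Z = sum_j (1/|C_j|) 1_{C_j} 1_{C_j}^T (assumes labels in 0..k-1, no empty cluster)."""
    n = len(labels)
    Z = np.zeros((n, n))
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        Z[np.ix_(idx, idx)] = 1.0 / len(idx)
    return Z


def matrix_objective(X, Z):
    """tr(W (I - Z)) with W = X X^T, the Gram matrix of the observations (rows of X)."""
    W = X @ X.T
    return float(np.trace(W @ (np.eye(len(X)) - Z)))


def is_feasible(Z, k, relaxed=False, tol=1e-8):
    """Check Ze = e, tr(Z) = k, Z >= 0, Z = Z^T, plus Z^2 = Z (exact) or Z PSD (relaxed)."""
    e = np.ones(Z.shape[0])
    ok = (
        np.allclose(Z @ e, e, atol=tol)
        and abs(np.trace(Z) - k) <= tol
        and bool((Z >= -tol).all())
        and np.allclose(Z, Z.T, atol=tol)
    )
    if relaxed:
        # Peng-Wei relaxation: the projection constraint Z^2 = Z is replaced by Z PSD.
        return ok and np.linalg.eigvalsh(Z).min() >= -tol
    return ok and np.allclose(Z @ Z, Z, atol=tol)


# Usage: 12 random 2-D points split into three clusters of four.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
labels = np.repeat(np.arange(3), 4)
Z = partition_matrix(labels, 3)
print(np.isclose(mssc_objective(X, labels), matrix_objective(X, Z)))  # True
```

Averaging two different partition matrices gives a point that passes the relaxed check but fails the exact one (Z² ≠ Z). Fractional solutions of this kind are what a relaxation admits and what valid inequalities and branching are generally used to exclude.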
Keywords: clustering; semidefinite programming; branch and bound
Publication type: 01 Journal publication::01a Journal article
Attached files

Piccialli_SOS-SDP_2022.pdf
Access: archive managers only
Type: publisher's version (published version with the publisher's layout)
License: all rights reserved
Size: 993.45 kB
Format: Adobe PDF

Piccialli_postprint_SOS-SDP_2021.pdf
Access: open access
Type: pre-print (manuscript submitted to the publisher, prior to peer review)
License: Creative Commons
Size: 936.32 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1627485
Citations
  • Scopus: 16
  • ISI Web of Science: 17