The minimum sum-of-squares clustering problem (MSSC), also known as k-means clustering, refers to the problem of partitioning n data points into k clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA) to effectively reduce the number of constraints considered in the CG master problem. DCA was originally conceived to reduce degeneracy in set partitioning problems by utilizing an aggregated restricted master problem obtained from a partition of the set partitioning constraints into disjoint clusters. In this work, we explore the use of DCA within a CG algorithm for MSSC exact solution. Our method is fine-tuned by a series of ablation studies on DCA design choices, and is demonstrated to significantly outperform existing state-of-the-art exact approaches available in the literature.

A Column Generation Algorithm with Dynamic Constraint Aggregation for Minimum Sum-of-Squares Clustering / Sudoso, Antonio M.; Aloise, Daniel. - In: INFORMS JOURNAL ON COMPUTING. - ISSN 1091-9856. - (2025). [10.1287/ijoc.2024.0938]

A Column Generation Algorithm with Dynamic Constraint Aggregation for Minimum Sum-of-Squares Clustering

Sudoso, Antonio M.
;
2025

Abstract

The minimum sum-of-squares clustering problem (MSSC), also known as k-means clustering, refers to the problem of partitioning n data points into k clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA) to effectively reduce the number of constraints considered in the CG master problem. DCA was originally conceived to reduce degeneracy in set partitioning problems by utilizing an aggregated restricted master problem obtained from a partition of the set partitioning constraints into disjoint clusters. In this work, we explore the use of DCA within a CG algorithm for MSSC exact solution. Our method is fine-tuned by a series of ablation studies on DCA design choices, and is demonstrated to significantly outperform existing state-of-the-art exact approaches available in the literature.
2025
data clustering; column generation; dynamic constraint aggregation; global optimization
01 Pubblicazione su rivista::01a Articolo in rivista
A Column Generation Algorithm with Dynamic Constraint Aggregation for Minimum Sum-of-Squares Clustering / Sudoso, Antonio M.; Aloise, Daniel. - In: INFORMS JOURNAL ON COMPUTING. - ISSN 1091-9856. - (2025). [10.1287/ijoc.2024.0938]
File allegati a questo prodotto
File Dimensione Formato  
Sudoso_A Column_2025.pdf

solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 2.54 MB
Formato Adobe PDF
2.54 MB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1743290
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 0
social impact