A novel clustering method with maximum number of ordered centroids and stable clusters for optimal ranking in a univariate setting / Bottazzi Schenone, Mariaelena; Grimaccia, Elena; Vichi, Maurizio. - In: STATISTICAL METHODS & APPLICATIONS. - ISSN 1618-2510. - (2025). [10.1007/s10260-025-00803-2]

A novel clustering method with maximum number of ordered centroids and stable clusters for optimal ranking in a univariate setting

Mariaelena Bottazzi Schenone; Elena Grimaccia; Maurizio Vichi
2025

Abstract

This paper proposes an innovative method to determine the optimal ranking of a set of univariate units in the maximum number of clusters with sortable centroids. Units within the same cluster are considered equivalent, while units in different clusters differ significantly in terms of the variable under study. By means of bootstrap estimates of the clusters’ centroids, the proposed procedure makes it possible to identify the optimal number of “well-separated” classes, complementing the deterministic results. Moreover, the bootstrap estimates of the units’ membership matrices allow us to define an optimal ranking of these units within the identified clusters: the clusters are ranked, and each unit is assigned the rank of the cluster it belongs to. Centroids and membership matrices are obtained by applying a specialized K-means clustering to one-dimensional data. This methodology is particularly useful when the aim is to rank units in equivalence classes in a univariate setting. The performance of the methodology is evaluated through a simulation study and compared with some widely used techniques for choosing the number of clusters and with Gaussian mixture models. In addition, two real data applications provide insights into the ranking of European cities according to their air pollution level and the ranking of National Basketball Association players in terms of their on-court performance. A graphical visualization of the resulting ranking makes it possible to appreciate immediately both the partition of units into equivalence classes and a measure of its stability.
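The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (kmeans_1d, bootstrap_intervals, max_separated_k, rank_units), the quantile initialisation, and the separation rule (non-overlapping bootstrap percentile intervals of adjacent centroids) are all assumptions made here for illustration only. The sketch runs a plain K-means on one-dimensional data, bootstraps the sorted centroids, keeps the largest number of clusters whose centroids remain separated, and assigns each unit the rank of its cluster.

```python
import random
from statistics import mean


def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's K-means on 1-D data; returns centroids sorted ascending."""
    xs = sorted(values)
    n = len(xs)
    # Assumption: initialise centroids at evenly spaced quantiles.
    centroids = [xs[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def bootstrap_intervals(values, k, n_boot=200, alpha=0.05, seed=0):
    """Percentile intervals for each sorted centroid over bootstrap resamples."""
    rng = random.Random(seed)
    draws = [[] for _ in range(k)]
    for _ in range(n_boot):
        sample = rng.choices(values, k=len(values))  # resample with replacement
        for j, c in enumerate(kmeans_1d(sample, k)):
            draws[j].append(c)
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = int(n_boot * (1 - alpha / 2)) - 1
    intervals = []
    for d in draws:
        d.sort()
        intervals.append((d[lo_idx], d[hi_idx]))
    return intervals


def max_separated_k(values, k_max=6, **kwargs):
    """Largest k whose adjacent centroid intervals do not overlap (assumed rule)."""
    best = 1
    for k in range(2, k_max + 1):
        iv = bootstrap_intervals(values, k, **kwargs)
        if all(iv[j][1] < iv[j + 1][0] for j in range(k - 1)):
            best = k
    return best


def rank_units(values, centroids):
    """Rank of each unit = 1-based position of its nearest sorted centroid."""
    return [
        1 + min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        for x in values
    ]


if __name__ == "__main__":
    # Hypothetical toy data: three clearly separated groups of 50 units each.
    data = [c + 0.01 * i for c in (0.0, 5.0, 10.0) for i in range(50)]
    k = max_separated_k(data, k_max=3)
    print(k, kmeans_1d(data, k))
```

Units that share a cluster receive the same rank, so the output of rank_units is a ranking in equivalence classes rather than a strict ordering. A faithful implementation would follow the specialized K-means and the stability measures defined in the full article.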
Keywords: one-dimensional data clustering; ranking in equivalence classes; optimal number of clusters; bootstrap; k-means clustering
01 Journal publication::01a Journal article
Files attached to this item
File: s10260-025-00803-2.pdf
Access: open access
Note: Full article
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.92 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1744954