Optimal number of clusters to rank a model-based index / BOTTAZZI SCHENONE, M., Grimaccia, E., Vichi, M. - (2024). (Conference of European Statistics Stakeholders (CESS) - 2022, Roma).

Optimal number of clusters to rank a model-based index

Mariaelena Bottazzi Schenone; Elena Grimaccia; Maurizio Vichi

2024

Abstract

This study proposes a new method for choosing the optimal number of groups in cluster analysis when the order of the clusters is relevant. The k-means clustering method is applied to a univariate index resulting from a Structural Equation Model (SEM). In contrast to most conventional procedures for choosing the number of clusters, the new methodology looks for the greatest number of clearly distinct clusters rather than a more parsimonious one. This makes it possible to build a granular ranking of the units in clusters starting from an index, and it minimizes the information loss caused by clustering with a low number of groups. Indeed, the classification adds information to the mere ordering of units: it helps locate homogeneous groups of elements for which the index value can be considered the same. Namely, it identifies units perceived as similar to each other, which should be treated as “ties” in the ranking since they have substantially the same index value. The methodology works well when the goal is to rank units in groups from “the best” to “the worst” according to a particular measure. The number of clusters is chosen as the maximum number of significantly different clusters according to the non-parametric Wilcoxon rank-sum test. Since the clusters are ordered, the test compares each cluster with the closest one. An ad hoc algorithm is proposed to determine the ideal number of clusters. In this paper, k-means clustering is applied to an index measuring air pollution across European urban areas, and the resulting clustering of cities by air pollution level is represented graphically. The results of the analysis provide essential information for developing locally tailored policies aimed at reducing air pollution in metropolitan areas.
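The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ad hoc algorithm, which the abstract does not spell out; it is one plausible reading under stated assumptions. The function names (`kmeans_1d`, `rank_sum_pvalue`, `max_distinct_clusters`), the quantile-based initialisation, the 0.05 significance level, the `k_max` bound, the normal approximation for the rank-sum test, and the toy values are all hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist, mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on a univariate index; clusters come back ordered low to high."""
    xs = sorted(values)
    # Assumption: deterministic start from evenly spaced order statistics.
    centres = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        updated = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return [g for g in groups if g]  # drop clusters that ended up empty


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        rank_sum_a += average_rank * sum(1 for _, label in pooled[i:j] if label == 0)
        i = j
    n1, n2 = len(a), len(b)
    expected = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - expected) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def max_distinct_clusters(values, k_max=10, alpha=0.05):
    """Largest k for which every pair of neighbouring clusters differs significantly."""
    best = 1
    for k in range(2, k_max + 1):
        groups = kmeans_1d(values, k)
        if len(groups) < k:
            continue  # k-means could not populate k clusters
        neighbours = zip(groups, groups[1:])
        if all(rank_sum_pvalue(g, h) < alpha for g, h in neighbours):
            best = k
    return best


toy_index = [0, 1, 2, 100, 101, 102]  # made-up values, not data from the paper
print(max_distinct_clusters(toy_index, k_max=3))
```

On these toy values the sketch selects two clusters: the split into two groups of three passes the test at the assumed 0.05 level, while a three-way split leaves neighbouring groups too small to be declared significantly different.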

Year of publication	2024
Conference name	Conference of European Statistics Stakeholders (CESS) - 2022
Keywords	Clusters ranking; Wilcoxon rank-sum test; multidimensional index; air pollution; metropolitan areas
Type	04 Publication in conference proceedings::04b Conference paper in volume
Citation	Optimal number of clusters to rank a model-based index / BOTTAZZI SCHENONE, M., Grimaccia, E., Vichi, M. - (2024). (Conference of European Statistics Stakeholders (CESS) - 2022, Roma).
Belongs to type	04b Conference paper in volume

Files attached to this product

File	Size	Format
Optimal number of clusters to rank a model-based index.pdf (open access; Note: Article; Type: Post-print document (version after peer review and accepted for publication); License: All rights reserved)	717.16 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1710416
