Clustering is an essential technique across various domains, such as data science, machine learning, and explainable artificial intelligence. Information visualization and visual analytics techniques have been shown to effectively support human involvement in the visual exploration of clustered data, enhancing the understanding and refinement of cluster assignments. To support this human involvement, several perceptual studies and visual quality metrics have already been proposed. However, the visual perception of clustering quality metrics, also known as Cluster Validity Indexes (CVIs), remains to be further explored. This paper presents the first attempt at a deep and exhaustive evaluation of the perceptual aspects of clustering quality metrics, focusing on the Davies-Bouldin Index, Dunn Index, Calinski-Harabasz Index, and Silhouette Score. Our research is centered around two main objectives: a) assessing the human perception of common CVIs in 2D scatterplots and b) exploring the potential of Large Multimodal Models, in particular GPT-4o, to emulate the assessed human perception. To this end, we conducted two systematic data studies and a user study covering a broad collection of datasets. By discussing the obtained results and highlighting limitations and areas for further exploration, this paper aims to provide a foundation for future research activities.

Towards a Visual Perception-Based Analysis of Clustering Quality Metrics / Blasilli, Graziano; Kerrigan, Daniel; Bertini, Enrico; Santucci, Giuseppe. - (2024), pp. 15-24. (Paper presented at the IEEE Visualization in Data Science conference held in St. Pete Beach (FL), USA) [10.1109/VDS63897.2024.00007].

Towards a Visual Perception-Based Analysis of Clustering Quality Metrics

Graziano Blasilli; Giuseppe Santucci
2024

IEEE Visualization in Data Science
clustering; clustering quality metrics; clustering validity indexes; visual perception; large multimodal models; LMM
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Blasilli_Towards-Visual_2024.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 9.78 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726497