A Comparison Between Methods to Cluster Mixed-Type Data: Gaussian Mixtures Versus Gower Distance / Ranalli, M.; Rocci, R.. - (2021), pp. 163-172. - STUDIES IN CLASSIFICATION, DATA ANALYSIS, AND KNOWLEDGE ORGANIZATION. [10.1007/978-3-030-69944-4_17].

A Comparison Between Methods to Cluster Mixed-Type Data: Gaussian Mixtures Versus Gower Distance

Ranalli M.; Rocci R.
2021

Abstract

In this paper, we use a simulation study to compare two approaches to clustering mixed-type data, in which some variables are continuous and others are ordinal. The first approach is model-based: the variables are assumed to follow a Gaussian mixture model which, for the ordinal variables, is only partially observed. To overcome computational issues, the parameters are estimated with an EM-like algorithm that maximizes a composite log-likelihood based on low-dimensional margins. In the second approach, the Gower distance matrix is computed and the PAM algorithm is then used for clustering.
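The second approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `gower_distance` and `pam`, the toy data, and the treatment of ordinal variables as range-normalised integer codes are all assumptions made for illustration, and the PAM routine is a plain BUILD/SWAP version without the speed-ups found in production libraries.

```python
import numpy as np

def gower_distance(X_cont, X_ord):
    """Gower-type distance: mean over variables of range-normalised |differences|.

    Assumption: ordinal variables arrive as integer codes and are treated
    like interval-scaled variables after range normalisation.
    """
    X = np.hstack([np.asarray(X_cont, dtype=float), np.asarray(X_ord, dtype=float)])
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero distance
    Z = X / ranges
    # Pairwise absolute differences, averaged over the variables
    return np.abs(Z[:, None, :] - Z[None, :, :]).mean(axis=2)

def pam(D, k, max_iter=100):
    """Basic PAM (k-medoids) on a precomputed distance matrix D."""
    n = D.shape[0]
    # BUILD: start from the most central point, then add medoids greedily
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        candidates = [c for c in range(n) if c not in medoids]
        costs = [np.minimum(nearest, D[:, c]).sum() for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    cost = D[:, medoids].min(axis=1).sum()
    # SWAP: apply the best cost-reducing (medoid, non-medoid) swap until none is left
    for _ in range(max_iter):
        best_cost, best_set = cost, None
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_set = trial_cost, trial
        if best_set is None:
            break
        cost, medoids = best_cost, best_set
    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels, cost

# Toy data (hypothetical): two well-separated groups, 2 continuous + 1 ordinal variable
rng = np.random.default_rng(0)
cont = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
ordv = np.vstack([rng.integers(1, 3, (20, 1)), rng.integers(4, 6, (20, 1))])

D = gower_distance(cont, ordv)
medoids, labels, cost = pam(D, k=2)
```

Working from a precomputed distance matrix keeps the clustering step independent of how the mixed-type dissimilarity is defined, so a different convention for the ordinal variables only requires changing `gower_distance`.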
Studies in Classification, Data Analysis, and Knowledge Organization
ISBN: 978-3-030-69943-7
ISBN: 978-3-030-69944-4
composite likelihood; EM algorithm; Gower’s distance; mixed-type data; mixture models; PAM algorithm
02 Publication in a volume::02a Chapter or Article
Files attached to this item
Ranalli_Comparison-between-methods_2021.pdf (archive managers only)
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 184.91 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1603300