A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data / Ranalli, M.; Rocci, R.. - In: PSYCHOMETRIKA. - ISSN 0033-3123. - 82:4(2017), pp. 1007-1034. [10.1007/s11336-017-9578-5]

A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data

Ranalli M.; Rocci R.
2017

Abstract

The literature on clustering continuous data is rich and wide; in contrast, the literature on clustering categorical data is still limited. In some cases, the clustering problem is made more difficult by noise variables or dimensions that carry no information about the cluster structure and may mask it. This paper proposes a model for simultaneous clustering and dimensionality reduction of ordered categorical data that detects the discriminative dimensions and discards the noise ones. Following the underlying response variable approach, the observed variables are treated as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To distinguish discriminative from noise dimensions, these latent variables are modeled as linear combinations of two independent sets of second-order latent variables: only one set contains information about the cluster structure, while the other contains the noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and, in some cases, infeasible. To overcome this issue, the parameters are estimated with an EM-like algorithm that maximizes a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data illustrate the effectiveness of the proposal.
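To make the estimation idea concrete, the following is a minimal Python sketch (standard library only) of the objective function, not of the authors' estimation algorithm. It rests on several assumptions not stated in the abstract: each ordinal variable with C categories is a discretization of its latent variable through thresholds -inf = tau_0 < ... < tau_C = +inf; each mixture component has mean A(nu_g, nu_0) and covariance A diag(S_g, S_0) A^T, where only the first (discriminative) block varies across clusters and the second (noise) block is shared; and the "low-dimensional margins" are taken to be the bivariate ones. The function names, the loading matrix `A`, and all numeric values are hypothetical. Bivariate normal probabilities are computed with Plackett's identity and Simpson's rule purely for convenience, and the EM-like maximization step described in the paper is not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def norm_cdf(x):
    """Standard normal CDF; math.erf already maps +/-inf to +/-1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(h, k, rho, n=200):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho.

    Uses Plackett's identity dPhi2/drho = phi2(h, k; rho), integrated from 0 to
    rho with Simpson's rule (n even). Adequate for |rho| well below 1 here.
    """
    if h == -math.inf or k == -math.inf:
        return 0.0
    if h == math.inf:
        return norm_cdf(k)
    if k == math.inf:
        return norm_cdf(h)

    def phi2(r):
        s = 1.0 - r * r
        return math.exp(-(h * h - 2 * r * h * k + k * k) / (2 * s)) / (2 * math.pi * math.sqrt(s))

    step = rho / n
    total = phi2(0.0) + phi2(rho)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * phi2(i * step)
    return norm_cdf(h) * norm_cdf(k) + total * step / 3.0


def cell_prob(lo, hi, mean, cov):
    """Probability that a bivariate normal (mean, 2x2 cov) falls in the
    rectangle (lo[0], hi[0]] x (lo[1], hi[1]], i.e. one pair of ordinal cells."""
    sd = (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
    rho = cov[0][1] / (sd[0] * sd[1])

    def std(t, d):
        # Infinite outer thresholds stay infinite after standardization.
        return t if math.isinf(t) else (t - mean[d]) / sd[d]

    a1, b1 = std(lo[0], 0), std(hi[0], 0)
    a2, b2 = std(lo[1], 1), std(hi[1], 1)
    return (bvn_cdf(b1, b2, rho) - bvn_cdf(a1, b2, rho)
            - bvn_cdf(b1, a2, rho) + bvn_cdf(a1, a2, rho))


def block_diag(b1, b2):
    """Block-diagonal matrix diag(b1, b2) from two square nested lists."""
    n1, n2 = len(b1), len(b2)
    out = [[0.0] * (n1 + n2) for _ in range(n1 + n2)]
    for i in range(n1):
        out[i][:n1] = b1[i]
    for i in range(n2):
        out[n1 + i][n1:] = b2[i]
    return out


def structured_params(A, nu, S):
    """Hypothetical component parameters: mean A @ nu, covariance A @ S @ A^T."""
    p = len(A)
    mean = [sum(A[i][j] * nu[j] for j in range(p)) for i in range(p)]
    AS = [[sum(A[i][l] * S[l][j] for l in range(p)) for j in range(p)] for i in range(p)]
    cov = [[sum(AS[i][l] * A[j][l] for l in range(p)) for j in range(p)] for i in range(p)]
    return mean, cov


def pairwise_loglik(data, taus, weights, comps):
    """Pairwise composite log-likelihood: sum over variable pairs (i, j) and
    observed cells (a, b) of n_ij(a, b) * log sum_g w_g * P(cell (a, b) | g)."""
    total = 0.0
    for i, j in combinations(range(len(taus)), 2):
        counts = Counter((row[i], row[j]) for row in data)
        for (a, b), n in counts.items():
            lo = (taus[i][a], taus[j][b])
            hi = (taus[i][a + 1], taus[j][b + 1])
            lik = 0.0
            for w, (mean, cov) in zip(weights, comps):
                m2 = (mean[i], mean[j])
                c2 = [[cov[i][i], cov[i][j]], [cov[j][i], cov[j][j]]]
                lik += w * cell_prob(lo, hi, m2, c2)
            total += n * math.log(lik)
    return total


# Usage: 3 ordinal variables, 1 discriminative + 2 noise latent dimensions.
A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]  # hypothetical loadings
S_noise = [[1.0, 0.0], [0.0, 1.0]]  # noise block shared by both clusters
comps = [structured_params(A, [m, 0.0, 0.0], block_diag([[1.0]], S_noise)) for m in (-1.5, 1.5)]
weights = [0.5, 0.5]
taus = [[-math.inf, -0.5, 0.5, math.inf]] * 3  # 3 categories per variable
data = [(0, 0, 1), (2, 2, 1), (0, 1, 0), (2, 1, 2)]
print(pairwise_loglik(data, taus, weights, comps))
```

Because each term involves only a two-dimensional normal integral, the cost grows with the number of variable pairs instead of requiring a full P-dimensional integral for every response pattern, which is the difficulty with exact maximum likelihood noted above.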
composite likelihood; mixture models; reduction ordinal data; Algorithms; Computer Simulation; Educational Status; Happiness; Humans; Siblings; Cluster Analysis; Data Interpretation, Statistical; Models, Statistical
01 Journal publication::01a Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1347497
Citations
  • PMC 0
  • Scopus 3
  • ISI 1