Twelve parsimonious models for clustering mixed-type (ordinal and continuous) data are proposed. Ordinal and continuous data are assumed to follow a multivariate finite mixture of Gaussians. Two closely related issues arise as the dimensionality of the data increases: the number of parameters increases exponentially, and a large number of ordinal variables makes full maximum likelihood estimation infeasible. To address the first issue, the model should be made more parsimonious in terms of the number of parameters to estimate. To this end, a general class of eight parsimonious mixture models for mixed-type data is defined by imposing a factor decomposition on the component-specific covariance matrices. The loadings and the error-term variances of the factor model may be constrained to be equal or unequal across mixture components. To add flexibility while maintaining a degree of parsimony, four further models are defined in which the latent factors are the same in each cluster but have different variances. A useful feature of these semi-constrained models is that, under mild conditions, the factors are unique; in other words, they cannot be rotated as in the classical factor analysis model. To address the second issue, a composite likelihood approach is adopted. Parameter estimates are computed using an EM-type algorithm based on the composite likelihood. The proposal is evaluated through a simulation study and an application to real data.
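The parsimony argument in the abstract can be illustrated with a simple parameter count. The sketch below is not taken from the paper: it assumes the usual factor-analytic form for each component covariance (a p x q loading matrix plus a diagonal error-variance matrix) and the standard convention of subtracting q(q-1)/2 from the loading count to account for rotational indeterminacy. The function names and the `common_loadings` / `common_errors` flags are hypothetical, and the identification constraints normally needed for ordinal variables are ignored.

```python
def full_cov_params(p: int, G: int) -> int:
    """Free covariance parameters for G unconstrained p x p covariance matrices."""
    return G * p * (p + 1) // 2


def factor_cov_params(p: int, q: int, G: int,
                      common_loadings: bool = False,
                      common_errors: bool = False) -> int:
    """Free covariance parameters under an assumed factor decomposition.

    Each component covariance is assumed to be Lambda @ Lambda.T + Psi,
    with Lambda of size p x q and Psi diagonal. This is an illustrative
    convention, not necessarily the parameterisation used by the authors.
    """
    # Loadings: p*q entries minus q(q-1)/2 for rotational indeterminacy.
    loadings = p * q - q * (q - 1) // 2
    # Diagonal error variances: one per observed variable.
    errors = p
    n_loadings = loadings if common_loadings else G * loadings
    n_errors = errors if common_errors else G * errors
    return n_loadings + n_errors


if __name__ == "__main__":
    p, q, G = 20, 2, 3  # hypothetical sizes chosen for illustration
    print("full covariance:       ", full_cov_params(p, G))
    print("factor, unconstrained: ", factor_cov_params(p, q, G))
    print("factor, fully shared:  ", factor_cov_params(p, q, G, True, True))
```

With 20 variables, 2 factors and 3 components, these conventions give 630 covariance parameters for the unrestricted mixture, 177 when loadings and error variances differ across components, and 59 when both are shared, which shows how equality constraints across components trade flexibility for parsimony.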
Parsimonious and semi-constrained models for clustering mixed-type data through a composite likelihood approach / Ranalli, Monia; Rocci, Roberto. - (2023), p. 116. (Paper presented at the Econometrics and Statistics (EcoSta 2023) conference, held in Tokyo, Japan).
Parsimonious and semi-constrained models for clustering mixed-type data through a composite likelihood approach
Monia Ranalli; Roberto Rocci
2023