We propose a finite mixture model that simultaneously clusters the rows and columns of a two-mode ordinal data matrix. Following the Underlying Response Variable (URV) approach, the observed variables are treated as discretizations of latent continuous variables distributed as a mixture of Gaussians. To introduce a partition of the P variables within the g-th component of the mixture, we adopt a factorial representation of the data in which a binary row-stochastic matrix, representing variable membership, is used to cluster the variables. In this way, we associate a component of the finite mixture with a cluster of variables, thereby defining a bicluster of units and variables. The number of clusters of variables (and therefore the partition of the variables) may vary across clusters of units. Because the likelihood function is numerically intractable, model parameters are estimated using composite likelihood (CL) methods, and the estimation essentially reduces to a computationally efficient Expectation-Maximization-type algorithm. The performance of the proposed approach is discussed on both simulated and real datasets.
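The data-generating idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model or estimation procedure: the function names (`membership_matrix`, `simulate_ordinal`), the standard normal factors, the isotropic noise, and the thresholds shared by all variables are hypothetical choices made for illustration. It shows a binary row-stochastic membership matrix per unit cluster and the discretization of latent Gaussian variables into ordinal categories; the composite likelihood estimation step is not covered.

```python
import numpy as np

def membership_matrix(labels, n_clusters):
    """Binary row-stochastic matrix: exactly one 1 per row (one cluster per variable)."""
    V = np.zeros((len(labels), n_clusters), dtype=int)
    V[np.arange(len(labels)), labels] = 1
    return V

def simulate_ordinal(n, weights, means, memberships, thresholds, noise_sd=0.5, seed=0):
    """Draw ordinal data from a URV-style mixture (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    P = memberships[0].shape[0]
    # Unit-cluster labels drawn from the mixing weights.
    z = rng.choice(len(weights), size=n, p=weights)
    latent = np.empty((n, P))
    for g, V in enumerate(memberships):
        idx = np.flatnonzero(z == g)
        # Assumed factor structure: variables in the same variable cluster
        # share one latent factor, plus independent noise.
        factors = rng.standard_normal((idx.size, V.shape[1]))
        noise = noise_sd * rng.standard_normal((idx.size, P))
        latent[idx] = means[g] + factors @ V.T + noise
    # Discretize: the category is the number of thresholds at or below the latent value.
    ordinal = np.digitize(latent, thresholds)
    return ordinal, z

# Two unit clusters over P = 4 variables, with different variable partitions:
# the first has 2 variable clusters, the second has 3.
V1 = membership_matrix([0, 0, 1, 1], 2)
V2 = membership_matrix([0, 1, 2, 2], 3)
X, z = simulate_ordinal(
    n=200,
    weights=[0.5, 0.5],
    means=[np.zeros(4), np.full(4, 1.5)],
    memberships=[V1, V2],
    thresholds=[-0.5, 0.5, 1.5],
)
```

Here `X` is a 200 x 4 matrix with categories coded 0 to 3, and `z` holds the simulated unit-cluster labels. Using a different membership matrix for each unit cluster mirrors the statement that the partition of the variables may vary with the clusters of units.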
Biclustering of ordinal data through a composite likelihood approach / Ranalli, Monia; Martella, Francesca. - (2024). (Paper presented at the 26th International Conference on Computational Statistics, held in Giessen, Germany).
Biclustering of ordinal data through a composite likelihood approach.
Monia Ranalli; Francesca Martella
2024