Mixtures of Generalized Latent Trait Analyzers for biclustering multivariate data / Failli, Dalila; Marino, Maria Francesca; Martella, Francesca. - (2024). (Talk presented at the StaTalk conference held in Florence).
Mixtures of Generalized Latent Trait Analyzers for biclustering multivariate data
Francesca Martella
2024
Abstract
Biclustering concerns the simultaneous partitioning of units and variables into homogeneous blocks of rows and columns of a data matrix. It is typically used to analyze large data matrices in which rows and columns can be treated symmetrically. A common application is genetics, where biclustering can identify groups of genes that are co-expressed under subsets of experimental conditions. We introduce a novel model-based biclustering approach for multivariate data that exploits a finite mixture of generalized latent trait models. The proposed model clusters units into subsets, called components, through a finite mixture specification. Within each component, subsets of variables, called segments, are identified through a flexible and parsimonious specification of the linear predictor based on a row-stochastic vector. The model handles both qualitative and quantitative variables whose (conditional) distribution belongs to the exponential family. Including a multidimensional, continuous latent trait in the linear predictor accounts for the residual dependence among multivariate outcomes from the same unit. In addition, covariates can be included in the latent layer of the model to assess their effect on component formation. Model parameters are estimated by maximum likelihood with an EM-type algorithm, and Gauss-Hermite quadrature is used to approximate the multidimensional integrals that have no closed-form solution.
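The quadrature step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary responses with a logit link, a linear predictor of the hypothetical form `b_j + W_j · u`, and a standard normal latent trait `u` of dimension `d`. It leaves out the segment structure (the row-stochastic vector) and the covariates. The functions `gh_grid`, `component_likelihood`, and `responsibilities` are hypothetical names. The sketch approximates each component's marginal likelihood with a tensor-product Gauss-Hermite rule built from NumPy's `hermgauss`, then combines the components into the posterior membership probabilities an E-step would compute.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite import hermgauss


def gh_grid(n_points, dim):
    """Tensor-product Gauss-Hermite nodes/weights for E[f(u)], u ~ N(0, I_dim)."""
    x, w = hermgauss(n_points)  # rule for the weight function exp(-x^2)
    # Change of variables u = sqrt(2) * x maps exp(-x^2) onto the normal density.
    nodes = np.array(list(itertools.product(np.sqrt(2.0) * x, repeat=dim)))
    weights = np.array([np.prod(c) for c in itertools.product(w, repeat=dim)])
    return nodes, weights / np.pi ** (dim / 2.0)  # rescaled so weights sum to 1


def component_likelihood(y, b, W, nodes, weights):
    """Approximate p(y) = integral of prod_j Bernoulli(y_j | expit(b_j + W_j.u)) phi(u) du.

    Hypothetical parametrization: y is (J,) binary, b is (J,) intercepts,
    W is (J, d) loadings on the latent trait.
    """
    eta = b[None, :] + nodes @ W.T  # (Q, J) linear predictor at each node
    p = 1.0 / (1.0 + np.exp(-eta))  # inverse logit
    lik_at_nodes = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)  # (Q,)
    return float(weights @ lik_at_nodes)


def responsibilities(y, pis, params, n_points=10):
    """Posterior component-membership probabilities for one unit (E-step style)."""
    d = params[0][1].shape[1]
    nodes, weights = gh_grid(n_points, d)
    joint = np.array([pi * component_likelihood(y, b, W, nodes, weights)
                      for pi, (b, W) in zip(pis, params)])
    return joint / joint.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J, d = 6, 2
    # Two hypothetical components with randomly drawn intercepts and loadings.
    params = [(rng.normal(size=J), rng.normal(size=(J, d))) for _ in range(2)]
    y = np.array([1, 0, 1, 1, 0, 0])
    print(responsibilities(y, [0.4, 0.6], params))
```

The tensor-product grid has `n_points ** d` nodes, so the cost grows exponentially with the latent dimension. This is one reason to keep the latent trait low-dimensional when integrals are approximated this way.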


