Biclustering of Gene Expression Data by an Extension of Mixtures of Factor Analyzers / Martella, Francesca; Alfo', Marco; Vichi, Maurizio. - In: THE INTERNATIONAL JOURNAL OF BIOSTATISTICS. - ISSN 1557-4679. - PRINT. - 4(1):(2008), pp. 1-19. [10.2202/1557-4679.1078]
Biclustering of Gene Expression Data by an Extension of Mixtures of Factor Analyzers
MARTELLA, Francesca; ALFO', Marco; VICHI, Maurizio
2008
Abstract
A challenge in microarray data analysis concerns discovering local structures composed of sets of genes that show homogeneous expression patterns across subsets of conditions. We present an extension of the mixture of factor analyzers (MFA) model allowing for simultaneous clustering of genes and conditions. The proposed model is rather flexible since it models the density of high-dimensional data assuming a mixture of Gaussian distributions with a particular component-specific covariance structure. Specifically, a binary and row stochastic matrix representing tissue membership is used to cluster tissues (experimental conditions), whereas the traditional mixture approach is used to define the gene clustering. An alternating expectation conditional maximization (AECM) algorithm is proposed for parameter estimation; experiments on simulated and real data show the efficiency of our method as a general approach to biclustering. The Matlab code of the algorithm is available upon request from the authors.
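
As a rough illustration of the tissue-membership idea in the abstract, the Python sketch below builds a binary, row stochastic matrix (each tissue belongs to exactly one tissue cluster) and plugs it into a factor-analyzer-style covariance. It is not the authors' Matlab code. The function names, the example labels, and the covariance form `V V^T + diag(psi)` (the membership matrix taking the role of the loading matrix in a standard factor analyzer) are assumptions made for illustration only.

```python
# Illustrative sketch only: all names and the covariance form are assumptions,
# not taken from the paper or its Matlab implementation.

def membership_matrix(labels, n_clusters):
    """Build a tissues-by-clusters 0/1 indicator matrix from cluster labels."""
    return [[1 if q == lab else 0 for q in range(n_clusters)] for lab in labels]

def is_binary_row_stochastic(V):
    """True if every entry is 0 or 1 and every row sums to exactly 1."""
    return all(
        all(v in (0, 1) for v in row) and sum(row) == 1
        for row in V
    )

def component_covariance(V, psi):
    """Assumed form: Sigma = V V^T + diag(psi), with V playing the role of
    the factor loading matrix in a standard factor analyzer."""
    p = len(V)
    sigma = [
        [sum(a * b for a, b in zip(V[i], V[j])) for j in range(p)]
        for i in range(p)
    ]
    for i in range(p):
        sigma[i][i] += psi[i]
    return sigma

# Hypothetical example: 5 tissues assigned to 2 tissue clusters.
labels = [0, 0, 1, 1, 1]
V = membership_matrix(labels, 2)
sigma = component_covariance(V, psi=[0.5] * 5)
```

Under this assumed form, two tissues in the same cluster share a nonzero covariance entry and tissues in different clusters share none, which is one way a binary membership matrix can induce a block structure over conditions within each gene component.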