Biclustering of Gene Expression Data by an Extension of Mixtures of Factor Analyzers / Martella, Francesca; Alfo', Marco; Vichi, Maurizio. - In: THE INTERNATIONAL JOURNAL OF BIOSTATISTICS. - ISSN 1557-4679. - STAMPA. - 4(1):(2008), pp. 1-19. [10.2202/1557-4679.1078]

Biclustering of Gene Expression Data by an Extension of Mixtures of Factor Analyzers

MARTELLA, Francesca; ALFO', Marco; VICHI, Maurizio
2008

Abstract

A challenge in microarray data analysis is discovering local structures composed of sets of genes that show homogeneous expression patterns across subsets of conditions. We present an extension of the mixture of factor analyzers (MFA) model that allows simultaneous clustering of genes and conditions. The proposed model is flexible because it models the density of high-dimensional data as a mixture of Gaussian distributions with a particular component-specific covariance structure. Specifically, a binary, row-stochastic matrix representing tissue membership is used to cluster tissues (experimental conditions), whereas the traditional mixture approach is used to define the gene clustering. An alternating expectation conditional maximization (AECM) algorithm is proposed for parameter estimation; experiments on simulated and real data show the efficiency of our method as a general approach to biclustering. The Matlab code of the algorithm is available upon request from the authors.
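To make the phrase "binary, row-stochastic matrix" concrete, the following is a minimal sketch in plain Python, not the authors' Matlab code and not the AECM algorithm. It assumes hard cluster labels for both genes and conditions, whereas the abstract describes a mixture approach for genes, and all names (`membership_matrix`, `bicluster_means`, the toy data `X`) are hypothetical. Each row of the membership matrix holds exactly one 1, so every condition belongs to exactly one condition cluster.

```python
def membership_matrix(labels, n_clusters):
    """Binary row-stochastic matrix: row j has a single 1 in column labels[j]."""
    return [[1 if labels[j] == q else 0 for q in range(n_clusters)]
            for j in range(len(labels))]


def is_binary_row_stochastic(V):
    """Check that every entry is 0 or 1 and every row sums to 1."""
    return all(set(row) <= {0, 1} and sum(row) == 1 for row in V)


def bicluster_means(X, gene_labels, V, n_gene_clusters):
    """Mean expression within each (gene cluster, condition cluster) block."""
    n_cond_clusters = len(V[0])
    sums = [[0.0] * n_cond_clusters for _ in range(n_gene_clusters)]
    counts = [[0] * n_cond_clusters for _ in range(n_gene_clusters)]
    for i, row in enumerate(X):
        k = gene_labels[i]            # gene cluster of gene i (hard label, an assumption)
        for j, x in enumerate(row):
            q = V[j].index(1)         # condition cluster of condition j
            sums[k][q] += x
            counts[k][q] += 1
    return [[sums[k][q] / counts[k][q] if counts[k][q] else float("nan")
             for q in range(n_cond_clusters)]
            for k in range(n_gene_clusters)]


# Toy data (invented for illustration): 4 genes x 4 conditions.
X = [[1, 1, 5, 5],
     [1, 1, 5, 5],
     [9, 9, 3, 3],
     [9, 9, 3, 3]]
gene_labels = [0, 0, 1, 1]
V = membership_matrix([0, 0, 1, 1], n_clusters=2)
means = bicluster_means(X, gene_labels, V, n_gene_clusters=2)
```

In this toy example each block of the data matrix is constant, so the block means recover the bicluster structure exactly; the model in the paper instead estimates such structure through a likelihood-based fit.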
mixture of factor analyzers, biclustering, microarray data
01 Journal publication::01a Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/133539
