We propose a robust model for discovering differentially expressed genes that directly incorporates biological significance, i.e., effect size. Using the so-called c-fold rule, we transform the expression measurements into an observed nominal random variable with three categories: below a fixed lower threshold, above a fixed upper threshold, or between the two thresholds. Each observation may originate from one of three distributions, corresponding to under-expressed, non-differentially expressed, and over-expressed genes. This leads to a statistical model for a three-component mixture of trinomial distributions with suitable constraints on the parameter space. To obtain the maximum likelihood estimates, we show how to implement a constrained EM algorithm with a latent label indicating the component to which each gene belongs. Different strategies for statistically significant gene discovery are discussed and compared. We illustrate the method with a small simulation study and a real dataset on multiple sclerosis.
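The discretisation and mixture fit described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the input is a genes-by-replicates array of log2 expression ratios, the thresholds are symmetric at ±log2(c), and the EM shown is the plain, unconstrained version for a three-component mixture of trinomials. The parameter constraints used in the paper are not reproduced here, and the function names `c_fold_counts` and `em_trinomial_mixture`, the default `c=2.0`, and the starting values are hypothetical.

```python
import numpy as np


def c_fold_counts(log_ratios, c=2.0):
    """Per gene, count replicates below, within, and above the c-fold thresholds.

    Assumption: log_ratios holds log2 fold changes, shape (genes, replicates).
    """
    t = np.log2(c)
    below = (log_ratios < -t).sum(axis=1)
    above = (log_ratios > t).sum(axis=1)
    within = log_ratios.shape[1] - below - above
    return np.stack([below, within, above], axis=1)


def em_trinomial_mixture(counts, n_iter=200, eps=1e-10):
    """Unconstrained EM for a 3-component mixture of trinomials (illustrative only)."""
    # Hypothetical starting values: rows are components (under, null, over),
    # columns are categories (below, within, above).
    pi = np.full(3, 1.0 / 3.0)
    theta = np.array([[0.6, 0.3, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.1, 0.3, 0.6]])
    for _ in range(n_iter):
        # E-step: posterior probability of each latent label for each gene.
        # The multinomial coefficient is the same for every component and cancels.
        log_post = np.log(pi + eps) + counts @ np.log(theta + eps).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and category probabilities.
        pi = resp.mean(axis=0)
        theta = resp.T @ counts + eps
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta, resp
```

The returned `resp` array holds, for each gene, the estimated posterior probabilities of the three latent labels, which is the kind of quantity a gene discovery rule could threshold. How the paper turns these into statistically significant discoveries is not specified in the abstract.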

A Three Component Latent Class Model for Robust Semiparametric Gene Discovery / Alfo', Marco; Farcomeni, Alessio; Tardella, Luca. - In: STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY. - ISSN 1544-6115. - Electronic. - 10:1(2011), pp. 1-19. [10.2202/1544-6115.1565]

A Three Component Latent Class Model for Robust Semiparametric Gene Discovery

Alfo', Marco; Farcomeni, Alessio; Tardella, Luca
2011

differentially expressed genes; effect size; mixture model; microarray data

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/500222