A Three Component Latent Class Model for Robust Semiparametric Gene Discovery

Alfo', Marco; Farcomeni, Alessio; Tardella, Luca

doi:10.2202/1544-6115.1565

We propose a robust model for discovering differentially expressed genes that directly incorporates biological significance, i.e., effect size. Using the so-called c-fold rule, we transform the expression measurements into a nominal observed random variable with three categories: below a fixed lower threshold, above a fixed upper threshold, or between the two thresholds. The three observed levels may arise from three different distributions, corresponding to under-expressed, non-differentially expressed, and over-expressed genes. This leads to a statistical model for a 3-component mixture of trinomial distributions with suitable constraints on the parameter space. To obtain maximum likelihood estimates, we show how to implement a constrained EM algorithm with a latent label indicating the component to which each gene belongs. Different strategies for statistically significant gene discovery are discussed and compared. We illustrate the method with a small simulation study and a real dataset on multiple sclerosis.
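As a rough illustration of the pipeline the abstract describes, the following Python sketch discretizes replicate log2 expression ratios with a c-fold rule and then fits a 3-component mixture of trinomials by EM. It is not the authors' implementation. Several things here are assumptions not taken from the record: the thresholds are set symmetrically at plus and minus log2(c), each gene is assumed to have several replicate measurements, the function names are hypothetical, and the EM is unconstrained because the abstract does not state the parameter constraints used in the paper.

```python
import numpy as np

def c_fold_categories(log_ratios, c=2.0):
    """Code log2 ratios as 0 (below), 1 (within), 2 (above).
    Assumption: symmetric thresholds at -log2(c) and +log2(c)."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    t = np.log2(c)
    cats = np.ones(log_ratios.shape, dtype=int)
    cats[log_ratios < -t] = 0
    cats[log_ratios > t] = 2
    return cats

def category_counts(cats):
    """Per-gene counts of the three categories (rows: genes, columns: replicates)."""
    return np.stack([(cats == k).sum(axis=1) for k in range(3)], axis=1)

def em_trinomial_mixture(counts, n_iter=200, tol=1e-8):
    """Plain (unconstrained) EM for a 3-component mixture of trinomials.
    The paper's parameter constraints are NOT implemented here."""
    # Start with component k favouring category k to reduce label switching.
    theta = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # theta[k, c], rows sum to 1
    pi = np.full(3, 1.0 / 3.0)                      # mixing weights
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior component probabilities per gene, computed on the
        # log scale; the multinomial coefficient cancels across components.
        log_w = np.log(pi) + counts @ np.log(theta).T
        m = log_w.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_w - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_w - log_norm)
        # M-step: weighted relative frequencies, floored away from zero.
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        pi /= pi.sum()
        theta = resp.T @ counts + 1e-10
        theta /= theta.sum(axis=1, keepdims=True)
        ll = log_norm.sum()  # log-likelihood up to an additive constant
        if ll - prev < tol:
            break
        prev = ll
    return pi, theta, resp

# Toy usage: three genes, three replicates each (made-up numbers).
toy = np.array([[-2.0, -1.5, 0.0], [0.1, -0.2, 0.5], [1.2, 3.0, 2.0]])
toy_counts = category_counts(c_fold_categories(toy, c=2.0))
pi_hat, theta_hat, resp_hat = em_trinomial_mixture(toy_counts)
print(toy_counts)
print(resp_hat.argmax(axis=1))  # most probable latent class per gene
```

In this sketch the posterior probabilities in `resp_hat` play the role of the latent labels; how they are turned into a list of discoveries (for example, by thresholding them) is one of the strategies the paper compares, and is left out here.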

A Three Component Latent Class Model for Robust Semiparametric Gene Discovery / Alfo', M., Farcomeni, A., Tardella, L.. - In: STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY. - ISSN 1544-6115. - ELETTRONICO. - 10:1(2011), pp. 1-19. [10.2202/1544-6115.1565]



Year of publication: 2011

Keywords: differentially expressed genes; effect size; mixture model; microarray data

Type: 01 Journal publication::01a Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/500222
