
SEaCorAl: Identifying and contrasting the regulation-correlation bias in RNA-Seq paired expression data of patient groups / Petti, M.; Verrienti, A.; Paci, P.; Farina, L.. - In: COMPUTERS IN BIOLOGY AND MEDICINE. - ISSN 0010-4825. - 135:(2021). [10.1016/j.compbiomed.2021.104567]

SEaCorAl: Identifying and contrasting the regulation-correlation bias in RNA-Seq paired expression data of patient groups

Petti M. (first author; Methodology)
Verrienti A. (second author; Supervision)
Paci P. (penultimate author; Supervision)
Farina L. (last author; Methodology)
2021

Abstract

The Cancer Genome Atlas database makes it possible to analyze genome-wide RNA-Seq cancer expression data using paired counts, that is, studies in which expression data are collected in pairs of normal and cancer cells sampled from the same individual. Correlation of gene expression profiles is the most common way to study co-expression groups, and it is widely used to give a biological interpretation to big -omics data. The aim of the paper is threefold. First, we show for the first time the presence of a “regulation-correlation bias” in RNA-Seq paired expression data, that is, an artifactual link between the expression status (up- or down-regulation) of a gene pair and the sign of the corresponding correlation coefficient. Second, we provide a statistical model that theoretically explains the reasons for such a bias. Third, we present a bias-removal algorithm, called SEaCorAl, that effectively reduces bias effects and improves the biological significance of correlation analysis. SEaCorAl is validated by showing a significant increase both in the ability of positive correlations to detect biologically meaningful associations and in the modularity of the resulting unbiased correlation network.
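The idea of a regulation-correlation bias can be made concrete with a toy example. One standard mechanism that produces such a link (the law of total covariance; the paper's own statistical model may differ) is that pooling normal and tumour samples adds a between-group term to the covariance of two genes A and B: Cov(A, B) = E[Cov(A, B | group)] + Cov(E[A | group], E[B | group]). With two equally sized groups and no within-group dependence, the second term equals Δ_A·Δ_B/4, where Δ is the tumour-minus-normal mean shift of each gene, so the pooled correlation takes the sign of Δ_A·Δ_B: positive when both genes are regulated in the same direction, negative when they are regulated in opposite directions. The Python sketch below simulates this under assumed Gaussian log-expression values; the helper names (`pearson`, `simulate_paired`, `pooled_vs_within`) and all parameters are hypothetical, and the code does not implement SEaCorAl.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_paired(shift_a, shift_b, n_patients=200, noise_sd=0.5, seed=0):
    """Toy paired design (an assumption, not the paper's data model).

    Each patient has an independent baseline log-expression for gene A and
    gene B and contributes one normal and one tumour sample. Tumour samples
    are shifted by shift_a / shift_b (>0: up-regulated, <0: down-regulated).
    Genes A and B never influence each other.
    """
    rng = random.Random(seed)
    normal_a, normal_b, tumour_a, tumour_b = [], [], [], []
    for _ in range(n_patients):
        base_a = rng.gauss(0.0, 1.0)  # patient-specific level of gene A
        base_b = rng.gauss(0.0, 1.0)  # drawn independently of gene A
        normal_a.append(base_a + rng.gauss(0.0, noise_sd))
        normal_b.append(base_b + rng.gauss(0.0, noise_sd))
        tumour_a.append(base_a + shift_a + rng.gauss(0.0, noise_sd))
        tumour_b.append(base_b + shift_b + rng.gauss(0.0, noise_sd))
    return normal_a, normal_b, tumour_a, tumour_b


def pooled_vs_within(shift_a, shift_b, **kwargs):
    """Return (pooled r, r within normal, r within tumour) for genes A and B."""
    na, nb, ta, tb = simulate_paired(shift_a, shift_b, **kwargs)
    pooled = pearson(na + ta, nb + tb)  # normal and tumour samples mixed
    return pooled, pearson(na, nb), pearson(ta, tb)


if __name__ == "__main__":
    for label, sa, sb in [("both up", 2.0, 2.0), ("up/down", 2.0, -2.0)]:
        pooled, r_normal, r_tumour = pooled_vs_within(sa, sb)
        print(f"{label}: pooled={pooled:+.2f} "
              f"normal={r_normal:+.2f} tumour={r_tumour:+.2f}")
```

With the default parameters the expected pooled correlation is about ±0.44 (between-group covariance 1 divided by pooled variance 1.25 + 1), while the per-condition correlations stay near zero. Computing correlations within each condition is only one generic way to sidestep this artifact; it is not the bias-removal procedure described in the paper.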
Keywords: Correlation analysis; Correlation networks; Paired data; RNA-Seq data; Spurious correlations
Publication type: 01 Journal publication :: 01a Journal article
Files attached to this item
Petti_SEaCorAl_2021.pdf (open access)
Note: https://doi.org/10.1016/j.compbiomed.2021.104567
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 7.92 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1557582
Citations
  • PubMed Central: 2
  • Scopus: 4
  • Web of Science (ISI): 4