TCGA2BED: Extracting, extending, integrating, and querying The Cancer Genome Atlas

Fiscon G. (co-first author)
2017

Abstract

Background: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomic and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas (TCGA), a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types. Results: We propose TCGA2BED, a software tool to search and retrieve TCGA data and convert them into the structured BED format for their seamless use and integration. Additionally, it supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen metadata in attribute-value text format. Conclusions: The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: it becomes possible to deal efficiently and effectively with huge amounts of cancer genomic data in an integrated way, and to search, retrieve, and extend them with additional information. The BED format supports investigators by enabling several knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments.
Cancer; Data extraction; Data integration; Knowledge extraction; DNA Copy Number Variations; DNA Methylation; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Internet; MicroRNAs; Neoplasms; Sequence Analysis, DNA; User-Computer Interface
01 Journal publication::01a Journal article
TCGA2BED: Extracting, extending, integrating, and querying The Cancer Genome Atlas / Cumbo, F.; Fiscon, G.; Ceri, S.; Masseroli, M.; Weitschek, E. - In: BMC BIOINFORMATICS. - ISSN 1471-2105. - 18:1(2017). [10.1186/s12859-016-1419-5]
Files attached to this item
File: Cumbo_TCGA2BED_2017.pdf
Access: open access
Note: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1419-5
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1610416
Citations
  • PubMed Central 8
  • Scopus 32
  • Web of Science 24