Research Products Catalog

Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms

Ferraro Petrillo, Umberto;Roscigno, Gianluca;Cattaneo, Giuseppe;Giancarlo, Raffaele

2018

Abstract

Information-theoretic and compositional/linguistic analyses of genomes play a central role in bioinformatics, all the more so now that the associated methodologies are also proving valuable for epigenomic and metagenomic studies. The kernel of these methods is the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands parallel and distributed computing. Indeed, algorithms of this type have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to that domain that they do not extend easily to the computation of informational and linguistic indices concurrently on sets of genomes.
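
As a rough illustration of the k-mer statistics described in the abstract, the Python sketch below counts k-mers in a single sequence on one machine and derives one example informational index, the empirical Shannon entropy of the k-mer distribution. It is not the Hadoop-based method of the paper. The function names `count_kmers` and `empirical_entropy`, the decision to skip windows containing characters outside A, C, G, T, and the choice of entropy as the index are all assumptions made for illustration.

```python
from collections import Counter
import math


def count_kmers(sequence: str, k: int) -> Counter:
    """Count how many times each k-mer over {A,C,G,T} occurs in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    # Slide a window of length k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


def empirical_entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the observed k-mer frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    stats = count_kmers("ACGTACGT", 2)
    for kmer, n in sorted(stats.items()):
        print(kmer, n)
    print("entropy (bits):", empirical_entropy(stats))
```

Because counts from disjoint chunks of input can simply be added together, this kind of counting lends itself to the distributed setting the abstract refers to, although the sketch itself makes no attempt to handle k-mers that span chunk boundaries.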

Full record

	Publication year
	
				2018
			
	Keywords
	
				genomic analysis; hadoop; distributed computing
			
	Type
	
				01 Journal publication::01a Journal article
			
	Citation
	
				Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms / Ferraro Petrillo, U., Roscigno, G., Cattaneo, G., Giancarlo, R.. - In: BIOINFORMATICS. - ISSN 1367-4803. - PRINT. - 34:11(2018), pp. 1826-1833. [10.1093/bioinformatics/bty018]
			
	Belongs to type:
	
				01a Journal article

Files attached to this product

File	Size	Format
Ferraropetrillo_Informational-linguistic-analysis_2018.pdf (archive managers only; Type: Publisher's version, i.e. the published version with the publisher's layout; License: All rights reserved)	598.14 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1113477
