Research Products Catalogue


An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop

Cattaneo, Giuseppe; Ferraro Petrillo, Umberto; Giancarlo, Raffaele; Roscigno, Gianluca

2017

Abstract

Alignment-free methods are one of the mainstays of biological sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, a fundamental and routine task in computational biology and bioinformatics. They have gained popularity because, even on standard desktop machines, they are faster than alignment-based methods. However, with the advent of Next-Generation Sequencing Technologies, it has become quite common to encounter datasets whose size, i.e., the number of sequences and their total length, challenges the execution of alignment-free methods on such standard machines. Here, we propose the first paradigm for computing k-mer-based alignment-free methods on Apache Hadoop; it extends the problem sizes that can be processed with respect to a standard sequential machine while also granting good time performance. Technically, as opposed to a standard Hadoop implementation, its effectiveness stems from the incremental management of a persistent hash table during the map phase, a task not contemplated by the basic Hadoop functions and one that can also be useful in other contexts.
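To make the idea of k-mer-based comparison with map-phase aggregation concrete, here is a minimal plain-Python sketch. It is not the authors' Hadoop code and uses no Hadoop API: it only imitates, sequentially, a mapper that keeps one in-memory hash table across all records of its input split (the well-known "in-mapper combining" pattern), a reduce step that merges the partial counts into one k-mer profile per sequence, and the Euclidean distance between two profiles as one example of an alignment-free measure. The record layout, the names `KmerMapper`, `reduce_counts`, `run` and `euclidean`, and the choice of distance are all illustrative assumptions; the source does not say how its persistent hash table is managed or which measures it computes.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (sequence_id, full sequence). Each sequence is
# assumed to sit whole inside one split, so no k-mer spans a boundary.
Record = Tuple[str, str]
Pair = Tuple[Tuple[str, str], int]


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every overlapping k-mer of seq (none if len(seq) < k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class KmerMapper:
    """Imitates a mapper that keeps one hash table alive across all the
    records of its split and emits once, at cleanup, instead of emitting
    one (key, 1) pair per k-mer occurrence (in-mapper combining)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.table: Counter = Counter()  # (seq_id, k-mer) -> partial count

    def map(self, record: Record) -> None:
        seq_id, seq = record
        for km in kmers(seq, self.k):
            self.table[(seq_id, km)] += 1  # incremental in-memory update

    def cleanup(self) -> List[Pair]:
        return list(self.table.items())


def reduce_counts(outputs: List[List[Pair]]) -> Dict[str, Counter]:
    """Sum partial counts by key, giving one k-mer profile per sequence."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for output in outputs:
        for (seq_id, km), count in output:
            profiles[seq_id][km] += count
    return profiles


def euclidean(a: Counter, b: Counter) -> float:
    """Euclidean distance between two k-mer count vectors; k-mers absent
    from a profile count as 0 (Counter returns 0 for missing keys)."""
    return sqrt(sum((a[x] - b[x]) ** 2 for x in set(a) | set(b)))


def run(splits: List[List[Record]], k: int) -> Dict[str, Counter]:
    """Sequentially simulate one mapper per split, then the reduce step."""
    outputs = []
    for split in splits:
        mapper = KmerMapper(k)
        for record in split:
            mapper.map(record)
        outputs.append(mapper.cleanup())
    return reduce_counts(outputs)


if __name__ == "__main__":
    profiles = run([[("s1", "ACGTACGT")], [("s2", "ACGTTGCA")]], k=3)
    print(profiles["s1"]["ACG"])                      # 2
    print(euclidean(profiles["s1"], profiles["s2"]))  # sqrt(8), about 2.828
```

With k = 3, the sequence `ACGTACGT` has six 3-mer occurrences but only four distinct 3-mers, so its mapper emits four pairs instead of six; on real genomic data this reduction in emitted pairs is what makes aggregating inside the map phase attractive. In Hadoop's Java API the analogous structure would be a field of a `Mapper` subclass, filled in `map()` and flushed in `cleanup()`.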


Year of publication: 2017

Keywords: alignment-free sequence comparison and analysis; distributed computing; Hadoop; MapReduce; software; theoretical computer science; information systems; hardware and architecture

Type: 01 Journal publication::01a Journal article

Citation: An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop / Cattaneo, Giuseppe; FERRARO PETRILLO, Umberto; Giancarlo, Raffaele; Roscigno, Gianluca. - In: THE JOURNAL OF SUPERCOMPUTING. - ISSN 0920-8542. - PRINT. - 73:4(2017), pp. 1467-1483. [10.1007/s11227-016-1835-3]

Belongs to type: 01a Journal article

Files attached to this record

File: Cattaneo_effective-extension_2016.pdf (access restricted to archive managers)
Type: Post-print (version following peer review and accepted for publication)
License: All rights reserved
Size: 1.06 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/878527

Citations

ND

27

20
