Catalogo dei prodotti della ricerca

Alignment-free algorithms are used in bioinformatics to efficiently evaluate the similarity between pairs of genomic sequences. They work by extracting and aggregating features from the sequences under study and, then, by comparing them using alignment-free functions. When working on large collections of huge sequences, it is possible to improve the performance of these algorithms by executing the extraction and the aggregation steps, in parallel, across the computing nodes of a distributed system. In this work, we address the problem of finding the optimal schedule to use for assigning to computing nodes the features to be aggregated, in order to minimize the maximum aggregation time. For this purpose, we consider one exact mathematical programming approach and two approximated ones, based on the Longest Processing Time heuristic. These have been implemented using the Gurobi solver, and compared to the algorithm used by the Spark distributed computing framework for assigning tasks to computing nodes. The experiments have been performed on some large collections of genomic sequences well-known in literature. The results show that the proposed approaches perform favourably with respect to the scheduling strategy used by Spark, in terms of quality of the solution. In particular the exact approach, run up to the time limit, allows to reduce the makespan up to 77.11%, when considering the largest instance tested. However, the required computational time of the exact approach is not compatible with its online application. The approximated approaches appear more promising in terms of computational time, while providing good quality solutions.

Scheduling K-mers Counting in a Distributed Environment / Amorosi, L.; Di Rocco, L.; Ferraro Petrillo, U.. - (2022), pp. 73-83. ( International Conference on Optimization and Decision Sciences, ODS 2021 Rome, Italy ) [10.1007/978-3-030-95380-5_7].

Scheduling K-mers Counting in a Distributed Environment

Amorosi L.;Di Rocco L.;Ferraro Petrillo U.

2022

Abstract

Alignment-free algorithms are used in bioinformatics to efficiently evaluate the similarity between pairs of genomic sequences. They work by extracting and aggregating features from the sequences under study and, then, by comparing them using alignment-free functions. When working on large collections of huge sequences, it is possible to improve the performance of these algorithms by executing the extraction and the aggregation steps, in parallel, across the computing nodes of a distributed system. In this work, we address the problem of finding the optimal schedule to use for assigning to computing nodes the features to be aggregated, in order to minimize the maximum aggregation time. For this purpose, we consider one exact mathematical programming approach and two approximated ones, based on the Longest Processing Time heuristic. These have been implemented using the Gurobi solver, and compared to the algorithm used by the Spark distributed computing framework for assigning tasks to computing nodes. The experiments have been performed on some large collections of genomic sequences well-known in literature. The results show that the proposed approaches perform favourably with respect to the scheduling strategy used by Spark, in terms of quality of the solution. In particular the exact approach, run up to the time limit, allows to reduce the makespan up to 77.11%, when considering the largest instance tested. However, the required computational time of the exact approach is not compatible with its online application. The approximated approaches appear more promising in terms of computational time, while providing good quality solutions.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2022
			
	Nome convegno
	
				International Conference on Optimization and Decision Sciences, ODS 2021
			
	Parole chiave
	
				Identical parallel machines; Job scheduling; Genomic analysis
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Scheduling K-mers Counting in a Distributed Environment / Amorosi, L.; Di Rocco, L.; Ferraro Petrillo, U.. - (2022), pp. 73-83. ( International Conference on Optimization and Decision Sciences, ODS 2021 Rome, Italy ) [10.1007/978-3-030-95380-5_7].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
Amorosi_Scheduling-K-mers_2022.pdf solo gestori archivio Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 4.82 MB Formato Adobe PDF Contatta l'autore	4.82 MB	Adobe PDF	Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1644217

Citazioni

ND

2

3

social impact