Alignment-free algorithms are used in bioinformatics to efficiently evaluate the similarity between pairs of genomic sequences. They work by extracting and aggregating features from the sequences under study and, then, by comparing them using alignment-free functions. When working on large collections of huge sequences, it is possible to improve the performance of these algorithms by executing the extraction and the aggregation steps, in parallel, across the computing nodes of a distributed system. In this work, we address the problem of finding the optimal schedule to use for assigning to computing nodes the features to be aggregated, in order to minimize the maximum aggregation time. For this purpose, we consider one exact mathematical programming approach and two approximated ones, based on the Longest Processing Time heuristic. These have been implemented using the Gurobi solver, and compared to the algorithm used by the Spark distributed computing framework for assigning tasks to computing nodes. The experiments have been performed on some large collections of genomic sequences well-known in literature. The results show that the proposed approaches perform favourably with respect to the scheduling strategy used by Spark, in terms of quality of the solution. In particular the exact approach, run up to the time limit, allows to reduce the makespan up to 77.11%, when considering the largest instance tested. However, the required computational time of the exact approach is not compatible with its online application. The approximated approaches appear more promising in terms of computational time, while providing good quality solutions.
Scheduling K-mers Counting in a Distributed Environment / Amorosi, L.; Di Rocco, L.; Ferraro Petrillo, U.. - (2022), pp. 73-83. (Intervento presentato al convegno International Conference on Optimization and Decision Sciences, ODS 2021 tenutosi a Rome, Italy) [10.1007/978-3-030-95380-5_7].
Scheduling K-mers Counting in a Distributed Environment
Amorosi L.;Di Rocco L.;Ferraro Petrillo U.
2022
Abstract
Alignment-free algorithms are used in bioinformatics to efficiently evaluate the similarity between pairs of genomic sequences. They work by extracting and aggregating features from the sequences under study and, then, by comparing them using alignment-free functions. When working on large collections of huge sequences, it is possible to improve the performance of these algorithms by executing the extraction and the aggregation steps, in parallel, across the computing nodes of a distributed system. In this work, we address the problem of finding the optimal schedule to use for assigning to computing nodes the features to be aggregated, in order to minimize the maximum aggregation time. For this purpose, we consider one exact mathematical programming approach and two approximated ones, based on the Longest Processing Time heuristic. These have been implemented using the Gurobi solver, and compared to the algorithm used by the Spark distributed computing framework for assigning tasks to computing nodes. The experiments have been performed on some large collections of genomic sequences well-known in literature. The results show that the proposed approaches perform favourably with respect to the scheduling strategy used by Spark, in terms of quality of the solution. In particular the exact approach, run up to the time limit, allows to reduce the makespan up to 77.11%, when considering the largest instance tested. However, the required computational time of the exact approach is not compatible with its online application. The approximated approaches appear more promising in terms of computational time, while providing good quality solutions.File | Dimensione | Formato | |
---|---|---|---|
Amorosi_Scheduling-K-mers_2022.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
4.82 MB
Formato
Adobe PDF
|
4.82 MB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.