Research Products Catalog

A Distributed Alignment-free Pipeline for Human SNPs Genotyping

Di Rocco L.; Ferraro Petrillo U.

2023

Abstract

Identifying known genetic traits and disease-related variants in an individual relies on a fundamental task: genotyping a set of variants drawn from a database. The efficiency of this task is challenged by the growing volume of sequencing data and variant databases: at such scale, even the fastest available genotyping tools can take an unacceptably long time to deliver a result. To address this issue, we present SparkGeno, the first known distributed alignment-free pipeline for genotyping Single Nucleotide Polymorphisms (SNPs). Building on a distributed reformulation of traditional alignment-free genotyping pipelines, implemented with the Apache Spark framework, we introduce several optimizations that further improve performance in a distributed environment. The pipeline comes in two versions that employ different data structures, making them suitable for datasets featuring different numbers of SNPs. We present an experimental analysis on widely studied datasets to assess how distributed computing enables fast, accurate and scalable large-scale genotyping, and we report an additional experiment validating the effectiveness of the signature-based approach used to perform genotyping. Our results show that SparkGeno, when run on a distributed system, genotypes variants from whole-genome sequencing data orders of magnitude faster than existing tools and scales with the number of available computational units. This makes SparkGeno a promising solution for large-scale genotyping applications such as precision medicine and population-scale studies.
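As an illustration of the general idea behind alignment-free, signature-based SNP genotyping, the following is a minimal sketch in plain Python. It is not SparkGeno's implementation: the function names, the toy k-mer length `K`, the `min_support` threshold and the example sequences are all hypothetical, and reverse-complement handling and sequencing errors are ignored. Each allele of a SNP is represented by the k-mers that overlap the SNP position (its "signatures"); reads are scanned per partition for those k-mers only, the partial counts are merged, and a genotype is called from the merged support. The per-partition count followed by a merge mirrors a map step followed by a reduce-by-key step in a framework such as Apache Spark.

```python
from collections import Counter

K = 5  # toy k-mer length (assumption); real pipelines use a much larger k


def kmers(seq, k=K):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def allele_signatures(left, right, allele, k=K):
    """Return the k-mers overlapping the SNP when `allele` sits between the flanks."""
    context = left[-(k - 1):] + allele + right[:k - 1]
    return set(kmers(context, k))


def count_partition(reads, wanted, k=K):
    """Map step: count only the signature k-mers found in one partition of reads."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in wanted:
                counts[km] += 1
    return counts


def merge_counts(partials):
    """Reduce step: sum the per-partition counters key by key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def call_snp(left, right, ref, alt, partitions, min_support=1):
    """Call a diploid genotype from signature support (hypothetical rule)."""
    ref_sigs = allele_signatures(left, right, ref)
    alt_sigs = allele_signatures(left, right, alt)
    wanted = ref_sigs | alt_sigs
    counts = merge_counts(count_partition(p, wanted) for p in partitions)
    has_ref = sum(counts[s] for s in ref_sigs) >= min_support
    has_alt = sum(counts[s] for s in alt_sigs) >= min_support
    if has_ref and has_alt:
        return "0/1"  # heterozygous
    if has_alt:
        return "1/1"  # homozygous alternate
    if has_ref:
        return "0/0"  # homozygous reference
    return "./."      # no evidence for either allele


# Toy data: one read carries the reference allele C, the other the alternate T.
LEFT, RIGHT = "ACGTACGGA", "TTGCACCTA"
partitions = [["TACGGACTTGCAC"], ["TACGGATTTGCAC"]]
print(call_snp(LEFT, RIGHT, "C", "T", partitions))
```

Restricting the counting step to the set of signature k-mers keeps the data exchanged between workers small, which is one plausible reason why such a formulation distributes well; the data structures and optimizations actually used by SparkGeno are described in the paper itself.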

	Publication year
	
				2023
			
	Conference name
	
				14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
			
	Keywords
	
				snp detection; distributed computing; genotyping
			
	Type
	
				04 Publication in conference proceedings::04b Conference paper in volume
			
	Citation
	
				A Distributed Alignment-free Pipeline for Human SNPs Genotyping / Di Rocco, L.; Ferraro Petrillo, U. - (2023), pp. 1-8. (14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Houston) [10.1145/3584371.3612990].
			
	Belongs to type:
	
				04b Conference paper in volume

Files attached to this product

File	Size	Format
Ferraro Petrillo_distibuted-alignment-free_2024.pdf (open access; Type: Publisher's version, published with the publisher's layout; License: Creative Commons)	747.79 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1692306
