An Extensible, Scalable Spark Platform for Alignment-free Genomic Analysis -- Version 2

Ferraro Petrillo, Umberto; Palini, Francesco; Cattaneo, Giuseppe; Giancarlo, Raffaele

Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a computationally convenient alternative to pairwise and multiple sequence alignment for many genomic, metagenomic and epigenomic tasks. Yet their use is still at the proof-of-principle stage: only recently has a benchmarking study coherently evaluated a handful of the functions proposed over the years, identifying a pool of well-performing ones. More is needed, however, to make this pool usable on a day-to-day basis. In particular, quantifying the statistical significance of the output of a given function would greatly help when no reference point is available. For most functions, such an analysis is bound to rely on Monte Carlo Hypothesis Test simulations, and the resulting dramatic increase in computational time turns it into a Big Data problem. Surprisingly, this problem has hardly been considered, despite the increasing popularity of Big Data technologies in Computational Biology.

Results: We fill this important gap by providing the first user-friendly, extensible, efficient Spark platform for alignment-free genomic analysis. Thanks to its scalability, Monte Carlo Hypothesis Test simulations on the output of AF functions become affordable for both small and huge collections of sequences. We are thus able to study comparatively, for the first time, AF functions in relation to the statistical significance of their output. This novel analysis allows us to reduce the pool of well-performing functions identified by the benchmarking study to a handful.
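
The Monte Carlo Hypothesis Test mentioned in the abstract can be illustrated with a minimal, single-machine sketch. Everything in it is an assumption made for illustration and is not taken from the platform: the AF function (squared Euclidean distance between k-mer count vectors), the null model (random permutation of each sequence's letters), and all function names. It does not use Spark and does not reflect the platform's API.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_euclidean(a, b, k):
    """Illustrative AF distance: squared Euclidean distance between
    the k-mer count vectors of a and b (an assumed choice)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from a sequence.
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


def shuffled(seq, rng):
    """Assumed null model: permute the letters of seq, which keeps
    its composition but destroys the order of its symbols."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def monte_carlo_p_value(a, b, k, trials, seed=0):
    """Estimate how often two null sequences are at least as close
    as the observed pair (small value = unusually similar)."""
    rng = random.Random(seed)
    observed = squared_euclidean(a, b, k)
    hits = sum(
        squared_euclidean(shuffled(a, rng), shuffled(b, rng), k) <= observed
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

A call such as `monte_carlo_p_value(seq_a, seq_b, k=3, trials=999)` evaluates the distance 1000 times for a single pair of sequences. Repeating this for every pair in a collection multiplies the cost of a plain distance computation by the number of trials, which gives a sense of why the abstract treats the analysis as a Big Data problem.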

An Extensible, Scalable Spark Platform for Alignment-free Genomic Analysis -- Version 2 / Ferraro Petrillo, Umberto; Palini, Francesco; Cattaneo, Giuseppe; Giancarlo, Raffaele. - (2020).

Year of publication: 2020

Publication type: 13c Publication on a portal

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1413854
