Background: Due to the overwhelming increase in multi-locus DNA barcode data provided by taxonomists and field scientists, sequence analysis techniques have been widely developed to effectively compare multi-locus sequences. On the one hand, traditional alignment-based methods are time-consuming and cannot be used for the analysis of non-alignable, multi-locus sequences. On the other hand, alignment-free algorithms allow for the establishment of similarity between biological sequences based on the counts of fixed-length substrings (k-mers) and have proved successful in many applications, including multi-locus DNA barcode analysis. Alignment-free algorithms rely on counting and comparing the frequency of all the distinct k-mers that occur in the considered sequences.

Results: Here, we present LAF (Logic Alignment Free), a method that combines alignment-free techniques and rule-based classifiers in order to assign multi-locus DNA barcode sequences to their corresponding species. LAF looks for a minimal subset of k-mers whose relative frequencies are used to build the classification models as disjunctive-normal-form logic formulas (“if-then rules”, e.g., “if the frequency of AACT > 0.03, then the species of the sequence is Mycena pura”).

Significance: We successfully applied LAF to the classification of DNA barcode sequences belonging to the plant and fungus kingdoms. In particular, focusing our analysis on multi-locus barcode samples, we succeeded in obtaining reliable classification performances at different taxonomic levels by extracting a handful of rules.
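The two ideas in the abstract, relative k-mer frequencies as a feature vector and an if-then rule evaluated on those frequencies, can be illustrated with a minimal Python sketch. This is not the LAF implementation: the function names (`kmer_frequencies`, `matches`, `classify`), the choice to skip windows containing ambiguous bases, and the toy `model` are assumptions made for illustration. The sketch only evaluates a given rule set. It does not show how LAF selects its minimal subset of k-mers or learns thresholds, and the single rule simply echoes the example quoted in the abstract.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Relative frequency of every k-mer over the alphabet ACGT in `sequence`."""
    seq = sequence.upper()
    # Slide a window of width k along the sequence.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assumption: windows containing ambiguous bases (e.g. N) are skipped.
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Fixed-length feature vector: all 4**k k-mers, zero where absent.
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def matches(freqs, dnf):
    """Evaluate a disjunctive-normal-form formula on a frequency vector.

    `dnf` is a list of rules (OR); each rule is a list of
    (k-mer, threshold) tests that must all hold (AND), where a test
    means "frequency of k-mer > threshold".
    """
    return any(all(freqs[kmer] > thr for kmer, thr in rule) for rule in dnf)


def classify(sequence, model, k=4):
    """Return every species whose formula is satisfied by the sequence."""
    freqs = kmer_frequencies(sequence, k)
    return [species for species, dnf in model.items() if matches(freqs, dnf)]


# Hypothetical model: one species, one rule, one test, mirroring the
# example rule quoted in the abstract ("frequency of AACT > 0.03").
model = {"Mycena pura": [[("AACT", 0.03)]]}
```

For the toy input `"AACTAACTAACT"`, three of the nine 4-mer windows are AACT, so `classify("AACTAACTAACT", model)` returns `["Mycena pura"]`, while a sequence with no AACT occurrences, such as `"GGGGGGGG"`, returns an empty list. Using a dense vector over all 4**k k-mers keeps the features comparable across sequences of different lengths.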

Classifying DNA barcode multi-locus sequences with feature vectors and supervised approaches / Weitschek, Emanuel; Fiscon, Giulia; Bertolazzi, Paola; Felici, Giovanni. - In: GENOME. - ISSN 1480-3321. - 58:(2015), pp. 163-303. (Paper presented at the 6th International Barcode of Life Conference / Résumés scientifiques du 6(e) congrès international « Barcode of Life », held at the University of Guelph, Canada) [10.1139/gen-2015-0087].

Classifying DNA barcode multi-locus sequences with feature vectors and supervised approaches

Giulia Fiscon (second author); Giovanni Felici (last author)
2015


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11573/1252302