Long-read annotation. Automated Eukaryotic genome annotation based on long-read cDNA sequencing / Cook, David E; Valle-Inclan, Jose Espejo; Pajoro, Alice; Rovenich, Hanna; Thomma, Bart P H J; Faino, Luigi. - In: PLANT PHYSIOLOGY. - ISSN 0032-0889. - 179:1(2019), pp. 38-54. [10.1104/pp.18.00848]

Long-read annotation. Automated Eukaryotic genome annotation based on long-read cDNA sequencing

Faino, Luigi
Last author
Conceptualization
2019

Abstract

Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.
bioinformatics; genome annotation
01 Journal publication::01a Journal article
Files attached to this item
File: Cook_Long-read-annotation_2019.pdf
Access: repository managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 2.44 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1213709
Citations
  • PMC 12
  • Scopus 37
  • ISI 33