Long-read annotation. Automated Eukaryotic genome annotation based on long-read cDNA sequencing

Cook, David E; Valle-Inclan, Jose Espejo; Pajoro, Alice; Rovenich, Hanna; Thomma, Bart P H J; Faino, Luigi

doi:10.1104/pp.18.00848

Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.

Long-read annotation. Automated Eukaryotic genome annotation based on long-read cDNA sequencing / Cook, D.E., Valle-Inclan, J.E., Pajoro, A., Rovenich, H., Thomma, B.P.H.J., Faino, L. - In: PLANT PHYSIOLOGY. - ISSN 0032-0889. - 179:1(2019), pp. 38-54. [10.1104/pp.18.00848]

Year of publication	2019
Keywords	bioinformatics; genome annotation
Type	01 Journal publication::01a Journal article

Files attached to this record

File	Size	Format
Cook_Long-read-annotation_2019.pdf (access: archive managers only; type: publisher's version, published with the publisher's layout; license: all rights reserved)	2.44 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1213709
