Catalogue of research products

Estimating the number of sequencing errors in microbial diversity studies

Davide Di Cecco; Andrea Tancredi

2024

Abstract

Species diversity analysis of microbial communities is an important tool for assessing the health of an ecosystem. The advent of high-throughput genome sequencing techniques has made it possible to process an unprecedented number of RNA sequences. However, many studies report a significant number of fictitious rare species in datasets generated with these techniques. These species are the product of errors that can occur at any step of the sequence analysis pipeline. The overcount of rare species (especially singletons) inflates the estimates of both the total number of species and the diversity of the community as measured by Shannon’s index. To avoid overestimating these quantities, it is crucial to model the source of error. In this work, we present a new model that treats spurious singletons as false-negative record linkage errors, and compare it with an alternative approach in which spurious singletons are treated as candidates for deletion. We discuss the two inferential approaches both in an application to real data and on theoretical grounds. We demonstrate that, while Shannon’s index can differ significantly under the two models, the estimates of the total number of species are equivalent.
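To show concretely why singletons matter, here is a minimal Python sketch on a hypothetical abundance vector (reads per species), not the paper's data. It computes the plug-in Shannon index and, as an assumed stand-in for a total-richness estimate, the classic Chao1 estimator, which depends directly on the numbers of singletons (f1) and doubletons (f2). The helper names (`shannon_index`, `chao1`, `drop_singletons`) are hypothetical. Neither naive singleton deletion nor Chao1 is the record-linkage model proposed in the paper; the sketch only illustrates how sensitive both quantities are to the way singletons are handled.

```python
import math


def shannon_index(counts):
    """Plug-in Shannon index H = -sum_i p_i * ln(p_i), with p_i = n_i / n."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def chao1(counts):
    """Classic Chao1 richness estimate (illustrative choice, not the paper's estimator)."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # common fallback when there are no doubletons


def drop_singletons(counts):
    """Naive deletion correction: discard every species observed exactly once."""
    return [c for c in counts if c != 1]


# Hypothetical abundance vector (reads per species); not data from the paper.
clean = [50, 30, 10, 5, 2, 2, 1, 1]
# Same sample plus 6 hypothetical spurious singletons produced by sequencing errors.
noisy = clean + [1] * 6

for label, counts in [
    ("clean", clean),
    ("noisy", noisy),
    ("noisy, singletons dropped", drop_singletons(noisy)),
]:
    print(f"{label:>26}: S_obs={len(counts)}, "
          f"Chao1={chao1(counts):.1f}, H={shannon_index(counts):.3f}")
```

With these made-up counts, the six spurious singletons raise Chao1 from 9.0 to 30.0 and also increase H. Deleting every singleton pushes Chao1 down to 6.0, because it also discards the two genuine rare species. This overshoot/undershoot trade-off is why the abstract stresses modelling the source of error explicitly.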


	Year of publication
	
				2024
			
	Keywords
	
				Approximate Bayesian Computation; Linkage errors; Microbial diversity; Sequencing errors
			
	Type
	
				01 Journal publication::01a Journal article
			
	Citation
	
				Estimating the number of sequencing errors in microbial diversity studies / Di Cecco, Davide; Tancredi, Andrea. - In: ENVIRONMENTAL AND ECOLOGICAL STATISTICS. - ISSN 1573-3009. - (2024). [10.1007/s10651-024-00614-w]
			
	Belongs to type:
	
				01a Journal article

Files attached to this product

File	Size	Format
Tancredi_Estimating-number_2024.pdf (access restricted to archive managers; type: publisher's version, i.e. the version published with the publisher's layout; license: all rights reserved)	1.2 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1709378
