The phenomenon of one–inflation is receiving increasing attention in the recent literature on species abundance and capture–recapture analysis. When analysing frequency count distributions, the excess of singletons is often ascribed to the erroneous inclusion of spurious cases. Various works propose to estimate the true number of singletons from the higher, supposedly error–free, counts (the “discounting” approach). We argue that, in the case of microbial diversity studies, the generating process of the spurious singletons can be described in terms of false negative record linkage errors. Errors in sequencing the RNA genomes result in chimeric sequences that cannot be associated with the correct species, and these constitute missing links that are added to the true singletons. In this scenario, none of the observed frequency counts is assumed to be error–free, and we propose an ABC algorithm to estimate the true frequency counts. The number of true singletons estimated in this way may differ considerably from that obtained with the discounting approach. This implies different estimates of diversity as measured, e.g., by Shannon’s index. Curiously, however, the total population count estimates under the two approaches coincide.
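To make the role of the singleton count concrete, the following is a minimal sketch of how a plug-in Shannon index could be computed from frequency counts. It is not the authors' ABC procedure nor the discounting estimator. The function name `shannon_index` and both sets of counts are hypothetical, chosen only for illustration.

```python
import math


def shannon_index(freq_counts):
    """Plug-in Shannon index from frequency counts.

    freq_counts maps an abundance k to f_k, the number of species
    observed exactly k times (an assumed input format).
    """
    # Total number of observed individuals: n = sum_k k * f_k
    n = sum(k * f for k, f in freq_counts.items())
    # Each of the f_k species with abundance k has relative abundance k / n
    return -sum(
        f * (k / n) * math.log(k / n)
        for k, f in freq_counts.items()
        if f > 0
    )


# Hypothetical frequency counts, made up for illustration only.
observed = {1: 60, 2: 20, 3: 10, 4: 5, 5: 5}
# Same counts with a hypothetical, smaller number of singletons.
adjusted = {**observed, 1: 30}

h_observed = shannon_index(observed)
h_adjusted = shannon_index(adjusted)
print(h_observed, h_adjusted)
```

In this toy example only the singleton count changes, yet the index changes with it, which is the sense in which different estimates of the true singletons lead to different diversity estimates.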

On the nature of one–inflation in microbial diversity studies / DI CECCO, Davide; Tancredi, Andrea. - (2023), pp. 425-429. (Paper presented at the IWSM 2023 conference, held in Dortmund).

On the nature of one–inflation in microbial diversity studies

Davide Di Cecco; Andrea Tancredi
2023

IWSM 2023
species problem; biodiversity; linkage errors; approximate Bayesian computation
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: DiCecco_nature-one–inflation_2023.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.81 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1688795