In this chapter we deal with population size estimation in a particularly interesting case: we assume that there is uncertainty about whether some observed individuals actually belong to the population of interest. We are motivated by the Scotland drug injectors data set of Overstall et al. [233], where some drug users may have quit, so that some cell counts are left-censored. We work in a Bayesian framework (e.g., Bernardo and Smith [32]), in which inference is obtained via the posterior distribution of the model parameters. This has clear advantages in our context: first, prior knowledge can be summarized by prior distributions, which also naturally regularize the estimates; second, sampling from the posterior distribution is less cumbersome than maximizing the likelihood of a very complex model with censoring.

Left-censoring is, in our opinion, more common than one might expect, especially in social science research where multiple separate lists are obtained to investigate the population size. In Farcomeni and Scacciatelli [120], for instance, data collection is based on the registry of subjects caught in the street carrying, buying or using cannabis. The final population size estimate then rests on the assumption that all sampled subjects have actually used cannabis at least once, whereas some of them may have been carrying or buying it for someone else.

The approach of Overstall et al. models the counts of the target population underlying each left-censored cell via a truncated Poisson. The only assumption is that the observed number of subjects in a cell is an upper bound for the actual number that should have been measured. Other approaches to the problem include Link et al. [185], where the observed counts are assumed to be affected by measurement error over a true latent multinomial count distribution, and Wright et al. [310], which is based on data augmentation. Overstall et al. [233] focus mostly on a single choice for the prior parameters. In this chapter we revisit and extend their approach, and we compare different objective and subjective prior and model specification choices, from both a theoretical and a practical point of view, using the motivating data as a case study.

The rest of the chapter is organized as follows. In the next section we introduce the motivating Scotland drug injectors data set. We then detail log-linear models for possibly left-censored counts, and provide our first generalizations by discussing some simple forms of unobserved heterogeneity. In Section 25.4 we discuss choices for prior parameters and their rationale; additionally, we use the Deviance Information Criterion (DIC) for model choice. In Section 25.5 we briefly outline how to sample from the posterior distribution of model parameters. In Section 25.6 we illustrate several options for model specification on the Scotland drug injectors data set, and we give concluding remarks in Section 25.7.
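As a rough illustration of the truncated-Poisson idea mentioned in the abstract, the sketch below treats the observed count of a left-censored cell as an upper bound and draws the latent "true" count from a Poisson distribution restricted to values at or below that bound. It is not the authors' implementation: the function names, the rate `rate = 8.5`, and the observed count `observed = 12` are hypothetical values chosen only for the example, and in a full model the rate would come from the log-linear predictor rather than being fixed.

```python
import math
import random


def truncated_poisson_pmf(rate, upper):
    """Return P(N = n | N <= upper) for n = 0..upper under a Poisson(rate)."""
    # Unnormalized log-weights n*log(rate) - log(n!); the exp(-rate) factor
    # cancels when normalizing, so it is omitted.
    log_w = [n * math.log(rate) - math.lgamma(n + 1) for n in range(upper + 1)]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def sample_truncated_poisson(rate, upper, rng):
    """Draw one latent count by inverting the truncated CDF."""
    u = rng.random()
    cum = 0.0
    for n, p in enumerate(truncated_poisson_pmf(rate, upper)):
        cum += p
        if u < cum:
            return n
    return upper  # guard against floating-point round-off


# Hypothetical inputs, for illustration only.
rng = random.Random(0)
observed = 12   # observed cell count, treated as an upper bound
rate = 8.5      # assumed Poisson mean for the true cell count

pmf = truncated_poisson_pmf(rate, observed)
draws = [sample_truncated_poisson(rate, observed, rng) for _ in range(1000)]
```

Every draw lies between 0 and the observed count, which is the only constraint the censoring imposes. Inside a posterior sampler, a step of this kind would typically alternate with updates of the model parameters that determine the rate.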

Bayesian population size estimation with censored counts / Alunni Fegatelli, Danilo; Farcomeni, Alessio; Tardella, Luca. - Print. - (2017), pp. 375-390.

Bayesian population size estimation with censored counts

Alunni Fegatelli, Danilo; Farcomeni, Alessio; Tardella, Luca
2017

Capture-Recapture Methods for the Social and Medical Sciences
9781498745314
capture-recapture; bayesian inference; censored counts; default prior; prior elicitation; log-linear model
02 Publication in a volume::02a Chapter or Article
Type: Post-print document (version following peer review and accepted for publication)
License: All rights reserved

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/954528