We consider the problem of estimating the size of a population of interest, or ``target population'', by integrating multiple data sources. Each source provides a list of the units of our population. In this context, we identify three possible scenarios:

- Each unit of our target population is included in at least one of the sources, but the identification of the units is not error free: some out--of--scope units are erroneously included in the lists and, vice versa, some units of our population are erroneously identified as out--of--scope;
- All observed units are correctly identified as belonging or not to the target population. However, some units are not listed in any of the available sources, so our lists suffer from undercoverage;
- Not all units are included in the data at hand, and the observed units are not correctly classified with respect to the target population.

The first scenario can essentially be characterized as a case of misclassification. We can exploit the information redundancy at our disposal to estimate the misclassification errors by making some assumptions on the randomness of that redundancy and, as a result, we could even estimate unit--level probabilities of belonging to the target population.

The second scenario represents a typical capture--recapture setting, where we have a set of lists which are incomplete (they do not cover all units, and some unobserved units are not registered in any list) and overlapping (a unit can be registered in several sources). The event of being captured corresponds to the event of being registered in a list. Unlike the previous scenario, we can only estimate the number of unobserved units.

In the third scenario, which is the focus of this chapter, we assume that both issues, uncertainty of detection and uncertainty of state identification, are present in the data at hand. We essentially refer to a capture--recapture setting where the classic assumption of absence of error in unit identification is relaxed. In this context, a misclassification error can be rephrased as an ``erroneous capture''.
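
As a rough illustration of the second scenario only, the sketch below applies the classical two-list Lincoln--Petersen estimator to two overlapping, incomplete lists. It is not the method developed in the chapter: it assumes that the two lists are independent, that all units have the same probability of being registered in a given list, and that there are no erroneous captures, which is precisely the assumption relaxed in the third scenario. The unit identifiers and the function name `lincoln_petersen` are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-list estimate of the population size: N_hat = n1 * n2 / m.

    n1, n2: number of units registered in list A and in list B
    m:      number of units registered in both lists
    """
    if m <= 0:
        raise ValueError("the estimator needs at least one unit in both lists")
    return n1 * n2 / m


# Hypothetical unit identifiers, assumed to be linked without error.
list_a = {"u1", "u2", "u3", "u4", "u5", "u6"}
list_b = {"u4", "u5", "u6", "u7", "u8"}

n1, n2 = len(list_a), len(list_b)
m = len(list_a & list_b)          # units captured by both lists
observed = len(list_a | list_b)   # units captured by at least one list

n_hat = lincoln_petersen(n1, n2, m)
unobserved_hat = n_hat - observed  # estimated number of units in no list

print(f"estimated population size: {n_hat}")
print(f"estimated unobserved units: {unobserved_hat}")
```

If some registered units were in fact out--of--scope, the counts fed to this estimator would be contaminated, which is why a setting with erroneous captures calls for a different model.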

Estimating population size in multiple record systems with uncertainty of state identification / DI CECCO, Davide. - (2019), pp. 169-196. [10.1201/9781315120416].

Estimating population size in multiple record systems with uncertainty of state identification

DI CECCO, DAVIDE
2019

Analysis of integrated data
9781315120416
data integration; capture-recapture; latent variable
02 Publication in volume::02a Chapter or Article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1356561