A hierarchical Bayesian approach to record linkage and population size problems / Tancredi, Andrea; Liseo, Brunero. - In: The Annals of Applied Statistics. - ISSN 1932-6157. - Print. - 5:2B (2011), pp. 1553-1585. [10.1214/10-aoas447]

A hierarchical Bayesian approach to record linkage and population size problems

Tancredi, Andrea; Liseo, Brunero
2011

Abstract

We propose and illustrate a hierarchical Bayesian approach for matching statistical records observed on different occasions. We show how this model can be profitably adopted both in record linkage problems and in capture-recapture setups, where the size of a finite population is the real object of interest. The proposed model-based approach differs from current record linkage practice in at least two important ways. First, the statistical model is built directly on the observed categorical variables, and the available information is not reduced to 0-1 comparisons. Second, the hierarchical structure of the model allows uncertainty to propagate in both directions between the parameter estimation step and the matching procedure, so no plug-in estimates are used and uncertainty is properly accounted for both in estimating the population size and in performing the record linkage. We motivate and illustrate our proposal with a real-data example and simulations.
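The first difference can be made concrete with a toy example. The sketch below builds the binary comparison vectors used in Fellegi-Sunter-style linkage, taken here as representative of the "0-1 comparisons" of current practice. The field names and records are hypothetical toy data, not data or code from the paper, and the sketch does not implement the authors' hierarchical model. It only shows what information the 0-1 reduction discards.

```python
# Toy illustration only: the field names and records below are hypothetical,
# and this is NOT the hierarchical Bayesian model of the paper. It shows how
# reducing a record pair to a 0-1 comparison vector discards information.
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical key variables shared by the two files.
FIELDS: List[str] = ["surname", "birth_year", "sex"]


def comparison_vector(a: Record, b: Record, fields: List[str] = FIELDS) -> List[int]:
    """Return 1 for each field on which the two records agree exactly, else 0."""
    return [int(a[f] == b[f]) for f in fields]


# Two candidate pairs that agree on every field: one on a (hypothetically)
# rare surname, one on a very common surname.
pair_rare = ({"surname": "Zugaro", "birth_year": "1961", "sex": "M"},
             {"surname": "Zugaro", "birth_year": "1961", "sex": "M"})
pair_common = ({"surname": "Rossi", "birth_year": "1961", "sex": "M"},
               {"surname": "Rossi", "birth_year": "1961", "sex": "M"})

v_rare = comparison_vector(*pair_rare)
v_common = comparison_vector(*pair_common)
print(v_rare, v_common)  # [1, 1, 1] [1, 1, 1]

# The two vectors are identical, so a model built only on them cannot use the
# fact that agreement on a rare value is stronger evidence of a true match than
# agreement on a common one. A model for the observed categorical values
# themselves keeps that information.
```

A model of the observed categorical values, rather than of the binary vector, can weigh such agreements by how frequent each value is in the population.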
Keywords: record linkage; conditional independence; Gibbs sampling; capture-recapture methods; Metropolis-Hastings
Publication type: journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/230369

Citations: Scopus 86; ISI Web of Science 70