Bayesian Regression Analysis with Linked and Duplicated Data

Andrea Tancredi; Brunero Liseo
2016

Abstract

We introduce a Bayesian methodology for performing record linkage and for carrying out regression analysis on the resulting matched units, in a framework of k lists with possible duplications. We embed the record linkage process in a formal statistical model that comprises both the matching variables and the other variables used at the inferential stage. In this way, the researcher can account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data and, at the same time, can generate a feedback propagation of uncertainty between the working statistical model and the record linkage stage. We argue that this feedback effect is essential to eliminate potential biases in the resulting post-linkage inference, and that it can also improve record linkage performance. Practical implementation of the procedure is based on standard Bayesian computational techniques. Although the methodology is quite general, we restrict our analysis to the popular and important case of multiple linear regression for expository convenience.
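The post-linkage bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model. It is a minimal, assumption-laden example in which a hypothetical fraction of records (`mismatch_rate`) is linked to the wrong response, and the regression is a simple one-covariate least-squares fit rather than a Bayesian multiple regression. All names and parameter values are illustrative.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=5000, beta=2.0, mismatch_rate=0.3, seed=0):
    """Return (slope with correct links, slope with some wrong links)."""
    rng = random.Random(seed)
    # Covariate stored in one list, response stored in another list.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Assumed linkage-error mechanism (hypothetical): a random subset of
    # records has its responses permuted within that subset.
    wrong = [i for i in range(n) if rng.random() < mismatch_rate]
    shuffled = wrong[:]
    rng.shuffle(shuffled)
    y_linked = y[:]
    for i, j in zip(wrong, shuffled):
        y_linked[i] = y[j]

    return ols_slope(x, y), ols_slope(x, y_linked)


slope_true_links, slope_noisy_links = simulate()
print(f"slope with correct links:    {slope_true_links:.3f}")
print(f"slope with mislinked records: {slope_noisy_links:.3f}")
```

Under these assumptions the slope fitted on the mislinked data is pulled toward zero, which is the kind of bias a naive "link first, then regress" analysis would inherit if linkage uncertainty were ignored.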
CLADAG 2015, 10th Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society
Record Linkage; Hit-And-Miss algorithm
04 Publication in conference proceedings::04b Conference paper in volume
Bayesian Regression Analysis with Linked and Duplicated Data / Tancredi, Andrea; Steorts, Rebecca; Liseo, Brunero. - (2016). (Paper presented at CLADAG 2015, 10th Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society, held in Cagliari, Santa Margherita di Pula).
Files attached to this item
File: Tancredi_Bayesian-Regression-Analysis_2016.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.12 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1248039