We introduce a Bayesian methodology for performing record linkage and regression analysis on the resulting matched units, in a framework with k lists and possible duplications. We embed the record linkage process in a formal statistical model that comprises both the matching variables and the other variables used at the inferential stage. In this way, the researcher can account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data and, at the same time, generate a feedback propagation of uncertainty between the working statistical model and the record linkage stage. We argue that this feedback effect is essential to eliminate potential biases that could otherwise characterize the resulting post-linkage inference. The feedback effect can also improve record linkage performance. Practical implementation of the procedure is based on standard Bayesian computational techniques. Although the methodology is quite general, we restrict our analysis to the popular and important case of multiple linear regression for expository convenience.
Bayesian Regression Analysis with Linked and Duplicated Data / Tancredi, Andrea; Steorts, Rebecca; Liseo, Brunero. - (2016). (Paper presented at CLADAG 2015, the 10th Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society, held in Cagliari, Santa Margherita di Pula).
Bayesian Regression Analysis with Linked and Duplicated Data
Andrea Tancredi; Brunero Liseo
2016
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Tancredi_Bayesian-Regression-Analysis_2016.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 1.12 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.