Regression Analysis with linked data: problems and possible solutions / Tancredi, Andrea; Liseo, Brunero. - In: STATISTICA. - ISSN 1973-2201. - Print. - 5:1(2015), pp. 19-35.
Regression Analysis with linked data: problems and possible solutions
Tancredi, Andrea; Liseo, Brunero
2015
Abstract
In this paper we describe and extend some recent proposals for a general Bayesian methodology for performing record linkage and making inference using the resulting matched units. In particular, we frame the record linkage process within a formal statistical model that comprises both the matching variables and the other variables included at the inferential stage. In this way, the researcher can account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data and, at the same time, propagate information in both directions between the working statistical model and the record linkage stage. We argue that this feedback effect is essential to eliminate potential biases that would otherwise characterize inference on the resulting linked data, and that it can also improve record linkage performance. The practical implementation of the procedure relies on standard Bayesian computational techniques, such as Markov chain Monte Carlo algorithms. Although the methodology is quite general, we restrict our analysis, for expository convenience, to the popular and important case of multiple linear regression.

File | Size | Format |
---|---|---|
Tancredi_Regression_2015.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons) | 288.1 kB | Adobe PDF |
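The abstract refers to biases that arise when linked data are analysed as if every link were correct. The following is a minimal sketch of that problem only, not the Bayesian procedure of the paper: it simulates a simple linear regression, mislinks a fraction of the records at random, and compares the least-squares slope before and after. The function names, the 30% mismatch rate, and the mechanism of shuffling responses among the mislinked records are all assumptions made for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, beta=2.0, mismatch_rate=0.3, seed=0):
    """Return (slope on correctly linked data, slope on mislinked data).

    Assumption: x and y live in two files; a random share of the records
    is linked to the wrong y, modelled here by shuffling y within that share.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]

    # Pick the records whose link is wrong and permute their responses.
    wrong = [i for i in range(n) if rng.random() < mismatch_rate]
    donors = wrong[:]
    rng.shuffle(donors)

    y_linked = y[:]
    for i, j in zip(wrong, donors):
        y_linked[i] = y[j]

    return ols_slope(x, y), ols_slope(x, y_linked)


if __name__ == "__main__":
    true_slope, linked_slope = simulate()
    print(f"slope, correct links: {true_slope:.3f}")
    print(f"slope, 30% mislinked: {linked_slope:.3f}")
```

Under these assumptions the slope estimated from the mislinked data is pulled toward zero, roughly in proportion to the share of wrong links, because a mislinked response is unrelated to its covariate. A naive analysis ignores this attenuation. The abstract's approach is instead to model linkage and regression jointly so that the matching uncertainty is accounted for.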
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.