Regression Analysis with linked data: problems and possible solutions / Tancredi, Andrea; Liseo, Brunero. - In: STATISTICA. - ISSN 1973-2201. - PRINT. - 5:1(2015), pp. 19-35.

Regression Analysis with linked data: problems and possible solutions

Tancredi, Andrea; Liseo, Brunero
2015

Abstract

In this paper we describe and extend some recent proposals for a general Bayesian methodology that performs record linkage and draws inference from the resulting matched units. In particular, we frame the record linkage process within a formal statistical model that comprises both the matching variables and the other variables used at the inferential stage. In this way, the researcher can account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data and, at the same time, let information propagate in both directions between the working statistical model and the record linkage stage. We argue that this feedback effect is essential for eliminating the potential biases that would otherwise affect inference from linked data, and that it can also improve record linkage performance. The practical implementation of the procedure relies on standard Bayesian computational techniques, such as Markov chain Monte Carlo algorithms. Although the methodology is quite general, for expository convenience we restrict our analysis to the popular and important case of multiple linear regression.
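The bias that the abstract attributes to linkage errors can be illustrated with a small simulation. The sketch below is not the authors' Bayesian procedure. It only shows, under assumed settings, how false links pull a naive least-squares slope toward zero. The function names (`ols_slope`, `simulate`), the true slope of 2.0, the 30% mismatch rate, and the sample size are all hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, beta=2.0, mismatch_rate=0.3, seed=0):
    """Return (slope with correct links, slope with some false links).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # File A holds x, file B holds y; the true model is y = 1 + beta * x + noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Mimic linkage errors: pick a subset of records and shuffle their
    # responses among themselves, so most of them get someone else's y.
    wrong = rng.sample(range(n), int(mismatch_rate * n))
    donors = wrong[:]
    rng.shuffle(donors)
    y_linked = y[:]
    for receiver, donor in zip(wrong, donors):
        y_linked[receiver] = y[donor]

    return ols_slope(x, y), ols_slope(x, y_linked)


slope_correct_links, slope_false_links = simulate()
print(f"slope with correct links: {slope_correct_links:.3f}")
print(f"slope with false links:   {slope_false_links:.3f}")
```

Because a falsely linked response is unrelated to its covariate, the naive slope shrinks roughly in proportion to the share of correct links. A model that treats the matching as uncertain, as the paper proposes, is intended to correct for this attenuation.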
Keywords: Bayesian regression; Hit-miss model; Metropolis-Hastings algorithm; Record linkage
Attached file: Tancredi_Regression_2015.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 288.1 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/783231