New proposal for linkage error estimation / Tuoto, Tiziana. - In: STATISTICAL JOURNAL OF THE IAOS. - ISSN 1874-7655. - 32:(2016), pp. 413-420.
New proposal for linkage error estimation
Tiziana Tuoto
2016
Abstract
The combined use of data from different sources is an opportunity that National Statistical Institutes exploit more and more frequently. In a context where huge amounts of information, produced by different actors, can be integrated and compared, it becomes even more necessary to provide quality assessments of the methods and techniques used to achieve integration results. For data integration at the micro level, record linkage procedures are widely used and generally produce good results when strong identifying variables are available, although these procedures are rarely accompanied by quality indicators. Yet, especially in official statistics, such quality indicators are needed in subsequent statistical analyses to guarantee and assess data accuracy and reliability. This paper proposes a method for linkage error estimation. The method enriches the Fellegi and Sunter model for probabilistic record linkage: as is well known, the Fellegi and Sunter decision rule is very effective for identifying links but generally less reliable for evaluating the results. The proposal aims to predict linkage quality within the Fellegi and Sunter framework by introducing a supervised step.
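As background, here is a minimal sketch of the standard Fellegi and Sunter decision rule and its model-based error levels. It is not the supervised method proposed in the paper, whose details are not given in this abstract. It assumes binary agree/disagree comparisons, conditional independence between comparison variables, and hypothetical m- and u-probabilities (`M_PROBS`, `U_PROBS`) and thresholds. In practice these parameters are commonly estimated from the data, for example with the EM algorithm. Each agreement pattern γ receives the weight log2(m(γ)/u(γ)). Pairs with a weight at or above the upper threshold are links, those at or below the lower threshold are non-links, and the rest are possible links for clerical review. The error levels μ = P(link | U) and λ = P(non-link | M) are the model-based quality measures that can be unreliable when the estimated m and u are off.

```python
from itertools import product
from math import log2, prod

# Hypothetical parameters (illustrative only, not from the paper):
# m = P(agree | true match), u = P(agree | non-match) for three comparison variables.
M_PROBS = [0.95, 0.90, 0.85]
U_PROBS = [0.01, 0.05, 0.10]


def pattern_prob(pattern, probs):
    """P(agreement pattern | class), assuming conditional independence."""
    return prod(p if agree else 1 - p for agree, p in zip(pattern, probs))


def weight(pattern):
    """Fellegi-Sunter composite weight log2(m(gamma) / u(gamma))."""
    return log2(pattern_prob(pattern, M_PROBS) / pattern_prob(pattern, U_PROBS))


def classify(pattern, upper, lower):
    """Three-way decision rule with hypothetical thresholds upper > lower."""
    w = weight(pattern)
    if w >= upper:
        return "link"
    if w <= lower:
        return "non-link"
    return "possible link"


def error_levels(upper, lower):
    """Model-based error levels summed over all 2^K agreement patterns:
    mu = P(link | U) (false links), lam = P(non-link | M) (missed links)."""
    mu = lam = 0.0
    for pattern in product([True, False], repeat=len(M_PROBS)):
        decision = classify(pattern, upper, lower)
        if decision == "link":
            mu += pattern_prob(pattern, U_PROBS)
        elif decision == "non-link":
            lam += pattern_prob(pattern, M_PROBS)
    return mu, lam


if __name__ == "__main__":
    mu, lam = error_levels(upper=6, lower=0)
    print(f"mu = {mu:.5f}, lambda = {lam:.5f}")
```

With these made-up parameters and thresholds 6 and 0, the sketch gives μ = 0.00145 and λ = 0.01175. These figures are only as trustworthy as the assumed m- and u-probabilities.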
File: Tuoto_New-proposal_2016.pdf
Access: open access
Type: Post-print (the version following peer review, accepted for publication)
License: All rights reserved
Size: 769.61 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.