
Probabilistic record linkage with less than three matching variables / Tuoto, Tiziana; Fortini, Marco. - (2020), pp. 3-8. (Paper presented at the 50th Meeting of the Italian Statistical Society, held in Pisa).

Probabilistic record linkage with less than three matching variables

Tiziana Tuoto (first author); Marco Fortini (second author)
2020

Abstract

Probabilistic record linkage based on the Fellegi-Sunter theory is a methodology for integrating data collected from different sources when no unique common identifier is available. It requires at least three matching variables to identify the probability model. In official statistics, there is an emerging need to link archives that share fewer than three common variables; this is the case, for instance, for poor-quality address and business archives. To address this problem, we compare the available common variables by means of string comparators and propose mixtures of continuous and categorical distributions, rather than the usual latent class models, to estimate linkage probabilities.
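Why three variables: with K binary agreement indicators assumed conditionally independent given match status, the classic latent class model has 2K + 1 parameters (the match proportion plus an m and a u probability per variable), while the observed agreement patterns supply only 2^K − 1 free frequencies. For K = 2 that is 3 equations for 5 unknowns; K = 3 is the first case meeting the necessary condition 2^K − 1 ≥ 2K + 1 (7 = 7). The sketch below illustrates the alternative the abstract describes, but it is NOT the authors' model: it assumes a normalised Levenshtein similarity as the string comparator (the abstract does not name one), a Gaussian density for that similarity within each class, a Bernoulli for one categorical agree/disagree flag, and independence of the two within class. The function names `levenshtein`, `similarity` and `fit_mixture` are hypothetical.

```python
import math
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance (unit-cost insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed string comparator: 1 - distance / longer length (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_mixture(scores, agree, n_iter=200):
    """EM for a two-class mixture (index 1 = match, 0 = non-match).

    Assumption: within each class the continuous similarity score is Gaussian
    and the binary agreement flag is Bernoulli, independent of each other.
    Returns the match proportion, the class parameters and the posterior
    match probability of every record pair.
    """
    n = len(scores)
    p = 0.1                          # initial guess for the share of matches
    mu, sd = [0.3, 0.9], [0.2, 0.1]  # initial Gaussian parameters per class
    q = [0.1, 0.9]                   # initial P(flag agrees | class)
    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        for i, (s, a) in enumerate(zip(scores, agree)):
            l1 = p * normal_pdf(s, mu[1], sd[1]) * (q[1] if a else 1 - q[1])
            l0 = (1 - p) * normal_pdf(s, mu[0], sd[0]) * (q[0] if a else 1 - q[0])
            total = l1 + l0
            post[i] = l1 / total if total > 0 else p  # guard against underflow
        # M-step: weighted maximum-likelihood updates for each class.
        p = sum(post) / n
        for k, w in ((0, [1 - t for t in post]), (1, post)):
            wk = max(sum(w), 1e-12)
            mu[k] = sum(wi * s for wi, s in zip(w, scores)) / wk
            var = sum(wi * (s - mu[k]) ** 2 for wi, s in zip(w, scores)) / wk
            sd[k] = max(math.sqrt(var), 1e-3)  # floor avoids a degenerate spike
            qk = sum(wi for wi, a in zip(w, agree) if a) / wk
            q[k] = min(max(qk, 1e-6), 1 - 1e-6)
    return p, {"mu": mu, "sd": sd, "q": q}, post


if __name__ == "__main__":
    # Hypothetical street-name comparison for one candidate pair.
    print(similarity("via roma 10", "via roma 10a"))
    # Synthetic pairs: 100 matches, 900 non-matches (illustrative data only).
    random.seed(1)
    scores = [min(1.0, random.gauss(0.92, 0.05)) for _ in range(100)]
    scores += [max(0.0, random.gauss(0.35, 0.12)) for _ in range(900)]
    agree = [random.random() < 0.9 for _ in range(100)]
    agree += [random.random() < 0.1 for _ in range(900)]
    p, params, post = fit_mixture(scores, agree)
    print(round(p, 3), params)
```

In a real run, `scores` would come from applying the string comparator to one matching variable (for example, street names) across candidate pairs, and `agree` from an exact comparison of a categorical variable; pairs would then be classified by thresholding the posterior, in the Fellegi-Sunter spirit.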
Year: 2020
Conference: 50th Meeting of the Italian Statistical Society
Keywords: Fellegi-Sunter record linkage; mixture models; string metrics
Type: 04 Publication in conference proceedings::04b Conference paper in a volume
Files attached to this item

Tuoto_Probabilistic-record-linkage_2020.pdf

Open access

Note: paper
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 478.23 kB
Format: Adobe PDF

Tuoto_SIS-2020-atti-convegno_2020.pdf

Open access

Note: title page and table of contents of the volume
Type: Other attached material
License: All rights reserved
Size: 199.51 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1663628