
Work-related road accidents: a data linkage procedure applied to assess traffic accidents at work and commuting / Taiano, Luca; Massari, Stefania; Tuoto, Tiziana; Valentino, Luca; Bruzzone, Silvia; Veronico, Liana. - In: RIVISTA DI STATISTICA UFFICIALE. - ISSN 1828-1982. - 3/2021:(2021), pp. 9-29.

Work-related road accidents: a data linkage procedure applied to assess traffic accidents at work and commuting

Stefania Massari; Tiziana Tuoto
2021

Abstract

Record linkage is a data integration technique whose goal is to identify records that refer to the same unit, even when that unit is represented differently in different data sources. Deterministic linkage and probabilistic linkage are two widely used linkage techniques. This paper shows how a record linkage procedure based on a probabilistic approach increases the number of linked pairs compared with a deterministic approach alone, and provides a step-by-step procedure for applying multiple linkage techniques sequentially. The datasets are the Inail (Italian National Institute for Insurance against Accidents at Work) archive of work-related accidents involving a vehicle and the Istat (Italian National Institute of Statistics) archive of road accidents resulting in death or injury. Deterministic linkage was applied first, followed by probabilistic linkage on the records left unlinked. Deterministic linkage treats two records as a link only if they agree on a selected set of variables, whereas probabilistic linkage assigns each record pair a probability of being a link. The results show that probabilistic linkage increased the number of linked pairs by 18% compared with the deterministic approach alone.
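The two-step procedure described in the abstract (deterministic linkage first, then probabilistic linkage on the records left unlinked) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the matching variables (`date`, `province`, `sex`, `birth_year`), the m- and u-probabilities, the threshold of 5.0 and the greedy one-to-one assignment are all hypothetical. The probabilistic step uses the standard Fellegi-Sunter log-likelihood-ratio weight; in real applications m and u are usually estimated from the data (for example with the EM algorithm) rather than fixed by hand.

```python
import math
from itertools import product

# Hypothetical matching variables (assumption; not taken from the paper).
KEYS = ("date", "province", "sex", "birth_year")

# Hypothetical m-probabilities P(agree | true match) and
# u-probabilities P(agree | non-match); illustrative values only.
M_PROB = {"date": 0.95, "province": 0.90, "sex": 0.98, "birth_year": 0.92}
U_PROB = {"date": 0.01, "province": 0.05, "sex": 0.50, "birth_year": 0.03}


def deterministic_link(a_records, b_records, keys=KEYS):
    """Step 1: link pairs that agree exactly on every key (one-to-one)."""
    index = {}
    for b in b_records:
        index.setdefault(tuple(b[k] for k in keys), []).append(b["id"])
    links, used_b = [], set()
    for a in a_records:
        for b_id in index.get(tuple(a[k] for k in keys), []):
            if b_id not in used_b:
                links.append((a["id"], b_id))
                used_b.add(b_id)
                break
    return links


def fs_weight(a, b, keys=KEYS):
    """Fellegi-Sunter weight: sum of log2 likelihood ratios per variable."""
    w = 0.0
    for k in keys:
        m, u = M_PROB[k], U_PROB[k]
        if a[k] == b[k]:
            w += math.log2(m / u)
        else:
            w += math.log2((1 - m) / (1 - u))
    return w


def probabilistic_link(a_records, b_records, threshold):
    """Step 2: score all remaining pairs, keep best one-to-one pairs >= threshold."""
    scored = sorted(
        ((fs_weight(a, b), a["id"], b["id"]) for a, b in product(a_records, b_records)),
        reverse=True,
    )
    links, used_a, used_b = [], set(), set()
    for w, a_id, b_id in scored:
        if w < threshold:
            break  # list is sorted, so every remaining pair scores lower
        if a_id not in used_a and b_id not in used_b:
            links.append((a_id, b_id))
            used_a.add(a_id)
            used_b.add(b_id)
    return links


def sequential_link(a_records, b_records, threshold=5.0):
    """Run the deterministic step, then the probabilistic step on unlinked records."""
    det = deterministic_link(a_records, b_records)
    linked_a = {a for a, _ in det}
    linked_b = {b for _, b in det}
    rest_a = [r for r in a_records if r["id"] not in linked_a]
    rest_b = [r for r in b_records if r["id"] not in linked_b]
    return det, probabilistic_link(rest_a, rest_b, threshold)


if __name__ == "__main__":
    # Toy records with hypothetical fields, standing in for the two archives.
    inail = [
        {"id": "I1", "date": "2019-03-04", "province": "RM", "sex": "M", "birth_year": 1980},
        {"id": "I2", "date": "2019-05-11", "province": "MI", "sex": "F", "birth_year": 1975},
    ]
    istat = [
        {"id": "S1", "date": "2019-03-04", "province": "RM", "sex": "M", "birth_year": 1980},
        {"id": "S2", "date": "2019-05-11", "province": "MI", "sex": "F", "birth_year": 1976},
    ]
    det, prob = sequential_link(inail, istat)
    print(det, prob)  # [('I1', 'S1')] [('I2', 'S2')]
```

With these toy records, the first pair agrees on every key and is linked in the deterministic step, while the second pair differs only in birth year (a plausible recording error) and is recovered by the probabilistic step with a weight of about 8.1. This mirrors, on a tiny scale, the kind of gain in linked pairs that the abstract reports.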
Data integration; record linkage; deterministic linkage; probabilistic linkage; road accidents
01 Journal publication :: 01a Journal article
Files attached to this item
Tuoto_Work-related-road_2021.pdf
Open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 545.23 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1609571