Work-related road accidents: a data linkage procedure applied to assess traffic accidents at work and commuting / Taiano, Luca; Massari, Stefania; Tuoto, Tiziana; Valentino, Luca; Bruzzone, Silvia; Veronico, Liana. - In: RIVISTA DI STATISTICA UFFICIALE. - ISSN 1828-1982. - 3/2021:(2021), pp. 9-29.
Work-related road accidents: a data linkage procedure applied to assess traffic accidents at work and commuting
Stefania Massari; Tiziana Tuoto
2021
Abstract
Record linkage is a data integration technique that aims to identify the same unit when it is represented differently in different data sources. Deterministic and probabilistic linkage are two widely used techniques. This paper shows how a record linkage procedure based on a probabilistic approach increases the number of linked pairs compared with a deterministic approach alone, and it provides a stepwise procedure for applying multiple linkage techniques in sequence. The datasets are the Inail (Italian National Institute for Insurance against Accidents at Work) archive of work-related accidents involving the use of a vehicle and the Istat (Italian National Institute of Statistics) archive of road accidents resulting in death or injury. Deterministic linkage was applied first, followed by probabilistic linkage on the records left unlinked. Deterministic linkage treats two records as a certain link if they agree on a selected set of variables, whereas probabilistic linkage assigns each record pair a probability of being a link. Results show that probabilistic linkage increased the number of linked pairs by 18% compared with the deterministic approach alone.
File | Size | Format
---|---|---
Tuoto_Work-related-road_2021.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons) | 545.23 kB | Adobe PDF
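The two-step procedure described in the abstract (deterministic linkage first, then probabilistic linkage on the records left unlinked) can be illustrated with a minimal sketch. This is not the authors' implementation. The matching variables, the m/u probabilities, the threshold, and the toy records are all hypothetical, and the probabilistic step uses a simple Fellegi-Sunter style weight with greedy one-to-one assignment as an assumed stand-in for the paper's method.

```python
import math

# Hypothetical matching variables; the abstract does not list the real ones.
KEYS = ("date", "province", "sex", "birth_year")

# Assumed (m, u) per variable, illustrative values only:
# m = P(agree | true link), u = P(agree | non-link).
M_U = {
    "date": (0.95, 0.01),
    "province": (0.98, 0.05),
    "sex": (0.99, 0.50),
    "birth_year": (0.90, 0.02),
}


def deterministic_links(a, b, keys=KEYS):
    """Link records that agree exactly on every key (one-to-one, first match wins)."""
    index = {}
    for j, rec in enumerate(b):
        index.setdefault(tuple(rec[k] for k in keys), []).append(j)
    links, used_b = [], set()
    for i, rec in enumerate(a):
        for j in index.get(tuple(rec[k] for k in keys), []):
            if j not in used_b:
                links.append((i, j))
                used_b.add(j)
                break
    return links


def match_weight(rec_a, rec_b, m_u=M_U):
    """Sum of log2 agreement or disagreement weights over all variables."""
    total = 0.0
    for k, (m, u) in m_u.items():
        if rec_a[k] == rec_b[k]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_links(a, b, skip_a, skip_b, threshold=5.0):
    """Greedy one-to-one linkage of residual records, highest weight first."""
    scored = [
        (match_weight(ra, rb), i, j)
        for i, ra in enumerate(a) if i not in skip_a
        for j, rb in enumerate(b) if j not in skip_b
    ]
    links, used_a, used_b = [], set(), set()
    for w, i, j in sorted(scored, reverse=True):
        if w >= threshold and i not in used_a and j not in used_b:
            links.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return links


def two_step_linkage(a, b):
    """Step 1: deterministic. Step 2: probabilistic on what step 1 left unlinked."""
    det = deterministic_links(a, b)
    prob = probabilistic_links(a, b, {i for i, _ in det}, {j for _, j in det})
    return det, prob


# Toy records (invented for illustration, not real Inail or Istat data).
inail = [
    {"date": "2019-03-01", "province": "RM", "sex": "M", "birth_year": 1980},
    {"date": "2019-03-02", "province": "MI", "sex": "F", "birth_year": 1975},
    {"date": "2019-03-05", "province": "NA", "sex": "M", "birth_year": 1990},
]
istat = [
    {"date": "2019-03-01", "province": "RM", "sex": "M", "birth_year": 1980},
    {"date": "2019-03-02", "province": "MI", "sex": "F", "birth_year": 1976},
    {"date": "2019-04-10", "province": "TO", "sex": "F", "birth_year": 1960},
]

det, prob = two_step_linkage(inail, istat)
print("deterministic links:", det)
print("additional probabilistic links:", prob)
```

In this toy example the second pair differs only in birth year, so the exact-match step misses it and the weighted step recovers it. That is the kind of gain the abstract reports, though the 18% figure comes from the paper's real data, not from this sketch.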