Record linkage aims to identify quickly and accurately whether two records represent the same real-world entity. In many applications, the linkage results must be restricted to "1 to 1" links, that is, no record may appear more than once in the output. This restriction can be handled with the transport algorithm. The optimization problem, however, grows quadratically with the size of the input and quickly becomes intractable for cases with a few thousand records. This paper compares different solutions provided by R packages for linear programming solvers. The comparison is made in terms of memory usage and execution time. The aim is to improve on the current implementation in the RELAIS toolkit, which was developed specifically for record linkage problems. The results show improvements beyond expectations: the tested solutions successfully perform the "1 to 1" reduction on large datasets, up to the size of the largest sample surveys at National Statistical Institutes.
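
As a rough illustration of the "1 to 1" reduction described above, the sketch below selects, from a set of candidate pairs with matching weights, the subset with the largest total weight in which no record appears twice. It is a minimal pure-Python example under stated assumptions, not the RELAIS implementation and not one of the R solvers compared in the paper: the function names `one_to_one` and `n_variables`, the record identifiers, and the weights are all hypothetical, and the exhaustive search used here is only practical for a handful of records.

```python
from functools import lru_cache


def n_variables(n_a, n_b):
    # Assumption: one binary decision variable per candidate pair when
    # every record of file A is compared with every record of file B.
    return n_a * n_b


def one_to_one(weights):
    """Return (total_weight, links) for the best one-to-one selection.

    weights: dict mapping (a_id, b_id) -> positive matching weight.
    Each a_id and each b_id appears in at most one returned link.
    """
    a_ids = sorted({a for a, _ in weights})
    b_ids = sorted({b for _, b in weights})
    b_bit = {b: 1 << k for k, b in enumerate(b_ids)}

    @lru_cache(maxsize=None)
    def best(i, used_mask):
        # Best result for records a_ids[i:], given the B records already used.
        if i == len(a_ids):
            return 0.0, ()
        # Option 1: leave a_ids[i] unlinked.
        best_total, best_links = best(i + 1, used_mask)
        a = a_ids[i]
        # Option 2: link a_ids[i] to any still-unused candidate in B.
        for b in b_ids:
            w = weights.get((a, b))
            if w is None or used_mask & b_bit[b]:
                continue
            total, links = best(i + 1, used_mask | b_bit[b])
            if w + total > best_total:
                best_total, best_links = w + total, ((a, b),) + links
        return best_total, best_links

    total, links = best(0, 0)
    return total, list(links)


# Hypothetical candidate pairs: a1 and a2 both compete for b1.
weights = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.8,
    ("a2", "b1"): 0.85,
    ("a3", "b3"): 0.7,
}
total, links = one_to_one(weights)
print(links)
print(n_variables(5000, 5000))
```

In this toy input, taking the single highest-weight pair (a1, b1) first would leave a2 without a partner, whereas the optimal selection links a1 to b2 and a2 to b1. The `n_variables` helper shows why the full formulation becomes heavy: under the all-pairs assumption, two files of 5,000 records each already give 25,000,000 decision variables.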

Optimization Routines for Enforcing One-to-One Matches in Record Linkage Problems / Moretti, Diego; Valentino, Luca; Tuoto, Tiziana. - In: THE R JOURNAL. - ISSN 2073-4859. - 11:1(2019), pp. 185-197.

Optimization Routines for Enforcing One-to-One Matches in Record Linkage Problems

Tuoto Tiziana
2019

record linkage; link reduction; linear programming
01 Journal publication::01a Journal article
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1490808