BEER: Blocking for Effective Entity Resolution / Galhotra, S.; Firmani, D.; Saha, B.; Srivastava, D. - (2021), pp. 2711-2715. (Paper presented at the ACM Special Interest Group on Management of Data Conference, held virtually online) [10.1145/3448016.3452747].

BEER: Blocking for Effective Entity Resolution

Firmani D.; Srivastava D.
2021

Abstract

Blocking is a key component of Entity Resolution (ER) that aims to improve efficiency by quickly pruning out non-matching record pairs. However, depending on the noise in the dataset and the distribution of entity cluster sizes, existing techniques can be either (a) too aggressive, so that they help ER scale but can reduce its effectiveness, or (b) too permissive, potentially harming ER efficiency. We propose a new methodology of progressive blocking that enables both efficient and effective ER and works across different entity cluster size distributions without manual fine-tuning. In this paper, we demonstrate BEER (Blocking for Effective Entity Resolution), the first end-to-end system that leverages intermediate ER output in a feedback loop to refine the blocking result in a data-driven fashion, thereby enabling effective entity resolution. BEER allows the user to explore the different components of the ER pipeline, analyze the effectiveness of alternative blocking techniques, and understand the interaction between blocking and ER. BEER supports visualization of the different entities present in a block, explains the change in blocking output with every round of feedback, and allows the end user to interactively compare different techniques. BEER has been developed as open-source software; the code and the demonstration video are available at beer-system.github.io.
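
The feedback loop described in the abstract can be illustrated with a small sketch. This is not BEER's actual algorithm: the token-based blocking, the Jaccard stand-in matcher, the block-scoring rule, and every name below (`token_blocks`, `progressive_er`, `budget`, `threshold`) are assumptions made purely for illustration. The sketch alternates between matching a few top-ranked candidate pairs and re-scoring each block by the match rate observed inside it.

```python
from collections import defaultdict
from itertools import combinations


def token_blocks(records):
    """Group record ids by each lowercase token they contain (token blocking)."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for token in set(text.lower().split()):
            blocks[token].add(rid)
    # A block with a single record yields no candidate pairs, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def jaccard(a, b):
    """Hypothetical stand-in matcher: token-set Jaccard similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def progressive_er(records, rounds=2, budget=2, threshold=0.5):
    """Alternate between matching a few pairs and re-scoring blocks."""
    blocks = token_blocks(records)
    # Assumed initial prior: smaller blocks are more likely to be clean.
    score = {tok: 1.0 / len(ids) for tok, ids in blocks.items()}
    resolved = {}  # pair -> bool, the intermediate ER output
    for _ in range(rounds):
        # Rank unresolved candidate pairs by the best-scoring block they share.
        candidates = {}
        for tok, ids in blocks.items():
            for pair in combinations(sorted(ids), 2):
                if pair not in resolved:
                    candidates[pair] = max(candidates.get(pair, 0.0), score[tok])
        ranked = sorted(candidates, key=lambda p: (-candidates[p], p))
        # Spend the per-round budget on the most promising pairs.
        for a, b in ranked[:budget]:
            resolved[(a, b)] = jaccard(records[a], records[b]) >= threshold
        # Feedback: re-score each block by the match rate seen so far inside it.
        for tok, ids in blocks.items():
            seen = [resolved[p] for p in combinations(sorted(ids), 2) if p in resolved]
            if seen:
                score[tok] = sum(seen) / len(seen)
    matches = sorted(p for p, is_match in resolved.items() if is_match)
    return matches, score
```

In this toy setup, a very common token forms a large, noisy block whose score falls once non-matching pairs are observed in it, while blocks built on discriminative tokens keep a high score. A real system would replace the Jaccard stand-in with a proper matcher and a more careful scoring model.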
ACM Special Interest Group on Management of Data Conference
Entity Resolution; open-source software; Blocking
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item

File: Galhotra_BEER_2021.pdf
Access: open access
Note: https://doi.org/10.1145/3448016.3452747
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 2.42 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1696183
Citations
  • PMC ND
  • Scopus 9
  • Web of Science (ISI) 5