BEER: Blocking for Effective Entity Resolution / Galhotra, S.; Firmani, D.; Saha, B.; Srivastava, D. - (2021), pp. 2711-2715. (Paper presented at the ACM Special Interest Group on Management of Data Conference, held virtually online) [10.1145/3448016.3452747].
BEER: Blocking for Effective Entity Resolution
Firmani D.; Srivastava D.
2021
Abstract
Blocking is a key component of Entity Resolution (ER) that aims to improve efficiency by quickly pruning out non-matching record pairs. However, depending on the noise in the dataset and the distribution of entity cluster sizes, existing techniques can be either (a) too aggressive, so that they help ER scale but can adversely affect its effectiveness, or (b) too permissive, potentially harming ER efficiency. We propose a new methodology of progressive blocking that enables both efficient and effective ER and works across different entity cluster size distributions without manual fine-tuning. In this paper, we demonstrate BEER (Blocking for Effective Entity Resolution), the first end-to-end system that leverages intermediate ER output in a feedback loop to refine the blocking result in a data-driven fashion, thereby enabling effective entity resolution. BEER allows the user to explore the different components of the ER pipeline, analyze the effectiveness of alternative blocking techniques, and understand the interaction between blocking and ER. BEER supports visualization of the different entities present in a block, explains the change in blocking output with every round of feedback, and allows the end user to interactively compare different techniques. BEER has been developed as open-source software; the code and the demonstration video are available at beer-system.github.io.

File | Size | Format |
---|---|---|
Galhotra_BEER_2021.pdf (open access). Note: https://doi.org/10.1145/3448016.3452747. Type: Publisher's version (published version with the publisher's layout). License: All rights reserved | 2.42 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.