Online entity resolution using an oracle / Firmani, Donatella; Saha, Barna; Srivastava, Divesh. - In: PROCEEDINGS OF THE VLDB ENDOWMENT. - ISSN 2150-8097. - 9:(2016), pp. 384-395. [10.14778/2876473.2876474]
Online entity resolution using an oracle
Firmani, Donatella
2016
Abstract
Entity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity. This is an expensive task that can require significant time and money, so the end user may want to make decisions during the process rather than wait for the task to be completed. We formalize an online version of the entity resolution task and use an oracle that correctly labels matching and non-matching pairs in response to queries. In this setting, we design algorithms that seek to maximize progressive recall, and we develop a novel framework for analyzing prior proposals on entity resolution with an oracle beyond their worst-case guarantees. Finally, we provide both theoretical and experimental analyses of the proposed algorithms.
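The abstract does not spell out the algorithms. The following is a minimal sketch of the online setting under our own assumptions; it is a generic baseline, not the authors' method. The assumptions are:

- Each candidate pair carries a hypothetical match score.
- Pairs are asked in decreasing score order.
- Answers already implied by transitivity are skipped. Two records in the same known cluster must match, and two records in clusters with a recorded non-match cannot.
- Progressive recall is taken as the area under the curve of recall versus number of oracle queries.

The names `online_er`, `progressive_recall`, and `UnionFind` are illustrative only.

```python
class UnionFind:
    """Clusters of records already known (queried or inferred) to match."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b) -> int:
        """Merge the clusters of a and b; return how many new matching pairs this reveals."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return 0
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        new_pairs = self.size[ra] * self.size[rb]
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return new_pairs


def online_er(records, candidate_pairs, oracle, total_matches):
    """Hypothetical baseline: query pairs by decreasing score, return recall after each query."""
    uf = UnionFind(records)
    negatives = []  # pairs the oracle labelled as non-matching
    found = 0       # matching pairs identified so far (queried or inferred)
    curve = []
    for u, v, _score in sorted(candidate_pairs, key=lambda p: p[2], reverse=True):
        ru, rv = uf.find(u), uf.find(v)
        if ru == rv:
            continue  # match already implied by transitivity: no query needed
        if any({uf.find(a), uf.find(b)} == {ru, rv} for a, b in negatives):
            continue  # non-match implied by an earlier negative answer
        if oracle(u, v):
            found += uf.union(u, v)
        else:
            negatives.append((u, v))
        curve.append(found / total_matches)
    return curve


def progressive_recall(curve):
    """Assumed definition: area under the recall curve, one unit of width per query."""
    return sum(curve)


# Toy data (hypothetical): a, b, c refer to one entity; d refers to another.
entity_of = {"a": 1, "b": 1, "c": 1, "d": 2}
pairs = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
         ("c", "d", 0.4), ("a", "d", 0.3), ("b", "d", 0.2)]
curve = online_er(entity_of, pairs, lambda u, v: entity_of[u] == entity_of[v], total_matches=3)
print(curve)  # [0.3333333333333333, 1.0, 1.0]
```

In this toy run, three queries suffice. The (a, c) match follows by transitivity, and the (a, d) and (b, d) non-matches follow from the negative answer on (c, d). Ordering by score is only one simple choice, and the strategies analyzed in the paper may differ.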