Entity resolution (ER) seeks to identify which records in a data set refer to the same real-world entity. Given the diversity of ways in which entities can be represented, ER is known to be a challenging task for automated strategies, but relatively easier for expert humans. Nonetheless, humans can also make mistakes. Our contribution is an error correction toolkit that can be leveraged by a variety of hybrid human-machine ER algorithms, based on a formal way of selecting “control queries” for the human experts. We demonstrate empirically that less recent ER algorithms equipped with our tool can perform even better than most recent ER methods with built-in error correction.
Crowdsourced entity resolution with control queries / Galhotra, Sainyam; Firmani, Donatella; Saha, Barna; Srivastava, Divesh. - 2400:(2019). (Paper presented at the 27th Italian Symposium on Advanced Database Systems, SEBD 2019, held in Grosseto, Italy).
Crowdsourced entity resolution with control queries
Firmani, Donatella
2019