REPLACE: A Logical Framework for Combining Collective Entity Resolution and Repairing / Bienvenu, M.; Cima, G.; Gutierrez-Basulto, V. - (2023), pp. 3132-3139. (Paper presented at the International Joint Conference on Artificial Intelligence, held in Macao, China) [10.24963/ijcai.2023/349].

REPLACE: A Logical Framework for Combining Collective Entity Resolution and Repairing

Cima G.; 2023

Abstract

This paper considers the problem of querying dirty databases, which may contain both erroneous facts and multiple names for the same entity. While both of these data quality issues have been widely studied in isolation, our contribution is a holistic framework for jointly deduplicating and repairing data. Our REPLACE framework follows a declarative approach, utilizing logical rules to specify under which conditions a pair of entity references can or must be merged and logical constraints to specify consistency requirements. The semantics defines a space of solutions, each consisting of a set of merges to perform and a set of facts to delete, which can be further refined by applying optimality criteria. As there may be multiple optimal solutions, we use classical notions of possible and certain query answers to reason over the alternative solutions, and introduce a novel notion of most informative answer to obtain a more compact presentation of query results. We perform a detailed analysis of the data complexity of the central reasoning tasks of recognizing optimal solutions and (most informative) possible and certain answers, for each of the three notions of optimal solution and for both general and restricted specifications.
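As a rough illustration of how possible and certain answers range over alternative solutions, the following toy sketch may help. It is not the REPLACE semantics from the paper: the fact encoding, the helper names (`canon`, `apply_solution`, `answers`), the example data, and the choice to report every constant whose merge class matches the query are all assumptions made for this example, and the two solutions are simply given by hand rather than derived from rules and constraints.

```python
def canon(merges, constants):
    """Map each constant to a representative of its merge class (union-find)."""
    parent = {c: c for c in constants}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in merges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest name is the representative
    return {c: find(c) for c in constants}


def apply_solution(db, merges, deletions):
    """Delete the chosen facts, then rewrite constants to their representatives."""
    constants = {arg for fact in db for arg in fact[1:]}
    rep = canon(merges, constants)
    cleaned = {
        (fact[0],) + tuple(rep[a] for a in fact[1:])
        for fact in db
        if fact not in deletions
    }
    return cleaned, rep


def answers(db, solution, query):
    """Constants whose merge class is returned by the query in this solution."""
    merges, deletions = solution
    cleaned, rep = apply_solution(db, merges, deletions)
    hits = query(cleaned)
    return {c for c, r in rep.items() if r in hits}


def possible_answers(db, solutions, query):
    """Answers holding in at least one solution."""
    return set().union(*(answers(db, s, query) for s in solutions))


def certain_answers(db, solutions, query):
    """Answers holding in every solution (assumes at least one solution)."""
    return set.intersection(*(answers(db, s, query) for s in solutions))


# Hypothetical data: facts are (relation, arg1, arg2) triples.
db = {
    ("author", "a1", "p1"),
    ("author", "a2", "p2"),
    ("venue", "p1", "IJCAI"),
    ("venue", "p2", "AAAI"),
}


def ijcai_authors(facts):
    """Toy query: authors of some paper whose venue is IJCAI."""
    return {a for (r, a, p) in facts
            if r == "author" and ("venue", p, "IJCAI") in facts}


# Two hand-picked solutions, each a (merges, deletions) pair.
solutions = [
    ({("a1", "a2")}, {("venue", "p2", "AAAI")}),  # merge a1 and a2, delete one fact
    (set(), set()),                               # change nothing
]

possible = possible_answers(db, solutions, ijcai_authors)
certain = certain_answers(db, solutions, ijcai_authors)
```

In this toy setting `a2` is only a possible answer, because it qualifies solely in the solution that merges it with `a1`, whereas `a1` is certain because it qualifies in both.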
2023
International Joint Conference on Artificial Intelligence
declarative framework; logical rules and constraints; entity resolution
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
Bienvenu_REPLACE_2023.pdf

open access

Note: DOI 10.24963/ijcai.2023/349
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 174.28 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1694113