The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach.

Queries with Arithmetic on Incomplete Databases / Console, M.; Hofer, M.; Libkin, L.. - (2020), pp. 179-189. (Intervento presentato al convegno ACM SIGMOD-SIGACT-SIGART Conference on Principles of Database Systems tenutosi a Portland; United States) [10.1145/3375395.3387666].

Queries with Arithmetic on Incomplete Databases

Console M.
;
Libkin L.
2020

Abstract

The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach.
2020
ACM SIGMOD-SIGACT-SIGART Conference on Principles of Database Systems
approximate query answers; asymptotic behavior; first-order queries; incomplete information; measure of certainty; missing data; numerical data; query answering
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Queries with Arithmetic on Incomplete Databases / Console, M.; Hofer, M.; Libkin, L.. - (2020), pp. 179-189. (Intervento presentato al convegno ACM SIGMOD-SIGACT-SIGART Conference on Principles of Database Systems tenutosi a Portland; United States) [10.1145/3375395.3387666].
File allegati a questo prodotto
File Dimensione Formato  
Console_Queries-with-Arithmetic_2020.pdf

solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.42 MB
Formato Adobe PDF
1.42 MB Adobe PDF   Contatta l'autore
Console_Postprint_Queries-with-Arithmetic_2020.pdf.pdf

accesso aperto

Note: https://dl.acm.org/doi/10.1145/3375395.3387666
Tipologia: Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.03 MB
Formato Adobe PDF
1.03 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1474734
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 4
social impact