We introduce the similarity caching problem, a variant of classical caching in which an algorithm can return an element from the cache that is similar, but not necessarily identical, to the query element. We are motivated by buffer management questions in approximate nearest-neighbor applications, especially in the context of caching targeted advertisements on the web. Formally, we assume the queries lie in a metric space, with distance function d(.,.). A query p is considered a cache hit if there is a point q in the cache that is sufficiently close to p, i.e., for a threshold radius r, we have d(p, q) <= r. The goal is then to minimize the number of cache misses, vis-a-vis the optimal algorithm. As with classical caching, we use the competitive ratio to measure the performance of different algorithms. While similarity caching is a strict generalization of classical caching, we show that unless the algorithm is allowed extra power (either in the size of the cache or the threshold r) over the optimal offline algorithm, the problem is intractable. We then proceed to quantify the hardness as a function of the complexity of the underlying metric space. We show that the problem becomes easier as we proceed from general metric spaces to those of bounded doubling dimension, and to Euclidean metrics. Finally, we investigate several extensions of the problem: dependence of the threshold r on the query and a smoother trade-off between the cache-miss cost and the query-query similarity.

Similarity Caching / Chierichetti, Flavio; Ravi, Kumar; Sergei, Vassilvitskii. - (2009), pp. 127-135. (Intervento presentato al convegno 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) tenutosi a Providence, RI nel JUN 29-JUL 01, 2009) [10.1145/1559795.1559815].

Similarity Caching

CHIERICHETTI, FLAVIO;
2009

Abstract

We introduce the similarity caching problem, a variant of classical caching in which an algorithm can return an element from the cache that is similar, but not necessarily identical, to the query element. We are motivated by buffer management questions in approximate nearest-neighbor applications, especially in the context of caching targeted advertisements on the web. Formally, we assume the queries lie in a metric space, with distance function d(.,.). A query p is considered a cache hit if there is a point q in the cache that is sufficiently close to p, i.e., for a threshold radius r, we have d(p, q) <= r. The goal is then to minimize the number of cache misses, vis-a-vis the optimal algorithm. As with classical caching, we use the competitive ratio to measure the performance of different algorithms. While similarity caching is a strict generalization of classical caching, we show that unless the algorithm is allowed extra power (either in the size of the cache or the threshold r) over the optimal offline algorithm, the problem is intractable. We then proceed to quantify the hardness as a function of the complexity of the underlying metric space. We show that the problem becomes easier as we proceed from general metric spaces to those of bounded doubling dimension, and to Euclidean metrics. Finally, we investigate several extensions of the problem: dependence of the threshold r on the query and a smoother trade-off between the cache-miss cost and the query-query similarity.
2009
28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS)
buffer management; caching; competitive analysis; nearest-neighbor
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Similarity Caching / Chierichetti, Flavio; Ravi, Kumar; Sergei, Vassilvitskii. - (2009), pp. 127-135. (Intervento presentato al convegno 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) tenutosi a Providence, RI nel JUN 29-JUL 01, 2009) [10.1145/1559795.1559815].
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/497784
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? 12
social impact