Locality sensitive hashing (LSH) is a key algorithmic tool that lies at the heart of many information retrieval and machine learning systems [1, 2, 8]. LSH schemes are used to sketch large objects (e.g., Web pages, fields of flowers, or - more generally - sets and vectors) into fingerprints of few bits each; the fingerprints are then used to quickly, and approximately, reconstruct some similarity relation between the objects. A LSH scheme for a similarity (or, analogously, for a distance) can significantly improve the computational cost of many algorithmic primitives (e.g., nearest neighbor search, and clustering). For this reason, in the last two decades, researchers have tried to understand which similarities admit efficient LSH schemes: such schemes were obtained for many similarities [1-3, 7-9], while the non-existence of LSH schemes was proved for a number of other similarities [3]. In our talk, we will introduce the class of LSH-preserving transformations [4] (functions that, when applied to a similarity that admits a LSH scheme, return a similarity that also admits such a scheme). We will give a characterization of this class of functions: they are precisely the probability generating functions, up to scaling. We will then show how this characterization can be used to construct LSH schemes for a number of well-known similarities. We will then discuss a notion of similarity distortion [6], in order to deal with similarities which are known to not admit LSH schemes - this notion aims to determine the minimum distortions that these similarities have to be subject of, before starting to admit a LSH scheme. We will introduce a number of general theoretical tools that can be used to determine the optimal distortions of some important classes of similarities. Finally, we will consider the computational problem of checking whether a similarity admits a LSH scheme [5], showing that, unfortunately, this problem is computationally hard in a very strong sense.

Locality sensitive hashing schemes, similarities, and distortion (invited talk) / Chierichetti, F.. - 11376:(2019), pp. 531-532. (Intervento presentato al convegno 45th International Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 27-30, 2019, Proceedings tenutosi a Nový Smokovec, Slovakia).

Locality sensitive hashing schemes, similarities, and distortion (invited talk)

Chierichetti F.
2019

Abstract

Locality sensitive hashing (LSH) is a key algorithmic tool that lies at the heart of many information retrieval and machine learning systems [1, 2, 8]. LSH schemes are used to sketch large objects (e.g., Web pages, fields of flowers, or - more generally - sets and vectors) into fingerprints of few bits each; the fingerprints are then used to quickly, and approximately, reconstruct some similarity relation between the objects. A LSH scheme for a similarity (or, analogously, for a distance) can significantly improve the computational cost of many algorithmic primitives (e.g., nearest neighbor search, and clustering). For this reason, in the last two decades, researchers have tried to understand which similarities admit efficient LSH schemes: such schemes were obtained for many similarities [1-3, 7-9], while the non-existence of LSH schemes was proved for a number of other similarities [3]. In our talk, we will introduce the class of LSH-preserving transformations [4] (functions that, when applied to a similarity that admits a LSH scheme, return a similarity that also admits such a scheme). We will give a characterization of this class of functions: they are precisely the probability generating functions, up to scaling. We will then show how this characterization can be used to construct LSH schemes for a number of well-known similarities. We will then discuss a notion of similarity distortion [6], in order to deal with similarities which are known to not admit LSH schemes - this notion aims to determine the minimum distortions that these similarities have to be subject of, before starting to admit a LSH scheme. We will introduce a number of general theoretical tools that can be used to determine the optimal distortions of some important classes of similarities. Finally, we will consider the computational problem of checking whether a similarity admits a LSH scheme [5], showing that, unfortunately, this problem is computationally hard in a very strong sense.
2019
45th International Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 27-30, 2019, Proceedings
lsh;similarities;nearest neighbor
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Locality sensitive hashing schemes, similarities, and distortion (invited talk) / Chierichetti, F.. - 11376:(2019), pp. 531-532. (Intervento presentato al convegno 45th International Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 27-30, 2019, Proceedings tenutosi a Nový Smokovec, Slovakia).
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1613509
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact