Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, that is, are LSHable. In this article, we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A B|, |A δB|, |AB|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwiseindependent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals.

LSH-Preserving functions and their applications / Chierichetti, Flavio; Kumar, Ravi. - In: JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY. - ISSN 0004-5411. - 62:5(2015), pp. 1-25. [10.1145/2816813]

LSH-Preserving functions and their applications

CHIERICHETTI, FLAVIO;
2015

Abstract

Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, that is, are LSHable. In this article, we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A B|, |A δB|, |AB|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwiseindependent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals.
2015
Locality sensitive hashing; Set similarities; Hardware and Architecture; Software; Artificial Intelligence; Information Systems; Control and Systems Engineering
01 Pubblicazione su rivista::01a Articolo in rivista
LSH-Preserving functions and their applications / Chierichetti, Flavio; Kumar, Ravi. - In: JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY. - ISSN 0004-5411. - 62:5(2015), pp. 1-25. [10.1145/2816813]
File allegati a questo prodotto
File Dimensione Formato  
Chierichetti_LSH_2015.pdf

solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 333.92 kB
Formato Adobe PDF
333.92 kB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/877835
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 20
  • ???jsp.display-item.citation.isi??? 17
social impact