Recommender systems are an integral part of many web applica- tions. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality-sensitive hashing (LSH). However, a significant drawback is that these meth- ods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifi- cally, using the Jaccard index as similarity measure, we design (1) a sketching algorithm for similarity estimation via a black box re- duction to ℓ0 norm estimation and (2) a locality sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH ap- proaches for the insertion only case, and drastically outperform previous algorithms in case of deletions

Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams / Bury, Marc; Schwiegelshohn, CHRIS RENE; Sorella, Mara. - STAMPA. - (2018), pp. 72-80. (Intervento presentato al convegno 11th ACM International Conference on Web Search and Data Mining, WSDM 2018 tenutosi a Marina Del Rey, CA, USA nel Febuary 5-9, 2018) [10.1145/3159652.3159694].

Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams

SCHWIEGELSHOHN, CHRIS RENE
;
Sorella, Mara
2018

Abstract

Recommender systems are an integral part of many web applica- tions. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality-sensitive hashing (LSH). However, a significant drawback is that these meth- ods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifi- cally, using the Jaccard index as similarity measure, we design (1) a sketching algorithm for similarity estimation via a black box re- duction to ℓ0 norm estimation and (2) a locality sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH ap- proaches for the insertion only case, and drastically outperform previous algorithms in case of deletions
2018
11th ACM International Conference on Web Search and Data Mining, WSDM 2018
locality sensitive hashing, dynamic data streams, Jaccard index
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams / Bury, Marc; Schwiegelshohn, CHRIS RENE; Sorella, Mara. - STAMPA. - (2018), pp. 72-80. (Intervento presentato al convegno 11th ACM International Conference on Web Search and Data Mining, WSDM 2018 tenutosi a Marina Del Rey, CA, USA nel Febuary 5-9, 2018) [10.1145/3159652.3159694].
File allegati a questo prodotto
File Dimensione Formato  
Bury_Sketch_2018.pdf

solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.37 MB
Formato Adobe PDF
1.37 MB Adobe PDF   Contatta l'autore
WSDM2018_Frontespizio-indice_2018.pdf

solo gestori archivio

Tipologia: Altro materiale allegato
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 2.65 MB
Formato Adobe PDF
2.65 MB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1085773
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 3
social impact