Recommender systems are an integral part of many web applica- tions. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality-sensitive hashing (LSH). However, a significant drawback is that these meth- ods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifi- cally, using the Jaccard index as similarity measure, we design (1) a sketching algorithm for similarity estimation via a black box re- duction to ℓ0 norm estimation and (2) a locality sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH ap- proaches for the insertion only case, and drastically outperform previous algorithms in case of deletions
Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams / Bury, Marc; Schwiegelshohn, CHRIS RENE; Sorella, Mara. - STAMPA. - (2018), pp. 72-80. (Intervento presentato al convegno 11th ACM International Conference on Web Search and Data Mining, WSDM 2018 tenutosi a Marina Del Rey, CA, USA nel Febuary 5-9, 2018) [10.1145/3159652.3159694].
Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams
SCHWIEGELSHOHN, CHRIS RENE
;Sorella, Mara
2018
Abstract
Recommender systems are an integral part of many web applica- tions. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running times are based on locality-sensitive hashing (LSH). However, a significant drawback is that these meth- ods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifi- cally, using the Jaccard index as similarity measure, we design (1) a sketching algorithm for similarity estimation via a black box re- duction to ℓ0 norm estimation and (2) a locality sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH ap- proaches for the insertion only case, and drastically outperform previous algorithms in case of deletionsFile | Dimensione | Formato | |
---|---|---|---|
Bury_Sketch_2018.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.37 MB
Formato
Adobe PDF
|
1.37 MB | Adobe PDF | Contatta l'autore |
WSDM2018_Frontespizio-indice_2018.pdf
solo gestori archivio
Tipologia:
Altro materiale allegato
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
2.65 MB
Formato
Adobe PDF
|
2.65 MB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.