Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face

Piktus, Aleksandra
2023

Abstract

We present Spacerini, a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face to enable the seamless construction and deployment of interactive search engines. Spacerini makes state-of-the-art sparse and dense retrieval models more accessible to non-IR practitioners while minimizing deployment effort. This is useful for NLP researchers who want to better understand and validate their research by performing qualitative analyses of training corpora, for IR researchers who want to demonstrate new retrieval models integrated into the growing Pyserini ecosystem, and for third parties reproducing the work of other researchers. Spacerini is open source and includes utilities for loading, preprocessing, indexing, and deploying search engines locally and remotely. We demonstrate a portfolio of 13 search engines created with Spacerini for different use cases.
Year: 2023
Conference: Empirical Methods in Natural Language Processing (EMNLP)
Keywords: natural language processing; information retrieval; large language models; training data inspection tools
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face / Akiki, Christopher; Ogundepo, Odunayo; Piktus, Aleksandra; Zhang, Xinyu; Oladipo, Akintunde; Lin, Jimmy; Potthast, Martin. - (2023), pp. 140-148. (Paper presented at the Empirical Methods in Natural Language Processing (EMNLP) conference, held in Singapore) [10.18653/v1/2023.emnlp-demo.12].
Files attached to this item

Akiki_Spacerini_2023.pdf
Access: open access
Note: https://aclanthology.org/2023.emnlp-demo.12.pdf
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 948.61 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1717588