IR like a SIR: Sense-enhanced Information Retrieval for Multiple Languages / Blloshmi, Rexhina; Pasini, Tommaso; Campolungo, Niccolò; Banerjee, Somnath; Navigli, Roberto; Pasi, Gabriella. - (2021), pp. 1030-1041. (Paper presented at the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, held in Punta Cana, Dominican Republic) [10.18653/v1/2021.emnlp-main.79].

IR like a SIR: Sense-enhanced Information Retrieval for Multiple Languages

Blloshmi, Rexhina; Pasini, Tommaso; Campolungo, Niccolò; Banerjee, Somnath; Navigli, Roberto; Pasi, Gabriella
2021

Abstract

With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval has increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of only a few keywords, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to the shortage of labeled datasets. In this paper we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging word sense information. At the core of our approach lies a novel multilingual query expansion mechanism based on Word Sense Disambiguation that provides sense definitions as additional semantic information for the query. Importantly, we use senses as a bridge across languages, thus allowing our model to perform considerably better than its supervised and unsupervised alternatives across French, German, Italian and Spanish on several CLEF benchmarks, while being trained on English Robust04 data only. We release SIR at https://github.com/SapienzaNLP/sir.
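The idea of expanding a short query with sense definitions can be illustrated with a small sketch. It is not SIR's implementation: in place of the paper's Word Sense Disambiguation component, the sketch substitutes a simplified Lesk gloss-overlap heuristic. The sense inventory (`SENSES`, `LEXICON`), the sense identifiers and the glosses are all hypothetical and invented for illustration. The procedure disambiguates each query token against the other query tokens and appends the definition of the chosen sense to the query. Because senses are keyed by language-independent identifiers, the gloss can also be fetched in another language through the hypothetical `gloss_lang` parameter, which loosely mirrors the "senses as a bridge across languages" idea.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical toy sense inventory; all entries are invented for illustration.
@dataclass
class Sense:
    sense_id: str                # language-independent identifier
    definitions: dict            # language code -> gloss


SENSES = {
    "bank.n.financial": Sense("bank.n.financial", {
        "en": "a financial institution that accepts deposits and lends money",
        "it": "istituto che raccoglie depositi e concede prestiti di denaro",
    }),
    "bank.n.river": Sense("bank.n.river", {
        "en": "sloping land beside a body of water such as a river",
        "it": "terreno in pendenza lungo un corso d'acqua come un fiume",
    }),
}

# (language, lemma) -> candidate language-independent sense identifiers
LEXICON = {
    ("en", "bank"): ["bank.n.financial", "bank.n.river"],
    ("it", "banca"): ["bank.n.financial"],
    ("it", "riva"): ["bank.n.river"],
}


def tokenize(text: str) -> list:
    """Lowercase whitespace tokenization with leading/trailing punctuation stripped."""
    tokens = (t.strip(".,;:'\"").lower() for t in text.split())
    return [t for t in tokens if t]


def disambiguate(lemma: str, context: list, lang: str) -> Optional[str]:
    """Simplified Lesk: pick the candidate whose gloss shares most words with the context."""
    candidates = LEXICON.get((lang, lemma), [])
    if not candidates:
        return None
    ctx = set(context)

    def overlap(sense_id: str) -> int:
        gloss = SENSES[sense_id].definitions.get(lang, "")
        return len(ctx & set(tokenize(gloss)))

    # max() returns the first maximal element, so ties go to the first candidate.
    return max(candidates, key=overlap)


def expand_query(query: str, lang: str, gloss_lang: Optional[str] = None) -> str:
    """Append the gloss of each disambiguated query token to the original query."""
    tokens = tokenize(query)
    out_lang = gloss_lang or lang
    glosses = []
    for i, tok in enumerate(tokens):
        context = tokens[:i] + tokens[i + 1:]
        sense_id = disambiguate(tok, context, lang)
        if sense_id is not None:
            gloss = SENSES[sense_id].definitions.get(out_lang)
            if gloss:
                glosses.append(gloss)
    return " ".join([query, *glosses])


if __name__ == "__main__":
    print(expand_query("bank deposits", "en"))
    print(expand_query("bank deposits", "en", gloss_lang="it"))
```

Here "deposits" in the context overlaps with the financial gloss, so `expand_query("bank deposits", "en")` appends the financial definition rather than the river one. A real system would replace the overlap heuristic with a trained disambiguator and the toy dictionaries with a large multilingual sense inventory.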
2021
2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Keywords: nlp; natural language processing; information retrieval; ir; word sense disambiguation; wsd; multilinguality; sense-enhanced
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
Blloshmi_IR-like_2021.pdf

Open access

Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 677.38 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1604154
Citations
  • PubMed Central: not available
  • Scopus: 6
  • Web of Science: 4