MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation / Barba, Edoardo; Procopio, Luigi; Campolungo, Niccolò; Pasini, Tommaso; Navigli, Roberto. - (2020), pp. 3837-3844. (Paper presented at the International Joint Conference on Artificial Intelligence, held in Kyoto, Japan) [10.24963/ijcai.2020/531].

MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation

Barba, Edoardo; Procopio, Luigi; Campolungo, Niccolò; Pasini, Tommaso; Navigli, Roberto
2020

Abstract

The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments, we provide empirical evidence that our automatically created datasets are of higher quality than those generated by competing approaches and lead a supervised model to achieve state-of-the-art performance in all multilingual Word Sense Disambiguation tasks. We make our datasets available for research purposes at https://github.com/SapienzaNLP/mulan.
Conference: International Joint Conference on Artificial Intelligence
Keywords: natural language processing; natural language semantics; resources and evaluation
Publication type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
Navigli_MuLan_2020.pdf (open access)
Type: Post-print (version following peer review and accepted for publication)
License: All rights reserved
Size: 240.32 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1431920
Citations
  • PubMed Central: not available
  • Scopus: 34
  • Web of Science: 17