MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation / Barba, Edoardo; Procopio, Luigi; Campolungo, Niccolò; Pasini, Tommaso; Navigli, Roberto. - (2020), pp. 3837-3844. (Paper presented at the International Joint Conference on Artificial Intelligence, held in Kyoto, Japan) [10.24963/ijcai.2020/531].
MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation
Barba, Edoardo; Procopio, Luigi; Campolungo, Niccolò; Pasini, Tommaso; Navigli, Roberto
2020
Abstract
The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments, we provide empirical evidence that our automatically created datasets are of higher quality than those generated by competing approaches and lead a supervised model to achieve state-of-the-art performance in all multilingual Word Sense Disambiguation tasks. We make our datasets available for research purposes at https://github.com/SapienzaNLP/mulan.

File | Size | Format |
---|---|---|
Navigli_MuLan_2020.pdf (open access). Type: Post-print document (version following peer review and accepted for publication). License: All rights reserved | 240.32 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.