The lexical substitution task aims at finding suitable replacements for words in context. It has proved to be useful in several areas, such as word sense induction and text simplification, as well as in more practical applications such as writing-assistant tools. However, the paucity of annotated data has forced researchers to apply mainly unsupervised approaches, limiting the applicability of large pre-trained models and thus hampering the potential benefits of supervised approaches to the task. In this paper, we mitigate this issue by proposing ALaSca, a novel approach to automatically creating large-scale datasets for English lexical substitution. ALaSca allows examples to be produced for potentially any word in a language vocabulary and to cover most of the meanings it lists. Thanks to this, we can unleash the full potential of neural architectures and finetune them on the lexical substitution task. Indeed, when using our data, a transformer-based model performs substantially better than when using manually annotated data only. We release ALaSca at https://sapienzanlp.github.io/alasca/.

ALaSca: an Automated approach for Large-Scale Lexical Substitution / Lacerra, Caterina; Pasini, Tommaso; Tripodi, Rocco; Navigli, Roberto. - In: IJCAI. - ISSN 1045-0823. - (2021), pp. 3836-3842. (Paper presented at the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021, held online) [10.24963/ijcai.2021/528].

ALaSca: an Automated approach for Large-Scale Lexical Substitution

Lacerra, Caterina; Pasini, Tommaso; Tripodi, Rocco; Navigli, Roberto
2021

30th International Joint Conference on Artificial Intelligence, IJCAI 2021
natural language processing; lexical semantics; lexical substitution
04 Publication in conference proceedings::04c Conference paper in a journal
Files attached to this item
File: Lacerra_ALaSca_2021.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 409.9 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1585582