Overcoming sense inventories weaknesses through coarse-grained resources and lexical substitution / Lacerra, Caterina. - (2022 May 20).

Overcoming sense inventories weaknesses through coarse-grained resources and lexical substitution

LACERRA, CATERINA
20/05/2022

Abstract

This dissertation addresses the granularity issue in sense inventories by proposing two alternative approaches to word sense disambiguation (WSD). First, a coarse-grained framework is introduced through the Coarse Sense Inventory (CSI), consisting of 45 categories optimized for interpretability and human annotator agreement. CSI demonstrates a competitive balance between performance and expressiveness, outperforming alternative inventories, particularly in few-shot learning scenarios. Second, the thesis explores lexical substitution as a viable alternative to traditional WSD, facilitated by the introduction of two novel resources: ALaSca and GeneSis. ALaSca, the first large-scale dataset for lexical substitution, leverages clustering to account for context-specific word meanings, enabling fine-tuned language models to surpass unsupervised baselines in candidate ranking tasks. GeneSis, a generative seq2seq approach, further advances lexical substitution by producing contextually appropriate substitutes and achieving state-of-the-art performance on substitute prediction tasks. Despite these advances, challenges remain, including limited evaluation settings and a focus on English, which open avenues for future research in multilingual lexical substitution, lexical simplification, and contextual word understanding.
20 May 2022

Files attached to this item
File: Tesi_dottorato_Lacerra.pdf
Access: open access
Note: Complete thesis
Type: Doctoral thesis
License: Creative Commons
Size: 4.88 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1730612