Nibbling at the Hard Core of Word Sense Disambiguation

Maru, Marco; Conia, Simone; Bevilacqua, Michele; Navigli, Roberto

doi:10.18653/v1/2022.acl-long.324

With state-of-the-art systems having finally attained estimated human performance, Word Sense Disambiguation (WSD) has now joined the array of Natural Language Processing tasks that have seemingly been solved, thanks to the vast amounts of knowledge encoded into Transformer-based pre-trained language models. And yet, if we look below the surface of raw figures, it is easy to realize that current approaches still make trivial mistakes that a human would never make. In this work, we provide evidence showing why the F1 score metric should not simply be taken at face value and present an exhaustive analysis of the errors that seven of the most representative state-of-the-art systems for English all-words WSD make on traditional evaluation benchmarks. In addition, we produce and release a collection of test sets featuring (a) an amended version of the standard evaluation benchmark that fixes its lexical and semantic inaccuracies, (b) 42D, a challenge set devised to assess the resilience of systems with respect to least frequent word senses and senses not seen at training time, and (c) hardEN, a challenge set made up solely of instances which none of the investigated state-of-the-art systems can solve. We make all of the test sets and model predictions available to the research community at https://github.com/SapienzaNLP/wsd-hard-benchmark.
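
The selection criterion described for hardEN (keeping only instances that none of the investigated systems solves) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `unsolved_instances`, the dictionary layout, the instance identifiers, and the sense labels are all hypothetical and chosen only for illustration; the actual file formats in the released repository may differ.

```python
from typing import Dict, List, Set


def unsolved_instances(
    gold: Dict[str, Set[str]],
    predictions: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return the ids of instances for which no system predicted a gold sense.

    gold:        instance id -> set of acceptable gold senses (hypothetical layout)
    predictions: system name -> {instance id -> predicted sense}
    """
    hard = []
    for instance_id, gold_senses in gold.items():
        # An instance counts as solved if at least one system hits a gold sense.
        solved = any(
            system_preds.get(instance_id) in gold_senses
            for system_preds in predictions.values()
        )
        if not solved:
            hard.append(instance_id)
    return hard


# Toy data: ids and sense labels are made up for the example.
gold = {
    "inst1": {"bank_river"},
    "inst2": {"bank_money", "bank_building"},
    "inst3": {"plant_factory"},
}
predictions = {
    "system_a": {"inst1": "bank_money", "inst2": "bank_money", "inst3": "plant_flora"},
    "system_b": {"inst1": "bank_river", "inst2": "bank_river", "inst3": "plant_flora"},
}

hard_ids = unsolved_instances(gold, predictions)
print(hard_ids)
```

On this toy input only `inst3` is missed by every system, so it is the only instance that would enter the challenge set under this criterion.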

Nibbling at the Hard Core of Word Sense Disambiguation / Maru, Marco; Conia, Simone; Bevilacqua, Michele; Navigli, Roberto. - 1:(2022), pp. 4724-4737. (Paper presented at the Association for Computational Linguistics conference, held in Dublin, Ireland) [10.18653/v1/2022.acl-long.324].

	Year of publication
	
				2022
			
	Conference name
	
				Association for Computational Linguistics
			
	Keywords
	
				word sense disambiguation; semantics; natural language processing; benchmark
			
	Type
	
				04 Publication in conference proceedings::04b Conference paper in volume
			

Files attached to this record

File	Size	Format
Muru_Nibbling_2022.pdf (open access). Note: link to the publication: https://aclanthology.org/2022.acl-long.324/. Type: publisher's version (published version with the publisher's layout). License: Creative Commons	358.46 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1639906
