Catalogue of research products


NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection

Tedeschi, Simone; Navigli, Roberto

2022

Abstract

Idioms are lexically complex phrases whose meaning cannot be derived by compositionally interpreting their components. Although automatically identifying and understanding idioms is essential for a wide range of Natural Language Understanding tasks, idioms remain largely under-investigated. This motivated the organization of SemEval-2022 Task 2, which is divided into two multilingual subtasks: one on idiomaticity detection and the other on sentence embeddings. In this work, we focus on the first subtask and propose a Transformer-based dual-encoder architecture that computes the semantic similarity between a potentially idiomatic expression and its context and, based on this similarity, predicts idiomaticity. We then show how, and to what extent, Named Entity Recognition can be exploited to reduce the confusion of idiom identification systems and thereby improve performance. Our model achieves 92.1 F1 in the one-shot setting and shows strong robustness to unseen idioms, achieving 77.4 F1 in the zero-shot setting. We release our code at https://github.com/Babelscape/ner4id.
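The abstract describes the approach only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation (their code is at https://github.com/Babelscape/ner4id). Two encoders embed the expression and its context separately, and their cosine similarity drives the prediction. Everything else is hypothetical: the two-dimensional toy word vectors stand in for Transformer encoders, the context is assumed to be the sentence with the expression removed, a fixed threshold replaces a trained classifier (low similarity is assumed to signal idiomatic usage), and an expression flagged as a named entity by some external NER tagger (not implemented here) is assumed to be non-idiomatic.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[str], Vector]

# Hypothetical toy word vectors over two made-up dimensions: [nature, business].
# A real dual encoder would use Transformer encoders instead of this lookup table.
TOY_VECTORS: Dict[str, Vector] = {
    "big": [0.5, 0.5],
    "fish": [1.0, 0.0],
    "caught": [0.8, 0.2],
    "river": [1.0, 0.0],
    "ceo": [0.0, 1.0],
    "market": [0.0, 1.0],
}


def toy_encode(text: str) -> Vector:
    """Mean-pool the toy vectors of known words (stand-in for one encoder tower)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_idiomatic(
    expression: str,
    sentence: str,
    expr_encoder: Encoder,
    ctx_encoder: Encoder,
    threshold: float = 0.5,
    is_named_entity: bool = False,
) -> bool:
    """Return True if `expression` is predicted to be used idiomatically in `sentence`."""
    # Assumption: an expression tagged as a named entity by an external NER
    # system is treated as non-idiomatic, which removes it from consideration.
    if is_named_entity:
        return False
    # Assumption: the context is the sentence with the expression removed.
    context = sentence.lower().replace(expression.lower(), " ")
    sim = cosine(expr_encoder(expression), ctx_encoder(context))
    # Assumption: low expression/context similarity signals idiomatic usage.
    return sim < threshold


if __name__ == "__main__":
    print(predict_idiomatic("big fish", "The new CEO is a big fish in the market",
                            toy_encode, toy_encode))  # True
    print(predict_idiomatic("big fish", "He caught a big fish in the river",
                            toy_encode, toy_encode))  # False
```

Here the same toy function plays both encoder roles for brevity; keeping `expr_encoder` and `ctx_encoder` as separate parameters mirrors the dual-encoder framing, where the two towers could in principle have different weights.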

Year of publication: 2022
Conference name: 16th International Workshop on Semantic Evaluation, SemEval 2022
Keywords: Natural Language Processing; Figurative Language; Idiomatic Expressions
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Citation: NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection / Tedeschi, S., Navigli, R. - (2022), pp. 204-210. (16th International Workshop on Semantic Evaluation, SemEval 2022, Seattle, United States) [10.18653/v1/2022.semeval-1.25].
Belongs to type: 04b Conference paper in volume

Files attached to this record

File	Size	Format
Tedeschi_NER4ID_2022.pdf (open access; type: publisher's version, i.e. the published version with the publisher's layout; license: Creative Commons)	773.09 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1653120
