NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection / Tedeschi, Simone; Navigli, Roberto. - (2022), pp. 204-210. (Paper presented at the 16th International Workshop on Semantic Evaluation, SemEval 2022, held in Seattle, United States) [10.18653/v1/2022.semeval-1.25].

NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection

Tedeschi, Simone; Navigli, Roberto
2022

Abstract

Idioms are lexically-complex phrases whose meaning cannot be derived by compositionally interpreting their components. Although the automatic identification and understanding of idioms is essential for a wide range of Natural Language Understanding tasks, they are still largely under-investigated. This motivated the organization of the SemEval-2022 Task 2, which is divided into two multilingual subtasks: one about idiomaticity detection, and the other about sentence embeddings. In this work, we focus on the first subtask and propose a Transformer-based dual-encoder architecture to compute the semantic similarity between a potentially-idiomatic expression and its context and, based on this, predict idiomaticity. Then, we show how and to what extent Named Entity Recognition can be exploited to reduce the degree of confusion of idiom identification systems and, therefore, improve performance. Our model achieves 92.1 F1 in the one-shot setting and shows strong robustness towards unseen idioms achieving 77.4 F1 in the zero-shot setting. We release our code at https://github.com/Babelscape/ner4id.
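The abstract describes the approach only at a high level. What follows is a minimal sketch of how a similarity-based idiomaticity decision with an NER override might look. It is not the authors' implementation, which is in the repository linked above. Several parts are assumptions and do not come from the paper:

- The Transformer encoders are replaced by a hypothetical toy encoder (`toy_encode`) that sums hand-made 2-d word vectors.
- The "context" is taken to be the sentence with the expression removed.
- A cosine similarity below a hypothetical `threshold` of 0.5 is read as idiomatic.
- The NER signal is a boolean `is_named_entity` flag, which a real system would obtain from an NER tagger.

```python
import math
import re
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[str], Vector]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_idiomatic(
    expression: str,
    sentence: str,
    encode_expression: Encoder,
    encode_context: Encoder,
    is_named_entity: bool = False,
    threshold: float = 0.5,
) -> bool:
    """Return True if `expression` is predicted to be used idiomatically in `sentence`."""
    # Assumed NER rule: an expression tagged as (part of) a named entity,
    # e.g. a film or song title, is treated as non-idiomatic.
    if is_named_entity:
        return False
    # Assumed preprocessing: the context is the sentence with the expression removed.
    context = re.sub(re.escape(expression), " ", sentence, flags=re.IGNORECASE)
    # Dual encoder: one encoder for the expression, one for its context.
    similarity = cosine(encode_expression(expression), encode_context(context))
    # Assumed decision rule: low similarity to the context suggests an idiomatic reading.
    return similarity < threshold


# Hypothetical toy encoder standing in for a Transformer: it sums hand-made
# 2-d word vectors (dimension 0 ~ "nature", dimension 1 ~ "business")
# and ignores words it does not know.
TOY_VECTORS: Dict[str, Vector] = {
    "fish": [1.0, 0.0],
    "river": [1.0, 0.0],
    "caught": [0.8, 0.2],
    "big": [0.5, 0.5],
    "company": [0.0, 1.0],
    "deals": [0.0, 1.0],
    "boss": [0.0, 1.0],
}


def toy_encode(text: str) -> Vector:
    total = [0.0, 0.0]
    for word in re.findall(r"[a-z]+", text.lower()):
        vec = TOY_VECTORS.get(word)
        if vec is not None:
            total = [t + x for t, x in zip(total, vec)]
    return total


if __name__ == "__main__":
    literal = "He caught a big fish in the river."
    idiomatic = "She became a big fish at the company, making deals for her boss."
    title = "We watched Big Fish last night."
    print(predict_idiomatic("big fish", literal, toy_encode, toy_encode))  # False
    print(predict_idiomatic("big fish", idiomatic, toy_encode, toy_encode))  # True
    print(predict_idiomatic("big fish", title, toy_encode, toy_encode))  # True
    print(predict_idiomatic("big fish", title, toy_encode, toy_encode,
                            is_named_entity=True))  # False
```

Without the NER flag, the film title "Big Fish" gets no support from its context, so the toy model labels it idiomatic. Setting `is_named_entity=True` overrides that decision. Under the assumptions above, this illustrates how entity information can reduce a system's confusion on such cases.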
16th International Workshop on Semantic Evaluation, SemEval 2022
Natural Language Processing; Figurative Language; Idiomatic Expressions
04 Publication in conference proceedings::04b Conference paper in volume
Attached files
Tedeschi_NER4ID_2022.pdf (open access)
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 773.09 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1653120
Citations
  • Scopus: 2