ID10M: Idiom Identification in 10 Languages / Tedeschi, S., Martelli, F., Navigli, R. - (2022), pp. 2715-2726. (Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, United States) [10.18653/v1/2022.findings-naacl.208].

ID10M: Idiom Identification in 10 Languages

Tedeschi, Simone; Martelli, Federico; Navigli, Roberto

2022

Abstract

Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.

Full record

Year of publication: 2022
Conference name: Findings of the Association for Computational Linguistics: NAACL 2022
Keywords: Natural Language Processing; Figurative Language; Idiomatic Expressions
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Belongs to type: 04b Conference paper in volume

Files attached to this product

File	Size	Format
Tedeschi_ID10M_2022.pdf (open access; Type: Publisher's version, i.e. the published version with the publisher's layout; License: Creative Commons)	790.19 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1653020
