XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Federico Martelli; Andrei Stefan Bejgu; Cesare Campagnano; Jaka Čibej; Rute Costa; Apolonija Gantar; Jelena Kallas; Svetla Koeva; Kristina Koppel; Simon Krek; Margit Langemets; Veronika Lipp; Sanni Nimb; Sussi Olsen; Bolette Sandford Pedersen; Valeria Quochi; Ana Salgado; László Simon; Carole Tiberius; Rafael-J Ureña-Ruiz; Roberto Navigli

2023

Abstract

Word alignment plays a crucial role in several NLP tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually curated datasets, which are not always available, especially in mid- and low-resource languages. To address this limitation, we propose XL-WA, a novel, entirely manually curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize to unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.

Full record

Year of publication: 2023

Conference name: Ninth Italian Conference on Computational Linguistics

Keywords: Word alignment; Deep Learning; Natural Language Processing

Type: 04 Publication in conference proceedings::04b Conference paper in volume

Citation: XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs / Martelli, Federico; Bejgu, Andrei Stefan; Campagnano, Cesare; Čibej, Jaka; Costa, Rute; Gantar, Apolonija; Kallas, Jelena; Koeva, Svetla; Koppel, Kristina; Krek, Simon; Langemets, Margit; Lipp, Veronika; Nimb, Sanni; Olsen, Sussi; Sandford Pedersen, Bolette; Quochi, Valeria; Salgado, Ana; Simon, László; Tiberius, Carole; Ureña-Ruiz, Rafael-J; Navigli, Roberto. - 3596:(2023). (Paper presented at the Ninth Italian Conference on Computational Linguistics, held in Venice, Italy).

Belongs to type: 04b Conference paper in volume

Files attached to this record

File	Size	Format
Martelli_XL-WA_2023.pdf (open access; note: https://ceur-ws.org/Vol-3596/paper32.pdf; type: publisher's version, i.e. the version published with the publisher's layout; license: Creative Commons)	327.45 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1694184
