Word alignment plays a crucial role in several NLP tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.

XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs / Martelli, F., Bejgu, A.S., Campagnano, C., Čibej, J., Costa, R., Gantar, A., Kallas, J., Koeva, S., Koppel, K., Krek, S., Langemets, M., Lipp, V., Nimb, S., Olsen, S., Sandford Pedersen, B., Quochi, V., Salgado, A., Simon, L., Tiberius, C., Ureña-Ruiz, R., et al. - 3596:(2023). (Ninth Italian Conference on Computational Linguistics, Venice, Italy).

XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Federico Martelli (first author); Andrei Stefan Bejgu (second author); Cesare Campagnano; Simon Krek; Roberto Navigli
2023

2023
Ninth Italian Conference on Computational Linguistics
Word alignment; Deep Learning; Natural Language Processing
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Martelli_XL-WA_2023.pdf
Access: open access
Note: https://ceur-ws.org/Vol-3596/paper32.pdf
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 327.45 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1694184