
An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation / Raganato, Alessandro; Vázquez, Raúl; Creutz, Mathias; Tiedemann, Jörg. - (2021), pp. 8449-8456. (2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Punta Cana, Dominican Republic) [10.18653/v1/2021.emnlp-main.664].

An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation

Alessandro Raganato; Raúl Vázquez; Mathias Creutz; Jörg Tiedemann

2021

Abstract

Zero-shot translation is a fascinating feature of Multilingual Neural Machine Translation (MNMT) systems. These models are usually trained on English-centric data, i.e., with English as either the source or the target language, and with a language label prepended to the input to indicate the target language. However, recent work has highlighted several flaws of these models in zero-shot scenarios: language labels are ignored and the wrong language is generated, or different runs show highly unstable results. In this paper, we investigate the benefits of explicitly aligning to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross-attention head with word alignment supervision to stress the focus on the target-language label. We compare and evaluate several MNMT systems on three multilingual MT benchmarks of different sizes, showing that simply supervising one cross-attention head to focus on both word alignments and language labels reduces the bias towards translating into the wrong language, improving zero-shot performance overall. Moreover, as an additional advantage, we find that our alignment supervision leads to more stable results across different training runs.
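The abstract does not spell out the paper's exact training objective. As a rough, illustrative sketch only (the function name, the toy matrices, and the convention that source position 0 holds the prepended language label are assumptions, not taken from the paper), a guided-alignment loss on a single cross-attention head can be written as a cross-entropy between that head's attention distribution and a gold alignment matrix:

```python
import numpy as np

def guided_alignment_loss(attn, gold):
    """Cross-entropy between one head's cross-attention distribution
    (attn: tgt_len x src_len, rows sum to 1) and a gold alignment
    matrix (gold: tgt_len x src_len, rows sum to 1 where a gold link
    exists). Source position 0 is assumed to be the prepended
    target-language label, so gold rows may also point there."""
    eps = 1e-9  # numerical guard against log(0)
    per_token = -(gold * np.log(attn + eps)).sum(axis=1)
    # Average only over target tokens that have at least one gold link.
    mask = gold.sum(axis=1) > 0
    return per_token[mask].mean()

# Toy example: 3 target tokens; source = [<2xx> label, w1, w2].
attn = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]])
gold = np.array([[1.0, 0.0, 0.0],   # first target token -> language label
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
loss = guided_alignment_loss(attn, gold)  # -log(0.8) per token on average
```

In training, a term like this would typically be added to the usual translation loss for exactly one head, leaving the remaining heads unconstrained.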
2021
2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
machine translation; Natural Language Processing; multilinguality
04 Conference proceedings publication::04b Conference paper in volume
Files attached to this product

File: Raganato_An-Empirical_2021.pdf
Access: open access
Note: https://aclanthology.org/2021.emnlp-main.664.pdf
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 403.52 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1570688
Citations
  • PubMed Central: not available
  • Scopus: 10
  • Web of Science: 5