Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

Raganato, Alessandro; Scherrer, Yves; Tiedemann, Jörg

doi:10.18653/v1/2020.findings-emnlp.49

Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that most attention heads learn simple, and often redundant, positional patterns. In this paper, we propose to replace all but one attention head of each encoder layer with simple fixed – non-learnable – attentive patterns that are solely based on position and do not require any external knowledge. Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios.
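The abstract describes replacing all but one encoder self-attention head with fixed attention patterns that depend only on token position. The Python sketch below illustrates that idea under stated assumptions. The five patterns (current token, previous token, next token, uniform left context, uniform right context), the fallback to self-attention at sentence edges, and the helper names `PATTERNS`, `fixed_pattern` and `fixed_head` are hypothetical choices for illustration. This record does not list them, and the paper's exact pattern set and head assignment may differ. Because these weights are a function of sequence length and position alone, such heads need no query or key projections.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]

# Hypothetical pattern set: each selector says whether position i may attend to position j.
PATTERNS: Dict[str, Callable[[int, int], bool]] = {
    "current": lambda i, j: j == i,
    "previous": lambda i, j: j == i - 1,
    "next": lambda i, j: j == i + 1,
    "left": lambda i, j: j < i,    # uniform over all earlier tokens
    "right": lambda i, j: j > i,   # uniform over all later tokens
}


def fixed_pattern(name: str, n: int) -> Matrix:
    """Return an n x n row-stochastic attention matrix that depends only on position."""
    keep = PATTERNS[name]
    weights: Matrix = []
    for i in range(n):
        cols = [j for j in range(n) if keep(i, j)]
        if not cols:  # assumption: fall back to self-attention at sentence edges
            cols = [i]
        row = [0.0] * n
        for j in cols:
            row[j] = 1.0 / len(cols)
        weights.append(row)
    return weights


def fixed_head(values: Matrix, weights: Matrix) -> Matrix:
    """Apply a non-learnable head: each output row is a fixed weighted average of value vectors."""
    dim = len(values[0])
    return [
        [sum(w * v[k] for w, v in zip(row, values)) for k in range(dim)]
        for row in weights
    ]


# Toy example: 4 positions with 2-dimensional value vectors.
values = [[float(i), 10.0 * i] for i in range(4)]
print(fixed_head(values, fixed_pattern("previous", 4)))
# [[0.0, 0.0], [0.0, 0.0], [1.0, 10.0], [2.0, 20.0]]
```

In a full encoder layer, assuming the standard multi-head layout, the outputs of these fixed heads would be concatenated with the output of the single remaining learned attention head. That learned head and any value or output projections are omitted from this sketch.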

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation / Raganato, Alessandro; Scherrer, Yves; Tiedemann, Jörg. - (2020), pp. 556-568. (Paper presented at the conference Findings of the Association for Computational Linguistics: EMNLP 2020, held online) [10.18653/v1/2020.findings-emnlp.49].

Year of publication: 2020
Conference name: Findings of the Association for Computational Linguistics: EMNLP 2020
Keywords: transformer; machine translation; attention
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this record

There are no files associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1553699

Warning: the data displayed have not been validated by the university.
