An Analysis of Encoder Representations in Transformer-Based Machine Translation

Raganato, Alessandro; Tiedemann, Jörg
2018

Abstract

The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed network architecture of the Transformer is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. However, little is known so far about the internal properties of the model and the representations it learns to achieve that performance. To study this question, we investigate the information learned by the attention mechanism in Transformer models of varying translation quality. We assess the encoder representations in three ways: we extract dependency relations from self-attention weights, we perform four probing tasks to measure the amount of syntactic and semantic information they capture, and we test attention in a transfer learning scenario. Our analysis sheds light on the relative strengths and weaknesses of the various encoder representations. We observe that specific attention heads mark syntactic dependency relations, and we confirm that lower layers tend to learn more about syntax while higher layers tend to encode more semantics.
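To make the first analysis step concrete, below is a minimal sketch (not the authors' released code) of how dependency relations can be read off self-attention weights: for each token, the position receiving that token's maximum attention weight under a given head is taken as its predicted syntactic head and scored against gold annotations. The function name, toy sentence, and gold head indices are illustrative assumptions.

```python
import numpy as np

def dependency_accuracy(attention, gold_heads):
    """attention: (seq_len, seq_len) weights of one self-attention head,
    where attention[i, j] is how much token i attends to token j.
    gold_heads: gold_heads[i] is the index of token i's syntactic head
    (-1 marks the root, which is skipped in scoring)."""
    predicted = attention.argmax(axis=-1)           # most-attended token per row
    gold = np.array(gold_heads)
    keep = gold != -1                               # exclude the root token
    return float((predicted[keep] == gold[keep]).mean())

# Toy example: "the cat sleeps", gold arcs the->cat, cat->sleeps, sleeps=root.
attn = np.array([[0.1, 0.8, 0.1],    # "the" attends mostly to "cat"
                 [0.2, 0.1, 0.7],    # "cat" attends mostly to "sleeps"
                 [0.3, 0.4, 0.3]])   # "sleeps" is the root (excluded)
print(dependency_accuracy(attn, gold_heads=[1, 2, -1]))  # -> 1.0
```

Repeating this per head and per layer gives a per-head accuracy profile, which is how one can observe that specific heads mark specific dependency relations.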
2018
The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
attention; transformer; machine translation
04 Publication in conference proceedings::04b Conference paper in volume
An Analysis of Encoder Representations in Transformer-Based Machine Translation / Raganato, Alessandro; Tiedemann, Jörg. - (2018), pp. 287-297. (Paper presented at The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, held in Brussels, Belgium) [10.18653/v1/W18-5431].
Files attached to this record
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1553731
Warning! The displayed data have not been validated by the university.

Citazioni
  • PMC: ND
  • Scopus: 172
  • ISI Web of Science: ND