An Analysis of Encoder Representations in Transformer-Based Machine Translation

Raganato, Alessandro; Tiedemann, Jörg
2018

Abstract

The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed network architecture of the Transformer is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. However, little is known so far about the internal properties of the model and the representations it learns to achieve that performance. To study this question, we investigate the information learned by the attention mechanism in Transformer models of varying translation quality. We assess the encoder representations in three ways: we extract dependency relations from self-attention weights, we perform four probing tasks to measure the amount of syntactic and semantic information they capture, and we test attention in a transfer learning scenario. Our analysis sheds light on the relative strengths and weaknesses of the various encoder representations. We observe that specific attention heads mark syntactic dependency relations, and we confirm that lower layers tend to learn more about syntax while higher layers tend to encode more semantics.
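To make the first analysis step concrete, below is a minimal sketch (not the authors' released code) of how dependency relations can be read off self-attention weights: for each token, the position receiving that token's maximum attention weight under a given head is taken as its predicted syntactic head and scored against gold annotations. The function name, toy sentence, and gold head indices are illustrative assumptions.

```python
import numpy as np

def dependency_accuracy(attention, gold_heads):
    """attention: (seq_len, seq_len) weights of one self-attention head,
    where attention[i, j] is how much token i attends to token j.
    gold_heads: gold_heads[i] is the index of token i's syntactic head
    (-1 marks the root, which is skipped in scoring)."""
    predicted = attention.argmax(axis=-1)           # most-attended token per row
    gold = np.array(gold_heads)
    keep = gold != -1                               # exclude the root token
    return float((predicted[keep] == gold[keep]).mean())

# Toy example: "the cat sleeps", gold arcs the->cat, cat->sleeps, sleeps=root.
attn = np.array([[0.1, 0.8, 0.1],    # "the" attends mostly to "cat"
                 [0.2, 0.1, 0.7],    # "cat" attends mostly to "sleeps"
                 [0.3, 0.4, 0.3]])   # "sleeps" is the root (excluded)
print(dependency_accuracy(attn, gold_heads=[1, 2, -1]))  # -> 1.0
```

Repeating this per head and per layer gives a per-head accuracy profile, which is how one can observe that specific heads mark specific dependency relations.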
2018
The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
attention; transformer; machine translation
04 Publication in conference proceedings::04b Conference paper in volume
An Analysis of Encoder Representations in Transformer-Based Machine Translation / Raganato, Alessandro; Tiedemann, Jörg. - (2018), pp. 287-297. (Paper presented at The 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, held in Brussels, Belgium) [10.18653/v1/W18-5431].
Files attached to this record
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1553731
Warning! The displayed data have not been validated by the university.

Citazioni
  • PMC: ND
  • Scopus: 172
  • ISI Web of Science: ND