
An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation

Raganato, Alessandro; Vázquez, Raúl; Creutz, Mathias; Tiedemann, Jörg
2019

Abstract

In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as a fixed-size sentence representation in different downstream tasks. We systematically study the impact of the size of the shared layer and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that translation performance does correlate with performance on trainable downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also increase the accuracy of trainable classification tasks. On the other hand, shorter representations yield stronger compression, which is beneficial in non-trainable similarity tasks. We hypothesize that training on the downstream task enables the model to identify the encoded information that is useful for that specific task, whereas non-trainable benchmarks can be confused by other types of information also encoded in the sentence representation.
2019
The 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
machine translation; sentence representation; inner attention
04 Conference proceedings publication::04b Conference paper in a volume
An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation / Raganato, Alessandro; Vázquez, Raúl; Creutz, Mathias; Tiedemann, Jörg. - (2019), pp. 27-32. (Paper presented at The 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Florence, Italy) [10.18653/v1/W19-4304].
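As a rough illustration of the kind of shared inner-attention layer described in the abstract, the sketch below pools a variable-length sequence of encoder states into a fixed-size, attention-weighted matrix, in the spirit of self-attentive sentence embeddings. This is a minimal NumPy sketch, not the authors' implementation: the weight names W1 and W2, the helper inner_attention_pool, and the toy dimensions are illustrative assumptions.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inner_attention_pool(H, W1, W2):
    # H:  (n, d)  encoder states for an n-token sentence
    # W1: (da, d) first projection of the attention MLP
    # W2: (k, da) second projection; k attention heads
    # A:  (k, n)  attention weights over tokens, one row per head
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=-1)
    # M: (k, d) fixed-size sentence representation, independent of n
    return A @ H

# Toy usage with assumed sizes: hidden d=8, attention da=16, k=4 heads.
rng = np.random.default_rng(0)
d, da, k = 8, 16, 4
W1 = rng.normal(scale=0.1, size=(da, d))
W2 = rng.normal(scale=0.1, size=(k, da))
H = rng.normal(size=(11, d))        # an 11-token sentence
M = inner_attention_pool(H, W1, W2)
print(M.shape)                      # (4, 8), the same for any sentence length

In this reading, the "size of the shared layer" studied in the paper roughly corresponds to the number of attention heads k: a larger k yields a larger fixed-size representation.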
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1553737
Warning: the displayed data have not been validated by the university.

Citations
  • PubMed Central: not available
  • Scopus: 7
  • Web of Science (ISI): 4