Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. Our experiments and analyses show that current machine translation systems and large language models still struggle to translate texts containing entity names, whereas KG-MT outperforms state-of-the-art approaches by a large margin, obtaining a 129% and 62% relative improvement compared to NLLB-200 and GPT-4, respectively.

Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs / Conia, Simone; Lee, Daniel; Li, Min; Minhas, Umar Farooq; Potdar, Saloni; Li, Yunyao. - (2024), pp. 16343-16360. ( Empirical Methods in Natural Language Processing Miami; United States of America ) [10.18653/v1/2024.emnlp-main.914].

Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs

Conia, Simone
Primo
;
2024

Abstract

Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. Our experiments and analyses show that current machine translation systems and large language models still struggle to translate texts containing entity names, whereas KG-MT outperforms state-of-the-art approaches by a large margin, obtaining a 129% and 62% relative improvement compared to NLLB-200 and GPT-4, respectively.
2024
Empirical Methods in Natural Language Processing
machine translation; natural language processing; knowledge graphs; large language models; multilingual NLP
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs / Conia, Simone; Lee, Daniel; Li, Min; Minhas, Umar Farooq; Potdar, Saloni; Li, Yunyao. - (2024), pp. 16343-16360. ( Empirical Methods in Natural Language Processing Miami; United States of America ) [10.18653/v1/2024.emnlp-main.914].
File allegati a questo prodotto
File Dimensione Formato  
Conia_Toward-cross-cultural_2024.pdf

accesso aperto

Note: https://aclanthology.org/2024.emnlp-main.914.pdf
Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.87 MB
Formato Adobe PDF
1.87 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1727952
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact