Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In! / Perrella, Stefano; Proietti, Lorenzo; Scirè, Alessandro; Barba, Edoardo; Navigli, Roberto. - 1:(2024), pp. 16216-16244. (Paper presented at the Association for Computational Linguistics conference held in Bangkok, Thailand) [10.18653/v1/2024.acl-long.856].

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!

Stefano Perrella (co-first author); Lorenzo Proietti (co-first author); Alessandro Scirè (second author); Edoardo Barba (penultimate author); Roberto Navigli (last author)
2024

Abstract

Annually, at the Conference on Machine Translation (WMT), the Metrics Shared Task organizers conduct the Machine Translation (MT) meta-evaluation, assessing MT metrics' capabilities according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our findings, and shed light on and monitor the potential biases or inconsistencies in the rankings. We discover that the present meta-evaluation framework favors two categories of metrics: i) those explicitly trained to mimic human quality assessments, and ii) continuous metrics. Finally, we raise concerns regarding the evaluation capabilities of state-of-the-art metrics, emphasizing that they might be basing their assessments on spurious correlations found in their training data.
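
The meta-evaluation described in the abstract ranks metrics by how well their scores correlate with human quality judgments. The sketch below is a minimal illustration of that idea only, not code from the paper: it uses hypothetical toy scores and SciPy's Kendall tau (a correlation commonly reported in WMT metrics tasks) to score a hypothetical metric and a trivial random-scoring "sentinel" side by side. The actual WMT protocol and the paper's sentinel metrics are considerably more involved.

# Illustrative sketch only (not from the paper): segment-level correlation of
# metric scores with human judgments, plus a trivial random-scoring sentinel.
# All score values below are hypothetical.
import random
from scipy.stats import kendalltau

# Hypothetical human quality judgments and metric scores for six segments.
human_scores = [0.9, 0.4, 0.7, 0.2, 0.8, 0.5]
metric_scores = [0.85, 0.5, 0.65, 0.3, 0.9, 0.45]

# A sentinel baseline that ignores the translations entirely: random scores.
random.seed(0)
sentinel_scores = [random.random() for _ in human_scores]

for name, scores in [("metric", metric_scores), ("random sentinel", sentinel_scores)]:
    tau, _ = kendalltau(human_scores, scores)
    print(f"{name}: Kendall tau vs. human judgments = {tau:.3f}")

In this framing, a sentinel that cannot possibly assess translation quality should rank near the bottom; if it does not, the meta-evaluation procedure itself is suspect.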
2024
Association for Computational Linguistics
machine translation; machine translation evaluation; mt metrics; meta-evaluation
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
File: Perrella_Guardians_2024.pdf
Access: open access
Note: PDF: https://aclanthology.org/2024.acl-long.856.pdf
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 2.58 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1720213
Citations
  • Scopus: 0