Chen, Xi; Zeynali, Ali; Camargo, Chico; Flöck, Fabian; Gaffney, Devin; Grabowicz, Przemyslaw; Hale, Scott; Jurgens, David; Samory, Mattia (2022). SemEval-2022 Task 8: Multilingual news article similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), Seattle, United States, pp. 1094-1106. [10.18653/v1/2022.semeval-1.155]
SemEval-2022 Task 8: Multilingual news article similarity
Chen, Xi; Jurgens, David; Samory, Mattia
2022
Abstract
Thousands of new news articles appear daily in outlets in different languages. Understanding which articles refer to the same story can not only improve applications like news aggregation but also enable cross-linguistic analysis of media consumption and attention. However, assessing the similarity of stories in news articles is challenging because of the different dimensions in which a story might vary; e.g., two articles may have substantial textual overlap but describe similar events that happened years apart. To address this challenge, we introduce a new dataset of nearly 10,000 news article pairs spanning 18 language combinations, annotated for seven dimensions of similarity, as SemEval 2022 Task 8. Here, we present an overview of the task, the best-performing submissions, and the frontiers and challenges for measuring multilingual news article similarity. While the participants of this SemEval task contributed very strong models, achieving up to 0.818 correlation with gold-standard labels across languages, human annotators are capable of reaching higher correlations, suggesting space for further progress.