
FbMultiLingMisinfo: Challenging Large-Scale Multilingual Benchmark for Misinformation Detection / Barnabo, G; Siciliano, F; Castillo, C; Leonardi, S; Nakov, P; Martino, Gd; Silvestri, F. - (2022), pp. 1-8. - PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS. [10.1109/IJCNN55064.2022.9892739].

FbMultiLingMisinfo: Challenging Large-Scale Multilingual Benchmark for Misinformation Detection

Siciliano, F (First; Member of the Collaboration Group);
Leonardi, S (Second; Member of the Collaboration Group);
Silvestri, F (Last; Member of the Collaboration Group)
2022

Abstract

According to recent research, geometric deep learning makes it possible to reach unprecedented accuracy in online misinformation detection. By fully leveraging the social context of news, these approaches first represent URL propagation paths in social networks as graphs and then classify them using Graph Neural Network (GNN) models. Despite these remarkable efforts, researchers are still hampered by the scarcity of high-quality benchmark datasets; as a result, the efficacy of state-of-the-art approaches may be overestimated. So far, to obtain a sufficient number of labeled URLs, researchers have either sampled news from notoriously reliable and unreliable sources using distant supervision, or gathered pre-labeled URLs from third-party fact-checking websites. In the former case, the resulting datasets can be quite large, but also noisy and biased, since each piece of news is labeled as true or false according to the label of its source rather than being individually fact-checked. In the latter case, the assigned labels are more reliable, but the included news articles are usually in a single language and may reflect unknown editorial decisions. As a result, datasets of the latter type are typically small, homogeneous, and thus unrealistically easy for automatic fake news detection models. In this work, we present FbMultiLingMisinfo, a new multilingual benchmark dataset aimed at a more realistic evaluation of state-of-the-art misinformation detection models. The URLs in our dataset come from the Facebook Privacy-Protected Full URLs Data Set, which we augmented with their propagation paths on Twitter. Our experimental results show that, when GNN-based models are tested on FbMultiLingMisinfo, recent misinformation detection results are only partially confirmed. We further show that sharply reducing the training set size significantly lowers model accuracy on FbMultiLingMisinfo, but not on two other widely used benchmark datasets for fake news detection.
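The pipeline summarized in the abstract (a URL's propagation path represented as a graph, then classified by a GNN) can be illustrated with a minimal, untrained sketch in plain Python. It is not one of the models evaluated in the paper: the node features (log follower count and a root flag), the retweet-to-source edge list, the single mean-aggregation layer, the mean-pooling readout, the weights, and all names (`Cascade`, `adjacency`, `message_pass`, `classify`) are hypothetical choices made only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Cascade:
    """Hypothetical propagation graph of one URL on Twitter.

    Nodes are tweets; each edge (child, parent) links a retweet to the tweet it retweets.
    """
    features: list[list[float]]  # one feature vector per node (illustrative features)
    edges: list[tuple[int, int]] = field(default_factory=list)


def adjacency(c: Cascade) -> list[list[int]]:
    """Undirected adjacency lists with self-loops, as in GCN-style aggregation."""
    adj = [[i] for i in range(len(c.features))]
    for child, parent in c.edges:
        adj[child].append(parent)
        adj[parent].append(child)
    return adj


def message_pass(c: Cascade, weight: list[list[float]]) -> list[list[float]]:
    """One layer: average each node's neighborhood features, apply a linear map, then ReLU."""
    dim = len(c.features[0])
    out = []
    for nbrs in adjacency(c):
        mean = [sum(c.features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        out.append([max(0.0, sum(w * x for w, x in zip(row, mean))) for row in weight])
    return out


def classify(c: Cascade, weight: list[list[float]], readout: list[float], bias: float = 0.0) -> float:
    """Mean-pool node embeddings into one graph vector and return a logistic score."""
    h = message_pass(c, weight)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    z = sum(w * x for w, x in zip(readout, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # read as P(misinformation) only once trained


# Toy cascade: tweet 0 shares the URL, tweets 1 and 2 retweet it, tweet 3 retweets tweet 1.
# Hypothetical node features: [log10(followers + 1), is_root].
cascade = Cascade(
    features=[[3.0, 1.0], [1.0, 0.0], [2.0, 0.0], [0.5, 0.0]],
    edges=[(1, 0), (2, 0), (3, 1)],
)
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights: untrained, for illustration only
p = classify(cascade, W, readout=[0.5, -1.0])
print(f"score = {p:.3f}")  # prints "score = 0.635"
```

Averaging over neighbors with self-loops is a simplified GCN-style layer; an actual detector would learn `weight` and `readout` from labeled cascades and would typically stack several layers and use richer user and content features.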
2022
International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022
ISBN: 978-1-7281-8671-9
Keywords: Misinformation; disinformation; fake news; fact-checking; factuality
Publication type: 02 Publication in a volume::02a Chapter or Article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1685502
Warning: the data displayed have not been validated by the university.

Citations
  • PMC: N/A
  • Scopus: 4
  • ISI (Web of Science): 0