AAA: Fair evaluation for abuse detection systems wanted

Calabrese, A.; Bevilacqua, M.; Ross, B.; Tripodi, R.; Navigli, R.

doi:10.1145/3447535.3462484

User-generated web content is rife with abusive language that can harm others and discourage participation. Thus, a primary research aim is to develop abuse detection systems that can be used to alert and support human moderators of online communities. Such systems are notoriously hard to develop and evaluate. Even when they appear to achieve satisfactory performance on current evaluation metrics, they may fail in practice on new data. This is partly because datasets commonly used in this field suffer from selection bias, and consequently, existing supervised models over-rely on cue words such as group identifiers (e.g., "gay" and "black"), which are not inherently abusive. Although there are attempts to mitigate this bias, current evaluation metrics do not adequately quantify their progress. In this work, we introduce Adversarial Attacks against Abuse (AAA), a new evaluation strategy and associated metric that better captures a model's performance on certain classes of hard-to-classify microposts and, for example, penalises systems that are biased on low-level lexical features. It does so by adversarially modifying the model developer's training and test data to generate plausible test samples dynamically. We make AAA available as an easy-to-use tool, and show its effectiveness in error analysis by comparing the AAA performance of several state-of-the-art models on multiple datasets. This work will inform the development of detection systems and contribute to the fight against abusive language online.

AAA: Fair evaluation for abuse detection systems wanted / Calabrese, A.; Bevilacqua, M.; Ross, B.; Tripodi, R.; Navigli, R. - (2021), pp. 243-252. (Paper presented at the 13th ACM Web Science Conference, WebSci 2021, held in gbr) [10.1145/3447535.3462484].


Year of publication: 2021
Conference name: 13th ACM Web Science Conference, WebSci 2021
Keywords: Abuse detection; evaluation; hate speech
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Belongs to type: 04b Conference paper in volume

Files attached to this record

File	Size	Format
Bevilacqua_Abuse-detection-systems_2021.pdf (open access; Type: Publisher's version, published with the publisher's layout; License: All rights reserved)	1.83 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1642751
