Discrimination Bias Detection through Categorical Association in Pre-trained Language Models

Dusi, M.; Arici, N.; Gerevini, A. E.; Putelli, L.; Serina, I. (2024). Discrimination Bias Detection through Categorical Association in Pre-trained Language Models. IEEE Access, 12, pp. 162651-162667. ISSN 2169-3536. DOI: 10.1109/ACCESS.2024.3482010.
Abstract
The analysis of bias, prejudice, and unwanted discriminatory behavior in pre-trained neural language models (NLMs), given the sensitivity of the topic and its public interest, should satisfy two main criteria: intuitiveness and statistical rigor. In the current state of the art, there are two main categories of approaches for analyzing bias: those based on the models' textual output, and those based on the geometric space of the embedded representations computed by the NLMs. While the first is intuitive, this kind of analysis is often conducted on simple template sentences, which limits the validity of its conclusions in real-world contexts. In contrast, geometric methods are more rigorous but considerably more complex to implement and to understand for non-experts in Natural Language Processing (NLP). In this paper, we propose a single method for analyzing bias in pre-trained language models that combines these two aspects. Through a simple classification task, we verify whether the information contained in the embedded representation of words describing a protected property (such as religion) can be used to identify a stereotyped property (such as criminal behavior), requiring only a minimal supervised dataset. We experimentally validate our approach, finding that four widespread Transformer-based models are affected by prejudices related to gender, nationality, and religion.
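The abstract describes a classification-based probe over embedded word representations. Below is a minimal sketch of that general idea, not the authors' implementation: the bert-base-uncased checkpoint, the transformers and scikit-learn libraries, the word lists, the labels, and the logistic-regression probe are all illustrative assumptions, and the direction of the probe (which property supplies the words and which supplies the labels) is a simplification of the paper's actual setup.

```python
# Minimal, illustrative sketch of a classification probe over word embeddings.
# Assumptions NOT taken from the paper: model checkpoint, libraries, word
# lists, labels, and classifier; the authors' datasets and method may differ.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(word: str):
    """Mean contextual embedding of a single word (over its subword tokens)."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
    return hidden[0, 1:-1].mean(dim=0).numpy()       # drop [CLS] / [SEP]

# Hypothetical minimal supervised dataset: profession words annotated with the
# gender they are stereotypically associated with (1 = female, 0 = male).
# The labels encode the stereotype being tested, not a factual claim.
words = ["nurse", "engineer", "teacher", "mechanic", "secretary", "pilot"]
labels = [1, 0, 1, 0, 1, 0]

X = [embed(w) for w in words]

# If the probe predicts the stereotype-laden label from the embeddings well
# above chance, the representations encode a categorical association between
# the protected property (gender) and the stereotyped one.
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, labels, cv=3)
print(f"probe accuracy: {scores.mean():.2f} (chance = 0.50)")
```

In this sketch, a cross-validated accuracy clearly above chance on held-out words would indicate that the embeddings carry the categorical association, which is the kind of signal the abstract's classification task looks for; accuracy near chance would not.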
File: Dusi_Discrimination_2024.pdf
Access: open access
Note: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10719988
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 3.92 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.