Graphical Identification of Gender Bias in BERT with a Weakly Supervised Approach

Dusi M.; Arici N.; Gerevini A. E.; Putelli L.; Serina I.

2022

Abstract

Transformer-based algorithms such as BERT are typically trained on large corpora of documents extracted directly from the Internet. As several studies have reported, these data can contain biases, stereotypes and other properties that are also transferred to the machine learning models, potentially leading them to discriminatory behaviour which should be identified and corrected. A very intuitive technique for bias identification in NLP models is the visualization of word embeddings, which exploits the concept that a short distance between two word vectors indicates a semantic similarity between the two words; for instance, a closeness between the terms nurse and woman could be an indicator of gender bias in the model. These techniques, however, were designed for static word embedding algorithms such as Word2Vec. BERT, instead, does not guarantee the same relation between semantic similarity and short distance, making the visualization techniques more difficult to apply. In this work, we propose a weakly supervised approach for visualizing the gender bias present in the English base model of BERT, which only requires a list of gendered words that can be easily found in online lexical resources. Our approach is based on a Linear Support Vector Classifier and Principal Component Analysis (PCA) and obtains better results than standard PCA.
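The abstract gives no implementation details, but a minimal sketch of the general idea is possible under explicit assumptions: embeddings come from the last hidden layer of bert-base-uncased, the gendered word lists below are short illustrative examples standing in for the online lexical resources, the Linear SVC hyperplane normal is used as a "gender axis", and PCA on the residual provides a second plotting axis. None of this is the authors' exact setup; it is only an illustration of how an SVC-plus-PCA visualization could be wired together.

```python
# Hedged sketch (NOT the authors' pipeline): train a Linear SVC on BERT
# embeddings of known gendered words, take its hyperplane normal as a
# "gender axis", and use PCA on the residual for a second 2-D plot axis.
import numpy as np
import torch
from transformers import BertTokenizer, BertModel
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(word: str) -> np.ndarray:
    """Average the last-layer sub-word embeddings of a single word."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    # Drop [CLS] and [SEP], average the remaining word-piece vectors.
    return hidden[1:-1].mean(dim=0).numpy()

# Illustrative gendered word lists (assumptions; the paper draws its
# lists from online lexical resources).
male = ["he", "man", "father", "king", "brother", "son"]
female = ["she", "woman", "mother", "queen", "sister", "daughter"]
X = np.stack([embed(w) for w in male + female])
y = np.array([0] * len(male) + [1] * len(female))

# The SVC hyperplane normal serves as the gender direction.
svc = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
gender_axis = svc.coef_[0] / np.linalg.norm(svc.coef_[0])

# PCA on the embeddings with the gender component projected out gives
# an orthogonal second axis for the visualization.
residual = X - np.outer(X @ gender_axis, gender_axis)
pca = PCA(n_components=1).fit(residual)

# Probe words (illustrative): a large gender coordinate for a neutral
# profession would suggest bias in the model.
for w in ["nurse", "engineer", "doctor", "teacher"]:
    v = embed(w)
    print(w,
          "gender coord:", float(v @ gender_axis),
          "PCA coord:", float(pca.transform(v[None, :])[0, 0]))
```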
2022
6th Workshop on Natural Language for Artificial Intelligence, NL4AI 2022
BERT; Ethics; Fairness; Gender Bias; Model Interpretability
04 Publication in conference proceedings::04b Conference paper in volume
Graphical Identification of Gender Bias in BERT with a Weakly Supervised Approach / Dusi, M.; Arici, N.; Gerevini, A. E.; Putelli, L.; Serina, I. - 3287:(2022), pp. 164-176. (Paper presented at the 6th Workshop on Natural Language for Artificial Intelligence, NL4AI 2022, held in Udine, Italy).
Files attached to this item

File: Dusi_Graphical-Identification-Gender_2022.pdf
Access: open access
Note: https://ceur-ws.org/Vol-3287/paper16.pdf
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 1.61 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1725875
Citations
  • PubMed Central: not available
  • Scopus: 4
  • Web of Science: not available