Training Large Language Models on biased datasets tends to teach discriminatory behavior to the systems themselves, as shown by recent literature on fairness in AI and Machine Learning algorithms. Existing bias-detection strategies often ignore the inner workings of the model, which makes the methodology easy to generalize but harder to use for understanding the underlying causes. In this paper, we present a general approach for detecting unwanted prejudices in Language Models that requires only a small set of input data. Our strategy operates on the embedding representation of language, without any constraint on the model architecture, and it is able to detect which parts of the representation are most affected by prejudice. © 2024 Copyright for this paper by its authors.
Supervised Bias Detection in Transformers-based Language Models / Dusi, M.; Gerevini, A. E.; Putelli, L.; Serina, I. - 3670 (2024). (Paper presented at the 2023 International Conference of the Italian Association for Artificial Intelligence Doctoral Consortium, AIxIA-DC 2023, held in Rome, Italy.)
| File | Type | License | Size | Format |
|---|---|---|---|---|
| Dusi_Supervised-Bias_2024.pdf (open access; https://ceur-ws.org/Vol-3670/paper97.pdf) | Publisher's version (published with the publisher's layout) | Creative Commons | 1.16 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.