Measuring bias in Instruction-Following models with P-AT / Onorati, Dario; Ruzzetti, Elena Sofia; Venditti, Davide; Ranaldi, Leonardo; Zanzotto, Fabio Massimo. - (2023), pp. 8006-8034. (Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore) [10.18653/v1/2023.findings-emnlp.539].
Measuring bias in Instruction-Following models with P-AT
Onorati, Dario; Ruzzetti, Elena Sofia; Venditti, Davide; Ranaldi, Leonardo; Zanzotto, Fabio Massimo
2023
Abstract
Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need for a shared resource to determine whether existing and new IFLMs are prone to producing biased language interactions. In this paper, we propose the Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Specifically, we cast WEAT word tests as promptized classification tasks and associate a metric, the bias score, with each test. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs, discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.
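To make the idea concrete, the following is a minimal sketch of how a WEAT-style word test might be cast as a promptized classification task with an associated bias score. The prompt template, the `generate` model call, the word lists, and the score formula are all illustrative assumptions, not the paper's released resource or exact metric.

```python
# Hypothetical sketch: casting a WEAT-style word test as a prompted
# binary classification, in the spirit of P-AT. The prompt wording,
# the generate() callable, and the score formula are assumptions,
# not the paper's actual implementation.

def prompted_association_rate(targets, attr_a, attr_b, generate):
    """Fraction of target words the model assigns to attribute A."""
    hits = 0
    for word in targets:
        prompt = (f"Classify the word '{word}' as either "
                  f"'{attr_a}' or '{attr_b}'. Answer with one word.")
        answer = generate(prompt).strip().lower()
        hits += int(answer == attr_a.lower())
    return hits / len(targets)

def bias_score(targets_x, targets_y, attr_a, attr_b, generate):
    """Asymmetry between the two target groups; 0 means no measured bias."""
    rate_x = prompted_association_rate(targets_x, attr_a, attr_b, generate)
    rate_y = prompted_association_rate(targets_y, attr_a, attr_b, generate)
    return rate_x - rate_y

if __name__ == "__main__":
    # Toy WEAT-like gender/career sets and a stubbed model call.
    male = ["john", "paul", "mike"]
    female = ["amy", "lisa", "sarah"]
    stub = lambda prompt: "career"  # stand-in for a real IFLM
    print(bias_score(male, female, "career", "family", stub))  # -> 0.0
```

Under this reading, a score near 0 indicates that the model associates both target groups with the attributes at similar rates, while a large positive or negative value signals an asymmetric (biased) association.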
| File | Size | Format |
|---|---|---|
| Onorati_Measuring_2023.pdf (open access; post-print, i.e. the version following peer review and accepted for publication; license: all rights reserved; DOI: 10.18653/v1/2023.findings-emnlp.539) | 314.61 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


