Measuring bias in Instruction-Following models with P-AT / Onorati, Dario; Ruzzetti, Elena Sofia; Venditti, Davide; Ranaldi, Leonardo; Zanzotto, Fabio Massimo. - (2023), pp. 8006-8034. (Empirical Methods in Natural Language Processing, Sentosa Gateway, Singapore) [10.18653/v1/2023.findings-emnlp.539].

Measuring bias in Instruction-Following models with P-AT

Onorati, Dario; et al.
2023

Abstract

Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need for a shared resource to determine whether existing and new IFLMs are prone to produce biased language interactions. In this paper, we propose the Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. In essence, we cast WEAT word tests as promptized classification tasks and associate a metric, the bias score, with them. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs, discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.
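To illustrate the general idea, the sketch below shows one way a WEAT-style word test could be promptized and scored against an instruction-following model. The prompt template, the `query_model` interface, and the simple deviation-from-parity score are illustrative assumptions introduced here, not the paper's actual prompt format or bias-score definition, which are specified in the P-AT resource itself.

```python
# Illustrative sketch of a promptized WEAT-style bias probe.
# The template, word lists, and scoring rule are simplified assumptions;
# P-AT's actual prompts and bias score are defined in Onorati et al. (2023).
from typing import Callable, List


def build_prompt(target: str, attr_a: str, attr_b: str) -> str:
    # Hypothetical promptized classification task: the model must associate
    # a target word with one of two attribute categories.
    return (
        f"Classify the word '{target}' as more associated with "
        f"'{attr_a}' or '{attr_b}'. Answer with exactly one of the two words."
    )


def bias_score(
    targets: List[str],
    attr_a: str,
    attr_b: str,
    query_model: Callable[[str], str],  # hypothetical IFLM query interface
) -> float:
    # Fraction of targets assigned to attr_a, minus the unbiased 0.5 baseline:
    # 0.0 means no preference, +/-0.5 means the model always picks one side.
    picks_a = 0
    for target in targets:
        answer = query_model(build_prompt(target, attr_a, attr_b)).lower()
        if attr_a.lower() in answer:
            picks_a += 1
    return picks_a / len(targets) - 0.5


if __name__ == "__main__":
    # Example usage with WEAT-style career terms and gendered attributes.
    career_terms = ["executive", "management", "salary", "office"]
    fake_model = lambda prompt: "male"  # stand-in for a real IFLM call
    print(bias_score(career_terms, "male", "female", fake_model))  # 0.5
```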
2023
Empirical Methods in Natural Language Processing
Natural Language Processing; LLMs; Instruction-Following Language Models; social bias
04 Publication in conference proceedings::04b Conference paper in a volume
Files attached to this product

File: Onorati_Measuring_2023.pdf
Access: open access
Note: DOI: 10.18653/v1/2023.findings-emnlp.539
Type: Post-print document (version after peer review, accepted for publication)
License: All rights reserved
Size: 314.61 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1696812
Citations
  • PMC: ND
  • Scopus: 5
  • Web of Science: 0