
Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People’s Gender and Origin / Stranisci, Marco; Huguet Cabot, Pere Lluis; Bassignana, Elisa; Navigli, Roberto. - (2024), pp. 190-202. (Paper presented at the 5th Workshop on Gender Bias in Natural Language Processing, held in Bangkok, Thailand) [10.18653/v1/2024.gebnlp-1.12].

Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People’s Gender and Origin

Huguet Cabot, Pere Lluis; Navigli, Roberto
2024

Abstract

Relation Extraction (RE) is at the core of many Natural Language Understanding tasks, including knowledge-base population and Question Answering. However, any Natural Language Processing system is exposed to biases, and their analysis has not received much attention in RE. We propose a new method for inspecting bias in the RE pipeline, which is completely transparent in terms of interpretability. Specifically, in this work we analyze biases related to gender and place of birth. Our methodology includes (i) obtaining semantic triplets (subject, object, semantic relation) involving ‘person’ entities from RE resources, (ii) collecting meta-information (‘gender’ and ‘place of birth’) using Entity Linking technologies, and then (iii) analyzing the distribution of triplets across different groups (e.g., men versus women). We investigate bias at two levels: in the training data of three commonly used RE datasets (SREDFM, CrossRE, NYT), and in the predictions of a state-of-the-art RE approach (ReLiK). To enable cross-dataset analysis, we introduce a taxonomy of relation types mapping the label sets of different RE datasets to a unified label space. Our findings reveal that bias is a compounded issue affecting underrepresented groups within data and predictions for RE.
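As a rough illustration of step (iii) of the methodology described above, the sketch below groups (subject, relation, object) triplets by the subject's gender and compares relation-type frequencies across groups. The schema, field names, and toy data are hypothetical and are not taken from the paper's code or datasets.

```python
from collections import Counter, defaultdict

# Toy triplets with meta-information attached via Entity Linking.
# Each record: (subject, relation, object, subject_gender) -- hypothetical schema.
triplets = [
    ("Ada Lovelace", "occupation", "mathematician", "female"),
    ("Alan Turing", "occupation", "mathematician", "male"),
    ("Alan Turing", "educated_at", "King's College", "male"),
    ("Ada Lovelace", "spouse", "William King", "female"),
]

# Count relation types per demographic group.
counts = defaultdict(Counter)
for subj, rel, obj, gender in triplets:
    counts[gender][rel] += 1

# Compare the relative frequency of each relation across groups,
# e.g. to spot relation types over-represented for one gender.
for gender, rel_counts in counts.items():
    total = sum(rel_counts.values())
    dist = {rel: round(c / total, 2) for rel, c in rel_counts.items()}
    print(gender, dist)
```

The same grouping logic would apply to any other attribute collected in step (ii), such as place of birth, by swapping the grouping key.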
Year: 2024
Conference: 5th Workshop on Gender Bias in Natural Language Processing
Keywords: bias; relation extraction; gender
Type: 04 Publication in conference proceedings::04b Conference paper in a volume
Files attached to this product

File: Stranisci_Dissecting-Biases_2024.pdf
Access: open access
Note: https://aclanthology.org/2024.gebnlp-1.12.pdf
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 346.16 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726490