Improving fair name-based prediction of gender in scientific communities / Guariglia Migliore, Maria; D'Agostino, Gregorio; Patriarca, Tatiana; De Nicola, Antonio. - In: SCIENTOMETRICS. - ISSN 0138-9130. - 130:9(2025), pp. 4849-4877. [10.1007/s11192-025-05384-1]
Improving fair name-based prediction of gender in scientific communities
Guariglia Migliore, Maria;
2025
Abstract
The role of women in modern society is a central issue in several developed countries. Despite encouraging policies, women’s participation in STEM fields is significantly lower than men’s. To develop solutions that mitigate this disparity, a deeper understanding of the underlying causes is crucial, and a proper quantification of the phenomenon is the first step of any analysis. Although the gender gap in scientific communities has long been debated, information on authors’ genders is often unavailable (see, for instance, ResearchGate and Scopus). Additionally, the lack of open-source software for automated name-based gender prediction calls for time-consuming manual effort. Hence the need for novel, effective algorithms. As a further challenge, such software should guarantee gender fairness by providing the same performance when recognising male and female names. In this paper, we propose gender-fair software that automatically predicts authors’ gender from their given names. The code leverages most of the existing information sources, i.e., Scopus, Semantic Scholar, and the Harvard dataset. We applied the software experimentally by analysing two datasets of publications and report the resulting insights. Finally, we evaluated the software’s performance in terms of accuracy, precision, recall, F1-score, and gender fairness by means of two distinct case studies. The proposed solution can enable fairer gender prediction by combining open data with carefully calibrated criteria, matching the performance of commercial tools while remaining transparent and accessible.
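As an illustration of the evaluation criteria named in the abstract, the following minimal Python sketch computes accuracy and per-gender precision, recall, and F1-score, and then reports the absolute gap between the female and male scores as a simple fairness indicator. It is not the authors' code: the function names, the "F"/"M" labels, the toy data, and the use of an absolute gap as the fairness measure are all assumptions made for this example, and the paper may define gender fairness differently.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one label, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(y_true, y_pred):
    """Overall accuracy, per-gender metrics, and the female/male gap (assumed fairness proxy)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    female = per_class_metrics(y_true, y_pred, "F")
    male = per_class_metrics(y_true, y_pred, "M")
    # A gap of 0 means identical performance on female and male names.
    gap = {name: abs(female[name] - male[name]) for name in female}
    return {"accuracy": accuracy, "F": female, "M": male, "gap": gap}


# Hypothetical ground truth and predictions, for illustration only.
y_true = ["F", "F", "F", "M", "M", "M", "M", "F"]
y_pred = ["F", "M", "M", "M", "M", "M", "M", "F"]

report = evaluate(y_true, y_pred)
for key, value in report.items():
    print(key, value)
```

In this toy case the predictor recovers every male name but only half of the female ones, so the recall gap is large even though overall accuracy looks acceptable. This is the kind of imbalance that a gender-fair predictor is meant to avoid.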


