Bias Detection in Cultural Heritage Metadata: Preliminary Results from the IMAGES Project / Oddi, Angelo; Romagna, Gianmauro; Rasconi, Riccardo; Panarese, Paola; De Gasperis, Paolo. - (2026). - In: AEQUITAS 2025: Proceedings of the 3rd Workshop on Fairness and Bias in AI, Bologna, Italy.
Bias Detection in Cultural Heritage Metadata: Preliminary Results from the IMAGES Project
Angelo Oddi; Gianmauro Romagna; Riccardo Rasconi; Paola Panarese; Paolo De Gasperis
2026
Abstract
This paper presents early findings from a pilot study within IMAGES (Inclusive Machine Learning Using Art and Culture for Tackling Gender and Ethnicity Stereotypes), a PRIN PNRR interdisciplinary project investigating the role of artificial intelligence in supporting inclusive cultural representations. Focusing on a sample of 50 image-text pairs drawn from the Central Catalog of the Italian Ministry of Culture (MiC), we test the capacity of GPT-4o to detect gender and ethnic bias in visual and textual cultural heritage metadata. We evaluate the model’s autonomous and guided performance in identifying stereotypical representations and in generating bias-aware, machine-readable metadata. Preliminary results suggest that while GPT-4o is proficient in identifying overt gender stereotypes, it tends to over-interpret ambiguous content and under-detect subtle or culturally embedded bias, especially in ethnic representations. These results underscore the need for hybrid validation frameworks that integrate human oversight, culturally situated taxonomies, and transparent prompt engineering strategies. The study contributes to the broader aims of the IMAGES project by offering operational and epistemological insights into the promises and pitfalls of using large language models in bias-aware cultural metadata generation.
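The record does not include the study's prompts or code. As a minimal illustrative sketch only, not the IMAGES project's actual pipeline, the guided setting described in the abstract could be approximated with the OpenAI Python SDK: GPT-4o is shown one image-text pair and asked to return a machine-readable bias annotation. The prompt wording, the JSON schema (bias_detected, bias_type, severity, rationale), the sample caption, and the image URL are all assumptions made for illustration.

```python
# Minimal sketch of a guided bias-detection query for one image-text pair.
# NOT the IMAGES project's pipeline: prompt wording, JSON field names
# (bias_detected, bias_type, severity, rationale), the caption, and the
# image URL are illustrative assumptions only.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical catalog caption and image location standing in for a MiC record.
catalog_caption = (
    "Portrait of a woman at a spinning wheel, domestic interior, 19th century."
)
image_url = "https://example.org/mic-catalog/sample-item.jpg"  # placeholder

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # force machine-readable output
    messages=[
        {
            "role": "system",
            "content": (
                "You audit cultural heritage metadata for gender and ethnic "
                "stereotypes. Return JSON with keys: bias_detected (bool), "
                "bias_type ('gender' | 'ethnicity' | 'none'), severity "
                "('overt' | 'subtle' | 'none'), and rationale (string)."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Catalog caption: {catalog_caption}"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ],
)

# Parse the model's JSON annotation for downstream review.
annotation = json.loads(response.choices[0].message.content)
print(json.dumps(annotation, indent=2))
```

Given the abstract's findings that GPT-4o over-interprets ambiguous content and under-detects subtle or culturally embedded bias, any output of this kind would need human review within a hybrid validation framework before being written back to catalog records.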


