When Annotators Disagree: A Principled Approach to Learning with Noisy Labels / Bucarelli, Maria Sofia; Purificato, Antonio; Bacciu, Andrea; Cassano, Lucas; Siciliano, Federico; Nelakanti, Anil; Mantrach, Amin; Silvestri, Fabrizio. - In: IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE. - ISSN 2691-4581. - (2026), pp. 1-16. [10.1109/tai.2026.3666527]
When Annotators Disagree: A Principled Approach to Learning with Noisy Labels
Bucarelli, Maria Sofia (first author): Methodology
Purificato, Antonio (second author): Methodology
Bacciu, Andrea: Software
Siciliano, Federico: Resources
Mantrach, Amin (penultimate author): Writing – Review & Editing
Silvestri, Fabrizio (last author): Project Administration
2026
Abstract
In practical settings, classification datasets are often labeled by humans, leading to potential noise due to varying annotations from different individuals. The exact noise distribution impacting these labels is typically unknown; however, one quantity we can measure and attempt to exploit is inter-rater agreement. Building on this, our work makes key contributions: we (i) demonstrate how inter-annotator statistics can be used to estimate the label noise distribution; (ii) propose methods that leverage these estimates to train models on noisy data; and (iii) derive generalization bounds within the empirical risk minimization framework that depend on the estimated noise characteristics. Finally, we present experiments that support our findings.
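To make contribution (i) concrete, here is a toy sketch (not the paper's actual estimator) of how pairwise inter-annotator agreement can pin down a noise rate. It assumes binary labels and independent, symmetric noise in which each annotator flips the true label with probability e, so two annotators agree with probability a = e^2 + (1 - e)^2; the function name and model are illustrative assumptions, not from the paper.

```python
import math

def flip_rate_from_agreement(agreement: float) -> float:
    """Estimate the per-annotator flip rate e from the observed pairwise
    agreement a, under binary labels and independent symmetric noise:
        a = e^2 + (1 - e)^2  =>  e = (1 - sqrt(2a - 1)) / 2.
    Agreement below 0.5 is impossible under this model."""
    if agreement < 0.5:
        raise ValueError("agreement below 0.5 is inconsistent with this model")
    return (1.0 - math.sqrt(2.0 * agreement - 1.0)) / 2.0
```

For example, an observed agreement of 0.82 yields an estimated flip rate of 0.1, and perfect agreement (a = 1) yields a flip rate of 0.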


