Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels / Bucarelli, Maria Sofia; Cassano, Lucas; Siciliano, Federico; Mantrach, Amin; Silvestri, Fabrizio. - (2023), pp. 3439-3448. (IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, Canada) [10.1109/CVPR52729.2023.00335].

Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels

Maria Sofia Bucarelli (first author); Federico Siciliano; Fabrizio Silvestri (last author)
2023

Abstract

In practical settings, classification datasets are usually labelled by humans. Labels can be noisy because they are obtained by aggregating the individual labels that multiple, possibly disagreeing, annotators assign to the same sample. The inter-rater agreement on such datasets can be measured, whereas the underlying noise distribution affecting the labels is assumed to be unknown. In this work, we: (i) show how to leverage inter-annotator statistics to estimate the noise distribution affecting the labels; (ii) introduce methods that use this estimate to learn from the noisy dataset; and (iii) establish generalization bounds, within the empirical risk minimization framework, that depend on the estimated quantities. We conclude with experiments that illustrate our findings.
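The abstract does not state the noise model or the estimator, so what follows is only a minimal sketch of the general idea under a deliberately simple assumption that is ours, not the paper's: binary labels, annotators who each flip the true label independently with a common probability p < 1/2, and majority-vote aggregation over an odd number m of annotators. Under that assumption two annotators agree with probability a = (1 - p)^2 + p^2, which can be inverted for p; the noise rate q of the aggregated label is then a binomial tail; and the resulting transition matrix can be plugged into forward loss correction, a standard technique for training with a known noise transition matrix. All function names are hypothetical, and this is not the authors' estimator or training method.

```python
import itertools
import math


def pairwise_agreement(annotations):
    """Fraction of agreeing annotator pairs, pooled over all samples.

    `annotations` is a list with one entry per sample; each entry lists the
    labels that the annotators of that sample assigned.
    """
    agree = total = 0
    for labels in annotations:
        for a, b in itertools.combinations(labels, 2):
            agree += a == b  # bool counts as 0 or 1
            total += 1
    return agree / total


def annotator_flip_rate(agreement):
    """Invert the pairwise agreement under the ASSUMED symmetric binary model.

    ASSUMPTION: each annotator flips the true label independently with the same
    probability p < 1/2, so two annotators agree with probability
    a = (1 - p)**2 + p**2 = 1 - 2p(1 - p), whose root below 1/2 is
    p = (1 - sqrt(2a - 1)) / 2.
    """
    if not 0.5 <= agreement <= 1.0:
        raise ValueError("agreement must lie in [0.5, 1] under this model")
    return (1.0 - math.sqrt(2.0 * agreement - 1.0)) / 2.0


def majority_vote_noise(p, m):
    """Probability that the majority of m (odd) independent annotators is wrong."""
    if m % 2 == 0:
        raise ValueError("use an odd number of annotators to avoid ties")
    return sum(
        math.comb(m, k) * p**k * (1.0 - p) ** (m - k)
        for k in range(m // 2 + 1, m + 1)
    )


def forward_corrected_nll(clean_probs, noisy_label, q):
    """Negative log-likelihood of the observed label after forward correction.

    The model's clean-class probabilities are mapped through the symmetric
    transition matrix T, where T[i][j] = P(observed j | true i).
    """
    T = [[1.0 - q, q], [q, 1.0 - q]]
    noisy_probs = [sum(clean_probs[i] * T[i][j] for i in range(2)) for j in range(2)]
    return -math.log(noisy_probs[noisy_label])


if __name__ == "__main__":
    # Toy data: three annotators per sample (made-up labels, for illustration only).
    annotations = [[0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
    a = pairwise_agreement(annotations)
    p = annotator_flip_rate(a)
    q = majority_vote_noise(p, m=3)
    print(f"agreement={a:.3f}  flip rate p={p:.3f}  majority-vote noise q={q:.3f}")
    # Loss for a model predicting [0.9, 0.1] when the aggregated label is 0.
    print(forward_corrected_nll([0.9, 0.1], noisy_label=0, q=q))
```

With m = 1 the aggregated noise rate reduces to p, and with q = 0 the corrected loss reduces to ordinary cross-entropy. For more than two classes, or for class-dependent annotator confusion, the closed-form inversion above no longer applies and a different estimator would be needed.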
Year: 2023
Conference: IEEE Conference on Computer Vision and Pattern Recognition
Keywords: Deep Learning; Noisy Labels; Supervised Learning
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
Bucarelli_postprint_Leveraging_2023.pdf

Open access

Note: DOI: 10.1109/CVPR52729.2023.00335
Type: Post-print (version after peer review, accepted for publication)
License: Creative Commons
Size: 585.99 kB
Format: Adobe PDF
Bucarelli_Leveraging_2023.pdf

Restricted to archive managers

Type: Publisher's version (version published with the publisher's layout)
License: All rights reserved
Size: 762.75 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1685080
Citations
  • PubMed Central: N/A
  • Scopus: 14
  • Web of Science (ISI): 7