
Adversarial Attacks Against Uncertainty Quantification / Ledda, Emanuele; Angioni, Daniele; Piras, Giorgio; Fumera, Giorgio; Biggio, Battista; Roli, Fabio. - (2023), pp. 4599-4608. (Paper presented at the International Conference on Computer Vision (ICCV) Workshops, 2023, held in Paris.)

Adversarial Attacks Against Uncertainty Quantification

Emanuele Ledda; Daniele Angioni; Giorgio Piras; Giorgio Fumera; Battista Biggio; Fabio Roli
2023

Abstract

Machine-learning models can be fooled by adversarial examples, i.e., carefully crafted input perturbations that force models to output wrong predictions. While uncertainty quantification (UQ) has recently been proposed to detect adversarial inputs, under the assumption that such attacks exhibit higher prediction uncertainty than pristine data, it has been shown that adaptive attacks specifically aimed at also reducing the uncertainty estimate can easily bypass this defense mechanism. In this work, we focus on a different adversarial scenario in which the attacker is still interested in manipulating the uncertainty estimate, but regardless of the correctness of the prediction; in particular, the goal is to undermine the use of machine-learning models when their outputs are consumed by a downstream module or by a human operator. Following this direction, we: (i) design a threat model for attacks targeting uncertainty quantification; (ii) devise different attack strategies for conceptually different UQ techniques, spanning both classification and semantic segmentation problems; (iii) conduct the first complete and extensive analysis comparing some of the most widely employed UQ approaches under attack. Our extensive experimental analysis shows that our attacks are more effective in manipulating uncertainty quantification measures than attacks that also aim to induce misclassifications.
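To make the attacked quantity concrete: a common UQ measure is the entropy of the predicted class distribution, and an attacker who wants the model to appear confident (regardless of prediction correctness) can perturb the input to reduce that entropy. The sketch below is purely illustrative and is not the paper's actual method: it uses a hypothetical random linear model and a simple iterative sign-gradient step (estimated by finite differences) to lower predictive entropy.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a probability vector; a standard UQ measure.
    return -np.sum(p * np.log(p + 1e-12))

# Hypothetical linear "model": logits = W @ x (stand-in for a trained network).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)

def pred_entropy(x):
    return entropy(softmax(W @ x))

def underconfidence_attack(x, eps=0.05, steps=10, h=1e-5):
    """Iteratively perturb x to *reduce* predictive entropy, making the
    model look confident whatever it predicts. Gradient is estimated by
    central finite differences, so no autodiff framework is needed."""
    x = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = h
            grad[i] = (pred_entropy(x + d) - pred_entropy(x - d)) / (2 * h)
        x -= eps * np.sign(grad)  # signed descent step on the entropy
    return x

x_adv = underconfidence_attack(x)
print(pred_entropy(x_adv) < pred_entropy(x))
```

Flipping the step direction (ascending the entropy gradient) gives the opposite manipulation, inflating the uncertainty estimate so that a downstream module or human operator discards otherwise correct predictions.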
International Conference on Computer Vision (ICCV) Workshops, 2023
uncertainty quantification; deep learning; adversarial machine learning
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1690355
Warning! The data shown have not been validated by the university.
