Adversarial Training (AT) is a known, powerful, well-established technique for improving classifier robustness to input perturbations, yet its applicability beyond discriminative settings remains limited. Motivated by the widespread use of score-based generative models and their need to operate robustly under substantial noisy or corrupted input data, we propose an adaptation of AT for these models, providing a thorough empirical assessment. We introduce a principled formulation of AT for Diffusion Models (DMs) that replaces the conventional invariance objective with an equivariance constraint aligned to the denoising dynamics of score matching. Our method integrates seamlessly into diffusion training by adding either random perturbations--similar to randomized smoothing--or adversarial ones--akin to AT. Our approach offers several advantages: (a) tolerance to heavy noise and corruption, (b) reduced memorization, (c) robustness to outliers and extreme data variability and (d) resilience to iterative adversarial attacks. We validate these claims on proof-of-concept low- and high-dimensional datasets with known ground-truth distributions, enabling precise error analysis. We further evaluate on standard benchmarks (CIFAR-10, CelebA, and LSUN Bedroom), where our approach shows improved robustness and preserved sample fidelity under severe noise, data corruption, and adversarial evaluation. Code available at github.com/OmnAI-Lab/Adversarial-Training-DM

Why Adversarially Train Diffusion Models? / Briglia, Maria Rosaria; Mirza, Mujtaba Hussain; Lisanti, Giuseppe; Masi, Iacopo. - (2026). ( International Conference on Learning Representations (ICLR) Rio De Janeiro, Brazil ).

Why Adversarially Train Diffusion Models?

Maria Rosaria Briglia;Mujtaba Hussain Mirza;Iacopo Masi
2026

Abstract

Adversarial Training (AT) is a known, powerful, well-established technique for improving classifier robustness to input perturbations, yet its applicability beyond discriminative settings remains limited. Motivated by the widespread use of score-based generative models and their need to operate robustly under substantial noisy or corrupted input data, we propose an adaptation of AT for these models, providing a thorough empirical assessment. We introduce a principled formulation of AT for Diffusion Models (DMs) that replaces the conventional invariance objective with an equivariance constraint aligned to the denoising dynamics of score matching. Our method integrates seamlessly into diffusion training by adding either random perturbations--similar to randomized smoothing--or adversarial ones--akin to AT. Our approach offers several advantages: (a) tolerance to heavy noise and corruption, (b) reduced memorization, (c) robustness to outliers and extreme data variability and (d) resilience to iterative adversarial attacks. We validate these claims on proof-of-concept low- and high-dimensional datasets with known ground-truth distributions, enabling precise error analysis. We further evaluate on standard benchmarks (CIFAR-10, CelebA, and LSUN Bedroom), where our approach shows improved robustness and preserved sample fidelity under severe noise, data corruption, and adversarial evaluation. Code available at github.com/OmnAI-Lab/Adversarial-Training-DM
2026
International Conference on Learning Representations (ICLR)
alignment, fairness, safety, privacy, adversarial training, diffusion models
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Why Adversarially Train Diffusion Models? / Briglia, Maria Rosaria; Mirza, Mujtaba Hussain; Lisanti, Giuseppe; Masi, Iacopo. - (2026). ( International Conference on Learning Representations (ICLR) Rio De Janeiro, Brazil ).
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1763256
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact