Robust image classification with multi-modal large language models / Villani, Francesco; Maljkovic, Igor; Lazzaro, Dario; Sotgiu, Angelo; Cinà, Antonio Emanuele; Roli, Fabio. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 194:(2025), pp. 1-7. [10.1016/j.patrec.2025.04.022]

Robust image classification with multi-modal large language models

Villani, Francesco; Maljkovic, Igor; Lazzaro, Dario; Sotgiu, Angelo; Cinà, Antonio Emanuele; Roli, Fabio
2025

Abstract

Deep Neural Networks are vulnerable to adversarial examples, i.e., carefully crafted input samples that can cause models to make incorrect predictions with high confidence. To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance. However, most of these approaches focus on a single data modality, overlooking the relationships between visual patterns and textual descriptions of the input. In this paper, we propose a novel defense, Multi-Shield, designed to combine and complement these defenses with multimodal information to further enhance their robustness. Multi-Shield leverages multimodal large language models to detect adversarial examples and abstain from uncertain classifications when there is no alignment between textual and visual representations of the input. Extensive evaluations on six distinct datasets, using robust and non-robust image classification models, demonstrate that Multi-Shield can be easily integrated to detect and reject adversarial examples, outperforming the original defenses.
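The abstract above outlines Multi-Shield's core decision rule: accept the classifier's output only when the visual input and its textual description agree, and abstain otherwise. As a rough illustration only, a minimal sketch follows, assuming a CLIP-style vision-language model stands in for the multimodal component; the function multi_shield_decision, the prompt template, and the exact agreement test are hypothetical stand-ins, not the authors' implementation.

# Hypothetical sketch of a Multi-Shield-style rejection rule (an assumption,
# not the paper's code): the base classifier's prediction is accepted only if
# a CLIP zero-shot prediction over the same label set agrees with it; on
# disagreement the input is treated as suspicious and the model abstains.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def multi_shield_decision(image, class_names, classifier_pred):
    # One textual description per class (this prompt template is an assumption).
    prompts = [f"a photo of a {c}" for c in class_names]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        # logits_per_image holds image-text similarity scores, shape (1, num_classes).
        logits = model(**inputs).logits_per_image
    clip_pred = logits.argmax(dim=-1).item()
    # Agreement between modalities -> keep the label; mismatch -> abstain (-1).
    return classifier_pred if clip_pred == classifier_pred else -1

Under this toy rule, an adversarial image that fools the base classifier but not the vision-language model yields a label mismatch and is rejected rather than misclassified.
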
Adversarial machine learning; Robust classification; Multimodal large language model; Multimodal information; Adversarial examples; Machine learning security; Trustworthy AI
01 Journal publication::01a Journal article
Files attached to this product

File: Villani_Robust-image_2025.pdf
Access: open access
Note: https://doi.org/10.1016/j.patrec.2025.04.022
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.22 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1745387
Citations
  • PMC: ND
  • Scopus: 1
  • Web of Science (ISI): 1