A Comprehensive Study of Cross-domain Adversarial Robustness and Attack Transferability in Image-Based Malware Detection and Classification / Daidone, Giuseppe; Cirillo, Lorenzo; Querzoni, Leonardo; Amerini, Irene. - In: IMAGE AND VISION COMPUTING. - ISSN 0262-8856. - (2026). [10.2139/ssrn.5960449]
A Comprehensive Study of Cross-domain Adversarial Robustness and Attack Transferability in Image-Based Malware Detection and Classification
Daidone, Giuseppe; Cirillo, Lorenzo; Querzoni, Leonardo; Amerini, Irene
2026
Abstract
Deep learning models have demonstrated competitive performance in many computer vision tasks, such as image-based malware detection and classification. However, their adversarial robustness is critical to their deployment in real-world scenarios. In this paper, we propose a framework to analyze the adversarial robustness and attack transferability of image-based models for malware detection and classification. Specifically, we adopt a broad spectrum of image-domain attacks to generate adversarial samples and evaluate the resulting drop in model performance and the attack success rate. Moreover, we examine whether adversarial manipulations crafted in the binary domain remain effective when the modified binary is converted into its corresponding image representation. We implement our framework on state-of-the-art models, providing a comprehensive evaluation of their adversarial robustness. Furthermore, we conduct ablation studies to analyze the generalization capabilities of the adopted models. Experimental results reveal high model susceptibility to adversarial attacks, even when they originate in the binary domain. To the best of our knowledge, this is the first work to compare a wide range of adversarial attacks on malware models, analyzing attack transferability across different domains.


