Image Watermarking Backdoor Attacks in CNN-Based Classification Tasks / Abbate, Giovanbattista; Amerini, Irene; Caldelli, Roberto. - 13646:(2023), pp. 4-16. (Paper presented at the ICPR 2022 International Workshops and Challenges, held in Montreal, QC, Canada) [10.1007/978-3-031-37745-7_1].
Image Watermarking Backdoor Attacks in CNN-Based Classification Tasks
Amerini, Irene;
2023
Abstract
In recent years, neural networks have become the basis for many kinds of applications, mainly because of the impressive performance they offer. Nevertheless, all that glitters is not gold: such tools have proven highly sensitive to malicious approaches such as gradient manipulation or the injection of adversarial samples. Another kind of attack poisons a neural network at training time by injecting a perceptually almost invisible trigger signal into a small portion of the dataset (the target class), thereby creating a backdoor in the trained model. This backdoor can then be exploited at test time to redirect predictions to the chosen target class. In this work, a novel backdoor attack that resorts to image watermarking algorithms to generate the trigger signal is presented. The watermark is nearly imperceptible and is embedded in a portion of the images of the target class; two different watermarking algorithms have been tested. Experimental results on the MNIST and GTSRB datasets show satisfactory performance in terms of attack success rate and introduced distortion.
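To make the poisoning step described above concrete, the sketch below shows a minimal training-set poisoning routine with a low-amplitude additive trigger. It is an illustration only and does not reproduce the two watermarking algorithms evaluated in the paper (which are not specified here); the pseudo-random spatial pattern, the helper names, and the `poison_fraction` parameter are assumptions made for the example.

```python
# Illustrative sketch of watermark-style backdoor poisoning (hypothetical helpers;
# NOT the paper's watermarking algorithms). Assumes numpy and images as float
# arrays in [0, 1] with shape (N, H, W).
import numpy as np

def make_trigger(shape, strength=0.02, seed=0):
    """Fixed pseudo-random additive pattern used as a barely visible trigger."""
    rng = np.random.default_rng(seed)
    pattern = rng.standard_normal(shape)
    # Scale to a small amplitude so the perturbation stays subtle.
    return strength * pattern / np.abs(pattern).max()

def poison_dataset(images, labels, target_class, poison_fraction=0.1, seed=0):
    """Embed the trigger into a fraction of the target-class images (training-time poisoning)."""
    images = images.copy()
    trigger = make_trigger(images.shape[1:], seed=seed)
    target_idx = np.flatnonzero(labels == target_class)
    rng = np.random.default_rng(seed)
    chosen = rng.choice(target_idx, size=int(poison_fraction * len(target_idx)), replace=False)
    images[chosen] = np.clip(images[chosen] + trigger, 0.0, 1.0)
    return images, chosen

# At test time, adding the same trigger to an arbitrary input is what steers the
# prediction toward the target class through the learned backdoor.
```

In this sketch the trigger is a fixed additive pattern; in the paper the trigger is produced by image watermarking algorithms, which serve the same role of a near-imperceptible, consistently embeddable signal.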
File | Access | Type | License | Size | Format | Note
---|---|---|---|---|---|---
Abbate_Image_2022.pdf | Restricted (repository managers only) | Publisher's version (published with the publisher's layout) | All rights reserved | 10.11 MB | Adobe PDF | Contact the author
Abbate_Frontespizio-indice_2022.pdf | Open access | Publisher's version (published with the publisher's layout) | All rights reserved | 125.91 kB | Adobe PDF | https://link.springer.com/book/10.1007/978-3-031-37745-7
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.