Wani, T. M.; Amerini, I. (2025). Audio Deepfake Detection: A Continual Approach with Feature Distillation and Dynamic Class Rebalancing. In: 27th International Conference on Pattern Recognition (ICPR 2024), Kolkata, vol. 15321, pp. 211–227. DOI: 10.1007/978-3-031-78305-0_14
Audio Deepfake Detection: A Continual Approach with Feature Distillation and Dynamic Class Rebalancing
Wani, T. M. (first author); Amerini, I. (second author)
2025
Abstract
In an era where digital authenticity is frequently compromised by sophisticated synthetic audio technologies, ensuring the integrity of digital media is crucial. This paper addresses the critical challenges of catastrophic forgetting and incremental learning within the domain of audio deepfake detection. We introduce a novel methodology that synergistically combines the discriminative feature extraction capabilities of SincNet with the computational efficiency of LightCNN. Our approach is further augmented by integrating Feature Distillation and Dynamic Class Rebalancing, enhancing the model’s adaptability across evolving deepfake threats while maintaining high accuracy on previously encountered data. The models were tested using the ASVspoof 2015, ASVspoof 2019, and FoR datasets, demonstrating significant improvements in detecting audio deepfakes with reduced computational overhead. Our results illustrate that the proposed model not only effectively counters the issue of catastrophic forgetting but also exhibits superior adaptability through dynamic class rebalancing and feature distillation techniques.
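The abstract names feature distillation and dynamic class rebalancing but does not spell out the training objective. The PyTorch sketch below shows one plausible way the two pieces combine into a continual-learning loss; it is an illustration under stated assumptions, not the authors' published formulation. In particular, the `student(batch)` interface returning an (embedding, logits) pair, the effective-number class weighting, and the `lam` trade-off weight are hypothetical.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_feats: torch.Tensor,
                              teacher_feats: torch.Tensor) -> torch.Tensor:
    """L2 distance between normalized embeddings of the current model
    (student) and a frozen copy trained on earlier attacks (teacher)."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    return F.mse_loss(s, t)

def rebalanced_class_weights(class_counts, beta: float = 0.999) -> torch.Tensor:
    """'Effective number' re-weighting (an assumed rebalancing scheme):
    rarer classes get larger weights, so the bona-fide/spoof skew of
    newly added data does not dominate the task loss."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    effective = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective
    return weights / weights.sum() * len(class_counts)

def continual_loss(student, teacher, batch, labels, class_counts, lam=1.0):
    # Hypothetical model interface: returns (embedding, logits).
    feats_s, logits = student(batch)
    with torch.no_grad():                     # teacher stays frozen
        feats_t, _ = teacher(batch)
    w = rebalanced_class_weights(class_counts).to(logits.device)
    ce = F.cross_entropy(logits, labels, weight=w)    # rebalanced task loss
    fd = feature_distillation_loss(feats_s, feats_t)  # anti-forgetting term
    return ce + lam * fd
```

Here the distillation term anchors the current model's embeddings to those of a frozen copy trained on previously seen attacks, the usual mechanism for limiting catastrophic forgetting, while the class weights keep imbalanced new data from dominating the cross-entropy term.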
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Wani_preprint_Audio-Deepfake_2025.pdf (note: https://link.springer.com/chapter/10.1007/978-3-031-78305-0_14) | Open access | Post-print (version after peer review, accepted for publication) | All rights reserved | 668.33 kB | Adobe PDF |
| Wani_Audio-Deepfake_2025.pdf | Archive administrators only (contact the author) | Publisher's version (published with the publisher's layout) | All rights reserved | 671.07 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


