ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection / Wani, T. M.; Gulzar, R.; Amerini, I. - (2024), pp. 2464-2472. (Paper presented at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024, held in Seattle, USA) [10.1109/CVPRW63382.2024.00253].
ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection
Wani T. M.; Amerini I.
2024
Abstract
In response to the escalating challenge of audio deepfake detection, this study introduces ABC-CapsNet (Attention-Based Cascaded Capsule Network), a novel architecture that merges the perceptual strengths of Mel spectrograms with the robust feature extraction capabilities of VGG18, enhanced by a strategically placed attention mechanism. This architecture pioneers the use of cascaded capsule networks to delve deeper into complex audio data patterns, setting a new standard in the precision of identifying manipulated audio content. Distinctively, ABC-CapsNet not only addresses the inherent limitations found in traditional CNN models but also showcases remarkable effectiveness across diverse datasets. The proposed method achieved an equal error rate (EER) of 0.06% on the ASVspoof2019 dataset and an EER of 0.04% on the FoR dataset, underscoring the superior accuracy and reliability of the proposed system in combating the sophisticated threat of audio deepfakes.
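The equal error rate quoted in the abstract is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how such a figure can be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the simple threshold sweep without interpolation are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate (eer, threshold) by sweeping every observed score as a threshold.

    Assumption: higher score = more likely bona fide (genuine) audio.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, thr = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With only a handful of trials the estimate is coarse, since the error rates move in steps of one trial. Benchmark toolkits typically interpolate between thresholds, so values from this sketch should not be expected to match officially reported numbers exactly.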
File | Access | Type | License | Size | Format | Note
---|---|---|---|---|---|---
Wani_ABC-CapsNet_Attention_based_Cascaded_Capsule_Network_for_Audio_Deepfake_Detection_CVPRW_2024_paper.pdf | Open access | Post-print (version after peer review, accepted for publication) | All rights reserved | 856.82 kB | Adobe PDF | https://openaccess.thecvf.com/content/CVPR2024W/WiCV/papers/Wani_ABC-CapsNet_Attention_based_Cascaded_Capsule_Network_for_Audio_Deepfake_Detection_CVPRW_2024_paper.pdf
WanI_ABC-CapsNet_2024.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 1.46 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.