ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection / Wani, T. M.; Gulzar, R.; Amerini, I. - (2024), pp. 2464-2472. (Paper presented at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024, held in Seattle, USA) [10.1109/CVPRW63382.2024.00253].

ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection

Wani T. M.; Amerini I.
2024

Abstract

In response to the escalating challenge of audio deepfake detection, this study introduces ABC-CapsNet (Attention-Based Cascaded Capsule Network), a novel architecture that combines the perceptual strengths of Mel spectrograms with the robust feature extraction capabilities of VGG18, enhanced by a strategically placed attention mechanism. The architecture uses cascaded capsule networks to capture complex patterns in audio data, improving the precision with which manipulated audio content is identified. ABC-CapsNet addresses inherent limitations of traditional CNN models and is effective across diverse datasets. The proposed method achieved an equal error rate (EER) of 0.06% on the ASVspoof2019 dataset and an EER of 0.04% on the FoR dataset, underscoring the accuracy and reliability of the proposed system in combating the sophisticated threat of audio deepfakes.
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
ASVspoof 2019; Audio Deepfakes; Capsule Networks; Cascaded Networks; FoR
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File Size Format
Wani_ABC-CapsNet_Attention_based_Cascaded_Capsule_Network_for_Audio_Deepfake_Detection_CVPRW_2024_paper.pdf

open access

Note: https://openaccess.thecvf.com/content/CVPR2024W/WiCV/papers/Wani_ABC-CapsNet_Attention_based_Cascaded_Capsule_Network_for_Audio_Deepfake_Detection_CVPRW_2024_paper.pdf
Type: Post-print document (version after peer review and accepted for publication)
License: All rights reserved
Size: 856.82 kB
Format: Adobe PDF
WanI_ABC-CapsNet_2024.pdf

archive managers only

Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.46 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726267