Audio deepfake detection is emerging as a crucial field in digital media: as deepfake technologies advance, distinguishing real audio from deepfakes becomes increasingly difficult. These technologies threaten the authenticity of information and pose serious security risks. To address this challenge, we propose a novel architecture that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) networks for deepfake audio detection. Our approach is distinguished by the concatenation of a comprehensive set of acoustic features: Mel Frequency Cepstral Coefficients (MFCC), Mel spectrograms, Constant-Q Cepstral Coefficients (CQCC), and Constant-Q Transform (CQT) vectors. In the proposed architecture, the features are first processed by a CNN and concatenated into two multi-dimensional feature representations, which are then analyzed by a BiLSTM network to capture temporal dynamics and contextual dependencies in the audio. This combination is designed to model both the spatial and the sequential characteristics of audio. We validate the model on the ASVspoof 2019 and FoR datasets, using accuracy and Equal Error Rate (EER) as evaluation metrics.
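The abstract reports results as Equal Error Rate (EER): the operating point at which the false-acceptance rate (spoofed audio accepted as genuine) equals the false-rejection rate (genuine audio rejected). The following is a minimal pure-Python sketch of how EER is commonly approximated from detector scores; it is not the authors' evaluation code. It assumes that a higher score means "more likely bona fide", and the function name `compute_eer` is hypothetical. The quadratic threshold sweep is kept deliberately simple; practical evaluations typically use a vectorized ROC computation instead.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the Equal Error Rate by sweeping candidate thresholds.

    Assumption: a higher score means "more likely bona fide"; a trial is
    accepted as bona fide when its score is >= the threshold.
    Returns (eer, threshold).
    """
    if len(bonafide_scores) == 0 or len(spoof_scores) == 0:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate threshold; +inf covers "reject all".
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(float("inf"))

    best_gap, best_eer, best_thr = float("inf"), 1.0, thresholds[0]
    for t in thresholds:
        # False acceptance: spoofed trials scored as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finite data FAR and FRR rarely meet exactly, so keep the
            # closest crossing and report the mean of the two rates.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, t
    return best_eer, best_thr


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.6, 0.4], [0.5, 0.3, 0.1])
    print(f"EER = {eer:.3f} at threshold {thr}")  # EER = 0.333 at threshold 0.5
```

Averaging FAR and FRR at the closest crossing is one common convention; interpolating between the two neighbouring thresholds is another, and the two can differ slightly on small score sets.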

Detecting audio deepfakes: integrating CNN and BiLSTM with multi-feature concatenation / Wani, Taiba Majid; Qadri, Syed Asif Ahmad; Comminiello, Danilo; Amerini, Irene. - (2024), pp. 271-276. (Paper presented at IH&MMSec '24: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security, held in Baiona, Spain) [10.1145/3658664.3659647].

Detecting audio deepfakes: integrating CNN and BiLSTM with multi-feature concatenation

Wani, Taiba Majid
First
Writing – Original Draft Preparation
Comminiello, Danilo; Amerini, Irene
Second
Supervision
2024

2024
IH&MMSec '24: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security
audio deepfakes; feature concatenation; MFCC; CQCC; CQT; Mel spectrograms; CNN; BiLSTM
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File Size Format
Majid-Wani_Detecting_2024.pdf

open access

Type: Post-print document (version after peer review, accepted for publication)
License: Creative Commons
Size 1.34 MB
Format Adobe PDF
Majid-Wani_Indice_Detecting_2024.pdf

archive administrators only

Note: Table of contents
Type: Other attached material
License: All rights reserved
Size 2.71 MB
Format Adobe PDF
Majid-Wani_Frontespizio_Dectecting_2024.pdf

archive administrators only

Note: Title page
Type: Other attached material
License: All rights reserved
Size 1.87 MB
Format Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1714014