Detecting audio deepfakes: integrating CNN and BiLSTM with multi-feature concatenation / Wani, Taiba Majid; Qadri, Syed Asif Ahmad; Comminiello, Danilo; Amerini, Irene. - (2024), pp. 271-276. (Paper presented at IH&MMSec '24: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security, held in Baiona, Spain) [10.1145/3658664.3659647].
Detecting audio deepfakes: integrating CNN and BiLSTM with multi-feature concatenation
| Author | Position | Contribution |
|---|---|---|
| Wani, Taiba Majid | First | Writing – Original Draft Preparation |
| Comminiello, Danilo; Amerini, Irene | Second | Supervision |
2024
Abstract
Audio deepfake detection is emerging as a crucial field in digital media, as distinguishing real audio from deepfakes becomes increasingly challenging with the advancement of deepfake technologies. These methods threaten information authenticity and pose serious security risks. To address this challenge, we propose a novel architecture that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) networks for effective deepfake audio detection. Our approach is distinguished by the concatenation of a comprehensive set of acoustic features: Mel Frequency Cepstral Coefficients (MFCC), Mel spectrograms, Constant Q Cepstral Coefficients (CQCC), and Constant-Q Transform (CQT) vectors. In the proposed architecture, features processed by a CNN are concatenated into two multi-dimensional representations, which are then analyzed by a BiLSTM network to capture temporal dynamics and contextual dependencies in the audio. This synergistic method captures both spatial and sequential audio characteristics. We validate our model on the ASVspoof 2019 and FoR datasets, using accuracy and Equal Error Rate (EER) as evaluation metrics.
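The abstract describes the pipeline only at a high level; the implementation details live in the paper itself. The following Python sketch (librosa, PyTorch, scipy, and scikit-learn, all assumed dependencies) is a rough illustration of a pipeline of this kind, not the authors' code: it extracts the four named features, stacks them into one tensor, and runs a small CNN-BiLSTM classifier, with a standard EER computation for the reported metric. Every hyperparameter (40 coefficients, filter counts, hidden size) is an assumption rather than the paper's value, librosa ships no CQCC extractor so CQCC is approximated from the CQT, and the single four-channel stack simplifies the paper's two separate multi-dimensional feature groups.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from scipy.fft import dct


def extract_features(path, sr=16000, n_coeff=40):
    """Extract the four acoustic features named in the abstract.
    All dimensions here are illustrative, not the paper's settings."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeff)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_coeff))
    cqt = np.abs(librosa.cqt(y, sr=sr, n_bins=n_coeff))
    # librosa has no CQCC extractor; this approximates CQCC as a DCT of
    # the log-CQT (the full pipeline also resamples to a uniform scale).
    cqcc = dct(np.log(cqt + 1e-8), axis=0, norm="ortho")
    return mfcc, mel, cqt, cqcc


def stack_features(path):
    """Stack the four feature maps as channels of a single tensor."""
    feats = extract_features(path)
    t = min(f.shape[1] for f in feats)          # align frame counts
    return np.stack([f[:, :t] for f in feats])  # shape (4, n_coeff, t)


class CNNBiLSTM(nn.Module):
    """CNN front end for local spectro-temporal patterns, followed by a
    BiLSTM that models temporal context in both directions."""

    def __init__(self, n_feats=4, n_coeff=40, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(n_feats, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(64 * (n_coeff // 4), hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 2)      # bona fide vs. spoof

    def forward(self, x):                       # x: (B, 4, n_coeff, T)
        z = self.cnn(x)                         # (B, 64, n_coeff/4, T/4)
        z = z.permute(0, 3, 1, 2).flatten(2)    # (B, T/4, 64 * n_coeff/4)
        out, _ = self.lstm(z)
        return self.fc(out[:, -1])              # logits from last time step


def eer(labels, scores):
    """Equal Error Rate: the operating point where the false-acceptance
    and false-rejection rates coincide (standard anti-spoofing metric)."""
    from scipy.interpolate import interp1d
    from scipy.optimize import brentq
    from sklearn.metrics import roc_curve
    fpr, tpr, _ = roc_curve(labels, scores)
    return brentq(lambda x: 1.0 - x - interp1d(fpr, tpr)(x), 0.0, 1.0)
```

Stacking the features as channels of one tensor is one plausible reading of "feature concatenation"; the paper's exact grouping into two concatenated multi-dimensional representations may differ.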
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Majid-Wani_Detecting_2024.pdf | Open access | Post-print document (version after peer review, accepted for publication) | Creative Commons | 1.34 MB | Adobe PDF |
| Majid-Wani_Indice_Detecting_2024.pdf (note: table of contents) | Archive managers only (contact the author) | Other attached material | All rights reserved | 2.71 MB | Adobe PDF |
| Majid-Wani_Frontespizio_Dectecting_2024.pdf (note: title page) | Archive managers only (contact the author) | Other attached material | All rights reserved | 1.87 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.