D-Fence layer: an ensemble framework for comprehensive deepfake detection

Amerini I.
2024

Abstract

The rapid advancement of deep learning and computer vision has given rise to a concerning class of deceptive media, commonly known as deepfakes. This paper addresses emerging trends in deepfakes, including hyper-realistic facial manipulation, synthesized human voices, and fabricated subtitles added to video content. To combat these multifaceted threats, we introduce an ensemble-based deepfake detection framework called the “D-Fence” layer. It comprises two uni-modal classifiers that identify tampered facial and vocal elements, and two cross-modal classifiers that model Video-Audio and Audio-Text interactions, enabling detection across multiple modalities. To evaluate the framework, we introduce two novel adversarial attacks: the “Bogus-in-the-middle” attack, which inserts counterfeit video frames within authentic sequences, and the “Downsampling attack”, which produces deceptive audio. A comparative study against various state-of-the-art multi-modal deepfake detection systems shows that our ensemble architecture outperforms existing classifiers, achieving a detection accuracy of 92% under diverse adversarial conditions and detecting deepfakes efficiently and reliably.
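The abstract describes the ensemble only at a high level. The sketch below shows one plausible fusion rule over the four sub-classifiers, a simple vote on per-classifier fake probabilities; the fusion rule, function names, and threshold are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict


def dfence_verdict(scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """Return True ("fake") if at least half of the sub-classifiers flag the input.

    `scores` maps each D-Fence sub-classifier to its fake probability, e.g.
    {"face": 0.91, "voice": 0.12, "video_audio": 0.77, "audio_text": 0.64}.
    """
    votes = sum(p >= threshold for p in scores.values())  # count "fake" votes
    return votes >= (len(scores) + 1) // 2                # ties count as fake
```

The two attacks are likewise described in a single line each. The following is a minimal reconstruction of what they could look like, assuming frame splicing for the Bogus-in-the-middle attack and a sample-rate round trip for the Downsampling attack; all names and defaults are hypothetical.

```python
from typing import List, Sequence

import numpy as np
import scipy.signal as sps


def bogus_in_the_middle(real_frames: Sequence[np.ndarray],
                        fake_frames: Sequence[np.ndarray],
                        start: int) -> List[np.ndarray]:
    """Splice counterfeit frames into an authentic sequence at index `start`."""
    if not 0 < start < len(real_frames):
        raise ValueError("insertion point must lie strictly inside the sequence")
    return list(real_frames[:start]) + list(fake_frames) + list(real_frames[start:])


def downsampling_attack(audio: np.ndarray, sr: int, low_sr: int = 8000) -> np.ndarray:
    """Round-trip the waveform through a lower sample rate.

    Resampling down to `low_sr` and back up to `sr` discards high-frequency
    content, which can mask the synthesis artifacts detectors rely on.
    """
    n_low = int(round(len(audio) * low_sr / sr))
    down = sps.resample(audio, n_low)      # lossy step: high frequencies removed
    return sps.resample(down, len(audio))  # restore original length
```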
Bogus-in-the-middle attack; Cross-modal learning; Downsampling attack; Ensemble learning; Multi-modal deepfakes; Optical flow
01 Journal publication::01a Journal article
D-Fence layer: an ensemble framework for comprehensive deepfake detection / Asha, S.; Vinod, P.; Amerini, I.; Menon, V. G. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1380-7501. - (2024). [10.1007/s11042-024-18130-1]
Files attached to this product
Asha_D-Fence_2024.pdf (restricted to archive administrators; contact the author)
Type: Publisher's version (published with the publisher's layout)
License: All rights reserved
Size: 1.71 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1701892
Citations
  • PMC: ND
  • Scopus: 0
  • Web of Science (ISI): 0