Advances in audio synthesis techniques have led to the creation of highly realistic audio deepfakes, posing growing threats to digital integrity and public trust. These synthetic manipulations mimic natural speech with high fidelity, making detection increasingly challenging and fueling the spread of misinformation, identity fraud, and voice-based attacks. To address these concerns, this study proposes the Adaptive Spectro-Temporal Diffusion Transformer (ASTDT), a novel detection framework that tackles key challenges in generalization, interpretability, and adaptability across diverse audio generation techniques. ASTDT integrates a score-based diffusion model to augment training spectrograms with realistic deepfake variations, improving generalization to unseen text-to-speech and voice conversion attacks. An adaptive spectro-temporal feature extraction mechanism partitions audio into interpretable frequency and temporal segments, while a dual-modal attention fusion module jointly processes magnitude and phase features. These fused features are processed by a transformer encoder with diffusion-aware attention, enabling effective modeling of long-range temporal dependencies. To enhance transparency, ASTDT includes an interpretability module that combines quantitative feature attributions and spatial heatmaps to explain model predictions. Experimental results across four benchmark datasets demonstrate the effectiveness of ASTDT, with the model achieving the lowest equal error rate of 1.20% on the ASVspoof 2019 dataset.
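As context for the headline result above: the equal error rate (EER) is the operating point at which the false-rejection rate (genuine audio rejected) equals the false-acceptance rate (spoofed audio accepted); lower is better. The following is a minimal illustrative sketch of how EER is computed from detector scores — the score distributions and function name here are hypothetical, not taken from the paper:

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Sweep thresholds and return the error rate at the point where the
    false-rejection rate and false-acceptance rate are closest to equal."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)   # genuine wrongly rejected
        far = np.mean(spoof_scores >= t)    # spoof wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy, well-separated score distributions (illustrative values only):
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 1000)   # higher score = more likely genuine
spoof = rng.normal(-2.0, 1.0, 1000)
print(f"EER: {equal_error_rate(genuine, spoof):.3f}")
```

With well-separated distributions like these the EER is small; the 1.20% figure reported for ASTDT on ASVspoof 2019 corresponds to an EER of 0.012 on that benchmark's scoring protocol.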
ASTDT: an Interpretable Adaptive Spectro-Temporal Diffusion Transformer for audio deepfake detection / Wani, Taiba Majid; Qadri, Syed Asif Ahmad; Ashraf, Arselan; Amerini, Irene. - In: EURASIP JOURNAL ON INFORMATION SECURITY. - ISSN 2510-523X. - 2025:1(2025). [10.1186/s13635-025-00217-3]
ASTDT: an Interpretable Adaptive Spectro-Temporal Diffusion Transformer for audio deepfake detection
Wani, Taiba Majid; Amerini, Irene
2025
| File | Size | Format |
|---|---|---|
| Wani_ASTDT_2025.pdf (open access; publisher's version, published with the publisher's layout; Creative Commons license; note: https://jis-eurasipjournals.springeropen.com/counter/pdf/10.1186/s13635-025-00217-3) | 5.36 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


