Research Products Catalogue

Environmental Sound Classification (ESC) is becoming increasingly important in scenarios such as smart cities, autonomous systems, safety, and industrial monitoring. Traditional ESC methods mainly rely on features extracted from a single representation, usually spectrograms or MFCCs. Although deep learning-based CNN models have demonstrated excellent performance, they remain limited by this reliance on a single feature representation. This work therefore exploits a multi-representation strategy that fuses five kinds of audio features: spectrograms, phasograms, scalograms, wavelet phasograms, and MFCC-grams. Each representation captures different properties of the audio. The representations are combined in a structured manner by investigating three fusion strategies (early, intermediate, and late fusion) using a novel model based on EfficientNet, named EfficientAudioNet. The proposed strategies are evaluated on four benchmark datasets: a Construction Site machinery sounds dataset, the ESC-10 and ESC-50 environmental sound datasets, and the UrbanSound8K dataset. Experimental results demonstrate that multi-representation fusion, especially early fusion, significantly enhances classification performance. Overall, the proposed approach surpasses state-of-the-art accuracy on all the tested datasets.
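As a rough illustration of the difference between early and late fusion mentioned in the abstract, the following NumPy sketch stacks two representations (a log-magnitude spectrogram and a phasogram) as input channels, and separately averages per-model class probabilities. It is only a sketch under stated assumptions, not the authors' implementation: the function names (`stft`, `early_fusion`, `late_fusion`), the parameters `frame_len` and `hop`, the synthetic input, and the use of simple averaging for late fusion are all hypothetical choices made for this example. The paper's five representations, its intermediate fusion, and the EfficientAudioNet model itself are not described in this record and are not reproduced here.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Complex short-time Fourier transform with a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # shape: (freq_bins, n_frames)

def early_fusion(signal):
    """Stack two time-frequency representations as input channels."""
    Z = stft(signal)
    spectrogram = np.log1p(np.abs(Z))  # log-compressed magnitude
    phasogram = np.angle(Z)            # phase in [-pi, pi]
    return np.stack([spectrogram, phasogram], axis=0)  # (channels, freq, time)

def late_fusion(prob_list):
    """Average class probabilities from one model per representation."""
    return np.mean(np.stack(prob_list), axis=0)

# Hypothetical input: 1 s of synthetic noise sampled at 16 kHz.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)

fused = early_fusion(audio)
print(fused.shape)  # (2, 129, 124)
```

In this sketch, early fusion hands a single multi-channel array to one classifier, whereas late fusion would train one classifier per representation and combine their outputs, for example with `late_fusion([p_spec, p_phase])`.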

EfficientAudioNet. Enhancing environmental sound classification through data fusion of multiple audio representations / Scarpiniti, Michele; Hussain, Saud; Pu, Wangyi; Uncini, Aurelio; Lee, Yong-Cheol. - (2025), pp. 1-8. (2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy) [10.1109/ijcnn64981.2025.11227157].

EfficientAudioNet. Enhancing environmental sound classification through data fusion of multiple audio representations

Scarpiniti, Michele; Hussain, Saud; Pu, Wangyi; Uncini, Aurelio; Lee, Yong-Cheol

2025

Year of publication	2025
Conference name	2025 International Joint Conference on Neural Networks (IJCNN)
Keywords	environmental sound classification (ESC); classification; spectrogram; scalogram; phasogram; MFCC; deep learning
Type	04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

File	Access	Type	License	Size	Format
Scarpiniti_EfficientAudioNet_2025.pdf	Archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	1.06 MB	Adobe PDF
Scarpiniti_postprint_EfficientAudioNet_2025.pdf	Archive managers only	Post-print (version after peer review, accepted for publication)	All rights reserved	377.73 kB	Adobe PDF
Scarpiniti_copertina_EfficientAudioNet_2025.pdf	Archive managers only	Other attached material	All rights reserved	13.95 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1755858
