CoVal-SGAN: A Complex-Valued Spectral GAN architecture for the effective audio data augmentation in construction sites / Scarpiniti, M.; Mauri, C.; Comminiello, D.; Uncini, A.; Lee, Y.-C. - 2022-July:(2022), pp. 1-8. (Paper presented at the 2022 International Joint Conference on Neural Networks (IJCNN 2022), held in Padua, Italy) [10.1109/IJCNN55064.2022.9891915].
CoVal-SGAN: A Complex-Valued Spectral GAN architecture for the effective audio data augmentation in construction sites
Scarpiniti M.; Mauri C.; Comminiello D.; Uncini A.; Lee Y.-C.
2022
Abstract
Generative audio data augmentation for construction sites is a challenging research area because of the high dissimilarity between the work sounds of the machines and equipment involved. It is nevertheless necessary, since audio data for critical work classes are often scarce. Motivated by these considerations, in this paper we propose CoVal-SGAN, a complex-valued GAN architecture that operates on the audio spectrogram, for the effective augmentation of audio data. Specifically, the proposed CoVal-SGAN exploits both magnitude and phase information to improve the quality of the artificially generated audio signals and to increase the overall performance of the underlying classifier. Numerical results obtained on data recorded at real-world construction sites, together with comparisons against available state-of-the-art approaches, show the effectiveness of the proposed idea, which achieves improved accuracy.

File | Size | Format | |
---|---|---|---|
Scarpiniti_CoVal-SGAN_2022.pdf (archive managers only). Type: Publisher's version (published version with the publisher's layout). License: All rights reserved | 943.62 kB | Adobe PDF | Contact the author |
Scarpiniti_post-print_CoVal-SGAN_2022.pdf.pdf (Open Access from 02/10/2024). Note: post-print. Type: Post-print document (version following peer review and accepted for publication). License: Creative Commons | 350.07 kB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.