Quaternion neural networks for 3D sound source localization in reverberant environments / Celsi, M. R.; Scardapane, S.; Comminiello, D. - 2020-:(2020), pp. 1-6. (Paper presented at the 30th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2020, held in Espoo, Finland) [10.1109/MLSP49062.2020.9231809].
Quaternion neural networks for 3D sound source localization in reverberant environments
Scardapane S.; Comminiello D.
2020
Abstract
Localization of sound sources in 3D sound fields is an extremely challenging task, especially when the environments are reverberant and involve multiple sources. In this work, we propose a deep neural network that analyzes audio signals recorded by 3D microphones and localizes sound sources in a spatial sound field. In particular, we consider first-order Ambisonics microphones to capture 3D acoustic signals and represent them by spherical harmonic decomposition in the quaternion domain. Moreover, to improve the localization performance, we use quaternion input features derived from the acoustic intensity, which is closely related to the direction of arrival (DOA) of a sound source. The proposed network architecture involves both quaternion-valued convolutional and recurrent layers. Results show that the proposed method exploits the quaternion-valued representation of ambisonic signals and improves the localization performance with respect to existing methods.

File | Size | Format | |
---|---|---|---|
Celsi_Quaternion_2020.pdf (archive managers only). Type: Publisher's version (published version with the publisher's layout). License: All rights reserved | 367.49 kB | Adobe PDF | Contact the author |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.