Quaternion neural networks for 3D sound source localization in reverberant environments / Celsi, M. R.; Scardapane, S.; Comminiello, D. - (2020), pp. 1-6. (Paper presented at the 30th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2020, held in Espoo, Finland) [10.1109/MLSP49062.2020.9231809].

Quaternion neural networks for 3D sound source localization in reverberant environments

Celsi M. R.; Scardapane S.; Comminiello D.
2020

Abstract

Localization of sound sources in 3D sound fields is an extremely challenging task, especially when the environments are reverberant and involve multiple sources. In this work, we propose a deep neural network that analyzes audio signals recorded by 3D microphones and localizes sound sources in a spatial sound field. In particular, we consider first-order Ambisonics microphones to capture 3D acoustic signals and represent them by spherical harmonic decomposition in the quaternion domain. Moreover, to improve localization performance, we use quaternion input features derived from the acoustic intensity, which is closely related to the direction of arrival (DOA) of a sound source. The proposed network architecture involves both quaternion-valued convolutional and recurrent layers. Results show that the proposed method is able both to exploit the quaternion-valued representation of ambisonic signals and to improve localization performance with respect to existing methods.
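The pipeline in the abstract (first-order Ambisonics channels, a quaternion-valued input, intensity-derived features, quaternion-valued layers) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes ACN/SN3D-style first-order Ambisonics channels W, X, Y, Z for a single time-frequency bin. It uses one common definition of active and reactive intensity, namely the real and imaginary parts of conj(W)·[X, Y, Z] with physical constants dropped, stored as pure quaternions. It also shows the Hamilton product, which quaternion-valued layers use in place of real matrix multiplication. The paper's exact feature definitions and normalizations may differ, and all function names, including the plane-wave helper, are hypothetical.

```python
import cmath
import math
from typing import NamedTuple, Sequence, Tuple


class Quaternion(NamedTuple):
    """Quaternion r + x*i + y*j + z*k."""
    r: float
    x: float
    y: float
    z: float


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q (non-commutative)."""
    return Quaternion(
        p.r * q.r - p.x * q.x - p.y * q.y - p.z * q.z,
        p.r * q.x + p.x * q.r + p.y * q.z - p.z * q.y,
        p.r * q.y - p.x * q.z + p.y * q.r + p.z * q.x,
        p.r * q.z + p.x * q.y - p.y * q.x + p.z * q.r,
    )


def intensity_quaternions(w: complex, x: complex, y: complex,
                          z: complex) -> Tuple[Quaternion, Quaternion]:
    """Active and reactive intensity of one TF bin as pure quaternions.

    Assumption: W is the omnidirectional channel and X, Y, Z are the
    figure-of-eight channels; constant factors (air density, c) are dropped.
    """
    products = [w.conjugate() * c for c in (x, y, z)]
    active = Quaternion(0.0, products[0].real, products[1].real, products[2].real)
    reactive = Quaternion(0.0, products[0].imag, products[1].imag, products[2].imag)
    return active, reactive


def doa_from_active(q: Quaternion) -> Tuple[float, float]:
    """Azimuth and elevation (degrees) of the active-intensity vector."""
    azimuth = math.degrees(math.atan2(q.y, q.x))
    elevation = math.degrees(math.atan2(q.z, math.hypot(q.x, q.y)))
    return azimuth, elevation


def quaternion_neuron(weights: Sequence[Quaternion], inputs: Sequence[Quaternion],
                      bias: Quaternion) -> Quaternion:
    """One quaternion-valued linear unit: sum_k w_k (Hamilton) q_k + b."""
    acc = bias
    for w_k, q_k in zip(weights, inputs):
        h = hamilton(w_k, q_k)
        acc = Quaternion(acc.r + h.r, acc.x + h.x, acc.y + h.y, acc.z + h.z)
    return acc


def plane_wave_foa(s: complex, az_deg: float,
                   el_deg: float) -> Tuple[complex, complex, complex, complex]:
    """Hypothetical helper: noiseless FOA encoding of one plane wave (SN3D)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (s,
            s * math.cos(az) * math.cos(el),
            s * math.sin(az) * math.cos(el),
            s * math.sin(el))


if __name__ == "__main__":
    w, x, y, z = plane_wave_foa(cmath.rect(1.0, 0.3), 60.0, 20.0)
    active, reactive = intensity_quaternions(w, x, y, z)
    print(doa_from_active(active), reactive)
```

For a noiseless plane wave from azimuth 60° and elevation 20°, `doa_from_active` recovers approximately those angles, and the reactive part is close to zero. A single quaternion weight carries 4 real parameters, whereas an unconstrained real-valued map between 4-channel inputs and outputs needs 16. This parameter sharing across the W, X, Y, Z components is what motivates quaternion-valued layers.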
Year: 2020
Conference: 30th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2020
Keywords: 3D audio; convolutional recurrent neural networks; hypercomplex-valued neural networks; quaternion neural networks; source localization
Publication type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
Celsi_Quaternion_2020.pdf (access restricted to archive managers)
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 367.49 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1547550
Citations
  • PubMed Central: ND
  • Scopus: 15
  • Web of Science (ISI): 9