Efficient Sound Event Localization and Detection in the Quaternion Domain / Brignone, Christian; Mancini, Gioia; Grassucci, Eleonora; Uncini, Aurelio; Comminiello, Danilo. - In: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS. II, EXPRESS BRIEFS. - ISSN 1549-7747. - 69:5(2022), pp. 2453-2457. [10.1109/TCSII.2022.3160388]
Efficient Sound Event Localization and Detection in the Quaternion Domain
Grassucci Eleonora; Uncini Aurelio; Comminiello Danilo
2022
Abstract
In recent years, several approaches have been proposed for Sound Event Localization and Detection (SELD) with multiple overlapping sound events in the 3D sound field. However, accuracy improvements have often been achieved at the expense of more complex networks and a larger number of parameters. In this paper, we propose an efficient and lightweight Quaternion Temporal Convolutional Network for the SELD task (QSELD-TCN), which combines the advantages of quaternion-valued processing with the effectiveness of the Temporal Convolutional Network (TCN). The proposed approach represents the Ambisonic signal components as a single quaternion and, accordingly, uses quaternion-valued layers throughout the neural network. This results in a considerable saving of parameters with respect to the corresponding real-valued model. In particular, a quaternion implementation of the TCN block is presented, exploiting the TCN's ability to capture long-term dependencies and the effectiveness of quaternion convolutional layers in capturing correlations among input dimensions. The proposed approach requires less runtime memory and less storage, and it achieves faster inference than state-of-the-art methods, making it deployable even on devices with limited resources.
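The parameter saving mentioned in the abstract can be illustrated with a minimal sketch of a quaternion-valued dense layer. This is not the authors' implementation: the function names (`hamilton_block`, `quaternion_linear`, `param_counts`), the layer sizes, the omission of biases, and the assumption that one audio frame carries four Ambisonic channels packed as a single quaternion are all hypothetical choices made for illustration. The sketch only shows the general idea that a quaternion weight reuses four real sub-matrices through the Hamilton product, so it stores a quarter of the weights of a real-valued layer with the same input and output width.

```python
import numpy as np


def hamilton_block(r, i, j, k):
    """Real matrix that applies the quaternion weight r + i*i + j*j + k*k
    to an input by left Hamilton product. Each argument has shape
    (out_q, in_q); the result has shape (4 * out_q, 4 * in_q)."""
    return np.block([
        [r, -i, -j, -k],
        [i,  r, -k,  j],
        [j,  k,  r, -i],
        [k, -j,  i,  r],
    ])


def quaternion_linear(x, r, i, j, k):
    """Apply the layer to x, laid out as [real | i | j | k] parts."""
    return hamilton_block(r, i, j, k) @ x


def param_counts(in_q, out_q):
    """Weights stored by the quaternion layer vs. a real dense layer
    of the same input/output width (biases ignored)."""
    quaternion = 4 * in_q * out_q
    real = (4 * in_q) * (4 * out_q)
    return quaternion, real


rng = np.random.default_rng(0)
in_q, out_q = 1, 8  # assumed sizes: one input quaternion, eight output quaternions
r, i, j, k = (rng.standard_normal((out_q, in_q)) for _ in range(4))

# Hypothetical frame: four Ambisonic channels treated as one quaternion.
frame = rng.standard_normal(4 * in_q)
y = quaternion_linear(frame, r, i, j, k)

q_params, real_params = param_counts(in_q, out_q)
print(y.shape, q_params, real_params)
```

In this sketch the four sub-matrices are shared across all four output components, which is what ties the input channels together and cuts the weight count by a factor of four; a quaternion convolution would apply the same sharing pattern to convolution kernels instead of dense matrices.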
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Brignone_Efficient_2022.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 1.44 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.