L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality / Gramaccioni, R. F.; Marinoni, C.; Chen, C.; Uncini, A.; Comminiello, D. - In: IEEE OPEN JOURNAL OF SIGNAL PROCESSING. - ISSN 2644-1322. - 5:(2024), pp. 1-9. [10.1109/OJSP.2024.3376297]

L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality

Gramaccioni R. F.; Marinoni C.; Chen C.; Uncini A.; Comminiello D.
2024

Abstract

The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research on machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical use: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. We maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition enriches the challenge, giving participants tools for exploring a combination of audio and images to solve the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. We support the dataset download and the use of the baseline models via extensive instructions provided on the official GitHub repository at https://github.com/l3das/L3DAS23. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge. For more comprehensive information and in-depth details about the challenge, we invite the reader to visit the L3DAS Project website at http://www.l3das.com/icassp2023.
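The abstract refers to first-order Ambisonics (FOA) recordings, the B-format representation used throughout the L3DAS challenges. As a minimal illustration of what a FOA signal looks like (this is standard Ambisonics theory, not code from the L3DAS23 API; the function name and conventions are assumptions for illustration), the sketch below encodes a mono plane-wave source into the four B-format channels under the common ACN channel order with SN3D normalization:

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics B-format.

    Uses ACN channel order [W, Y, Z, X] with SN3D normalization,
    for an ideal plane wave arriving from (azimuth, elevation),
    both given in radians (azimuth measured counterclockwise
    from the front, elevation upward from the horizontal plane).

    Returns an array of shape (4, n_samples).
    """
    gains = np.array([
        1.0,                                    # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),    # Y: left-right
        np.sin(elevation),                      # Z: up-down
        np.cos(azimuth) * np.cos(elevation),    # X: front-back
    ])
    # Broadcast the per-channel gains over the mono samples.
    return gains[:, None] * np.asarray(mono)[None, :]

# A source straight ahead (azimuth = 0, elevation = 0) excites
# only the W and X channels, with equal gain under SN3D.
signal = np.random.randn(1024)
foa = encode_foa(signal, azimuth=0.0, elevation=0.0)
```

For a frontal source the Y and Z channels are zero and W equals X, which is a quick sanity check when working with FOA data such as the L3DAS23 recordings.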
3D Audio; Ambisonics; Data Challenge; Location awareness; Microphones; Noise measurement; Sound Event Localization and Detection; Speech Enhancement; Task analysis; Three-dimensional displays
01 Journal publication::01a Journal article
Files attached to this record
File: Gramaccioni_L3DAS23_2024.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 2.88 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1714344
Citations
  • Scopus: 1