L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality / Gramaccioni, R. F.; Marinoni, C.; Chen, C.; Uncini, A.; Comminiello, D. - In: IEEE OPEN JOURNAL OF SIGNAL PROCESSING. - ISSN 2644-1322. - 5:(2024), pp. 1-9. [10.1109/OJSP.2024.3376297]

L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality

Gramaccioni R. F.; Marinoni C.; Chen C.; Uncini A.; Comminiello D.
2024

Abstract

The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research on machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical use: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. We maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition enriches the challenge, giving participants tools for exploring a combination of audio and images to solve the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. We support the dataset download and the use of the baseline models via extensive instructions provided on the official GitHub repository at https://github.com/l3das/L3DAS23. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge. For more comprehensive information and in-depth details about the challenge, we invite the reader to visit the L3DAS Project website at http://www.l3das.com/icassp2023.
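The abstract refers to first-order Ambisonics (FOA) recordings, the B-format representation used throughout the L3DAS challenges. As a minimal illustration of what a FOA signal looks like (this is standard Ambisonics theory, not code from the L3DAS23 API; the function name and conventions are assumptions for illustration), the sketch below encodes a mono plane-wave source into the four B-format channels under the common ACN channel order with SN3D normalization:

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics B-format.

    Uses ACN channel order [W, Y, Z, X] with SN3D normalization,
    for an ideal plane wave arriving from (azimuth, elevation),
    both given in radians (azimuth measured counterclockwise
    from the front, elevation upward from the horizontal plane).

    Returns an array of shape (4, n_samples).
    """
    gains = np.array([
        1.0,                                    # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),    # Y: left-right
        np.sin(elevation),                      # Z: up-down
        np.cos(azimuth) * np.cos(elevation),    # X: front-back
    ])
    # Broadcast the per-channel gains over the mono samples.
    return gains[:, None] * np.asarray(mono)[None, :]

# A source straight ahead (azimuth = 0, elevation = 0) excites
# only the W and X channels, with equal gain under SN3D.
signal = np.random.randn(1024)
foa = encode_foa(signal, azimuth=0.0, elevation=0.0)
```

For a frontal source the Y and Z channels are zero and W equals X, which is a quick sanity check when working with FOA data such as the L3DAS23 recordings.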
3D Audio; Ambisonics; Data Challenge; Location awareness; Microphones; Noise measurement; Sound Event Localization and Detection; Speech Enhancement; Task analysis; Three-dimensional displays
01 Journal publication::01a Journal article
Files attached to this record
File: Gramaccioni_L3DAS23_2024.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 2.88 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1714344
Citations
  • Scopus: 1