From source separation to compositional music generation / Postolache, Emilian. - (2024 May 29).

From source separation to compositional music generation

POSTOLACHE, EMILIAN
29/05/2024

Abstract

This thesis proposes a journey into sound processing through deep learning, particularly generative models, exploring the compositional structure of sound, in which different layered sources combine into the final auditory experience. The first part of the thesis focuses on the problem of separating sources from mixtures, first using a deterministic separator trained with adversarial losses in a permutation-invariant manner, and then casting separation as Bayesian inference with latent autoregressive models. The second half of the thesis turns to the continuous (as opposed to symbolic) musical setting, where the sources that compose a track are interdependent. By modeling this interdependence probabilistically, we develop diffusion models that process the stems of a track compositionally, not only separating them but also generating them conditionally (accompaniments). We then generalize these models to text-conditioned diffusion models that require no supervised data. The thesis concludes by discussing possible developments in the compositional generation of audio.
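
To make the permutation-invariant training mentioned in the abstract concrete, below is a minimal sketch of a permutation-invariant reconstruction loss in PyTorch. The function name and the tensor layout (batch, sources, time) are assumptions for illustration, and plain mean-squared error stands in for the adversarial losses actually used in the thesis.

import itertools
import torch

def permutation_invariant_mse(estimates, targets):
    """Minimum MSE over all alignments of estimated to reference sources.

    estimates, targets: tensors of shape (batch, n_sources, time).
    Hypothetical helper for illustration only.
    """
    n_sources = targets.shape[1]
    # Per-example loss for every possible source ordering: (n_perms, batch).
    losses = torch.stack([
        ((estimates[:, list(perm)] - targets) ** 2).mean(dim=(1, 2))
        for perm in itertools.permutations(range(n_sources))
    ])
    # Pick the best alignment independently for each example, then average.
    return losses.min(dim=0).values.mean()

For two sources this reduces to taking the better of the two possible orderings; the number of permutations grows factorially, so this brute-force form is practical only for a handful of sources.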
Files attached to this item

File: Tesi_dottorato_Postolache.pdf
Access: open access
Note: Complete thesis
Type: Doctoral thesis
License: All rights reserved
Size: 4.77 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1711506