From source separation to compositional music generation / Postolache, Emilian. - (2024 May 29).
From source separation to compositional music generation
POSTOLACHE, EMILIAN
29/05/2024
Abstract
This thesis explores sound processing through deep learning, particularly generative models, with a focus on the compositional structure of sound: the final auditory experience is layered from different sources. The first part addresses the problem of separating sources from mixtures, first with a deterministic separator trained via adversarial losses in a permutation-invariant manner, and then in a Bayesian inference setting using latent autoregressive models. The second half focuses on the continuous (as opposed to symbolic) musical setting, where the sources that compose the sound are interdependent. By modeling this interdependence probabilistically, we develop diffusion models that allow for compositional processing of the different stems in a track: they not only separate the stems but also generate them conditionally (accompaniments). Subsequently, we generalize these models to text-conditioned diffusion models without requiring supervised data. We conclude by discussing possible developments in the compositional generation of audio.

File | Size | Format
---|---|---
Tesi_dottorato_Postolache.pdf | 4.77 MB | Adobe PDF

Access: open access
Note: Complete thesis
Type: Doctoral thesis
License: All rights reserved
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.