
A U-Net Based Architecture for Automatic Music Transcription / Scarpiniti, Michele; Sigismondi, Edoardo; Comminiello, Danilo; Uncini, Aurelio. - (2023), pp. 1-6. (Paper presented at the conference 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP 2023), held in Rome, Italy) [10.1109/MLSP55844.2023.10285985].

A U-Net Based Architecture for Automatic Music Transcription

Scarpiniti, Michele; Comminiello, Danilo; Uncini, Aurelio
2023

Abstract

Automatic Music Transcription (AMT) is a complex task that has attracted many researchers. Recently, thanks to powerful deep learning techniques, many effective solutions have been proposed; however, there is still room for improvement. To this end, in this paper we propose an architecture based on two U-Net models, exploiting Convolutional Neural Networks (CNNs) and a Bidirectional Long Short-Term Memory (BiLSTM) unit, aimed at improving wave-to-MIDI transcription performance. The two U-Nets act as onset and offset detectors, respectively, and their outputs are used jointly with the input mel spectrogram by a third model that finds all the active notes in each time frame. Numerical results obtained on the well-known MAPS dataset show the effectiveness of the proposed idea and its advantages over similar state-of-the-art approaches.
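The pipeline described in the abstract can be sketched in PyTorch. This is a minimal illustration under assumptions, not the paper's actual layers or sizes: `TinyUNet`, `NoteEstimator`, and all channel/hidden dimensions are hypothetical; only the overall flow (two U-Nets producing onset and offset posteriorgrams, fused with the mel spectrogram by a third model with a BiLSTM that predicts active notes per frame) follows the abstract.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-like block: one encoder level, one decoder level, skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Conv2d(1, ch, 3, padding=1)
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Conv2d(ch, ch, 3, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.dec = nn.Conv2d(2 * ch, 1, 3, padding=1)  # skip concat -> 2*ch input channels

    def forward(self, x):
        e = torch.relu(self.enc(x))
        m = torch.relu(self.mid(self.down(e)))
        u = self.up(m)
        return torch.sigmoid(self.dec(torch.cat([u, e], dim=1)))  # posteriorgram in [0, 1]

class NoteEstimator(nn.Module):
    """Fuses mel spectrogram + onset/offset maps; a BiLSTM over frames predicts active notes."""
    def __init__(self, n_mels=64, n_notes=88, hidden=32):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)  # 3 input channels: mel, onset, offset
        self.lstm = nn.LSTM(8 * n_mels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_notes)

    def forward(self, mel, onset, offset):
        x = torch.cat([mel, onset, offset], dim=1)      # (B, 3, T, n_mels)
        h = torch.relu(self.conv(x))                    # (B, 8, T, n_mels)
        B, C, T, F = h.shape
        h = h.permute(0, 2, 1, 3).reshape(B, T, C * F)  # one feature vector per time frame
        h, _ = self.lstm(h)                             # bidirectional context over frames
        return torch.sigmoid(self.head(h))              # (B, T, n_notes) note activations

mel = torch.randn(1, 1, 8, 64)  # (batch, channel, frames, mel bins)
onset = TinyUNet()(mel)         # onset detector U-Net
offset = TinyUNet()(mel)        # offset detector U-Net
notes = NoteEstimator()(mel, onset, offset)
print(notes.shape)              # (1, 8, 88): per-frame activations for 88 piano notes
```

Thresholding the per-frame note activations (and pairing them with the onset/offset posteriorgrams) would then yield the MIDI note events.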
2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP 2023)
Automatic Music Transcription (AMT); wave to MIDI; deep learning; U-Net; Convolutional Neural Network (CNN)
04 Conference proceedings publication::04b Conference paper in volume
Files attached to this record:
- File: Scarpiniti_A-U-Net_2023.pdf (access restricted to archive administrators; contact the author)
- Type: Publisher's version (published with the publisher's layout)
- License: All rights reserved
- Size: 397.8 kB
- Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1692643
Citations
  • Scopus: 0