A two-stage neural network for speech signal reconstruction from Mel spectrograms / Villani, Filippo; Scarpiniti, Michele; Uncini, Aurelio. - (2025), pp. 267-278. - SMART INNOVATION, SYSTEMS AND TECHNOLOGIES. [10.1007/978-981-96-0994-9_25].
A two-stage neural network for speech signal reconstruction from Mel spectrograms
Villani, Filippo; Scarpiniti, Michele; Uncini, Aurelio
2025
Abstract
In this work, we propose a neural network approach for speech reconstruction from mel spectrograms, a crucial task for obtaining high-quality data after processing speech signals in the time-frequency domain. Specifically, we propose a two-stage deep learning approach based on an overcomplete deep autoencoder (DAE) for the mel filter bank inversion, coupled with the deep version of the Griffin-Lim algorithm (DeGLI) for phase recovery. After both parts of the architecture are pre-trained, the whole system is fine-tuned. Numerical results on the well-known TIMIT dataset demonstrate the effectiveness of the proposed idea: the system obtains a PESQ of 3.996, a STOI of 0.994, and a mean opinion score of 4.15.

| File | Size | Format | |
|---|---|---|---|
| Villani_Two-stage_2025.pdf (archive managers only). Note: Mel-Inversion_editoriale. Type: Publisher's version (published version with the publisher's layout). License: All rights reserved. | 286.63 kB | Adobe PDF | Contact the author |
| Villani_postprint-Two-stage_2025.pdf (archive managers only). Note: Mel-Inversion_Postprint. Type: Post-print document (version after peer review, accepted for publication). License: All rights reserved. | 364.09 kB | Adobe PDF | Contact the author |
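
The abstract names two stages: mel filter bank inversion and phase recovery. As a rough illustration of what each stage has to solve, the sketch below uses the classical, non-learned counterparts: a pseudo-inverse of the mel filter bank in place of the overcomplete DAE, and the plain Griffin-Lim iteration in place of DeGLI. This is not the authors' architecture. The test signal (two sinusoids), the parameter values (16 kHz, 512-point FFT, 80 mel bands), and all function names are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def stft(x, n_fft, hop):
    """Hann-windowed STFT without padding, shape (bins, frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(S, n_fft, hop):
    """Least-squares overlap-add inverse of stft()."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(S.T, n=n_fft, axis=1) * win
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    norm = np.zeros_like(out)
    for t in range(frames.shape[0]):
        out[t * hop : t * hop + n_fft] += frames[t]
        norm[t * hop : t * hop + n_fft] += win ** 2
    # The floor avoids blow-ups at the signal edges, where the window is ~0.
    return out / np.maximum(norm, 1e-3)

def griffin_lim(mag, n_fft, hop, n_iter=50, seed=0):
    """Classical Griffin-Lim: alternate between the given magnitude and a consistent phase."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# Assumed toy setup: one second of two sinusoids as a stand-in for speech.
sr, n_fft, hop, n_mels = 16000, 512, 128, 80
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

mag = np.abs(stft(x, n_fft, hop))        # linear-frequency magnitude
fb = mel_filterbank(n_mels, n_fft, sr)
mel = fb @ mag                           # mel spectrogram, (n_mels, frames)

# Stage 1 baseline: pseudo-inverse of the filter bank, clipped to be non-negative.
mag_hat = np.maximum(0.0, np.linalg.pinv(fb) @ mel)
# Stage 2 baseline: phase recovery from the estimated magnitude.
x_hat = griffin_lim(mag_hat, n_fft, hop)
```

The mel projection maps 257 frequency bins to 80 bands, so stage 1 is underdetermined and the pseudo-inverse only returns a minimum-norm estimate. Stage 2 starts from a random phase and can only approach a consistent spectrogram iteratively. These are the two gaps that the learned stages described in the abstract aim to close.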
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


