The authors deal with the extension of the hidden control neural network (HCNN) architecture to the ergodic case, i.e., if all the control state sequences are allowed. This scheme gives a deeper understanding of the modeling capabilities offered by the HCNN formalism. In fact, the control input binary digits status can be considered as the presence/absence of a posteriori defined binary phonetic features, forcing the network to produce a low prediction error on pairs of speech frames. Major improvements of the technique have been found after normalization of the output vector components by the prediction error standard deviations. Other improvements arise from the extension to a second order prediction, and an appropriate pruning of the allowed control states transition matrix. Rewiring of the original architecture as a recurrent network allows for the resynthesis of smooth spectral trajectories, once the recurrent network is fed by the optimal control sequence found by dynamic programming when matching real speech against the HCNN control input.

This work deals with the extension of the Hidden Control Neural Network (HCNN) architecture to the ergodic case, i.e. if all the control state sequences are allowed. This scheme gives a deeper understanding of the modelling capabilities offered by the HCNN formalism. In fact, the control input binary digits status can be considered as the presence/absence of a posteriori defined binary phonetic features, forcing the network to produce a low prediction error on pairs of speech frames. Major improvements of the technique have been found after normalisation of the output vector components by the prediction error standard deviations. Other improvements arise from the extension to a second order prediction, and an appropriate pruning of the allowed control states transition matrix. Re-wiring of the original architecture as a recurrent network allows for the re-synthesis of smooth spectral trajectories, once the recurrent network is fed by the optimal control sequence found by dynamic programming when matching real speech against the HCNN control input.
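The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained network is replaced here by a toy linear predictor, the mapping from an integer control state to control bits is a hypothetical encoding, and all names (`control_bits`, `predict`, `decode_controls`, `resynthesize`, `W`, `sigma`, `allowed`) are assumptions made for illustration. The sketch shows a first-order prediction only: dynamic programming picks the control sequence that minimises the prediction error normalised by per-component standard deviations, an optional boolean matrix prunes the allowed transitions, and a closed-loop run feeds each prediction back as the next input.

```python
import numpy as np

def control_bits(state, n_bits):
    """Binary control vector for an integer state (hypothetical encoding)."""
    return np.array([(state >> b) & 1 for b in range(n_bits)], dtype=float)

def predict(W, frame, state, n_bits):
    """Toy stand-in for the HCNN: linear map of [frame, control bits, bias]."""
    x = np.concatenate([frame, control_bits(state, n_bits), [1.0]])
    return W @ x

def decode_controls(frames, W, n_bits, sigma, allowed=None):
    """Dynamic programming over control states.

    Local cost of state s at time t is the squared error between
    frames[t + 1] and the prediction from frames[t], with each output
    component divided by sigma. allowed[i, j] is True when the
    transition i -> j is permitted; None means ergodic (all allowed).
    """
    n_states = 2 ** n_bits
    T = len(frames) - 1  # number of (input, target) frame pairs
    if allowed is None:
        allowed = np.ones((n_states, n_states), dtype=bool)
    local = np.empty((T, n_states))
    for t in range(T):
        for s in range(n_states):
            err = (frames[t + 1] - predict(W, frames[t], s, n_bits)) / sigma
            local[t, s] = err @ err
    penalty = np.where(allowed, 0.0, np.inf)  # forbid pruned transitions
    acc = local[0].copy()
    back = np.zeros((T, n_states), dtype=int)
    for t in range(1, T):
        total = acc[:, None] + penalty  # total[i, j]: reach j coming from i
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(n_states)] + local[t]
    path = [int(np.argmin(acc))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1], float(np.min(acc))

def resynthesize(first_frame, path, W, n_bits):
    """Closed-loop run: each prediction is fed back as the next input."""
    out = [np.asarray(first_frame, dtype=float)]
    for s in path:
        out.append(predict(W, out[-1], s, n_bits))
    return np.stack(out)
```

In this sketch `sigma` would be estimated from the prediction residuals on training data, and `allowed` would come from whatever pruning criterion is adopted; neither is specified in the abstract.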

Ergodic hidden control neural network for modelling of the speech process / Baldassarra, A.; Martinelli, G.; Ricotti, L. P.; Falaschi, A. - Print. - 1:(1993), pp. 605-608. (Paper presented at the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, held in Minneapolis, MN, USA, 27-30 April 1993) [10.1109/ICASSP.1993.319191].

Ergodic hidden control neural network for modelling of the speech process

G. Martinelli; A. Falaschi
1993

ISBN: 0780309464


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/494380
