A finite state Markov quantizer for speech coding / Falaschi, Alessandro; Giustiniani, M.; Pierucci, P. - PRINT. - (1990), pp. 205-208. (Paper presented at the International Conference on Acoustics, Speech, and Signal Processing, 1990. ICASSP-90, held in Albuquerque, NM) [10.1109/ICASSP.1990.115574].

A finite state Markov quantizer for speech coding

Falaschi, Alessandro; Giustiniani, M.; Pierucci, P.
1990

Abstract

A low-bit-rate codec based on an ergodic hidden Markov model (EHMM) is presented. A 256-state autoregressive Gaussian EHMM is trained with the Baum-Welch algorithm on speech uttered by eight different speakers. Initial estimates are obtained from vector quantization. The resulting EHMM is then used for Viterbi decoding of incoming speech data, and the state sequence obtained is encoded frame-synchronously. The bit rate is gradually lowered by cutting off low-probability transitions, which reduces the number of bits needed to encode the destination state. On the receiver side of the codec, the encoded sequence of spectra is used for linear predictive coding synthesis. Global entropy and distortion measures for different bit rates are reported and compared to vector quantization results. Informal listening tests compared the proposed method at various bit rates with vector quantization on the same material.
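
The transition-pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fixed number of successors kept per state (`keep`), and the fixed-length index code are assumptions made for illustration. Emission log-likelihoods are taken as given inputs rather than computed from an autoregressive Gaussian model.

```python
import math

def prune_transitions(trans, keep):
    """Keep the `keep` most probable destinations of each state and renormalize.

    Assumes every row has positive probability mass among its top entries.
    Returns one dict {destination: probability} per source state.
    """
    pruned = []
    for row in trans:
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:keep]
        total = sum(row[j] for j in top)
        pruned.append({j: row[j] / total for j in sorted(top)})
    return pruned

def viterbi(log_obs, pruned, log_init):
    """Most likely state path using only the surviving transitions.

    log_obs[t][s] is the log-likelihood of frame t under state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_obs)):
        new_score = [-math.inf] * n_states
        ptr = [0] * n_states
        for src in range(n_states):
            if score[src] == -math.inf:
                continue  # unreachable state, nothing to extend
            for dst, p in pruned[src].items():
                if p <= 0.0:
                    continue  # a zero-probability transition is never taken
                cand = score[src] + math.log(p)
                if cand > new_score[dst]:
                    new_score[dst] = cand
                    ptr[dst] = src
        score = [new_score[s] + log_obs[t][s] for s in range(n_states)]
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def encode_path(path, pruned, n_states):
    """Code the first state at full width, then each destination state as its
    index among the allowed successors of the previous state."""
    symbols = [path[0]]
    bits = math.ceil(math.log2(n_states))
    for prev, cur in zip(path, path[1:]):
        successors = sorted(pruned[prev])
        symbols.append(successors.index(cur))
        bits += math.ceil(math.log2(len(successors)))
    return symbols, bits

def decode_path(symbols, pruned):
    """Receiver side: rebuild the state path from the successor indices."""
    path = [symbols[0]]
    for sym in symbols[1:]:
        path.append(sorted(pruned[path[-1]])[sym])
    return path
```

Under these assumptions, a 256-state model needs 8 bits per frame when any destination state is allowed, and 4 bits per frame when each state keeps only 16 successors. The bit allocations actually used in the paper are not stated in this record.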
International Conference on Acoustics, Speech, and Signal Processing, 1990. ICASSP-90.
ergodic hidden Markov model; vector quantization; speech coding
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Falaschi_Finite-state-Markov_1990.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 450.02 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/494473