ABSTRACT: In this paper, we define an F0 quantization scheme for a very low bit rate speech coder based on HMM (hidden Markov model). In the coding system, the encoder carries out phoneme recognition, and transmits phoneme indices, state durations and F0 information to the decoder. In the decoder, phoneme HMM are concatenated according to the phoneme indices, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM. Finally we obtain synthetic speech by using the MLSA (mel log spectrum approximation) filter according to the mel-cepstral coefficients and F0 information. In addition to the F0 quantization, we investigate encoding methods for other parameters to reduce the bit rate, yet keeping the subjective speech quality. A subjective listening test shows that the performance of the proposed coder at about 100∼150 bit/s is superior to a VQ-based vocoder at 600 bit/s (mel-cepstrum: 6 bit/frame×50 frame/s, F0: 6 bit/frame×50 frame/s).
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on; 05/2003 · 4.63 Impact Factor