Blind equalization techniques for ETSI standard DSR front-end
ABSTRACT We present blind equalization techniques for the ETSI standard distributed speech recognition (DSR) front-end that compensate for acoustic mismatch caused by input devices. The DSR front-end employs vector quantization (VQ) for feature parameter compression, so the mismatch not only shifts the parameters but also increases VQ distortion. Although cepstral mean subtraction (CMS) is one of the most effective methods to compensate for the shift, it cannot decrease VQ distortion in DSR. To compensate for the shift and decrease VQ distortion simultaneously, the proposed methods estimate the shift in the input data necessary to match the VQ codebook distribution. The methods do not need the acoustic likelihood, which is calculated in a decoder on the server side, and are therefore applicable to the DSR front-end. The Japanese Newspaper Article Sentences (JNAS) database was used for the equalization experiments. While the word error rate (WER) for the ETSI standard DSR front-end was 18.6% under acoustically mismatched conditions, our proposed method yielded a rate of 12.3%.
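The abstract states only that the shift is estimated so that the input data match the VQ codebook distribution; it does not give the estimation procedure. As a rough illustration of that idea, the sketch below alternates between assigning each shifted frame to its nearest codeword and re-estimating the shift as the mean residual. This alternating scheme, the toy 2-D codebook, the synthetic frames, and all function names are assumptions made for illustration, not the authors' algorithm or the ETSI codebook.

```python
import random

def nearest(codebook, x):
    """Return the codeword closest to x in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, c)))

def vq_distortion(codebook, frames):
    """Mean squared distance between each frame and its nearest codeword."""
    total = 0.0
    for x in frames:
        c = nearest(codebook, x)
        total += sum((a - b) ** 2 for a, b in zip(x, c))
    return total / len(frames)

def estimate_shift(codebook, frames, iterations=10):
    """Estimate a constant shift without any decoder likelihood (hypothetical helper).

    Each pass quantizes the currently compensated frames, then moves the
    shift by the mean quantization residual. With assignments held fixed,
    this is the least-squares shift, so VQ distortion does not increase.
    """
    dim = len(frames[0])
    shift = [0.0] * dim
    for _ in range(iterations):
        residual = [0.0] * dim
        for x in frames:
            y = [a - s for a, s in zip(x, shift)]
            c = nearest(codebook, y)
            for d in range(dim):
                residual[d] += y[d] - c[d]
        shift = [s + r / len(frames) for s, r in zip(shift, residual)]
    return shift

# Toy data (assumed): codewords plus a constant device bias and small noise.
random.seed(0)
codebook = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
true_bias = (0.4, -0.3)
frames = []
for _ in range(200):
    c = random.choice(codebook)
    frames.append(tuple(ci + bi + random.gauss(0.0, 0.05)
                        for ci, bi in zip(c, true_bias)))

shift = estimate_shift(codebook, frames)
compensated = [tuple(a - s for a, s in zip(x, shift)) for x in frames]
print("estimated shift:", shift)
print("distortion before:", vq_distortion(codebook, frames))
print("distortion after: ", vq_distortion(codebook, compensated))
```

Unlike plain mean subtraction, which centers the data on zero regardless of the codebook, this update uses only the codebook and the incoming frames, so it could in principle run on the terminal side. With a larger mismatch the initial nearest-codeword assignments may be wrong, and the iteration may settle on a poor local solution.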
ABSTRACT: This paper reports an evaluation of the European Telecommunications Standards Institute (ETSI) standard Distributed Speech Recognition (DSR) front-end through continuous word recognition on a Japanese speech corpus and proposes a method, the Bias Removal Method (BRM), that reduces the distortion between the feature vector and the VQ codebook. Experimental results show that using non-quantized features in the acoustic model training procedure can improve the recognition performance of DSR front-end features, and that the proposed method can further improve it.
7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002; 01/2002
ABSTRACT: Speech coding affects speech recognition performance, with recognition accuracy deteriorating as the coded bit rate decreases. Virtually all systems that recognize coded speech reconstruct the speech waveform from the coded parameters, and then perform recognition (after possible noise and/or channel compensation) using conventional techniques. In this paper we compare the recognition accuracy of coded speech obtained by reconstructing the speech waveform with the speech recognition accuracy obtained when using cepstral features derived from the coding parameters. We focus our efforts on speech that has been coded using the 13-kbps full-rate GSM codec, a Regular Pulse Excited Long Term Prediction (RPE-LTP) codec. The GSM codec develops separate representations for the linear prediction (LPC) filter and the residual signal components of the coded speech. We measure the effects of quantization and coding on the accuracy with which these parameters are represented, and present two differe...
Conference Proceeding: Effect of speech coders on speech recognition performance. The 4th International Conference on Spoken Language Processing, Philadelphia, PA, USA, October 3-6, 1996; 01/1996