-
ABSTRACT: In this paper we report a speaker adaptation method for the case
where only a small subset of word classes is available for adaptation. A spectral
transformation approach is used to adapt to a new speaker without
changing the parameters of the speaker-independent recognizer. By using a
“judge” network, the degradation of the recognition rates for
non-adapted word classes is minimized, which leads to an improvement in
overall word recognition rates. The pruned judge network uses fewer
parameters, but shows better generalization capability than fully
connected linear judge networks. A remarkable reduction of error rates is
achieved for the 10 adapted words, while maintaining almost the same recognition
rates for the non-adapted words. The results demonstrate much better
recognition rates compared to the baseline system.
Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on; 02/2000
-
ABSTRACT: The zero-crossings with peak amplitudes (ZCPA) model, motivated by the human auditory periphery, is simple compared with other auditory models, yet it is a powerful speech analysis tool for robust speech recognition in noisy environments. In this paper, the recognition rate of the ZCPA model is improved by incorporating time-derivative features with several different time-derivative window lengths. Experimental results show that ZCPA is more sensitive to the derivative window length than conventional feature extraction algorithms. Also, experimental comparisons with several front-ends, including some auditory-like schemes, in real-world noisy environments demonstrate the robustness of the ZCPA model. The ZCPA model shows superior performance compared with other front-ends, especially in noisy conditions corrupted by white Gaussian noise.
01/2000;
-
ABSTRACT: The "Voice Command" system, designed for isolated word recognition tasks in real-world noisy environments, was implemented on a fixed-point DSP board to operate in real time. A simple auditory model, the zero-crossings with peak amplitudes (ZCPA) model, is used for noise-robust feature extraction, and a neural network classifier recognizes the input patterns. The system performance is further improved by incorporating speaker adaptation and out-of-vocabulary word rejection capabilities. The radial basis function (RBF) classifier provides better rejection performance than multi-layer perceptron (MLP) classifiers.
01/2000;
-
ABSTRACT: The Zero-Crossings with Peak Amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, some practical considerations for digital hardware implementations of the ZCPA model are addressed and evaluated for recognition of speech corrupted by several real-world noises as well as white Gaussian noise. The infinite impulse response (IIR) filters that constitute the cochlear filterbank of the ZCPA are replaced by Hamming bandpass filters, whose frequency responses are less similar to biological neural tuning curves. Experimental results demonstrate that the detailed frequency response of the cochlear filters is not critical to the performance. Also, the sensitivity of the model output to variations in microphone gain is investigated, and the results indicate good reliability of the ZCPA model.
01/2000;
-
ABSTRACT: The zero-crossings with peak amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by incorporating conventional speech processing techniques into the model output. Spectral and cepstral representations of the ZCPA model output are compared, and the incorporation of dynamic features with several different lengths of time-derivative window is evaluated. Also, comparative evaluations with other front-ends in real-world noisy environments are performed, and the results show the superiority of the ZCPA model.
01/2000;
-
ABSTRACT: The Ensemble Interval Histogram (EIH) is an auditory model which can be used as a robust "front-end" for speech recognition systems. The utilization of multiple level-crossing detectors in the EIH provides frequency and intensity information, which may be useful for speech processing. Proper determination of the number of levels and the level values is very important for reliable performance of the system. In this paper, an analytic relationship is developed for the variance and SNR of the level-crossing intervals as a function of the crossing level value, and a new feature extraction method based on zero-crossings with peak amplitudes is proposed for robust speech recognition in noisy environments. The proposed method not only preserves intensity information, but is also robust to noise in estimating frequency information, without the effort of determining the level values and the number of levels. Experimental results show the robustness of the proposed method.
01/2000;
-
Inf. Sci. 01/2000; 123:13-24.
-
The Fifth International Conference on Neural Information Processing, ICONIP'R98, Kitakyushu, Japan, October 21-23, 1998, Proceedings; 01/1998
-
ABSTRACT: A high-power class A amplifier has excellent fidelity but
dissipates too much power. A class D amplifier has high efficiency but
shows poor fidelity. This paper proposes a combination of a high-fidelity
class A power amplifier with a class D power amplifier as a
variable power supply. This amplifier, named class I, has the merits of
both the class A and class D amplifiers. The efficiency is seventy-seven
percent at full power rating. The distortion of the proposed amplifier
is about the same as that of a class A amplifier. The measured 3 dB
bandwidth is from 10 Hz to 100 kHz.
Power Electronics Specialists Conference, 1997. PESC '97 Record., 28th Annual IEEE; 07/1997
-
Consumer Electronics, 1997. Digest of Technical Papers. ICCE., International Conference on; 07/1997
-
Fifth European Conference on Speech Communication and Technology, EUROSPEECH 1997, Rhodes, Greece, September 22-25, 1997; 01/1997
-
ABSTRACT: The ensemble interval histogram (EIH) is an auditory model which can be used as a robust “front-end” for speech recognition systems. The utilization of multiple level-crossing detectors in the EIH provides frequency and intensity information, which may be useful for speech processing. Proper determination of the number of levels and the level values is very important for reliable performance of the system. An analytic relationship is developed for the variance and SNR of the level-crossing intervals as a function of the crossing level value, and a new feature extraction method based on zero-crossings with peak amplitudes is proposed for robust speech recognition in noisy environments. The proposed method not only preserves intensity information, but is also robust to noise in estimating the frequency information, without the need to determine the level values and the number of levels. Experimental results show the robustness of the proposed method.
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on; 06/1996