Robust features for speech recognition based on admissible wavelet packets

Dept. of Electron. & Electr. Eng., Loughborough Univ. of Technol., UK
Electronics Letters (Impact Factor: 1.04). 01/2002; DOI:10.1049/el:20011029
Source: IEEE Xplore

ABSTRACT A six-band filter structure, derived using admissible wavelet
packets, is proposed for extracting features for the recognition of
noisy speech. A simple compensation for white Gaussian noise is carried
out, and the recognition performance is compared with that of features
based on Mel scale cepstral coefficients (MFCC) and a 24-band
admissible wavelet packet filter structure.
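
As a rough illustration of the idea in the abstract, the sketch below decomposes one speech frame along a pruned ("admissible") wavelet packet tree, in which every node has either zero or two children, and returns one log energy per leaf band. Everything specific here is an assumption made for illustration: the Haar filters, the particular six-leaf tree `SIX_BAND_TREE`, and the function names. The abstract does not give the actual band layout, the wavelet, or the noise compensation, so none of those are reproduced.

```python
import numpy as np

def haar_split(x):
    """One level of an orthonormal Haar filter bank: returns (low band, high band)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # zero-pad odd-length input to even length
        x = np.append(x, 0.0)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def admissible_wp_leaves(x, tree):
    """Decompose x along a pruned binary tree and return the leaf sub-bands.

    `tree` is None for a leaf, or a (low_subtree, high_subtree) pair,
    so each node has either zero or two children.
    """
    if tree is None:
        return [np.asarray(x, dtype=float)]
    low, high = haar_split(x)
    return admissible_wp_leaves(low, tree[0]) + admissible_wp_leaves(high, tree[1])

def log_subband_energies(frame, tree, eps=1e-12):
    """Log energy of each leaf band (leaves are in tree order, not strict frequency order)."""
    leaves = admissible_wp_leaves(frame, tree)
    return np.array([np.log(np.sum(band ** 2) + eps) for band in leaves])

# Hypothetical six-leaf tree: finer splits at low frequencies, one coarse high band.
SIX_BAND_TREE = ((((None, None), None), (None, None)), None)

frame = np.random.default_rng(0).standard_normal(256)   # stand-in for a speech frame
features = log_subband_energies(frame, SIX_BAND_TREE)
print(features.shape)
```

Because the Haar split is orthonormal, the leaf energies sum to the frame energy when no padding is needed, which gives a quick sanity check on any tree you try.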

  • ABSTRACT: A robust feature extraction technique using the Teager Energy Operator (TEO) for Isolated Word Recognition (IWR) is proposed in this paper. The feature extraction algorithm is motivated by the enhanced discrimination capability of the TEO, which estimates the true energy of the source of a resonance. Robustness is further improved by applying Cepstral Mean Normalization (CMN) to the estimated features. The robust features are computed from the speech signal of a given frame through a series of steps. First, the short-time spectrum of each frame of the speech signal is calculated. Second, the frame spectrum is passed through a Mel-scaled triangular filter bank. Then, the average of the absolute values of the sequence obtained by applying the TEO to each filter output is estimated. Finally, the cepstral coefficients are extracted by applying the discrete cosine transform to the estimated averages. These coefficients are further normalized using CMN to give the final features, denoted Normalized Teager Energy Coefficient (NTEC) features. The effectiveness of this technique has been tested on the TI-20 isolated word database in the presence of white noise. The experimental results show the superiority of the proposed technique over conventional MFCC, Spectral Subtraction (SS) and CMN methods.
    Advances in Computing, Control, and Telecommunication Technologies, International Conference on. 01/2009;
  • ABSTRACT: Unilateral vocal fold paralysis (UVFP) is one of the most severe types of neurogenic laryngeal disorder, in which patients face serious problems because of vocal cord malfunction. As the effect of such pathologies is clearly evident in the reduced quality and feature variation of dysphonic voices, this study examines the piecewise variation of two specific features, energy and entropy, over the whole frequency range of pathological speech signals. To do so, the wavelet-packet coefficients, in five consecutive levels of decomposition, are used to extract the energy and entropy measures at different spectral sub-bands. As the decomposition procedure leads to a set of high-dimensional feature vectors, a genetic algorithm is invoked to search for a group of optimal sub-band indexes for which the extracted features give the highest recognition rate when classifying pathological and normal subjects. The results of our simulations, using a support vector machine classifier, show that the highest recognition rate, for both optimized energy and entropy measures, is achieved at the fifth level of wavelet-packet decomposition. It is also found that the entropy feature, with a highest recognition rate of 100% vs. 93.62% for energy, is more effective in discriminating patients with UVFP from normal subjects. Therefore, the entropy feature, in comparison with energy, gives a more efficient description of such pathological voices and provides a valuable tool for clinical diagnosis of unilateral laryngeal paralysis.
    Computers in Biology and Medicine 05/2007; 37(4):474-85. · 1.16 Impact Factor
  • ABSTRACT: Most existing classification methods used for voice pathology assessment are built on labeled pathological and normal voice signals. This paper studies the problem of building a classifier using both labeled and unlabeled data. We propose a novel learning technique, called Partitioning and Biased Support Vector Machine Classification (PBSVM), which tries to utilize all the available data in two steps: (1) a new heuristic partition-based algorithm, which extracts high-quality pathological and normal samples from an unlabeled set, and (2) a more principled approach based on a biased formulation of the support vector machine, which is fairly robust to mislabeling and unbalanced-data problems. Experiments with wavelet-based energy features extracted from sustained vowels show that the new recognition scheme is highly feasible and significantly outperforms the baseline classical SVM classifier, especially when the labeled training set is small.
    Expert Syst. Appl. 01/2011; 38:610-619.
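
The NTEC steps listed in the first abstract above (short-time spectrum, Mel-scaled triangular filter bank, TEO on each filter output, average of absolute values, DCT, then CMN) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the frame length, hop, sampling rate, filter and coefficient counts, the Hamming window, and all function names are illustrative choices. It also assumes that "filter output" means the filter-weighted magnitude spectrum taken as a sequence over frequency bins, and it applies the DCT directly to the averages because the abstract mentions no log compression.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with edges equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def teager(x):
    """Discrete Teager energy operator along the last axis: x[n]^2 - x[n-1] * x[n+1]."""
    return x[..., 1:-1] ** 2 - x[..., :-2] * x[..., 2:]

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2.0 * n))
    return x @ basis.T

def ntec(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    """Illustrative NTEC-style features; all parameter defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))           # step 1: short-time spectrum
    fb = mel_filterbank(n_filters, frame_len, sr)
    outputs = spectrum[:, None, :] * fb[None, :, :]          # step 2: per-filter outputs
    avg = np.mean(np.abs(teager(outputs)), axis=-1)          # step 3: mean |TEO| per filter
    ceps = dct_ii(avg, n_coeffs)                             # step 4: cepstral coefficients
    return ceps - ceps.mean(axis=0, keepdims=True)           # CMN over the utterance

noise = np.random.default_rng(0).standard_normal(4000)       # stand-in for a recorded word
feats = ntec(noise)
print(feats.shape)
```

The output has one row per frame and one column per coefficient; after CMN each column has zero mean over the utterance.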