Robust features for speech recognition based on admissible wavelet packets

Dept. of Electron. & Electr. Eng., Loughborough Univ. of Technol., UK
Electronics Letters (Impact Factor: 1.07). 01/2002; 37(25):1554 - 1556. DOI: 10.1049/el:20011029
Source: IEEE Xplore

ABSTRACT A six-band filter structure, derived using admissible wavelet
packets, is proposed for extracting features for the recognition of
noisy speech. A simple compensation for white Gaussian noise is carried
out, and the recognition performance is compared with that of features
based on Mel scale cepstral coefficients (MFCC) and a 24-band
admissible wavelet packet filter structure.
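
As a rough illustration of what a feature extractor built on an admissible wavelet packet tree can look like, the sketch below decomposes a frame along a binary tree in which every node has either zero or two children and returns one log energy per leaf band. The Haar filters, the particular six-leaf tree `SIX_BAND_TREE`, and all function names are assumptions made for illustration; the abstract does not specify the filters or band layout used in the paper, and the noise compensation step is not shown.

```python
import math


def haar_split(signal):
    """One Haar analysis step: returns (low, high), each half the input length."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [s * (a + b) for a, b in pairs]
    high = [s * (a - b) for a, b in pairs]
    return low, high


def packet_leaves(signal, tree):
    """Decompose along an admissible tree.

    A tree is either None (a leaf, no further split) or a pair
    (low_subtree, high_subtree), so every node has 0 or 2 children.
    """
    if tree is None:
        return [signal]
    low, high = haar_split(signal)
    return packet_leaves(low, tree[0]) + packet_leaves(high, tree[1])


def log_band_energies(signal, tree, floor=1e-12):
    """One log-energy feature per leaf band (floor avoids log(0))."""
    return [math.log(sum(x * x for x in band) + floor)
            for band in packet_leaves(signal, tree)]


# Hypothetical six-leaf tree (an assumption, not the paper's structure):
# the low branch is split more finely than the high branch.
SIX_BAND_TREE = ((((None, None), None), (None, None)), None)

# Demo frame: length must be divisible by 2**depth (here 16).
frame = [math.sin(0.3 * n) + 0.5 * math.cos(1.7 * n) for n in range(64)]
features = log_band_energies(frame, SIX_BAND_TREE)
print(len(features))
```

Because the Haar step is orthonormal, the leaf energies sum to the energy of the input frame, which gives a convenient sanity check on any tree chosen this way.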

  • ABSTRACT: This paper proposes a robust feature extraction technique for Isolated Word Recognition (IWR) based on the Teager Energy Operator (TEO). The algorithm is motivated by the enhanced discrimination capability of the TEO, which estimates the true energy of the source of a resonance. Robustness is further improved by applying Cepstral Mean Normalization (CMN) to the estimated features. The features are computed from each frame of the speech signal in a series of steps. First, the short-time spectrum of the frame is calculated. Second, the frame spectrum is passed through a Mel-scaled triangular filter bank. Third, the TEO is applied to each filter output and the average of the absolute values of the resulting sequence is estimated. Finally, cepstral coefficients are extracted by applying the discrete cosine transform to the estimated averages. These coefficients are then normalized using CMN to give the final features, denoted Normalized Teager Energy Coefficient (NTEC) features. The technique was tested on the TI-20 isolated word database in the presence of white noise. The experimental results show that the proposed technique outperforms the conventional MFCC, Spectral Subtraction (SS) and CMN methods.
  • ABSTRACT: Most existing classification methods for voice pathology assessment are built from labeled pathological and normal voice signals. This paper studies the problem of building a classifier from both labeled and unlabeled data. We propose a novel learning technique, called Partitioning and Biased Support Vector Machine Classification (PBSVM), which tries to use all the available data in two steps: (1) a new heuristic partition-based algorithm, which extracts high-quality pathological and normal samples from an unlabeled set, and (2) a more principled approach based on a biased formulation of the support vector machine, which is fairly robust to mislabeling and to unbalanced data. Experiments with wavelet-based energy features extracted from sustained vowels show that the new recognition scheme is highly feasible and significantly outperforms the baseline classical SVM classifier, especially when the labeled training set is small.
    Expert Systems with Applications 01/2011; 38(1):610-619. DOI:10.1016/j.eswa.2010.07.010 · 1.97 Impact Factor
  • ABSTRACT: Vocal fold edema, nodules, and polyps are among the common types of benign vocal fold lesions that can affect the quality of a patient's speech signal. This paper proposes a non-invasive method that discriminates among these three types of vocal fold disease, classifying each into its corresponding group of vocal fold inflammation by processing the patient's speech signal. Experiments based on two feature extraction methods, wavelet packet sub-bands and Mel-frequency-scaled filter banks, are carried out on 83 voiced signals uttered by individuals of both sexes, aged 19 to 81, each suffering from one of these three kinds of vocal fold swelling. Because the similarity of the three disorders leads to highly correlated extracted features across classes, a genetic algorithm is applied to find the most separable feature vector indices. Classification with a support vector machine as a nonlinear classifier showed that entropy-based feature vectors, as an expression of vocal fold irregularities, extracted from certain wavelet packet sub-bands give the best classification rate, 91.18%, for these three classes of vocal fold pathology.
    International Conference on Speech and Computer; 09/2005
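
The NTEC steps listed in the first abstract of the list above (short-time spectrum, Mel-scaled triangular filter bank, TEO on each filter output, average of absolute values, DCT, then CMN) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the "filter output" is taken to be the magnitude spectrum weighted by one triangular filter, the DCT is applied directly to the averages as the abstract words it, and the filter count, coefficient count, and all function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (num_bins - 1) for k in range(num_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([(f - lo) / (mid - lo) if lo < f <= mid
                     else (hi - f) / (hi - mid) if mid < f < hi
                     else 0.0
                     for f in bin_hz])
    return bank


def teager(seq):
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [seq[n] ** 2 - seq[n - 1] * seq[n + 1] for n in range(1, len(seq) - 1)]


def dct2(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(num_coeffs)]


def ntec_frame(frame, sample_rate, num_filters=8, num_coeffs=5):
    """Per-frame coefficients before normalization (parameter values are assumptions)."""
    spectrum = magnitude_spectrum(frame)
    averages = []
    for weights in mel_filterbank(num_filters, len(spectrum), sample_rate):
        band = [w * s for w, s in zip(weights, spectrum)]  # assumed "filter output"
        psi = teager(band)
        averages.append(sum(abs(p) for p in psi) / len(psi))
    return dct2(averages, num_coeffs)


def cepstral_mean_normalize(frames):
    """CMN: subtract each coefficient's mean taken over all frames."""
    means = [sum(col) / len(frames) for col in zip(*frames)]
    return [[c - m for c, m in zip(row, means)] for row in frames]


# Demo: three 64-sample frames of a 440 Hz tone sampled at 8 kHz.
frames = [[math.sin(2 * math.pi * 440 * (s + 64 * i) / 8000) for s in range(64)]
          for i in range(3)]
features = cepstral_mean_normalize([ntec_frame(f, 8000) for f in frames])
print(len(features), len(features[0]))
```

Cepstral pipelines often apply a logarithm before the DCT; the abstract does not mention one, so it is left out here.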