Conference Paper

Recognition of stress in speech using wavelet analysis and Teager energy operator.

Conference: INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008
Source: DBLP
    ABSTRACT: The speech signal is an important tool for conveying information between humans; at the same time, it is an indicator of a speaker's emotions. This paper investigates the automatic identification of affect in speech containing spontaneously expressed (not acted) emotions recorded in different environments. Teager energy operator-perceptual wavelet packet (TEO-PWP) features and mel-frequency cepstral coefficients (MFCC) were used to model the emotions with two classifiers: a Gaussian mixture model (GMM) and a probabilistic neural network (PNN). The classification experiments were conducted on two data sets: SUSAS, with three classes (high stress, moderate stress and neutral), and ORI, with five classes (angry, happy, anxious, dysphoric and neutral). Depending on the feature/classifier combination, the average classification rates for the SUSAS data ranged from 61% to 95%, whereas the ORI data yielded lower average rates, ranging from 37% to 57%. The best overall performance was achieved by combining the TEO-PWP features with the GMM classifier, which gave an average of 94.75% correct classifications for the SUSAS data and 56.6% for the ORI data. Differences in arousal level between the SUSAS and ORI emotional classes are suggested as the most likely cause of the difference in classification rates between the two data sets.
    3rd International Conference on Bioinformatics and Biomedical Engineering (ICBBE 2009); 07/2009
    ABSTRACT: This paper investigates automatic affect classification in spontaneous speech within normal and clinical family environments. The database used in this study comprised speech recordings of parents of depressed adolescents (19 fathers and 20 mothers) and parents of non-depressed adolescents (25 fathers and 7 mothers). The speech data were recorded during natural parent-child conversations. Five emotional classes were considered: neutral, angry, anxious, dysphoric and happy. Four feature combinations (sets A, B, C and D) derived from the Teager energy operator (TEO) were tested and compared using two classifiers, a probabilistic neural network (PNN) and a Gaussian mixture model (GMM). Feature extraction was combined with an optimal feature selection algorithm based on the mutual information criterion. The GMM classifier consistently provided higher correct classification rates (49.6% to 62.0%) than the PNN classifier (31.6% to 42.7%). Set C with the GMM was the best-performing feature/classifier combination. In all cases, the classification rates for parents of depressed adolescents were higher than for parents of non-depressed adolescents, and the rates for mothers were higher than for fathers. These results suggest that parents of depressed adolescents express their emotions with a higher degree of discrimination between types of affect than parents of non-depressed adolescents, and that mothers express their affect with a higher degree of such discrimination than fathers.
    3rd International Conference on Bioinformatics and Biomedical Engineering (ICBBE 2009); 07/2009
    ABSTRACT: This paper presents a new system for automatic stress detection in speech. In the feature extraction stage, speech spectrograms were used as the primary features, and sigma-pi neuron cells were then used to derive the secondary features. The analysis was performed using three alternative sets of analysis frequency bands: critical bands, Bark scale bands and equivalent rectangular bandwidth (ERB) scale bands. The algorithm was tested at the vowel level using actual stressful speech utterances from the SUSAS (Speech Under Simulated and Actual Stress) database. Automatic stress-level classification was implemented with Gaussian mixture model (GMM) and k-nearest neighbor (KNN) classifiers. The choice of frequency bands had the strongest effect on the classification results, with the ERB scale providing the highest classification rates, ranging from 67.84% to 73.76%. The classification results did not differ between data sets containing specific types of vowels and data sets containing mixtures of vowels. This indicates that the proposed method can be applied to voiced speech in speech-independent conditions.
    Fifth International Conference on Natural Computation, ICNC 2009, Tianjin, China, 14-16 August 2009, 6 Volumes; 01/2009
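The first two abstracts build their features on the Teager energy operator (TEO). As a minimal sketch, assuming the standard discrete form from Kaiser, psi[n] = x[n]^2 - x[n-1]*x[n+1], the operator can be computed as below. This is only the TEO itself, not the authors' TEO-PWP pipeline, which additionally involves a perceptual wavelet packet decomposition; the function name `teager_energy` and the test tone are illustrative assumptions.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator (Kaiser's form):
    psi[n] = x[n]**2 - x[n-1] * x[n+1], evaluated for 1 <= n <= len(x) - 2.
    Returns a list two samples shorter than the input (no edge padding)."""
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test signal: a pure tone A*cos(omega*n + phi).
# For such a tone the operator is constant and equals A**2 * sin(omega)**2,
# so it tracks both amplitude and frequency of the oscillation.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n + 0.1) for n in range(50)]
psi = teager_energy(tone)
print(psi[10], A**2 * math.sin(omega) ** 2)
```

The constant output for a pure tone follows from the identity cos(t - w)cos(t + w) = cos^2(t) - sin^2(w); deviations from this behaviour in real speech are what TEO-based stress features are intended to capture.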
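The third abstract reports that ERB-scale frequency bands gave the best stress classification. A minimal sketch of how ERB-spaced analysis bands can be derived, assuming the Glasberg and Moore (1990) formulas ERB(f) = 24.7(4.37f/1000 + 1) Hz and ERB-rate(f) = 21.4 log10(4.37f/1000 + 1), is shown below. The number of bands and the 100 Hz to 4 kHz range are hypothetical choices, not values taken from the paper, and all function names are illustrative.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg & Moore (1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate: ERB-rate back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_band_edges(f_lo, f_hi, n_bands):
    """Band edges (Hz) spaced equally on the ERB-rate scale between f_lo and f_hi."""
    e_lo, e_hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    step = (e_hi - e_lo) / n_bands
    return [erb_rate_to_hz(e_lo + k * step) for k in range(n_bands + 1)]

# Hypothetical setup: 16 ERB-spaced bands covering 100 Hz to 4 kHz
edges = erb_band_edges(100.0, 4000.0, 16)
for lo, hi in zip(edges[:-1], edges[1:]):
    print(f"{lo:8.1f} - {hi:8.1f} Hz")
```

Because the bands are equally spaced on the ERB-rate scale, they are narrow at low frequencies and widen with frequency, roughly following auditory filter bandwidths; per-band spectrogram energies could then serve as inputs to a later feature stage.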