Recognition of stress in speech using wavelet analysis and Teager energy operator.
01/2008; In proceedings of: INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008
Conference Proceeding: Emotion Recognition in Spontaneous Speech within Work and Family Environments
ABSTRACT: The speech signal is an important tool for conveying information between humans; at the same time, it is an indicator of a speaker's emotions. In this paper, the automatic identification of affect from speech containing spontaneously expressed (not acted) emotions within different environments was investigated. The Teager energy operator-perceptual wavelet packet (TEO-PWP) features as well as the mel frequency cepstral coefficients (MFCC) were used to model the emotions using two classifiers: the Gaussian mixture model (GMM) and the probabilistic neural network (PNN). The classification experiments were conducted using two data sets: SUSAS with three classes (high stress, moderate stress and neutral) and ORI with five classes (angry, happy, anxious, dysphoric and neutral). Depending on the feature/classifier combination, the average classification results for the SUSAS data ranged from 95% to 61%, whereas the ORI data provided lower average rates ranging from 57% to 37%. The best overall performance was achieved using the TEO-PWP in combination with the GMM classifier, giving an average of 94.75% correct classifications for the SUSAS data and 56.6% for the ORI data. Different arousal levels between the SUSAS and ORI emotional classes were suggested to be the most likely cause of the difference in classification rates between these two data sets.
Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on; 07/2009
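The abstract does not say how the GMM classifier was configured. A common arrangement, assumed here, is one mixture per emotion class, with an utterance assigned to the class whose mixture gives the highest total log-likelihood over its feature frames. The minimal sketch below uses one-dimensional features and hand-picked mixture parameters (hypothetical values, not fitted to SUSAS or ORI); the training step, usually done with expectation-maximization, is omitted.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of 1-D frames under a Gaussian mixture."""
    total = 0.0
    for x in frames:
        # Mixture density: weighted sum of Gaussian component densities.
        p = sum(
            w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(p)
    return total

def classify(frames, models):
    """Return the label whose mixture scores the frames highest."""
    scores = {label: gmm_log_likelihood(frames, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get)

# Hypothetical per-class models: (weights, means, variances).
models = {
    "neutral": ([0.5, 0.5], [0.0, 1.0], [0.25, 0.25]),
    "high_stress": ([0.5, 0.5], [3.0, 4.0], [0.25, 0.25]),
}

label_a = classify([0.1, 0.9, 0.4], models)
label_b = classify([3.2, 3.9], models)
```

Summing log-likelihoods over frames treats the frames as independent, which is a simplification typical of frame-based GMM classifiers.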
Conference Proceeding: Emotion Recognition in Speech of Parents of Depressed Adolescents
ABSTRACT: This paper investigates automatic affect classification in spontaneous speech within normal and clinical family environments. The database used in this study comprised speech recordings of parents of depressed adolescents (19 fathers and 20 mothers) and parents of non-depressed adolescents (25 fathers and 7 mothers). The speech data were recorded during natural parent-child conversations. Five emotional classes were considered: neutral, angry, anxious, dysphoric, and happy. Four different combinations of features (sets A, B, C, and D) derived from the Teager energy operator (TEO) and two different classifiers, the probabilistic neural network (PNN) and the Gaussian mixture model (GMM), were tested and compared. The feature extraction process was combined with an optimal feature selection algorithm based on the mutual information criterion. The GMM classifier provided consistently higher correct classification rates (49.6% to 62.0%) than the PNN classifier (31.6% to 42.7%). Set C/GMM was found to be the best performing feature/classifier combination. In all cases, the classification rates for parents of depressed adolescents were higher than for parents of non-depressed adolescents. Similarly, the classification rates for mothers were higher than for fathers. The results appear to suggest that parents of depressed adolescents express their emotions with a higher degree of discrimination between different types of affect than parents of non-depressed adolescents. Similarly, mothers appear to express their affect with a higher degree of discrimination between different types of affect than fathers.
Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on; 07/2009
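For readers unfamiliar with the PNN, its decision rule can be sketched as a Parzen-window classifier: each training vector contributes a Gaussian kernel, the kernel responses are averaged per class, and the class with the largest average wins. The example below is a minimal illustration under that assumption; the smoothing parameter `sigma`, the function name `pnn_classify`, and the toy two-dimensional data are hypothetical and are not taken from the paper.

```python
import math

def pnn_classify(x, train, sigma=0.5):
    """Classify x given train: dict mapping label -> list of feature vectors.

    Returns (best_label, scores), where scores holds the average
    Gaussian kernel response of x against each class's samples.
    """
    scores = {}
    for label, samples in train.items():
        total = 0.0
        for s in samples:
            # Squared Euclidean distance between x and a stored pattern.
            d2 = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        scores[label] = total / len(samples)
    return max(scores, key=scores.get), scores

# Toy training data (made up for illustration only).
train = {
    "neutral": [[0.0, 0.1], [0.2, -0.1]],
    "angry": [[2.0, 2.1], [1.8, 1.9]],
}

label_near_neutral, _ = pnn_classify([0.1, 0.0], train)
label_near_angry, _ = pnn_classify([1.9, 2.0], train)
```

Because every training vector is stored and evaluated at classification time, a PNN needs no iterative training, but its cost per decision grows with the size of the training set.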
ABSTRACT: This study presents automatic stress recognition methods based on acoustic speech analysis. Novel approaches to feature extraction based on the nonlinear Teager energy operator (TEO) calculated within critical bands, discrete wavelet transform bands, and wavelet packet bands are presented. The classification process was performed using two types of neural networks: the multilayer perceptron neural network (MLPNN) and the probabilistic neural network (PNN). The classification efficiency was tested using the actual stress dataset from the SUSAS database. The speech recordings were made by 15 speakers (8 females and 7 males) reading a list of 35 words under three actual conditions: high stress, low stress, and neutral. The best overall performance was observed for the features extracted using the TEO parameters calculated within perceptual wavelet packet bands (TEO-PWP). Depending on the type of mother wavelet, the correct classification scores for the PWP features ranged from 71.24% to 91.56% (using the MLPNN classifier), and from 86.63% to 93.67% (using the PNN). The PNN classifier outperformed the MLPNN classifier.
Fifth International Conference on Natural Computation, ICNC 2009, Tianjin, China, 14-16 August 2009, 6 Volumes; 01/2009
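The discrete Teager energy operator at the core of these features is usually written as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below applies it to a single sampled signal; the band decomposition described in the abstract (critical bands, wavelet bands, wavelet packet bands) is deliberately left out, and the function name `teager_energy` is an illustrative choice. For a pure tone A*cos(omega*n) the operator returns the constant A^2 * sin(omega)^2, which the example uses as a sanity check.

```python
import math

def teager_energy(x):
    """Discrete Teager energy for the interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], for n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Sanity check on a pure tone (amplitude and frequency chosen arbitrarily).
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)

# Closed-form value the operator should return for every sample of the tone.
expected = (A * math.sin(omega)) ** 2
max_error = max(abs(p - expected) for p in psi)
```

The output depends on both amplitude and frequency, which is why the operator is often described as tracking the energy needed to generate the signal rather than the signal's squared amplitude alone.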