Conference Paper

Prosody based emotion recognition for MEXI

Paderborn Univ., Germany
DOI: 10.1109/IROS.2005.1545341 · Conference: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005)
Source: IEEE Xplore

ABSTRACT This paper describes emotion recognition from natural speech as realized for the robot head MEXI. We use a fuzzy logic approach to analyze the prosody of natural speech. Since MEXI often communicates with well-known persons but also with unknown humans, for instance at exhibitions, our prosody-based emotion recognition provides both a speaker-dependent and a speaker-independent mode. A key point of our approach is that it automatically selects the most significant features from a set of twenty analyzed features, based on a training database of speech samples. According to our results this is important, since the set of significant features differs considerably between the distinguished emotions. With our approach we reach average recognition rates of 84% in speaker-dependent mode and 60% in speaker-independent mode.
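The abstract's automatic feature selection could be sketched roughly as follows; the Fisher-style separability score and the prosodic feature names are assumptions for illustration, not the authors' actual criterion:

```python
# Hypothetical sketch of per-emotion feature selection: rank candidate
# prosodic features (pitch and energy statistics, etc.) by how well each
# separates one emotion from the rest on training data, then keep the
# most significant ones. Not MEXI's actual implementation.

def fisher_score(positives, negatives):
    """Separability of one feature: |mean difference| / pooled std."""
    def mean(xs): return sum(xs) / len(xs)
    def var(xs, m): return sum((x - m) ** 2 for x in xs) / len(xs)
    mp, mn = mean(positives), mean(negatives)
    pooled = (var(positives, mp) + var(negatives, mn)) ** 0.5
    return abs(mp - mn) / pooled if pooled > 0 else 0.0

def select_features(samples, emotion, k=3):
    """samples: list of (label, {feature_name: value}).
    Returns the k feature names that best separate `emotion`."""
    names = samples[0][1].keys()
    scores = {}
    for name in names:
        pos = [f[name] for lab, f in samples if lab == emotion]
        neg = [f[name] for lab, f in samples if lab != emotion]
        scores[name] = fisher_score(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With a toy training set in which anger raises mean pitch, `select_features(samples, "anger", k=1)` would return `["pitch_mean"]`; the point is that the selected subset can differ per emotion, as the abstract notes.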

  • ABSTRACT: Existing emotional speech recognition applications usually distinguish between a small number of emotions in speech. However, this set of so-called basic emotions in speech varies from one application to another, depending on the application's needs. In order to support such differing needs, an emotional speech model based on the fuzzy emotion hypercube is presented. In addition to existing models, it also supports the recognition of derived emotions, which are combinations of basic emotions in speech. We show the application of this model with a prosody-based recognizer built on standard speech recognition technology using semi-continuous hidden Markov models (HMMs). Both the selection of features and the design of the recognition system are addressed.
    Pervasive Computing, Signal Processing and Applications, International Conference on. 09/2010;
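HMM-based recognition of the kind the entry above describes rests on likelihood evaluation: score an utterance under each emotion's model and pick the highest. A minimal sketch of the forward algorithm for a discrete-observation HMM (the paper's semi-continuous models, which share Gaussian codebooks, are more involved):

```python
# Forward algorithm for a discrete-observation HMM: computes
# P(observation sequence | model). Toy probabilities only; for emotion
# classification one would train one model per emotion and compare
# likelihoods.

def forward_likelihood(obs, start, trans, emit):
    """start[s]: initial probability of state s,
    trans[s][t]: probability of transitioning s -> t,
    emit[s][o]: probability of state s emitting observation o."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in start}
    for o in obs[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in alpha) * emit[t][o]
                 for t in start}
    return sum(alpha.values())
```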
  • ABSTRACT: For human–robot interaction (HRI), perception is one of the most important capabilities. This paper reviews several widely used perception methods for HRI in social robots. Specifically, we investigate general perception tasks crucial for HRI, such as where objects are located in a room, what objects are in the scene, and how they interact with humans. We first enumerate representative social robots and summarize the three most important perception methods used by these robots: feature extraction, dimensionality reduction, and semantic understanding. For feature extraction, four widely used signal types (visual-based, audio-based, tactile-based, and range-sensor-based) are reviewed and compared in terms of their advantages and disadvantages. For dimensionality reduction, representative methods including principal component analysis (PCA), linear discriminant analysis (LDA), and locality preserving projections (LPP) are reviewed. For semantic understanding, conventional techniques for several typical applications such as object recognition, object tracking, object segmentation, and speaker localization are discussed, and their characteristics and limitations are analyzed. Moreover, several popular data sets used in social robotics and published semantic understanding results are analyzed and compared in light of our analysis of HRI perception methods. Lastly, we suggest important future work on fundamental questions concerning perception methods in HRI.
    International Journal of Social Robotics 01/2014;
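Of the dimensionality-reduction methods that review compares, PCA is the simplest to sketch: project centered feature vectors onto the leading eigenvector of their covariance matrix. The power-iteration solver below is one illustrative way to find that eigenvector, not how any of the reviewed systems implement it:

```python
# Minimal PCA sketch: find the first principal component of a data set
# (list of equal-length numeric rows) by power iteration on the
# covariance matrix, then project rows onto it. Purely illustrative; a
# real pipeline would use a linear-algebra library.

def first_principal_component(data, iters=200):
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # covariance matrix of the centered data
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v, means

def project(row, component, means):
    """1-D coordinate of a row along the principal component."""
    return sum((row[j] - means[j]) * component[j] for j in range(len(row)))
```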
  • ABSTRACT: In this paper, a psychologically inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions become distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is first carried out with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices.
    International Journal of Speech Technology 06/2012;
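Of the two classifier families tested in the entry above, k-nearest neighbors is easy to sketch in a few lines (the SVM variants would need an optimization library); the feature vectors and emotion labels below are invented for illustration:

```python
# k-nearest-neighbor classification over extracted feature vectors:
# majority vote among the k training points closest to the query in
# Euclidean distance. Toy sketch, not the paper's implementation.

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```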

