Conference Paper

Prosody based emotion recognition for MEXI

Paderborn University, Germany
DOI: 10.1109/IROS.2005.1545341
Conference: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005)
Source: IEEE Xplore

ABSTRACT This paper describes emotion recognition from natural speech as realized for the robot head MEXI. We use a fuzzy logic approach to analyze prosody in natural speech. Since MEXI often communicates with well-known persons but also with unknown humans, for instance at exhibitions, our prosody-based emotion recognition provides both a speaker-dependent and a speaker-independent mode. A key point of our approach is that it automatically selects the most significant features from a set of twenty analyzed features, based on a training database of speech samples. Our results show that this selection is important, since the set of significant features differs considerably between the distinguished emotions. With our approach we reach average recognition rates of 84% in speaker-dependent mode and 60% in speaker-independent mode.
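The abstract's key step, automatically ranking a pool of prosodic features by how well they separate the trained emotions, can be sketched as follows. This is a minimal illustration, not the paper's actual method: the Fisher-style discriminant score and the function name `select_features` are assumptions, standing in for whatever significance measure the authors derived from their training database.

```python
# Hypothetical sketch: rank prosodic features by a Fisher-style
# discriminant ratio over labeled training samples and keep the top k.
from statistics import mean, pvariance

def select_features(samples, labels, k=5):
    """samples: list of feature vectors (one per utterance);
    labels: emotion label per sample.
    Returns indices of the k features that best separate the classes."""
    classes = sorted(set(labels))
    n_feats = len(samples[0])
    scores = []
    for f in range(n_feats):
        # Collect this feature's values per emotion class.
        per_class = {c: [s[f] for s, l in zip(samples, labels) if l == c]
                     for c in classes}
        overall = mean(v for vals in per_class.values() for v in vals)
        # Between-class scatter vs. within-class scatter.
        between = sum(len(v) * (mean(v) - overall) ** 2
                      for v in per_class.values())
        within = sum(len(v) * pvariance(v) for v in per_class.values())
        scores.append(between / (within + 1e-9))
    return sorted(range(n_feats), key=lambda f: -scores[f])[:k]
```

Run per emotion class, such a ranking naturally yields different feature subsets for different emotions, which is consistent with the paper's observation that the significant features differ considerably between emotions.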

    • "MFCC and cepstral features). Most of the existing approaches are trained and tested on speech data that was collected by asking actors to speak prescribed utterances with certain emotions [13], [10]. However, the fact that deliberate behavior differs in audio profile and timing from spontaneous behavior has "
    ABSTRACT: Automated analysis of human affective behavior has attracted increasing attention in recent years. With the research shift toward spontaneous behavior, many challenges have come to surface, ranging from database collection strategies to the use of new feature sets (e.g., lexical cues apart from prosodic features). Use of contextual information, however, is rarely addressed in the field of affect expression recognition, yet it is evident that affect recognition by humans is largely influenced by context information. Our contribution in this paper is threefold. First, we introduce a novel set of features based on cepstrum analysis of pitch and intensity contours. We evaluate the usefulness of these features on two different databases: the Berlin Database of emotional speech (EMO-DB) and a locally collected audiovisual database in car settings (CVRRCar-AVDB). The overall recognition accuracy achieved for seven emotions in the EMO-DB database is over 84% and over 87% for three emotion classes in CVRRCar-AVDB, based on tenfold stratified cross validation. Second, we introduce the collection of a new audiovisual database in an automobile setting (CVRRCar-AVDB). In the current study, we only use the audio channel of the database. Third, we systematically analyze the effects of different contexts on the two databases. We present context analysis of subject and text based on speaker/text-dependent/-independent analysis on EMO-DB. Furthermore, we perform context analysis based on gender information on EMO-DB and CVRRCar-AVDB. The results based on these analyses are promising.
    IEEE Transactions on Multimedia 11/2010; DOI:10.1109/TMM.2010.2058095
    • "Emotion recognition from facial expression and MEXI's reaction on them is described in this paper. For MEXI also a speech based emotion recognition exists that analyzes the prosody of spoken sentences [2]. Fig. 1. "
    ABSTRACT: Emotion recognition and adequate reactions are a crucial part of human communication and hence should also be considered for interactions between humans and robots. In this paper we present the robot head MEXI, which is able to recognize emotions of its human counterpart from a video sequence using a fuzzy rule based approach. It reacts to these perceptions in an emotional way. To this end, MEXI maintains an internal state made up of (artificial) emotions and drives. This internal state is used to evaluate its perceptions and action alternatives, and controls its behavior on the basis of this evaluation. This is a major difference between MEXI and usual goal based agents that rely on a world model to control and plan their actions. For MEXI, the behavior based programming paradigm originally developed by Arkin for robot navigation was extended to support a multidimensional control architecture based on emotions and drives.
    Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on; 12/2007
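The first citing work above builds features from cepstrum analysis of pitch and intensity contours. As a rough illustration of that idea (the function name, mean removal, and truncation length are assumptions, not details from that paper), the real cepstrum of a contour can be computed as the inverse FFT of the log magnitude spectrum:

```python
# Hypothetical sketch: real cepstrum of a pitch (or intensity) contour.
import numpy as np

def contour_cepstrum(contour, n_coeffs=10):
    """Return the first n_coeffs real-cepstrum coefficients of a contour.
    The low-order coefficients compactly summarize the contour's shape."""
    contour = np.asarray(contour, dtype=float)
    # Remove the mean so the DC level does not dominate the spectrum.
    spectrum = np.fft.fft(contour - contour.mean())
    # Log magnitude spectrum; epsilon guards against log(0).
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    ceps = np.fft.ifft(log_mag).real
    return ceps[:n_coeffs]
```

Applied to per-utterance pitch and intensity contours, such coefficients give a fixed-length feature vector regardless of utterance duration, which is one reason cepstral summaries of contours are attractive for emotion classification.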
