About
191 Publications · 56,793 Reads · 7,925 Citations

Publications (191)
This contribution presents a study on the detection of emotions and blends of emotions in a corpus collected in an emergency call center in Paris (CEMO). Our corpus, recorded 'in the wild', is rich in vocal diversity (age, accent, number of speakers) and is annotated with an original scheme that represents up to two emotions per...
Affective computing develops systems that recognize or influence aspects of human life related to emotion, including feelings and attitudes. Significant potential for both good and harm makes it ethically sensitive, and striking sound balances is challenging. Common images of the issues invite oversimplification and offer a limited unders...
Emotion recognition in conversations is essential for ensuring advanced human-machine interactions. However, creating robust and accurate emotion recognition systems in real life is challenging, mainly due to the scarcity of emotion datasets collected in the wild and the inability to take into account the dialogue context. The CEMO dataset, compose...
Speech Emotion recognition (SER) in call center conversations has emerged as a valuable tool for assessing the quality of interactions between clients and agents. In contrast to controlled laboratory environments, real-life conversations take place under uncontrolled conditions and are subject to contextual factors that influence the expression of...
The emotion detection technology to enhance human decision-making is an important research issue for real-world applications, but real-life emotion datasets are relatively rare and small. The experiments conducted in this paper use the CEMO, which was collected in a French emergency call center. Two pre-trained models based on speech and text were...
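The combination of speech- and text-based pre-trained models is commonly realized as late fusion of per-class posteriors. The sketch below is a minimal, hypothetical illustration of such weighted score fusion; the label set, weights, and function names are illustrative and are not the system described in the paper.

```python
# Hypothetical late fusion of emotion posteriors from a speech model
# and a text model (illustrative label set and weight, not CEMO's).
EMOTIONS = ["anger", "fear", "neutral", "positive"]

def late_fusion(p_speech, p_text, w=0.5):
    """Weighted average of per-class posteriors; returns (label, fused)."""
    fused = [w * a + (1 - w) * b for a, b in zip(p_speech, p_text)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused

# Example: the speech model favors "fear", the text model "neutral";
# with w=0.6 the fused decision follows the speech model.
label, fused = late_fusion([0.1, 0.6, 0.2, 0.1], [0.2, 0.3, 0.4, 0.1], w=0.6)
print(label)  # → fear
```

In practice the fusion weight would be tuned on a development set, and fusion can equally be done earlier, on embeddings rather than posteriors.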
Zero-shot speech emotion recognition (SER) endows machines with the ability of sensing unseen-emotional states in speech, compared with conventional SER endeavors on supervised cases. On addressing the zero-shot SER task, auditory affective descriptors (AADs) are typically employed to transfer affective knowledge from seen- to unseen-emotional stat...
Recognizing a speaker's emotion from their speech can be a key element in emergency call centers. End-to-end deep learning systems for speech emotion recognition now achieve equivalent or even better results than conventional machine learning approaches. In this paper, in order to validate the performance of our neural network architecture for emot...
This bulletin is part of the oversight activity that the CNPEN has been pursuing since the beginning of the health crisis. Its objective is to contribute to that oversight process from an ethical perspective by analyzing and discussing the measures that the social media platforms have or have not undertaken during the COVID-19 crisis to fight the spre...
The phenomena of disinformation and misinformation were exacerbated by the crisis caused by the SARS-CoV-2 epidemic. This has led digital platforms such as social networks, search engines, and video-sharing systems to develop practices and digital tools to help fight against...
For the design of socially acceptable robots, field studies in Human-Robot Interaction are necessary. Constructing dialogue benchmarks can be meaningful only if researchers take into account the evaluation of the robot, the human, and their interaction. This paper describes a study aiming at finding an objective evaluation procedure of the dialogue with a...
Giving robots the means to perceive the emotions we feel and the social signals we emit is an ambitious research goal, one that can be perceived as disturbing because it is highly intrusive. Social interaction is characterized by a continuous and dynamic exchange of signals carrying informational, emotional, and...
This article addresses the issue of evaluating Human-Robot spoken interactions in a social context by considering the engagement of the human participant. Our work regards the Communication Accommodation Theory (CAT) as a promising paradigm to consider for engagement, by its study of macro- and micro-contexts influencing the behaviour of dialogue p...
This paper presents a data collection carried out in the framework of the Joker Project. Interaction scenarios have been designed in order to study the effects of affect bursts in a human-robot interaction and to build a system capable of using multilevel affect bursts in a human-robot interaction. We use two main audio expression cues: verbal (synthe...
In the frame of an experiment dealing with quantified-self and reflexivity, we collected audio-video data that provide us with material to discuss the ways in which the participants would work out social synergy through co-presence management and epistemic balance – accounting for their orientation towards the familiar symbiotic nature of human int...
This article summarizes the recommendations concerning robotics as issued by the Commission for the Ethics of Research in Information Sciences and Technologies (CERNA), the French advisory commission for the ethics of information and communication technology (ICT) research. Robotics has numerous applications in which its role can be overwhelming an...
The present study aimed to examine multidimensional factors that contribute to a poor performance in a public speaking task. An adapted version of the Trier Social Stress Test (TSST) was used to elicit psychosocial stress among 43 university students, and multidimensional assessments were used to investigate acute stress responses by psychologica...
As part of the Joker project, which provides a multimodal dialog system with social skills including humor and empathy, this paper explores ideas concerning human verbal responses to a joking robot. Humor support is defined as the conversational strategies used in reaction to humor utterances. This paper aims at exploring the phenomenon of respon...
Beyond the implementation of moral considerations, an ethical robot should be designed in a way that foresees the potential damage it could cause, and that also anticipates the way the human beings in its environment (from the designer to the user) could be held responsible for its acts. In the present study, the authors offer to consider the act...
Affect bursts play an important role in non-verbal social interaction. Laughter and smile are some of the most important social markers in human-robot social interaction. Not only do they contain affective information, they also may reveal the user’s communication strategy. In the context of human robot interaction, an automatic laughter and smile...
Social Signal Processing such as laughter or emotion detection is a very important issue, particularly in the field of human-robot interaction (HRI). At the moment, very few studies exist on elderly-people’s voices and social markers in real-life HRI situations. This paper presents a cross-corpus study with two realistic corpora featuring elderly p...
This paper addresses the problem of evaluating Human-Robot spoken interactions in a social context by considering the engagement of the human participant. We present an activity model for Human-Robot social dialogue, and show that it offers a convenient local interpretation context to assess the human participation in the activity. We describe an e...
We present automatic systems that implement multimodal social dialogues involving humour with the humanoid robot Nao for the 16th Interspeech conference. Humorous capabilities of the systems are based on three main techniques: riddles, challenging the human participant, and punctual interventions. The presented prototypes will automatically record...
The challenge of this study is twofold: recognizing emotions from audio signals in naturalistic Human-Robot Interaction (HRI) environment, and using a cross-dataset recognition for robustness evaluation. The originality of this work lies in the use of six emotional models in parallel, generated using two training corpora and three acoustic feature...
This paper presents a study of laugh classification using a cross-corpus protocol. It aims at the automatic detection of laughs in a real-time human-machine interaction. Positive and negative laughs are tested with different classification tasks and different acoustic feature sets. F-measure results show an improvement on positive laughs classifica...
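As a reminder of the metric reported above, the F-measure combines precision and recall into a single score. The following is a generic sketch of the computation, not the paper's evaluation code:

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F_beta score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Harmonic-style combination; beta > 1 weights recall more heavily.
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. 40 positive laughs detected correctly, 10 false alarms, 10 missed:
# precision = recall = 0.8, so F1 = 0.8
print(round(f_measure(40, 10, 10), 3))  # → 0.8
```

In a cross-corpus protocol the metric is computed per corpus, since class priors differ between training and test corpora.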
Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing ap...
The search of a small acoustic feature set for emotion recognition faces three main challenges. Such a feature set must be robust to large diversity of contexts in real-life applications; model parameters must also be optimized for reduced subsets; finally, the result of feature selection must be evaluated in cross-corpus condition. The goal of the...
In a human-machine interaction context, in-voice emotion detection systems must be robust to variabilities and computationally efficient. This article presents the performance we can obtain using only paraverbal (nonverbal) cues. We propose a methodology to select the...
In a Human-Machine Interaction context, automatic in-voice affective state detection systems have to be robust to variabilities and computationally efficient. This paper presents the performance that can be reached using para-verbal (non-verbal) cues. We propose a methodology to select robust parameters families, based on the study of three sets of...
For a socially intelligent robot, different levels of situation assessment are required, ranging from basic processing of sensor input to high-level analysis of semantics and intention. However, the attempt to combine them all prompts new research challenges and the need of a coherent framework and architecture. This paper presents the situation as...
In many human-robot social interactions, where the robot is to interact with only one human throughout the interaction, the "human" side of the conversation is very likely to interact with other humans present in the same room and to temporarily lose the focus on the main interaction. These human-human interactions can be a very brief chat or a prett...
Objectives: Alexithymia is a personality trait characterized by difficulties in identifying, describing and communicating one's own emotions. Recent studies have associated specific effects of this trait and its subfactors with hypothalamo-pituitary-adrenal (HPA) axis markers during stress. The aim of this study was to analyze the association betw...
Much progress has been made in the domain of human-machine dialogue, but it is still a real challenge and, most often, only informative, cooperative kinds of dialogue are explored. This paper explores the ability of a robot to create and maintain a long-term social relationship through more advanced dialogue techniques. We expose the so...
The long-term goal of this work is to build an assistive robot for elderly and disabled people. It is part of the French ANR ARMEN project. The subjects will interact with a mobile robot controlled by a virtual character. In order to build this system, we collected interactions between patients from different medical centers and a Wizard-of-Oz oper...
Abstract—There are about 69.2 million people over 80 years old in the world today [18]. This number is expected to reach 379 million by 2050. This demographic trend is an incentive for policy-makers to promote the development of new services, such as robotics, to help elderly people in their daily lives. Th...
This paper presents a corpus featuring social interaction between elderly people in a retirement home and the humanoid robot Nao. This data collection is part of the French project ROMEO2 that follows the ROMEO project. The goal of the project is to develop a humanoid robot that can act as a comprehensive assistant for persons suffering from loss o...
These proceedings present the state-of-the-art in spoken dialog systems with applications in robotics, knowledge access and communication. They address specifically:
1. Dialog for interacting with smartphones;
2. Dialog for Open Domain knowledge access;
3. Dialog for robot interaction;
4. Mediated dialog (including crosslingual dialog involving S...
The video of our demonstrator presents a social interaction system for studying humor with the Aldebaran robot NAO. Our application records and analyzes audio and video streams to provide real-time feedback. Using this dialog system during show & tell sessions at Interspeech 2013, we have collected different kinds of laughter (positive and negative)...
Databases of spontaneous multimodal expressions of affective states occurring during a task are few. This paper presents a protocol for eliciting stress in a public speaking task. Behaviors of 19 participants were recorded via a multimodal setup including speech, video of the facial expressions and body movements, balance via a force plate, and phy...
Speech production modifications are one of the many indications of stress in humans. A job interview simulation task permitted the collection of a multimodal corpus, including physiological data. Physiological cues of stress are reliable over long periods, but require invasive sensors. Human voice variations have been proved to be a non-invasive stre...
This document presents the research project ARMEN (Assistive Robotics to Maintain Elderly People in a Natural environment), aimed at the development of a user friendly robot with advanced functions for assistance to elderly or disabled persons at home. Focus is given to the robot SAM (Smart Autonomous Majordomo) and its new features of navigation,...
This document presents the research project ARMEN, aimed at designing an assistive robot that is very simple to use and provides advanced functions to help maintain disabled or elderly people at home. The document presents the robot SAM and the functions of navigation, emotion detection from speech, image understanding, the knowledge repr...
This chapter presents different acquisition strategies of mono- and multimodal emotional corpora and describes challenges of interpreting these corpora. It examines emotional coding schemes, emotional events and current challenges along with a typology of the complex emotions found in spontaneous data. Building a corpus is of paramount importance f...
Paralinguistic analysis is increasingly turning into a mainstream topic in speech and language processing. This article aims to provide a broad overview of the constantly growing field by defining the field, introducing typical applications, presenting exemplary resources, and sharing a unified view of the chain of processing. It then presents the...
We present the implementation of a data collection tool of multicultural and multimodal laughter for the 14th Interspeech conference. The application will automatically record and analyze audio and video streams to provide real-time feedback. Using this tool, we expect to collect multimodal cues of different kinds of laughter elicited in participant...
The contributions to this volume focus on the interrelation between prosody and iconicity and shed new light on the topic by enlarging the number of parameters traditionally considered, and by confronting various theoretical backgrounds. The parameters taken into account include socio-linguistic criteria (age, sex, socio-economic category, region);...
We focus in this paper on the detection of emotions collected in real-life context. In order to improve our emotional valence detection system, we have tested new voice quality features that are mainly used for speech synthesis or voice transformation: the relaxation coefficient (Rd) and the functions of phase distortion (FPD); but also usual voice...
Negative emotions, and anger recognition in particular, can deliver useful information to both the customer and the agent of Interactive Voice Response platforms. The state of the art in emotion detection is characterized by relying not on real-life emotional behavior but on "realistic" induced emotion. This study is part of the French projec...
Conversations do not only consist of spoken words but they also consist of non-verbal vocalisations. Since there is no standard to define and to classify (possible) non-speech sounds the annotations for these vocalisations differ very much for various corpora of conversational speech. There seems to be agreement in the six inspected corpora that he...
We present in this paper a contribution to the INTERSPEECH 2012 Speaker Trait Challenge. We participated in the Personality Sub-Challenge, where the main characteristics of speakers according to the five OCEAN dimensions had to be determined based on short audio recordings solely. We considered the task as a general optimization problem and applied...
The HUMAINE Database is grounded in HUMAINE’s core emphasis on considering emotion in a broad sense – ‘pervasive emotion’ – and engaging with the way it colours action and interaction. The aim of the database is to provide a resource to which the community can go to see and hear the forms that emotion takes in everyday action and interaction, and t...
In order to be believable, embodied conversational agents (ECAs) must show expression of emotions in a consistent and natural-looking way across modalities. The ECA has to be able to display coordinated signs of emotion during realistic emotional behaviour. Such a capability requires one to study and represent emotions and coordination of modalitie...
The chapter reviews methods of obtaining records that show signs of emotion. Concern with authenticity is central to the task. Converging lines of argument indicate that even sophisticated acting does not reproduce emotion as it appears in everyday action and interaction. Acting is the appropriate source for some kinds of material, and work on that...
In this article, the detection of real-life emotions is explored across different corpora featuring a similar task. Two emotional states (Anger and Neutral) are thus examined across three French corpora collected in call centers in different contexts (service complaints, Stock Exchange service and medical emergency). The effects of these different...
The present paper aims to fill the current lack of databases containing emotional manifestations. Emotions, such as strong emotions, are indeed difficult to collect in real life. They occur in contexts that are generally unpredictable, and some of them, such as anger, are less frequent in public life than in privat...
In this article, we describe and interpret a set of acoustic and linguistic features that characterise emotional/emotion-related user states – confined to the one database processed: four classes in a German corpus of children interacting with a pet robot. To this end, we collected a very large feature vector consisting of more than 4000 features e...
In this chapter, we focus on the automatic recognition of emotional states using acoustic and linguistic parameters as features and classifiers as tools to predict the ‘correct’ emotional states. We first sketch history and state of the art in this field; then we describe the process of ‘corpus engineering’, i.e. the design and the recording of dat...
In this paper we describe a corpus set together from two sub-corpora. The CINEMO corpus contains acted emotional expression obtained by playing dubbing exercises. This new protocol is a way to collect mood-induced data in large amount which show several complex and shaded emotions. JEMO is a corpus collected with an emotion-detection game and conta...
In this paper we suggest feature selection and Principal Component Analysis as a way to analyze and compare corpora of emotional speech. To this end, a fast improvement of the Sequential Forward Floating Search algorithm is introduced, and subsequently extensive tests are run on a selection of French emotional language resources well suited for a f...
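The Sequential Forward Floating Search alternates greedy forward additions with "floating" backward removals that may undo earlier choices. Below is a minimal sketch under a toy scoring criterion; the wrapper criterion, feature names, and the speed-up introduced in the paper are not reproduced here.

```python
# Minimal Sequential Forward Floating Search (SFFS) sketch.
# The scoring criterion used below is a toy stand-in, not the paper's.

def sffs(features, score, k):
    """Select up to k features; backward steps may undo greedy additions."""
    selected, best_at_size = [], {}
    while len(selected) < k:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        # Forward step: add the feature that maximizes the criterion.
        selected.append(max(remaining, key=lambda f: score(selected + [f])))
        s = score(selected)
        if s <= best_at_size.get(len(selected), float("-inf")):
            selected.pop()  # no progress at this subset size: stop
            break
        best_at_size[len(selected)] = s
        # Floating step: drop a feature while that beats the best
        # subset previously seen at the smaller size.
        while len(selected) > 2:
            worst = max(selected,
                        key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if score(reduced) > best_at_size.get(len(reduced), float("-inf")):
                selected, best_at_size[len(reduced)] = reduced, score(reduced)
            else:
                break
    return selected

# Toy criterion: reward overlap with a known-useful set, penalize size.
useful = {"f0_mean", "energy_sd", "mfcc1"}
score = lambda s: len(useful & set(s)) - 0.1 * len(s)
print(sffs(["f0_mean", "energy_sd", "mfcc1", "jitter", "shimmer"], score, 3))
```

A wrapper criterion in practice would be cross-validated classifier accuracy on held-out data, which is what makes the floating backward step worthwhile despite its cost.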
We focus on audio cues required for the interaction between a human and a robot. We argue that a multi-level use of different paralinguistic cues is needed to pilot the decisions of the robot. Our challenge is to know how to use them to pilot the human-robot interaction. We offer in this paper a protocol for a study on the way paralinguistic cues c...
Most paralinguistic analysis tasks are lacking agreed-upon evaluation procedures and comparability, in contrast to more 'traditional' disciplines in speech analysis. The INTERSPEECH 2010 Paralinguistic Challenge shall help overcome the usually low compatibility of results, by addressing three selected sub-challenges. In the Age Sub-Challenge, the a...
It is not fully known how long it takes a human to reliably recognize emotion in speech from the beginning of a phrase. However, many technical applications demand very quick system responses, e.g. to prepare different feedback alternatives before the end of a speaker turn in a dialog system. We therefore investigate this 'gating paradigm' emp...
The CINEMO corpus of French emotional speech provides a richly annotated resource to help overcome the apparent lack of learning and testing speech material for complex, i. e. blended or mixed emotions. The protocol for its collection was dubbing selected emotional scenes from French movies. 51 speakers are contained and the total speech time amoun...
Databases are fundamental to affective computing. Directly or indirectly, they provide a large proportion of the information about human affective functioning that is used by affective systems. Information may be drawn from them in many ways – by machine learning, by direct use of extracts, or by a combination of human judgment and machine measurem...
This paper deals with a new corpus, called the IDV corpus for "Institut De la Vision", collected within the framework of the ROMEO project (a Cap Digital French national project funded by FUI6). The aim of the project is to construct a robot assistant for dependent persons (blind or elderly persons). Two of the robot's functionalities are speaker identificati...
In this paper, we focus on the recording protocol for gathering emotional audio data during interactions between the Nao robot and children. The robot is operated by a Wizard-of-Oz, according to strategies meant to elicit vocal expressions of emotions in children. These recordings will provide data to develop a real-time emotion detection module fo...
In this study we collected a corpus for training a model for emotion detection in the context of monitoring artificial agents by voice. In order to control the occurrence of a wide range of affective states for modeling purposes, we used acted emotional expression through dubbing play exercises. We observed that some natural affective states occur...
In this study we have focused our attention on consumers' emotions during interviews about products. We have based our analysis on the annotation of video-taped dialogs. The collected corpus has been annotated by two experts, and then a perceptive test has been carried out with 40 subjects. The interviews have shown many "real-life" complex...
There has been a great deal of psychological research on emotion and nonverbal communication. Yet these studies were based mostly on acted basic emotions. This paper explores how manual annotation and image processing can cooperate towards the representation of spontaneous emotional behavior in low-resolution videos from TV. We describe a corpus of TV...
Classification performance of emotional user states found in realistic, spontaneous speech is not very high, compared to the performance reported for acted speech in the literature. This might be partly due to the difficulty of providing reliable annotations, partly due to suboptimal feature vectors used for classification, and partly due to the di...
The design of future interactive affective computing systems requires the representation of spontaneous emotions and their associated multimodal signs. Current prototypes are often limited to the detection and synthesis of a few primary emotions and are most of the time grounded on acted data collected in-lab. In order to model the sophisticated re...