Article · PDF available

Emotional Triggers and Responses in Spontaneous Affective Interaction: Recognition, Prediction, and Analysis

Abstract

To fully mimic the naturalness of human interaction in Human-Computer Interaction (HCI), emotion is an essential aspect that should not be overlooked. Emotion makes human interaction rich and meaningful. In communicating, we not only express our own emotional state but are also affected by our conversational counterpart. However, existing works have largely focused only on occurrences of emotion, through recognition and simulation. The relationship between a speaker's utterance and the emotional response it triggers has not yet been closely examined. Observing and incorporating the underlying process that causes changes of emotion can provide useful information for dialogue systems in making more emotionally intelligent decisions, such as taking appropriate action with regard to the user's emotion and being aware of the emotional implications of their own responses. To bridge this gap, in this paper we tackle three main tasks: 1) recognition of emotional states; 2) analysis of social-affective events in spontaneous conversational data, to capture the relationship between actions taken in discourse and the emotional responses that follow; and 3) prediction of emotional triggers and responses in a conversational context. The proposed study differs from existing works in that it focuses on the change of emotion (emotional response) and its cause (emotional trigger) on top of the occurrence of emotion itself. The analysis and experimental results are reported in detail, showing promising initial results for future work and development. © 2018, Japanese Society for Artificial Intelligence. All rights reserved.
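The prediction task described in the abstract, inferring the emotional response an utterance will trigger, can be cast as supervised text classification over dialogue pairs. The sketch below is a minimal baseline under that framing; the CSV file and the column names `utterance` and `listener_emotion` are illustrative assumptions, not the paper's actual corpus or method.

```python
# Minimal sketch: predicting the listener's emotional response to an
# utterance as text classification. File and column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# One row per (speaker utterance, emotion of the response it triggered).
df = pd.read_csv("dialogue_pairs.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["utterance"], df["listener_emotion"], test_size=0.2, random_state=0
)

# Bag-of-words baseline: TF-IDF n-gram features into a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```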
... In the era of information and big data [1], massive data can be acquired, stored, and applied [2]. Mining the attitudes expressed in these review comments can inform the purchase decisions of potential consumers [3]. ...
Article
Full-text available
With the continuous expansion of the field of natural language processing, researchers have found that some practical problems exhibit imbalanced data distributions, while the strong performance of most methods rests on the assumption that the samples in the dataset are balanced. The imbalanced-data classification problem has therefore gradually become one that needs to be studied. For sentiment mining of imbalanced short-text review datasets, this paper proposes a fused multi-channel BLTCN-BLSTM self-attention sentiment classification method. A multi-channel BLTCN-BLSTM self-attention network model is built, word-embedded samples are used as the multi-channel input, and after feature extraction a self-attention mechanism is fused in to emphasize sentiment information and further extract text features. At the same time, focal-loss rebalancing and classifier enhancement are combined to realize text sentiment prediction. The experimental results show an optimal F1 value of up to 0.893 on the Chnsenticorp-HPL-10,000 corpus. Comparison and ablation experiments on accuracy, recall, and F1-measure show that the proposed model can fully integrate the weight of emotional feature words and effectively improves the sentiment classification performance of imbalanced short-text review data.
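The rebalancing idea this abstract mentions is usually focal loss (Lin et al., 2017), which down-weights easy, majority-class examples so the minority sentiment class contributes more to the gradient. A minimal PyTorch sketch of multi-class focal loss follows; it is independent of the BLTCN-BLSTM architecture itself, which is not reproduced here.

```python
# Minimal sketch of multi-class focal loss; the surrounding network is omitted.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """logits: (batch, n_classes); targets: (batch,) integer class ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    # (1 - pt)^gamma shrinks the loss of well-classified (easy) examples,
    # which in imbalanced review data are dominated by the majority class.
    return (-((1.0 - pt) ** gamma) * log_pt).mean()

# Usage inside a training loop: loss = focal_loss(model(batch), labels)
```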
... The importance of identifying and incorporating responses to conversational signals was recognized early in the human-computer interaction community by Nagao and Takeuchi [47]. Elements such as empathy and the emotions associated with certain utterances have also been studied and play an important role in error recognition [36,51]. Bousmalis et al. have surveyed the conversation analysis literature for nonverbal audiovisual cues that indicate agreement and disagreement between human speakers, with the goal of developing machine recognition of these cues [8]. ...
Article
Full-text available
One key technique people use in conversation and collaboration is conversational repair. Self-repair is the recognition and attempted correction of one's own mistakes. We investigate how the self-repair of errors by intelligent voice assistants affects user interaction. In a controlled human-participant study (N=101), participants asked Amazon Alexa to perform four tasks, and we manipulated whether Alexa would "make a mistake" understanding the participant (for example, playing heavy metal in response to a request for relaxing music) and whether Alexa would perform a correction (for example, stating, "You don't seem pleased. Did I get that wrong?"). We measured the impact of self-repair on the participant's perception of the interaction in four conditions: correction (mistakes made, repair performed), undercorrection (mistakes made, no repair performed), overcorrection (no mistakes made, but repair performed), and control (no mistakes made, no repair performed). Subsequently, we conducted free-response interviews with each participant about their interactions. This study finds that self-repair greatly improves people's assessment of an intelligent voice assistant when a mistake has been made, but can degrade assessment when no correction is needed. However, we find that the positive impact of self-repair in the wake of an error outweighs the negative impact of overcorrection. In addition, participants who recently experienced an error saw increased value in self-repair as a feature, regardless of whether they experienced a repair themselves.
... The importance of identifying and incorporating responses to such conversational signals was recognized early in the human-computer interaction community by Nagao and Takeuchi [50]. While linguists and behavioral psychologists have recognized and analyzed the regulatory use of facial displays, the machine learning and computer vision community has largely focused on emotion recognition in their analysis of faces [42,52]. This is in part due to the widespread availability of emotional expression image databases such as Ekman's Pictures of Facial Affect [24], the Belfast database [21], the Extended Cohn-Kanade Dataset [43], or the Affectiva-MIT Facial Expression Dataset [45]. ...
Preprint
People interacting with voice assistants are often frustrated by voice assistants' frequent errors and inability to respond to backchannel cues. We introduce an open-source video dataset of 21 participants' interactions with a voice assistant, and explore the possibility of using this dataset to enable automatic error recognition to inform self-repair. The dataset includes clipped and labeled videos of participants' faces during free-form interactions with the voice assistant from the smart speaker's perspective. To validate our dataset, we emulated a machine learning classifier by asking crowdsourced workers to recognize voice assistant errors from watching soundless video clips of participants' reactions. We found trends suggesting it is possible to determine the voice assistant's performance from a participant's facial reaction alone. This work posits elicited datasets of interactive responses as a key step towards improving error recognition for repair for voice assistants in a wide variety of applications.
... Emotional analysis is a comprehensive application of natural language processing, viewpoint extraction, and text analysis that can identify and retrieve emotional polarity from natural language texts [7][8][9]. Early approaches to emotional analysis focus on determining the overall emotional orientation (i.e., positive, neutral, or negative) or emotional polarity (i.e., one to five stars) of a comment [10]. Emotional polarity can be determined at different levels: document level [11], sentence level, and vocabulary level [12]. ...
Article
As an interdisciplinary subject drawing on multiple fields, emotional analysis has become a hot topic in psychology, health medicine, and computer science, with high practical application value. Emotion research based on social networks is a relatively new topic in psychology and medical health research. Text emotion analysis of college students is also significant for understanding students' emotional state at a given time or over a period, revealing their normal state, abnormal states, and the reasons for state changes from the texts they write. Because convolutional neural networks cannot make full use of the emotional information unique to sentences, and because improving model accuracy otherwise requires labeling large, high-quality training sets, this paper proposes an emotional analysis model combining an emotion dictionary with a multichannel convolutional neural network. First, an emotion-dictionary input matrix is constructed from the emotion information, and different sentence features are combined to form separate network input channels, so that during training the model can learn the emotional content of input sentences from multiple feature representations. Then, the loss function is reconstructed to enable semi-supervised learning. Finally, experiments are carried out on the COAE 2014 and self-built datasets. The proposed model not only extracts more semantic information from emotional text but also learns its hidden emotional information. Experimental results show that the proposed model achieves better classification performance: compared with the best benchmark model, gram-CNN, the F1 value increases by 0.026 on the self-built dataset and by 0.032 on the COAE 2014 dataset.
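The multichannel idea, word embeddings in one channel and dictionary-derived emotion scores in another, can be sketched as a standard two-channel TextCNN. Everything below (dimensions, filter sizes, and the way lexicon scores are broadcast to match the embedding channel) is one plausible reading of the abstract, not the paper's exact architecture.

```python
# Sketch of a two-channel TextCNN: channel 1 holds word embeddings, channel 2
# holds per-token emotion-dictionary scores broadcast across the embedding
# dimension so both channels share a shape. All sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoChannelTextCNN(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 100,
                 n_classes: int = 3, n_filters: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One 2-channel convolution per n-gram width, as in a Kim-style TextCNN.
        self.convs = nn.ModuleList(
            [nn.Conv2d(2, n_filters, (k, embed_dim)) for k in (3, 4, 5)]
        )
        self.fc = nn.Linear(n_filters * 3, n_classes)

    def forward(self, token_ids: torch.Tensor, lexicon_scores: torch.Tensor):
        # token_ids: (B, L) int ids; lexicon_scores: (B, L) dictionary scores.
        ch1 = self.embed(token_ids)                        # (B, L, D)
        ch2 = lexicon_scores.unsqueeze(-1).expand_as(ch1)  # (B, L, D)
        x = torch.stack([ch1, ch2], dim=1)                 # (B, 2, L, D)
        # Convolve, then max-pool over time for each filter width.
        feats = [F.relu(conv(x)).squeeze(3).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))            # (B, n_classes)
```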
Conference Paper
Full-text available
This paper demonstrates a new technology that can infer a person's emotions from RF signals reflected off their body. EQ-Radio transmits an RF signal and analyzes its reflections off a person's body to recognize their emotional state (happy, sad, etc.). The key enabler underlying EQ-Radio is a new algorithm for extracting individual heartbeats from the wireless signal at an accuracy comparable to on-body ECG monitors. The resulting beats are then used to compute emotion-dependent features that feed a machine-learning emotion classifier. We describe the design and implementation of EQ-Radio and demonstrate through a user study that its emotion recognition accuracy is on par with state-of-the-art emotion recognition systems that require a person to be hooked to an ECG monitor.
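The final stage of that pipeline, turning extracted heartbeats into classifier features, can be sketched without the RF processing. The sketch below assumes inter-beat intervals are already available and uses generic heart-rate-variability measures as stand-ins; these are not EQ-Radio's actual feature set, and the data here is synthetic.

```python
# Sketch: emotion classification from inter-beat intervals (IBIs).
# Generic HRV features stand in for EQ-Radio's features; the RF
# heartbeat-extraction stage is assumed to have produced `ibis` already.
import numpy as np
from sklearn.svm import SVC

def hrv_features(ibis: np.ndarray) -> np.ndarray:
    """ibis: 1-D array of inter-beat intervals in seconds."""
    diffs = np.diff(ibis)
    return np.array([
        ibis.mean(),                    # mean heart period
        ibis.std(),                     # SDNN: overall variability
        np.sqrt((diffs ** 2).mean()),   # RMSSD: beat-to-beat variability
    ])

# Synthetic placeholder data: one feature row per recording, plus labels.
X = np.stack([hrv_features(np.random.uniform(0.6, 1.0, 60)) for _ in range(20)])
y = np.random.choice(["happy", "sad", "angry", "calm"], size=20)
clf = SVC().fit(X, y)  # emotion-dependent features feed a standard classifier
```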
Article
Full-text available
Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
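As the abstract notes, the parameter set ships with the openSMILE toolkit. One convenient way to extract it from Python is audEERING's `opensmile` wrapper; using that package (and its enum names) is an assumption about tooling on my part, not something the paper prescribes.

```python
# Sketch: extracting GeMAPS functionals with the `opensmile` Python package
# (pip install opensmile). Package and enum names belong to the wrapper.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,        # the minimalistic set described above
    feature_level=opensmile.FeatureLevel.Functionals,   # one summary vector per utterance
)
features = smile.process_file("utterance.wav")  # pandas DataFrame, one row per file
print(features.shape)
```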
Chapter
This volume is a comprehensive roadmap to the burgeoning area of affective sciences, which now spans several disciplines. The Handbook brings together, for the first time, the various strands of inquiry and latest research in the scientific study of the relationship between the mechanisms of the brain and the psychology of mind. In recent years, scientists have made considerable advances in understanding how brain processes shape emotions and are changed by human emotion. Drawing on a wide range of neuroimaging techniques, neuropsychological assessment, and clinical research, scientists are beginning to understand the biological mechanisms for emotions. As a result, researchers are gaining insight into such compelling questions as: How do people experience life emotionally? Why do people respond so differently to the same experiences? What can the face tell us about internal states? How does emotion in significant social relationships influence health? Are there basic emotions common to all humans? This volume brings together the most eminent scholars in the field to present, in sixty original chapters, the latest research and theories in the field. The book is divided into ten sections: Neuroscience; Autonomic Psychophysiology; Genetics and Development; Expression; Components of Emotion; Personality; Emotion and Social Processes; Adaptation, Culture, and Evolution; Emotion and Psychopathology; and Emotion and Health. This major new volume will be an invaluable resource for researchers that will define affective sciences for the next decade.
Article
The scientific study of emotion has long been dominated by theories emphasizing the subjective experience of emotions and their accompanying expressive and physiological responses. The processes by which different emotions are elicited have received less attention, the implicit assumption being that certain emotions arise automatically in response to certain types of events or situations. Such an assumption is incompatible with data showing that similar situations can provoke a range of emotions in different individuals, or even in the same individual at different times. Appraisal theory, first suggested by Magda Arnold and Richard Lazarus, was formulated to address this shortcoming in our understanding of emotion. The central tenet of appraisal theory is that emotions are elicited according to an individual's subjective interpretation or evaluation of important events or situations. Appraisal research focuses on identifying the evaluative dimensions or criteria that predict which emotion will be elicited in an individual, as well as on linking the appraisal process with the production of emotional responses. This book represents the first full-scale summary of the current state of appraisal research. Separate sections cover: the history of appraisal theory and its fundamental ideas; the views of some of the major theorists currently active in the field; theoretical and methodological problems with the appraisal approach, including suggestions for their resolution; social, cultural, and individual differences, and the application of appraisal theory to understanding and treating emotional pathology; and the methodology used in appraisal research, including measuring and analyzing self-report, physiological, facial, and vocal indicators of appraisal, and simulating appraisal processes via computational models. Intended for advanced students and researchers in emotion psychology, it provides an authoritative assessment and critique of the current state of the art in appraisal research.
Chapter
This paper introduces a text dialog system that can provide counseling dialog based on the semantic content of user utterances. We extract emotion-, problem-, and reason-oriented semantic contents from user utterances to generate microcounseling system responses. Our counseling strategy follows microcounseling techniques to build a working relationship with a client and to discover the client's concerns and problems. Extracting semantic contents allows the system to generate appropriate counseling responses for various user utterances. Experiments show that our system works well as a virtual counselor. © Springer International Publishing Switzerland 2015. All rights reserved.
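The extract-then-respond loop this chapter describes can be caricatured with keyword spotting and templates. Everything below (the emotion lexicon, the "because" pattern, and the response templates) is hypothetical scaffolding for illustration; the chapter's extractors operate on semantic content, not keywords.

```python
# Toy sketch of extracting emotion/reason contents and filling a counseling
# response template. Lexicon, regex, and templates are all placeholders.
import re

EMOTION_WORDS = {"sad": "sadness", "anxious": "anxiety", "angry": "anger"}

def extract_contents(utterance: str) -> dict:
    text = utterance.lower()
    emotion = next((v for k, v in EMOTION_WORDS.items() if k in text), None)
    reason = re.search(r"because (.+)", text)
    return {"emotion": emotion, "reason": reason.group(1) if reason else None}

def respond(contents: dict) -> str:
    if contents["emotion"] and contents["reason"]:
        return (f"It sounds like {contents['reason']} is causing you "
                f"{contents['emotion']}. Tell me more about that.")
    if contents["emotion"]:
        return f"I hear that you are feeling {contents['emotion']}. What do you think is behind it?"
    return "Could you tell me more about how you are feeling?"

print(respond(extract_contents("I'm anxious because my exams are next week")))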
Article
This paper describes the design and evaluation of a method for developing a chat-oriented dialog system utilizing real human-to-human conversation examples from movie scripts and Twitter conversations. The aim of the proposed method is to build a conversational agent that can interact with users in as natural a fashion as possible while reducing the time required for database design and collection. A number of challenging design issues are described, including (1) constructing appropriate dialog corpora from raw movie scripts and Twitter data, and (2) developing a multi-domain chat-oriented dialog management system that can retrieve a proper system response for the current user query. To build a dialog corpus, we propose a unit of conversation called a tri-turn (a trigram conversation turn), along with extraction and semantic-similarity analysis techniques to help ensure that the content extracted from raw movie/drama script files forms appropriate dialog-pair (query-response) examples. The constructed dialog corpora are then utilized in a data-driven dialog management system. Various approaches are investigated, including example-based dialog management (EBDM) and response generation using phrase-based statistical machine translation (SMT). In particular, we use two EBDM approaches: syntactic-semantic similarity retrieval and TF-IDF-based cosine-similarity retrieval. Experiments are conducted to compare and contrast the EBDM and SMT approaches in building a chat-oriented dialog system, and we investigate a combined method that addresses the advantages and disadvantages of both. System performance was evaluated on objective metrics (semantic similarity and cosine similarity) and human subjective evaluation from a small user study. Experimental results show that the proposed filtering approach effectively improves performance, and that combining the EBDM and SMT approaches overcomes the shortcomings of each. Copyright © 2014 The Institute of Electronics, Information and Communication Engineers.
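Of the two EBDM variants, the TF-IDF cosine-similarity retrieval is straightforward to sketch with scikit-learn. The dialog-pair corpus and queries below are placeholders, not the paper's mined tri-turn data.

```python
# Sketch of TF-IDF cosine-similarity retrieval over (query, response)
# example pairs, one of the two EBDM variants the paper compares.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical dialog pairs standing in for tri-turns mined from scripts/Twitter.
pairs = [
    ("how are you doing", "pretty good, thanks for asking"),
    ("what movie should i watch", "have you seen anything by kurosawa?"),
    ("i had a rough day", "sorry to hear that. want to talk about it?"),
]
queries = [q for q, _ in pairs]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(queries)  # TF-IDF matrix over example queries

def retrieve_response(user_query: str) -> str:
    sims = cosine_similarity(vectorizer.transform([user_query]), index)
    return pairs[sims.argmax()][1]  # response paired with the most similar query

print(retrieve_response("today was really rough"))
```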