Article

Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures

Authors:
Rana el Kaliouby, Peter Robinson

Abstract

We present a real-time system for detecting facial action units and inferring emotional states from head and shoulder gestures and facial expressions. The dynamic system uses three levels of inference on progressively longer time scales. Firstly, facial action units and head orientation are identified from 22 feature points and Gabor filters. Secondly, Hidden Markov Models are used to classify sequences of actions into head and shoulder gestures. Finally, a multi-level Dynamic Bayesian Network is used to model the unfolding emotional state based on probabilities of different gestures. The most probable state over a given video clip is chosen as the label for that clip. The average F1 score for 12 action units (AUs 1, 2, 4, 6, 7, 10, 12, 15, 17, 18, 25, 26), labelled on a frame-by-frame basis, was 0.461. The average classification rate for five emotional states (anger, fear, joy, relief, sadness) was 0.440. Sadness had the highest rate (0.64) and anger the lowest (0.11).
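The abstract's three-level structure lends itself to a compact illustration. The sketch below is not the authors' implementation: level 1 (AU and head-orientation detection from feature points and Gabor filters) is assumed to exist upstream and to emit discretised action symbols, level 2 is approximated by small discrete HMMs scored with a hand-written forward algorithm, and level 3 by a simple Bayes filter; all probability tables, gesture names, and state names are illustrative placeholders.

```python
import numpy as np

GESTURES = ["head_nod", "head_shake", "head_tilt"]   # level-2 outputs (examples only)
STATES   = ["agreeing", "disagreeing", "thinking"]   # level-3 states (examples only)

def hmm_loglik(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs) for a discrete-observation HMM."""
    alpha = start * emit[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

# Placeholder HMM parameters: one small HMM per gesture over a vocabulary of
# 4 discretised head/AU action symbols (e.g. up, down, left, right).
rng = np.random.default_rng(0)
def random_hmm(n_hidden=2, n_symbols=4):
    start = np.full(n_hidden, 1.0 / n_hidden)
    trans = rng.dirichlet(np.ones(n_hidden), size=n_hidden)
    emit  = rng.dirichlet(np.ones(n_symbols), size=n_hidden)
    return start, trans, emit
gesture_hmms = {g: random_hmm() for g in GESTURES}

# P(gesture | mental state): illustrative table, rows = states, cols = gestures.
P_gesture_given_state = np.array([[0.7, 0.1, 0.2],    # agreeing
                                  [0.1, 0.7, 0.2],    # disagreeing
                                  [0.2, 0.2, 0.6]])   # thinking

def infer_clip_state(action_symbols, window=10):
    """Level 2 + level 3: gesture probabilities per window, then a Bayes filter."""
    posterior = np.full(len(STATES), 1.0 / len(STATES))
    history = []
    for t in range(0, len(action_symbols) - window + 1, window):
        obs = action_symbols[t:t + window]
        logliks = np.array([hmm_loglik(obs, *gesture_hmms[g]) for g in GESTURES])
        p_gesture = np.exp(logliks - logliks.max())
        p_gesture /= p_gesture.sum()                      # level-2 gesture probabilities
        likelihood = P_gesture_given_state @ p_gesture    # evidence for each state
        posterior = posterior * likelihood
        posterior /= posterior.sum()                      # level-3 belief update
        history.append(posterior.copy())
    return STATES[int(np.mean(history, axis=0).argmax())]  # most probable state over clip

# Level 1 is assumed: a stream of discretised action symbols from an AU/head detector.
clip = rng.integers(0, 4, size=60)
print(infer_clip_state(clip))
```

In the paper the third level is a multi-level dynamic Bayesian network rather than the flat filter shown here; the sketch only conveys how gesture probabilities feed a mental-state belief and how the clip label is taken as the most probable state.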


... On a related note, head rotation plays an important role in stabilizing gaze to fixate on objects of interest [64]. There is evidence of the relationship between head pose dynamics and expression and perception of different emotional and mental states [65]-[67], being particularly related to emotional intensity [68]. Affect recognition works have relied on head pose categorizations such as head tilts, nods, and shakes [65], [69], which usually require specific action detectors. ...
... There is evidence of the relationship between head pose dynamics and expression and perception of different emotional and mental states [65]-[67], being particularly related to emotional intensity [68]. Affect recognition works have relied on head pose categorizations such as head tilts, nods, and shakes [65], [69], which usually require specific action detectors. More recent approaches directly use temporal 3D rotational angles (yaw, pitch, and roll) to describe head motion trajectories, as well as angular displacement, velocity, acceleration, and window-based functionals computed from such trajectories [61], [70], [71], dynamic features based on the discrete Fourier transform [72], or clustered sequences of kinemes [73]. ...
... The first four are part of Ekman's basic emotions [26]. Pensive is a mental state rather than an emotional expression; however, it was included in our model as it was found to be a frequent facial expression during the conversation when users were preparing their response, as in previous HMI-oriented works [65], [91]. As with the audio-based annotations, some categories were combined into a single label due to being often confused by annotators. ...
Preprint
Full-text available
The EMPATHIC project aimed to design an emotionally expressive virtual coach capable of engaging healthy seniors to improve well-being and promote independent aging. One of the core aspects of the system is its human sensing capabilities, allowing for the perception of emotional states to provide a personalized experience. This paper outlines the development of the emotion expression recognition module of the virtual coach, encompassing data collection, annotation design, and a first methodological approach, all tailored to the project requirements. With the latter, we investigate the role of various modalities, individually and combined, for discrete emotion expression recognition in this context: speech from audio, and facial expressions, gaze, and head dynamics from video. The collected corpus includes users from Spain, France, and Norway, and was annotated separately for the audio and video channels with distinct emotional labels, allowing for a performance comparison across cultures and label types. Results confirm the informative power of the modalities studied for the emotional categories considered, with multimodal methods generally outperforming others (around 68% accuracy with audio labels and 72-74% with video labels). The findings are expected to contribute to the limited literature on emotion recognition applied to older adults in conversational human-machine interaction.
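The head-motion descriptors mentioned in the excerpts above (angular displacement, velocity, acceleration, and window-based functionals over yaw/pitch/roll trajectories) can be sketched in a few lines. The frame rate, window length, and the particular functionals below are assumptions for illustration, not values from the cited works.

```python
import numpy as np

def head_motion_features(angles, fps=30, window=60):
    """angles: (T, 3) array of yaw, pitch, roll in degrees; returns (n_windows, 18)."""
    dt = 1.0 / fps
    vel = np.gradient(angles, dt, axis=0)       # angular velocity (deg/s)
    acc = np.gradient(vel, dt, axis=0)          # angular acceleration (deg/s^2)
    feats = []
    for start in range(0, len(angles) - window + 1, window):
        seg   = angles[start:start + window]
        seg_v = vel[start:start + window]
        seg_a = acc[start:start + window]
        disp = seg.max(axis=0) - seg.min(axis=0)             # angular displacement range
        feats.append(np.concatenate([
            disp,                                            # 3 values
            seg_v.mean(axis=0), np.abs(seg_v).max(axis=0),   # 6 values
            seg_a.mean(axis=0), np.abs(seg_a).max(axis=0),   # 6 values
            seg.std(axis=0),                                 # 3 values
        ]))
    return np.asarray(feats)

# Example: 5 seconds of synthetic head motion at 30 fps.
t = np.linspace(0, 5, 150)
angles = np.column_stack([10 * np.sin(2 * np.pi * 0.5 * t),   # yaw: a slow shake
                          5 * np.sin(2 * np.pi * 1.0 * t),    # pitch: nodding
                          np.zeros_like(t)])                  # roll: none
print(head_motion_features(angles).shape)   # (2, 18)
```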
... Signaling and inferring these mental states is paramount to effective communication (14), especially in conversational settings (15). We identified five studies (16-20) that described their facial movements, coded them as AU combinations, and converted them into predictive models using our hypothesis kernel analysis method. Using an additional dataset of 2400 categorizations of the conversational signals from 40 participants (20 WE and 20 EA), we used the prediction-explanation-exploration framework to evaluate and optimize the conversational signal models just as we did with the basic emotion models. ...
... The exploration stage used these insights to construct optimized, culture-accented models. Figure 6C outlines changes in predictive performance (Δ AUROC) of the optimized versus original models, showing that each model improved significantly, except el Kaliouby and Robinson [2005; (19)], which already performed close to the noise ceiling in the prediction stage (see Fig. 6A). As with the basic emotion models, the optimized conversational signal models do not perform significantly better or worse for either WE or EA participants, except for "confused", which still performed better for WE participants [t(10) = 2.42, P = 0.03, d = 1.40]. ...
... For the conversational signal dataset, this leaves a grand total of 83,540 trials (total WE: 40,540, total EA: 43,000) with an average of 2089 trials per participant (average WE: 2027, average EA: 2150). This grand total contains 4134 repeated trials (total WE: 2314, total EA: 2322) with an average of 20 repetitions per participant (average WE: 18, average EA: 19). ...
Article
Full-text available
Models are the hallmark of mature scientific inquiry. In psychology, this maturity has been reached in a pervasive question: what models best represent facial expressions of emotion? Several hypotheses propose different combinations of facial movements [action units (AUs)] as best representing the six basic emotions and four conversational signals across cultures. We developed a new framework to formalize such hypotheses as predictive models, compare their ability to predict human emotion categorizations in Western and East Asian cultures, explain the causal role of individual AUs, and explore updated, culture-accented models that improve performance by reducing a prevalent Western bias. Our predictive models also provide a noise ceiling to inform the explanatory power and limitations of different factors (e.g., AUs and individual differences). Thus, our framework provides a new approach to test models of social signals, explain their predictive power, and explore their optimization, with direct implications for theory development.
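As a rough illustration of the idea of treating an AU-combination hypothesis as a predictive model and scoring it with AUROC, the snippet below builds a hypothetical "confused" template (AU4 + AU7) and evaluates it against simulated categorizations. It is not the cited hypothesis-kernel framework; the template, data, and probabilities are made up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

AU_NAMES = ["AU1", "AU2", "AU4", "AU7", "AU12", "AU15", "AU17", "AU25"]
# Hypothesised model: "confused" = brow lowerer + lid tightener (AU4 + AU7).
template = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)

rng = np.random.default_rng(1)
n_trials = 500
stimuli = rng.integers(0, 2, size=(n_trials, len(AU_NAMES))).astype(float)
# Simulated human categorisations: more likely "confused" when AU4 and AU7 co-occur.
p_confused = 0.15 + 0.6 * (stimuli[:, 2] * stimuli[:, 3])
labels = rng.random(n_trials) < p_confused

# Model prediction score: overlap between the stimulus AU pattern and the template.
scores = stimuli @ template / template.sum()
print("AUROC of the AU4+AU7 'confused' model:", round(roc_auc_score(labels, scores), 3))
```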
... Although dis-/agreement and confusion are sometimes communicated verbally, many cues and indicators for these complex mental states are expressed through non-verbal behaviour [9], like head movements or hand gestures. In everyday interactions, the human face is a particularly important source of spontaneous reactions, as it can express a wide variety of different complex mental states, like conveying interest or indicating confusion [2,26]. This can be an especially important but also highly complex problem for emotionally intelligent robots [29,35], and affect-sensitive human computer interaction [28], as many people convey similar reactions in very different ways. ...
... Semi-supervised algorithms attempt to alleviate some of these problems by training on small amounts of labeled data combined with very large unlabeled data sets [34,42,50], but are primarily focused on computer vision [23,24,48] and natural language problems [7,13,25]. Previous work on automatic dis-/agreement recognition has focused on using verbal [15, 18-20] or non-verbal cues [14,26,38], with only a limited number of approaches combining both modalities [5,22]. For verbal cues, Wang et al. developed an approach based on conditional random fields, with which they achieved an F1-score of 57.2% for agreement and 51.2% for disagreement [44]. ...
Preprint
Full-text available
Detecting mental states of human users is crucial for the development of cooperative and intelligent robots, as it enables the robot to understand the user's intentions and desires. Despite their importance, it is difficult to obtain a large amount of high quality data for training automatic recognition algorithms as the time and effort required to collect and label such data is prohibitively high. In this paper we present a multimodal machine learning approach for detecting dis-/agreement and confusion states in a human-robot interaction environment, using just a small amount of manually annotated data. We collect a data set by conducting a human-robot interaction study and develop a novel preprocessing pipeline for our machine learning approach. By combining semi-supervised and supervised architectures, we are able to achieve an average F1-score of 81.1% for dis-/agreement detection with a small amount of labeled data and a large unlabeled data set, while simultaneously increasing the robustness of the model compared to the supervised approach.
... Although dis-/agreement and confusion are sometimes communicated verbally, many cues and indicators for these complex mental states are expressed through non-verbal behaviour [9], like head movements or hand gestures. In everyday interactions, the human face is a particularly important source of spontaneous reactions, as it can express a wide variety of different complex mental states, like conveying interest or indicating confusion [2,26]. This can be an especially important but also highly complex problem for emotionally intelligent robots [29,35], and affect-sensitive human computer interaction [28], as many people convey similar reactions in very different ways. ...
... Semi-supervised algorithms attempt to alleviate some of these problems by training on small amounts of labeled data combined with very large unlabeled data sets [34,42,50], but are primarily focused on computer vision [23,24,48] and natural language problems [7,13,25]. Previous work on automatic dis-/agreement recognition has focused on using verbal [15, 18-20] or non-verbal cues [14,26,38], with only a limited number of approaches combining both modalities [5,22]. For verbal cues, Wang et al. developed an approach based on conditional random fields, with which they achieved an F1-score of 57.2% for agreement and 51.2% for disagreement [44]. ...
Conference Paper
Full-text available
Detecting mental states of human users is crucial for the development of cooperative and intelligent robots, as it enables the robot to understand the user's intentions and desires. Despite their importance, it is difficult to obtain a large amount of high quality data for training automatic recognition algorithms as the time and effort required to collect and label such data is prohibitively high. In this paper we present a multimodal machine learning approach for detecting dis-/agreement and confusion states in a human-robot interaction environment, using just a small amount of manually annotated data. We collect a data set by conducting a human-robot interaction study and develop a novel preprocessing pipeline for our machine learning approach. By combining semi-supervised and supervised architectures, we are able to achieve an average F1-score of 81.1% for dis-/agreement detection with a small amount of labeled data and a large unlabeled data set, while simultaneously increasing the robustness of the model compared to the supervised approach.
... Some studies try to externalize the internal states of humans. El Kaliouby et al. developed a system that estimates emotions from facial expressions (el Kaliouby & Robinson, 2005). The system can determine joy, anger, and sadness, but cannot distinguish discomfort or irritation, which is often experienced by audiences. ...
Article
During a presentation, an effective presenter will modify his/her own presentation style by observing the audience's reaction. To hone this skill, we have developed an audience robot that reacts according to the presenter's presentation style. The objective of this study is to propose a system that supports self-reflection after presentation practice with an audience robot. To modify the presentation style, the presenter needs to understand how the audience's psychological state corresponds to each reaction and what presentation style the presenter should adopt to modify the audience's psychological state. In addition, he/she needs to utilize this knowledge at the appropriate time. Therefore, during reflection, the presenter should evaluate whether he/she has such knowledge and whether he/she can utilize it. This study constructs a system that visualizes the presentation data graphically to represent each evaluation item clearly for effective consideration.
... These studies are limited to exaggerated expressions and controlled environments. There are a few tentative efforts to detect non-basic affective states, including mental states ("irritated", "worried"...) (El Kaliouby and Robinson, 2005). However, those expressions are closer to natural behavior. ...
Preprint
Full-text available
Facial expression is the most natural means for human beings to communicate their emotions. Most facial expression analysis studies consider the case of acted expressions. Spontaneous facial expression recognition is significantly more challenging since each person has a different way to react to a given emotion. We consider the problem of recognizing spontaneous facial expression by learning discriminative dictionaries for sparse representation. Facial images are represented as a sparse linear combination of prototype atoms via the Orthogonal Matching Pursuit algorithm. Sparse codes are then used to train an SVM classifier dedicated to the recognition task. The dictionary that sparsifies the facial images (feature points with the same class labels should have similar sparse codes) is crucial for robust classification. Learning sparsifying dictionaries heavily relies on the initialization process of the dictionary. To improve the performance of dictionaries, a random face feature descriptor based on the Random Projection concept is developed. The effectiveness of the proposed method is evaluated through several experiments on the spontaneous facial expressions DynEmo database. It is also evaluated on the well-known acted facial expressions JAFFE database for the purpose of comparison with state-of-the-art methods.
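A minimal sketch of the pipeline the abstract describes, under simplifying assumptions: a dictionary is learned with scikit-learn, feature vectors are sparse-coded with Orthogonal Matching Pursuit, and an SVM is trained on the resulting codes. The random-projection descriptor and the discriminative dictionary initialization are omitted, and random arrays stand in for facial descriptors.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))          # stand-in facial descriptors
y_train = rng.integers(0, 6, size=200)        # six expression classes
X_test  = rng.normal(size=(50, 64))

# Learn a dictionary of 32 atoms, then sparse-code with OMP (5 nonzero coefficients).
dico = MiniBatchDictionaryLearning(n_components=32, transform_algorithm="omp",
                                   transform_n_nonzero_coefs=5, random_state=0)
D = dico.fit(X_train).components_
codes_train = sparse_encode(X_train, D, algorithm="omp", n_nonzero_coefs=5)
codes_test  = sparse_encode(X_test,  D, algorithm="omp", n_nonzero_coefs=5)

# A linear SVM on the sparse codes performs the expression classification.
clf = SVC(kernel="linear").fit(codes_train, y_train)
print(clf.predict(codes_test)[:10])
```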
... Emotion recognition is an important part of affective computing, which focuses on identifying and understanding human emotions from facial expressions [1], body gestures [2], speech [3], physiological signals [4], etc. It has potential applications in healthcare and human-machine interactions, e.g., emotion health surveillance [5] and emotion-based music recommendation [6]. ...
Preprint
Full-text available
Emotion recognition is a critical component of affective computing. Training accurate machine learning models for emotion recognition typically requires a large amount of labeled data. Due to the subtleness and complexity of emotions, multiple evaluators are usually needed for each affective sample to obtain its ground-truth label, which is expensive. To save the labeling cost, this paper proposes an inconsistency-based active learning approach for cross-task transfer between emotion classification and estimation. Affective norms are utilized as prior knowledge to connect the label spaces of categorical and dimensional emotions. Then, the prediction inconsistency on the two tasks for the unlabeled samples is used to guide sample selection in active learning for the target task. Experiments on within-corpus and cross-corpus transfers demonstrated that cross-task inconsistency could be a very valuable metric in active learning. To our knowledge, this is the first work that utilizes prior knowledge on affective norms and data in a different task to facilitate active learning for a new task, even when the two tasks are from different datasets.
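The core selection rule can be sketched as follows (this is an interpretation of the abstract, not the authors' code): a categorical classifier and a valence regressor are trained on the labeled pool, affective norms map each predicted category to an expected valence, and the discrepancy between that value and the regressor's prediction ranks the unlabeled samples for annotation. The norm values, features, and models below are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

CATEGORIES = ["anger", "joy", "sadness"]
VALENCE_NORM = {"anger": -0.6, "joy": 0.8, "sadness": -0.7}   # assumed affective norms

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(120, 20))
y_cat = rng.integers(0, 3, 120)
y_val = np.array([VALENCE_NORM[CATEGORIES[c]] for c in y_cat]) + rng.normal(0, 0.2, 120)
X_unlab = rng.normal(size=(300, 20))

clf = RandomForestClassifier(random_state=0).fit(X_lab, y_cat)   # categorical task
reg = RandomForestRegressor(random_state=0).fit(X_lab, y_val)    # dimensional task

pred_cat = clf.predict(X_unlab)
pred_val = reg.predict(X_unlab)
expected_val = np.array([VALENCE_NORM[CATEGORIES[c]] for c in pred_cat])
inconsistency = np.abs(expected_val - pred_val)   # cross-task disagreement per sample

budget = 10
query_idx = np.argsort(inconsistency)[::-1][:budget]   # most inconsistent samples first
print("Samples to send to annotators:", query_idx)
```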
... These single-modality studies have achieved significant progress in their respective fields (Chung et al., 2017). Recently the focus has shifted from single-modality to multimodal emotion recognition (El Kaliouby & Robinson, 2005). Multimodal emotion recognition comprehensively considers the different combinations and interactions of the aforementioned single-modality data (Fan et al., 2023), leveraging the relationships between these data to capture complementary information, thereby establishing emotion recognition models with strong generalization ability and excellent recognition performance. ...
Article
Full-text available
Driven by the global pandemic, the demand for online education has significantly increased, making it crucial to enhance the interactive experience of online teaching to improve student learning outcomes and engagement. In this context, we have designed a model based on Particle Swarm Optimization (PSO) that combines the strengths of LSTNet in handling long-term dependencies and the capabilities of the Prophet model in trend and seasonality modeling. Our PSO-optimized LSTNet-Prophet model outperformed baseline models by 10% in accuracy and 12% in F-1 score, as shown by experiments conducted on the CMU-MOSEI dataset. Additionally, we have implemented a real-time emotion monitoring system capable of analyzing students' emotional states in real-time and providing feedback to teachers, aiding them in promptly adjusting teaching strategies, thereby improving teaching quality and interaction effectiveness. Our method achieved an emotion recognition accuracy of 82.42% and an F-1 score of 82.31%, demonstrating its effectiveness and robustness.
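The PSO component mentioned in the abstract is a generic population-based search; a bare-bones version is sketched below with a toy objective standing in for a model-validation loss. It is not the LSTNet-Prophet system, and the inertia and acceleration constants are common defaults rather than the paper's settings.

```python
import numpy as np

def pso(objective, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective(x) over box constraints with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy objective standing in for validation loss as a function of two hyperparameters.
loss = lambda p: (p[0] - 0.01) ** 2 + (p[1] - 128) ** 2 / 1e4
best, best_val = pso(loss, bounds=[(1e-4, 0.1), (16, 256)])
print("best hyperparameters:", best, "loss:", best_val)
```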
... brow raise) to calculate the likelihood of seven basic emotions, anger, contempt, disgust, fear, joy, sadness, and surprise (Ekman, 1992), engagement (emotional expressiveness-based on facial muscle activation), valence (positive or negative nature of the experience), and attention (point of focus based on head position). The software uses the AFFDEX algorithm (El Kaliouby & Robinson, 2005), which builds on EMFACS mappings developed by Ekman and colleagues since the 1970s (Ekman & Friesen, 2003). A validation study conducted by Stöckli et al. (2018), based on a total of 600 prototypical pictures of faces in different emotional states, resulted in accuracy rates of 73% for the Amsterdam Dynamic Facial Expression Set (ADFES), 66% for the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP), and 77% for the Radboud Faces Database (RaFD). ...
... This is mainly because the basic emotions represent universal properties shared by people from different regions, and because several digital datasets of human faces and other relevant material are available for training automatic emotion recognition systems. On the other hand, some researchers have also made efforts to detect non-basic affective states using deliberate facial expressions, including fatigue (Ji et al., 2006), pain (Littlewort et al., 2007), and mental states such as concentration, disagreement, interest, frustration, and insecurity (Kaliouby & Robinson, 2005). Research in the area of emotion and affect has turned to the automatic analysis of spontaneous, non-deliberate facial expression data, as shown in the work of Cohn (2006), Bartlett et al. (2006), and Littlewort et al. (2007). ...
Book
Full-text available
This book presents basic concepts and topics from several areas of knowledge needed to develop intelligent computational systems applied to education, together with their implementation in four research projects that incorporate different intelligent techniques and modules (fuzzy logic, genetic algorithms, or neural networks) in order to provide the user with a personalized learning experience better suited to their cognitive and affective needs.
... In the literature, there are different coding systems for facial expressions, such as Ekman's FACS or the classification of facial expressions by El Kaliouby and Robinson: the former is organized around identifying movements of the face and head, while the latter is based on combining sequences of facial expressions identified through a dynamic Bayesian network. The latter system detects the user's moods, which are fundamental in the collaboration between humans and robots (Ekman et al., 2002; El Kaliouby, Robinson, 2005). ...
... DBNs are often used to reason about relationships among facial displays like AUs [129] and, of course, for AU-label graph representations. The BN is a DAG that reflects a joint probability distribution among a set of variables. ...
Article
Full-text available
As one of the most important affective signals, facial affect analysis (FAA) is essential for developing human-computer interaction systems. Early methods focus on extracting appearance and geometry features associated with human affects while ignoring the latent semantic information among individual facial changes, leading to limited performance and generalization. Recent work attempts to establish a graph-based representation to model these semantic relationships and develop frameworks to leverage them for various FAA tasks. This paper provides a comprehensive review of graph-based FAA, including the evolution of algorithms and their applications. First, the FAA background knowledge is introduced, especially on the role of the graph. We then discuss approaches widely used for graph-based affective representation in literature and show a trend towards graph construction. For the relational reasoning in graph-based FAA, existing studies are categorized according to their non-deep or deep learning methods, emphasizing the latest graph neural networks. Performance comparisons of the state-of-the-art graph-based FAA methods are also summarized. Finally, we discuss the challenges and potential directions. As far as we know, this is the first survey of graph-based FAA methods. Our findings can serve as a reference for future research in this field.
... That can be demonstrated by the FACS (Facial Action Coding System) [8]. AU (action unit) [9] and expressions have some correspondence. For example, there is a greater symmetrical similarity between happiness and contempt compared with the symmetrical similarity between happiness and sadness because happiness and contempt contain AU12, and there is no intersection between the AU domain of happiness and sadness. ...
Article
Full-text available
Existing facial expression recognition methods have some drawbacks. For example, it becomes difficult for network learning on cross-dataset facial expressions, multi-region learning on an image does not extract the overall image information, and a frequency multiplication network does not take into account the inter-class and intra-class features in image classification. In order to deal with the above problems, in our current research, we propose a symmetric mode to extract the inter-class features and intra-class diversity features, and then propose a triple-structure network model based upon MobileNet V1, which is trained via a new multi-branch loss function. The proposed network consists of a triple structure, viz., a global branch network, an attention mechanism branch network, and a diversified feature learning branch network. To begin with, the global branch network is used to extract the global features of the facial expression images. Furthermore, the attention mechanism branch network concentrates on extracting inter-class features. In addition, the diversified feature learning branch network is utilized to extract intra-class diverse features. The network training is performed by using multiple loss functions to decrease intra-class differences and inter-class similarities. Finally, through ablation experiments and visualization, the intrinsic mechanism of our triple-structure network model is shown to be reasonable. Experiments on the KDEF, MMI, and CK+ datasets show that the accuracy of facial expression recognition using the proposed model is 1.224%, 13.051%, and 3.085% higher than that using MC-loss (VGG16), respectively. In addition, related comparison tests and analyses proved that our triple-structure network model achieves better performance than dozens of state-of-the-art methods.
... During the conversation, the face of the "salesperson" was recorded using the iMotions research tool, which is a software platform that integrates a number of biometric technologies (eye tracking, facial recognition, galvanic skin response, and electroencephalography). With the purpose of gaining deeper insight into human emotional reactions via facial expressions, the iMotions software uses the AFFDEX algorithm by Affectiva Inc. (El Kaliouby & Robinson, 2005;McDuff, El Kaliouby, Kassam, and Picard, 2010). The algorithm builds on Emotional Facial Action Coding System (EMFACS) mappings developed by Ekman and colleagues (Ekman & Friesen, 2003;Ekman & Rosenberg, 1997) and uses classified pictures as a training database. ...
Conference Paper
This research examines how a salesperson's personality and the emotions he/she displays during a sales conversation are related, how both are related to buyers' evaluations of the seller, and whether there is an interaction between personality and emotions. Based on data from 63 role-played sales conversations that were analyzed using automated facial recognition, as well as pre- and post-questionnaires, our findings indicate that openness and agreeableness seem to be particularly relevant personality traits with regards to subjective sales performance. Furthermore, overall engagement, expressions of joy, and surprisingly also anger, are positively related to buyer evaluations of the seller. Finally, we found that the emotion of joy positively interacts with agreeableness, but negatively interacts with openness, in influencing buyers' perceptions of the seller.
... An often-overlooked form of nonverbal behavior is head movement, which is commonly used during emotional expression and recognition (Cohn et al., 2004;El Kaliouby & Robinson, 2005). Head movements can support or refute the content of a verbal message. ...
Article
Men with elevated psychopathic traits have been characterized by unique patterns of nonverbal communication, including more fixed and focused head positions during clinical interviews, compared to men scoring low on measures of psychopathy. However, it is unclear whether similar patterns of head dynamics help characterize women scoring high on psychopathic traits. Here, we utilized an automated detection algorithm to assess head position and dynamics during a videotaped clinical interview (i.e., the Psychopathy Checklist – Revised [PCL-R]) in a sample of n = 213 incarcerated women. PCL-R Total, Factor 1 (i.e., interpersonal and affective psychopathic traits), and Factor 2 (i.e., lifestyle/behavioral and antisocial/developmental psychopathic traits) scores were associated with a pattern of head dynamics indicative of a rigid head position. The current study extends analyses of nonverbal behavior studies in men to women and highlights how individuals with elevated psychopathic traits demonstrate unique nonverbal behaviors relative to individuals who score low on psychopathic traits. The implications and clinical value of these findings are discussed.
... At present, a few commercially available versions of AEFEA technology exist, based on the FACS, including the AFFDEX (developed by Affectiva, Boston, MA, USA, distributed by Affectiva and iMotions, Copenhagen, Denmark) [16][17][18]; the FACET (developed by Emotient, San Diego, CA, USA, distributed by iMotions, Copenhagen, Denmark) [19]; and the Noldus FaceReader (developed by VicarVision, Amsterdam, The Netherlands, distributed by Noldus Information Technology) [20]. See Dupré, Krumhuber, Küster, and McKeown's review [21] for others. ...
Article
Full-text available
Automated emotional facial expression analysis (AEFEA) is used widely in applied research, including the development of screening/diagnostic systems for atypical human neurodevelopmental conditions. The validity of AEFEA systems has been systematically studied, but their test–retest reliability has not been researched thus far. We explored the test–retest reliability of a specific AEFEA software, Noldus FaceReader 8.0 (FR8; by Noldus Information Technology). We collected intensity estimates for 8 repeated emotions through FR8 from facial video recordings of 60 children: 31 typically developing children and 29 children with autism spectrum disorder. Test–retest reliability was imperfect in 20% of cases, affecting a substantial proportion of data points; however, the test–retest differences were small. This shows that the test–retest reliability of FR8 is high but not perfect. A proportion of cases which initially failed to show perfect test–retest reliability reached it in a subsequent analysis by FR8. This suggests that repeated analyses by FR8 can, in some cases, lead to the “stabilization” of emotion intensity datasets. Under ANOVA, the test–retest differences did not influence the pattern of cross-emotion and cross-group effects and interactions. Our study does not question the validity of previous results gained by AEFEA technology, but it shows that further exploration of the test–retest reliability of AEFEA systems is desirable.
... The process of detecting AUs from human faces is now possible automatically with tools such as OpenFace, as first mentioned above. Certain combinations of these AUs can then be used to infer an emotional state 1 (Baltrusaitis et al., 2011;Benitez-Quiroz et al., 2016;El Kaliouby & Robinson, 2004). We use emotional states such as valence and arousal. ...
Article
Full-text available
Understanding the way learners engage with learning technologies, and its relation with their learning, is crucial for motivating design of effective learning interventions. Assessing the learners’ state of engagement, however, is non-trivial. Research suggests that performance is not always a good indicator of learning, especially with open-ended constructivist activities. In this paper, we describe a combined multi-modal learning analytics and interaction analysis method that uses video, audio and log data to identify multi-modal collaborative learning behavioral profiles of 32 dyads as they work on an open-ended task around interactive tabletops with a robot mediator. These profiles, which we name Expressive Explorers, Calm Tinkerers, and Silent Wanderers, confirm previous collaborative learning findings. In particular, the amount of speech interaction and the overlap of speech between a pair of learners are behavior patterns that strongly distinguish between learning and non-learning pairs. Delving deeper, findings suggest that overlapping speech between learners can indicate engagement that is conducive to learning. When we more broadly consider learner affect and actions during the task, we are better able to characterize the range of behavioral profiles exhibited among those who learn. Specifically, we discover two behavioral dimensions along which those who learn vary, namely, problem solving strategy (actions) and emotional expressivity (affect). This finding suggests a relation between problem solving strategy and emotional behavior; one strategy leads to more frustration compared to another. These findings have implications for the design of real-time learning interventions that support productive collaborative learning in open-ended tasks.
... Participants' facial activity was recorded via a Logitech HD webcam. The videos were post-processed using the AFFDEX algorithm for automatic facial coding developed by Affectiva Inc. (El Kaliouby and Robinson, 2005; McDuff et al., 2010). AFFDEX is grounded in the Facial Action Coding System (FACS) and provides an output for 20 'channels' based on the FACS action units (Ekman and Friesen, 1975). ...
Article
Full-text available
The neuropeptide oxytocin (OT) has been shown to influence social cognition, including better recognition of emotion in faces. One potential way in which OT improves emotion recognition is by increasing the correspondence between a perceiver's own facial activity and observed facial expressions. Here we investigate whether increased facial synchrony while viewing facial expressions increases emotion recognition, and whether this effect is moderated by OT. Change in visual attention as captured by eye-gaze is another way in which OT might improve emotion recognition. We also examine visual attention to observed expressions, and whether this is influenced by OT. One hundred and four male undergraduates took part in a double-blind, randomized, between-subjects study in which they self-administered either a placebo (PL) or 24 IU of OT before viewing dynamic facial expressions of emotion, during which their facial activity and eye-gaze were measured, before answering questions on emotion recognition and affiliation. It was hypothesized that participants in the OT condition would exhibit more facial synchrony than would those in the PL condition, and that OT would influence time spent looking at the eye region of target faces. Consistent with previous research, participants in the OT condition were marginally but significantly better at emotion recognition than those in the PL condition. However, participants in the OT condition displayed less facial synchrony for fearful expressions, and there was no effect of OT on measures of eye-gaze. These results suggest that OT does not improve emotion recognition through increased facial synchrony or changing visual attention.
... The software immediately displays the nature and strength of the emotions and records the conversations as a video. The tool uses the AFFDEX algorithm by Affectiva Inc. (El Kaliouby & Robinson, 2005). The algorithm builds on the Emotional Facial Action Coding System (EMFACS) mappings developed by Ekman and colleagues (Ekman & Friesen, 2003;Ekman & Rosenberg, 1997) and uses classified pictures as a training database. ...
Article
Full-text available
Emotions play a key role in sales negotiations. Thus, sales representatives should be able to be aware and take advantage of their own emotional inventory on the one hand, and to accurately interpret customers' emotions to empathize with them on the other hand. However, it is still unclear how a salesperson's emotions can reinforce or inhibit the creation of a pleasant atmosphere in negotiations and, consequently, increase sales performance. Furthermore, it is even less clear how emotional intelligence can be implemented in sales training. We bridge this gap by introducing a new pedagogical method and reporting on its application in a university sales course. In particular, the proposed educational concept consists of sales negotiation role-plays that are recorded using a facial expression analysis software, putting students at the center and enabling feedback based on objective data. This paper provides support for the success of this interactive approach and discusses the challenges and opportunities with regard to its implementation.
... Kaliouby and Robinson [24] provided the first classification system for agreement and disagreement as well as other mental states based on nonverbal cues only. They used head motion and facial AUs together with a dynamic Bayesian network for classification. ...
Article
Full-text available
Identifying the direction of emotional influence in a dyadic dialogue is of increasing interest in the psychological sciences with applications in psychotherapy, analysis of political interactions, or interpersonal conflict behavior. Facial expressions are widely described as being automatic and thus hard to be overtly influenced. As such, they are a perfect measure for a better understanding of unintentional behavior cues about socio-emotional cognitive processes. With this view, this study is concerned with the analysis of the direction of emotional influence in dyadic dialogues based on facial expressions only. We exploit computer vision capabilities along with causal inference theory for quantitative verification of hypotheses on the direction of emotional influence, i.e., cause-effect relationships, in dyadic dialogues. We address two main issues. First, in a dyadic dialogue, emotional influence occurs over transient time intervals and with intensity and direction that are variant over time. To this end, we propose a relevant interval selection approach that we use prior to causal inference to identify those transient intervals where causal inference should be applied. Second, we propose to use fine-grained facial expressions that are present when strong distinct facial emotions are not visible. To specify the direction of influence, we apply the concept of Granger causality to the time-series of facial expressions over selected relevant intervals. We tested our approach on new, experimentally obtained data. Based on quantitative verification of hypotheses on the direction of emotional influence, we were able to show that the proposed approach is promising to reveal the cause-effect pattern in various instructed interaction conditions.
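The final inference step, testing whether one interlocutor's expression time series Granger-causes the other's over a selected interval, can be illustrated with statsmodels on synthetic data. The interval selection and fine-grained expression extraction described in the abstract are assumed to have happened upstream; the series, lags, and coefficients below are illustrative.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
T = 300
partner_a = rng.normal(size=T)
partner_b = np.zeros(T)
for t in range(2, T):   # B's expression partly follows A with a 2-frame lag
    partner_b[t] = 0.6 * partner_a[t - 2] + 0.3 * partner_b[t - 1] + rng.normal(scale=0.5)

# grangercausalitytests checks whether the second column (A) Granger-causes the first (B).
data = np.column_stack([partner_b, partner_a])
results = grangercausalitytests(data, maxlag=3, verbose=False)
p_value = results[2][0]["ssr_ftest"][1]   # F-test p-value at lag 2
print(f"A -> B, lag 2: p = {p_value:.4f}")
```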
... The first four are included in Ekman's universal expressions of emotion (Ekman and Keltner 1997). Pensive is not an emotion per se; however, it is included in our model as it has been shown to be a frequent facial expression present in conversation and it is informative of our internal and cognitive states (El Kaliouby and Robinson 2005; Rozin and Cohen 2003). Annotators were instructed to annotate as one of the first 5 categories those segments in which it was clear to them that the expression was present. ...
Preprint
Full-text available
This paper outlines the EMPATHIC Research & Innovation project, which aims to research, innovate, explore and validate new interaction paradigms and platforms for future generations of Personalized Virtual Coaches to assist elderly people living independently at and around their home. Innovative multimodal face analytics, adaptive spoken dialogue systems, and natural language interfaces are part of what the project investigates and innovates, aiming to help dependent aging persons and their carers. It will use remote, non-intrusive technologies to extract physiological markers of emotional states and adapt respective coach responses. In doing so, it aims to develop causal models for emotionally believable coach-user interactions, which shall engage elders and thus ward off loneliness, sustain health, enhance quality of life, and simplify access to future telecare services. Through measurable end-user validations performed in Spain, Norway and France (and complementary user evaluations in Italy), the proposed methods and solutions will have to demonstrate usefulness, reliability, flexibility and robustness.
... The study by Yeasin et al. [86] represents one of several studies that use facial expressions to detect emotions and interest, e.g., [89,90] (for a review, see [91]). However, only the Yeasin et al. study is discussed here because it used information from six emotions (surprise, sadness, fear, anger, happiness, and disgust) to measure the level of interest of subjects while they watched different movie clips. ...
Article
Full-text available
The positive effects of interest on different aspects, e.g., learning and education, economy, psychological well-being, and social relations, have been widely addressed by many psychological and physiological studies in the last two decades. While the psychological work has investigated this impact of interest theoretically, the physiological studies have focused more on the modulatory effects. However, some studies have addressed both sides of the effects. In this work, we conduct a comprehensive review, from different perspectives, of physiological studies on interest detection carried out between 2003 and 2019. A lack of connection between the psychological and physiological studies was identified. Therefore, this paper aims to integrate the unique psychological and physiological aspects and characteristics of interest to form a base for future research by considering the pros and cons of the included studies. For example, considering the two types of interest (situational and individual), the interest detected in physiological experiments on learning, gaming, and advertising could refer specifically to situational interest. Hence, bridging the gap between the physiological and psychological studies is essential for improving research on interest. Furthermore, we propose several suggestions for future work directions.
... Through facial expressions, the face is able to communicate countless emotions without any spoken words. Moreover, facial expressions provide substantial evidence of the human's level of interest, understanding, and mental state [15], as well as a continuous feedback of agreement or disagreement within social interactions. According to the universality hypothesis of facial expressions of emotions established since Darwin's seminal work [23], all humans communicate six basic internal emotional states (happiness, surprise, fear, disgust, anger, and sadness) using the same facial movements. ...
Article
Full-text available
In this paper, an approach for Facial Expression Recognition (FER) based on a multi-facial patches (MFP) aggregation network is proposed. Deep features are learned from facial patches using convolutional neural sub-networks and aggregated within one architecture for expression classification. Besides, a framework based on two data augmentation techniques is proposed to expand labeled FER training datasets. Consequently, the proposed shallow convolutional neural network (CNN) based approach does not need large datasets for training. The proposed framework is evaluated on three FER datasets. Results show that the proposed approach matches the performance of state-of-the-art deep learning FER approaches when the model is trained and tested on images from the same dataset. Moreover, the proposed data augmentation techniques improve the expression recognition rate, and thus can be a solution for training deep learning FER models using small datasets. The accuracy degrades significantly when testing for dataset bias. Fine-tuning can overcome the problem of the transition from laboratory-controlled conditions to in-the-wild conditions. Finally, the emotional face is mapped using the MFP-CNN, and the contribution of the different facial areas in displaying emotion as well as their importance in the recognition of each facial expression are studied.
... the context of clinical reasoning, it is our contention that students may feel distracted if they look left-right or turn their heads left-right. As for head tilt, it usually indicates the occurrence of a range of cognitive mental states such as concentrating and thinking (El Kaliouby & Robinson, 2005). Moreover, our finding is partially consistent with the research of Grafsgaard et al. (2013), who found that AU01 and AU04 were predictors of student engagement, whereas AU14 predicted task performance and learning gains. ...
Article
In the present paper, we used supervised machine learning algorithms to predict students’ cognitive engagement states from their facial behaviors as 61 students solved a clinical reasoning problem in an intelligent tutoring system. We also examined how high and low performers differed in cognitive engagement levels when performing surface and deep learning behaviors. We found that students’ facial behaviors were powerful predictors of their cognitive engagement states. In particular, we found that the SVM (Support Vector Machine) model demonstrated excellent capacity for distinguishing engaged and less engaged states when 17 informative facial features were added into the model. In addition, the results suggested that high performers did not differ significantly in the general level of cognitive engagement with low performers. There was also no difference in cognitive engagement levels between high and low performers when they performed shallow learning behaviors. However, high performers showed a significantly higher level of cognitive engagement than low performers when conducting deep learning behaviors. This study advances our understanding of how students regulate their engagement to succeed in problem-solving. This study also has significant methodological implications for the automated measurement of cognitive engagement.
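For readers unfamiliar with the setup, the kind of model reported above (an SVM separating engaged from less-engaged states given 17 informative facial features) can be sketched with scikit-learn; the features and labels below are random stand-ins, not the study's data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(61, 17))     # 61 students x 17 facial features (e.g. AU intensities)
y = rng.integers(0, 2, size=61)   # 1 = engaged, 0 = less engaged

# Standardise the features, then fit an RBF-kernel SVM, scored with cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("5-fold AUC:", scores.mean().round(3))
```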
... Kaliouby and Robinson [24] provided the first classification system for agreement and disagreement as well as other mental states based on nonverbal cues only. They used head motion and facial AUs together with a dynamic Bayesian network for classification. ...
Preprint
Full-text available
Identifying the direction of emotional influence in a dyadic dialogue is of increasing interest in the psychological sciences with applications in psychotherapy, analysis of political interactions, or interpersonal conflict behavior. Facial expressions are widely described as being automatic and thus hard to overtly influence. As such, they are a perfect measure for a better understanding of unintentional behavior cues about social-emotional cognitive processes. With this view, this study is concerned with the analysis of the direction of emotional influence in dyadic dialogue based on facial expressions only. We exploit computer vision capabilities along with causal inference theory for quantitative verification of hypotheses on the direction of emotional influence, i.e., causal effect relationships, in dyadic dialogues. We address two main issues. First, in a dyadic dialogue, emotional influence occurs over transient time intervals and with intensity and direction that are variant over time. To this end, we propose a relevant interval selection approach that we use prior to causal inference to identify those transient intervals where causal inference should be applied. Second, we propose to use fine-grained facial expressions that are present when strong distinct facial emotions are not visible. To specify the direction of influence, we apply the concept of Granger causality to the time series of facial expressions over selected relevant intervals. We tested our approach on new, experimentally obtained data. Based on the quantitative verification of hypotheses on the direction of emotional influence, we were able to show that the proposed approach is most promising to reveal the causal effect pattern in various instructed interaction conditions.
... In Medicine and Psychiatry, the automatic detection of facial expressions is used to monitor the emotional states of patients, including pain detection [32], monitoring of depression [33], and helping individuals on the autism spectrum [34] or with epilepsy and schizophrenia [35]. ...
Thesis
Full-text available
Facial expressions convey important signs about the human affective state, cognitive activity, intention, and personality. Automatic facial expression recognition systems are attracting more interest year after year due to their wide range of applications in several fields, such as human-computer/robot interaction, medical applications, animation, and video gaming. In this thesis, we deal with the recognition of the basic facial expressions: anger, disgust, fear, happiness, sadness, surprise, and neutral. In recent years, the facial expression recognition field has reached some maturity thanks to considerable data augmentation and abundant methods achieving high performance. Most recent studies have concentrated on recognizing facial expressions for subjects that were not included in the training phase, known as the Subject-Independent protocol. Furthermore, some works have studied the generalization ability of their methods on the Cross-Database task. State-of-the-art approaches are either hand-crafted or learned methods. Although deep learning architectures have outperformed facial descriptors, convolutional neural network architectures incur high computational and experimental cost. The main focus of this thesis is to find the right balance between accuracy and computational cost. To this end, we propose combining hand-crafted methods with each other and combining hand-crafted and learned features. Two fully automatic approaches to recognize the basic facial expressions from static images are proposed. For both approaches, we propose an improved scheme for face detection that uses facial landmarks to align the face and then assign the boundaries of the facial box. In the first approach, we propose an effective method to combine different feature types extracted by descriptors possessing different properties. In the second approach, we extract more meaningful shallow features by using a more sophisticated facial representation. Moreover, our approach is strengthened by exploiting deep learning features: we use a pre-trained model as a feature descriptor to avoid the computational complexity and the need for data augmentation that have restricted deep learning methods. To evaluate our proposed approaches, four popular databases are used with two evaluation protocols, Within-Database and Cross-Database. The obtained results show that our proposed approaches are either better than or competitive with the state-of-the-art methods. Keywords: Facial Expression, Facial Representation, Basic Expressions, Hand-Crafted, Face Detection, Deep Learning, Convolutional Neural Network.
... Based on the Facial Action Coding System (FACS), which originally described 44 single action units (AUs) including head and eye movements, with each action unit linked to an independent motion on the face and the corresponding muscles, for example the lip suck motion with the muscle orbicularis oris [22]. Several deep learning techniques have been used to build automatic facial emotion recognition (FER) systems, including deep Boltzmann machines (DBM), deep belief networks (DBNs) [23][24][25], convolutional neural networks (CNNs) [11,[26][27][28][29], auto-encoders [30][31][32], and recurrent neural networks (RNNs), to mention a few. ...
Article
Full-text available
Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state-of-the-art deep learning methods and combined physiological signals, such as the electroencephalogram (EEG), electrocardiogram (ECG), and skin temperature, along with facial expressions, voice, posture, to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data, which is essentially the nature of the data encountered in the emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs in order to solve the emotion recognition problem with a multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture to classify emotional valence, and evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consist of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, which other deep learning methods have relied on to achieve this level of accuracy. In conclusion, we have demonstrated that the SNN can be successfully used for solving the emotion recognition problem with multimodal data, and we also provide directions for future research utilizing SNNs for affective computing. In addition to the good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way. It requires only one training pass, which makes it suitable for practical and on-line applications. These features are not manifested in other methods for this problem.
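The evaluation protocol named in the abstract, Leave-One-Subject-Out cross-validation with feature-level fusion, is easy to reproduce in outline. The sketch below uses a plain logistic-regression classifier as a stand-in for the NeuCube spiking network and random arrays in place of the MAHNOB-HCI features; only the protocol, not the method, is illustrated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, trials_per_subject = 24, 20
n = n_subjects * trials_per_subject
face_feats   = rng.normal(size=(n, 30))    # e.g. facial-expression descriptors
physio_feats = rng.normal(size=(n, 10))    # e.g. ECG / skin-conductance statistics
X = np.hstack([face_feats, physio_feats])  # feature-level fusion: simple concatenation
y = rng.integers(0, 2, size=n)             # binary valence labels
groups = np.repeat(np.arange(n_subjects), trials_per_subject)

# Each fold holds out every trial of one subject, so the model never sees that subject.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneGroupOut(), groups=groups)
print("LOSO accuracy:", scores.mean().round(3))
```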
Article
Full-text available
Emotions are a vital semantic part of human communication. They are not only significant for communication between people but also essential for human-computer interaction. Effective communication between people is achieved only when both the meaning and the emotion of the message are perceived by all parties involved. Understanding the meaning of language has generally been studied in natural language processing (NLP) as semantic analysis; in NLP, text can be processed appropriately for classification. Emotion detection from facial expressions is a subfield of social signal processing applied in a wide variety of areas, specifically for human-computer interaction. Many researchers have proposed various approaches, generally utilizing machine learning concepts. Automatic emotion recognition (AER) is significant for enabling seamless interactivity between a person and a smart device toward fully realizing an intelligent society. Many researchers have examined cross-lingual and multilingual speech emotion as a step toward language-independent emotion recognition in natural speech. In the present work, we propose a deep learning-based AER system using four openly accessible datasets, namely the Basic Arabic Vocal Emotions Dataset (BAVED), the Acted Emotional Speech Dynamic Database (AESDD), Urdu written in Latin/Roman Script (URDU), and the Toronto Emotional Speech Set (TESS), utilizing a Jupyter notebook and the Python audio and music analysis library Librosa. The experimental results show that the proposed approach performs better than existing approaches: the accuracy of the proposed system is 96.24% on the URDU dataset, 99.10% on the TESS dataset, 65.97% on the AESDD dataset, and 73.12% on the BAVED dataset.
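A typical Librosa-based feature-extraction step for speech emotion recognition, of the kind the abstract above relies on, is sketched below; the choice of 40 time-averaged MFCCs and an SVM downstream (e.g. sklearn.svm.SVC) are assumptions for illustration rather than the authors' exact pipeline, and the demo signal is synthetic so the snippet runs without audio files.

```python
import numpy as np
import librosa

def extract_features(path, n_mfcc=40):
    """One fixed-length vector per clip: time-averaged MFCCs."""
    y, sr = librosa.load(path, sr=None)   # keep the file's native sampling rate
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# A real pipeline would map this over labeled files, e.g. from TESS or BAVED:
#   X = np.array([extract_features(p) for p, _ in wav_label_pairs])
#   clf = sklearn.svm.SVC(kernel="linear").fit(X, [lab for _, lab in wav_label_pairs])

# Self-contained demo of the feature step on a synthetic 1-second 440 Hz tone.
sr = 22050
tone = 0.1 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr).astype(np.float32)
mfcc_vector = librosa.feature.mfcc(y=tone, sr=sr, n_mfcc=40).mean(axis=1)
print(mfcc_vector.shape)   # (40,)
```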
Article
The objective of this study is to enhance the precision of predicting human emotions from speech signals. This is achieved by introducing a novel approach, the Viola Jones (VJ) method, in contrast to the conventional Histogram of Oriented Gradients (HOG) algorithm. In this research, we used the Toronto Emotional Speech Set (TESS) as the dataset; with a G-power of 0.8, alpha and beta values of 0.05 and 0.2, and a confidence interval of 95%, the sample size is calculated as twenty (ten from Group 1 and ten from Group 2). Viola Jones (VJ) and Histogram of Oriented Gradients, both with the same number of data samples (N=10), are used to perform the prediction of human emotion recognition from speech signals. The performance of the proposed Viola Jones method is much greater than that of the Histogram of Oriented Gradients classifier: the success rate of the proposed Viola Jones method is 95.66 percent, compared with an accuracy rate of 88.65 percent for the HOG classifier. The level of significance attained by the research was p = 0.001 (p < 0.05), which indicates that the difference between the two groups is statistically significant. For the performance evaluation of human emotion classification from speech data, the proposed Viola Jones (VJ) model achieves a greater level of precision than the Histogram of Oriented Gradients (HOG).
Article
Full-text available
This paper proposes models for predicting the subjective impressions of interlocutors in discussions according to multimodal nonverbal behaviors. To that end, we focus mainly on the functional aspects of head movement and facial expressions as insightful cues. For example, head movement functions include the speaker's rhythm and the listener's back channel and thinking processes, as well as their positive emotions. Facial expression functions include emotional expressions and communicative functions such as the speaker addressing the listener and the listener's affirmation. In addition, our model employs synergetic functions, which are jointly performed with head movements and facial expressions, assuming that the simultaneous appearance of head and face functions could strengthen the results or lead to multiplexing. On the basis of these nonverbal functions, we define a set of functional features, including the rate of occurrence and composition balance among different functions that emerge during conversation. Then, a feature selection scheme is used to identify the best combinations of intermodal and intramodal features. In the experiments, an SA-Off corpus of 17 groups of discussions involving 4 female participants was used, including interlocutors' self-reported scores for 16 impression items felt during the discussion, such as helpfulness and interest. The experiments confirmed that our models' predictions were significantly correlated with the self-reported scores for more than 70% of the impression items. These results indicate the effectiveness of multimodal nonverbal functional features for predicting subjective impressions.
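A minimal sketch of "functional features" in the sense used above is given below: per-interlocutor rates of occurrence of function labels over a discussion, plus their composition balance. The label names and input format are assumptions for illustration only.

```python
# Sketch: rate-of-occurrence and composition-balance features for one interlocutor.
from collections import Counter

def functional_features(function_labels, duration_minutes):
    """function_labels: list of function tags observed for one interlocutor,
    e.g. ["back_channel", "rhythm", "emotional_expression", ...] (hypothetical names)."""
    counts = Counter(function_labels)
    total = sum(counts.values())
    rates = {f"rate_{k}": v / duration_minutes for k, v in counts.items()}
    balance = {f"share_{k}": v / total for k, v in counts.items()} if total else {}
    return {**rates, **balance}
```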
Article
Emotion recognition is a critical component of affective computing. Training accurate machine learning models for emotion recognition typically requires a large amount of labeled data. Due to the subtleness and complexity of emotions, multiple evaluators are usually needed for each affective sample to obtain its ground-truth label, which is expensive. To save labeling cost, this paper proposes an inconsistency-based active learning approach for cross-task transfer between emotion classification and estimation. Affective norms are utilized as prior knowledge to connect the label spaces of categorical and dimensional emotions. Then, the prediction inconsistency on the two tasks for the unlabeled samples is used to guide sample selection in active learning for the target task. Experiments on within-corpus and cross-corpus transfers demonstrated that cross-task inconsistency can be a very valuable metric in active learning. To our knowledge, this is the first work that utilizes prior knowledge on affective norms and data from a different task to facilitate active learning for a new task, even when the two tasks are from different datasets.
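The sketch below illustrates the core idea of inconsistency-guided sample selection, assuming a categorical classifier, a dimensional (valence) regressor, and a table of affective norms mapping each emotion category to a prototypical valence. All names and the norm values are illustrative assumptions, not the paper's models.

```python
# Sketch: cross-task inconsistency as an active-learning query criterion.
import numpy as np

AFFECTIVE_NORMS = {"anger": -0.6, "sadness": -0.7, "joy": 0.8, "relief": 0.4, "fear": -0.5}

def select_most_inconsistent(classifier, regressor, X_unlabeled, categories, k=10):
    # Map categorical predictions to valence via the affective norms...
    cat_pred = classifier.predict(X_unlabeled)
    valence_from_cat = np.array([AFFECTIVE_NORMS[categories[c]] for c in cat_pred])
    # ...and compare with the dimensional model's direct valence estimates.
    valence_pred = regressor.predict(X_unlabeled)
    inconsistency = np.abs(valence_from_cat - valence_pred)
    # Query the k samples on which the two tasks disagree the most.
    return np.argsort(inconsistency)[-k:]
```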
Article
Full-text available
As the principal processing method for nonverbal intentions, Facial Expression Recognition (FER) is an important and promising topic in computer vision and artificial intelligence, as well as one of the subject areas of symmetry. This research work provides a thorough and well-organized comparative empirical study of facial expression recognition based on deep learning in the frequency domain, convolutional neural networks, and local binary pattern features. We address FER using neutral, joy, anger, fear, sadness, disgust, and surprise as the seven universal emotional categories. In terms of methodology, we present a broad framework for a traditional FER approach and analyze the possible technologies that can be used in each component to emphasize their contrasts and similarities. Even though a lot of research has been done with static images, considerable work is still being done to develop new methods that are easier to compute and use less memory than prior approaches. This research could pave the way for a new approach to facial emotion identification in terms of accuracy and high performance.
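For concreteness, the sketch below shows one of the feature types compared above: a uniform Local Binary Pattern (LBP) histogram as a facial-expression descriptor. The parameter choices (P=8, R=1) are common defaults, not the paper's settings.

```python
# Sketch: uniform LBP histogram from a grayscale face crop.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, P=8, R=1):
    """gray_face: 2-D grayscale face crop as a NumPy array."""
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    n_bins = P + 2  # uniform patterns plus one "non-uniform" bin
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist
```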
Conference Paper
Full-text available
The influence of the sales force on buyers in business-to-business market transactions is continuously waning. Particularly due to various new digital information channels, buyers now depend only to a limited extent on the information provided by salespeople in their decision-making processes. This development is putting sales forces increasingly under pressure, as it restricts their range of action. This conceptual paper shows that regaining influence in the sales process is only possible through the implementation of an integrated market intelligence system.
Conference Paper
Full-text available
This study investigates whether there is an impact of a salesperson's empathy on buyer's satisfaction and whether the emotions displayed by the salesperson during a sales conversation moderate this relationship. Using PROCESS, data generated from automated emotion tracking of 89 sales conversations are analyzed. Results show that empathy is a significant predictor of satisfaction. The model further reveals that disgust significantly moderates this relationship. Several implications for personal selling and, in particular, emotional intelligence training emerge from this work.
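The moderation analysis described above can be expressed in outline as an OLS regression with an interaction term, rather than the PROCESS macro itself. The column names and DataFrame below are assumptions for illustration, not the study's data.

```python
# Sketch: moderation (empathy -> satisfaction, moderated by displayed disgust) via OLS.
import pandas as pd
import statsmodels.formula.api as smf

def moderation_model(df: pd.DataFrame):
    """df must contain 'satisfaction', 'empathy', and 'disgust' columns,
    one row per sales conversation (hypothetical column names)."""
    # Mean-center predictors so main effects are interpretable at average levels.
    df = df.assign(empathy_c=df.empathy - df.empathy.mean(),
                   disgust_c=df.disgust - df.disgust.mean())
    model = smf.ols("satisfaction ~ empathy_c * disgust_c", data=df).fit()
    return model  # model.summary() reports the interaction (moderation) term
```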
Conference Paper
This research examines if there is an influence of a salesperson's empathy on perceived selling skills and whether the emotions displayed by the salesperson during a sales conversation are related to buyers' evaluations of the seller's empathy. Using structural equation modelling, we analyse data generated from automated emotion tracking of 63 role-played sales conversations, as well as post-questionnaires. Our findings indicate that empathy is a significant predictor of selling skills. Furthermore, the model reveals that overall negative emotions are negatively related to perceived empathy. Specifically, long displays of sadness have a significant negative effect on perceived empathy. Our work has important implications for personal selling and training on emotional intelligence in particular.
Article
Full-text available
To support a healthy human mental state, controlling the environment is one of the best-known solutions. However, it is difficult to design an environmental control system because of the uniqueness of individual preferences and fluctuations in mental state. Here, we propose the "Buddy system" as an adaptive mental state support solution that controls devices in the environment depending on the recognized user's mental state at the time and how it could be improved, serving a role similar to that of a "buddy" to individuals. The recognition of mental states and the locus of actions to control one's surrounding environment are implemented on the basis of a brain-derived theory of computation known as active inference and the free-energy principle, which provide biologically plausible computations for perception and behavior in a changing world. For the generation of actions, we modify the general calculations of active inference to adjust to individual environmental preferences. In the experiments, the Buddy system sought to maintain the participants' concentration while they conducted a calculation task. As a result, the task performance of most participants improved with the aid of the Buddy system. The results indicate that the Buddy system can adaptively support individual users in improving their mental states.
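As a highly simplified illustration of the kind of computation active inference rests on, the sketch below evaluates the variational free energy of a discrete belief over hidden states given an observation, F = KL(q(s) || p(s)) - E_q[log p(o|s)]. The two-state "mental state" model and all probabilities are illustrative assumptions, not the Buddy system's actual generative model.

```python
# Sketch: variational free energy for a discrete hidden-state belief.
import numpy as np

def variational_free_energy(q_s, prior_s, likelihood_o_given_s):
    """q_s: approximate posterior over hidden states; prior_s: prior over states;
    likelihood_o_given_s: p(observed o | s) for each hidden state s."""
    eps = 1e-12
    kl = np.sum(q_s * (np.log(q_s + eps) - np.log(prior_s + eps)))
    expected_log_lik = np.sum(q_s * np.log(likelihood_o_given_s + eps))
    return kl - expected_log_lik

# Example: states = [concentrated, distracted]; observation = "slow task progress".
q = np.array([0.3, 0.7])
prior = np.array([0.6, 0.4])
lik = np.array([0.2, 0.8])   # p(slow progress | state), hypothetical values
print(variational_free_energy(q, prior, lik))
```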
Article
Emotion recognition is an important part of affective computing. Human emotions can be described categorically or dimensionally. Accurate machine learning models for emotion classification and estimation usually depend on a large amount of annotated data. However, label acquisition in emotion recognition is costly: obtaining the ground-truth labels of an emotional sample usually requires multiple annotators' assessments, which is expensive and time-consuming. To reduce the labeling effort in multi-task emotion recognition, the paper proposes an inconsistency measure that indicates the difference between the labels estimated from the feature space and the label distribution of the labeled dataset. Using the inconsistency as an indicator of sample informativeness, we further propose an inconsistency-based multi-task cooperative learning framework that integrates multi-task active learning and self-training semi-supervised learning. Experiments in two multi-task emotion recognition scenarios, multi-dimensional emotion estimation and simultaneous emotion classification and estimation, were conducted under this framework. The results demonstrate that the proposed multi-task active learning framework outperforms several single-task and multi-task active learning approaches.
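The cooperative idea described above can be sketched as routing unlabeled samples by their inconsistency score: the most inconsistent samples go to annotators (active learning), while the most consistent receive pseudo-labels (self-training). The threshold scheme below is an illustrative assumption, not the paper's exact procedure.

```python
# Sketch: splitting unlabeled data between annotation queries and self-training.
import numpy as np

def split_unlabeled(inconsistency, query_budget, pseudo_label_quantile=0.1):
    """inconsistency: array of per-sample inconsistency scores on unlabeled data."""
    order = np.argsort(inconsistency)
    to_query = order[-query_budget:]                   # most informative: ask annotators
    n_pseudo = int(len(order) * pseudo_label_quantile)
    to_pseudo_label = order[:n_pseudo]                 # most consistent: pseudo-label for self-training
    return to_query, to_pseudo_label
```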
Article
Clinical educators have used robotic and virtual patient simulator systems (RPS) for decades to help clinical learners (CL) gain key skills that help avoid future patient harm. These systems can simulate human physiological traits; however, they have static faces and lack a realistic depiction of facial cues, which limits CL engagement and immersion. In this article, we provide a detailed review of existing systems in use and describe the possibilities for new technologies from the human–robot interaction and intelligent virtual agents communities to push forward the state of the art. We also discuss our own work in this area, including new approaches for facial recognition and synthesis on RPS systems, including the ability to realistically display patient facial cues such as pain and stroke. Finally, we discuss future research directions for the field.
Article
Full-text available
Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we find that users are often misled by current attention map visualizations that point to relevant regions even when the model produces an incorrect answer. Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err. Error maps can indicate when a correctly attended region may be processed incorrectly, leading to an incorrect answer, and hence improve users' understanding of those cases. To evaluate our new explanations, we further introduce a metric that simulates users' interpretation of explanations to assess their potential helpfulness in understanding model correctness. Finally, we conduct user studies showing that our new explanations help users understand model correctness better than baselines by an expected 30%, and that our proxy helpfulness metrics correlate strongly (ρ > 0.97) with how well users can predict model correctness.
Article
Full-text available
Interstitial spaces are in-between spheres and borderlines that join two main functions in buildings and therefore play a great role in the mutual communication between humans and place. Liminality and alterity, two applicable terms derived from anthropology and sociology, are applied in this research to describe the character of an in-between place in the body of museums. With this interdisciplinary approach, the levels of social and emotional interaction were analyzed. The main question is: "Which geometrical attributes of the liminal spaces of museums can enhance social and emotional interaction in visitors?" To answer the question, behavioral neuroscience was used as a methodology to observe behaviors, and body language was employed as an analytical standard. The Holy Defense Museum in Tehran was taken as the case study. Five exhibition halls serving as in-between joints, with the participation of random visitors, were selected for observation and analysis via ObserverX10 software. It was found that the "plaster relief wall" and the "brick passage" had the highest level of emotional interaction, while the "brick passage" and the "glass floor" showed the highest level of social engagement. Asymmetric, non-Euclidean, nonlinear, unconnected geometry and complex polygons are the best choices for designing the in-between spaces of museums because of the increase they cause in brain activity. The results of this research can serve as a guideline for designing the boundaries and liminal spaces of future museums.
Article
Full-text available
A functional head-movement corpus and convolutional neural networks (CNNs) for detecting head-movement functions are presented for analyzing the multiple communicative functions of head movements in multiparty face-to-face conversations. First, focusing on the multifunctionality of head movements, i.e., that a single head movement can simultaneously perform multiple functions, this paper defines 32 non-mutually-exclusive function categories, whose genres are speech production, eliciting and giving feedback, turn management, and cognitive and affect display. To represent and capture arbitrary multifunctional structures, our corpus employs multiple binary codes and logical-sum-based aggregations of multiple coders’ judgments. A corpus analysis targeting four-party Japanese conversations revealed multifunctional patterns in which the speaker modulates multiple functions, such as emphasis and eliciting listeners’ responses, through rhythmic head movements, and listeners express various attitudes and responses through continuous back-channel head movements. This paper proposes CNN-based binary classifiers for detecting each of the functions from the angular velocity of the head pose and the presence or absence of utterances. The experimental results showed that the recognition performance varies greatly, from approximately 30% to 90% in terms of the F-score, depending on the function category, and the performance was positively correlated with the amount of data and inter-coder agreement. In addition, we noted a tendency toward overdetection that added more functions to those originally in the corpus. The analyses and experiments confirm that our approach is promising for studying the multifunctionality of head movements.
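A minimal sketch of the kind of detector described above is shown below: a 1-D CNN binary classifier over a window of head angular velocities (yaw, pitch, roll) plus a binary speech-activity channel. The architecture and window length are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: 1-D CNN binary detector for one head-movement function category.
import torch
import torch.nn as nn

class HeadFunctionCNN(nn.Module):
    def __init__(self, in_channels=4, window=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, 1)  # one logit: function present / absent

    def forward(self, x):                   # x: (batch, channels, window)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)

# Example forward pass on a dummy batch of 8 windows.
logits = HeadFunctionCNN()(torch.randn(8, 4, 64))
```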
Chapter
Recognition and analysis of human emotions have attracted a lot of interest in the past two decades and have been researched extensively in neuroscience, psychology, cognitive sciences, and computer sciences. Most of the past research in machine analysis of human emotion has focused on recognition of prototypic expressions of six basic emotions based on data that has been posed on demand and acquired in laboratory settings. More recently, there has been a shift toward recognition of affective displays recorded in naturalistic settings as driven by real world applications. This shift in affective computing research is aimed toward subtle, continuous, and context-specific interpretations of affective displays recorded in real-world settings and toward combining multiple modalities for analysis and recognition of human emotion. Accordingly, this paper explores recent advances in dimensional and continuous affect modelling, sensing, and automatic recognition from visual, audio, tactile, and brain-wave modalities.