Article

What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS)

Authors: Paul Ekman, Erika L. Rosenberg

Abstract

While we have known for centuries that facial expressions can reveal what people are thinking and feeling, it is only recently that the face has been studied scientifically for what it can tell us about internal states, social behavior, and psychopathology. Today's widely available, sophisticated measuring systems have allowed us to conduct a wealth of new research on facial behavior that has contributed enormously to our understanding of the relationship between facial expression and human psychology. The chapters in this volume present the state-of-the-art in this research. They address key topics and questions, such as the dynamic and morphological differences between voluntary and involuntary expressions, the relationship between what people show on their faces and what they say they feel, whether it is possible to use facial behavior to draw distinctions among psychiatric populations, and how far research on automating facial measurement has progressed. © 1997, 2005 by Oxford University Press, Inc. All rights reserved.

... Psychological studies show that the Facial Action Unit (AU) is an objective and common standard for describing the physical expression of emotions [9]. For instance, AU 12 and AU 6 indicate lip corner puller and cheek raiser, which relate to happiness; AU 1, AU 4, and AU 6 denote inner brow raiser, brow lowerer, and cheek raiser, which relate to sadness. ...
... The face images in RAF-AU are FACS-coded independently by two experienced coders. It contains 4601 real-world images annotated with 26 kinds of AUs, including AU 1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 16, 17, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 35, and 43. In our experiments, we only use the first 21 AUs, which have a strong relationship with expressions. ...
... The AU branch guides the Emotion branch to learn a good AU representation, improving the CVT by 2.08%. Due to the lack of AU labels in the LSD dataset, we use OpenFace [1] to generate 16 kinds of AU annotations for the LSD, including AU 1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, and 26. During training, we feed the LSD images into the AU-CVT and calculate the expression loss and the pseudo AU loss. ...
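The AU-to-emotion correspondence quoted above can be expressed as a simple rule table. The Python sketch below is illustrative only: the two rules mirror the excerpt, not the full EMFACS rule set.

```python
# Minimal sketch: mapping detected Action Units (AUs) to basic emotions.
# The rule table is an illustrative subset following the excerpt above;
# validated systems such as EMFACS use a much larger rule set.

AU_RULES = {
    "happiness": {6, 12},    # cheek raiser + lip corner puller
    "sadness":   {1, 4, 6},  # inner brow raiser + brow lowerer + cheek raiser (per the excerpt)
}

def match_emotions(active_aus):
    """Return the emotions whose required AUs are all present in `active_aus`."""
    active = set(active_aus)
    return [emotion for emotion, required in AU_RULES.items() if required <= active]

if __name__ == "__main__":
    print(match_emotions([6, 12, 25]))  # ['happiness']
    print(match_emotions([1, 4, 6]))    # ['sadness']
```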
Preprint
The paper describes our proposed methodology for the six basic expression classification track of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2022. In the Learning from Synthetic Data (LSD) task, facial expression recognition (FER) methods aim to learn the representation of expressions from artificially generated data and generalise to real data. Because of the ambiguity of the synthetic data and the objectivity of the facial Action Unit (AU), we resort to AU information for performance boosting and make the following contributions. First, to adapt the model to synthetic scenarios, we use knowledge from pre-trained large-scale face recognition data. Second, we propose a conceptually new framework, termed AU-Supervised Convolutional Vision Transformers (AU-CVT), which clearly improves the performance of FER by jointly training auxiliary datasets with AU or pseudo AU labels. Our AU-CVT achieved an F1 score of 0.6863 and an accuracy of 0.7433 on the validation set. The source code of our work is publicly available online: https://github.com/msy1412/ABAW4
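The joint training with expression labels and pseudo AU labels described above can be sketched as a two-term objective. This is a hedged illustration, not the authors' exact loss; the trade-off weight `lam` is an assumption.

```python
import torch
import torch.nn.functional as F

def joint_loss(expr_logits, expr_labels, au_logits, pseudo_au_labels, lam=1.0):
    """Hedged sketch of a joint objective: expression classification loss plus
    a multi-label loss on (pseudo) AU targets. `lam` is an assumed weight."""
    expr_loss = F.cross_entropy(expr_logits, expr_labels)            # categorical expression labels
    au_loss = F.binary_cross_entropy_with_logits(                    # multi-label AU presence targets
        au_logits, pseudo_au_labels.float())
    return expr_loss + lam * au_loss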
... [19] proves that the driver's emotion affects driving performance, which can be modeled by a U-shaped relationship between performance, arousal, and valence. Given the facial landmark motion, researchers have figured out a way to identify facial expressions with the Facial Action Coding System (FACS) [20]. Also, based on the facial landmarks, a drowsiness alarm can be designed according to the eye aspect ratio (EAR) [21]. ...
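For reference, the eye aspect ratio (EAR) mentioned in [21] is commonly computed from six eye landmarks; a minimal sketch follows, with the alarm threshold given only as an assumed example.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio from six (x, y) eye landmarks p1..p6, as commonly
    defined in the drowsiness-detection literature; EAR falls toward zero
    when the eye closes."""
    eye = np.asarray(eye, dtype=float)
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)

# An alarm could trigger when EAR stays below ~0.2 for several consecutive
# frames; the threshold and frame count are application-specific assumptions.
```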
... FACS refers to a set of facial muscle movements that correspond to a displayed emotion. The basic elements of FACS are called action units (AUs) [20], which include 46 main action units, such as inner brow raiser, outer brow raiser, cheek raiser, and nose wrinkler; 8 head movement action units, such as head turn left, head turn right, head up, and head down; and 4 eye movement action units (eyes turn left, right, up, and down). Specific joint activities of facial muscles pertain to a displayed emotion. ...
... The risk heat map is created to visualize the risk scores for each segment. The risk scores are divided into five levels: very small (if the risk score falls in [0,20)), small (if the risk score falls in [20,40)), medium (if the risk score falls in [40,60)), large (if the risk score falls in [60,80)), and very large (if the risk score falls in [80,100]). The corresponding colors assigned to each level are green, blue, yellow, orange, and red. ...
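The five risk bands and display colors described above reduce to a small lookup; a minimal sketch (level names and boundaries follow the excerpt):

```python
def risk_level(score):
    """Map a risk score in [0, 100] to the five levels and colors described above."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    bands = [
        (20,  "very small", "green"),
        (40,  "small",      "blue"),
        (60,  "medium",     "yellow"),
        (80,  "large",      "orange"),
        (101, "very large", "red"),   # the top band is closed at 100
    ]
    for upper, level, color in bands:
        if score < upper:
            return level, color

print(risk_level(73))  # ('large', 'orange')
```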
Preprint
Risk assessment of roadways is commonly practiced based on historical crash data. Information on driver behaviors and real-time traffic situations is sometimes missing. In this paper, the Safe Route Mapping (SRM) model, a methodology for developing dynamic risk heat maps of roadways, is extended to consider driver behaviors when making predictions. An Android App is designed to gather drivers' information and upload it to a server. On the server, facial recognition extracts drivers' data, such as facial landmarks, gaze directions, and emotions. The driver's drowsiness and distraction are detected, and driving performance is evaluated. Meanwhile, dynamic traffic information is captured by a roadside camera and uploaded to the same server. A longitudinal-scanline-based arterial traffic video analytics is applied to recognize vehicles from the video to build speed and trajectory profiles. Based on these data, a LightGBM model is introduced to predict conflict indices for drivers in the next one or two seconds. Then, multiple data sources, including historical crash counts and predicted traffic conflict indicators, are combined using a Fuzzy logic model to calculate risk scores for road segments. The proposed SRM model is illustrated using data collected from an actual traffic intersection and a driving simulation platform. The prediction results show that the model is accurate, and the added driver behavior features will improve the model's performance. Finally, risk heat maps are generated for visualization purposes. The authorities can use the dynamic heat map to designate safe corridors and dispatch law enforcement and drivers for early warning and trip planning.
... The formulae for the metrics mentioned above are given in Eqs. (12)–(18) below. Performance metrics for the existing and proposed approaches are given in Fig. 6a–f. ...
... Real and fake emotions of contempt and surprise in the SASE-FE dataset had few similarities with other emotions. Sample images of the 12 classes of real and fake emotions of the FED dataset are given in Fig. 4. The Facial Action Coding System (FACS) [18] was used to extract AUs from images of real and fake emotions. The AUs are presented in Table 5, which shows a set of unique AUs in real and fake emotions, helping to discriminate between them. ...
Article
Full-text available
Differentiating real and fake emotions has become a new challenge in facial expression recognition and emotion detection. Real and fake emotions should be taken into account when developing an application; otherwise, a fake emotion can be categorized as a real emotion, rendering the model futile. Very limited research has dealt with identifying fake emotions accurately, with reported results in the range of 51–76%. The performance of the available methods in detecting fake emotions is not encouraging. Thus, in this paper, we propose the Enhanced Boosted Support Vector Machine (EBSVM) algorithm. EBSVM is a novel technique to determine the important thresholds required to understand fake emotions. We created a new dataset named FED comprising both real and fake emotion images of 50 subjects and used it in experiments along with SASE-FE. EBSVM considers the entire data for classification at each iteration using the ensemble classifier. The EBSVM algorithm achieved a classification accuracy of 98.08% for different K-fold validations.
... A call to these emotion recognition services takes a video or an image as input and analyzes its emotional content, returning a set of values in JSON syntax. After parsing the returned JSON, a value ranging from 0 to 100 is obtained for each of the six universal emotions (anger, disgust, fear, joy, sadness, and surprise) [19] plus contempt and neutral. This value indicates the "activation" level for each expression. ...
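Parsing such a service response might look like the following sketch. The JSON field names and the 0–100 scale are hypothetical placeholders here, since each service defines its own schema.

```python
import json

# Hypothetical response shape; real emotion-recognition services use their
# own field names and value ranges.
raw = ('{"anger": 2, "disgust": 1, "fear": 0, "joy": 87, '
       '"sadness": 3, "surprise": 5, "contempt": 0, "neutral": 2}')

scores = json.loads(raw)                                  # activation in [0, 100] per expression
dominant = max(scores, key=scores.get)                    # e.g. "joy"
normalized = {k: v / 100.0 for k, v in scores.items()}    # rescale to [0, 1] if needed
print(dominant, normalized["joy"])
```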
... The Cohn-Kanade database consists of approximately 500 frontal-camera image sequences from 100 subjects. Accompanying metadata include annotations of FACS action units (AUs) (i.e., micro-expressions) [19]. However, the image sequences have no sound and are labelled only in terms of action units, not emotional expressions. ...
Article
Full-text available
Emotionally responsive agents that can simulate emotional intelligence increase the acceptance of users towards them, as the feeling of empathy reduces negative perceptual feedback. This has fostered research on emotional intelligence during the last decades, and nowadays numerous cloud and local tools for automatic emotion recognition are available, even for inexperienced users. These tools, however, usually focus on the recognition of discrete emotions sensed from one communication channel, even though multimodal approaches have been shown to have advantages over unimodal approaches. Therefore, the objective of this paper is to show our approach for multimodal emotion recognition using Kalman filters for the fusion of available discrete emotion recognition tools. The proposed system has been developed modularly based on an evolutionary approach so as to be integrated into our digital ecosystems, and new emotion recognition sources can be easily integrated. The obtained results show improvements over unimodal tools when recognizing naturally displayed emotions.
... The visual modality, usually represented by facial expressions [1,2], is one of the most dominant modalities for emotion recognition. By utilizing either a finely hand-crafted descriptor, e.g., the Facial Action Coding System (FACS) [3], or a powerful convolutional neural network, e.g., a ResNet, for feature extraction, an emotion recognition method can achieve promising results. In recent years, electroencephalography (EEG) has drawn considerable attention from researchers [4], as it offers a simple, cheap, portable, and easy-to-use solution for identifying emotions [5]. ...
... A ResNet-50 is used as the visual backbone. It is pre-trained on the MS-CELEB-1M dataset [31] for a face recognition task and then fine-tuned on the FER+ dataset [35] for a facial expression recognition task. ...
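A hedged PyTorch sketch of that two-stage transfer (pre-train for face recognition, then fine-tune on the 8 FER+ expression classes); the checkpoint path is hypothetical, and the cited work's exact training recipe may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hedged sketch: load face-recognition pre-training weights, then swap the head.
model = models.resnet50(weights=None)

# 1) Load weights from a face-recognition pre-training run (e.g. on MS-Celeb-1M).
#    The file name below is a hypothetical placeholder.
state = torch.load("ms_celeb_1m_resnet50.pth", map_location="cpu")
state = {k: v for k, v in state.items() if not k.startswith("fc.")}  # drop the identity head
model.load_state_dict(state, strict=False)

# 2) Replace the head for the 8 FER+ expression classes and fine-tune.
model.fc = nn.Linear(model.fc.in_features, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```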
Article
Full-text available
The visual modality is one of the most dominant modalities for current continuous emotion recognition methods. Compared to it, the EEG modality is relatively less sound due to intrinsic limitations such as subject bias and low spatial resolution. This work attempts to improve the continuous prediction of the EEG modality by using the dark knowledge from the visual modality. The teacher model is built with a cascaded convolutional neural network - temporal convolutional network (CNN-TCN) architecture, and the student model is built with TCNs. They are fed video frames and EEG average band power features, respectively. Two data partitioning schemes are employed, i.e., trial-level random shuffling (TRS) and leave-one-subject-out (LOSO). The standalone teacher and student can produce continuous predictions superior to the baseline method, and the visual-to-EEG cross-modal KD further improves the prediction with statistical significance, i.e., p-value < 0.01 for TRS and p-value < 0.05 for LOSO partitioning. The saliency maps of the trained student model show that the activity associated with the active valence state is not located in precise brain areas; instead, it results from synchronized activity among various brain areas. The fast beta and gamma waves, with frequencies of 18-30 Hz and 30-45 Hz, contribute the most to the human emotion process compared to other bands. The code is available at https://github.com/sucv/Visual_to_EEG_Cross_Modal_KD_for_CER.
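A minimal sketch of a cross-modal distillation objective of the kind described above, where the EEG student fits both the labels and the visual teacher's continuous predictions. The MSE formulation and the weight `alpha` are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def kd_regression_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Hedged sketch of visual-to-EEG distillation for continuous (valence)
    prediction: the EEG student fits the ground truth and the frozen visual
    teacher's output. `alpha` is an assumed trade-off weight."""
    supervised = F.mse_loss(student_pred, target)       # fit the continuous labels
    distill = F.mse_loss(student_pred, teacher_pred)    # fit the teacher's "dark knowledge"
    return (1 - alpha) * supervised + alpha * distill
```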
... In the past decade, it has been used widely in a variety of contexts, such as psychology [5,23,24], social psychology [25], and advertising research [26]. It is a facial analysis program based on the Facial Action Coding System (FACS, [27,28]). FACS is based on the visually discernible movements of a set of facial muscles, referred to as 44 action units, and on several head and eye movements. ...
... As differences exist between men and women regarding the semiology of ASPD and especially their level of impulsivity [54], it would be interesting in a future study to examine whether our results can be generalized to women. Finally, we chose to focus the present study on the theory of basic emotions [28,55] using the dominant FE as an indicator. However, other studies propose a more functional conceptualization of emotions more adapted to studying the pathological expression of emotions [56,57]. ...
Article
Full-text available
While a deficit in the recognition of facial expression has been demonstrated in persons with antisocial personality disorder (ASPD), few studies have investigated how individuals with ASPD produce their own emotional facial expressions. This study examines the production of facial emotional expressions of male inpatients with ASPD in a forensic hospital compared with a control group as they retrieve autobiographical memories. This design constitutes a specific ecological experimental approach fostering the evocation of personal feelings. Two indicators characterizing the activation of facial expression were used: activation of emotional action units and emotional dominance. The results showed that individuals with ASPD 1) activated angrier facial expressions than control participants for both indicators, 2) displayed a higher dominance of angry facial expressions during the retrieval of positive self-defining memories than control participants and 3) recalled significant memories that were less associated with neutral facial states than the control sample, regardless of the valence of their memories. These findings highlight the core role of anger in ASPD and the possible development of pathological anger, which would distinguish trajectories toward anxious or mood disorders and trajectories characterized by external disorders.
... AFFDEX uses the facial action coding system, an objective taxonomy for detecting prominent facial landmarks and features, for classifying facial actions, and for modeling expressed emotions (Ekman and Rosenberg, 1997; Magdin et al., 2019). Once the subject's face is recognized by the software, it registers key facial landmarks that are used to detect emotion-related features. ...
Article
How do emotions influence one’s willingness to take on risk? We show that answering this question can be particularly challenging because induced emotions are highly sensitive to a respondent’s contemporaneous experiences and can be rapidly diluted while respondents answer subsequent survey questions. We randomly assign respondents to one of three emotion-inducing videos (positive, negative, and neutral), and we also randomize whether subjects complete a self-assessment survey tool measuring emotions prior to completing risk preference elicitation tasks and immediately after watching the emotion-inducing video. We verify changes in emotions for all respondents by using facial expression analysis software. Respondents who watched a positive video and skipped the emotion-measuring survey were less risk averse, particularly in the first of two preference elicitation tasks. Our findings indicate that estimates of how induced emotions affect risk aversion may be attenuated by including any intermediate tasks, including a survey to measure such emotions. PsychINFO Classification Code: 2360.
... Note, however, that facial expressions have a rich and controversial history of indexing emotions (Darwin, 1872; Barrett et al., 2019). The successful encoding of facial muscle movement patterns as facial Action Units (AUs) is based on the Facial Action Coding System (FACS) (Ekman et al., 1983; Ekman and Rosenberg, 2005). Recent research has shown the ability to use FACS as a way to quantify human attention and affect (Lints-Martindale et al., 2007; Hamm et al., 2011), and pain (Kunz et al., 2019). ...
Article
Full-text available
A key goal of cognitive neuroscience is to better understand how dynamic brain activity relates to behavior. Such dynamics, in terms of spatial and temporal patterns of brain activity, are directly measured with neurophysiological methods such as EEG, but can also be indirectly expressed by the body. Autonomic nervous system activity is the best-known example, but, muscles in the eyes and face can also index brain activity. Mostly parallel lines of artificial intelligence research show that EEG and facial muscles both encode information about emotion, pain, attention, and social interactions, among other topics. In this study, we examined adults who stutter (AWS) to understand the relations between dynamic brain and facial muscle activity and predictions about future behavior (fluent or stuttered speech). AWS can provide insight into brain-behavior dynamics because they naturally fluctuate between episodes of fluent and stuttered speech behavior. We focused on the period when speech preparation occurs, and used EEG and facial muscle activity measured from video to predict whether the upcoming speech would be fluent or stuttered. An explainable self-supervised multimodal architecture learned the temporal dynamics of both EEG and facial muscle movements during speech preparation in AWS, and predicted fluent or stuttered speech at 80.8% accuracy (chance=50%). Specific EEG and facial muscle signals distinguished fluent and stuttered trials, and systematically varied from early to late speech preparation time periods. The self-supervised architecture successfully identified multimodal activity that predicted upcoming behavior on a trial-by-trial basis. This approach could be applied to understanding the neural mechanisms driving variable behavior and symptoms in a wide range of neurological and psychiatric disorders. The combination of direct measures of neural activity and simple video data may be applied to developing technologies that estimate brain state from subtle bodily signals.
... To counteract the drawbacks of deep learning approaches, the work in [16] used a multitask learning strategy in neural network construction to enhance the primary task by leveraging the learning of different tasks. In addition, the works in [17,18] added facial detection cues to the feature design of facial action units, which can aid in improving facial emotion recognition accuracy. In terms of multitask parameters, most previous studies performed optimization based on hard parameter sharing, but this approach limits the recognition efficiency of facial expressions to some extent. ...
Article
Full-text available
To assess students' learning efficiency under different teaching modes, we used students' facial expressions in the classroom as the study object. An enhanced generative adversarial network is presented. We designed the generator as an automatic encoding-decoding combination in a cascade structure with a discriminator configuration. It can retain different expression intensity features to the maximum extent. We also added a new auxiliary classifier, which can classify different intensity features and improve the model's recognition of the detailed features of similar expressions, thus improving the overall facial expression recognition accuracy. Our approach has a clear advantage over other facial expression recognition approaches on public datasets. Finally, we conduct experimental validation on a self-made student facial expression dataset in all cases. The experimental findings show that our approach's recognition accuracy is superior to that of other methods, demonstrating the method's efficacy.
... While the current study used all deliberate human expressions at the expense of ecological validity, this methodology has an advantage in controlling the overall duration and the position of the peak. Previous studies point out that there may be multiple peaks in spontaneous facial reactions [49,76], and thus, future research will need to take into account such complexity that cannot be investigated in deliberate expressions. Further, 2000 ms before and after the peak of expression were arbitrarily extracted in this study. ...
Article
Full-text available
Reading the genuineness of facial expressions is important for increasing the credibility of information conveyed by faces. However, it remains unclear which spatio-temporal characteristics of facial movements serve as critical cues to the perceived genuineness of facial expressions. This study focused on observable spatio-temporal differences between perceived-as-genuine and deliberate expressions of happiness and anger. In this experiment, 89 Japanese participants were asked to judge the perceived genuineness of faces in videos showing happiness or anger expressions. To identify diagnostic facial cues to the perceived genuineness of the facial expressions, we analyzed a total of 128 face videos using an automated facial action detection system; thereby, moment-to-moment activations in facial action units were annotated, and nonnegative matrix factorization extracted sparse and meaningful components from all action unit data. The results showed that genuineness judgments were reduced when more spatial patterns were observed in facial expressions. As for the temporal features, the perceived-as-deliberate expressions of happiness generally had faster onsets to the peak than the perceived-as-genuine expressions of happiness. Moreover, opening the mouth negatively contributed to the perceived-as-genuine expressions, irrespective of the type of facial expression. These findings provide the first evidence for dynamic facial cues to the perceived genuineness of happiness and anger expressions.
... In this approach, the focus is on extracting facial micro-expressions instead of facial macro-expressions. Facial macro-expressions, or intense facial expressions, are voluntary muscle movements in the face that are easily distinguishable, cover a large area of the face, and last between 0.5 and 4 s (Ekman and Rosenberg, 1997). In contrast, facial micro-expressions refer to brief and involuntary facial changes, like the upturn of the inner eyebrows or wrinkling of the nose, that happen spontaneously in response to external stimuli, typically over a short time frame of between 65 and 500 ms (Yan et al., 2013). ...
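The duration ranges quoted above suggest a trivial duration-based split; the following sketch is purely illustrative, since real systems also rely on intensity and spatial extent.

```python
def expression_type(duration_s):
    """Classify a facial event by duration alone, using the ranges quoted
    above (micro: ~65-500 ms, macro: ~0.5-4 s). Illustrative only."""
    if 0.065 <= duration_s < 0.5:
        return "micro-expression"
    if 0.5 <= duration_s <= 4.0:
        return "macro-expression"
    return "unclassified"

print(expression_type(0.2))  # micro-expression
print(expression_type(1.5))  # macro-expression
```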
Article
Full-text available
Emotions are multimodal processes that play a crucial role in our everyday lives. Recognizing emotions is becoming more critical in a wide range of application domains such as healthcare, education, human-computer interaction, Virtual Reality, intelligent agents, entertainment, and more. Facial macro-expressions or intense facial expressions are the most common modalities in recognizing emotional states. However, since facial expressions can be voluntarily controlled, they may not accurately represent emotional states. Earlier studies have shown that facial micro-expressions are more reliable than facial macro-expressions for revealing emotions. They are subtle, involuntary movements responding to external stimuli that cannot be controlled. This paper proposes using facial micro-expressions combined with brain and physiological signals to more reliably detect underlying emotions. We describe our models for measuring arousal and valence levels from a combination of facial micro-expressions, Electroencephalography (EEG) signals, galvanic skin responses (GSR), and Photoplethysmography (PPG) signals. We then evaluate our model using the DEAP dataset and our own dataset based on a subject-independent approach. Lastly, we discuss our results, the limitations of our work, and how these limitations could be overcome. We also discuss future directions for using facial micro-expressions and physiological signals in emotion recognition.
... Facial expressions are a typical research subject in anthropology, sociology, neuroscience, psychology, and human-computer interface (HCI); in addition, they are studied with considerable interest, particularly in the emotion research field. Ekman et al. [12] objectively and comprehensively described the relationship between what people display on their faces and what they feel using a facial action coding system (FACS). Kapoor et al. [13] analyzed head movements, postures, facial expressions, skin conductivity, and computer mouse pressure to propose a system that senses frustration in a learning environment. ...
Article
Full-text available
This study proposes a pleasure–arousal–outlier (PAO) model to quantify the experiences derived from games. The proposed technique identifies pleasure, arousal, and outlier levels based on the facial expression of a user, keyboard input information, and mouse movement information received from a multimodal interface and then projects the received information in three-dimensional space to quantify the game experience state of the user. Facial expression recognition and distribution, eye blink, and eye glance concentration graphs were introduced to determine the immersion levels of games. A convolutional neural network-based facial expression recognition algorithm and dynamic time warp-based outlier behavior detection algorithm were adopted to obtain numerical values required for the PAO model evaluation. We applied the proposed PAO model for first-person shooter games and consequently acquired evaluation result values that were clearly distinguishable for different players. Such information allows player experiences to be quantitatively evaluated when designing game levels.
... The use of a robotic head allows for interaction through speech, facial expressions, and body language. Cid et al. [19] presented a software architecture that detects, recognizes, classifies, and generates facial expressions using the Facial Action Coding System (FACS) [20,21], and also compared the scientific literature describing the implementation of different robotic heads according to their appearance, sensors used, degrees of freedom (DOF), and use of the FACS. ...
Article
Full-text available
One direct way to express the sense of attention in a human interaction is through the gaze. This paper presents the enhancement of the sense of attention from the face of a human-sized mobile robot during an interaction. This mobile robot was designed as an assistance mobile robot and uses a flat screen at the top of the robot to display an iconic (simplified) face with big round eyes and a single line as a mouth. The implementation of eye-gaze contact from this iconic face is a problem because of the difficulty of simulating real 3D spherical eyes in a 2D image considering the perspective of the person interacting with the mobile robot. The perception of eye-gaze contact has been improved by manually calibrating the gaze of the robot relative to the location of the face of the person interacting with the robot. The sense of attention has been further enhanced by implementing cyclic face explorations with saccades in the gaze and by performing blinking and small movements of the mouth.
... Action Unit (AU) detection: Action Units encode movements of facial muscles and their intensity according to the Facial Action Coding System (FACS) [14]. ...
Article
Full-text available
This work focuses on facial processing, which refers to artificial intelligence (AI) systems that take facial images or videos as input data and perform some AI-driven processing to obtain higher-level information (e.g. a person’s identity, emotions, demographic attributes) or newly generated imagery (e.g. with modified facial attributes). Facial processing tasks, such as face detection, face identification, facial expression recognition or facial attribute manipulation, are generally studied as separate research fields and without considering a particular scenario, context of use or intended purpose. This paper studies the field of facial processing in a holistic manner. It establishes the landscape of key computational tasks, applications and industrial players in the field in order to identify the 60 most relevant applications adopted for real-world uses. These applications are analysed in the context of the new proposal of the European Commission for harmonised rules on AI (the AI Act) and the 7 requirements for Trustworthy AI defined by the European High Level Expert Group on AI. More particularly, we assess the risk level conveyed by each application according to the AI Act and reflect on current research, technical and societal challenges towards trustworthy facial processing systems.
... Bi-GRU [34]. The process is repeated for each sampled frame within the vocalized video sequence. ...
Article
Full-text available
Multimodal sentiment analysis has been an active subfield in natural language processing. Multimodal sentiment tasks are challenging due to the use of different sources for predicting a speaker's sentiment. Previous research has focused on extracting single contextual information within a modality and trying different modality fusion stages to improve prediction accuracy. However, a factor that may lead to poor model performance is that this does not consider the variability between modalities. Furthermore, existing fusion methods tend to extract the representational information of individual modalities before fusion. This ignores the critical role of intermodal interaction information for model prediction. This paper proposes MGHF, a multimodal sentiment analysis method based on cross-modal attention and a gated cyclic hierarchical fusion network. MGHF is based on the idea of distribution matching, which enables modalities to obtain representational information with a synergistic effect on the overall sentiment orientation in the temporal interaction phase. After that, we designed a gated cyclic hierarchical fusion network that takes text-based acoustic representation, text-based visual representation, and text representation as inputs and eliminates redundant information through a gating mechanism to achieve effective multimodal representation interaction and fusion. Our extensive experiments on two publicly available and popular multimodal datasets show that MGHF has significant advantages over previous complex and robust baselines.
... Admittedly, this subject is still in its infancy which does not come as a surprise as this gesture type has long remained a marginal research phenomenon. On the other hand, the study of affect and emotion has been dominated by a paradigm that considers emotions to be static and discrete (Ekman & Rosenberg, 1997). Both approaches share the assumption that affect and emotions are symbolized, displayed and thus represented by gestures as if they were "enclosed in an inner mental sphere to be deciphered from outside" (Fuchs & De Jaegher, 2009, p. 479). ...
Article
This paper introduces the Slapping movement as an embodied practice of dislike or meta-commentary recurring in conflictive situations between German children aged four to six (Hotze, 2019). Children move this way primarily in stopping a co-participant's action and protesting against the action to be stopped. The Slapping movements documented showed different manners of execution. Some forms appeared to be very expressive, others were more schematic. Inspired by a phenomenological approach to gestures, our analysis shows that the movement qualities show different degrees of communicative effort and affective intensity, which respond to the inter-affective dynamics unfolding between the participants of a situation. This means that the affective intensities unfolding in an interaction not only give rise to the Slapping movement, but they also influence how the hands are moved. In more detail, we observed that the higher the affective intensities become, the larger and more vigorous the Slapping movements are.
... According to it, the content of an utterance alone conveys 7% of the emotional state, the tone of voice accounts for 38%, while non-verbal communication (facial expressions, gestures) accounts for as much as 55%. This indicates that the ability to recognise emotions from the face is highly important [4]. On the other hand, it should be pointed out that emotional expressions are fairly universal, i.e., they are recognised by members of different cultures, even those that have had no contact with each other [5]. ...
Chapter
Full-text available
Facial expressions convey the vast majority of the emotional information contained in social utterances. From the point of view of affective intelligent systems, it is therefore important to develop appropriate emotion recognition models based on facial images. As a result of the high interest of the research and industrial community in this problem, many ready-to-use tools are being developed, which can be used via suitable web APIs. In this paper, two of the most popular APIs were tested: Microsoft Face API and Kairos Emotion Analysis API. The evaluation was performed on images representing 8 emotions—anger, contempt, disgust, fear, joy, sadness, surprise and neutral—distributed in 4 benchmark datasets: Cohn-Kanade (CK), Extended Cohn-Kanade (CK+), Amsterdam Dynamic Facial Expression Set (ADFES) and Radboud Faces Database (RaFD). The results indicated a significant advantage of the Microsoft API in the accuracy of emotion recognition both in photos taken en face and at a 45° angle. Microsoft's API also has an advantage in the larger number of recognised emotions: contempt and neutral are also included.
... The upper-facial movements and head rotations are represented by means of Action Units (AUs), as defined in the Facial Action Coding System (FACS) [17], and 3D head angles. The work presented in this paper only considers the AUs that represent eyebrow movements, which are: inner brow raiser (AU1), outer brow raiser (AU2), frown (AU4), upper lid raiser (AU5), cheek raiser (AU6), and lid tightener (AU7). ...
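Selecting those upper-face AUs from an OpenFace output file might look like the sketch below; the file name is hypothetical, and OpenFace's exact column formatting (e.g. leading spaces) can vary by version.

```python
import pandas as pd

# Hedged sketch: keep only the eyebrow/eyelid AU intensities listed above
# (AU1, AU2, AU4, AU5, AU6, AU7). OpenFace names intensity columns "AU01_r",
# "AU02_r", etc.; "openface_output.csv" is a placeholder path.
df = pd.read_csv("openface_output.csv")
df.columns = [c.strip() for c in df.columns]          # older versions pad column names
upper_face = df[["AU01_r", "AU02_r", "AU04_r", "AU05_r", "AU06_r", "AU07_r"]]
print(upper_face.head())
```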
Conference Paper
Full-text available
We propose a semantically-aware speech-driven model to generate expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECA). In this work, we aim to produce natural and continuous head motion and upper-facial gestures synchronized with speech. We propose a model that generates these gestures based on multimodal input features: the first modality is text, and the second one is speech prosody. Our model makes use of Transformers and Convolutions to map the multimodal features that correspond to an utterance to continuous eyebrow and head gestures. We conduct subjective and objective evaluations to validate our approach and compare it with the state of the art.
... As for strengths, objective measures of fear-reactivity were used, measuring multiple aspects of the fear response. Facial expressions are considered the most objective and precise measures of basic emotion (Ekman & Rosenberg, 2005;Skiendziel et al., 2019). EDA is also one of the most widely used indicators of sympathetic arousal and is commonly used as a measure of fear-related psychophysiology in fear-conditioning paradigms and other forms of laboratory fear-elicitation (Esteves et al., 1994;Vervliet et al., 2004). ...
Article
Full-text available
Objective: A significant proportion of military veterans successfully transition out of the military into civilian careers as first responders, such as firefighters. Like military service, being a firefighter is a high-risk profession involving exposure to aversive environments. Thus, it is possible that military experience might serve to buffer or exacerbate risk for negative psychological outcomes in firefighters. However, both occupations are associated with increased risk for psychopathology, such as PTSD, and little research has examined the effect of military service on processes that underlie stress in veterans serving as active-duty firefighters. The current study explores whether military service confers an adaptive advantage or an additional risk. Method: Using a case-control design, we examined differences in fear reactivity through electrodermal activity (EDA) and recording of fearful facial expressions, between 32 firefighters with and 32 firefighters without military veteran status (MVS; all men). Participants completed a semistructured, emotionally evocative interview with multiple contexts eliciting varying levels of emotion. Results: MVS firefighters had relatively elevated EDA across contexts. However, lower baseline levels indicated calmer resting state in MVS firefighters. There was greater incidence of lifetime PTSD in MVS compared with non-MVS firefighters (40.6% vs. 15.6%). Overall, firefighters with past PTSD had less EDA reactivity. Finally, number of military deployments was associated with higher fear expressions throughout the interview. Conclusions: These findings highlight the need to consider interactions between military experience and psychiatric history in future investigations examining risk and resilience in first responders. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
... The Facial Action Coding System (FACS) [5] is a comprehensive and objective system for describing facial expressions. It defines a unique set of basic facial muscle movements, called action units (AUs). ...
Preprint
Facial Action Unit (AU) detection is a crucial task for emotion analysis from facial movements. The apparent differences between subjects can sometimes be confounded with the changes brought by AUs, resulting in inaccurate results. However, most existing AU detection methods based on deep learning do not consider the identity information of different subjects. This paper proposes a meta-learning-based cross-subject AU detection model to eliminate the identity-caused differences. Besides, a transformer-based relation learning module is introduced to learn the latent relations of multiple AUs. To be specific, our proposed work is composed of two sub-tasks. The first sub-task is meta-learning-based AU local region representation learning, called MARL, which learns a discriminative representation of local AU regions that incorporates the shared information of multiple subjects and eliminates identity-caused differences. The second sub-task uses the local AU region representation from the first sub-task as input and then adds relationship learning based on the transformer encoder architecture to capture AU relationships. The entire training process is cascaded. An ablation study and visualization show that our MARL can eliminate identity-caused differences, thus obtaining a robust and generalized AU discriminative embedding representation. Our results show that, on the two public datasets BP4D and DISFA, our method is superior to the state of the art, improving the F1 score by 1.3% and 1.4%, respectively.
... Ekman's Facial Action Coding System (FACS) [11] is a widely used framework for identifying emotional facial expressions (EFEs) and assessing their intensities by human coders based on the anatomical features of the face. It also serves as the foundation for making emotion estimations in several AEFEA technologies [12], as it is highly valid and reliable when used by human coders [13,14]. It can be elaborated precisely enough to be implemented in computational algorithms [7,15]. ...
Article
Full-text available
Automated emotional facial expression analysis (AEFEA) is used widely in applied research, including the development of screening/diagnostic systems for atypical human neurodevelopmental conditions. The validity of AEFEA systems has been systematically studied, but their test–retest reliability has not been researched thus far. We explored the test–retest reliability of a specific AEFEA software, Noldus FaceReader 8.0 (FR8; by Noldus Information Technology). We collected intensity estimates for 8 repeated emotions through FR8 from facial video recordings of 60 children: 31 typically developing children and 29 children with autism spectrum disorder. Test–retest reliability was imperfect in 20% of cases, affecting a substantial proportion of data points; however, the test–retest differences were small. This shows that the test–retest reliability of FR8 is high but not perfect. A proportion of cases which initially failed to show perfect test–retest reliability reached it in a subsequent analysis by FR8. This suggests that repeated analyses by FR8 can, in some cases, lead to the “stabilization” of emotion intensity datasets. Under ANOVA, the test–retest differences did not influence the pattern of cross-emotion and cross-group effects and interactions. Our study does not question the validity of previous results gained by AEFEA technology, but it shows that further exploration of the test–retest reliability of AEFEA systems is desirable.
... The former is a common and easily understood description of facial behavior. The latter comes from the Facial Action Coding System (FACS) [14], which is domain knowledge resulting from human expert research. Facial action units can describe the action information of local facial areas in a more fine-grained way. ...
Article
Full-text available
Facial expression recognition has been widely used in many fields such as health care and intelligent robot systems. However, recognizing facial expressions in the wild is still very challenging due to variations in lighting, occlusions, and the ambiguity of human emotion. When training samples cannot cover all these environments, classification can easily lead to errors. Therefore, this paper proposes a new heuristic objective function based on domain knowledge so as to better optimize deep neural networks for facial expression recognition. Moreover, we take the specific relationship between facial expressions and facial action units as the domain knowledge. By analyzing the mixing relationship between different expression categories and then enlarging the distance between easily confused categories, we define a new heuristic objective function that can guide the deep neural network to learn better features and thus improve the accuracy of facial expression recognition. The experimental results verify the effectiveness, universality, and superior performance of our method.
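One way to read "enlarging the distance between easily confused categories" is a confusion-weighted classification loss. The sketch below is an interpretation in that spirit, not the paper's actual objective; the class indices and weight values are illustrative only.

```python
import torch
import torch.nn.functional as F

# Illustrative 7-class weight table: pairs of expression classes that share
# many AUs (e.g. fear vs. surprise share AUs 1, 2, 5, 26) get a larger penalty.
CONFUSION_WEIGHT = torch.ones(7, 7)
CONFUSION_WEIGHT[3, 5] = CONFUSION_WEIGHT[5, 3] = 2.0  # assumed fear/surprise indices

def confusion_aware_loss(logits, labels):
    """Cross-entropy re-weighted by how confusable the predicted class is
    with the true class; a hedged sketch of a confusion-aware objective."""
    ce = F.cross_entropy(logits, labels, reduction="none")
    pred = logits.argmax(dim=1)
    weights = CONFUSION_WEIGHT[labels, pred]   # heavier when a confusable class is predicted
    return (weights * ce).mean()
```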
... As observed in FACS (Ekman 1997), for a specific expression some AU activations are strongly correlated, such that many previous works exploit the intrinsic dependencies among AUs for AU detection. Zhao, Chu, and Zhang (2016) proposed combining region learning with multi-label learning in a unified deep network, which simultaneously captures the important regions and the AU dependencies. ...
Article
Capturing the dependencies among different facial action units (AU) is extremely important for the AU detection task. Many studies have employed graph-based deep learning methods to exploit the dependencies among AUs. However, the dependencies among AUs in real world data are often noisy and the uncertainty is essential to be taken into consideration. Rather than employing a deterministic mode, we propose an uncertain graph neural network (UGN) to learn the probabilistic mask that simultaneously captures both the individual dependencies among AUs and the uncertainties. Further, we propose an adaptive weighted loss function based on the epistemic uncertainties to adaptively vary the weights of the training samples during the training process to account for unbalanced data distributions among AUs. We also provide an insightful analysis on how the uncertainties are related to the performance of AU detection. Extensive experiments, conducted on two benchmark datasets, i.e., BP4D and DISFA, demonstrate our method achieves the state-of-the-art performance.
... The MakeHuman toolkit is a free, open-source 3D computer graphics toolset designed for prototyping human-like models. FACSHuman offers the possibility of manipulating the Action Units (AUs) defined in the Facial Action Coding System (FACS) [9] on the 3D models created in the MakeHuman software. This manipulation of AUs is a key component of our Sim2Real process. ...
Preprint
Robots and artificial agents that interact with humans should be able to do so without bias and inequity, but facial perception systems have notoriously been found to work more poorly for certain groups of people than others. In our work, we aim to build a system that can perceive humans in a more transparent and inclusive manner. Specifically, we focus on dynamic expressions on the human face, which are difficult to collect for a broad set of people due to privacy concerns and the fact that faces are inherently identifiable. Furthermore, datasets collected from the Internet are not necessarily representative of the general population. We address this problem by offering a Sim2Real approach in which we use a suite of 3D simulated human models that enables us to create an auditable synthetic dataset covering 1) underrepresented facial expressions, outside of the six basic emotions, such as confusion; 2) ethnic or gender minority groups; and 3) a wide range of viewing angles at which a robot may encounter a human in the real world. By augmenting a small dynamic emotional expression dataset containing 123 samples with a synthetic dataset containing 4536 samples, we achieved an improvement in accuracy of 15% on our own dataset and 11% on an external benchmark dataset, compared to the performance of the same model architecture without synthetic training data. We also show that this additional step improves accuracy specifically for racial minorities when the architecture's feature extraction weights are trained from scratch.
... The lower-level the captured features are, the more complex and person-specific the facial retargeting will be. The Facial Action Coding System (FACS) [4] offers a good compromise, as its features, the Facial Action Units (AUs), are independent of the person and describe facial muscle movements. ...
Conference Paper
Full-text available
Virtual characters are a key element in human-computer interaction, as these enhance the communication process via facial expressions, gestures and body postures. The process of creating facial animations is a time-consuming and laborious task. In this paper we present our recent advances in our web-based tools to animate faces. Our tools try to improve the control of facial expressions through simple interfaces. We demonstrate that real-time facial retargeting on the web is possible, and we present a facial rig based on natural neighbor interpolation between predefined facial expressions.
... Indeed, the vast majority of previous research investigating the perception of facial expressions has focused on posed (or fake) emotions (Dawel et al., 2017; Tcherkassof et al., 2013), raising serious doubts about the ecological validity of these studies (Tcherkassof et al., 2013; Barrett et al., 2019; Russell, 1994; Wallbott & Scherer, 1986; Zuckerman et al., 1976; Wallbott, 1990). Spontaneous/genuine and posed/fake emotional expressions differ in their temporal and morphological characteristics, such as duration, intensity, and asymmetry (Cohn & Schmidt, 2003; Ekman, 1997; Sato & Yoshikawa, 2004; Valstar & Pantic, 2010; Wehrle et al., 2000; Yoshikawa & Sato, 2006). Indeed, posed emotions display stereotypical and exaggerated facial configurations that are rarely met in real life (Barrett et al., 2019). ...
Article
Full-text available
Facial expressions are among the most powerful signals for human beings to convey their emotional states. Indeed, emotional facial datasets represent the most effective and controlled method of examining humans' interpretation of and reaction to various emotions. However, scientific research on emotion has mainly relied on static pictures of facial expressions posed (i.e., simulated) by actors, creating a significant bias in the emotion literature. This dataset tries to fill this gap, providing a considerable number (N = 1458) of dynamic genuine (N = 707) and posed (N = 751) clips of the six universal emotions from 56 participants. The dataset is available in two versions: original clips, including participants' body and background, and modified clips, where only the face of participants is visible. Notably, the original dataset has been validated by 122 human raters, while the modified dataset has been validated by 280 human raters. Hit rates for emotion and genuineness, as well as the mean and standard deviation of genuineness and intensity perception, are provided for each clip to allow future users to select the most appropriate clips needed to answer their scientific questions.
... where the projector consists of a two-layer projection head. Visual Embedding Subnetwork: Both MOSI and MOSEI use Facet to extract facial expression features, which include facial action units and face pose based on the Facial Action Coding System (FACS) [35]. This process is repeated for each sampled frame within the utterance video sequence, which outputs a length-N visual feature sequence V = [V_1, V_2, ..., V_N] ∈ R^(N×d_v). ...
Preprint
Multimodal sentiment analysis has become an increasingly popular research area as the demand for multimodal online content is growing. For multimodal sentiment analysis, words can have different meanings depending on the linguistic context and non-verbal information, so it is crucial to understand the meaning of the words accordingly. In addition, the word meanings should be interpreted within the whole utterance context that includes nonverbal information. In this paper, we present a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (CMSBERT-CLR), which incorporates the whole context's non-verbal and verbal information and aligns modalities more effectively through contrastive learning. First, we introduce a Context-driven Modality Shifting (CMS) to incorporate the non-verbal and verbal information within the whole context of the sentence utterance. Then, for improving the alignment of different modalities within a common embedding space, we apply contrastive learning. Furthermore, we use an exponential moving average parameter and label smoothing as optimization strategies, which can make the convergence of the network more stable and increase the flexibility of the alignment. In our experiments, we demonstrate that our approach achieves state-of-the-art results.
... Duchenne smiles require not just upturned mouths but uplifted cheeks crinkling the skin around the eyes (Ekman et al., 1990). Some investigators classifying laughter (Hofmann et al., 2015; Keltner & Bonanno, 1997) employed certified coders to analyze video using the Facial Action Coding System (Ekman & Rosenberg, 2005), but even they did not operationalize laughter in the same way. Duchenne laughter has been defined as a smile with an open mouth and laughing sounds (Keltner & Bonanno, 1997), a Duchenne smile with a forceful outbreath (Beermann & Ruch, 2011), and a laugh with a symmetrical smile and movement in the muscles around the eyes (Mehu & Dunbar, 2008). ...
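The competing operationalizations quoted above reduce to simple AU-based predicates; a hedged sketch, following the Keltner & Bonanno (1997) variant for laughter:

```python
def is_duchenne_smile(active_aus):
    """Duchenne smile per the FACS-based definition cited above:
    cheek raiser (AU 6) together with lip corner puller (AU 12)."""
    return {6, 12} <= set(active_aus)

def is_duchenne_laugh(active_aus, mouth_open, laugh_sound):
    """One operationalization quoted above: a Duchenne smile with an open
    mouth and laughing sounds. Other papers use different criteria."""
    return is_duchenne_smile(active_aus) and mouth_open and laugh_sound

print(is_duchenne_laugh([6, 12, 25], mouth_open=True, laugh_sound=True))  # True
```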
Article
Full-text available
Is the physiological act of laughing good for physical health? Or is it a sign of nervousness, people-pleasing agreeableness, or not taking life seriously? A review of the laughter measurement literature reveals that the frequency of laughter rarely has been measured, and never in a sample of adults over 65, the population that might reap the most from any benefit of laughter. We analyzed the correlations of unobtrusively observed laughter, with self-reported physical health, personality, and well-being, during face-to-face interviews with 82 rural, Canadian, senior-housing residents, aged 67 to 95. Twenty-one of them were over 90 years old. Participants laughed 0 to 60 times per hour, averaging 23 laughs per hour. This rate is consistent with observed laughter of younger adults, and much higher than any published self-reported laughter rate, by adults of any age. Laughter frequency was unrelated to physical health, well-being (mental health, satisfaction with life, positive or negative emotional experiences, stress, self-esteem), age, gender, and four of the five-factor-model personality dimensions: neuroticism (which includes nervousness), agreeableness, extraversion, and openness to experience. The fifth factor, conscientiousness, was negatively correlated with laughter, rs = − 0.35, indicating that more conscientious people laughed less. Many of these results were unexpected, based on research showing extraverts laughing more, and older adults and men laughing less than younger adults and women. This study highlights the importance of observation over self-report for accurate measurement of laughter.
... However, individuals may suppress or hide their true emotional expressions in certain social situations, which complicates the interpretation of facial expressions. Brief facial expressions revealed under such voluntary manipulation are often referred to as microexpressions (Ekman and Rosenberg, 2005; Ekman, 2009). As instantaneous expressions lasting between 1/25 and 1/2 of a second, microexpressions are faint and difficult to recognize with the naked eye, but they are believed to reflect a person's true intent, especially intent of a hostile nature (ten Brinke et al., 2012a). ...
Article
Full-text available
Micro-expressions can reflect an individual’s subjective emotions and true mental state and are widely used in the fields of mental health, justice, law enforcement, intelligence, and security. However, the current approach based on image and expert assessment-based micro-expression recognition technology has limitations such as limited application scenarios and time consumption. Therefore, to overcome these limitations, this study is the first to explore the brain mechanisms of micro-expressions and their differences from macro-expressions from a neuroscientific perspective. This can be a foundation for micro-expression recognition based on EEG signals. We designed a real-time supervision and emotional expression suppression (SEES) experimental paradigm to synchronously collect facial expressions and electroencephalograms. Electroencephalogram signals were analyzed at the scalp and source levels to determine the temporal and spatial neural patterns of micro- and macro-expressions. We found that micro-expressions were more strongly activated in the premotor cortex, supplementary motor cortex, and middle frontal gyrus in frontal regions under positive emotions than macro-expressions. Under negative emotions, micro-expressions were more weakly activated in the somatosensory cortex and corneal gyrus regions than macro-expressions. The activation of the right temporoparietal junction (rTPJ) was stronger in micro-expressions under positive than negative emotions. The reason for this difference is that the pathways of facial control are different; the production of micro-expressions under positive emotion is dependent on the control of the face, while micro-expressions under negative emotions are more dependent on the intensity of the emotion.
... There are numerous methods to measure the quality and magnitude of facial movements during both emotional and nonemotional expressions, most notably the Facial Action Coding System [10]. However, there has been little description of the patterns of motor overflow and cocontractions of the face in healthy persons. ...
Article
Background: Motor overflow refers to involuntary movements that accompany voluntary movements in healthy individuals. This may have a role in synkinesis. Objective: To describe the frequency and magnitude of facial motor overflow in a healthy population. Methodology: Healthy participants performed unilateral facial movements: brow elevation, wink, snarl, and closed smile. Two reviewers analyzed the magnitude of each movement and cocontraction. Patterns of movements are described. Univariate analysis was used to assess the relationship between efficacy of unilateral facial control and the frequency and magnitude of cocontractions. Results: Eighty-nine participants completed the videos. Consensual mirror movements occurred in 96% of participants during unilateral eye closure and 86% during brow elevation. The most common associated movement was ipsilateral eye constriction occurring during snarl (90.1%). Improved unilateral facial control was associated with a decrease in frequency and magnitude of associated movements during brow elevation, wink, and snarl. Conclusion: This study showed stereotyped patterns of motor overflow in facial muscles that resemble those in synkinesis and become more evident as unilateral control of the face decreases.
... The facial feedback hypothesis also asserts that facial expressions are not just emotional expressions, but that the afferent sensory feedback from the facial action can also influence the emotional experience [3]. Therefore, related resources such as EMFACS (Emotional FACS), the FACS Investigators' Guide [43], as well as the FACS interpretive database [46,47] have been developed to make emotion-based inferences from single AUs and/or combinations of AUs. As suggested in the FACS Investigators' Guide [43], it is possible to map AUs onto the basic emotion categories using a finite number of rules, as we will explain later in this paper. ...
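A rule-based AU-to-emotion mapping of the kind mentioned above can be sketched as a small lookup of required AU sets; the rule table below is an illustrative subset chosen for this example, not the official EMFACS or Investigators' Guide rule set.

```python
# Hedged sketch of rule-based AU-to-emotion mapping. AU names follow FACS
# (6: cheek raiser, 12: lip corner puller, 1: inner brow raiser, 4: brow lowerer,
# 15: lip corner depressor, 2: outer brow raiser, 5: upper lid raiser, 26: jaw drop),
# but this particular rule table is illustrative only.
EMOTION_RULES = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
}

def infer_emotions(active_aus):
    """Return emotions whose required AUs are all present in the detected set."""
    return [emo for emo, required in EMOTION_RULES.items() if required <= set(active_aus)]

print(infer_emotions([1, 4, 6, 15]))  # -> ['sadness']
```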
Article
Full-text available
In second-language communication, emotional feedback plays a preponderant role in instilling positive emotions and thereby facilitating production of the target language by second-language learners. Facial expressions, in particular, help convey emotion, intent, and sometimes even desired actions more effectively. Additionally, according to the facial feedback hypothesis, a major component of several contemporary theories of emotion, facial expressions can regulate emotional behavior and experience. The aim of this study was to determine whether, and to what extent, emotional expressions reproduced by virtual agents could provide empathetic support to second-language learners during communication tasks. To do so, using the Facial Action Coding System, we implemented a prototype virtual agent that can display a collection of nonverbal feedback behaviors, including Ekman's six basic universal emotions as well as gazing and nodding. We then designed a Wizard of Oz experiment in which second-language learners were assigned independent speaking tasks with a virtual agent. In this paper, we outline our proposed method and report on an initial experimental evaluation that validated the meaningfulness of our approach. Moreover, we present our next steps for improving the system and validating its usefulness through large-scale experiments.
... Visual: OpenFace 2 (Baltrusaitis et al. 2018) is used to extract facial Action Unit (AU) features and rigid and non-rigid facial shape parameters. The facial action unit features are based on the Facial Action Coding System (FACS) (Ekman 1997), which is widely used in human affect analysis. ...
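For readers who want to see what such AU features look like in practice, the sketch below selects the AU columns from an OpenFace 2 output CSV; column names such as "AU01_r" (intensity) and "AU01_c" (presence) follow OpenFace's usual convention, but should be verified against the actual output file.

```python
# Hedged sketch: selecting Action Unit features from an OpenFace 2 output CSV.
import pandas as pd

df = pd.read_csv("openface_output.csv")               # path is a placeholder
df.columns = [c.strip() for c in df.columns]          # OpenFace often pads headers with spaces

au_intensity = df[[c for c in df.columns if c.startswith("AU") and c.endswith("_r")]]
au_presence  = df[[c for c in df.columns if c.startswith("AU") and c.endswith("_c")]]

print(au_intensity.mean())   # average intensity of each AU over the clip
```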
Article
Recognizing humor from a video utterance requires understanding the verbal and non-verbal components as well as incorporating the appropriate context and external knowledge. In this paper, we propose the Humor Knowledge enriched Transformer (HKT), which can capture the gist of a multimodal humorous expression by integrating the preceding context and external knowledge. We incorporate humor-centric external knowledge into the model by capturing the ambiguity and sentiment present in the language. We encode the language, acoustic, vision, and humor-centric features separately using Transformer-based encoders, followed by a cross-attention layer to exchange information among them. Our model achieves 77.36% and 79.41% accuracy in humorous punchline detection on the UR-FUNNY and MUStARD datasets, achieving a new state of the art on both datasets by margins of 4.93% and 2.94%, respectively. Furthermore, we demonstrate that our model can capture interpretable, humor-inducing patterns from all modalities.
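The cross-attention step described above can be illustrated generically: queries from one modality attend to keys and values from another. The sketch below uses PyTorch's built-in multi-head attention and random tensors; it is not the authors' HKT implementation, and the dimensions are arbitrary.

```python
# Illustrative sketch of cross-attention between modality encodings:
# language features attend to visual features to exchange information.
import torch
import torch.nn as nn

d_model = 128
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

lang = torch.randn(2, 20, d_model)    # (batch, language tokens, features)
vision = torch.randn(2, 50, d_model)  # (batch, video frames, features)

# Queries come from language; keys and values come from vision.
fused, attn_weights = cross_attn(query=lang, key=vision, value=vision)
print(fused.shape)  # torch.Size([2, 20, 128])
```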
... Paul Ekman (1997), a pioneer in the detection of emotions and facial expressions, created the Facial Action Coding System (FACS), which classifies emotions through the movements of the facial muscles. Among the emotions it can identify from facial expressions are amusement, contempt, guilt, contentment, and pride in achievement, among others. ...
Chapter
Full-text available
Facial movements can reveal a person's emotional state: by observing and classifying them in a classroom, one can gauge how a lecture or training activity is being received by the student, whether with boredom, interest, disinterest, or disengagement from learning. A teacher standing in front of a group can detect certain emotional states of the students just by looking at their faces. The problem arises when the teacher is in front of a monitor and the students are virtualized as avatars located at a distance, whether nationally or internationally, with no way of obtaining information about the students' emotional state or the attention they are paying during the class. This work therefore proposes a facial-emotion detection method that supports teaching in virtual classrooms, without the need to be physically present with the student, and that generates only an alert for the teacher while the class is being taught, using nothing more than the detection of expression lines and facial contours, processed to obtain the information in real time. First, features are extracted from the frame captured by the webcam of the student's computer (the student having consented to the use of the webcam), generating a vector of values that is then processed by a classifier to determine the emotional state matching the values obtained.
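A minimal, hypothetical sketch of the pipeline the chapter describes (webcam frame, feature vector, classifier, teacher alert) might look as follows; the classifier `clf`, the downsampled pixel-grid features, and the alert states are placeholders, not the authors' method.

```python
# Hedged sketch: frame -> face detection -> feature vector -> classifier -> alert.
import cv2
import numpy as np

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_features(gray_face):
    # Placeholder feature vector: a downsampled, normalized pixel grid.
    return cv2.resize(gray_face, (32, 32)).flatten() / 255.0

def check_frame(frame, clf, alert_states=("boredom", "disinterest")):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.3, 5):
        state = clf.predict([extract_features(gray[y:y+h, x:x+w])])[0]
        if state in alert_states:
            return f"alert: student appears to show {state}"
    return None
```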
Chapter
Socially active humanoid robots (SAHRs) are designed to communicate and interact with humans in human-centric environments using speech, movements, gestures, or facial expressions, following a set of social behaviors while providing assistance. Just as humans interact adaptively with others by intuitively changing their speech, tone, and body language, this type of adaptive behavior can be developed in SAHRs to achieve rich, human-like interaction capabilities. Considerable research is therefore under way to replicate various behavioral aspects of humans in SAHRs so that human-robot interaction can be further improved. Besides interacting with humans, a humanoid robot should be able to perform assigned tasks remotely and in real time with good accuracy. These social robots can thus be used in a diverse range of applications, such as education, healthcare, entertainment, communication, construction, medicine, collaboration, and hazard management systems.
Chapter
This chapter discusses how affective states are characterized with respect to facial expressions. It presents two major challenges of facial expression analysis, which are to make the analysis invariant to changes in expression intensity and to be robust to changes in pose and large facial movements. The chapter details how facial analysis processes adapt to take these challenges into account. It discusses the different learning databases, their design and their ability to reflect the challenges posed by facial expression analysis in a natural usage context. In addition to facial movements, the intensities of facial movements vary and require that both low and high intensities are processed. Learning databases play an important role in facial expression recognition. The chapter presents the most significant approaches in the literature for macro‐ and micro‐expression recognition. It presents methods proposed in the literature to reduce large displacements and pose variations.
Article
Full-text available
In the present study we investigated the influence of positive and negative arousal situations and the presence of an audience on dogs’ behavioural displays and facial expressions. We exposed dogs to positive anticipation, non-social frustration, and social frustration evoking test sessions and measured pre- and post-test salivary cortisol concentrations. Cortisol concentration did not increase during the tests and there was no difference in pre- or post-test concentrations across the test conditions, ruling out a difference in arousal level. Displacement behaviours of “looking away” and “sniffing the environment” occurred more in the frustration-evoking situations than in the positive anticipation condition and were correlated with cortisol concentrations. “Ears forward” occurred more in the positive anticipation condition than in the frustration-evoking conditions, was positively influenced by the presence of an audience, and was negatively correlated with the pre-test cortisol concentrations, suggesting it may be a good indicator of dogs’ level of attention. “Ears flattener”, “blink”, “nose lick”, “tail wagging” and “whining” were associated with the presence of an audience but were not correlated with cortisol concentrations, suggesting a communicative component of these visual displays. These findings are a first step toward systematically testing which subtle cues could be considered communicative signals in domestic dogs.
Preprint
Full-text available
This explorative study of chronic schizophrenic patients aims to clarify whether group art therapy followed by a therapist-guided picture review could influence the patients' communication behavior. Characteristics of voice and speech were obtained via objective technological instruments and selected as indicators of communication behavior. Seven patients were recruited to participate in weekly group art therapy over a period of six months. Three days after each group meeting, they talked about their last picture during a standardized interview that was digitally recorded. The audio documents were evaluated using validated computer-assisted procedures: the transcribed texts with the German version of LIWC2015, and the voices with the audio analysis software VocEmoApI. The dual methodological approach was intended to form an internal control of the study results. An exploratory factor analysis of the complete sets of output parameters was carried out in the expectation of obtaining disease-typical characteristics in speech and voice that map barriers to communication. The parameters of both methods were thus processed into five factors each, i.e., into a quantitative digitized classification of the texts and voices. The scores of the factors were subjected to a linear regression analysis to capture possible process-related changes. Most patients continued to participate in the study, which resulted in high-quality data sets for statistical analysis. In answer to the study question, two results can be summarized: a text-analysis factor called presence proved to be a potential surrogate parameter for positive language development, and quantitative changes in vocal emotional factors were detected, demonstrating differentiated activation patterns of emotions. These results can presumably be interpreted as an expression of a cathartic healing process. The methods presented in this study make a potentially significant contribution to quantitative research into the effectiveness and mode of action of art therapy.
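The analysis strategy just described (factor reduction of output parameters, then regression of factor scores over time) can be sketched generically; scikit-learn's FactorAnalysis and LinearRegression stand in for the validated procedures named in the abstract, and the random data are placeholders.

```python
# Hedged sketch: reduce per-interview output parameters to a few factors, then
# regress each factor's scores on the session index to look for process-related change.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
params = rng.normal(size=(24, 30))        # 24 interviews x 30 output parameters (placeholder)
session = np.arange(24).reshape(-1, 1)    # chronological session index

scores = FactorAnalysis(n_components=5, random_state=0).fit_transform(params)
for k in range(scores.shape[1]):
    slope = LinearRegression().fit(session, scores[:, k]).coef_[0]
    print(f"factor {k}: slope over sessions = {slope:+.3f}")
```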
Article
Recent work in unsupervised learning has focused on efficient inference and learning in latent variable models. Training these models by maximizing the evidence (marginal likelihood) is typically intractable. Thus, a common approximation is to maximize the Evidence Lower BOund (ELBO) instead. Variational autoencoders (VAE) are a powerful and widely-used class of generative models that optimize the ELBO efficiently for large datasets. However, the VAE's default Gaussian choice for the prior imposes a strong constraint on its ability to represent the true posterior, thereby degrading overall performance. A Gaussian mixture model (GMM) would be a richer prior but cannot be handled efficiently within the VAE framework because of the intractability of the Kullback-Leibler divergence for GMMs. We deviate from the common VAE framework in favor of one with an analytical solution for a Gaussian mixture prior. To perform efficient inference for GMM priors, we introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs. This new objective allows us to incorporate richer, multi-modal priors into the autoencoding framework. We provide empirical studies on a range of datasets and show that our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
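The claim that the Cauchy-Schwarz divergence "can be computed analytically for GMMs" follows from its standard definition together with the closed-form integral of a product of Gaussians, shown below; this is the textbook form of the divergence, not necessarily the paper's exact constrained objective.

```latex
% Standard Cauchy-Schwarz divergence between densities p and q:
\[
D_{CS}(p \,\|\, q) \;=\; -\log
\frac{\int p(x)\, q(x)\, dx}
     {\sqrt{\int p(x)^2\, dx \;\int q(x)^2\, dx}} .
\]
% For Gaussian mixtures, every cross term reduces to a single Gaussian evaluation,
% so the numerator and denominator are finite sums of closed-form terms:
\[
\int \mathcal{N}(x;\mu_i,\Sigma_i)\,\mathcal{N}(x;\mu_j,\Sigma_j)\, dx
\;=\; \mathcal{N}(\mu_i;\, \mu_j,\; \Sigma_i + \Sigma_j).
\]
```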
Chapter
How do we shape the future responsibly? This question was at the center of the 14th Research Forum of the Austrian Universities of Applied Sciences in April 2021. The globalization, internationalization, and digitalization of the economy and society create many new challenges for individuals and organizations. Institutions in the higher education sector are important levers for mastering global challenges and strengthening sustainable development. The fields of engineering and information technology, business, social, and health sciences at the universities of applied sciences, as important educational institutions, have therefore been called upon for several years to take on the task of shaping the future jointly and responsibly. They do so in very different areas, with many facets that interlock, and this does not remain without consequences. This diversity is reflected in the 17 contributions of this edited volume.
Article
Full-text available
Smiles are universal but nuanced facial expressions that are most frequently used in face-to-face communications, typically indicating amusement but sometimes conveying negative emotions such as embarrassment and pain. Although previous studies have suggested that spatial and temporal properties could differ among these various types of smiles, no study has thoroughly analyzed these properties. This study aimed to clarify the spatiotemporal properties of smiles conveying amusement, embarrassment, and pain using a spontaneous facial behavior database. The results regarding spatial patterns revealed that pained smiles showed less eye constriction and more overall facial tension than amused smiles; no spatial differences were identified between embarrassed and amused smiles. Regarding temporal properties, embarrassed and pained smiles remained in a state of higher facial tension than amused smiles. Moreover, embarrassed smiles showed a more gradual change from tension states to the smile state than amused smiles, and pained smiles had lower probabilities of staying in or transitioning to the smile state compared to amused smiles. By comparing the spatiotemporal properties of these three smile types, this study revealed that the probability of transitioning between discrete states could help distinguish amused, embarrassed, and pained smiles.
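The point above about transition probabilities between discrete facial states can be illustrated with a simple maximum-likelihood estimate of a transition matrix from labeled frame sequences; the state names and the example sequence below are hypothetical, not the study's data.

```python
# Illustrative sketch: estimating transition probabilities between discrete
# facial states (e.g., tension vs. smile) from a labeled frame sequence.
import numpy as np

states = ["neutral", "tension", "smile"]
idx = {s: i for i, s in enumerate(states)}
sequence = ["neutral", "tension", "tension", "smile", "smile", "tension", "smile"]

counts = np.zeros((len(states), len(states)))
for a, b in zip(sequence, sequence[1:]):
    counts[idx[a], idx[b]] += 1

transition_probs = counts / counts.sum(axis=1, keepdims=True)  # row-normalize
print(transition_probs)   # row i, column j = P(next state j | current state i)
```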
Chapter
Human emotion recognition has been an active research topic in analyzing the emotional state of humans over the past few decades. It remains a challenging task in artificial intelligence and human-computer interaction due to high intra-class variation. Facial emotion analysis has attracted growing academic and commercial interest, mainly in the fields of behaviour prediction and recommendation systems. This paper proposes a novel scattering approach for recognizing facial dynamics from image sequences. Initially, we extract temporal information from the facial frame by applying a saliency map and the hyper-complex Fourier transform (HFT). The extracted high-level features are then fed to the scattering transform and machine learning algorithms to classify the seven emotions in the MUG dataset. The performance of the proposed wavelet scattering network was evaluated with four different machine learning algorithms and achieved a high recognition accuracy across all classes. In the experimental results, K-NN demonstrates the proposed architecture's effectiveness with an accuracy rate of 97% on the MUG dataset, compared with 95.7% for SVM, 93.7% for decision tree, and 91.2% for naive Bayes. Keywords: Facial emotion analysis, Hyper-complex Fourier transform, Feature descriptors, Wavelet scattering
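A wavelet-scattering plus k-NN pipeline in the spirit of the approach above can be sketched with kymatio and scikit-learn; these libraries and the random images stand in for the authors' implementation and preprocessed face frames.

```python
# Hedged sketch: scattering coefficients as features, k-NN as classifier.
import numpy as np
from kymatio.numpy import Scattering2D
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
faces = rng.random((40, 32, 32))             # 40 placeholder 32x32 grayscale face crops
labels = rng.integers(0, 7, size=40)         # seven emotion classes (placeholder labels)

scattering = Scattering2D(J=2, shape=(32, 32))
features = scattering(faces).reshape(len(faces), -1)   # scattering coefficients per image

knn = KNeighborsClassifier(n_neighbors=3).fit(features[:30], labels[:30])
print("held-out accuracy:", knn.score(features[30:], labels[30:]))
```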
Chapter
Facial analysis has been an active research topic in examining the emotional state of humans over the past few decades. It remains a challenging task in computer vision due to high intra-class variation, head pose, and environmental conditions such as lighting and illumination, which matter for behaviour prediction and recommendation systems. This paper proposes a novel facial emotion representation approach based on dense descriptors for recognizing facial dynamics in image sequences. First, the face is detected using the Haar cascade classifier, and temporal information is extracted from the facial frame by applying the scale-invariant feature transform (SIFT) combined with a bag-of-visual-words representation. The extracted high-level features are then fed to machine learning algorithms to classify the seven emotions in the MUG dataset. The performance of the proposed dense SIFT clustering was evaluated with four different machine learning algorithms and achieved a high recognition accuracy across all classes. In the experimental results, K-NN demonstrates the proposed architecture's effectiveness with an accuracy rate of 91.8% on the MUG dataset, compared with 89% for SVM, 87.6% for naive Bayes, and 85.7% for decision tree.
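A generic SIFT plus bag-of-visual-words representation of the kind described above can be sketched with OpenCV and scikit-learn; the random images, vocabulary size, and helper functions are placeholders, not the authors' pipeline.

```python
# Hedged sketch: SIFT descriptors -> k-means visual vocabulary -> per-image histogram.
import numpy as np
import cv2
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def sift_descriptors(gray):
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 128), dtype=np.float32)

# Placeholder "face" images; real input would be detected face crops.
images = [np.random.randint(0, 256, (128, 128), dtype=np.uint8) for _ in range(10)]
all_desc = np.vstack([sift_descriptors(img) for img in images])

k = 16                                                # visual vocabulary size (illustrative)
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

def bovw_histogram(gray):
    desc = sift_descriptors(gray)
    if len(desc) == 0:
        return np.zeros(k)
    words = codebook.predict(desc)
    return np.bincount(words, minlength=k) / len(words)

features = np.array([bovw_histogram(img) for img in images])  # input to any classifier
```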