Article

Automatic Affect Perception Based on Body Gait and Posture: A Survey

Authors:
  • Meta Reality Labs Research

Abstract

There has been a growing interest in machine-based recognition of emotions from body gait and its combination with other modalities. In order to highlight the major trends and state of the art in this area, the literature dealing with machine-based human emotion perception through gait and posture is explored. Initially the effectiveness of human intellect and intuition in perceiving emotions in a range of cultures is examined. Subsequently, major studies in machine-based affect recognition are reviewed and their performance is compared. The survey concludes by critically analysing some of the issues raised in affect recognition using gait and posture, and identifying gaps in the current understanding in this area.


... According to these models, the underlying feelings are hardwired and common across all cultures, a concept that has nevertheless been debated. Several reviews have highlighted the fact that age, gender as well as culture/language shape feelings and their intensity [14,[53][54][55]. These factors are important both for designing appropriate experiments and for meaningfully interpreting the results across studies. ...
... Dimensional models can have two or more dimensions. Among the most popular are the Circumplex and the PAD (Pleasure, Arousal, Dominance) models [54,55]. The Circumplex model describes feelings along two dimensions, namely valence/pleasure and arousal. ...
... Fig. 3(a) shows how different emotion patterns are mapped continuously to a two-dimensional Circumplex model. This representation could link more intuitively to physiological signals that relate to arousal and valence [54,55]. Several studies of body movements in relation to emotions have also highlighted that the largest variance is encoded along the arousal dimension [56][57][58]. ...
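To make the dimensional representation above concrete, the sketch below places a few discrete emotion labels at assumed (valence, arousal) coordinates on the Circumplex plane and maps a continuous estimate to its nearest label. The coordinate values and the label set are illustrative assumptions, not values from the survey or the cited studies.

```python
# Illustrative sketch (not from the surveyed paper): placing discrete emotion
# labels on the two Circumplex dimensions and finding the nearest label for a
# predicted (valence, arousal) point. The coordinates below are rough,
# hypothetical positions chosen for illustration only.
import math

CIRCUMPLEX = {               # (valence, arousal), each in [-1, 1]
    "happiness": (0.8, 0.5),
    "anger":     (-0.6, 0.7),
    "fear":      (-0.7, 0.6),
    "sadness":   (-0.7, -0.4),
    "calm":      (0.6, -0.5),
    "neutral":   (0.0, 0.0),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Map a continuous (valence, arousal) estimate to the closest label."""
    return min(CIRCUMPLEX,
               key=lambda e: math.dist((valence, arousal), CIRCUMPLEX[e]))

print(nearest_emotion(0.7, 0.4))    # -> "happiness"
print(nearest_emotion(-0.5, -0.3))  # -> "sadness"
```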
Article
Full-text available
Mood disorders affect more than 300 million people worldwide and can cause devastating consequences. Elderly people and patients with neurological conditions are particularly susceptible to depression. Gait and body movements can be affected by mood disorders, and thus they can be used as a surrogate sign, as well as an objective index for pervasive monitoring of emotion and mood disorders in daily life. Here we review evidence that demonstrates the relationship between gait, emotions and mood disorders, highlighting the potential of a multimodal approach that couples gait data with physiological signals and home-based monitoring for early detection and management of mood disorders. This could enhance self-awareness, enable the development of objective biomarkers that identify high risk subjects and promote subject-specific treatment.
... Emotions are also expressed nonverbally through body movement as surveyed in [3,[16][17][18][19]. Affect expressive movement is categorised into four types: communicative (e.g., gestures), functional (e.g., walking), artistic (e.g., choreography), and abstract (e.g., arm lifting), where a single or a combination of these types represents affect [3]. ...
... Gait patterns, in particular, were analysed for recognising certain emotion expressions [27][28][29]. Raw data and processed features from motion capture, video cameras, and Kinect were deployed for automatic emotion recognition from gait analysis as surveyed in [18], where the results vary between studies given the variations of the studies' goals and the emotions recognised. However, to the best of our knowledge, no study analysed gait and body movement for the automatic detection of deception or guilt. ...
Article
Full-text available
Detecting deceptive behaviour for surveillance and border protection is critical for a country’s security. With the advancement of technology in relation to sensors and artificial intelligence, recognising deceptive behaviour could be performed automatically. Following the success of affective computing in emotion recognition from verbal and nonverbal cues, we aim to apply a similar concept for deception detection. Recognising deceptive behaviour has been attempted; however, only a few studies have analysed this behaviour from gait and body movement. This research involves a multimodal approach for deception detection from gait, where we fuse features extracted from body movement behaviours from a video signal, acoustic features from walking steps from an audio signal, and the dynamics of walking movement using an accelerometer sensor. Using the video recording of walking from the Whodunnit deception dataset, which contains 49 subjects performing scenarios that elicit deceptive behaviour, we conduct multimodal two-category (guilty/not guilty) subject-independent classification. The classification results obtained reached an accuracy of up to 88% through feature fusion, with an average of 60% from both single and multimodal signals. Analysing body movement using single modality showed that the visual signal had the highest performance followed by the accelerometer and acoustic signals. Several fusion techniques were explored, including early, late, and hybrid fusion, where hybrid fusion not only achieved the highest classification results, but also increased the confidence of the results. Moreover, using a systematic framework for selecting the most distinguishing features of guilty gait behaviour, we were able to interpret the performance of our models. From these baseline results, we can conclude that pattern recognition techniques could help in characterising deceptive behaviour, where future work will focus on exploring the tuning and enhancement of the results and techniques.
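The abstract above compares early, late, and hybrid fusion of gait-related modalities. The following sketch illustrates only the generic early-versus-late fusion pattern on synthetic per-subject feature vectors; the Whodunnit features, the paper's classifiers, and its hybrid-fusion scheme are not reproduced.

```python
# Minimal sketch of early vs. late fusion for two-class (guilty / not guilty)
# classification, assuming per-subject feature vectors have already been
# extracted from each modality. All data here is synthetic; the Whodunnit
# dataset and the paper's actual features are not reproduced.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 49                                    # subjects, as in the study above
video = rng.normal(size=(n, 30))          # body-movement features (video)
audio = rng.normal(size=(n, 20))          # walking-step acoustic features
accel = rng.normal(size=(n, 10))          # accelerometer gait dynamics
y = rng.integers(0, 2, size=n)            # guilty / not guilty labels

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Early fusion: concatenate modality features and train one classifier.
early = np.hstack([video, audio, accel])
clf = RandomForestClassifier(random_state=0).fit(early[idx_train], y[idx_train])
print("early fusion acc:", accuracy_score(y[idx_test], clf.predict(early[idx_test])))

# Late fusion: one classifier per modality, then average predicted probabilities.
probas = []
for X in (video, audio, accel):
    m = RandomForestClassifier(random_state=0).fit(X[idx_train], y[idx_train])
    probas.append(m.predict_proba(X[idx_test]))
late_pred = np.mean(probas, axis=0).argmax(axis=1)
print("late fusion acc:", accuracy_score(y[idx_test], late_pred))
```

Averaging per-modality probabilities is only one of many late-fusion rules; the paper additionally explores hybrid fusion, which is not shown here.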
... Some methods have also been developed for accumulating features or cues from the frames of an image sequence over a given time period. When modelling temporal dynamics, statistical functionals are typically utilised as an aggregation strategy [8], [21], [27], [23]. The authors in [27] proposed a model that uses analysis of human postures and body motion to predict engagement levels of children. In their model, temporal series features are aggregated and transformed into meta features using min, max, mean, and normalised histogram functions. ...
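A minimal sketch of that aggregation strategy, assuming per-frame feature vectors are already available: a fixed-length meta-feature vector is built from the min, max, mean, and a normalised histogram of each feature. The frame data and bin count below are placeholders.

```python
# Minimal sketch of aggregating a variable-length time series of per-frame
# features into a fixed-length "meta feature" vector using statistical
# functionals (min, max, mean, normalised histogram), as described above.
# Frame features here are synthetic placeholders.
import numpy as np

def aggregate(frames: np.ndarray, hist_bins: int = 8) -> np.ndarray:
    """frames: (n_frames, n_features) -> fixed-length functional vector."""
    parts = [frames.min(axis=0), frames.max(axis=0), frames.mean(axis=0)]
    for col in frames.T:                                 # per-feature histogram
        hist, _ = np.histogram(col, bins=hist_bins)
        parts.append(hist / max(hist.sum(), 1))          # normalise to sum to 1
    return np.concatenate(parts)

frames = np.random.default_rng(0).normal(size=(120, 5))  # 120 frames, 5 features
meta = aggregate(frames)
print(meta.shape)   # (5*3 + 5*8,) = (55,)
```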
... Vision data is commonly used to perceive human emotion. Emotion perception can be based on body movements [28], body posture [96], or facial expression. Vision-based features are extracted from raw images. ...
... Usually, the emotion perception model learns a mapping from the selected features to emotional states [28]. According to [96], perception models based on human gait and posture are in the early development stages. Therefore, they produce results that have lower accuracy than the models based on human facial expression features. ...
Article
Full-text available
Emotions are important in many aspects of human behavior. Emotions are displayed on human faces, they are reflected in human memory, and they even influence human intelligence. The creation of robots that try to mimic humans raises the question of how the concept of emotion can be transferred to robots. There is no unique answer to the question; however, many robots that leverage emotions exist. By summarizing the work done on these robots we try to shed light on the relations between robots and emotions from several perspectives. We first identify how artificial emotion can be defined in a robotic system. Next, we investigate the possible roles of emotions in robotic behavior models and analyze different implementations of the concept of emotion in these models. Finally, we elaborate on the evaluation of how emotions influence human-robot interaction. For this purpose, we qualitatively analyzed a selected set of robots that include emotions in their model. Considering the diversity of state-of-the-art approaches to using emotions in robots, we try to present the findings in a structured and comprehensive way that could be valuable for future researchers.
... Even though emotion recognition from gait and posture still seems to be at an early stage, a thorough review of the topic, as well as of cultural similarities in emotion recognition by people from different backgrounds, is presented by Stephens-Fripp et al. [10]. ML methods reported in this review can be found in Table 1. ...
... [Fragment of the citing paper's Table 1: Jaco robotic arm, Nao [33]; head pose estimation, discriminative random regression forest, RGB-D camera [10]] ...
Article
Full-text available
Purpose of Review As intelligent robots enter our daily routine, it is important that they be equipped with proper adaptable social perception and explainable behaviours. To do so, machine learning (ML) is often employed. This paper intends to find a trend in the way ML methods are used and applied to model human social perception and produce explainable robot behaviours. Recent Findings The literature has shown a substantial advancement in ML methods with application to social perception and explainable behaviours. There are papers which report models for robots to imitate humans and also for humans to imitate robots. Others use classical methods and propose new and/or improved ones which have led to better human-robot interaction performance. Summary This paper reports a review on social perception and explainable behaviours based on ML methods. First, we present literature background on these three research areas and finish with a discussion on limitations and future research avenues.
... The fields of psychology and more recently those such as cognitive science, have developed a range of models in this area. The most recent models rely on AI through use of signal processing, machine learning, natural language processing (NLP) and other modern implementations [11]. ...
... Currently, there is significant research in progress that focuses primarily on facial expressions, NLP and voice characteristics in a bid to help machines learn to achieve emotion detection and recognition in a way that emulates human ability [8]. Research has also indicated that analysis of gait and posture may reveal indicators of emotion [11]. ...
Conference Paper
Full-text available
Ninety percent of an iceberg is said to reside below the surface, in the hidden depths of the water, leaving only ten percent to be easily observed. In this paper the authors posit that many human emotion indicators emulate this trait, residing within the inferential data from interactions with popular IoT devices and applications. The visible 'tip of the iceberg' encapsulates the most widely studied "tells" of emotion in the form of facial analysis, natural language processing and voice analysis. These provide a discrete frozen snapshot of a person's emotional disposition. This paper presents the hypothesis that below the surface lies a largely untapped, vast resource of submerged data that may be used to infer the emotional state of an individual. The phenomenon of the Internet of Things has cultivated a societal shift where sensors and applications gather data relating to every facet of daily life. This data is centralized by hub devices such as Voice Command Devices and accessible via Intelligent Assistants such as the Amazon Echo and Alexa. Emotographic Modelling is a new concept rendering how human emotional state may be gleaned from the raft of digital indicators available from these hubs. The 'Emotographic' classifications generated are constituted by study of the statistical data relating to digital emotion indicators. By utilizing the IoT, the Cloud and Machine Learning, the inferential depths of the iceberg may be explored to provide insight into sleep, diet, exercise and other routines and habits. The complex "hidden" portion of the Emotographic Iceberg may reveal patterns that indicate emotion over a continuous timescale. Changes in these patterns may allow for a more sagacious comprehension of an individual's state of mind for healthcare clinicians and marketers. Preliminary testing is outlined in which the authors demonstrate how the emotion of sadness may be inferred from a range of questions asked to an IoT connected Amazon Echo Voice Command Device.
... Currently, there is significant research in progress that focuses primarily on facial expressions, NLP and voice characteristics in a bid to help machines learn to achieve emotion detection and recognition in a way that emulates human ability [17]. Research has also indicated that analysis of gait and posture may reveal indicators of emotion [18]. Additionally, studies of human behaviour and social interactions may also contain clues relating to the emotional state of a person. ...
... Additionally, studies of human behaviour and social interactions may also contain clues relating to the emotional state of a person. EEG, ECG and MRI medical technologies facilitate the scanning of brains and bodies; these enable researchers and medical professionals to detect emotion-associated neural signals that allow emotion detection and classification beyond the instinctive and learned mechanisms invoked by humans [18]. Although there are many indicators that may be examined when using modern technology to detect and classify emotion, current successful models rely heavily on facial expression and textual analysis. ...
Conference Paper
Full-text available
Voice Command Devices (VCD) such as the ubiquitous Amazon Echo have entered the lives and homes of people the world over. Although emotional recognition and digital empathy are idealistic goals within the development of AI and intelligent agents, the current technology available lacks outward emotional understanding and the personas contribute only Alexithymic (no understanding of emotion) responses. Despite extensive research by large multinational technological organizations, authentic human-like empathic interactions with intelligent agents have not yet been achieved. Consequently, users are lulled into a false sense of security where they believe that their emotions remain private. This paper determines that despite Alexa's demonstrated lack of emotion and emotional understanding, Voice Command Devices such as the Amazon Echo have the ability to deduce emotions such as sadness through inferential data. This is displayed through responses to questions that offer the same information as those posed by health practitioners to establish potential cases of depression. This type of data paves the way for parent companies to effectively target future advertising and build EMOTOgraphic models. As users are presented with no indication of this by intelligent agents, most would be unaware that combined inferential data could be so revealing and potentially extremely profitable from a sales and marketing perspective. This potentially leads to great ethical and privacy concerns as intelligent agents such as Alexa are gradually and incrementally cured of Alexithymia indicators.
... Zacharatos et al. in [50] focus on the importance of movement segmentation, since several gestures and emotions may appear in a long movement. Stephens-Fripp et al. in [51] complete the review by integrating methods for recognizing expression in static posture. Larboulette et al. in [52] provide a review of computable expressive descriptors in human motion. ...
Article
Full-text available
Many areas in computer science are facing the need to analyze, quantify and reproduce movements expressing emotions. This paper presents a systematic review of the intelligible factors involved in the expression of emotions in human movement and posture. We have gathered the works that have studied and tried to identify these factors by sweeping many disciplinary fields such as psychology, biomechanics, choreography, robotics and computer vision. These researches have each used their own definitions, units and emotions, which prevents a global and coherent vision. We propose a meta-analysis approach that cross-references and aggregates these researches in order to have a unified list of expressive factors quantified for each emotion. A calculation method is then proposed for each of the expressive factors and we extract them from an emotionally annotated animation dataset: Emilya. The comparison between the results of the meta-analysis and the Emilya analysis reveals high correlation rates, which validates the relevance of the quantified values obtained by both methodologies. The analysis of the results raises interesting perspectives for future research in affective computing.
... Besides, some researchers have found that head pose is another essential factor for reliably inferring learners' affect. Stephens-Fripp et al. (2017) carried out research on identifying learners' affective states and obtained reliable recognition rates based on body gait and pose. Furthermore, some researchers have employed physiological parameters to recognize students' learning-related affect. ...
Article
Full-text available
Affective computing (AC) has been regarded as a relevant approach to identifying online learners’ mental states and predicting their learning performance. Previous research mainly used one single-source data set, typically learners’ facial expression, to compute learners’ affection. However, a single facial expression may represent different affections in various head poses. This study proposed a dual-source data approach to solve the problem. Facial expression and head pose are two typical data sources that can be captured from online learning videos. The current study collected a dual-source data set of facial expressions and head poses from an online learning class in a middle school. A deep learning neural network using AlexNet with an attention mechanism was developed to verify the syncretic effect on affective computing of the proposed dual-source fusion strategy. The results show that the dual-source fusion approach significantly outperforms the single-source approach based on the AC recognition accuracy between the two approaches (dual-source approach using Attention-AlexNet model 80.96%; single-source approach, facial expression 76.65% and head pose 64.34%). This study contributes to the theoretical construction of the dual-source data fusion approach, and the empirical validation of the effect of the Attention-AlexNet neural network approach on affective computing in online learning contexts.
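The study above fuses facial expression and head pose with an attention-based AlexNet. As a simplified stand-in for that idea, the sketch below combines class scores from two hypothetical single-source models with fixed weights at the decision level; the class labels, scores, and weights are invented for illustration and do not reflect the paper's model.

```python
# Sketch of the dual-source idea at the decision level: combine class scores
# from a facial-expression model and a head-pose model with per-source weights.
# The actual study uses an AlexNet with an attention mechanism; this numpy-only
# weighted fusion is a simplified stand-in, and all numbers are made up.
import numpy as np

AFFECT_CLASSES = ["engaged", "confused", "bored", "frustrated"]   # hypothetical labels

def fuse(face_scores: np.ndarray, pose_scores: np.ndarray,
         w_face: float = 0.6, w_pose: float = 0.4) -> str:
    """Weighted sum of softmax scores from the two sources."""
    fused = w_face * face_scores + w_pose * pose_scores
    return AFFECT_CLASSES[int(np.argmax(fused))]

face = np.array([0.70, 0.15, 0.10, 0.05])   # facial-expression model output
pose = np.array([0.20, 0.50, 0.20, 0.10])   # head-pose model output
print(fuse(face, pose))                     # "engaged" with these weights
```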
... However, using human observers is time consuming and not sufficiently consistent for use in real-world applications. Automatic emotion recognition, which is a more suitable and accurate approach, has thus been developed (Stephens-Fripp et al., 2017). Most publicly available methods nowadays use facial expressions as features for emotion analysis and prediction. ...
Article
Full-text available
Emotion recognition is useful in many applications such as preventing crime or improving customer satisfaction. Most of current methods are performed using facial features, which require close-up face information. Such information is difficult to capture with normal security cameras. The advantage of using gait and posture over conventional biometrics such as facial features is that gaits and postures can be obtained unobtrusively from faraway, even in a noisy environment. This study aims to investigate and analyze the relationship between human emotions and their gaits or postures. We collected a dataset made from the input of 49 participants for our experiments. Subjects were instructed to walk naturally in a circular walking path, while watching emotion-inducing videos on Microsoft HoloLens 2 smart glasses. An OptiTrack motion-capturing system was used for recording the gaits and postures of participants. The angles between body parts and walking straightness were calculated as features for comparison of body-part movements while walking under different emotions. Results of statistical analyses show that the subjects' arm swings are significantly different among emotions. And the arm swings on one side of the body could reveal subjects' emotions more obviously than those on the other side. Our results suggest that the arm movements together with information of arm side and walking straightness can reveal the subjects' current emotions while walking. That is, emotions of humans are unconsciously expressed by their arm swings, especially by the left arm, when they are walking in a non-straight walking path. We found that arm swings in happy emotion are larger than arm swings in sad emotion. To the best of our knowledge, this study is the first to perform emotion induction by showing emotion-inducing videos to the participants using smart glasses during walking instead of showing videos before walking. This induction method is expected to be more consistent and more realistic than conventional methods. Our study will be useful for implementation of emotion recognition applications in real-world scenarios, since our emotion induction method and the walking direction we used are designed to mimic the real-time emotions of humans as they walk in a non-straight walking direction.
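Two of the features mentioned above, joint angles and walking straightness, can be computed from 3D joint positions. The sketch below shows one generic formulation of each, on made-up coordinates; it is not the study's exact feature definition.

```python
# Sketch of two feature ideas mentioned above, with made-up joint data:
# (1) the angle formed at a joint by two adjacent body segments, and
# (2) a simple walking-straightness index (straight-line distance divided by
#     the actual path length of a reference joint such as the pelvis).
# These are generic formulations, not the study's exact definitions.
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by points a-b-c in 3D."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def straightness(path):
    """path: (n_frames, 3) trajectory; 1.0 means perfectly straight."""
    path = np.asarray(path)
    direct = np.linalg.norm(path[-1] - path[0])
    travelled = np.linalg.norm(np.diff(path, axis=0), axis=1).sum()
    return direct / travelled if travelled > 0 else 0.0

print(joint_angle([0, 0, 0], [1, 0, 0], [1, 1, 0]))        # 90.0
print(straightness([[0, 0, 0], [1, 0.2, 0], [2, 0, 0]]))   # slightly below 1.0
```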
... Nonverbal behaviors are also expressed through body movement as surveyed in Kleinsmith and Bianchi-Berthouze (2013), Karg et al. (2013), Zacharatos et al. (2014), Stephens-Fripp et al. (2017) and Noroozi et al. (2018). Expressive movement is categorized into four types: communicative (e.g., gestures), functional (e.g., walking), artistic (e.g., choreography), and abstract (e.g., arm lifting), where a single or a combination of these types represents an affect (Karg et al. 2013). ...
Article
Full-text available
Despite the increase in awareness and support for mental health, college students’ mental health is reported to decline every year in many countries. Several interactive technologies for mental health have been proposed and are aiming to make therapeutic service more accessible, but most of them only provide one-way passive contents for their users, such as psycho-education, health monitoring, and clinical assessment. We present a robotic coach that not only delivers interactive positive psychology interventions but also provides other useful skills to build rapport with college students. Results from our on-campus housing deployment feasibility study showed that the robotic intervention showed significant association with increases in students’ psychological well-being, mood, and motivation to change. We further found that students’ personality traits were associated with the intervention outcomes as well as their working alliance with the robot and their satisfaction with the interventions. Also, students’ working alliance with the robot was shown to be associated with their pre-to-post change in motivation for better well-being. Analyses on students’ behavioral cues showed that several verbal and nonverbal behaviors were associated with the change in self-reported intervention outcomes. The qualitative analyses on the post-study interview suggest that the robotic coach’s companionship made a positive impression on students, but also revealed areas for improvement in the design of the robotic coach. Results from our feasibility study give insight into how learning users’ traits and recognizing behavioral cues can help an AI agent provide personalized intervention experiences for better mental health outcomes
... Similar results were found in a study where emotional walking was used to animate virtual avatars in different virtual scenarios (e.g., a park, street, and garden) (Randhavane et al., 2019b). Previous studies in kinematic-based movement analysis and affective computing have shown that both postural and kinematic features are essential for an accurate description of the individual's affective state (Kleinsmith and Bianchi-Berthouze, 2013; McColl and Nejat, 2014; Stephens-Fripp et al., 2017). In this regard, valence and arousal are typically associated with different body features and are considered crucial characteristics to describe the human affective experience on continuous and dimensional scales (Lindquist et al., 2012; Kuppens et al., 2013; Kragel and LaBar, 2016). ...
Article
Full-text available
Dynamic virtual representations of the human being can communicate a broad range of affective states through body movements, thus effectively studying emotion perception. However, the possibility of modeling static body postures preserving affective information is still fundamental in a broad spectrum of experimental settings exploring time-locked cognitive processes. We propose a novel automatic method for creating virtual affective body postures starting from kinematics data. Exploiting body features related to postural cues and movement velocity, we transferred the affective components from dynamic walking to static body postures of male and female virtual avatars. Results of two online experiments showed that participants coherently judged different valence and arousal levels in the avatar’s body posture, highlighting the reliability of the proposed methodology. In addition, esthetic and postural cues made women more emotionally expressive than men. Overall, we provided a valid methodology to create affective body postures of virtual avatars, which can be used within different virtual scenarios to understand better the way we perceive the affective state of others.
... Other examples are [25,31,75], whose reviews were on human action/activity recognition datasets. Only two dataset reviews ([29,41]) touch on datasets that can be used for affect modelling (automatic affect recognition [42,50,55,65,78,89,91] in particular). ...
Article
Movement dataset reviews exist but are limited in coverage, both in terms of size and research discipline. While topic-specific reviews clearly have their merit, it is critical to have a comprehensive overview based on a systematic survey across disciplines. This enables higher visibility of datasets available to the research communities and can foster interdisciplinary collaborations. We present a catalogue of 704 open datasets described by 10 variables that can be valuable to researchers searching for secondary data: name and reference, creation purpose, data type, annotations, source, population groups, ordinal size of people captured simultaneously, URL, motion capture sensor, and funders. The catalogue is available in the supplementary materials. We provide an analysis of the datasets and further review them under the themes of human diversity, ecological validity, and data recorded. The resulting 12-dimension framework can guide researchers in planning the creation of open movement datasets. This work has been the interdisciplinary effort of researchers across affective computing, clinical psychology, disability innovation, ethnomusicology, human-computer interaction, machine learning, music cognition, music computing, and movement neuroscience.
... Furthermore, the same authors also studied the harder problem of emotion detection from connected action sequences (Bernhardt and Robinson, 2009). More recent work on emotion recognition from body movements of single individuals has continued to investigate the recognition of emotions displayed in walking (Stephens-Fripp et al., 2017) and the contribution of different pose-based cues to emotion classification during motion-captured daily activities (Fourati et al., 2019), as well as the development of real-time systems (Wang et al., 2015b). ...
... As emotion recognition using these modalities is generally less effective than FER, many researchers are using multimodal approaches for one or more information channels and developing early and late fusion methods. A survey of publications in this domain can be found, among others in [18][19][20][21]. ...
Article
Full-text available
In recent years, emotion recognition algorithms have achieved high efficiency, allowing the development of various affective and affect-aware applications. This advancement has taken place mainly in the environment of personal computers offering the appropriate hardware and sufficient power to process complex data from video, audio, and other channels. However, the increase in the computing and communication capabilities of smartphones, the variety of their built-in sensors, as well as the availability of cloud computing services have made them an environment in which the task of recognising emotions can be performed at least as effectively. This is possible and particularly important due to the fact that smartphones and other mobile devices have become the main computing devices used by most people. This article provides a systematic overview of publications from the last 10 years related to emotion recognition methods using smartphone sensors. The characteristics of the most important sensors in this respect are presented, along with the methods applied to extract informative features from the data read from these input channels. Then, various machine learning approaches implemented to recognise emotional states are described.
... Different from that survey, our study focuses on a generic target group of physically and mentally healthy individuals. Another survey by Stephens-Fripp et al. [23] focused on emotion detection based on gait and posture, but it discussed gait and posture together and placed more emphasis on posture, whereas our current paper is entirely about gait and details how gait can be affected by emotions as well as the process of gait-based emotion detection. Moreover, we also introduce emotion models from current emotion theory and the dynamics of gait. ...
Preprint
Human gait refers to a daily motion that represents not only mobility, but it can also be used to identify the walker by either human observers or computers. Recent studies reveal that gait even conveys information about the walker's emotion. Individuals in different emotion states may show different gait patterns. The mapping between various emotions and gait patterns provides a new source for automated emotion recognition. Compared to traditional emotion detection biometrics, such as facial expression, speech and physiological parameters, gait is remotely observable, more difficult to imitate, and requires less cooperation from the subject. These advantages make gait a promising source for emotion detection. This article reviews current research on gait-based emotion detection, particularly on how gait parameters can be affected by different emotion states and how the emotion states can be recognized through distinct gait patterns. We focus on the detailed methods and techniques applied in the whole process of emotion recognition: data collection, preprocessing, and classification. At last, we discuss possible future developments of efficient and effective gait-based emotion recognition using the state of the art techniques on intelligent computation and big data.
... Expressive robotic systems have been shown to be important in human robot interaction [1][2][3][4][5]; for example, movable facial features enabled the robot Kismet to interact with human counterparts. One approach to creating expressive bipedal walkers has been to add faces to existing humanoid platforms [6][7][8][9][10][11][12][13]. This augmentation uses facial expression, similar to Kismet, to indicate internal state of the system. ...
Article
Full-text available
Humans are efficient, yet expressive in their motion. Human walking behaviors can be used to walk across a great variety of surfaces without falling and to communicate internal state to other humans through variable gait styles. This provides inspiration for creating similarly expressive bipedal robots. To this end, a framework is presented for stylistic gait generation in a compass-like under-actuated planar biped model. The gait design is done using model-based trajectory optimization with variable constraints. For a finite range of optimization parameters, a large set of 360 gaits can be generated for this model. In particular, step length and cost function are varied to produce distinct cyclic walking gaits. From these resulting gaits, 6 gaits are identified and labeled, using embodied movement analysis, with stylistic verbs that correlate with human activity, e.g., “lope” and “saunter”. These labels have been validated by conducting user studies in Amazon Mechanical Turk and thus demonstrate that visually distinguishable, meaningful gaits are generated using this framework. This lays groundwork for creating a bipedal humanoid with variable socially competent movement profiles.
... Researchers develop a framework for producing traits, affect, mood, and emotion on humanoid robots [48]. Just as important as the generation side is the task of recognizing affect in human counterparts, which is an extensively studied topic (especially inside the field of computer vision) by using body posture and gait of humans [49]. This topic is studied through joint kinematics as well [50]. ...
Article
Full-text available
As more and more robots move into social settings, humans will be monitoring the external motion profile of counterparts in order to make judgments about the internal state of these counterparts. This means that generating motion with an understanding of how humans will interpret it is paramount. This paper investigates the connection between environmental context, stylized gaits, and perception via a model of affect parameterized by valence and arousal. The predictive model proposed indicates that, for the motion stimuli used, environmental context has a larger influence on valence and style of walking has a larger influence on arousal. This work expands on previous research in affect recognition by exploring the critical relationship between environmental context, stylized gait, and affective perception. The results of this work indicate that social behavior of robots may be informed by environmental context for improved performance.
... Expressive robotic systems have been shown to be important in human robot interaction [1]- [5]; for example, movable facial features enabled the robot Kismet to interact with human counterparts. One approach to creating expressive bipedal walkers has been to add faces to existing humanoid platforms [6]- [13]. This augmentation uses facial expression, similar to Kismet, to indicate internal state of the system. ...
Preprint
Humans are efficient, yet expressive in their motion. Human walking behaviors can be used to walk across a great variety of surfaces without falling and to communicate internal state to other humans through variable gait styles. This provides inspiration for creating similarly expressive bipedal robots. To this end, a framework is presented for stylistic gait generation in a compass-like under-actuated planar biped model. The gait design is done using model-based trajectory optimization with variable constraints. For a finite range of optimization parameters, a large set of 360 gaits can be generated for this model. In particular, step length and cost function are varied to produce distinct cyclic walking gaits. From these resulting gaits, 6 gaits are identified and labeled, using embodied movement analysis, with stylistic verbs that correlate with human activity, e.g., "lope" and "saunter". These labels have been validated by conducting user studies in Amazon Mechanical Turk and thus demonstrate that visually distinguishable, meaningful gaits are generated using this framework. This lays groundwork for creating a bipedal humanoid with variable socially competent movement profiles.
Chapter
In this chapter we introduce the specific nonverbal channel of posture and describe how it functions in the creation of ongoing relationships at the choice, beginning, deepening, and ending phases of interaction. We suggest that posture has been largely overlooked as a contributor to relationship outcomes and provide evidence to support our claim. Because of its importance to the initiation and deepening of relationships, programs to help individuals gain skill in the communication of this nonverbal channel are needed, and one possible option is described in some depth. Continued research is necessary to accurately assess postural skill and how to improve it in children and adults.
Article
Human gait refers to a daily motion that represents not only mobility but can also be used to identify the walker by either human observers or computers. Recent studies reveal that gait even conveys information about the walker’s emotion. Individuals in different emotion states may show different gait patterns. The mapping between various emotions and gait patterns provides a new source for automated emotion recognition. Compared to traditional emotion detection biometrics, such as facial expression, speech, and physiological parameters, gait is remotely observable, more difficult to imitate, and requires less cooperation from the subject. These advantages make gait a promising source for emotion detection. This article reviews current research on gait-based emotion detection, particularly on how gait parameters can be affected by different emotion states and how the emotion states can be recognized through distinct gait patterns. We focus on the detailed methods and techniques applied in the whole process of emotion recognition: data collection, preprocessing, and classification. Finally, we discuss possible future developments of efficient and effective gait-based emotion recognition using state-of-the-art techniques in intelligent computation and big data.
Preprint
Full-text available
The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted.
Article
Full-text available
Actors are required to engage in multimodal modulations of their body, face, and voice in order to create a holistic portrayal of a character during performance. We present here the first trimodal analysis, to our knowledge, of the process of character portrayal in professional actors. The actors portrayed a series of stock characters (e.g., king, bully) that were organized according to a predictive scheme based on the two orthogonal personality dimensions of assertiveness and cooperativeness. We used 3D motion capture technology to analyze the relative expansion/contraction of 6 body segments across the head, torso, arms, and hands. We compared this with previous results for these portrayals for 4 segments of facial expression and the vocal parameters of pitch and loudness. The results demonstrated significant cross-modal correlations for character assertiveness (but not cooperativeness), as manifested collectively in a straightening of the head and torso, expansion of the arms and hands, lowering of the jaw, and a rise in vocal pitch and loudness. These results demonstrate what communication theorists refer to as “multichannel reinforcement”. We discuss this reinforcement in light of both acting theories and theories of human communication more generally.
Article
Social scientists increasingly use video data, but large-scale analysis of its content is often constrained by scarce manual coding resources. Upscaling may be possible with the application of automated coding procedures, which are being developed in the field of computer vision. Here, we introduce computer vision to social scientists, review the state-of-the-art in relevant subfields, and provide a working example of how computer vision can be applied in empirical sociological work. Our application involves defining a ground truth by human coders, developing an algorithm for automated coding, testing the performance of the algorithm against the ground truth, and running the algorithm on a large-scale dataset of CCTV images. The working example concerns monitoring social distancing behavior in public space over more than a year of the COVID-19 pandemic. Finally, we discuss prospects for the use of computer vision in empirical social science research and address technical and ethical challenges.
Technical Report
Full-text available
It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to the demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help facing these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairment. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive with respect to the hindrance other wearable sensors may cause to one's activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals as well as to assess their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large range of sensing, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals' activities and health status can derive from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate setting where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. 
A multidisciplinary debate among experts and stakeholders is paving the way towards AAL ensuring ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research project. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake in real world settings of AAL technologies. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in the AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potentials coming from the silver economy are overviewed.
Article
Human gait conveys significant information that can be used for identity recognition and emotion recognition. Recent studies have focused more on gait identity recognition than emotion recognition and regarded these two recognition tasks as independent and unrelated. How to train a unified model to effectively recognize the identity and emotion from gait at the same time is a novel and challenging problem. In this paper, we propose a novel Attention Enhanced Temporal Graph Convolutional Network (AT-GCN) for gait-based recognition and motion prediction. Enhanced by spatial and temporal attention, the proposed model can capture discriminative features in spatial dependency and temporal dynamics. We also present a multi-task learning architecture, which can jointly learn representations for multiple tasks. It helps the emotion recognition task with limited data considerably benefit from the identity recognition task and helps the recognition tasks benefit from the auxiliary prediction task. Furthermore, we present a new dataset (EMOGAIT) that consists of 1,440 real gaits, annotated with identity and emotion labels. Experimental results on two datasets demonstrate the effectiveness of our approach and show that our approach achieves substantial improvements over mainstream methods for identity recognition and emotion recognition.
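The multi-task element of the approach above can be summarised as a weighted sum of task losses computed over a shared representation. The sketch below shows only that objective, in numpy, with invented shapes and weights; it does not implement the AT-GCN or its attention modules.

```python
# Minimal numpy sketch of the multi-task idea only (not the AT-GCN itself):
# a shared representation feeds three heads, and the training objective is a
# weighted sum of an identity loss, an emotion loss, and a motion-prediction
# loss. Weights, shapes, and values are illustrative assumptions.
import numpy as np

def cross_entropy(probs: np.ndarray, label: int) -> float:
    return -float(np.log(probs[label] + 1e-12))

def multi_task_loss(id_probs, emo_probs, pred_motion, true_motion,
                    id_label, emo_label, w=(1.0, 1.0, 0.5)) -> float:
    l_id = cross_entropy(id_probs, id_label)                   # identity recognition
    l_emo = cross_entropy(emo_probs, emo_label)                # emotion recognition
    l_mot = float(np.mean((pred_motion - true_motion) ** 2))   # auxiliary prediction
    return w[0] * l_id + w[1] * l_emo + w[2] * l_mot

id_probs = np.array([0.1, 0.8, 0.1])        # softmax over identities
emo_probs = np.array([0.6, 0.3, 0.1])       # softmax over emotions
pred, true = np.zeros((10, 3)), np.full((10, 3), 0.1)
print(multi_task_loss(id_probs, emo_probs, pred, true, id_label=1, emo_label=0))
```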
Article
Electroencephalogram (EEG) emotion recognition based on a hybrid feature extraction method in the Empirical Mode Decomposition (EMD) domain, combined with optimal feature selection based on Sequence Backward Selection (SBS), is proposed, which can reflect subtle information in the multi-scale components of unstable and non-linear EEG signals and remove redundant features to improve the performance of emotion recognition. The proposal is tested on the DEAP dataset, in which the emotional states in the Valence dimension and Arousal dimension are classified by both K-nearest neighbor (KNN) and support vector machine (SVM), respectively. In the experiments, temporal windows of different lengths and three kinds of EEG rhythms are taken into account for comparison, from which the results show that the EEG signal with a 1 s temporal window achieves the highest recognition accuracy of 86.46% in the Valence dimension and 84.90% in the Arousal dimension, respectively, which is superior to some state-of-the-art works. The proposed method would be applied to real-time emotion recognition in human-robot interaction systems based on multimodal emotional communication.
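For readers unfamiliar with the feature-selection step, the sketch below shows a generic sequential backward selection loop wrapped around a cross-validated SVM, using synthetic data in place of the EMD-domain EEG features; the classifier settings, feature counts, and labels are assumptions, not the paper's configuration.

```python
# Sketch of sequential backward selection: starting from all features,
# repeatedly drop the single feature whose removal hurts cross-validated
# accuracy the least, until the desired number of features remains.
# Synthetic data stands in for the EMD-domain EEG features used in the paper.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sbs(X, y, k_target, cv=5):
    remaining = list(range(X.shape[1]))
    while len(remaining) > k_target:
        scores = []
        for f in remaining:
            subset = [g for g in remaining if g != f]
            acc = cross_val_score(SVC(), X[:, subset], y, cv=cv).mean()
            scores.append((acc, f))
        best_acc, worst_feature = max(scores)   # dropping this feature hurts least
        remaining.remove(worst_feature)
    return remaining

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 12))                   # 80 trials, 12 candidate features
y = rng.integers(0, 2, size=80)                 # e.g. high vs. low valence
print("selected features:", sbs(X, y, k_target=6))
```

Dropping the least useful feature at each step keeps the search greedy and cheap compared with an exhaustive subset search.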
Article
Full-text available
At the heart of emotion, mood, and any other emotionally charged event are states experienced as simply feeling good or bad, energized or enervated. These states - called core affect - influence reflexes, perception, cognition, and behavior and are influenced by many causes internal and external, but people have no direct access to these causal connections. Core affect can therefore be experienced as free-floating (mood) or can be attributed to some cause (and thereby begin an emotional episode). These basic processes spawn a broad framework that includes perception of the core-affect-altering properties of stimuli, motives, empathy, emotional meta-experience, and affect versus emotion regulation; it accounts for prototypical emotional episodes, such as fear and anger, as core affect attributed to something plus various nonemotional processes.
Article
Full-text available
Automatic emotion recognition is of great value in many applications; however, to fully realise its application value, more portable, non-intrusive, inexpensive technologies need to be developed. Human gait can reflect the walker's emotional state and could be an information source for emotion recognition. This paper proposed a novel method to recognize emotional state through human gait by using Microsoft Kinect, a low-cost, portable, camera-based sensor. Fifty-nine participants' gaits under neutral state, induced anger and induced happiness were recorded by two Kinect cameras, and the original data were processed through joint selection, coordinate system transformation, sliding window Gaussian filtering, differential operation, and data segmentation. Features of gait patterns were extracted from 3-dimensional coordinates of 14 main body joints by Fourier transformation and Principal Component Analysis (PCA). The classifiers NaiveBayes, RandomForests, LibSVM and SMO (Sequential Minimal Optimization) were trained and evaluated, and the accuracy of recognizing anger and happiness from neutral state reached 80.5% and 75.4%, respectively. Although the results of distinguishing angry and happy states were not ideal in the current study, it showed the feasibility of automatically recognizing emotional states from gaits, with characteristics meeting the application requirements.
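A simplified sketch of that kind of processing chain, run on synthetic joint coordinates rather than real Kinect recordings: Gaussian smoothing, frame differencing, FFT magnitude features, PCA, and an SVM. Parameter values are illustrative and are not the paper's.

```python
# Simplified sketch of the processing chain described above, applied to
# synthetic joint coordinates in place of real Kinect recordings.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_frames, n_coords = 60, 128, 14 * 3       # 14 joints x (x, y, z)

def gait_features(clip):
    """clip: (n_frames, n_coords) -> FFT-magnitude feature vector."""
    smoothed = gaussian_filter1d(clip, sigma=2, axis=0)   # sliding Gaussian filter
    velocity = np.diff(smoothed, axis=0)                  # differential operation
    spectrum = np.abs(np.fft.rfft(velocity, axis=0))[:8]  # low-frequency magnitudes
    return spectrum.ravel()

X = np.stack([gait_features(rng.normal(size=(n_frames, n_coords)))
              for _ in range(n_samples)])
y = rng.integers(0, 2, size=n_samples)                    # e.g. neutral vs. angry

model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC())
print("cv accuracy:", cross_val_score(model, X, y, cv=5).mean())
```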
Conference Paper
Full-text available
In this paper, we present an emotion recognition methodology that utilizes information extracted from body motion analysis to assess affective state during gameplay scenarios. A set of kinematic and geometrical features are extracted from joint-oriented skeleton tracking and are fed to a deep learning network classifier. In order to evaluate the performance of our methodology, we created a dataset with Microsoft Kinect recordings of body motions expressing the five basic emotions (anger, happiness, fear, sadness and surprise) which are likely to appear in a gameplay scenario. In this five emotions recognition problem, our methodology outperformed all other classifiers, achieving an overall recognition rate of 93 %. Furthermore, we conducted a second series of experiments to perform a qualitative analysis of the features and assess the descriptive power of different groups of features.
Article
Full-text available
Understanding emotional human behavior in its multimodal and continuous aspects is necessary for studying human-machine interaction and creating constituent social agents. As a first step, we propose a system for continuous recognition of the emotional behavior expressed by people during communication, based on their gestures and whole-body dynamic motion. The features used to classify the motion are inspired by the Laban Movement Analysis entities [11] and are mapped onto the well-known Russell Circumplex Model [4]. We choose a specific case study that corresponds to an ideal case of multimodal behavior that emphasizes body motion expression: theater performance. Using a trained neural network and annotated data, our system is able to describe the motion behavior as trajectories on the Russell Circumplex Model diagram during theater performances over time. This work contributes to the understanding of human behavior and expression and is a first step towards a complete continuous emotion recognition system whose next step will be adding facial expressions.
Article
Full-text available
That all humans recognize certain specific emotions from their facial expression-the Universality Thesis-is a pillar of research, theory, and application in the psychology of emotion. Its most rigorous test occurs in indigenous societies with limited contact with external cultural influences, but such tests are scarce. Here we report 2 such tests. Study 1 was of children and adolescents (N = 68; aged 6-16 years) of the Trobriand Islands (Papua New Guinea, South Pacific) with a Western control group from Spain (N = 113, of similar ages). Study 2 was of children and adolescents (N = 36; same age range) of Matemo Island (Mozambique, Africa). In both studies, participants were shown an array of prototypical facial expressions and asked to point to the person feeling a specific emotion: happiness, fear, anger, disgust, or sadness. The Spanish control group matched faces to emotions as predicted by the Universality Thesis: matching was seen on 83% to 100% of trials. For the indigenous societies, in both studies, the Universality Thesis was moderately supported for happiness: smiles were matched to happiness on 58% and 56% of trials, respectively. For other emotions, however, results were even more modest: 7% to 46% in the Trobriand Islands and 22% to 53% in Matemo Island. These results were robust across age, gender, static versus dynamic display of the facial expressions, and between- versus within-subjects design.
Article
Full-text available
An ever more sedentary lifestyle is a serious problem in our society. Enhancing people's exercise adherence through technology remains an important research challenge. We propose a novel approach for a system supporting walking that draws from basic findings in neuroscience research. Our shoe-based prototype senses a person's footsteps and alters in real-time the frequency spectra of the sound they produce while walking. The resulting sounds are consistent with those produced by either a lighter or heavier body. Our user study showed that modified walking sounds change one's own perceived body weight and lead to a related gait pattern. In particular, augmenting the high frequencies of the sound leads to the perception of having a thinner body and enhances the motivation for physical activity inducing a more dynamic swing and a shorter heel strike. We here discuss the opportunities and the questions our findings open.
Conference Paper
Full-text available
Bodily expression of emotion is recently receiving a growing interest, in particular regarding the study of bodily expression of emotions in daily actions such as walking or knocking at the door. However, previous studies tend to focus on a limited range of actions or emotions. Based on a new motion capture database of emotional body expression in daily actions, we propose a deeper analysis of the expression of emotions in body movement based on different emotions, actions and a wide range of low-level body cues. Random Forest approach is applied to investigate the classification of emotions in different movement tasks and to study the contribution of different types of body cues to the classification of each expressed emotion.
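The analysis pattern described above, a Random Forest trained on low-level body cues followed by an inspection of each cue's contribution, can be sketched as follows; the cue names, emotion labels, and data are invented placeholders rather than the study's database features.

```python
# Sketch of the Random Forest analysis pattern described above: classify the
# expressed emotion from low-level body cues and inspect which cues contribute
# most. Cue names and data are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

CUES = ["head_flexion", "torso_lean", "arm_opening", "stride_length",
        "movement_speed", "hand_height"]                  # hypothetical cue set
EMOTIONS = ["joy", "anger", "sadness", "fear"]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(CUES)))                     # 200 recorded actions
y = rng.integers(0, len(EMOTIONS), size=200)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
print("cv accuracy:", cross_val_score(forest, X, y, cv=5).mean())

forest.fit(X, y)
for cue, importance in sorted(zip(CUES, forest.feature_importances_),
                              key=lambda p: -p[1]):
    print(f"{cue:>15s}: {importance:.3f}")                # cue contribution
```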
Chapter
Full-text available
This chapter summarizes the body of work about cultural differences in emotion recognition based on the match versus mismatch of the cultural group expressing the emotion and the cultural group perceiving the emotion. Two major perspectives have arisen to explain the well-replicated empirical finding that there tends to be better recognition of facial expressions where there is a match. The first is the notion of in-group advantage, which is an information-based explanation, arguing that individuals are more accurate judging emotional expressions from their own cultural group versus foreign groups due to better information about culturally-specific elements of emotional expression. The finding of systematic in-group advantage has led to the development of a recent dialect theory of emotion, which uses a linguistic metaphor to argue emotion is a universal language with subtly different dialects. Just as it is more challenging to understand someone speaking a different dialect in verbal language, it can be more challenging to recognize emotions that are expressed in a different dialect. This dialect theory has been the subject of controversy due to its implications for dominant theories about cross-cultural differences in emotion. A second perspective is the notion of out-group bias, which is a motivation-based explanation. Individuals may use decoding rules to understand out-group emotional expressions differently, or they may be less motivated to recognize the emotions of other individuals who are members of foreign cultures, even when they are merely led to believe, falsely, that expressions originate elsewhere. Both of these theoretical mechanisms can act singly or simultaneously.
Article
Full-text available
Automatic recognition of emotions remains an ongoing challenge and much effort is being invested towards developing a system to solve this problem. Although several systems have been proposed, there is still none that considers the cultural context for emotion recognition. It remains unclear whether emotions are universal or culturally specific. A study on how culture influences the recognition of emotions is presented. For this purpose, a multicultural corpus for cross-cultural emotion analysis is constructed. Subjects from three different cultures—American, Asian and European—are recruited. The corpus is segmented and annotated. To avoid language artifacts, the emotion recognition model considers facial expressions, head movements, body motions and dimensional emotions. Three training and testing paradigms are carried out to compare cultural effects: intra-cultural, cross-cultural and multicultural emotion recognition. Intra-cultural and multicultural emotion recognition paradigms yielded the best recognition results; cross-cultural emotion recognition rates were lower. These results suggest that emotion expression varies by culture, representing a hint of emotion specificity.
Conference Paper
Full-text available
We present a new robust signal for detecting deception: full body motion. Previous work on detecting deception from body movement has relied either on human judges or on specific gestures (such as fidgeting or gaze aversion) that are coded or rated by humans. The results are characterized by inconsistent and often contradictory findings, with small-stakes lies under lab conditions detected at rates only slightly better than guessing. Building on previous work that uses automatic analysis of facial videos and rhythmic body movements to diagnose stress, we set out to see whether a full body motion capture suit, which records the position, velocity and orientation of 23 points in the subject's body, could yield a better signal of deception. Interviewees of South Asian (n = 60) or White British culture (n = 30) were required to either tell the truth or lie about two experienced tasks while being interviewed by somebody from their own (n = 60) or different culture (n = 30). We discovered that full body motion – the sum of joint displacements – was indicative of lying approximately 75% of the time. Furthermore, movement was guilt-related, and occurred independently of anxiety, cognitive load and cultural background. Further analyses indicate that including individual limb data in our full body motion measurements, in combination with appropriate questioning strategies, can increase its discriminatory power to around 82%. This culture-sensitive study provides an objective and inclusive view on how people actually behave when lying. It appears that full body motion can be a robust nonverbal indicator of deceit, and suggests that lying does not cause people to freeze. However, should full body motion capture become a routine investigative technique, liars might freeze in order not to give themselves away; but this in itself should be a telltale.
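The central measurement above, total body motion as the sum of joint displacements, is straightforward to compute from motion-capture data. The array layout and decision threshold in this sketch are illustrative assumptions, not the study's protocol.

import numpy as np

def total_body_motion(positions):
    """Sum of joint displacements for a motion-capture recording.

    positions: array of shape (n_frames, n_joints, 3) with 3-D joint positions
    (the suit described above tracks 23 points).
    Returns the summed Euclidean displacement across all joints and frames.
    """
    steps = np.diff(positions, axis=0)          # frame-to-frame displacement
    return np.linalg.norm(steps, axis=2).sum()  # sum over joints and frames

# Hypothetical decision rule: flag recordings whose total motion exceeds a
# threshold learned from truthful interviews (the value here is made up).
positions = np.random.rand(1000, 23, 3)
print(total_body_motion(positions) > 250.0)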
Article
Full-text available
If robots are to successfully interact with humans, they need to measure, quantify and respond to the emotions we produce. Similar to humans, the perceptual cue inputs to any modelling that allows this will be based on behavioural expression and body activity features that are prototypical of each emotion. However, the likely employment of such robots in different cultures necessitates the tuning of the emotion feature recognition system to the specific feature profiles present in these cultures. The amount of tuning depends on the relative convergence of the cross-cultural mappings between the emotion feature profiles of the cultures where the robots will be used. The GRID instrument and the cognitive corpus linguistics methodology were used in a contrastive study analysing a selection of behavioural expression and body activity features to compare the feature profiles of joy, sadness, fear and anger within and between Polish and British English. The intra-linguistic differences that were found in the profile of emotion features suggest that weightings based on this profile can be used in robotic modelling to create emotion-sensitive socially interacting robots. Our cross-cultural results further indicate that this profile of features needs to be tuned in robots to make them emotionally competent in different cultures.
Article
Full-text available
Humans convey emotions in different ways, and gait is one of them. Here we propose to use gait data to highlight features that characterize emotions. Gait analysis studies usually focus on stance phase, frequency, and footstep length. Here the study is based on joint angles obtained by inverse kinematics from the 3D marker positions of motion-capture data, using a combination of degrees of freedom (DOF) out of a 34-DOF human body model. The subjects are four professional actors, and five emotional states are simulated: Neutral, Joy, Anger, Sadness, and Fear. The paper first presents a psychological approach whose results are used to propose numerical approaches. The first study provides psychological results on motion perception and the possibility of emotion recognition from gait by 32 observers. Then, the motion data is studied using a feature vector approach to verify the numerical identifiability of the emotions. Finally, each motion is tested against a database of reference motions to identify the conveyed emotion. Using the results of the first and second studies, we utilize a 6-DOF model and then a 12-DOF model. The experimental results show that by using the gait characteristics it is possible to characterize each emotion with good accuracy for an intra-subject database. For the inter-subject database, results show that recognition is more prone to error, suggesting strong inter-personal differences in emotional features.
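A minimal sketch of the feature-vector matching step, assuming each gait trial is summarized by simple per-DOF statistics and compared against labelled reference motions with a nearest-neighbour rule; the statistics, database, and labels are placeholders, not the paper's exact feature definition.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def gait_features(angles):
    """Summarize one gait trial given its joint-angle trajectories.

    angles: (n_frames, n_dof) array, e.g. the 6- or 12-DOF subsets mentioned above.
    Returns per-DOF mean, standard deviation and range as one feature vector.
    """
    return np.concatenate([angles.mean(0), angles.std(0), np.ptp(angles, axis=0)])

# Hypothetical reference database of labelled trials
rng = np.random.default_rng(1)
X_ref = np.stack([gait_features(rng.normal(size=(200, 6))) for _ in range(50)])
y_ref = rng.integers(0, 5, size=50)   # 0..4 standing in for Neutral, Joy, Anger, Sadness, Fear

clf = KNeighborsClassifier(n_neighbors=1).fit(X_ref, y_ref)
print(clf.predict(gait_features(rng.normal(size=(200, 6)))[None, :]))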
Article
Full-text available
Emotional states influence whole-body movements during quiet standing, gait initiation, and steady state gait. A notable gap exists, however, in understanding how emotions affect postural changes during the period preceding the execution of planned whole-body movements. The impact of emotion-induced postural reactions on forthcoming posturomotor movements remains unknown. We sought to determine the influence of emotional reactions on center of pressure (COP) displacement prior to the initiation of forward gait. Participants (N = 23, 14 females) stood on a force plate and initiated forward gait at the offset of an emotional image (representing five discrete categories: attack, sad faces, erotica, happy faces, and neutral objects). COP displacement in the anteroposterior direction was quantified for a 2s period during image presentation. Following picture onset, participants produced a posterior postural response to all image types. The greatest posterior displacement was occasioned in response to attack/threat stimuli compared to happy faces and erotica images. Results suggest the impact of emotional states on gait behavior begins during the motor planning period prior to the preparatory phase of gait initiation, and manifests in center of pressure displacement alterations.
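Anteroposterior COP displacement of the kind analyzed above can be derived from force-plate moment and vertical-force channels. The sketch below assumes the common simplified convention COP_ap = -My / Fz; the channel names, sign convention, and sampling rate are illustrative assumptions that depend on the plate setup.

import numpy as np

def cop_ap_displacement(my, fz, window):
    """Anteroposterior centre-of-pressure trace from force-plate channels.

    my: moment about the mediolateral axis (N*m), fz: vertical force (N),
    window: slice covering the 2 s image-presentation period.
    Uses the simplified convention COP_ap = -My / Fz (sign depends on calibration).
    """
    cop = -my[window] / fz[window]
    return cop - cop[0]              # displacement relative to picture onset

# Example with synthetic data sampled at 1000 Hz
my = np.random.normal(0.0, 0.5, 4000)
fz = np.full(4000, 700.0)
disp = cop_ap_displacement(my, fz, slice(1000, 3000))
print(disp.min(), disp.max())        # posterior/anterior extremes during the 2 s window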
Article
Full-text available
The EyesWeb open platform (www.eyesweb.org) was originally conceived at the DIST-InfoMus Lab for supporting research on multimodal expressive interfaces and multimedia interactive systems. EyesWeb has also been widely employed for designing and developing real-time dance, music, and multimedia applications. It supports the user in experimenting with computational models of non-verbal expressive communication and in mapping gestures from different modalities (e.g., human full-body movement, music) onto multimedia output (e.g., sound, music, visual media). It allows fast development and experiment cycles of interactive performance set-ups by including a visual programming language enabling mapping, at different levels, of movement and audio into integrated music, visual, and mobile scenery. EyesWeb has been designed with a special focus on the analysis and processing of expressive gesture in movement, MIDI, audio, and music signals. It was the basic platform of the EU-IST Project MEGA (www.megaproject.org) and it has been employed in many artistic performances and interactive installations. However, the use of EyesWeb is not limited to performing arts. Museum installations, entertainment, edutainment, therapy and rehabilitation are just some of a wide number of different application domains where the system has been successfully applied. For example, EyesWeb has been adopted as standard in other EU IST projects such as MEDIATE and CARE HERE in the therapy and rehabilitation field, and EU TMR MOSART. Currently, it is employed in the framework of the EU-IST project TAI-CHI and in the 6FP Networks of Excellence ENACTIVE and HUMAINE. EyesWeb users include universities, public and private research centers, companies, and private users.
Conference Paper
Full-text available
Object recognition is a central problem in computer vision research. Most object recognition systems have taken one of two approaches, using either global or local features exclusively. This may be in part due to the difficulty of combining a single global feature vector with a set of local features in a suitable manner. In this paper, we show that combining local and global features is beneficial in an application where rough segmentations of objects are available. We present a method for classification with local features using non-parametric density estimation. Subsequently, we present two methods for combining local and global features. The first uses a "stacking" ensemble technique, and the second uses a hierarchical classification system. Results show the superior performance of these combined methods over the component classifiers, with a reduction of over 20% in the error rate on a challenging marine science application.
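A minimal sketch of the "stacking" combination of local and global features, with off-the-shelf classifiers standing in for the paper's non-parametric density estimation; the data, feature dimensions, and classifiers are placeholder assumptions, and in practice the meta-classifier should be trained on held-out level-0 predictions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 300
X_local = rng.normal(size=(n, 20))    # e.g. summarized local patch features
X_global = rng.normal(size=(n, 10))   # e.g. one global shape/colour vector per object
y = rng.integers(0, 3, size=n)

# Level-0 classifiers, one per feature type
local_clf = KNeighborsClassifier(n_neighbors=5).fit(X_local, y)
global_clf = SVC(probability=True).fit(X_global, y)

# Level-1 "stacking" classifier trained on the level-0 class probabilities
# (cross-validated level-0 predictions would be used to avoid overfitting)
meta_X = np.hstack([local_clf.predict_proba(X_local),
                    global_clf.predict_proba(X_global)])
meta_clf = LogisticRegression(max_iter=1000).fit(meta_X, y)
print(meta_clf.score(meta_X, y))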
Article
Full-text available
In this article the role of different categories of postures in the detection, recognition, and interpretation of emotion in contextually rich scenarios, including ironic items, is investigated. Animated scenarios are designed with 3D virtual agents in order to test 3 conditions: In the “still” condition, the narrative content was accompanied by emotional facial expressions without any body movements; in the “idle” condition, emotionally neutral body movements were introduced; and in the “congruent” condition, emotional body postures congruent with the character's facial expressions were displayed. Those conditions were examined by 27 subjects, and their impact on the viewers’ attentional and emotional processes was assessed. The results highlight the importance of the contextual information to emotion recognition and irony interpretation. It is also shown that both idle and emotional postures improve the detection of emotional expressions. Moreover, emotional postures increase the perceived intensity of emotions and the realism of the animations.
Article
Full-text available
We introduce the first steps in a developmental robot called multimodal emotional intelligence (MEI), a robot that can understand and express emotions in voice, gesture and gait using a controller trained only on voice. Whereas it is known that humans can perceive affect in voice, movement, music and even as little as point light displays, it is not clear how humans develop this skill. Is it innate? If not, how does this emotional intelligence develop in infants? The MEI robot develops these skills through vocal input and perceptual mapping of vocal features to other modalities. We base MEI's development on the idea that motherese is used as a way to associate dynamic vocal contours to facial emotion from an early age. MEI uses these dynamic contours to both understand and express multimodal emotions using a unified model called SIRE (Speed, Intensity, irRegularity, and Extent). Offline experiments with MEI support its cross-modal generalization ability: a model trained with voice data can recognize happiness, sadness, and fear in a completely different modality—human gait. User evaluations of the MEI robot speaking, gesturing and walking show that it can reliably express multimodal happiness and sadness using only the voice-trained model as a basis.
Conference Paper
Full-text available
In this paper, a novel human-virtual human interaction system is proposed. This system supports a real human to communicate with a virtual human using natural body language. Meanwhile, the virtual human is capable of understanding the meaning of human upper body gestures and reacting with its own personality by the means of body action, facial expression and verbal language simultaneously. In total, 11 human upper body gestures with and without human-object interaction are currently involved in the system. They can be characterized by human head, hand and arm posture. In our system implementation, the wearable Immersion CyberGlove II is used to capture the hand posture and the vision-based Microsoft Kinect takes charge of capturing the head and arm posture. This is a new sensor solution for human-gesture capture, and can be regarded as the most important contribution of this paper. Based on the posture data from the CyberGlove II and the Kinect, an effective and real-time human gesture recognition algorithm is also proposed. To verify the effectiveness of the gesture recognition method, we build a human gesture sample dataset. Additionally, the experiments demonstrate that our algorithm can recognize human gestures with high accuracy in real time.
Conference Paper
Full-text available
In this paper, we explored the use of features that represent body posture and movement for automatically detecting people's emotions in non-acted standing scenarios. We focused on four emotions that are often observed when people are playing video games: triumph, frustration, defeat, and concentration. The dataset consists of recordings of the rotation angles of the player's joints while playing Wii sports games. We applied various machine learning techniques and bagged them for prediction. When body pose and movement features are used we can reach an overall accuracy of 66.5% for differentiating between these four emotions. In contrast, when using the raw joint rotations, limb rotation movement, or posture features alone, we were only able to achieve accuracy rates of 59%, 61%, and 62% respectively. Our results suggest that features representing changes in body posture can yield improved classification rates over using static postures or joint information alone.
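A sketch of the bagging step, assuming each sample is a vector of posture-change features (deltas of joint rotation angles over a window) as suggested by the finding above; the data and labels are synthetic placeholders.

import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(3)
# Hypothetical features: per-window changes in joint rotation angles
# (deltas rather than raw angles, following the result reported above)
X = rng.normal(size=(400, 30))
y = rng.integers(0, 4, size=400)   # 0..3 for triumph, frustration, defeat, concentration

# Bagged decision trees (the default base estimator) as one simple ensemble choice
clf = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))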
Article
Full-text available
Body movements communicate affective expressions and, in recent years, computational models have been developed to recognize affective expressions from body movements or to generate movements for virtual agents or robots which convey affective expressions. This survey summarizes the state of the art on automatic recognition and generation of such movements. For both automatic recognition and generation, important aspects such as the movements analyzed, the affective state representation used, and the use of notation systems is discussed. The survey concludes with an outline of open problems and directions for future work.
Article
Full-text available
Thanks to the decreasing cost of whole-body sensing technology and its increasing reliability, there is an increasing interest in, and understanding of, the role played by body expressions as a powerful affective communication channel. The aim of this survey is to review the literature on affective body expression perception and recognition. One issue is whether there are universal aspects to affect expression perception and recognition models or if they are affected by human factors such as culture. Next, we discuss the difference between form and movement information as studies have shown that they are governed by separate pathways in the brain. We also review psychological studies that have investigated bodily configurations to evaluate if specific features can be identified that contribute to the recognition of specific affective states. The survey then turns to automatic affect recognition systems using body expressions as at least one input modality. The survey ends by raising open questions on data collecting, labeling, modeling, and setting benchmarks for comparing automatic recognition systems.
Conference Paper
Full-text available
Postural control is a dynamical process that has been extensively studied in motor control research. Recent experimental work shows a direct impact of affects on human balance. However, few studies on the automatic recognition of affects in full body expressions consider balance variables such as center of gravity displacements. Force plates enable the capture of balance variables with high precision. Automatic video extraction of the center of gravity is a basic alternative, which can be easily accessible for a wide range of public applications. This paper presents a comparison of balance variables extracted from the force plate and video processing. These variables are used to capture the bodily expressions of participants in a public speaking task designed to elicit stress. Results show that the variability of the center of gravity displacements from the force plate and video are related to negative emotions and situation appraisals. The power spectrum density broadness of the center of pressure from the force plate is related to Difficulty Describing Feelings, an important factor from a dispositional trait of Alexithymia. Implications of the use of such methods are discussed.
Article
Research relevant to psychotherapy regarding facial expression and body movement has shown that the kind of information which can be gleaned from the patient's words - information about affects, attitudes, interpersonal styles, psychodynamics - can also be derived from his concomitant nonverbal behavior. The study explores the interaction situation, and considers how within deception interactions differences in neuroanatomy and cultural influences combine to produce specific types of body movements and facial expressions which escape efforts to deceive and emerge as leakage or deception clues.
Chapter
Recognition and study of human emotions have attracted considerable attention in the past two decades and have been researched broadly in the field of computer vision. The recognition of complete-body expressions is significantly harder, because the human pose has more degrees of freedom than the face alone, and its overall shape varies considerably during articulated motion. This chapter presents a method for emotion recognition based on gesture dynamics features extracted from the foreground object to represent various levels of a person's posture. The experiments are carried out using a publicly available emotion recognition dataset, and the extracted motion feature set is modeled by support vector machines (SVM), Naïve Bayes, and dynamic time warping (DTW), which are used to classify the human emotions. Experimental results show that DTW is efficient in recognizing the human emotion with an overall recognition accuracy of 93.39 %, when compared to SVM and Naïve Bayes.
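A minimal DTW-based one-nearest-neighbour classifier of the kind compared above, assuming each gesture is reduced to a single 1-D dynamics feature per frame; real systems would use multivariate sequences and several templates per emotion.

import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(query, templates):
    """1-nearest-neighbour emotion label under DTW (templates: label -> sequence)."""
    return min(templates, key=lambda lbl: dtw_distance(query, templates[lbl]))

# Hypothetical per-frame posture feature (e.g. torso inclination) per emotion
templates = {"happy": np.sin(np.linspace(0, 6, 80)),
             "sad":   np.linspace(1, 0, 80)}
print(classify(np.sin(np.linspace(0, 6, 90)) + 0.05, templates))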
Conference Paper
Social robots are envisioned to move into unrestricted environments where they will be interacting with naive users (in terms of their experience as robot operators). Thus, these robots are also envisioned to exploit interaction channels that are natural to humans like speech, gestures, or body movements. A specificity of these interaction channels is that humans do not only convey task-related information but also more subtle information such as emotions or personal stance through these channels. Thus, to be successful and not accidentally jeopardize an interaction, robots need to be able to understand these implicit connotations of the signals (often called social signal processing) in order to generate appropriate signals in a given interaction context. One main application area that is envisioned for social robots is related to elder care, but little is known on how seniors will perceive robots and the signals they produce. In this paper we focus on affective connotations of body movements and investigate how the perception of body movements of robots is related to age. Inspired by a study from Japan, we introduce culture as a variable in the experiment and discuss the difficulties of cross-cultural comparisons. The results show that there are certain age-related differences in the perception of affective body movements, but not as strong as in the original study. A follow-up experiment puts the affective body movements into context and shows that recognition rates deteriorate for older participants.
Conference Paper
Gestures have been called the leaky source of emotional information. Gestures are also easy to retrieve from a distance by ordinary cameras. Thus, as many would agree, gestures become an important clue to the emotional state of a person. In this paper we have worked on recognizing emotions of a person by analyzing only gestural information. Subjects are initially trained to perform emotionally expressive gestures by a professional actor. The same actor trained the system to recognize the emotional context of gestures. Finally, the gestural performances of the subjects are evaluated by the system to identify the class of emotion indicated. Our system yields an accuracy of 94.4% with a training set of only one gesture per emotion. Apart from this, our system is also computationally efficient. Our work analyses emotions from gestures only, which is a significant step towards reducing the cost of emotion recognition. It may be noted here that this system may also be used for the purpose of general gesture recognition. We have proposed new features and a new classifying approach using fuzzy sets. We have achieved state-of-the-art accuracy with minimal complexity, as each motion trajectory along each axis generates only 4 displacement features. Each axis generates a trajectory, and only 6 joint trajectories among all joint trajectories are compared. The 6 motion trajectories are selected based on maximum motion, as maximum moving regions give more information on gestures. The experiments have been performed on data obtained from Microsoft Kinect sensors. Training and testing were independent of subject gender.
Article
We present a computational model and a system for the automated recognition of emotions starting from full-body movement. Three-dimensional motion data of full-body movements are obtained either from professional optical motion-capture systems (Qualisys) or from low-cost RGB-D sensors (Kinect and Kinect2). A number of features are then automatically extracted at different levels, from kinematics of a single joint to more global expressive features inspired by psychology and humanistic theories (e.g., contraction index, fluidity, and impulsiveness). An abstraction layer based on dictionary learning further processes these movement features to increase the model generality and to deal with intraclass variability, noise, and incomplete information characterizing emotion expression in human movement. The resulting feature vector is the input for a classifier performing real-time automatic emotion recognition based on linear support vector machines. The recognition performance of the proposed model is presented and discussed, including the tradeoff between precision of the tracking measures (we compare the Kinect RGB-D sensor and the Qualisys motion-capture system) versus dimension of the training dataset. The resulting model and system have been successfully applied in the development of serious games for helping autistic children learn to recognize and express emotions by means of their full-body movement.
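A simplified sketch of this kind of pipeline, with scikit-learn's DictionaryLearning and LinearSVC standing in for the authors' abstraction layer and classifier; the feature names and data are placeholders.

import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
# Hypothetical expressive feature vectors per movement segment
# (e.g. contraction index, fluidity, impulsiveness, joint kinematics)
X = rng.normal(size=(200, 12))
y = rng.integers(0, 4, size=200)

# Abstraction layer: sparse codes over a learned dictionary
coder = DictionaryLearning(n_components=8, alpha=0.5, max_iter=200, random_state=0)
codes = coder.fit_transform(X)

# Linear SVM on the coded features
clf = LinearSVC().fit(codes, y)
print(clf.score(codes, y))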
Article
For social robots to be successfully integrated and accepted within society, they need to be able to interpret human social cues that are displayed through natural modes of communication. In particular, a key challenge in the design of social robots is developing the robot's ability to recognize a person's affective states (emotions, moods, and attitudes) in order to respond appropriately during social human-robot interactions (HRIs). In this paper, we present and discuss social HRI experiments we have conducted to investigate the development of an accessibility-aware social robot able to autonomously determine a person's degree of accessibility (rapport, openness) toward the robot based on the person's natural static body language. In particular, we present two one-on-one HRI experiments to: 1) determine the performance of our automated system in being able to recognize and classify a person's accessibility levels and 2) investigate how people interact with an accessibility-aware robot which determines its own behaviors based on a person's speech and accessibility levels.
Article
Facial expression and gesture recognition algorithms are key enabling technologies for human-computer interaction (HCI) systems. State of the art approaches for automatic detection of body movements and analyzing emotions from facial features heavily rely on advanced machine learning algorithms. Most of these methods are designed for the average user, but the assumption 'one-size-fits-all' ignores diversity in cultural background, gender, ethnicity, and personal behavior, and limits their applicability in real-world scenarios. A possible solution is to build personalized interfaces, which practically implies learning person-specific classifiers and usually collecting a significant amount of labeled samples for each novel user. As data annotation is a tedious and time-consuming process, in this paper we present a framework for personalizing classification models which does not require labeled target data. Personalization is achieved by devising a novel transfer learning approach. Specifically, we propose a regression framework which exploits auxiliary (source) annotated data to learn the relation between person-specific sample distributions and parameters of the corresponding classifiers. Then, when considering a new target user, the classification model is computed by simply feeding the associated (unlabeled) sample distribution into the learned regression function. We evaluate the proposed approach in different applications: pain recognition and action unit detection using visual data and gestures classification using inertial measurements, demonstrating the generality of our method with respect to different input data types and basic classifiers. We also show the advantages of our approach in terms of accuracy and computational time both with respect to user-independent approaches and to previous personalization techniques.
Article
In the field of affect recognition, most researchers have focused on using multimodal fusion while temporal fusion techniques have yet to be adequately explored. This paper demonstrates that a powerful temporal fusion approach can significantly improve the performance of affect recognition. As a typical approach in state-of-the-art methods, the segment-based affect recognition technique from body movements is used as a baseline, whereby a body movement is parsed into a sequence of motion clips (called segments) and all segments are treated as being the same using a majority voting strategy. Our basic idea is that different types of segments have different influences/weights in recognizing the affective state. To verify this idea, an entropy-based method is proposed to estimate the segment weights in supervised and unsupervised learning. Furthermore, the recognition results from all segments are fused with the segment weights using the sum rule. Our experimental results on a public data set demonstrate that the segment weights can greatly improve the percentage of correctness (accuracy) from the baseline.
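The entropy-weighted sum-rule fusion can be sketched as follows; the specific mapping from segment entropy to weight is one plausible instantiation, not necessarily the paper's estimator.

import numpy as np

def fuse_segments(segment_probs, eps=1e-12):
    """Fuse per-segment class posteriors with entropy-based weights.

    segment_probs: (n_segments, n_classes) array of classifier outputs.
    Segments whose posterior has low entropy (i.e. more confident) receive
    larger weights; the fused prediction is the weighted sum rule over segments.
    """
    p = np.clip(segment_probs, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    weights = 1.0 / (1.0 + entropy)          # one plausible entropy-to-weight map
    weights /= weights.sum()
    fused = (weights[:, None] * segment_probs).sum(axis=0)
    return fused.argmax(), fused

probs = np.array([[0.60, 0.30, 0.10],   # confident segment
                  [0.34, 0.33, 0.33],   # uninformative segment
                  [0.20, 0.70, 0.10]])
print(fuse_segments(probs))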
Article
Our research focuses on the development of a socially assistive robot to provide cognitive and social stimulation during meal-time scenarios in order to promote proper nutrition amongst the elderly. In this paper, we present the design of a novel automated affect recognition and classification system that will allow the robot to interpret natural displays of affective human body language during such one-on-one assistive scenarios. Namely, we identify appropriate body language features and learning-based classifiers that can be utilized for accurate affect estimation. A robot can then utilize this information in order to determine its own appropriate responsive behaviors to keep people engaged in this crucial activity. One-on-one assistive meal-time experiments were conducted with the robot Brian 2.1 and elderly participants at a long-term care facility. The results showed the potential of utilizing the automated affect recognition and classification system to identify and classify natural affective body language features into valence and arousal values using learning-based classifiers. The elderly users displayed a number of affective states, further motivating the use of the affect estimation system.
Article
In Human-Robot Interactions (HRI), robots should be socially intelligent. They should be able to respond appropriately to human affective and social cues in order to effectively engage in bi-directional communications. Social intelligence would allow a robot to relate to, understand, and interact and share information with people in real-world human-centered environments. This survey paper presents an encompassing review of existing automated affect recognition and classification systems for social robots engaged in various HRI settings. Human-affect detection from facial expressions, body language, voice, and physiological signals are investigated, as well as from a combination of the aforementioned modes. The automated systems are described by their corresponding robotic and HRI applications, the sensors they employ, and the feature detection techniques and affect classification strategies utilized. This paper also discusses pertinent future research directions for promoting the development of socially intelligent robots capable of recognizing, classifying and responding to human affective states during real-time HRI.
Article
This paper introduces a method for facial expression recognition combining appearance and geometric facial features. The proposed framework consistently combines multiple facial representations at both global and local levels. First, covariance descriptors are computed to represent regional features combining various feature information with a low dimensionality. Then geometric features are detected to provide a general facial movement description of the facial expression. These appearance and geometric features are combined to form a vector representation of the facial expression. The proposed method is tested on the CK+ database and shows encouraging performance.
Article
The question whether body movements and body postures are indicative of specific emotions is a matter of debate. While some studies have found evidence for specific body movements accompanying specific emotions, others indicate that movement behavior (aside from facial expression) may be only indicative of the quantity (intensity) of emotion, but not of its quality. The study reported here is an attempt to demonstrate that body movements and postures to some degree are specific for certain emotions. A sample of 224 video takes, in which actors and actresses portrayed the emotions of elated joy, happiness, sadness, despair, fear, terror, cold anger, hot anger, disgust, contempt, shame, guilt, pride, and boredom via a scenario approach, was analyzed using coding schemata for the analysis of body movements and postures. Results indicate that some emotion-specific movement and posture characteristics seem to exist, but that for body movements differences between emotions can be partly explained by the dimension of activation. While encoder (actor) differences are rather pronounced with respect to specific movement and posture habits, these differences are largely independent from the emotion-specific differences found. The results are discussed with respect to emotion-specific discrete expression models in contrast to dimensional models of emotion encoding.
Article
Affect detection is an important pattern recognition problem that has inspired researchers from several areas. The field is in need of a systematic review due to the recent influx of Multimodal (MM) affect detection systems that differ in several respects and sometimes yield incompatible results. This article provides such a survey via a quantitative review and meta-analysis of 90 peer-reviewed MM systems. The review indicated that the state of the art mainly consists of person-dependent models (62.2% of systems) that fuse audio and visual (55.6%) information to detect acted (52.2%) expressions of basic emotions and simple dimensions of arousal and valence (64.5%) with feature- (38.9%) and decision-level (35.6%) fusion techniques. However, there were also person-independent systems that considered additional modalities to detect nonbasic emotions and complex dimensions using model-level fusion techniques. The meta-analysis revealed that MM systems were consistently (85% of systems) more accurate than their best unimodal counterparts, with an average improvement of 9.83% (median of 6.60%). However, improvements were three times lower when systems were trained on natural (4.59%) versus acted data (12.7%). Importantly, MM accuracy could be accurately predicted (cross-validated R2 of 0.803) from unimodal accuracies and two system-level factors. Theoretical and applied implications and recommendations are discussed.
Article
Humans are emotional beings and their feelings influence the way they perform and interact with computers. One of the most expressive modality for humans is body posture and movement that has lately started to receive attention from researchers in the use for emotion recognition. This survey outlines the findings relating to the area of body emotion recognition by describing emerging techniques and modalities as well as recent advances on the challenging task of automatic emotion recognition. Important aspects are analyzed, application areas and notation systems are described and the importance for movement segmentation is discussed. The survey concludes with a detailed discussion on unsolved problems in the area and provides promising directions for future work.
Article
For an engaging human–machine interaction, machines need to be equipped with affective communication abilities. Such abilities enable interactive machines to recognize the affective expressions of their users, and respond appropriately through different modalities including movement. This paper focuses on bodily expressions of affect, and presents a new computational model for affective movement recognition, robust to kinematic, interpersonal, and stochastic variations in affective movements. The proposed approach derives a stochastic model of the affective movement dynamics using hidden Markov models (HMMs). The resulting HMMs are then used to derive a Fisher score representation of the movements, which is subsequently used to optimize affective movement recognition using support vector machine classification. In addition, this paper presents an approach to obtain a minimal discriminative representation of the movements using supervised principal component analysis (SPCA) that is based on Hilbert–Schmidt independence criterion in the Fisher score space. The dimensions of the resulting SPCA subspace consist of intrinsic movement features salient to affective movement recognition. These salient features enable a low-dimensional encoding of observed movements during a human–machine interaction, which can be used to recognize and analyze human affect that is displayed through movement. The efficacy of the proposed approach in recognizing affective movements and identifying a minimal discriminative movement representation is demonstrated using two challenging affective movement datasets.
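A heavily simplified sketch of this style of pipeline: one Gaussian HMM per affect class (via the third-party hmmlearn package) and an SVM on per-class log-likelihoods, which is only a crude stand-in for the Fisher-score representation and SPCA step described above; the data, feature dimensions, and labels are placeholders.

import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.svm import SVC

rng = np.random.default_rng(5)
classes = ["happy", "sad", "angry"]

# Hypothetical training movements: lists of (n_frames, n_joint_angles) arrays
train = {c: [rng.normal(loc=k, size=(60, 6)) for _ in range(5)]
         for k, c in enumerate(classes)}

# One HMM per affect class, fit on the concatenated sequences of that class
hmms = {}
for c, seqs in train.items():
    X = np.vstack(seqs)
    hmms[c] = GaussianHMM(n_components=3, covariance_type="diag",
                          n_iter=20, random_state=0).fit(X, [len(s) for s in seqs])

def likelihood_features(seq):
    """Log-likelihood of one movement under each class HMM."""
    return np.array([hmms[c].score(seq) for c in classes])

F = np.array([likelihood_features(s) for c in classes for s in train[c]])
y = np.array([c for c in classes for _ in train[c]])
clf = SVC().fit(F, y)
print(clf.predict(likelihood_features(rng.normal(loc=1, size=(60, 6)))[None, :]))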
Article
This paper investigates the use of statistical dimensionality reduction (DR) techniques for discriminative low dimensional embedding to enable affective movement recognition. Human movements are defined by a collection of sequential observations (time-series features) representing body joint angle or joint Cartesian trajectories. In this work, these sequential observations are modelled as temporal functions using B-spline basis function expansion, and dimensionality reduction techniques are adapted to enable application to the functional observations. The DR techniques adapted here are: Fisher discriminant analysis (FDA), supervised principal component analysis (PCA), and Isomap. These functional DR techniques along with functional PCA are applied on affective human movement datasets and their performance is evaluated using leave-one-out cross validation with a one-nearest neighbour classifier in the corresponding low-dimensional subspaces. The results show that functional supervised PCA outperforms the other DR techniques examined in terms of classification accuracy and time resource requirements.
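A sketch of the functional representation, assuming each variable-length joint-angle trajectory is converted to a fixed-length B-spline coefficient vector before dimensionality reduction; plain PCA is used here as an unsupervised stand-in for the supervised techniques examined in the paper, and the knot placement and data are illustrative assumptions.

import numpy as np
from scipy.interpolate import make_lsq_spline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def spline_coeffs(trajectory, n_inner=8, k=3):
    """Fixed-length B-spline coefficient vector for one joint-angle trajectory."""
    x = np.linspace(0.0, 1.0, len(trajectory))
    inner = np.linspace(0.0, 1.0, n_inner + 2)[1:-1]
    knots = np.r_[[0.0] * (k + 1), inner, [1.0] * (k + 1)]
    return make_lsq_spline(x, trajectory, knots, k=k).c

rng = np.random.default_rng(6)
# Hypothetical dataset: variable-length joint-angle trajectories per movement
trajs = [rng.normal(size=rng.integers(80, 120)).cumsum() for _ in range(60)]
y = rng.integers(0, 3, size=60)

X = np.stack([spline_coeffs(t) for t in trajs])   # functional representation
Z = PCA(n_components=5).fit_transform(X)          # unsupervised stand-in for SPCA/FDA
clf = KNeighborsClassifier(n_neighbors=1).fit(Z, y)
print(Z.shape)  # the paper evaluates with leave-one-out 1-NN in this subspace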
Article
Recently, recognizing affects from both face and body gestures has attracted more attention. However, efficient and effective features to describe the dynamics of face and gestures for real-time automatic affect recognition are still lacking. In this paper, we combine both local motion and appearance features in a novel framework to model the temporal dynamics of face and body gesture. The proposed framework employs MHI-HOG and Image-HOG features through temporal normalization or bag of words to capture motion and appearance information. The MHI-HOG stands for Histogram of Oriented Gradients (HOG) on the Motion History Image (MHI). It captures motion direction and speed of a region of interest as an expression evolves over time. The Image-HOG captures the appearance information of the corresponding region of interest. The temporal normalization method explicitly solves the time resolution issue in video-based affect recognition. To implicitly model local temporal dynamics of an expression, we further propose a bag of words (BOW) based representation for both MHI-HOG and Image-HOG features. Experimental results demonstrate promising performance as compared with the state-of-the-art. Significant improvement of recognition accuracy is achieved as compared with the frame-based approach that does not consider the underlying temporal dynamics.
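A minimal sketch of the MHI-HOG idea: maintain a decaying motion history image from frame differences and describe both the MHI and the raw frame with HOG; the threshold, decay constant, and image sizes below are illustrative assumptions, and skimage's HOG stands in for whatever HOG implementation the authors used.

import numpy as np
from skimage.feature import hog

def update_mhi(mhi, frame, prev, tau=15, thresh=20):
    """One decaying motion-history-image update step.

    Pixels where the frame difference exceeds `thresh` are set to `tau`;
    all other pixels decay by 1 (clipped at 0), so recent motion stays brightest.
    """
    motion = np.abs(frame.astype(int) - prev.astype(int)) > thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

rng = np.random.default_rng(7)
frames = rng.integers(0, 256, size=(30, 64, 64)).astype(np.uint8)

mhi = np.zeros((64, 64))
for prev, frame in zip(frames[:-1], frames[1:]):
    mhi = update_mhi(mhi, frame, prev)

# HOG on the MHI captures motion direction/speed; HOG on the raw frame
# (Image-HOG) captures appearance of the same region of interest.
mhi_hog = hog(mhi / mhi.max() if mhi.max() else mhi, pixels_per_cell=(16, 16))
image_hog = hog(frames[-1], pixels_per_cell=(16, 16))
print(mhi_hog.shape, image_hog.shape)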
Conference Paper
We have conducted a study analyzing motion capture data of bodily expressions of human emotions towards the goal of building a social expressive robot that interacts with and supports hospitalized children. Although modeling emotional expression (and recognition) in (by) robots in terms of discrete categories presents advantages such as ease and clarity of interpretation, our results show that this approach also poses a number of problems. The main issues relate to the loss of subtle expressions and feelings, individual features, context, and social interaction elements that are present in real life.
Conference Paper
Recent advances in multiple-kernel learning (MKL) show the effectiveness of fusing multiple base features in object detection and recognition. However, MKL tends to select only the most discriminative base features and ignores other less discriminative base features which may provide complementary information. Moreover, MKL usually employs Gaussian RBF kernels to transform each base feature to its high dimensional space. Generally, base features from different modalities require different kernel parameters for obtaining the optimal performance. Therefore, MKL may fail to utilize the maximum discriminative power of all base features from multiple modalities at the same time. In order to address these issues, we propose a margin-constrained multiple-kernel learning (MCMKL) method by extending MKL with margin constraints and applying dimensionally normalized RBF (DNRBF) kernels for multi-modal feature fusion. The proposed MCMKL method learns weights of different base features according to their discriminative power. Unlike conventional MKL, MCMKL incorporates less discriminative base features by assigning smaller weights when constructing the optimal combined kernel, so that we can fully take advantage of the complementary features from different modalities. We validate the proposed MCMKL method for affect recognition from face and body gesture modalities on the FABO dataset. Our extensive experiments demonstrate favorable results as compared to the existing work and the MKL-based approach.
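The kernel-combination idea can be sketched with fixed weights and dimensionality-normalized RBF kernels (gamma = 1/d per modality); MCMKL itself would learn the weights under margin constraints, which is omitted here, and the feature dimensions and weights are illustrative assumptions.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def dn_rbf(A, B):
    """RBF kernel with a dimensionality-normalized bandwidth (gamma = 1/d)."""
    return rbf_kernel(A, B, gamma=1.0 / A.shape[1])

rng = np.random.default_rng(8)
n = 200
X_face = rng.normal(size=(n, 40))     # hypothetical face features
X_body = rng.normal(size=(n, 15))     # hypothetical body-gesture features
y = rng.integers(0, 4, size=n)

# Fixed example weights; MCMKL would learn these under margin constraints
w_face, w_body = 0.6, 0.4
K = w_face * dn_rbf(X_face, X_face) + w_body * dn_rbf(X_body, X_body)

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))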
Conference Paper
In this paper, emotional motion representation is proposed for Human Robot Interaction: HRI. The proposed representation is based on “Laban Movement Analysis: LMA” and trajectories of 3-dimensional whole body joint positions using an RGB-D camera such as a “Microsoft Kinect”. The experimental results show that the proposed method distinguishes two types of human emotional motion well.