Article

Abstract

Micro-expressions (MEs) are rapid, involuntary facial expressions which reveal emotions that people do not intend to show. Studying MEs is valuable as recognizing them has many important applications, particularly in forensic science and psychotherapy. However, analyzing spontaneous MEs is very challenging due to their short duration and low intensity. Automatic ME analysis includes two tasks: ME spotting and ME recognition. For ME spotting, previous studies have focused on posed rather than spontaneous videos. For ME recognition, the performance of previous studies is low. To address these challenges, we make the following contributions: (i) We propose the first method for spotting spontaneous MEs in long videos (by exploiting feature difference contrast). This method is training free and works on arbitrary unseen videos. (ii) We present an advanced ME recognition framework, which outperforms previous work by a large margin on two challenging spontaneous ME databases (SMIC and CASME II). (iii) We propose the first automatic ME analysis system (MESR), which can spot and recognize MEs from spontaneous video data. Finally, we show our method outperforms humans in the ME recognition task by a large margin, and achieves comparable performance to humans at the very challenging task of spotting and then recognizing spontaneous MEs.
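As a rough illustration of the feature difference contrast idea named in the abstract, the sketch below compares each frame's appearance feature (e.g., an LBP histogram) against the average feature of the two frames at the ends of a sliding window; peaks in the resulting difference signal mark candidate rapid, short-lived facial movements. This is a simplified, training-free sketch under assumed inputs (a precomputed per-frame feature matrix and illustrative parameters), not the authors' exact implementation.

```python
import numpy as np

def spot_me_candidates(features, k=9, percentile=95):
    """Toy feature-difference (FD) contrast spotting.

    features   : (num_frames, dim) array of per-frame features
                 (e.g., LBP histograms) -- assumed precomputed.
    k          : half-window size, on the order of half the maximum
                 expected ME duration in frames (illustrative value).
    percentile : data-driven threshold on the FD signal.
    """
    n = len(features)
    fd = np.zeros(n)
    for i in range(k, n - k):
        # A rapid, short-lived change makes the centre frame differ from
        # the average of the window's end frames; slow drift does not.
        baseline = (features[i - k] + features[i + k]) / 2.0
        fd[i] = np.linalg.norm(features[i] - baseline)
    threshold = np.percentile(fd, percentile)
    return np.where(fd > threshold)[0], fd
```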


... The apex is known as the peak of a ME. Considering the difficulty of developing methods that can cope with the short duration and low intensity of MEs [8], this study focuses on the apex frame, where the expression is most intense. ...
... The preprocessing phase consists of separating the non-face area and detecting, aligning, and cropping the face region to prevent the effect of head movement [9]. For face detection, the DRMF method [10,11] and the 68-point Active Shape Model (ASM) [4,8,12] have been used. The Dlib machine learning toolkit and the OpenCV library have been used for both detection and alignment [9,13,14]. ...
... Local binary patterns (LBP) are used to extract texture features from gray images [4]. LBP-TOP, a derivative of LBP that encodes both spatial and temporal information, is also used [5,8,18,19]. ...
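As an aside on the Dlib/OpenCV preprocessing mentioned in these excerpts, a typical detect-and-crop step looks roughly like the sketch below; the image and landmark-model paths are placeholders (the 68-point predictor file is distributed separately by dlib).

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Placeholder path: the standard 68-point landmark model from dlib.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("frame.png")  # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = detector(gray)
if faces:
    landmarks = predictor(gray, faces[0])  # 68 facial landmarks
    # Crop the face region using the detection box to reduce the
    # influence of head movement on downstream features.
    x, y = faces[0].left(), faces[0].top()
    w, h = faces[0].width(), faces[0].height()
    face_crop = frame[max(y, 0):y + h, max(x, 0):x + w]
    cv2.imwrite("face.png", face_crop)
```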
Article
Full-text available
This study proposes a framework for recognizing ME expressions, in which preprocessing, feature extraction with deep learning, feature selection with an optimization algorithm, and classification methods are used. CASME-II, SMIC-HS, and SAMM, which are among the most used ME datasets in the literature, were combined to overcome the under-sampling problem caused by the small size of the individual datasets. In the preprocessing stage, the onset and apex frames of each video clip were detected, and optical flow images were obtained from these frames using the Farneback method. Features were extracted from the resulting images with the AlexNet, VGG16, MobileNetV2, EfficientNet, and SqueezeNet CNN models. The image features from all CNN models were then combined, and the most distinctive features were selected with the Particle Swarm Optimization (PSO) algorithm. The resulting feature set was classified into positive, negative, and surprise classes using an SVM. The proposed ME framework achieved an accuracy of 0.8784.
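To make the onset-to-apex optical flow step concrete, OpenCV's Farneback implementation can be applied to a pair of grayscale frames as sketched below; the file names and parameter values are placeholders rather than the study's actual settings, and the HSV visualization is one common way of turning flow fields into CNN-ready images.

```python
import cv2
import numpy as np

# Placeholder file names -- substitute the detected onset/apex frames.
onset = cv2.imread("onset.png", cv2.IMREAD_GRAYSCALE)
apex = cv2.imread("apex.png", cv2.IMREAD_GRAYSCALE)

# Dense Farneback optical flow between the two frames.
flow = cv2.calcOpticalFlowFarneback(
    onset, apex, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Convert the (dx, dy) field to magnitude/angle and render it as an
# HSV image: hue encodes direction, value encodes motion strength.
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*onset.shape, 3), dtype=np.uint8)
hsv[..., 0] = ang * 180 / np.pi / 2
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("flow.png", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
```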
... Local Binary Pattern from Three Orthogonal Planes (LBP-TOP) [15] on the first public spontaneous ME dataset: SMIC [16]. Following the work of [15], various approaches based on appearance and geometry features [17], [18] were proposed for improving the performance of MER. ...
... [Flattened taxonomy diagram from the survey. Recoverable categories: pre-processing: face detection; face registration (ASM [42], AAM [43], CNN-based [44]); temporal normalization (TIM [45], CNN-based [46]); motion magnification (EVM [47], GLMM [43], CNN-based [48]); regions of interest (grid [49], FACS [50], [51], eye-mouth [52], landmarks [53], learning-based [54]); data augmentation (multi-ratio [55], temporal [56], GAN [22], [53]). Apex spotting: optical flow [57], feature contrast [17], [58], frequency [59], CNN-based [60], [61], [62]. Recognition inputs: static apex-based recognition [54], [59], [63], [64]; sequence [53], [56], [65], [66], [67]; frame aggregation (onset-apex [68], snippets [69], selected frames [70]); dynamic image ([71], [72], active image [73]); optical flow with apex [78], sequence [79], key frames [66], or landmarks [80].] ...
... One of the commonly used methods is the Eulerian Video Magnification method (EVM) [47]. For MEs, the EVM is applied for facial motion magnification [17]. EVM magnifies either motion or color content across two consecutive frames in videos. ...
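To give a flavor of the Eulerian idea described in this excerpt, the toy sketch below band-pass filters each pixel of an aligned grayscale frame stack over time and adds the amplified band back; it omits the spatial pyramid decomposition of full EVM, and all parameters are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_motion(frames, fs=100.0, low=0.4, high=3.0, alpha=10.0):
    """Toy Eulerian magnification on an aligned frame stack.

    frames   : (T, H, W) float array of aligned grayscale frames in [0, 1].
    fs       : video sampling rate in Hz (illustrative).
    low/high : temporal pass-band in Hz; alpha : amplification factor.
    """
    b, a = butter(2, [low / (fs / 2), high / (fs / 2)], btype="band")
    # Band-pass each pixel's intensity over time, then amplify the
    # filtered band and add it back onto the original frames.
    bandpassed = filtfilt(b, a, frames, axis=0)
    return np.clip(frames + alpha * bandpassed, 0.0, 1.0)
```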
Article
Full-text available
Micro-expressions (MEs) are involuntary facial movements revealing people's hidden feelings in high-stake situations and have practical importance in various fields. Early methods for Micro-expression Recognition (MER) are mainly based on traditional features. Recently, with the success of Deep Learning (DL) in various tasks, neural networks have received increasing interest in MER. Different from macro-expressions, MEs are spontaneous, subtle, and rapid facial movements, leading to difficult data collection and annotation, thus publicly available datasets are usually small-scale. Currently, various DL approaches have been proposed to solve the ME issues and improve MER performance. In this survey, we provide a comprehensive review of deep MER and define a new taxonomy for the field encompassing all aspects of MER based on DL, including datasets, each step of the deep MER pipeline, and performance comparisons of the most influential methods. The basic approaches and advanced developments are summarized and discussed for each aspect. Additionally, we conclude the remaining challenges and potential directions for the design of robust MER systems. Finally, ethical considerations in MER are discussed. To the best of our knowledge, this is the first survey of deep MER methods, and this survey can serve as a reference point for future MER research.
... Recently, a few studies [3]-[5] have focused on ME spotting using deep learning methods. Li et al. [6] introduced an ME spotting method for spontaneous ME datasets. Furthermore, Zhang et al. [3] designed a deep learning-based ME spotting method by extracting features from video clips. ...
... Moreover, Liong et al. [4] proposed an automatic apex frame spotting model. (A more detailed categorization and analysis of ME spotting can be found in existing MER surveys [6]-[9].) Further, face alignment and noise filtration are employed to systematize the input data samples for better feature extraction and learning [8]. ...
... However, it is still difficult to manually design a robust descriptor for capturing quick subtle changes in MEs. A detailed summary of the traditional MER approaches is listed in the supplementary material (Supplementary Table III); a more detailed categorization of traditional methods and classifiers can be found in [6], [9]. ...
Preprint
Full-text available
Micro expression recognition (MER) is a very challenging area of research due to its intrinsic nature and fine-grained changes. In the literature, the problem of MER has been solved through handcrafted/descriptor-based techniques. However, in recent times, deep learning (DL) based techniques have been adopted to gain higher performance for MER. Also, rich survey articles on MER are available, summarizing the datasets, experimental settings, and conventional and deep learning methods. In contrast, these studies lack the ability to convey the impact of network design paradigms and experimental setting strategies for DL-based MER. Therefore, this paper aims to provide a deep insight into DL-based MER frameworks with a perspective on promises in network model designing, experimental strategies, challenges, and research needs. Also, a detailed categorization of available MER frameworks is presented covering various aspects of model design and technical characteristics. Moreover, an empirical analysis of the experimental and validation protocols adopted by MER methods is presented. The challenges mentioned earlier and network design strategies may assist the affective computing research community in forging ahead in MER research. Finally, we point out future directions and research needs, and draw our conclusions.
... Most research works treat spotting and recognition as separate tasks, motivated by the Micro-Expression Grand Challenge (MEGC) [9,10]. To the best of our knowledge, the current attempts in the ME analysis domain are limited [6,11]. In particular, they employed traditional algorithms to spot MEs in short videos that contain only one ME and no MaEs, which makes them unsuitable for adaptation to long-video datasets. ...
... Both these works relied on handcrafted feature descriptors. Li et al. [6] experimented with LBP and HOOF descriptors for the spotting task, while LBP, HOG, and HIGO features were extracted for the recognition task. Meanwhile, Liong et al. [11] employed optical strain and Bi-WOOF features for the spotting and recognition tasks, respectively. ...
... S: Spotting, A: Analysis. Overall: We review the metric introduced in the work of [6], which multiplies the accuracy metrics of the spotting and recognition tasks. Due to the imbalanced class distribution, we propose a new metric, i.e. ...
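For concreteness, the composite score from [6] referenced in this excerpt is simply the product of the two stage accuracies; the numbers below are made up purely to show the arithmetic.

```python
# Composite ME analysis metric described above: the product of the
# spotting accuracy and the recognition accuracy.
# Numbers are hypothetical, purely to show the arithmetic.
spotting_accuracy = 0.74
recognition_accuracy = 0.57
analysis_accuracy = spotting_accuracy * recognition_accuracy
print(analysis_accuracy)  # 0.4218
```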
Article
Facial Micro-Expressions (MEs) reveal a person's hidden emotions in high stake situations within a fraction of a second and at a low intensity. The broad range of potential real-world applications has drawn considerable attention from researchers in recent years. However, the spotting and recognition tasks are often treated separately. In this paper, we present the Micro-Expression Analysis Network (MEAN), a shallow multi-stream multi-output network architecture comprising task-specific (spotting and recognition) networks that is designed to effectively learn a meaningful representation from both ME class labels and location-wise pseudo-labels. Notably, this is the first known work that addresses ME analysis on long videos using a deep learning approach, whereby ME spotting and recognition are performed sequentially in a two-step procedure: first spotting the ME intervals using the spotting network, then predicting their emotion classes using the recognition network. We report extensive benchmark results on the ME analysis task on both short video datasets (CASME II, SMIC-E-HS, SMIC-E-VIS, and SMIC-E-NIR) and long video datasets (CAS(ME)² and SAMM-LV); the latter in particular demonstrates the capability of the proposed approach under unconstrained settings. Besides the standard measures, we promote the usage of fairer metrics in evaluating the performance of a complete ME analysis system. We also provide visual explanations of where the network is "looking" and showcase the effectiveness of inductive transfer applied during network training. An analysis is performed on the in-the-wild dataset (MEVIEW) to open up further research into real-world scenarios.
... Generally, most MER works using developed magnification techniques follow the method of first magnifying original ME images and then sending them to the network for recognition. To ensure the classifier better distinguishes between different ME classes, Li et al. [26] adopted Eulerian Motion Magnification (EMM) to magnify the subtle motions in videos. Ngo et al. [24] then introduced a new magnification technique, i.e., Global Lagrangian Motion Magnification (GLMM), which was found to provide more discriminative magnification. ...
... The recording rate of the camera is 200 fps. We also selected five categories from this database, i.e., anger (57), happiness (26), contempt (12), surprise (15) and other (26). ...
Article
Full-text available
A micro-expression (ME) is a kind of involuntary facial expression which commonly occurs with subtle intensity. Accurately recognizing MEs, a.k.a. micro-expression recognition (MER), has a number of potential applications, e.g., interrogation and clinical diagnosis. Therefore, the subject has received a high level of attention among researchers in the affective computing and pattern recognition communities. In this paper, we propose a straightforward and effective deep learning method called uncertainty-aware magnification-robust networks (UAMRN) for MER, which attempts to address two key issues in MER: the low intensity of MEs and the imbalance of ME samples. Specifically, to better distinguish subtle ME movements, we reconstruct a new sequence by magnifying the ME intensity. Furthermore, a sparse self-attention (SSA) block is implemented which rectifies standard self-attention with locality-sensitive hashing (LSH), resulting in the suppression of artefacts generated during magnification. On the other hand, for the class imbalance problem, we guide the network optimization based on the confidence of the estimation, through which samples from rare classes are allotted greater uncertainty and thus trained more carefully. We conducted experiments on three public ME databases, i.e., CASME II, SAMM and SMIC-HS, the results of which demonstrate improvement over recent state-of-the-art MER methods.
... The first group is handcrafted methods [20,21]. These methods are the Local Binary Pattern (LBP)-based, Optical Flow (OF)-based, gradient-based, wavelet-based, and motion magnification methods [7,11,22-24]. ...
... LBP-based methods such as LBP on Three Orthogonal Planes (LBP-TOP) [25] and cubic-LBP [7] are efficient on texture [7,24,26]. They are simple to compute, sensitive to illumination, and insensitive to orientation. Although gradient-based methods such as the Histogram of Image Gradient Orientation (HIGO) [24] are robust to changing light, they are computationally costly. Wavelets can be used at different resolutions or scales of an image; however, they require a long running time. ...
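As background for the LBP-based descriptors discussed in these excerpts, a minimal LBP computation looks as follows; this is the generic 8-neighbor operator, not any specific paper's variant (LBP-TOP additionally applies it on the XT and YT planes of a video volume).

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP: each pixel becomes an 8-bit code where
    each bit records whether a neighbour is >= the centre pixel."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code

def lbp_histogram(gray, bins=256):
    """Normalized LBP histogram, usable as a per-frame texture feature."""
    h, _ = np.histogram(lbp_image(gray), bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)
```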
Article
Full-text available
Invisible to naked human eyes, micro-movements are barely noticeable. There is a wide range of micro-movement applications, from spotting subtle changes in a volcano, vascular pulse, and blood vessels to micro-expression detection. In the latter, ill intentions can be unveiled, helping to identify crooks and lawbreakers; micro-expressions arise from involuntary, subtle, and short-duration facial muscle movements. Precise spotting of these tiny movements is possible only when multiple aspects of the temporal images are scrutinized. Meanwhile, since motions often happen in only one or two directions, extracting complete feature sets in textural-based approaches such as the cubic Local Binary Pattern (cubic-LBP) is largely redundant. Approaches like cubic-LBP have also imposed an unnecessary computational burden. Hence, in this research, a novel method named intelligent cubic-LBP is proposed, which incorporates a Convolutional Neural Network (CNN) model. This model learns to select the useful plane(s) automatically. The apex is then detected by applying the Partial Autocorrelation Coefficient (PACF) to the selected plane(s). The experimental results show significant improvement in micro-movement identification. The accuracy of apex frame detection is elevated by 10% and 17% on the Chinese Academy of Sciences Micro-Expressions (CASME) and CASME II databases, respectively.
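As a loose illustration of using partial autocorrelation around apex detection, one can compute the PACF of a per-frame motion-intensity signal and inspect where the signal peaks; this statsmodels sketch runs on a synthetic signal and is only loosely modeled on the paper's pipeline, which operates on CNN-selected LBP planes.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

# Hypothetical per-frame motion-intensity signal (e.g., each frame's
# feature difference from the onset frame), with a burst near frame 60.
rng = np.random.default_rng(0)
t = np.arange(100)
signal = np.exp(-((t - 60) ** 2) / 50.0) + 0.05 * rng.standard_normal(100)

# The PACF characterises the signal's temporal dependence structure;
# a short-lived burst shows up as strong low-lag coefficients.
coeffs = pacf(signal, nlags=20)
print("strongest lag:", int(np.argmax(np.abs(coeffs[1:])) + 1))

# A simple apex estimate: the frame where the motion signal peaks.
print("estimated apex frame:", int(np.argmax(signal)))
```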
... Meanwhile, spontaneous MEs are facial expressions whose induced emotions are genuine, with the subjects putting in effort to suppress their true emotions [8]. According to [9], spontaneous MEs are much more difficult to spot than posed MEs, because a ME is involuntary and difficult to disguise. Consequently, posed ME databases are not really helpful for research on ME spotting. ...
... The work in [20] utilized an LBP histogram to extract the temporal and spatial locations for spotting the apex frame, based on the appearance features between the average frame and the current frame. As an extension, the work in [9] presented another method based on deep multi-task learning with Histograms of Oriented Optical Flow (HOOF) as the input feature for ME detection. Yet, they used a Convolutional Neural Network (CNN) only to identify the locations of the facial landmarks, which are then used to split the facial area into regions of interest (ROIs). ...
... The proposed algorithm was designed to capture the continuity information of the movement flows and directions. Li et al. (2018) proposed the first automatic ME analysis system (MESR), which combines feature-contrast thresholding for ME spotting with a recognition stage, applied to long spontaneous videos (SMIC and CASME II). The feature extractors used in this paper are LBP and HOOF, of which the former produced better performance. ...
... Another important issue is that ME spotting is far less researched than ME recognition. Li et al. (2018) have argued that the lack of precise ME spotting methods significantly reduces the accuracy of ME recognition systems. ME spotting is not a new topic, but over the years there has been no prominent breakthrough, due to the lack of spontaneous ME databases; even the available databases are not challenging enough to mimic real-life applications. ...
Article
Micro-expression is a type of facial expression that is manifested for a very short duration. It is difficult to recognize such expressions manually because they involve very subtle facial movements. Such expressions often occur unconsciously, and therefore serve as a basis to help identify real human emotions. Hence, automated micro-expression recognition has recently become a popular research topic. Historically, early research on automated micro-expression analysis utilized traditional machine learning methods, while more recent development has focused on the deep learning approach. Compared to traditional machine learning, which relies on manual feature processing and requires the use of formulated rules, deep learning networks produce more accurate micro-expression recognition through an end-to-end methodology, whereby the features of interest are extracted optimally through the training process, utilizing a large set of data. This paper reviews the developments and trends in micro-expression recognition from the earlier studies (hand-crafted approach) to the present studies (deep learning approach). Some of the important topics covered include the detection of micro-expressions from short videos, apex frame spotting, micro-expression recognition, as well as a performance discussion of the reviewed methods. Furthermore, major limitations that hamper the development of automated micro-expression recognition systems are analyzed, followed by recommendations of possible future research directions.
... The first step is selecting specific facial regions, and the second is applying spatio-temporal descriptors to the selected regions to extract micro-expression features. In traditional micro-expression recognition, facial region selection has for many years been based either on designed n × n non-overlapping blocks [10], [11], which can better describe local variations, or on ROIs [12], [13] selected based on a priori knowledge. In the field of macro-expressions, these methods can yield promising results. ...
... Conventional handcrafted feature extraction methods on micro-expression recognition [19], [28] always separate the whole face into several equal overlapping blocks to better portray local variations. However, some researchers reduced the influence of useless regions by extracting features from ROI [11], such as the forehead, eyebrows, eyes, nose, and mouth, as shown in Fig. 1. Merghani et al. [13], [29] opted for ROIs based on FACS [8], and to eliminate the noise caused by blink movements, glasses, and stationary regions, Le et al. [30] proposed to mask the eye and cheek regions. ...
Article
Full-text available
Facial micro-expressions can reveal a person's actual mental state and emotions. Therefore, they have crucial applications in many fields, such as lie detection, clinical medicine, and defense security. However, conventional methods have extracted features from designed facial regions to recognize micro-expressions, failing to effectively hit the micro-expression critical regions, since micro-expressions are localized and asymmetric. Consequently, we propose the Haphazard Cuboids (HC) feature extraction method, which generates target regions by a haphazard sampling technique and then extracts micro-expression spatio-temporal features. HC consists of two modules: spatial patches generation (SPG) and temporal segments generation (TSG). SPG is assigned to generate localized facial regions, and TSG is dedicated to generating temporal intervals. Through extensive experiments, we demonstrate the superiority of the proposed method. Afterward, we analyze the two modules with conventional and deep-learning methods and find that each can significantly improve the performance of micro-expression recognition. In particular, we embed the SPG module into deep learning and experimentally demonstrate the effectiveness and superiority of our proposed sampling method in comparison with state-of-the-art methods. Furthermore, we analyze the TSG module with the maximum overlapping interval (MOI) method and find its coherence with the maximum interval of the apex frame distribution in CASME II and SAMM. Therefore, analogous to the region of interest (ROI) on the human face, micro-expressions also inherit a similar ROI in the temporal dimension, whose position is highly relevant to the most intensive moment, i.e., the apex frame.
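To make the sampling idea concrete, the sketch below draws random spatial patches and random temporal segments from a face video volume, in the spirit of the SPG and TSG modules; the sizes and counts are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def haphazard_cuboids(volume, n_cuboids=16, patch=24, seg_len=8, seed=0):
    """Sample random spatio-temporal cuboids from a video volume.

    volume : (T, H, W) grayscale face video.
    Returns an array of shape (n_cuboids, seg_len, patch, patch).
    """
    rng = np.random.default_rng(seed)
    T, H, W = volume.shape
    cuboids = []
    for _ in range(n_cuboids):
        t0 = rng.integers(0, T - seg_len + 1)  # temporal segment (TSG-like)
        y0 = rng.integers(0, H - patch + 1)    # spatial patch (SPG-like)
        x0 = rng.integers(0, W - patch + 1)
        cuboids.append(volume[t0:t0 + seg_len,
                              y0:y0 + patch, x0:x0 + patch])
    return np.stack(cuboids)
```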
... As a result, spotting MEs is more challenging than spotting MaEs. Worse still, it also requires professional training, which is unrealistic for the masses [11]. ...
... In the early works, traditional handcrafted features are extracted for feature difference analysis to determine the interval of expressions by leveraging a threshold value. The feature descriptors include LBP [11,16], optical flow [5,31,33], and HOG [1]. Notably, [31] won first place in MEGC 2021. ...
... However, it is still difficult to manually design a robust descriptor for capturing quick subtle changes in MEs. A detailed summary of the traditional MER approaches is listed in the supplementary material (Supplementary Table 1); a more detailed categorization of traditional methods and classifiers can be found in [19], [20]. ...
... The CAS(ME)2 dataset samples were collected from 22 participants (9 males and 13 females). All the samples of the CAS(ME)2 dataset are annotated using AUs and 4 emotion labels: Positive (8), Negative (21), Surprise (9), Others (19). Moreover, to provide more challenging scenarios, SMIC has been introduced with three different recording conditions, high speed (HS), visual (VIS) and near infrared (NIR), including 164, 71 and 71 samples of 16, 8 and 8 subjects, respectively. ...
Article
Micro expression recognition (MER) is a very challenging area of research due to its intrinsic nature and fine-grained changes. In the literature, the problem of MER has been solved through handcrafted/descriptor-based techniques. However, in recent times, deep learning (DL) based techniques have been adopted to gain higher performance for MER. Also, rich survey articles on MER are available, summarizing the datasets, experimental settings, and conventional and deep learning methods. In contrast, these studies lack the ability to convey the impact of network design paradigms and experimental setting strategies for DL-based MER. Therefore, this paper aims to provide a deep insight into DL-based MER frameworks with a perspective on promises in network model designing, experimental strategies, challenges, and research needs. Also, a detailed categorization of available MER frameworks is presented covering various aspects of model design and technical characteristics. Moreover, an empirical analysis of the experimental and validation protocols adopted by MER methods is presented. The challenges mentioned earlier and network design strategies may assist the affective computing research community in forging ahead in MER research. Finally, we point out future directions and research needs, and draw our conclusions.
... Various methods have been proposed so far, including both traditional feature descriptors, which were explored at an earlier stage, and deep neural network approaches, which have thrived in recent years. Traditional approaches [27], [28], [29] usually involve one or more feature descriptors plus a classifier for the ME recognition task. The most popular descriptors are spatio-temporal features, including LBP [30], HOG [31] and optical flow [32]. ...
... Inspired by works in ordinary FE recognition studies, most studies explored CNN or RNN based approaches. Nonetheless, due to the scarcity of ME data, early deep-based models [28], [37] struggled to compete with traditional approaches. As more efforts were made in the following years to gather more data and to specifically tailor the networks for MEs domain, several promising solutions started to emerge. ...
Article
Full-text available
Micro-expressions (MEs) are a special form of facial expressions which may occur when people try to hide their true feelings for some reason. MEs are important clues to reveal people's true feelings, but they are difficult or impossible for ordinary people to capture with the naked eye, as they are very short and subtle. It is expected that robust computer vision methods can be developed to automatically analyze MEs, which requires large amounts of ME data. The current ME datasets are insufficient, and mostly contain only a single form of 2D color video. Research on 4D data of ordinary facial expressions has prospered, but so far no 4D data has been available for ME study. In the current study, we introduce the 4DME dataset: a new spontaneous ME dataset which includes 4D data along with three other video modalities. Both micro- and macro-expression clips are labeled in 4DME, and 22 AU labels and five categories of emotion labels are annotated. Experiments are carried out using three 2D-based methods and one 4D-based method to provide baseline results. The results indicate that 4D data can potentially benefit ME recognition. The 4DME dataset could be used for developing 4D-based approaches, or for exploring the fusion of multiple video sources (e.g., texture and depth) for ME analysis in the future. Besides, we also emphasize the importance of forming clear and unified criteria for ME annotation in future ME data collection studies. Several key questions related to ME annotation are listed and discussed in depth, especially the relationship between AUs and ME emotion categories. A preliminary AU-Emo mapping table is proposed with justified explanations and supportive experimental results. Several unsolved issues are also summarized for future work.
... As shown in Table 3 and Table 4, our framework outperformed other frameworks on CASME II and SAMM. The authors of [35,36,38-40] did not apply AM and EVM. An MER system may not properly distribute the computational power without the AM. ...
... Recognition accuracy: LBP-TOP [38] 34.56%; LBP-SIP [39] 36.03%; HIGO-TOP [40] 41.18%; EVM + optical flow [37] 43.04%; MSCNN [37] 40.71%; MER-AMRE 53.33%. ...
Article
Full-text available
Micro-expression recognition (MER) is an interdisciplinary research task that has attracted attention because MER is relevant to multiple fields, such as computer vision, psychology, human-computer interaction, and social security. Because of the scarcity of databases and the difficulty of video semantic understanding, end-to-end MER still faces many challenges. In this study, we propose an MER framework with attention mechanism and region enhancement (MER-AMRE). Attention mechanisms are introduced to enhance the representation performance of the model, which improves recognition accuracy. Additionally, we use Eulerian video magnification in data preprocessing to enhance facial variation areas. AffectNet is leveraged to pretrain a facial region of interest (RoI) feature extractor with attention regions. Finally, we combine the facial RoI features with global facial features to recognize micro-expressions. Extensive experiments on two well-known micro-expression datasets, CASME II and SAMM, verified the robustness and generalization of the proposed MER-AMRE framework.
... Yet, most studies tend to classify MEs without concern for effective ME spotting, even though better ME spotting leads to more effective ME analysis [24]. Furthermore, a well-established automated ME analysis system based on deep learning is even less common. ...
... An automated ME analysis can be broadly divided into two main parts: spotting and recognition tasks, as depicted in Figure 2. The former task involves detecting the apex frame in a short video of ME extracted from long video sequences, while the latter part involves classifying ME into respective emotion categories. The performance of an ME recognition system is highly dependent on the accuracy of ME spotting [24]. As mentioned in Section 1, some researchers prefer to spot the location of the apex frame instead of using the whole frame sequence in the short video. ...
Article
Full-text available
Micro-expression analysis is the study of subtle and fleeting facial expressions that convey genuine human emotions. Since such expressions cannot be controlled, many believe that it is an excellent way to reveal a human’s inner thoughts. Analyzing micro-expressions manually is a very time-consuming and complicated task, hence many researchers have incorporated deep learning techniques to produce a more efficient analysis system. However, the insufficient amount of micro-expression data has limited the network’s ability to be fully optimized, as overfitting is likely to occur if a deeper network is utilized. In this paper, a complete deep learning-based micro-expression analysis system is introduced that covers the two main components of a general automated system: spotting and recognition, with also an additional element of synthetic data augmentation. For the spotting part, an optimized continuous labeling scheme is introduced to spot the apex frame in a video. Once the apex frames have been recognized, they are passed to the generative adversarial network to produce an additional set of augmented apex frames. Meanwhile, for the recognition part, a novel convolutional neural network, coined as Optimal Compact Network (OC-Net), is introduced for the purpose of emotion recognition. The proposed system achieved the best F1-score of 0.69 in categorizing the emotions with the highest accuracy of 79.14%. In addition, the generated synthetic data used in the training phase also contributed to performance improvement of at least 0.61% for all tested networks. Therefore, the proposed optimized and compact deep learning system is suitable for mobile-based micro-expression analysis to detect the genuine human emotions.
... The potential goes beyond merely sensing the expressions that other people notice, but also includes subconscious expressions that are too fast or subtle for human observers to recognize. Known as "micro-expressions," these events can convey emotions that users had not intended to express and are unaware of revealing (Li et al., 2017). Users may not even be aware of feeling the emotions, leading to situations where the system literally knows the user better than he or she knows himself. ...
Chapter
The metaverse can be described as the large-scale societal shift from flat media viewed in the third person to immersive media experienced in the first person. While there is nothing inherently dangerous about immersive media technologies such as virtual and augmented reality, many policymakers have raised concerns about the extreme surveillance capabilities that powerful metaverse platforms could wield over users. What is often overlooked, however, is how surveillance-related risks become amplified when platforms are allowed to simultaneously target users with promotionally altered experiences. When considered in the context of control theory, the pairing of real-time surveillance and real-time influence raises new concerns, as large metaverse platforms could become extremely efficient tools for deception, manipulation, and persuasion. For these reasons, regulation should be considered that limits the ability of metaverse platforms to impart real-time influence on users based on real-time surveillance.
... In general, the current MER methods can be roughly divided into hand-crafted feature based and deep learning based methods. Typical hand-crafted ME features include LBP-TOP [9], HOOF [24], 3DHOG [19], and their variants [25], [26], [27]. However, the hand-crafted feature based methods heavily rely on complex expert knowledge, and the extracted ME features have limited discrimination. ...
Preprint
Full-text available
As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts to alleviate this problem with several spontaneous ME datasets, the amount of data is still tiny. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators over three years. Afterwards, we adopt four classical spatio-temporal feature learning models on DFME to perform MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
... Recognizing MEs is however extremely difficult for humans in real-time and hence automatic systems have been developed. Hand-crafted methods using LBP-TOP [9] and optical flow [6], [10] were the first to be applied. Due to the limited data (only around 250 samples per dataset), deep learning methods were not as popular until recently. ...
Preprint
Full-text available
Micro-expressions have drawn increasing interest lately due to various potential applications. The task is, however, difficult as it incorporates many challenges from the fields of computer vision, machine learning and emotional sciences. Due to the spontaneous and subtle characteristics of micro-expressions, the available training and testing data are limited, which makes evaluation complex. We show that data leakage and fragmented evaluation protocols are issues among the micro-expression literature. We find that fixing data leaks can drastically reduce model performance, in some cases even making the models perform similarly to a random classifier. To this end, we go through common pitfalls, propose a new standardized evaluation protocol using facial action units with over 2000 micro-expression samples, and provide an open source library that implements the evaluation protocols in a standardized manner. Code will be available at https://github.com/tvaranka/meb.
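The leakage issue raised in this abstract commonly arises when samples from the same subject land in both training and test sets; a subject-independent protocol such as leave-one-subject-out avoids it. A minimal scikit-learn sketch with synthetic stand-in features, labels, and subject IDs:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Synthetic stand-ins: X are per-sample features, y are emotion labels,
# and subjects records which person each sample comes from.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))
y = rng.integers(0, 3, 200)
subjects = rng.integers(0, 20, 200)

logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    # All samples of the held-out subject stay in the test fold,
    # so no subject appears on both sides of the split.
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print("mean LOSO accuracy:", float(np.mean(scores)))
```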
... For example, a human observer cannot easily detect heart rate, respiration rate, and blood pressure, which means those signals may be revealing emotions that the observed individual had not intended to convey. AI systems can also detect "micro-expressions" on faces that are too brief or too subtle for human observers to notice, again revealing emotions that the observed had not intended [22]. ...
Conference Paper
Over the next five to ten years, the metaverse is likely to transform how consumers interact with digital content, transitioning society from flat media viewed in the third person to immersive experiences engaged in the first person. This will greatly impact the marketing industry, transforming the basic tools, techniques, and tactics from flat artifacts such as images and videos, to immersive and interactive promotional experiences. In the metaverse, marketing campaigns will likely include extensive use of Virtual Product Placements (VPPs) and Virtual Spokespeople (VSPs). Such methods will be highly effective forms of advertising, for they will target users through natural, personal, and immersive means. At the same time, these methods can easily be used and abused in predatory ways. This paper reviews the most likely marketing techniques of the metaverse, outlines the potential risks to consumers, and makes recommendations for policymakers and business leaders that could protect the public.
... Therefore, the number of labeled ME samples is limited. There are only 11 published spontaneous ME databases, including the CASME series (CASME [20], CASME II [19], CAS(ME)2 [14], CAS(ME)3 [8]), the SMIC series (SMIC [12], SMIC-E [13], SMIC-E-long [16], 4DME [11]), the SAMM series (SAMM [2], SAMM-LV [22]), and MMEW [1]. Second, ME annotation is subjective and varies across annotators. ...
... For example, the conventional methods usually extract handcrafted features, e.g., LBP-TOP [1] and its variants (STLBP [2], DSLBP [3], LBP-SIP [4], and Hierarchical STLBP-IP [5]), MDMO [6], FHOFO [7], Bi-WOOF [8], and LTOGP [9], and then construct various types of classifiers, e.g., SVM [8], RF [10], k-NN [11], SRC [12], relaxed K-SVD [13], and GSL [5], especially for MER tasks [14]. In contrast, some deep learning methods have also been devoted to MER tasks, e.g., long short-term memory (LSTM) [15], pretrained CNNs (e.g., OFF-ApexNet [16], MagGA/SA [17] and 3D-CNNs [18]), CapsuleNet [19] and STRCN [20]. ...
Article
Full-text available
Cross-database micro-expression recognition (CDMER) is a difficult task, where the target (testing) and source (training) samples come from different micro-expression (ME) databases, resulting in the inconsistency of the feature distributions between each other, and hence affecting the performance of many existing MER methods. To address this problem, we propose a dual-stream convolutional neural network (DSCNN) for dealing with CDMER tasks. In the DSCNN, two stream branches are designed to study temporal and facial region cues in ME samples with the goal of recognizing MEs. In addition, in the training process, the domain discrepancy loss is used to enforce the target and source samples to have similar feature distributions in some layers of the DSCNN. Extensive CDMER experiments are conducted to evaluate the DSCNN. The results show that our proposed DSCNN model achieves a higher recognition accuracy when compared with some representative CDMER methods.
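The domain discrepancy loss mentioned above is often instantiated as the Maximum Mean Discrepancy (MMD) between source and target feature batches; whether DSCNN uses exactly this form is not stated here, so the sketch below shows a generic Gaussian-kernel MMD as one plausible choice.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise squared distances between rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Squared MMD between two feature batches of shape (n, dim), (m, dim).

    A small value means the two feature distributions look alike,
    which is what a domain discrepancy loss drives the network toward.
    """
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

# Toy usage: features from two "databases" with shifted distributions.
rng = np.random.default_rng(0)
src = rng.standard_normal((32, 16))
tgt = rng.standard_normal((32, 16)) + 0.5
print("MMD^2:", mmd2(src, tgt))
```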
... The optical flow data for the entire video need to be retrieved first before they are supplied to the CNN feature extractor. Then, FlowNet 2.0 [21], a deep optical flow estimation network, was used by Li et al. [22] to improve the performance of a dual-template CNN model, yet the performance is still inferior to the conventional approaches [23]. Kumar et al. [24] then employed a frequency-domain method to delete low-intensity expression frames. ...
Article
Full-text available
Understanding a person's attitude or sentiment from their facial expressions has long been a straightforward task for humans. Numerous methods and techniques have been used to classify and interpret human emotions that are commonly communicated through facial expressions, with either macro- or micro-expressions. However, performing this task using computer-based techniques or algorithms has proven to be extremely difficult, and annotating the data manually is time-consuming. Compared to macro-expressions, micro-expressions manifest the real emotional cues of a human, which they try to suppress and hide. Different methods and algorithms for recognizing emotions using micro-expressions are examined in this research, and the results are presented in a comparative approach. The proposed technique is based on a multi-scale deep learning approach that aims to extract facial cues of various subjects under various conditions. Two popular multi-scale approaches are explored, Spatial Pyramid Pooling (SPP) and Atrous Spatial Pyramid Pooling (ASPP), which are then optimized to suit the purpose of emotion recognition using micro-expression cues. Four new architectures are introduced in this paper, based on multi-layer multi-scale convolutional networks using both direct and waterfall network flows. The experimental results show that the ASPP module with waterfall network flow, which we coined WASPP-Net, outperforms the state-of-the-art benchmark techniques with an accuracy of 80.5%. For future work, a high-resolution approach to multi-scale analysis can be explored to further improve recognition performance.
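For reference, the spatial pyramid pooling idea behind the SPP/ASPP variants can be sketched as pooling a feature map over grids of several sizes and concatenating the results into one fixed-length vector; this generic sketch makes no assumption about the paper's exact architecture.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (H, W, C) feature map over n x n grids for each level
    and concatenate, yielding a fixed-length descriptor regardless of
    the input resolution."""
    H, W, C = feature_map.shape
    pooled = []
    for n in levels:
        ys = np.linspace(0, H, n + 1, dtype=int)
        xs = np.linspace(0, W, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(cell.max(axis=(0, 1)))
    # Output length: C * sum(n^2 for n in levels), independent of H, W.
    return np.concatenate(pooled)

features = np.random.rand(17, 23, 8)         # odd size on purpose
print(spatial_pyramid_pool(features).shape)  # (8 * (1 + 4 + 16),) = (168,)
```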
... AUs that do not change an individual's appearance when integrated with others are called additive; otherwise, they are non-additive [13]. Paul Ekman also discovered universal micro-expressions that appear on the face involuntarily within a quarter to half a second [14]. It is challenging to detect these micro-expressions with the naked eye, but a high-resolution camera can capture them. ...
Article
Full-text available
Depression is a psychological disorder that may cause physical disorders or lead to death. It is highly impactful on the socio-economic life of a person; therefore, its effective and timely detection is needed. Besides speech and gait, facial expressions carry valuable clues to depression. This study proposes a depression detection system based on facial expression analysis. Facial features have been used for depression detection using a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN). We extracted micro-expressions using the Facial Action Coding System (FACS) as Action Units (AUs) correlated with the sad, disgust, and contempt features for depression detection. A CNN-based model is also proposed in this study to automatically classify depressed subjects from images or videos in real time. Experiments were performed on a dataset obtained from Bahawal Victoria Hospital, Bahawalpur, Pakistan, as per the Patient Health Questionnaire depression scale (PHQ-8), for inferring the mental condition of a patient. The experiments revealed 99.9% validation accuracy with the proposed CNN model, while the extracted features obtained 100% accuracy with the SVM. Moreover, the results proved the superiority of the reported approach over state-of-the-art methods.
... Emotions are more readily detected from facial muscle activity (EMG), as even without awareness, discrete emotions elicit subcutaneous muscle activity from areas related to their expression on the face. Webcams may be used as a low-cost substitute, as they have been shown to offer a feasible, remote solution to detect micro-expressions (MEs; [28]), a substantial part of the same signal as EMG [5]. Furthermore, webcams have been used to detect flow in games [4], and mental workload [3]. ...
... To address the task of micro-expression recognition, several methods have been proposed to simulate the subtle changes of micro-expressions in the spatio-temporal domain [3]. These methods mainly consist of two parts. ...
Preprint
Facial micro-expression recognition has attracted much attention recently. Micro-expressions have the characteristics of short duration and low intensity, and it is difficult to train a high-performance classifier with the limited number of existing micro-expression samples. Therefore, recognizing micro-expressions is a challenging task. In this paper, we propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning. We use a 3D CNN to extract RGB features and optical flow features of micro-expression sequences and fuse them, and use a BERT network to extract text information from the Facial Action Coding System. Through a cross-modal contrastive loss, we embed attribute information in the visual network, thereby improving the representation ability of micro-expression recognition in the case of limited samples. We conduct extensive experiments on the CASME II and MMEW databases, and the accuracy is 77.82% and 71.04%, respectively. The comparative experiments show that this method achieves a better recognition effect than other micro-expression recognition methods.
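Cross-modal contrastive losses of the kind described above are commonly implemented as an InfoNCE objective that pulls matched visual/text embeddings together and pushes mismatched pairs apart; the sketch below shows that generic form, not necessarily the paper's exact loss.

```python
import numpy as np

def info_nce(visual, text, temperature=0.07):
    """Generic InfoNCE: row i of `visual` should match row i of `text`.

    visual, text : (batch, dim) L2-normalised embeddings, assumed to come
    from the 3D-CNN branch and the BERT branch, respectively.
    """
    logits = visual @ text.T / temperature       # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with the diagonal as the positive pair for each row.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
v = rng.standard_normal((8, 32))
v /= np.linalg.norm(v, axis=1, keepdims=True)
t = v + 0.1 * rng.standard_normal((8, 32))
t /= np.linalg.norm(t, axis=1, keepdims=True)
print("loss:", info_nce(v, t))
```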
... So far, data-driven approaches based on deep learning have come into consideration to capture the fleeting changes of MEs in an effective, reliable, and faster way [14]. Recent research shows that deep convolutional neural network (CNN) based methods have been able to outperform handcrafted features and their shallow counterparts [15]. ...
... These techniques are susceptible to head-pose variations and lean heavily on face registration. Gradient-based feature descriptors [18,19] were also proposed to mitigate lighting variations, but they still suffer from head-pose variations. ...
Chapter
Facial micro-expression recognition in the field of emotional information processing has become an inexorable necessity due to its exotic attributes. A micro-expression is a non-verbal, spontaneous, and involuntary leakage of true emotion in disguise of the more expressive, intentional, prototypical facial expressions. However, it persists only for a split-second and involves faint facial muscle movements, which make the recognition task difficult with the naked eye. Besides, there are a limited number of video samples and wide-span domain shifts among datasets. Considering these challenges, several video-based works have been done to improve classification accuracy, but they still lack high accuracy. This work addresses these issues and presents an approach with a deep 3D convolutional residual neural network as a backbone, followed by a Long Short-Term Memory auto-encoder with 2D convolutions, for automatic spatio-temporal feature extraction, fine-tuning, and classification from videos. Also, we have applied transfer learning on three standard macro-expression datasets to reduce over-fitting. Our work has shown a significant accuracy gain in extensive experiments on composite video samples from five publicly available micro-expression benchmark datasets: CASME, CASME II, CAS(ME)2, SMIC, and SAMM. This outperforms the state-of-the-art accuracy. It is the first attempt to work with five datasets and a rational implication of an LSTM auto-encoder for micro-expression recognition.

Keywords: Micro-expression, Recognition, Deep learning, Transfer learning, Spatio-temporal
Chapter
To address the problem that existing micro-expression recognition methods neither comprehensively consider facial spatial structure information nor go beyond a single input feature, which leads to low recognition accuracy, a method combining dual-stream convolution and a capsule network is proposed. An improved dual-stream convolutional shallow network is used to extract features, and CapsNet is used for micro-expression identification. This method first takes the motion-magnified image and the optical flow image as dual feature inputs, and uses an attention mechanism and a dual-stream convolutional network to extract spatio-temporal features. Dynamic routing between capsules is used to encode the features for better expression. Finally, the squashing function is used for classification. Experiments use the CASME II, SAMM and SMIC datasets. Compared with existing advanced methods, the accuracy of micro-expression identification is increased by 3.34%, 3.71%, and 4.13%, respectively, indicating the advanced nature and effectiveness of this method.
Article
Full-text available
Expression recognition is a very important direction for computers to understand human emotions and human-computer interaction. However, for 3D data such as video sequences, the complex structure of traditional convolutional neural networks, which stretch the input 3D data into vectors, not only leads to a dimensional explosion, but also fails to retain structural information in 3D space, simultaneously leading to an increase in computational cost and a lower accuracy rate of expression recognition. This paper proposes a video sequence face expression recognition method based on Squeeze-and-Excitation and 3DPCA Network (SE-3DPCANet). The introduction of a 3DPCA algorithm in the convolution layer directly constructs tensor convolution kernels to extract the dynamic expression features of video sequences from the spatial and temporal dimensions, without weighting the convolution kernels of adjacent frames by shared weights. Squeeze-and-Excitation Network is introduced in the feature encoding layer, to automatically learn the weights of local channel features in the tensor features, thus increasing the representation capability of the model and further improving recognition accuracy. The proposed method is validated on three video face expression datasets. Comparisons were made with other common expression recognition methods, achieving higher recognition rates while significantly reducing the time required for training.
Article
Full-text available
Micro-expressions typically reflect suppressed feelings, and they can provide an accurate indication of the real feelings and motivations of a person. Accordingly, they have been used in clinical diagnosis, business negotiations, interrogations, and security research. However, it is difficult to detect micro-expressions because of their instantaneity and imperceptibility; micro-expression recognition is therefore challenging. So far, various micro-expression recognition algorithms have been proposed to improve performance, among which LBP, optical flow, and deep learning methods have made good progress as mainstream approaches. In this survey, we aim to provide a review of micro-expression recognition based on LBP, optical flow, and deep learning. We first introduce the commonly used micro-expression datasets. Then the three mainstream classical approaches, LBP, optical flow, and deep learning, are described and summarized respectively. The existing methods of the last 5 years are discussed in regard to datasets, pre-processing, evaluation metrics and accuracy. Finally, we explain the shortcomings and challenges of micro-expression recognition and propose future directions. This can help researchers to quickly understand the current status of research, its problems, and future development directions, and provide a reference point for further research.
Article
Full-text available
Recognizing facial expressions and estimating their corresponding action units' intensities have achieved many milestones. However, such estimation is still challenging due to subtle variations of action units during emotional arousal. The latest approaches are confined to the characteristics of the probabilistic models that capture action units' relationships. Considering ordinal relationships across an emotional transition sequence, we propose two metric learning approaches with self-attention-based triplet and Siamese networks to estimate emotional intensities. Our emotion expert branches use a shifted-window Swin Transformer, which restricts self-attention computation to adjacent windows while also allowing for cross-window connection. This offers flexible modeling at various scales of action units with high performance. We evaluated our network's spatial and time-based feature localization on the CK+, KDEF-dyn, AFEW, SAMM, and CASME-II datasets. It outperforms state-of-the-art deep learning methods in micro-expression detection on the latter two datasets, with 2.4% and 2.6% UAR respectively. Ablation studies highlight the strength of our design with a thorough analysis.
Article
This paper is an extension of our previously published ACM Multimedia 2022 paper, which was ranked 3rd in the macro-expressions (MaEs) and micro-expressions (MEs) spotting task of the FME challenge 2021. In our earlier work, a deep learning framework based on facial action units (AUs) was proposed to emphasize both local and global features to deal with the MaEs and MEs spotting tasks. In this paper, an advanced Concat-CNN model is proposed to not only utilize facial action units (AU) features, which our previous work proved were more effective in detecting MaEs, but also to fuse the optical flow features to improve the detection performance of MEs. The advanced Concat-CNN proposed in this paper not only considers the intra-features correlation of a single frame but also the inter-features correlation between frames. Further, we devise a new adaptive re-labeling method by labeling the emotional frames with distinctive scores. This method takes into account the dynamic changes in expressions to further improve the overall detection performance. Compared with our earlier work and several existing works, the newly proposed deep learning pipeline is able to achieve a better performance in terms of the overall F1-scores: 0.2623 on CAS(ME)2, 0.2839 on CAS(ME)2-cropped, and 0.3241 on SAMM-LV, respectively.
Article
Micro-expressions (MEs) express spontaneous, subtle, and hard-to-hide real human emotions. Compared with macro-expressions, the occurrence of MEs is characterized by a small number of activated muscles, short duration, and low amplitude of action. Therefore, extracting the sparse spatio-temporal features of MEs is a challenge for ME recognition. In this paper, we try to extract low-dimensional features of MEs while ensuring high accuracy. Firstly, considering that ME samples may be inconsistent in the time domain, a differential energy image method is improved to fix the temporal variation of MEs to unit time. An integral projection method is then used to improve the information density. Secondly, a fixed-point rotation-based feature selection method is proposed to further select features with large motion variations. Specifically, the features are transformed from RGB to rotation axes in 3D space, and a fixed point is rotated separately to form a point set. The relative positions of the points are changed by adjusting the rotation angle, thus optimizing the distribution of the point set. The subset of points with large rotation angles is selected as the feature for classification. Finally, the effectiveness of the method is evaluated on three datasets using SVM as a classifier. The experimental results show that the low-dimensional features can perform well for ME recognition.
Article
Various modalities have been leveraged for affective computing, alone or combined, such as facial expressions, speech intonations, peripheral physiological signals, and brain activities. Previous studies have shown that the multimodal fusion of affective data usually improves performance. However, the internal interactive mechanism among different modalities is rarely studied. In this paper, we investigated the concordance between facial micro-expressions and physiological signals under high arousal emotion elicitation with strict synchronization. By linking the onset of micro-expressions with physiological signals, a series of epoch durations were created to cover the potential reaction delay that may vary with different physiological signals. The experimental results show a significant correlation between the appearance of micro-expressions and time-domain features of heart rate variability, but not respiration or electrodermal activity related features. These findings indirectly verify the feasibility and reliability of micro-expression as a measure for non-contact genuine emotion recognition and would be beneficial for the fusion of micro-expression and physiological signals for more robust affective computing and their applications in public security and mental health.
Article
Full-text available
Micro-expressions (MEs) can reflect an individual's subjective emotions and true mental state, and they are widely used in the fields of mental health, justice, law enforcement, intelligence, and security. However, one of the major challenges of working with MEs is that their neural mechanism is not entirely understood. To the best of our knowledge, the present study is the first to use electroencephalography (EEG) to investigate the reorganization of functional brain networks involved in MEs. We aimed to reveal the underlying neural mechanisms that can provide electrophysiological indicators for ME recognition. A real-time supervision and emotional expression suppression experimental paradigm was designed to collect video and EEG data of MEs and no expressions (NEs) of 70 participants expressing positive emotions. Based on graph theory, we analyzed the efficiency of the functional brain network at the scalp level on both macro and micro scales. The results revealed that, in the presence of MEs compared with NEs, participants exhibited higher global efficiency and higher nodal efficiency in the frontal, occipital, and temporal regions. Additionally, using the random forest algorithm to select a subset of functional connectivity features as input, a support vector machine classifier achieved a classification accuracy for MEs and NEs of 0.81, with an area under the curve of 0.85. This finding demonstrates the possibility of using EEG to recognize MEs, with a wide range of application scenarios, such as persons wearing face masks or patients with expression disorders.
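Global and nodal efficiency are standard graph-theoretic measures; as a rough illustration (assuming a connectivity matrix binarized at an arbitrary threshold, which is not necessarily the paper's protocol), they can be computed with networkx:

```python
# Rough illustration of the graph-efficiency measures named above,
# assuming an EEG functional-connectivity matrix thresholded into a
# binary undirected graph (the threshold here is an arbitrary assumption).
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
conn = rng.random((19, 19))                     # toy 19-channel connectivity
conn = (conn + conn.T) / 2                      # symmetrize
G = nx.from_numpy_array((conn > 0.6).astype(int))

print("global efficiency:", nx.global_efficiency(G))

# Nodal efficiency: mean inverse shortest-path length from one node.
def nodal_efficiency(G, node):
    lengths = nx.single_source_shortest_path_length(G, node)
    inv = [1.0 / d for n, d in lengths.items() if n != node]
    return sum(inv) / (G.number_of_nodes() - 1)

print("nodal efficiency of node 0:", nodal_efficiency(G, 0))
```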
Article
Full-text available
Being spontaneous, micro-expressions are useful for inferring a person's true emotions even when an attempt is made to conceal them. Due to their short duration and low intensity, recognizing micro-expressions is a difficult task in affective computing. Early work based on handcrafted spatio-temporal features, which showed some promise, has recently been superseded by different deep learning approaches which now compete for state-of-the-art performance. Nevertheless, the problem of capturing both local and global spatio-temporal patterns remains challenging. To this end, we propose a novel spatio-temporal transformer architecture: to the best of our knowledge, the first purely transformer-based approach (i.e., one free of any convolutional network use) for micro-expression recognition. The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head. A comprehensive evaluation on three widely used spontaneous micro-expression data sets, namely SMIC-HS, CASME II, and SAMM, shows that the proposed approach consistently outperforms the state of the art, and is the first framework in the published literature on micro-expression recognition to achieve an unweighted F1-score greater than 0.9 on any of the aforementioned data sets.
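As a rough, convolution-free illustration of the spatial-encoder/temporal-aggregator design described above, the toy model below runs a transformer over patch tokens per frame and a second transformer over frame embeddings; all sizes are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a convolution-free spatio-temporal pipeline:
# a spatial transformer encoder over patch tokens per frame, a temporal
# transformer over frame embeddings, and a linear classification head.
import torch
import torch.nn as nn

class TinySTTransformer(nn.Module):
    def __init__(self, patch=16, dim=64, num_classes=3):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch * patch, dim)  # linear patch embedding
        enc = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                       batch_first=True), num_layers=2)
        self.spatial, self.temporal = enc(), enc()
        self.head = nn.Linear(dim, num_classes)

    def forward(self, clip):                     # clip: (B, T, H, W) grayscale
        B, T, H, W = clip.shape
        p = self.patch
        # split every frame into non-overlapping p x p patch tokens
        tokens = (clip.reshape(B * T, H // p, p, W // p, p)
                      .permute(0, 1, 3, 2, 4)
                      .reshape(B * T, -1, p * p))
        frames = self.spatial(self.embed(tokens)).mean(dim=1)  # (B*T, dim)
        frames = frames.reshape(B, T, -1)
        return self.head(self.temporal(frames).mean(dim=1))

logits = TinySTTransformer()(torch.randn(2, 8, 64, 64))
print(logits.shape)  # torch.Size([2, 3])
```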
Preprint
Full-text available
In this paper, we present a large-scale, multi-source, and unconstrained database called SDFE-LV for spotting the onset and offset frames of a complete dynamic facial expression in long videos, a task known as dynamic facial expression spotting (DFES) and a vital prior step for many facial expression analysis tasks. Specifically, SDFE-LV consists of 1,191 long videos, each of which contains one or more complete dynamic facial expressions. Moreover, each complete dynamic facial expression in its corresponding long video was independently labeled five times by 10 well-trained annotators. To the best of our knowledge, SDFE-LV is the first unconstrained large-scale database for the DFES task whose long videos are collected from multiple real-world or close-to-real-world media sources, e.g., TV interviews, documentaries, movies, and we-media short videos. DFES on the SDFE-LV database therefore faces numerous practical difficulties such as head pose changes, occlusions, and illumination variation. We also provide a comprehensive benchmark evaluation from different angles using many recent state-of-the-art deep spotting methods, so that researchers interested in DFES can quickly and easily get started. Finally, through in-depth discussion of the experimental evaluation results, we point out several meaningful directions for dealing with DFES tasks and hope that DFES can be further advanced in the future. In addition, SDFE-LV will be freely released for academic use only as soon as possible.
Article
Emotion detection based on facial micro-expressions is essential in high-stakes tasks such as criminal investigation or lie detection. Micro-expressions often occur in such high-stakes situations, where people use facial expressions to conceal their actual emotional states. Spotting macro- and micro-expression intervals in long video sequences has therefore become an active research topic. Considering the differences in duration and facial muscle movement intensity between macro- and micro-expressions, we propose a novel Spatio-temporal Convolutional Emotional Attention Network (STCEAN) for spotting macro- and micro-expression intervals in long video sequences. The spatial features of each frame in the video sequence are extracted through a convolutional neural network. An emotional self-attention model is then used to analyze the temporal weights of the spatial features in different emotional dimensions. The emotional weights in the temporal dimension are filtered to spot macro- and micro-expression intervals. Finally, the STCEAN model is jointly optimized with a dual emotional focal loss for macro- and micro-expressions to address the sample imbalance problem. The experimental results on the CAS(ME)² and SAMM-LV datasets show that the STCEAN model achieves competitive results in the Facial Micro-Expression Challenge 2021.
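A minimal sketch of a binary focal loss of the kind used to counter sample imbalance in spotting; the alpha and gamma values below are conventional defaults, not the paper's settings:

```python
# Minimal sketch of a binary focal loss: the (1 - p_t)^gamma factor
# down-weights easy, well-classified samples so the rare positive
# (expression) frames dominate the gradient.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: tensors of shape (N,); targets in {0, 1}."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)        # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())
print(loss.item())
```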
Article
Facial micro-expression (FME) refers to a brief spontaneous facial movement that can disclose a person's genuine emotion. Investigations of FMEs are hampered by the lack of data. Fortunately, generative deep neural network models can help synthesize new images with desired FMEs. However, FMEs are too subtle to capture and generate. Therefore, we developed an edge-aware motion-based FME generation (EAM-FMEG) method to address these challenges. First, we introduced an auxiliary edge prediction (AEP) task for estimating facial edges to aid in subtle feature extraction. Second, we proposed an edge-intensified multi-head self-attention (EIMHSA) module for focusing on important facial regions to enhance the generation in response to subtle changes. The method was tested on three FME databases and showed satisfactory results. An ablation study demonstrated that the method is capable of producing objects with clear edges, and is robust to texture disturbance, shape distortion, and background defects. Furthermore, the method demonstrated strong cross-database generalization ability, even from RGB to grayscale images or vice versa, enabling general applications.
Article
As one of the important psychological stress reactions, micro-expressions (MEs) are spontaneous and subtle facial movements that usually occur in high-stakes situations and can reveal genuine human feelings and cognition. ME Recognition (MER) has essential applications in many fields such as lie detection, criminal investigation, and psychological healing. However, owing to the challenge of learning discriminative ME features from fleeting and subtle facial reactions, as well as the shortage of available ME data, this research topic is still far from well studied. To this end, in this paper we propose a deep prototypical learning framework, namely ME-PLAN, with a local attention mechanism for the MER problem. Specifically, ME-PLAN consists of two components, i.e., a 3D residual prototypical network and a local-wise attention module, where the former aims to learn precise ME feature prototypes through expression-related knowledge transfer and episodic training, and the latter facilitates attention to local facial movements. Furthermore, to alleviate the dilemma that most MER methods rely on manually annotated apex frames, we propose a Unimodal Pattern Constrained (UPC) apex frame spotting method and further extract ME key-frame sequences based on the detected apex frames to train the proposed ME-PLAN in an end-to-end manner. Finally, through extensive experiments and interpretable analysis of apex frame spotting and MER on a composite database, we demonstrate the superiority and effectiveness of the proposed methods.
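The prototypical idea at the core of such frameworks is compact: class prototypes are mean embeddings of support samples, and queries are classified by distance to the prototypes. A minimal sketch with toy embeddings (all shapes are illustrative assumptions):

```python
# Minimal sketch of prototypical classification: prototypes are the
# mean embeddings of support samples per class, and a query is scored
# by negative squared distance to each prototype.
import torch

def prototypical_logits(support, support_labels, query, num_classes):
    """support: (S, D) embeddings; query: (Q, D) embeddings."""
    protos = torch.stack([support[support_labels == c].mean(dim=0)
                          for c in range(num_classes)])       # (C, D)
    return -torch.cdist(query, protos) ** 2                   # (Q, C)

support = torch.randn(15, 32)                  # 5 samples x 3 classes
labels = torch.arange(3).repeat_interleave(5)
query = torch.randn(4, 32)
print(prototypical_logits(support, labels, query, 3).argmax(dim=1))
```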
Preprint
Full-text available
The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products, and research projects. The open challenges are also highlighted.
Article
Full-text available
This article focuses on so-called micro-expression analysis and its usability in criminal proceedings. It first introduces the foundations and development of this method and the current state of knowledge, including the changes that the development of information technology has brought to it. The article then compares this method with physiodetection examination (i.e., the standard method based on the use of a lie detector) in terms of their underlying principles, the legal limits on the evidentiary value of their outputs, and the reasons for those limits. The article concludes with the author's own analysis of the usability of micro-expression analysis in criminal proceedings and of the conditions on which the conclusion about its usability rests.
Article
Computational research on facial micro-expressions has long focused on videos captured under constrained laboratory conditions due to the challenging elicitation process and the limited samples that are publicly available. Moreover, processing micro-expressions is extremely challenging under unconstrained scenarios. This paper introduces, for the first time, a completely automatic micro-expression “spot-and-recognize” framework that operates on in-the-wild videos, such as poker games and political interviews. The proposed method first spots the apex frame from a video by handling head movements and unconscious actions, which are typically larger in motion intensity, with alignment employed to enforce a canonical face pose. Optical flow guided features play a central role in our method: they robustly identify the location of the apex frame, and they are used to learn a shallow neural network model for emotion classification. Experimental results demonstrate the feasibility of the proposed methodology, establishing good baselines for both spotting and recognition tasks: an ASR of 0.33 and an F1-score of 0.6758, respectively, on the MEVIEW micro-expression database. In addition, we present comprehensive qualitative and quantitative analyses to further show the effectiveness of the proposed framework, with a new suggestion for an appropriate evaluation protocol. In a nutshell, this paper provides a new benchmark for apex spotting and emotion recognition in an in-the-wild setting.
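As a simplified stand-in for the optical-flow-guided apex spotting described above, the sketch below takes the frame whose dense Farneback flow, computed against the first frame, has the largest mean magnitude; the parameters are illustrative and this is not the paper's full pipeline:

```python
# Rough sketch of optical-flow-guided apex spotting: compute dense
# Farneback flow between the first frame and every later frame and take
# the frame with the largest mean flow magnitude as the apex candidate.
import cv2
import numpy as np

def spot_apex(gray_frames):
    """gray_frames: list of uint8 grayscale frames of equal size."""
    ref, best_idx, best_mag = gray_frames[0], 0, -1.0
    for i, frame in enumerate(gray_frames[1:], start=1):
        flow = cv2.calcOpticalFlowFarneback(ref, frame, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2).mean()
        if mag > best_mag:
            best_idx, best_mag = i, mag
    return best_idx

frames = [np.random.randint(0, 255, (64, 64), np.uint8) for _ in range(10)]
print("apex frame index:", spot_apex(frames))
```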
Conference Paper
Spatial-temporal local binary pattern (STLBP) has been widely used in dynamic texture recognition. STLBP often encounters the high-dimension problem, as its dimension increases exponentially, so STLBP can only utilize a small neighborhood. To tackle this problem, we propose a method for dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary patterns (PHD-MVLBP). Instead of forming very high-dimensional LBP-histogram features, it first uses hash functions to map the pixel difference vectors (PDVs) to binary vectors, then forms a dictionary from the derived binary vectors and encodes them using this dictionary. In this way, the PDVs are mapped to feature vectors of the size of the dictionary, instead of LBP histograms of very high dimension. Such an encoding scheme can effectively extract the discriminant information from videos over a much larger neighborhood. The experimental results on two widely used dynamic texture datasets, DynTex++ and UCLA, show the superior performance of the proposed approach over state-of-the-art methods.
Conference Paper
Full-text available
Optical strain characterizes the relative amount of displacement by a moving object within a time interval. Its ability to compute any small muscular movements on faces can be advantageous to subtle expression research. This paper proposes a novel optical strain weighted feature extraction scheme for subtle facial micro-expression recognition. Motion information is derived from optical strain magnitudes, which is then pooled spatio-temporally to obtain block-wise weights for the spatial image plane. By simple product with the weights, the resulting feature histograms are intuitively scaled to accommodate the importance of block regions. Experiments conducted on two recent spontaneous micro-expression databases, CASME II and SMIC, demonstrated promising improvement over the baseline results.
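Optical strain is derived from the spatial gradients of the optical-flow field; a minimal sketch of computing a per-pixel strain magnitude map, assuming a flow field is already available:

```python
# Minimal sketch of optical strain: the strain tensor is built from the
# spatial gradients of the optical-flow field (u, v), and its magnitude
# highlights subtle, non-rigid facial deformation.
import numpy as np

def strain_magnitude(flow):
    """flow: (H, W, 2) optical flow -> (H, W) strain magnitude."""
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    e_xx, e_yy = du_dx, dv_dy            # normal strain components
    e_xy = 0.5 * (du_dy + dv_dx)         # shear strain component
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)

flow = np.random.randn(64, 64, 2).astype(np.float32)
print(strain_magnitude(flow).shape)  # (64, 64)
```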
Conference Paper
Full-text available
Spotting micro-expressions is a primary step for continuous emotion recognition from videos. Spotting in this context refers to automatically finding the temporal locations of face-related events in a video sequence. Rapid facial movements mainly include micro-expressions and eye blinks. However, the role of eye blinks in expressing emotions is still controversial, and they are often considered micro-expressions as well. In this paper, a simple method for automatically spotting rapid facial movements from videos is proposed. The method relies on analyzing differences in appearance-based features of sequential frames. In addition to finding the temporal locations, the system is able to provide spatial information about the movements in the face. Micro-expression spotting experiments are carried out on three datasets consisting only of spontaneous micro-expressions. Baseline micro-expression spotting results are provided for these three datasets, including the publicly available CASME database. An example of spatial localization of the spotted rapid movements is also presented.
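A minimal sketch of the feature-difference idea described above: describe each frame with an appearance feature (an LBP histogram here) and flag frames that differ sharply from the average of their temporal neighbours at a fixed interval; the distance function, interval, and threshold are illustrative assumptions:

```python
# Rough sketch of spotting by appearance-feature difference: each frame
# becomes an LBP histogram, and a frame is flagged when its chi-square
# distance to the average of its neighbours at interval k exceeds a
# threshold. All parameter choices are illustrative.
import numpy as np
from skimage.feature import local_binary_pattern

def frame_feature(gray):
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return hist

def spot(frames, k=3, thresh=0.2):
    feats = [frame_feature(f) for f in frames]
    flagged = []
    for i in range(k, len(frames) - k):
        avg = (feats[i - k] + feats[i + k]) / 2
        chi2 = np.sum((feats[i] - avg) ** 2 / (feats[i] + avg + 1e-9))
        if chi2 > thresh:
            flagged.append(i)
    return flagged

frames = [np.random.randint(0, 256, (64, 64), np.uint8) for _ in range(20)]
print(spot(frames))
```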
Article
Full-text available
Micro-expressions are brief involuntary facial expressions that reveal genuine emotions and thus help detect lies. Because of their many promising applications, they have attracted the attention of researchers from various fields. Recent research reveals that two perceptual color spaces (CIELab and CIELuv) provide useful information for expression recognition. This paper is an extended version of our International Conference on Pattern Recognition (ICPR) paper [1], in which we propose a novel color space model, Tensor Independent Color Space (TICS), to help recognize micro-expressions. In this paper, we further show that CIELab and CIELuv are also helpful in recognizing micro-expressions, and we indicate why these three color spaces achieve better performance. A micro-expression color video clip is treated as a fourth-order tensor, i.e., a four-dimensional array. The first two dimensions encode the spatial information, the third the temporal information, and the fourth the color information. We transform the fourth dimension from RGB into TICS, in which the color components are as independent as possible. The combination of dynamic texture and independent color components achieves higher accuracy than RGB does. In addition, we define a set of Regions of Interest (ROIs) based on the Facial Action Coding System (FACS) and calculate the dynamic texture histograms for each ROI. Experiments are conducted on two micro-expression databases, CASME and CASME II, and the results show that the performances for TICS, CIELab, and CIELuv are better than those for RGB or grayscale.
Conference Paper
Full-text available
Facial micro-expression recognition is an upcoming area in computer vision research. Up until the recent emergence of the extensive CASME II spontaneous micro-expression database, there were numerous obstacles faced in the elicitation and labeling of data involving facial micro-expressions. In this paper, we propose the Local Binary Patterns with Six Intersection Points (LBP-SIP) volumetric descriptor based on the three intersecting lines crossing over the center point. The proposed LBP-SIP reduces the redundancy in LBP-TOP patterns, providing a more compact and lightweight representation and leading to lower computational complexity. Furthermore, we also incorporate a Gaussian multi-resolution pyramid into our proposed approach by concatenating the patterns across all pyramid levels. Using an SVM classifier with leave-one-sample-out cross-validation, we achieve the best recognition accuracy of 67.21%, surpassing the baseline performance with further computational efficiency.
Conference Paper
Full-text available
Emotion recognition is a very active field of research. The Emotion Recognition In The Wild Challenge and Workshop (EmotiW) 2013 Grand Challenge consists of an audio-video based emotion classification challenge that mimics real-world conditions. Traditionally, emotion recognition has been performed on laboratory-controlled data. While undoubtedly worthwhile at the time, such laboratory-controlled data poorly represents the environment and conditions faced in real-world situations. The goal of this Grand Challenge is to define a common platform for the evaluation of emotion recognition methods in real-world conditions. The database in the 2013 challenge is the Acted Facial Expression in the Wild (AFEW), which has been collected from movies showing close-to-real-world conditions.
Conference Paper
Full-text available
Slow Feature Analysis (SFA) is a subspace learning method inspired by the human visual system; however, it is seldom used in computer vision. Motivated by its application to unsupervised activity analysis, we develop the first implementation of SFA for online temporal video segmentation to detect episodes of motion changes. We utilize a domain-specific indefinite kernel which takes the data representation into account to introduce robustness. As our kernel is indefinite (i.e., it defines a Krein space instead of a Hilbert space), we formulate SFA in Krein space. We propose an incremental kernel SFA framework which utilizes the special properties of our kernel. Finally, we apply our framework to online temporal video segmentation and perform qualitative and quantitative evaluation.
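For intuition, linear SFA (the ordinary, non-kernel special case) reduces to a generalized eigenvalue problem between the covariance of the temporal derivative and the covariance of the signal; a minimal sketch:

```python
# Minimal sketch of linear Slow Feature Analysis: find projections that
# minimize the variance of the temporal derivative subject to unit
# signal variance, i.e. a generalized eigenproblem (slowest first).
import numpy as np
from scipy.linalg import eigh

def linear_sfa(X, n_components=2):
    """X: (T, D) time series -> (D, n_components) projection matrix."""
    X = X - X.mean(axis=0)
    Xdot = np.diff(X, axis=0)               # discrete temporal derivative
    A = Xdot.T @ Xdot / (len(Xdot) - 1)     # derivative covariance
    B = X.T @ X / (len(X) - 1)              # signal covariance
    evals, evecs = eigh(A, B)               # generalized eigenproblem
    return evecs[:, :n_components]          # smallest eigenvalues = slowest

t = np.linspace(0, 10, 500)
X = np.column_stack([np.sin(0.5 * t), np.sin(5 * t)]) @ np.random.rand(2, 4)
X += 0.01 * np.random.randn(500, 4)
print(linear_sfa(X).shape)  # (4, 2)
```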
Article
Full-text available
A robust automatic micro-expression recognition system would have broad applications in national safety, police interrogation, and clinical diagnosis. Developing such a system requires high-quality databases with sufficient training samples, which are currently not available. We reviewed the previously developed micro-expression databases and built an improved one (CASME II), with higher temporal resolution (200 fps) and spatial resolution (about 280×340 pixels on the facial area). We elicited participants' facial expressions in a well-controlled laboratory environment with proper illumination (such as removing light flickering). Among nearly 3,000 facial movements, 247 micro-expressions were selected for the database, with action units (AUs) and emotions labeled. For baseline evaluation, LBP-TOP and SVM were employed for feature extraction and classification, respectively, with leave-one-subject-out cross-validation. The best performance is 63.41% for 5-class classification.
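A minimal sketch of LBP-TOP-style baseline features: LBP histograms from the XY, XT, and YT planes of a video volume, concatenated. A full implementation aggregates all slices and spatial blocks; one central slice per plane is used here for brevity:

```python
# Minimal sketch of LBP-TOP-style features: LBP histograms computed on
# the three orthogonal planes of a video volume and concatenated.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(plane, P=8, R=1):
    lbp = local_binary_pattern(plane, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def lbp_top(volume):
    """volume: (T, H, W) grayscale video -> concatenated histogram."""
    T, H, W = volume.shape
    xy = volume[T // 2]        # spatial texture (appearance)
    xt = volume[:, H // 2, :]  # horizontal motion plane
    yt = volume[:, :, W // 2]  # vertical motion plane
    return np.concatenate([lbp_hist(p) for p in (xy, xt, yt)])

video = np.random.randint(0, 256, (30, 64, 64), np.uint8)
print(lbp_top(video).shape)  # (30,) = three 10-bin histograms
```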
Article
Full-text available
Micro-expression has gained a lot of attention because of its potential applications (e.g., transportation security) and theoretical implications (e.g., expression of emotions). However, the duration of micro-expression, which is considered its most important characteristic, has not been firmly established. The present study provides evidence for defining the duration of micro-expression by collecting and analyzing fast facial expressions that are the leakage of genuine emotions. Participants were asked to neutralize their faces while watching emotional video episodes. Among the more than 1,000 elicited facial expressions, 109 leaked fast expressions (less than 500 ms) were selected and analyzed. The distribution curves of total duration and onset duration for the micro-expressions are presented. Based on the distribution and estimation, it seems suitable to define micro-expression by a total duration of less than 500 ms or an onset duration of less than 260 ms. These findings may facilitate further studies of micro-expressions in the future.
Conference Paper
Full-text available
Micro-expressions are facial expressions which are fleeting and reveal genuine emotions that people try to conceal. These are important clues for detecting lies and dangerous behaviors and therefore have potential applications in various fields such as the clinical field and national security. However, recognition through the naked eye is very difficult. Therefore, researchers in the field of computer vision have tried to develop micro-expression detection and recognition algorithms, but they lack spontaneous micro-expression databases. In this study, we attempted to create a database of spontaneous micro-expressions elicited from neutralized faces. Based on previous psychological studies, we designed an effective procedure in lab situations to elicit spontaneous micro-expressions and analyzed the video data with care to offer valid and reliable codings. From 1,500 elicited facial movements filmed at 60 fps, 195 micro-expressions were selected. These samples were coded so that the first, peak, and last frames were tagged. Action units (AUs) were marked to give an objective and accurate description of the facial movements. Emotions were labeled based on psychological studies and participants' self-reports to enhance the validity.
Article
Full-text available
A vision-based human-computer interface is presented in the paper. The interface detects voluntary eye-blinks and interprets them as control commands. The employed image processing methods include Haar-like features for automatic face detection, and template-matching-based eye tracking and eye-blink detection. Interface performance was tested by 49 users (12 of whom had physical disabilities). Test results indicate the interface's usefulness in offering an alternative means of communication with computers. The users entered English and Polish text (with an average time of less than 12 s per character) and were able to browse the Internet. The interface is based on a notebook equipped with a typical web camera and requires no extra light sources. The interface application is available online as open-source software. Keywords: Human-computer interface, eye-blink detection, face detection.
Article
Full-text available
Encoders were video-recorded giving either truthful or deceptive descriptions of video footage designed to generate either emotional or unemotional responses. Decoders were asked to indicate the truthfulness of each item and what cues they used in making their judgements, and then to complete both the Micro Expression Training Tool (METT) and the Subtle Expression Training Tool (SETT). Although overall performance on the deception detection task was no better than chance, performance for emotional lie detection was significantly above chance, while that for unemotional lie detection was significantly below chance. Emotional lie detection accuracy was also significantly positively correlated with reported use of facial expressions and with performance on the SETT, but not on the METT. The study highlights the importance of taking the type of lie into account when assessing skill in deception detection.
Conference Paper
Full-text available
Facial micro-expressions have been shown to be an important behavioral source for hostile intent and dangerous demeanour detection [1]. In this paper, we present a novel approach for facial micro-expression recognition in video sequences. First, a 200 frames per second (fps) high-speed camera is used to capture the face. Second, the face is divided into specific regions, and the motion in each region is recognized based on a 3D-gradient orientation histogram descriptor. For testing this approach, we created a new dataset of facial micro-expressions, manually tagged as ground truth, using a high-speed camera. In this work, we present recognition results for 13 different micro-expressions.
Conference Paper
Full-text available
A practical lipreading system can be considered either subject-dependent (SD) or subject-independent (SI). An SD system is user-specific, i.e., customized for a particular user, while an SI system has to cope with a large number of users. These two types of systems pose different challenges and have to be treated differently. In this paper, we propose a simple deterministic model to tackle the problem. The model first seeks a low-dimensional manifold where visual features extracted from the frames of a video can be projected onto a continuous deterministic curve embedded in a path graph. Moreover, it can map arbitrary points on the curve back into the image space, making it suitable for temporal interpolation. Based on the model, we develop two separate strategies for SD and SI lipreading. The former is turned into a simple curve-matching problem, while for the latter we propose a video-normalization scheme to improve the system developed by Zhao et al. We evaluated our system on the OuluVS database and achieved recognition rates more than 20% higher than those reported by Zhao et al. in both SD and SI testing scenarios.
Conference Paper
Full-text available
Facial micro-expressions are rapid involuntary facial expressions which reveal suppressed affect. To the best of the authors' knowledge, there is no previous work that successfully recognises spontaneous facial micro-expressions. In this paper, we show how a temporal interpolation model, together with the first comprehensive spontaneous micro-expression corpus, enables us to accurately recognise these very short expressions. We designed an induced emotion suppression experiment to collect the new corpus using a high-speed camera. The system is the first to recognise spontaneous facial micro-expressions and achieves very promising results that compare favourably with human micro-expression detection accuracy.
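The paper's temporal interpolation model embeds frames on a curve in a path graph; as a much simpler stand-in that shows only the goal (resampling a very short clip to a fixed length), per-pixel linear interpolation along time can be sketched as:

```python
# Rough stand-in for temporal interpolation: resample a variable-length
# clip to a fixed number of frames by per-pixel linear interpolation
# along time. The actual model in the paper is graph-embedding based.
import numpy as np
from scipy.interpolate import interp1d

def interpolate_clip(volume, target_len=10):
    """volume: (T, H, W) -> (target_len, H, W)."""
    T = volume.shape[0]
    f = interp1d(np.linspace(0, 1, T), volume, axis=0)
    return f(np.linspace(0, 1, target_len))

clip = np.random.rand(6, 32, 32)     # a very short micro-expression clip
print(interpolate_clip(clip).shape)  # (10, 32, 32)
```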
Conference Paper
Full-text available
Micro-expressions are one of the most important behavioral clues for lie and dangerous-demeanor detection. However, it is difficult for humans to detect micro-expressions. In this paper, a new approach for automatic micro-expression recognition is presented. The system is fully automatic and operates in a frame-by-frame manner. It automatically locates the face and extracts the features using Gabor filters. GentleSVM is then employed to identify micro-expressions. For spotting, the system obtained 95.83% accuracy. For recognition, the system showed 85.42% accuracy, which was higher than the performance of trained human subjects. To further improve the performance, future research should focus on a more representative training set, a more sophisticated test bed, and an accurate image alignment method.
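A minimal sketch of Gabor-based feature extraction of the general kind described above: filter the face with a small bank of oriented Gabor kernels and pool the magnitude responses; the kernel parameters are illustrative assumptions:

```python
# Minimal sketch of Gabor feature extraction: filter an image with a
# small bank of oriented Gabor kernels and pool each response map.
import cv2
import numpy as np

def gabor_features(gray, n_orientations=4):
    feats = []
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations
        kern = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5)
        resp = cv2.filter2D(gray, cv2.CV_32F, kern)
        feats.extend([resp.mean(), resp.std()])  # simple pooling
    return np.array(feats)

face = np.random.randint(0, 255, (64, 64), np.uint8)
print(gabor_features(face))  # 8 pooled values for 4 orientations
```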
Article
Full-text available
Please see http://www.kasrl.org/jaffe.html for details about the JAFFE database.
Article
Full-text available
Change in a speaker's emotion is a fundamental component of human communication. Automatic recognition of spontaneous emotion would significantly impact human-computer interaction and emotion-related studies in education, psychology, and psychiatry. In this paper, we explore methods for detecting emotional facial expressions occurring in a realistic human conversation setting, the Adult Attachment Interview (AAI). Because non-emotional facial expressions have no distinct description and are expensive to model, we treat emotional facial expression detection as a one-class classification problem, which is to describe target objects (i.e., emotional facial expressions) and distinguish them from outliers (i.e., non-emotional ones). Our preliminary experiments on AAI data suggest that one-class classification methods can reach a good balance between cost (labeling and computing) and recognition performance by avoiding non-emotional expression labeling and modeling.
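The one-class formulation can be sketched with scikit-learn's OneClassSVM, training on features of the target class only and treating low-scoring samples as outliers; the toy random features below are assumptions:

```python
# Minimal sketch of one-class classification: fit a one-class SVM on
# emotional-expression features only, then label unseen samples as
# target (+1) or outlier (-1). Features here are random toy vectors.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
emotional_feats = rng.normal(0.0, 1.0, size=(200, 16))  # target class only
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(emotional_feats)

new_frames = rng.normal(0.0, 1.0, size=(5, 16))
print(clf.predict(new_frames))  # +1 = emotional-like, -1 = outlier
```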
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
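A minimal sketch of computing a HOG descriptor with the parameter choices the abstract highlights (fine orientation binning, relatively coarse spatial cells, overlapping normalization blocks), using scikit-image:

```python
# Minimal sketch of a HOG descriptor with typical parameter choices.
import numpy as np
from skimage.feature import hog

image = np.random.rand(128, 64)          # a detection-window-sized patch
descriptor = hog(image,
                 orientations=9,          # fine orientation binning
                 pixels_per_cell=(8, 8),  # relatively coarse spatial cells
                 cells_per_block=(2, 2),  # overlapping normalization blocks
                 block_norm="L2-Hys")
print(descriptor.shape)  # flattened block-normalized histogram vector
```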
Article
The factorization method described in this series of reports requires an algorithm to track the motion of features in an image stream. Given the small inter-frame displacement made possible by the factorization approach, the best tracking method turns out to be the one proposed by Lucas and Kanade in 1981. The method defines the measure of match between fixed-size feature windows in the past and current frame as the sum of squared intensity differences over the windows. The displacement is then defined as the one that minimizes this sum. For small motions, a linearization of the image intensities leads to a Newton-Raphson style minimization. In this report, after rederiving the method in a physically intuitive way, we answer the crucial question of how to choose the feature windows that are best suited for tracking. Our selection criterion is based directly on the definition of the tracking algorithm, and expresses how well a feature can be tracked. As a result, the criterion is optimal by construction.
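The tracking recipe described above maps directly onto OpenCV's Shi-Tomasi corner selector and pyramidal Lucas-Kanade tracker; a minimal sketch on two synthetic frames:

```python
# Minimal sketch of select-then-track: Shi-Tomasi corners chosen by a
# trackability criterion, then pyramidal Lucas-Kanade tracking, which
# minimizes the SSD between feature windows across frames.
import cv2
import numpy as np

prev = np.random.randint(0, 255, (240, 320), np.uint8)
curr = np.roll(prev, shift=2, axis=1)   # simulate a small translation

# Select windows that are well suited for tracking (textured corners).
pts = cv2.goodFeaturesToTrack(prev, maxCorners=50,
                              qualityLevel=0.01, minDistance=10)

# Newton-Raphson style minimization of the window-matching SSD.
new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                winSize=(21, 21), maxLevel=2)
print("tracked:", int(status.sum()), "of", len(pts))
```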
Article
Research relevant to psychotherapy regarding facial expression and body movement has shown that the kind of information which can be gleaned from the patient's words (information about affects, attitudes, interpersonal styles, psychodynamics) can also be derived from his concomitant nonverbal behavior. The study explores the interaction situation and considers how, within deception interactions, differences in neuroanatomy and cultural influences combine to produce specific types of body movements and facial expressions which escape efforts to deceive and emerge as leakage or deception clues.
Conference Paper
One of the important cues for deception detection is micro-expression. It has three characteristics: short duration, low intensity, and usually local movement. These characteristics imply that micro-expression is sparse. In this paper, we use the sparse part of Robust PCA (RPCA) to extract the subtle motion information of micro-expressions. The local texture features of this information are extracted by Local Spatiotemporal Directional Features (LSTD). In order to extract more effective local features, 16 Regions of Interest (ROIs) are assigned based on the Facial Action Coding System (FACS). The experimental results on two micro-expression databases show that the proposed method gains better performance. Moreover, the proposed method may further be used to extract other subtle motion information from video, such as lip movements, the human pulse, and micro-gestures.
Conference Paper
This paper aims to investigate whether micro-facial movement sequences can be distinguished from neutral face sequences. As a micro-facial movement tends to be very quick and subtle, classifying when a movement occurs, compared with a face without movement, can be a challenging computer vision problem. Using local binary patterns on three orthogonal planes and Gaussian derivatives, local features, when interpreted by machine learning algorithms, can accurately describe when a movement or non-movement occurs. This method can then be applied to help aid humans in detecting when small movements occur. This also differs from the current literature, as most works concentrate only on emotional expression recognition. Using the CASME II dataset, the results from the investigation of different descriptors have shown higher accuracy compared with state-of-the-art methods.
Conference Paper
High-dimensional engineered features have yielded high performance on a variety of visual recognition tasks and have attracted significant recent attention. Here, we examine the problem of expression recognition in static facial images. We first present a technique to build high-dimensional, ∼60k features composed of dense Census-transformed vectors based on locations defined by facial keypoint predictions. The approach yields state-of-the-art performance at 96.8% accuracy for detecting facial expressions on the well-known Cohn-Kanade plus (CK+) evaluation and 93.2% for smile detection on the GENKI dataset. We also find that the subsequent application of a linear discriminative dimensionality reduction technique can make the approach more robust when keypoint locations are less precise. We go on to explore the recognition of expressions captured under more challenging pose and illumination conditions. Specifically, we test this representation on the GENKI smile detection dataset. Our high-dimensional feature technique yields state-of-the-art performance on both of these well-known evaluations.
Article
This paper presents a novel method to recognize subtle emotions based on optical strain magnitude feature extraction from the temporal point of view. The common way that subtle emotions are exhibited by a person is in the form of visually observed micro-expressions, which usually occur only over a brief period of time. Optical strain allows small deformations on the face to be computed between successive frames, even though these subtle changes can be minute. We perform temporal sum pooling of each frame in the video into a single strain map to summarize the features over time. To reduce the dimensionality of the input space, the strain maps are then resized to a pre-defined resolution for consistency across the database. Experiments were conducted on the SMIC (Spontaneous Micro-expression) database, which was established in 2013. A best three-class recognition accuracy of 53.56% is achieved, with the proposed method outperforming the baseline reported in the original work by almost 5%. This is the first known optical-strain-based classification of micro-expressions. The closest related work employed optical strain to spot micro-expressions, but did not investigate its potential for determining the specific type of micro-expression.
Chapter
This chapter considers two questions: Why are most people so lousy at catching liars? And what are the microexpressions that give liars away, and given that they are easy to learn, why do people have such trouble learning them? An interesting part of the discussion, which begins with a definition of lying, is that it does not hold that one must speak in order to lie, and one can lie without falsifying. It then proceeds to canvass the research on catching liars, which includes interesting information about why and when people lie, and at what costs they are willing to lie. Among many other reasons, it argues that many people are poor lie catchers and poor liars because of evolutionary history and ancestral environment.
Article
Facial micro-expressions are fast and subtle facial motions that are considered one of the most useful external signs for detecting hidden emotional changes in a person. However, they are not easy to detect and measure, as they appear only for a short time, with small muscle contractions in facial areas where salient features are not available. We propose a new computer vision method for detecting and measuring the timing characteristics of facial micro-expressions. The core of this method is a descriptor that combines pre-processing masks, histograms, and concatenation of spatial-temporal gradient vectors. The presented 3D gradient histogram descriptor is able to detect and measure the timing characteristics of fast and subtle changes of the facial skin surface. This method is specifically designed for the analysis of videos recorded using a high-speed 200 fps camera. Final classification of micro-expressions is done using a k-means classifier and a voting procedure. The Facial Action Coding System was utilized to annotate the appearance and dynamics of the expressions in our new high-speed micro-expression video database. The efficiency of the proposed approach was validated using this new high-speed video database.
Article
LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
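In Python, LIBSVM is most commonly reached through scikit-learn, whose SVC class is built on top of it; a minimal RBF-kernel example covering training, accuracy, and probability estimates:

```python
# Minimal scikit-learn example; SVC wraps the LIBSVM solver internally.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X_tr, y_tr)                     # solves the SVM optimization problem
print("accuracy:", clf.score(X_te, y_te))
print("class probabilities:", clf.predict_proba(X_te[:2]))
```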
Chapter
In psychotherapy research, one is often hard pressed to make sense out of the many behaviors, processes, and other phenomena which can be observed in the therapy situation. The present report is concerned with one class of behaviors and processes which cannot be observed—namely, facial expressions which are so short-lived that they seem to be quicker-than-the-eye. These rapid expressions can be seen when motion picture films are run at about one-sixth of their normal speed. The film and projector thus become a sort of temporal microscope, in that they expand time sufficiently to enable the investigator to observe events not otherwise apparent to him.
Conference Paper
One of the most interesting aspects of facial expression analysis is recognizing micro-expressions. In this paper, a new feature tracking and alignment approach for micro-expressions based on the FACS system and Tracking-Learning-Detection (TLD) is presented. Feature points in the first frame are detected based on Hough forests, and to increase accuracy, features extracted by Local Binary Patterns (LBP) are used for initialization. Unlike many previous works, the proposed approach applies conceptual areas from the perspective of human cognition. The approach aims to track the extracted features and quantify the changing trend of these points for analyzing micro-expressions. To assess the validity of our approach, we conducted experiments on the CASME and SMIC facial expression databases. The results show that the proposed approach is effective and performs well in recognizing some specific micro-expressions. Furthermore, the proposed approach is more accurate than previous methods based on the Temporal Interpolation Model (TIM).
Conference Paper
Obtaining a compact and discriminative representation of facial and body expressions is a difficult problem in emotion recognition. Part of the difficulty is capturing microexpressions, i.e., short, involuntary expressions that last for only a fraction of a second: at a micro-temporal scale, there are so many other subtle face and body movements that do not convey semantically meaningful information. We present a novel approach to this problem by exploiting the sparsity of the frequent micro-temporal motion patterns. Local space-time features are extracted over the face and body region for a very short time period, e.g., few milliseconds. A codebook of microexpressions is learned from the data and used to encode the features in a sparse manner. This allows us to obtain a representation that captures the most salient motion patterns of the face and body at a micro-temporal scale. Experiments performed on the AVEC 2012 dataset show our approach achieving the best published performance on the arousal dimension based solely on visual features. We also report experimental results on audio-visual emotion recognition, comparing early and late data fusion techniques.
Conference Paper
Micro-expressions are short, involuntary facial expressions which reveal hidden emotions. Micro-expressions are important for understanding humans' deceitful behavior. Psychologists have been studying them since the 1960s. Currently, attention is elevated both in academic fields and in the media. However, while general facial expression recognition (FER) has been intensively studied for years in computer vision, little research has been done on automatically analyzing micro-expressions. The biggest obstacle to date has been the lack of a suitable database. In this paper, we present a novel Spontaneous Micro-expression Database, SMIC, which includes 164 micro-expression video clips elicited from 16 participants. Micro-expression detection and recognition performance are provided as baselines. SMIC provides sufficient source material for comprehensive testing of automatic systems for analyzing micro-expressions, which has not been possible with any previously published database.
Article
Model-based vision is firmly established as a robust approach to recognizing and locating known rigid objects in the presence of noise, clutter, and occlusion. It is more problematic to apply model-based methods to images of objects whose appearance can vary, though a number of approaches based on the use of flexible templates have been proposed. The problem with existing methods is that they sacrifice model specificity in order to accommodate variability, thereby compromising robustness during image interpretation. We argue that a model should only be able to deform in ways characteristic of the class of objects it represents. We describe a method for building models by learning patterns of variability from a training set of correctly annotated images. These models can be used for image search in an iterative refinement algorithm analogous to that employed by Active Contour Models (Snakes). The key difference is that our Active Shape Models can only deform to fit the data in ways consistent with the training set. We show several practical examples where we have built such models and used them to locate partially occluded objects in noisy, cluttered images.
Article
Our goal is to reveal temporal variations in videos that are difficult or impossible to see with the naked eye and display them in an indicative manner. Our method, which we call Eulerian Video Magnification, takes a standard video sequence as input, and applies spatial decomposition, followed by temporal filtering to the frames. The resulting signal is then amplified to reveal hidden information. Using our method, we are able to visualize the flow of blood as it fills the face and also to amplify and reveal small motions. Our technique can run in real time to show phenomena occurring at the temporal frequencies selected by the user.
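A rough sketch of the Eulerian recipe described above, simplified to a single spatial scale (the published method uses a multi-scale pyramid): spatially low-pass each frame, temporally band-pass each pixel, amplify, and add back; the filter settings are illustrative assumptions:

```python
# Rough, single-scale sketch of Eulerian magnification: spatial low-pass
# per frame, temporal band-pass at every pixel, amplify, add back.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import butter, filtfilt

def magnify(frames, fps=30.0, lo=0.8, hi=3.0, alpha=20.0):
    """frames: (T, H, W) float video -> amplified video."""
    blurred = np.stack([gaussian_filter(f, sigma=4) for f in frames])
    b, a = butter(2, [lo / (fps / 2), hi / (fps / 2)], btype="band")
    bandpassed = filtfilt(b, a, blurred, axis=0)  # temporal filtering
    return frames + alpha * bandpassed            # amplify and add back

video = np.random.rand(90, 64, 64)
print(magnify(video).shape)  # (90, 64, 64)
```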
Article
Researchers interested in emotion have long struggled with the problem of how to elicit emotional responses in the laboratory. In this article, we summarise five years of work to develop a set of films that reliably elicit each of eight emotional states (amusement, anger, contentment, disgust, fear, neutral, sadness, and surprise). After evaluating over 250 films, we showed selected film clips to an ethnically diverse sample of 494 English-speaking subjects. We then chose the two best films for each of the eight target emotions based on the intensity and discreteness of subjects' responses to each film. We found that our set of 16 films successfully elicited amusement, anger, contentment, disgust, sadness, surprise, a relatively neutral state, and, to a lesser extent, fear. We compare this set of films with another set recently described by Philippot (1993), and indicate that detailed instructions for creating our set of film stimuli will be provided on request.
Conference Paper
In object recognition, a robust feature set is considered an important component in almost all approaches proposed in the literature. In facial analysis, one of the best-known feature sets is based on Local Binary Patterns (LBP), which extract the information contained in the image using comparisons between pixels in a region; such comparisons are finally encoded in the form of a histogram. We argue that this kind of encoding is statistically unstable and can lead to errors during the recognition process, especially in noisy and low-resolution images, where the information contained in the image is not enough to generate a statistically robust histogram. In this paper, we propose a new method to encode Local Binary Patterns using a re-parametrization of the second-order local Gaussian jet, which generates more robust and reliable histograms suitable for different facial analysis tasks. We show that our method can be used for recognizing micro-expressions, with competitive performance on the Spontaneous Micro-expression Corpus (SMIC) and the YORK Deception Detection Test.