Figures
Figure 5 (uploaded by Kevin Bailly): Examples of images extracted from the DISFA dataset.
Source publication
Article
Full-text available
The ability to automatically infer emotional states, engagement, depression or pain from nonverbal behavior has recently become of great interest in research and industry, opening the way to a wide range of applications in robotics, biometrics, marketing and medicine. The Facial Action Coding System (FACS) proposed by...

Similar publications

Article
Full-text available
Using one of the key bibliometric methods, the citation index, drawn from a comprehensive multidisciplinary bibliographic database, Web of Science, this article provides a circumscribed descriptive analysis of the 1000 most-cited papers in the research field of visible nonverbal behavior. Using this method, we outline the most influenti...
Article
Full-text available
Researchers have theoretically proposed that humans decode other individuals' emotions or elementary cognitive appraisals from particular sets of facial action units (AUs). However, only a few empirical studies have systematically tested the relationships between the decoding of emotions/appraisals and sets of AUs, and the results are mixed. Furthe...

Citations

... In practice, positive examples of certain AUs are scarce, because they are rarely activated in natural facial expressions (such as AU9 or AU20). This has to be taken into consideration to avoid "overfitting on the training data" [411]. ...
... It has been shown that temporal characteristics (cues) of posed and spontaneous smiles, such as frame rate, morphology, configuration, speed of activation, timing and total duration, co-occurrence, trajectory, and asymmetry [486], are fundamental factors in distinguishing between the two classes. Furthermore, the difference between frustrated and delighted smiling faces showed the importance of analysing the temporal pattern of AU12 in different affective states [411] [173]. Frowning faces frequently present a subtle appearance and a dynamic consecutive contraction of the brow muscles during an AU4 episode [246]. ...
... Automatic detection of AU events (temporal segments) and intensity levels is a generally challenging problem [267] for the following reasons. First, facial AU events can occur at different time scales [115], for example very short activations, long ones, or none at all [411]. Second, the expression can originate from different affective states [173]. ...
... In practice, positive examples of certain AUs are scarce, because they are rarely activated in natural facial expressions (such as AU9 or AU20). This has to be taken into consideration to avoid 'overfitting on the training data' [12]. Finally, performance is adversely affected by factors such as registration errors, low intensity of facial expressions, noise and occlusions, time delay, age progression, face size, mood and behaviour, scale and orientation, motion blur, gender, ethnicity, facial hair, recording environment, permanent furrows, decorations, accessories and skin marks, make-up, glasses, piercings, tattoos, beards and scars, which can occlude or obscure the face [13,14,17] [18,19]. ...
Full-text available
Article
Action Unit activation involves a set of local, individual facial muscle movements that unfold over time to constitute a natural facial expression event, so AU occurrence detection can be framed as recognising the temporally consecutive, evolving movements of these parts. Detecting AUs automatically is attractive because it considers both static and dynamic facial features. Our work makes three contributions: first, we extracted features with Local Binary Patterns, Local Phase Quantisation, and the dynamic texture descriptor LPQTOP, combined with two network models leveraged from different CNN architectures for local deep visual learning in AU image analysis. Second, the LPQTOP feature vector is cascaded with a Long Short-Term Memory network to code longer-term temporal information; we found that stacking an LSTM on top of a CNN is important for learning temporal information, combining the spatial and temporal schemes simultaneously. We also hypothesised that an unsupervised Slow Feature Analysis method can extract invariant information from dynamic textures. Third, we compared continuous scoring predictions between LPQTOP with SVM, LPQTOP with LSTM, and AlexNet. A substantial comparative performance evaluation was carried out on the Enhanced CK dataset. Overall, the results indicate that the CNN is very promising and surpassed all other methods.
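A minimal PyTorch sketch of the general CNN-then-LSTM stacking this abstract describes, where a small convolutional backbone extracts per-frame spatial features and an LSTM codes the longer-term temporal information; layer sizes and the per-frame AU head are illustrative assumptions, not the authors' exact LPQTOP/AlexNet pipeline.

```python
# Sketch: per-frame CNN features fed to an LSTM for temporal AU scoring.
# Layer sizes and the AU head are illustrative assumptions.
import torch
import torch.nn as nn

class CnnLstmAU(nn.Module):
    def __init__(self, n_aus=12, feat_dim=128, hidden=64):
        super().__init__()
        # Small convolutional backbone applied to each frame independently.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # LSTM codes the temporal evolution of the per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_aus)  # per-frame AU activation scores

    def forward(self, clips):                 # clips: (B, T, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)
        return self.head(seq)                 # (B, T, n_aus)

scores = CnnLstmAU()(torch.randn(2, 30, 1, 64, 64))
print(scores.shape)  # torch.Size([2, 30, 12])
```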
... Many works combine various types of representations to enhance system performance, e.g., geometric representations in conjunction with appearance representations [128,129,164,182], holistic plus local representations [8,75], wavelet-analysis plus histogram representations [123,233], and low-level plus high-level representations [34]. Eleftheriadis et al. [50,51] proposed a multi-conditional latent variable model that introduces topological and relational constraints, encoding AU dependencies at both the feature and the model level into the proposed manifold learning for AU recognition. ...
... The second group of studies provides regression-based schemes for AU intensity estimation, where the dependent variables are the five intensity levels and the independent variables are either facial features or decision scores, using methods such as Support Vector Regression (SVR) [71,154], Sparse Regression [125], Relevance Vector Regression [81], and Gaussian kernel regression [128]. Regression-based methods have also been exploited for the 3D facial modality [156,235]. ...
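As a concrete illustration of the regression-based scheme mentioned in the snippet above, here is a minimal scikit-learn sketch that maps precomputed facial features to a continuous AU intensity with Support Vector Regression; the feature vectors and labels are synthetic placeholders, not data from any of the cited works.

```python
# Minimal regression-based AU intensity estimation with SVR (sketch).
# Features and intensity labels are synthetic placeholders.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 136))       # e.g. stacked landmark coordinates
y_train = rng.uniform(0, 5, size=200)       # AU intensity on the 0-5 scale

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)

X_test = rng.normal(size=(5, 136))
print(np.clip(model.predict(X_test), 0, 5))  # clip to the valid intensity range
```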
Full-text available
Article
The Facial Action Coding System (FACS) is the most influential sign-judgment method for facial behavior; it is a comprehensive, anatomically based system that can encode various facial movements as combinations of basic Action Units (AUs). AUs define certain facial configurations caused by the contraction of one or more facial muscles, and they are independent of the interpretation of emotions. However, automatic facial action unit recognition remains challenging due to several open questions such as rigid and non-rigid facial motions, multiple-AU detection, intensity estimation and application in naturalistic contexts. This paper introduces recent advances in automatic facial action unit recognition, focusing on the fundamental components of face detection and registration, facial action representation, feature selection and classification. A comprehensive analysis of facial representations is presented according to the properties of the facial data (image and video, 2D and 3D) and the characteristics of facial features (predesigned and learned, appearance and geometry, hybrid and fusion). Facial action unit recognition involves AU occurrence detection, AU temporal segment detection and AU intensity estimation. We discuss the role of each component, the main techniques with their characteristics, and the challenges and potential directions of facial action unit analysis.
... There exist several methods for feature extraction from images. In [33], the authors proposed an iterative nonlinear feature selection method, Iterative Regularized Kernel Regression (IRKR), combined with a Lasso-regularized version of the MLKR formulation for real-time Action Unit (AU) intensity prediction. The authors of [6] used video cues from the BlackDog dataset [1] for emotion classification. ...
Full-text available
Article
Emotions are spontaneous feelings that are accompanied by fluctuations in facial muscles, which leads to facial expressions. Categorization of these facial expressions as one of the seven basic emotions (happy, sad, anger, disgust, fear, surprise, and neutral) is the intention behind Emotion Recognition. This is a difficult problem because of the complexity of human expressions, but it is gaining immense popularity due to its vast number of applications, such as predicting behavior. Using deeper architectures has enabled researchers to achieve state-of-the-art performance in emotion recognition. Motivated by the aforementioned discussion, in this paper we propose a model named PRATIT, used for facial expression recognition, that uses specific image preprocessing steps and a Convolutional Neural Network (CNN) model. In PRATIT, preprocessing techniques such as grayscaling, cropping, resizing, and histogram equalization have been used to handle variations in the images. CNNs accomplish better accuracy with larger datasets, but there are no openly accessible datasets with adequate information for emotion recognition...
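The preprocessing chain named above (grayscaling, cropping, resizing, histogram equalization) can be sketched with OpenCV as follows; the crop box and the 48x48 target size are illustrative assumptions, and in a real pipeline the crop would come from a face detector.

```python
# Sketch of the grayscale -> crop -> resize -> histogram-equalization chain.
# The crop box and 48x48 target size are assumptions for illustration.
import cv2
import numpy as np

def preprocess(img_bgr, face_box, size=(48, 48)):
    x, y, w, h = face_box                       # face box from any detector
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    face = gray[y:y + h, x:x + w]               # crop to the face region
    face = cv2.resize(face, size)               # normalize spatial resolution
    return cv2.equalizeHist(face)               # reduce illumination variation

dummy = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
print(preprocess(dummy, (100, 60, 120, 120)).shape)  # (48, 48)
```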
... AU intensity is quantified into 6 discrete ordinal levels, i.e., Neutral < Trace < Slight < Pronounced < Extreme < Maximum. Prior works on AUs can be divided into two groups, i.e., AU detection [1], [9], [10], [16], [18], [33], [45], [57] and AU intensity estimation [25], [27], [47], [51], [63]. The former task focuses on identifying the existence of AUs, while the latter focuses on estimating detailed AU intensity. ...
Full-text available
Article
Facial action units (AUs) are defined to depict movements of facial muscles, which are the basic elements used to encode facial expressions. Automatic AU intensity estimation is an important task in affective computing. Previous works leverage the representation power of deep neural networks (DNNs) to improve the performance of intensity estimation. However, a large number of intensity annotations are required to train DNNs that contain millions of parameters, and it is expensive and difficult to build a large-scale database with AU intensity annotations since AU annotation requires annotators to have strong domain expertise. We propose a novel semi-supervised deep convolutional network that leverages extremely limited AU annotations for AU intensity estimation. It requires only intensity annotations of keyframes of the training sequences. Domain knowledge on AUs is leveraged to provide weak supervisory information, including relative appearance similarity, temporal intensity ordering, facial symmetry, and contrastive appearance difference. We also propose a strategy to train a model for joint intensity estimation of multiple AUs under the semi-supervised learning setting, which greatly improves efficiency during inference. We perform empirical experiments on two public benchmark expression databases and make comparisons with state-of-the-art methods to demonstrate the effectiveness of the proposed method.
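One of the weak supervision cues listed above, temporal intensity ordering, can be illustrated with a simple ranking-style penalty: between two annotated keyframes whose intensities increase, intermediate predictions are penalized when they decrease. The sketch below is a generic hinge formulation of that idea, not the paper's exact loss.

```python
# Sketch of a temporal-ordering penalty for frames between two keyframes
# whose annotated intensities increase (a generic hinge, not the paper's loss).
import torch

def ordering_loss(frame_preds, margin=0.0):
    """frame_preds: (T,) predicted intensities for a segment assumed to be
    monotonically increasing between two annotated keyframes."""
    diffs = frame_preds[1:] - frame_preds[:-1]        # should be >= 0
    return torch.clamp(margin - diffs, min=0).mean()  # penalize decreases

preds = torch.tensor([0.1, 0.3, 0.25, 0.6], requires_grad=True)
loss = ordering_loss(preds)
loss.backward()
print(loss.item())
```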
... Research on automatic facial behavior analysis has mainly focused on the fully-supervised setting. In the specific problems of Action Unit and Pain Intensity Estimation, recent works have developed models based on HCORF [9], Metric Learning [31], Convolutional Neural Networks [11] or Gaussian Processes [32], among others. However, as discussed in Sec I, supervised models are limited in this context because they involve laborious data labelling. ...
Article
We propose a Multi-Instance Learning (MIL) approach for weakly-supervised learning problems, where the training set is formed by bags (sets of feature vectors, or instances) and only bag-level labels are provided. Specifically, we consider the Multi-Instance Dynamic Ordinal Regression (MI-DOR) setting, where instance labels are naturally represented as ordinal variables and bags are structured as temporal sequences. To this end, we propose Multi-Instance Dynamic Ordinal Random Fields (MI-DORF). In this framework, we treat instance labels as temporally dependent latent variables in an undirected graphical model. Different MIL assumptions are modelled via newly introduced high-order potentials relating bag and instance labels within the energy function of the model. We also extend our framework to address partially observed MI-DOR problems, where a subset of instance labels is available during training. We show on the tasks of weakly-supervised facial behavior analysis, Facial Action Unit (DISFA dataset) and Pain (UNBC dataset) intensity estimation, that the proposed framework outperforms alternative learning approaches. Furthermore, we show that MI-DORF can be employed to substantially reduce the data annotation effort in this context.
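A common assumption in this weakly-supervised setting is that the bag (sequence) label corresponds to the maximum of the latent per-frame labels. The sketch below shows only that max-assumption as a simple differentiable training signal; it is not the undirected graphical model with high-order potentials that the abstract describes.

```python
# Sketch of the MIL max-assumption for sequence-level ordinal labels:
# the sequence (bag) label is tied to the maximum per-frame prediction.
import torch
import torch.nn.functional as F

def bag_loss(frame_logits, bag_label, n_levels=6):
    """frame_logits: (T, n_levels) per-frame ordinal class scores.
    bag_label: scalar ordinal label observed only at sequence level."""
    frame_intensity = frame_logits.softmax(-1) @ torch.arange(
        n_levels, dtype=frame_logits.dtype)        # expected per-frame level
    bag_pred = frame_intensity.max()               # MIL max assumption
    return F.mse_loss(bag_pred, torch.tensor(float(bag_label)))

logits = torch.randn(30, 6, requires_grad=True)
loss = bag_loss(logits, bag_label=3)
loss.backward()
print(loss.item())
```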
... In [11], angles and distances are computed using the landmarks detected in the face, and Gabor filters are used to describe the face texture. Likewise, [12] and [13] compute a set of geometric features from landmarks and use Histograms of Oriented Gradients (HOG) as appearance features. In contrast, [14] only computes Scale Invariant Feature Transform (SIFT) features at landmarks. ...
... In contrast, many models have been proposed for AU intensity estimation. For example, [12] used regression, [13] applied metric learning, [11] employed kernel subclass discriminant analysis, [19] used ordinal regression, and [2] applied multiclass classification. ...
Full-text available
Conference Paper
This paper presents a unified convolutional neural network (CNN), named AUMPNet, to perform both Action Unit (AU) detection and intensity estimation on facial images with multiple poses. Although there is a variety of methods in the literature designed for facial expression analysis, only a few of them can handle head pose variations. Therefore, it is essential to develop new models that work on non-frontal face images, for instance those obtained from unconstrained environments. In order to cope with the problems raised by pose variations, a single CNN based on region and multitask learning is proposed for both the AU detection and intensity estimation tasks. Also, the available head pose information was added to the multitask loss as a constraint on the network optimization, pushing the network towards learning better representations. As opposed to current approaches that require ad hoc models for every single AU in each task, the proposed network simultaneously learns AU occurrence and intensity levels for all AUs. AUMPNet was evaluated on an extended version of the BP4D-Spontaneous database, which was synthesized into nine different head poses and made available to FG 2017 Facial Expression Recognition and Analysis Challenge (FERA 2017) participants. The achieved results surpass the FERA 2017 baseline, using the challenge metrics, by 0.054 in F1-score for AU detection and by 0.182 in ICC(3,1) for intensity estimation.
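The multitask idea described above, a shared representation feeding AU occurrence, AU intensity and head pose heads, with the pose term acting as a constraint on the optimization, can be sketched as a weighted sum of per-task losses; the heads, weights and feature dimensions are illustrative assumptions, not the FERA 2017 model.

```python
# Sketch of a multitask objective combining AU detection, AU intensity
# and a head-pose constraint (heads and weights are illustrative).
import torch
import torch.nn as nn

n_aus, n_poses, feat = 10, 9, 256
shared = nn.Sequential(nn.Linear(512, feat), nn.ReLU())   # shared representation
det_head = nn.Linear(feat, n_aus)                         # AU occurrence logits
int_head = nn.Linear(feat, n_aus)                         # AU intensity (0-5)
pose_head = nn.Linear(feat, n_poses)                      # pose as auxiliary task

x = torch.randn(8, 512)                                   # placeholder CNN features
occ = torch.randint(0, 2, (8, n_aus)).float()
inten = torch.rand(8, n_aus) * 5
pose = torch.randint(0, n_poses, (8,))

h = shared(x)
loss = (nn.functional.binary_cross_entropy_with_logits(det_head(h), occ)
        + nn.functional.mse_loss(int_head(h), inten)
        + 0.5 * nn.functional.cross_entropy(pose_head(h), pose))  # pose constraint
loss.backward()
print(loss.item())
```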
... These levels of intensity are also important in analyzing facial expression dynamics (changes of intensity over time), whose goal is to detect the onset, apex and offset of facial expressions [1], [2]. Detection of AUs can be handled using geometric features from facial fiducial points (landmarks) [11], appearance features [2], [12], or a combination of both [1], [13]. ...
... After feature extraction, they used Laplacian Eigenmaps and Principal Component Analysis (PCA) for dimensionality reduction. For the prediction of smile intensity they evaluated Support Vector Regression (SVR), binary Support Vector Machines (SVMs), and multi-... Nicolle et al. [13] worked on facial action unit prediction using shape and appearance features. The shape features were extracted from 49 landmarks. ...
... Face landmarks are prone to tracking failures [13]; because of this, we evaluate a set of appearance descriptors for smile intensity estimation. In this section we describe our approach in three steps: preprocessing, feature extraction and estimation. ...
Full-text available
Conference Paper
Fig. 1: Overview of our method for smile intensity estimation.
Facial expression analysis is an important field of research, mostly because of the rich information faces can provide. The majority of works published in the literature have focused on facial expression recognition, and so far estimating facial expression intensities has not received the same attention. The analysis of these intensities could improve face processing applications in distinct areas, such as computer-assisted health care, human-computer interaction and biometrics. Because the smile is the most common expression, studying its intensity is a first step towards estimating the intensities of other expressions. Most related works are based on facial landmarks, sometimes combined with appearance features around these points, to estimate smile intensities. Relying on landmarks can lead to wrong estimations due to errors in the registration step. In this work we investigate a landmark-free approach for smile intensity estimation using appearance features from a grid division of the face. We tested our approach on two different databases, one with spontaneous expressions (BP4D) and the other with posed expressions (BU-3DFE); results are compared to state-of-the-art works in the field. Our method shows competitive results even using only appearance features on spontaneous facial expression intensities, but we found that there is still a need for further investigation on posed expressions.
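A minimal sketch of the landmark-free idea the abstract describes: divide the face crop into a regular grid, describe each cell with an appearance feature (a simple intensity histogram stands in here for the descriptors used in the paper), concatenate the cells, and regress the smile intensity. Data and descriptor choice are placeholders.

```python
# Sketch: grid-based appearance features (per-cell intensity histograms
# as a stand-in descriptor) followed by a regressor for smile intensity.
import numpy as np
from sklearn.svm import SVR

def grid_features(face, grid=4, bins=16):
    """face: 2D grayscale face crop; returns concatenated per-cell histograms."""
    h, w = face.shape
    cells = []
    for i in range(grid):
        for j in range(grid):
            cell = face[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(cell, bins=bins, range=(0, 256), density=True)
            cells.append(hist)
    return np.concatenate(cells)

rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(50, 64, 64))   # placeholder face crops
X = np.stack([grid_features(f) for f in faces])
y = rng.uniform(0, 5, size=50)                    # placeholder intensity labels
print(SVR().fit(X, y).predict(X[:3]))
```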
Chapter
Facial Action Unit (AU) detection is essential for automatic emotion analysis. However, most AU detection research focuses on real-valued deep neural networks, which involve heavy multiplication operations and require high computation and memory. In this paper, we propose binary convolutional neural networks for AU detection. The main contributions of our work include the following two aspects: (1) we propose a multi-layer decision fusion strategy (MDFS), which combines multiple prediction channels to produce the final result; (2) MDFS is encoded with binary parameters and activations (BMDFS) instead of real values to reduce memory cost and accelerate convolutional operations. Our proposed method is evaluated on the BP4D database. MDFS and BMDFS obtain overall F1 scores of 65.5% and 54.8%, respectively. Moreover, BMDFS achieves a 32× memory saving and a 4.6× speed-up compared to real-valued models. Experimental results show that our approach achieves promising performance.