Conference Paper

Double Complete D-LBP with Extreme Learning Machine Auto-Encoder and Cascade Forest for Facial Expression Analysis

... The random forest has been employed mostly as a classifier for facial expression features; Figure 13 illustrates the application of random forest to FER. A random forest was recently used to classify facial expression features selected by an Extreme Learning Machine Auto-Encoder (ELM-AE) from double complete LBP features [197]. [198] proposed an extension of random forest termed Pair-wise Conditional Random Forest (PCRF). ...
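As a rough, hypothetical illustration of such a pipeline, the sketch below extracts uniform LBP histograms from grayscale face crops and classifies them with a random forest. The ELM-AE feature-selection stage of [197] is omitted here (a separate ELM-AE sketch appears after the ELM-AE abstract further down), and the parameter values, helper names and the use of scikit-image/scikit-learn are illustrative assumptions, not details taken from the cited work.

```python
# Hypothetical sketch only: LBP-histogram features classified by a random forest.
# The ELM-AE feature-selection stage of the cited work is omitted, and all
# parameter values are illustrative.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

def lbp_histogram(gray_face, P=8, R=1):
    """Uniform LBP histogram of one grayscale face crop."""
    codes = local_binary_pattern(gray_face, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# faces: list of grayscale face crops, labels: expression labels (assumed given)
# X = np.array([lbp_histogram(f) for f in faces])
# clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```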
Article
Full-text available
Facial Expression Recognition (FER) is presently the aspect of cognitive and affective computing with the most attention and popularity, aided by its vast application areas. Several studies have been conducted on FER, and many review works are also available. The existing FER review works only give an account of FER models capable of predicting the basic expressions. None of the works considers intensity estimation of an emotion, nor do they include studies that address data annotation inconsistencies and correlation among labels. This work first introduces some identified FER application areas and provides a discussion of recognised FER challenges. We then provide a comprehensive FER review under three different machine learning problem definitions: Single Label Learning (SLL), which presents FER as a multiclass problem; Multilabel Learning (MLL), which resolves the ambiguous nature of FER; and Label Distribution Learning (LDL), which recovers the distribution of emotions in FER data annotation. We also include studies on expression intensity estimation from the face. Furthermore, popularly employed FER models are thoroughly and carefully discussed across handcrafted, conventional machine learning and deep learning models. We finally itemise some recognised unresolved issues and suggest future research areas in the field.
... In many real-world classification problems, the number of samples available for one class is much smaller than for the other classes (He and Ma 2013; Shen et al. 2018). Such data sets are called imbalanced. ...
Article
Full-text available
Class-specific cost regulation extreme learning machine (CCR-ELM) can effectively deal with class imbalance problems. However, its key parameters, including the number of hidden nodes, the input weights, the biases and the tradeoff factors, are normally generated randomly or preset by hand. Moreover, the number of input weights and biases depends on the size of the hidden layer. An inappropriate number of hidden nodes may lead to useless or redundant neurons, make the whole structure complex, and even cause worse generalization and unstable classification performance. To address this, an adaptive CCR-ELM with a variable-length brain storm optimization algorithm is proposed for class imbalance learning. Each individual consists of all the above parameters of CCR-ELM, and its length varies with the number of hidden nodes. A novel mergence operator is presented to combine two parent individuals of different lengths and generate a new individual. Experimental results on nine imbalanced datasets show that the variable-length brain storm optimization algorithm can find better parameters for CCR-ELM, resulting in better classification accuracy than other evolutionary optimization algorithms such as GA, PSO, and VPSO. In addition, the classification performance of the proposed adaptive algorithm is relatively stable under varied imbalance ratios. Applying the proposed algorithm to the fault diagnosis of a conveyor belt also shows that ACCR-ELM with VLen-BSO has better classification performance.
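A minimal sketch of the core cost-regulated ELM idea described above, assuming the common cost-sensitive ELM solution in which minority-class samples receive a larger cost; the brain storm optimization of the parameters is not reproduced, and the cost values, activation and function names are illustrative assumptions rather than the paper's implementation.

```python
# Minimal cost-sensitive ELM sketch (binary case, labels in {+1, -1}).
# Minority-class rows are given a larger cost; all values are illustrative.
import numpy as np

def ccr_elm_train(X, t, n_hidden=100, cost_pos=10.0, cost_neg=1.0,
                  reg=1.0, rng=np.random.default_rng(0)):
    W_in = rng.uniform(-1, 1, (X.shape[1], n_hidden))   # random input weights
    b = rng.uniform(-1, 1, n_hidden)                     # random biases
    H = np.tanh(X @ W_in + b)                            # hidden-layer output
    c = np.where(t > 0, cost_pos, cost_neg)              # class-specific costs
    HtC = H.T * c                                        # H^T C with C diagonal
    beta = np.linalg.solve(HtC @ H + np.eye(n_hidden) / reg, HtC @ t)
    return W_in, b, beta

def ccr_elm_predict(X, W_in, b, beta):
    return np.sign(np.tanh(X @ W_in + b) @ beta)
```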
... For the FER task on the RAF-DB database, the proposed method's performance is compared with those of recently published methods in the literature [64]-[69], as shown in Table V. It can be seen that the method proposed in this paper achieves an average recognition accuracy of up to 81.83%, which outperforms the listed state-of-the-art methods, including a handcrafted-feature-based method [67], CNN-based methods [64], [68], [69], a capsule-based method [65], and a data-augmentation-based method [66]. Compared with the listed state-of-the-art methods, our cGAN-based approach is able to disentangle the facial expression factor by individually controlling the facial expressions and optimizing both synthesis and classification loss functions (LI, La, Lcyc, and Lexp), and thus achieves high accuracy in the FER task on the RAF-DB database. ...
Article
Full-text available
As an emerging research topic for Proximity Service (ProSe), automatic emotion recognition enables machines to understand the emotional changes of human beings, which can not only facilitate natural, effective, seamless, and advanced human-robot interaction (HRI) or human-computer interfaces (HCI), but also promote emotional health. Facial expression recognition (FER) is a vital task for emotion recognition. However, a significant gap between human and machine performance exists in the FER task. In this paper, we present a conditional generative adversarial network (cGAN) based approach to alleviate intra-class variations by individually controlling the facial expressions and learning generative and discriminative representations simultaneously. The proposed framework consists of a generator G and three discriminators (Di, Da, and Dexp). The generator G transforms any query face image into another prototypic facial expression image with other factors preserved. Armed with an action units (AUs) condition, the generator G pays more attention to information relevant to facial expression. Three loss functions (LI, La, and Lexp) corresponding to the three discriminators (Di, Da, and Dexp) were designed to learn generative and discriminative representations. Moreover, after rendering the generated expression back to its original facial expression, a cycle consistency loss is also applied to guarantee identity and produce more constrained visual representations. Optimized by combining both synthesis and classification loss functions, the learnt representation is explicitly disentangled from other variations such as identity, head pose and illumination. Qualitative and quantitative experimental results demonstrate that the proposed FER system is effective for expression recognition.
Preprint
Human emotion analysis has been the focus of many studies, especially in the field of Affective Computing, and is important for many applications, e.g. human-computer intelligent interaction, stress analysis, interactive games, and animation. Solutions for automatic emotion analysis have also benefited from the development of deep learning approaches and the availability of vast amounts of visual facial data on the internet. This paper proposes a novel method for human emotion recognition from a single RGB image. We construct a large-scale dataset of facial videos (FaceVid), rich in facial dynamics, identities, expressions, appearance and 3D pose variations. We use this dataset to train a deep Convolutional Neural Network for estimating expression parameters of a 3D Morphable Model and combine it with an effective back-end emotion classifier. Our proposed framework runs at 50 frames per second and is capable of robustly estimating parameters of 3D expression variation and accurately recognizing facial expressions from in-the-wild images. We present an extensive experimental evaluation showing that the proposed method outperforms the compared techniques in estimating the 3D expression parameters and achieves state-of-the-art performance in recognising the basic emotions from facial images, as well as recognising stress from facial videos.
Article
Full-text available
We propose a method for automatically classifying facial images based on labeled elastic graph matching, a 2D Gabor wavelet representation, and linear discriminant analysis.
Conference Paper
Full-text available
Recognition of human emotions from imaging templates is useful in a wide variety of human-computer interaction and intelligent systems applications. However, the automatic recognition of facial expressions using image template matching techniques suffers from the natural variability of facial features and recording conditions. In spite of the progress achieved in facial emotion recognition in recent years, an effective and computationally simple feature selection and classification technique for emotion recognition is still an open problem. In this paper, we propose an efficient and straightforward facial emotion recognition algorithm to reduce the problem of inter-class pixel mismatch during classification. The proposed method applies pixel normalization to remove intensity offsets, followed by a Min-Max metric in a nearest neighbor classifier that is capable of suppressing feature outliers. The results indicate an improvement in recognition performance from 92.85% to 98.57% for the proposed Min-Max classification method when tested on the JAFFE database. The proposed emotion recognition technique outperforms the existing template matching methods.
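The sketch below is only a plausible reading of that pipeline: per-image pixel normalisation followed by a nearest-neighbour rule with a min/max ratio similarity (sum of element-wise minima over sum of element-wise maxima). The exact Min-Max metric of the paper may differ, and the function names are invented for illustration.

```python
# Illustrative pixel-normalised nearest-neighbour classifier with a
# min/max ratio similarity; assumes the sum(min)/sum(max) form of the metric.
import numpy as np

def normalize(img):
    """Remove the intensity offset and rescale to [0, 1] per image."""
    img = img.astype(float) - img.min()
    return img / (img.max() + 1e-12)

def min_max_similarity(a, b):
    return np.minimum(a, b).sum() / (np.maximum(a, b).sum() + 1e-12)

def classify(query, templates, labels):
    """Return the label of the most similar stored template."""
    q = normalize(query).ravel()
    sims = [min_max_similarity(q, normalize(t).ravel()) for t in templates]
    return labels[int(np.argmax(sims))]
```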
Article
Full-text available
As the expressive depth of an emotional face differs across individuals, expressions, and situations, recognizing an expression from a single facial image captured at one moment is difficult. One approach to alleviate this difficulty is a video-based method that utilizes multiple frames to extract temporal information between facial expression images. In this paper, we attempt to utilize a generative image that is estimated from a given single image. We then propose to utilize a contrastive representation that explains an expression difference for discriminative purposes. The contrastive representation is calculated at the embedding layer of a deep network by comparing a single given image with a reference sample generated by a deep encoder-decoder network. Consequently, we deploy deep neural networks that embed a combination of a generative model, a contrastive model, and a discriminative model. In our proposed networks, we disentangle the facial expressive factor in two steps: learning of a reference generator network and learning of a contrastive encoder network. We conducted extensive experiments on three publicly available facial expression databases (CK+, MMI, and Oulu-CASIA) that have been widely adopted in the recent literature. The proposed method outperforms the known state-of-the-art methods in terms of recognition accuracy.
Article
Full-text available
Automated Facial Expression Recognition (FER) has remained a challenging and interesting problem. Despite efforts made in developing various methods for FER, existing approaches traditionally lack generalizability when applied to unseen images or those captured in the wild. Most of the existing approaches are based on engineered features (e.g. HOG, LBPH, and Gabor), where the classifier's hyperparameters are tuned to give the best recognition accuracies on a single database or a small collection of similar databases. Nevertheless, the results are not significant when they are applied to novel data. This paper proposes a deep neural network architecture to address the FER problem across multiple well-known standard face datasets. Specifically, our network consists of two convolutional layers, each followed by max pooling, and then four Inception layers. The network is a single-component architecture that takes registered facial images as the input and classifies them into either of the six basic or the neutral expressions. We conducted comprehensive experiments on seven publicly available facial expression databases, viz. MultiPIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013. The results of the proposed architecture are comparable to or better than the state-of-the-art methods and better than traditional convolutional neural networks in both accuracy and training time.
Article
Full-text available
Collecting richly annotated, large datasets representing real-world conditions is a challenging task. With the progress in computer vision research, researchers have developed robust human facial-expression analysis solutions, but largely only for tightly controlled environments. Facial expressions are the visible facial changes in response to a person's internal affective state, intention, or social communication. Automatic facial-expression analysis has been an active research field for more than a decade, with applications in affective computing, intelligent environments, lie detection, psychiatry, emotion and paralinguistic communication, and multimodal human-computer interface (HCI).
Article
Full-text available
Two large facial-expression databases depicting challenging real-world conditions were constructed using a semi-automatic approach via a recommender system based on subtitles.
Conference Paper
Full-text available
In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) While AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed, 2) The lack of a common performance metric against which to evaluate new algorithms, and 3) Standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded and emotion labels have been revised and validated. In addition to this, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks will be made available July 2010.
Article
Full-text available
Automatic facial expression analysis is an interesting and challenging problem, and impacts important applications in many areas such as human–computer interaction and data-driven animation. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. In this paper, we empirically evaluate facial representation based on statistical local features, Local Binary Patterns, for person-independent facial expression recognition. Different machine learning methods are systematically examined on several databases. Extensive experiments illustrate that LBP features are effective and efficient for facial expression recognition. We further formulate Boosted-LBP to extract the most discriminant LBP features, and the best recognition performance is obtained by using Support Vector Machine classifiers with Boosted-LBP features. Moreover, we investigate LBP features for low-resolution facial expression recognition, which is a critical problem but seldom addressed in the existing work. We observe in our experiments that LBP features perform stably and robustly over a useful range of low resolutions of face images, and yield promising performance in compressed low-resolution video sequences captured in real-world environments.
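A simplified sketch of the region-based LBP representation this abstract builds on, without the boosting step: the face is divided into a grid of cells, per-cell LBP histograms are concatenated, and the descriptor is fed to an SVM. The grid size and SVM settings are illustrative assumptions, not the paper's configuration.

```python
# Rough sketch: grid of per-region uniform-LBP histograms + SVM (no boosting).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def region_lbp_descriptor(gray_face, grid=(7, 6), P=8, R=1):
    codes = local_binary_pattern(gray_face, P, R, method="uniform")
    hists = []
    for row in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(row, grid[1], axis=1):
            h, _ = np.histogram(cell, bins=P + 2, range=(0, P + 2), density=True)
            hists.append(h)
    return np.concatenate(hists)

# X = np.array([region_lbp_descriptor(f) for f in faces])
# clf = SVC(kernel="rbf", C=10.0).fit(X, labels)   # values illustrative
```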
Conference Paper
Full-text available
Quality data recorded in varied realistic environments is vital for effective human face related research. Currently available datasets for human facial expression analysis have been generated in highly controlled lab environments. We present a new static facial expression database Static Facial Expressions in the Wild (SFEW) extracted from a temporal facial expressions database Acted Facial Expressions in the Wild (AFEW) [9], which we have extracted from movies. In the past, many robust methods have been reported in the literature. However, these methods have been experimented on different databases or using different protocols within the same databases. The lack of a standard protocol makes it difficult to compare systems and acts as a hindrance in the progress of the field. Therefore, we propose a person independent training and testing protocol for expression recognition as part of the BEFIT workshop. Further, we compare our dataset with the JAFFE and Multi-PIE datasets and provide baseline results.
Preprint
Full-text available
Two art exhibitions, “Training Humans” and “Making Faces,” and the accompanying essay “Excavating AI: The politics of images in machine learning training sets” by Kate Crawford and Trevor Paglen, are making substantial impact on discourse taking place in the social and mass media networks, and some scholarly circles. Critical scrutiny reveals, however, a self-contradictory stance regarding informed consent for the use of facial images, as well as serious flaws in their critique of ML training sets. Our analysis underlines the non-negotiability of informed consent when using human data in artistic and other contexts, and clarifies issues relating to the description of ML training sets.
Article
Full-text available
In this correspondence, a completed modeling of the local binary pattern (LBP) operator is proposed and an associated completed LBP (CLBP) scheme is developed for texture classification. A local region is represented by its center pixel and a local difference sign-magnitude transform (LDSMT). The center pixels represent the image gray level and they are converted into a binary code, namely CLBP-Center (CLBP_C), by global thresholding. LDSMT decomposes the image local differences into two complementary components: the signs and the magnitudes, and two operators, namely CLBP-Sign (CLBP_S) and CLBP-Magnitude (CLBP_M), are proposed to code them. The traditional LBP is equivalent to the CLBP_S part of CLBP, and we show that CLBP_S preserves more information of the local structure than CLBP_M, which explains why the simple LBP operator can extract the texture features reasonably well. By combining CLBP_S, CLBP_M, and CLBP_C features into joint or hybrid distributions, significant improvement can be made for rotation invariant texture classification.
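To make the decomposition concrete, the following toy sketch computes the three CLBP codes for a single pixel's circular neighbourhood, following the description above; the thresholds passed in (mean magnitude, global image mean) follow the abstract, while the helper name and interface are invented for illustration.

```python
# Toy CLBP decomposition for one pixel's P circular neighbours.
import numpy as np

def clbp_codes(neighbours, centre, mag_threshold, image_mean):
    """neighbours: P neighbour grey levels; centre: centre pixel grey level."""
    d = neighbours.astype(float) - centre           # local differences (LDSMT)
    s = (d >= 0).astype(int)                        # sign component
    m = (np.abs(d) >= mag_threshold).astype(int)    # magnitude component
    weights = 1 << np.arange(len(neighbours))       # binomial weights 1, 2, 4, ...
    clbp_s = int((s * weights).sum())               # equals the classic LBP code
    clbp_m = int((m * weights).sum())
    clbp_c = int(centre >= image_mean)              # centre coded by global threshold
    return clbp_s, clbp_m, clbp_c
```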
Conference Paper
Full-text available
In the last decade, the research topic of automatic analysis of facial expressions has become a central topic in machine vision research. Nonetheless, there is a glaring lack of a comprehensive, readily accessible reference set of face images that could be used as a basis for benchmarks for efforts in the field. This lack of easily accessible, suitable, common testing resource forms the major impediment to comparing and extending the issues concerned with automatic facial expression analysis. In this paper, we discuss a number of issues that make the problem of creating a benchmark facial expression database difficult. We then present the MMI facial expression database, which includes more than 1500 samples of both static images and image sequences of faces in frontal and in profile view displaying various expressions of emotion, single and multiple facial muscle activation. It has been built as a Web-based direct-manipulation application, allowing easy access and easy search of the available images. This database represents the most comprehensive reference set of images for studies on facial expression analysis to date.
Conference Paper
Full-text available
A method for extracting information about facial expressions from images is presented. Facial expression images are coded using a multi-orientation multi-resolution set of Gabor filters which are topographically ordered and aligned approximately with the face. The similarity space derived from this representation is compared with one derived from semantic ratings of the images by human observers. The results show that it is possible to construct a facial expression classifier with Gabor coding of the facial images as the input stage. The Gabor representation shows a significant degree of psychological plausibility, a design feature which may be important for human-computer interfaces
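In the same spirit, the sketch below builds a small multi-orientation, multi-scale Gabor filter bank with OpenCV and stacks the filter responses of a face image; the kernel size, wavelengths and sigma heuristic are illustrative assumptions, not the parameters of the original paper.

```python
# Illustrative multi-orientation, multi-scale Gabor coding of a face image.
import cv2
import numpy as np

def gabor_code(gray_face, wavelengths=(4, 8, 16), n_orient=8):
    responses = []
    for lambd in wavelengths:
        for k in range(n_orient):
            theta = k * np.pi / n_orient            # filter orientation
            kern = cv2.getGaborKernel((31, 31), sigma=0.5 * lambd, theta=theta,
                                      lambd=lambd, gamma=0.5, psi=0)
            responses.append(cv2.filter2D(gray_face, cv2.CV_32F, kern))
    return np.stack(responses)                      # (scales * orientations, H, W)
```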
Conference Paper
In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive to deep neural networks in a broad range of tasks. In contrast to deep neural networks which require great effort in hyper-parameter tuning, gcForest is much easier to train; even when it is applied to different data across different domains in our experiments, excellent performance can be achieved by almost same settings of hyper-parameters. The training process of gcForest is efficient, and users can control training cost according to computational resource available. The efficiency may be further enhanced because gcForest is naturally apt to parallel implementation. Furthermore, in contrast to deep neural networks which require large-scale training data, gcForest can work well even when there are only small-scale training data.
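A minimal sketch of one cascade level of such a deep-forest scheme, assuming the usual construction in which each forest's out-of-fold class-probability vector is appended to the input features of the next level; the forest types, estimator counts and fold count are illustrative choices rather than gcForest's exact settings.

```python
# One illustrative cascade level: class-vector augmentation with two forests.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_level(X, y, n_estimators=200, cv=3):
    forests = [RandomForestClassifier(n_estimators=n_estimators),
               ExtraTreesClassifier(n_estimators=n_estimators)]
    augmented = [X]
    for forest in forests:
        # out-of-fold class probabilities serve as the learnt class vector
        proba = cross_val_predict(forest, X, y, cv=cv, method="predict_proba")
        augmented.append(proba)
        forest.fit(X, y)            # refit on all data for use at prediction time
    return np.hstack(augmented), forests
```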
Conference Paper
Expressions are facial activities invoked by sets of muscle motions, which would give rise to large variations in appearance mainly around facial parts. Therefore, for visual-based expression analysis, localizing the action parts and encoding them effectively become two essential but challenging problems. To take them into account jointly for expression analysis, in this paper, we propose to adapt 3D Convolutional Neural Networks (3D CNN) with deformable action parts constraints. Specifically, we incorporate a deformable parts learning component into the 3D CNN framework, which can detect specific facial action parts under the structured spatial constraints, and obtain the discriminative part-based representation simultaneously. The proposed method is evaluated on two posed expression datasets, CK+, MMI, and a spontaneous dataset FERA. We show that, besides achieving state-of-the-art expression recognition accuracy, our method also enjoys the intuitive appeal that the part detection map can desirably encode the mid-level semantics of different facial action parts.
Article
Data may often contain noise or irrelevant information which negatively affects the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms such as Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), random projection (RP) and auto-encoders (AE) is to reduce the noise or irrelevant information in the data. The features of PCA (eigenvectors) and a linear AE are not able to represent data as parts (e.g. the nose in a face image); on the other hand, NMF and non-linear AEs suffer from slow learning speed, and RP only represents a subspace of the original data. This paper introduces a dimension reduction framework which to some extent represents data as parts, has fast learning speed and learns the between-class scatter subspace. To this end, this paper investigates a linear and nonlinear dimension reduction framework referred to as the Extreme Learning Machine Auto-Encoder (ELM-AE) and the Sparse Extreme Learning Machine Auto-Encoder (SELM-AE). In contrast to the tied-weight auto-encoder (TAE), the hidden neurons in ELM-AE and SELM-AE need not be tuned; their parameters (e.g. input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on the USPS handwritten digit recognition dataset, the CIFAR-10 object recognition dataset and the NORB object recognition dataset show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time and Normalized Mean Square Error (NMSE).
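The following is a compact sketch of an ELM-AE of the kind described above, assuming orthogonalised random input weights and a closed-form ridge solution for the output weights; the activation, regularisation value and the projection through the transposed output weights follow common ELM-AE practice rather than the paper's exact recipe.

```python
# ELM-AE sketch: orthogonal random hidden layer, closed-form output weights,
# projection X @ beta.T as the reduced representation. Values illustrative;
# assumes n_hidden <= input dimension.
import numpy as np

def elm_ae_reduce(X, n_hidden=50, reg=1e-2, rng=np.random.default_rng(0)):
    d = X.shape[1]
    Q, _ = np.linalg.qr(rng.standard_normal((d, n_hidden)))  # orthogonal input weights
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ Q + b)                                    # random hidden mapping
    # beta solves the ridge-regularised reconstruction  H @ beta ~= X
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return X @ beta.T                                         # (n_samples, n_hidden)
```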
Article
Automatic recognition of facial expressions is an interesting and challenging research topic in the field of pattern recognition due to applications such as human-machine interface design and developmental psychology. Designing classifiers for facial expression recognition with high reliability is a vital step in this research. This paper presents a novel framework for person-independent expression recognition by combining multiple types of facial features via multiple kernel learning (MKL) in multiclass support vector machines (SVM). Existing MKL-based approaches jointly learn the same kernel weights with an l1-norm constraint for all binary classifiers, whereas our framework learns one kernel weight vector per binary classifier in the multiclass SVM with lp-norm constraints (p ≥ 1), which considers both sparse and non-sparse kernel combinations within MKL. We studied the effect of the lp-norm MKL algorithm for learning the kernel weights and empirically evaluated the recognition results of six basic facial expressions and neutral faces with respect to the value of p. In our experiments, we combined two popular facial feature representations, histogram of oriented gradient and local binary pattern histogram, with two kernel functions, the heavy-tailed radial basis function and the polynomial function. Our experimental results on the CK+, MMI and GEMEP-FERA face databases as well as our theoretical justification show that this framework outperforms the state-of-the-art methods and the SimpleMKL-based multiclass SVM for facial expression recognition.
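As a very reduced sketch of the idea of combining kernels over different feature types, the snippet below mixes an RBF kernel on HOG features with a polynomial kernel on LBP features using fixed weights and a precomputed-kernel SVM; the actual method learns one lp-norm-constrained weight vector per binary classifier, which is not reproduced here, and all weights and kernel parameters are illustrative.

```python
# Fixed-weight two-kernel combination for a precomputed-kernel SVM (training side).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.svm import SVC

def combined_kernel(F_hog, F_lbp, d=(0.6, 0.4), gamma=0.1, degree=2):
    K_hog = rbf_kernel(F_hog, gamma=gamma)           # kernel on HOG features
    K_lbp = polynomial_kernel(F_lbp, degree=degree)  # kernel on LBP features
    return d[0] * K_hog + d[1] * K_lbp

# K_train = combined_kernel(hog_train, lbp_train)    # square (n, n) Gram matrix
# clf = SVC(kernel="precomputed").fit(K_train, y_train)
# prediction needs the corresponding test-vs-train cross-kernels
```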
Article
We present a system for facial expression recognition that is evaluated on multiple databases. Automated facial expression recognition systems face a number of characteristic challenges. Firstly, obtaining natural training data is difficult, especially for facial configurations expressing emotions like sadness or fear. Therefore, publicly available databases consist of acted facial expressions and are biased by the authors’ design decisions. Secondly, evaluating trained algorithms towards real-world behavior is challenging, again due to the artificial conditions in available image data. To tackle these challenges and since our goal is to train classifiers for an online system, we use several databases in our evaluation. Comparing classifiers across data-bases determines the classifiers capability to generalize more reliable than traditional self-classification.
Conference Paper
http://www.ee.oulu.fi/research/imag/texture Abstract. This paper presents generalizations to the gray-scale and rotation-invariant texture classification method based on local binary patterns that we have recently introduced. We derive a generalized presentation that allows for realizing a gray-scale and rotation-invariant LBP operator for any quantization of the angular space and for any spatial resolution, and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations, since the operator is by definition invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity, as the operator can be realized with a few operations in a small neighborhood and a lookup table. Excellent experimental results obtained in a true problem of rotation invariance, where the classifier is trained at one particular rotation angle and tested with samples from other rotation angles, demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation-invariant local binary patterns. These operators characterize the spatial configuration of local image texture, and the performance can be further improved by combining them with rotation-invariant variance measures that characterize the contrast of local image texture. The joint distributions of these orthogonal measures are shown to be very powerful tools for rotation-invariant texture analysis.
Article
This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed.
Conference Paper
Within the past decade, significant effort has occurred in developing methods of facial expression analysis. Because most investigators have used relatively limited data sets, the generalizability of these various methods remains unknown. We describe the problem space for facial expression analysis, which includes level of description, transitions among expressions, eliciting conditions, reliability and validity of training and test data, individual differences in subjects, head orientation and scene complexity, image characteristics, and relation to non-verbal behavior. We then present the CMU-Pittsburgh AU-Coded Face Expression Image Database, which currently includes 2105 digitized image sequences from 182 adult subjects of varying ethnicity, performing multiple tokens of most primary FACS action units. This database is the most comprehensive testbed to date for comparative studies of facial expression analysis.
Article
Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns
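A short sketch of the joint LBP/VAR idea described above, using scikit-image's rotation-invariant uniform LBP together with its local variance measure and a quantile-quantised 2D histogram; the number of variance bins and the quantisation scheme are illustrative choices, not those of the original paper.

```python
# Joint histogram of rotation-invariant uniform LBP codes and quantised local
# variance (contrast); bin counts and quantisation are illustrative.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_var_joint_hist(gray, P=8, R=1, var_bins=16):
    lbp = local_binary_pattern(gray, P, R, method="uniform")  # P + 2 code values
    var = np.nan_to_num(local_binary_pattern(gray, P, R, method="var"))
    edges = np.unique(np.quantile(var, np.linspace(0, 1, var_bins + 1))[1:-1])
    var_q = np.clip(np.digitize(var, edges), 0, var_bins - 1)  # quantised contrast
    hist, _, _ = np.histogram2d(lbp.ravel(), var_q.ravel(),
                                bins=[P + 2, var_bins],
                                range=[[0, P + 2], [0, var_bins]])
    return (hist / max(hist.sum(), 1)).ravel()
```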