Article

Abstract

This paper presents a constructive training algorithm for the Multi-Layer Perceptron (MLP) applied to facial expression recognition. The developed algorithm starts from a single hidden layer with a given number of neurons and a small number of training patterns. When the Mean Square Error (MSE) on the Training Data (TD) is not reduced to a predefined value, the number of hidden neurons grows during learning. Input patterns are trained incrementally until all patterns of the TD have been presented and learned. The proposed constructive training algorithm seeks suitable synthesis parameters: the number of patterns from each class presented initially in the training step, the initial number of hidden neurons, the number of iterations during the training step, and the predefined MSE value. The algorithm is developed in order to classify facial expressions. For the feature extraction stage, a biological vision-based facial description, namely Perceived Facial Images (PFI), is applied to extract features from human face images. The proposed approach is evaluated on three databases: GEMEP FERA 2011, the Cohn-Kanade facial expression database, and the facial expression recognition FER-2013 database. Compared to a fixed MLP architecture and to results from the literature, the experimental results clearly demonstrate the efficiency of the proposed algorithm.
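As a rough illustration of the growth loop this abstract describes, here is a minimal NumPy sketch (our own code, not the authors' implementation; the incremental presentation of training patterns is omitted, and all hyperparameter names and values are assumptions): the network is trained by plain backpropagation, and one hidden neuron is added whenever the MSE target is not reached.

```python
import numpy as np

def train_constructive_mlp(X, y, n_hidden=2, max_hidden=64,
                           mse_target=0.01, epochs=200, lr=0.05, seed=0):
    """Grow hidden neurons until the training MSE target is met (sketch)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], y.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    while True:
        for _ in range(epochs):             # plain backprop on current net
            H = np.tanh(X @ W1 + b1)
            err = H @ W2 + b2 - y
            mse = float(np.mean(err ** 2))
            dH = err @ W2.T * (1 - H ** 2)  # back-propagated hidden error
            W2 -= lr * H.T @ err / len(X); b2 -= lr * err.mean(0)
            W1 -= lr * X.T @ dH / len(X);  b1 -= lr * dH.mean(0)
        if mse <= mse_target or W1.shape[1] >= max_hidden:
            return (W1, b1, W2, b2), mse
        # target not met: add one hidden neuron, keep the learned weights
        W1 = np.hstack([W1, rng.normal(0, 0.5, (n_in, 1))])
        b1 = np.append(b1, 0.0)
        W2 = np.vstack([W2, rng.normal(0, 0.5, (1, n_out))])
```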


... Boughrara et al. (2016) proposed an MLP for facial expression classification. The established MLP model consists of a single hidden layer and seeks suitable synthesis parameters in the training stage. ...
... Facial expressions/features: Soleymani and Pantic, 2012; Zhao et al., 2013; Boughrara et al., 2016; Choi et al., 2016; Kaklauskas et al., 2016; Mahata et al., 2017; Diaz et al., 2018; Fonnegra, 2018; Hewitt and Gunes, 2018; Kaklauskas et al., 2018; Bohlin et al., 2019; Soni et al., 2019; De Pessemier et al., 2020; Mishra et al., 2020; Leite et al., 2022
Skin-estimated pulse/heart rate: Dabas et al., 2018; Diaz et al., 2018; Shu et al., 2018a; Bohlin et al., 2019; Soni et al., 2019; Ðordević Čegar et al., 2020
Mood: Winoto and Tang, 2010
EDA: Ðordević Čegar et al., 2020
BA: Alhagry, 2017; Liu et al., 2017; Kwon et al., 2018; Ogawa et al., 2018; Yang et al., 2019; Ðordević Čegar et al., 2020
User interactions: Niu et al., 2013; Niu et al., 2016
GSR: Kwon et al., 2018
Body gestures: Hassib et al., 2017
Perceived connotative properties: Martha and Larson, 2013; Zhang and Zhang, 2017
Movie reviews/comments/web recordings: Mulholland et al., 2017; Yenter, 2017; Tripathi et al., 2019; Krishnamurthy, 2020; Pan et al., 2020; Breitfuss et al., 2021; Cao et al., 2022
Questionnaire/survey/quiz: Arapakis et al., 2009a; Soleymani and Pantic, 2012; Tkalčič et al., 2013b, 2014; Polignano, 2015; Hassib et al., 2017; Diaz et al., 2018; Kaklauskas et al., 2018; Bohlin et al., 2019; Zhu et al., 2019; Mishra et al., 2020; Kaklauskas et al., 2020; Leite et al., 2022
TABLE 6 The evaluation metrics of different publications. ...
... Pearson's chi-square test and the dependent t-test: Arapakis et al., 2009a
Mean accuracy: Zhao et al., 2011, 2013; Tkalčič et al., 2013b; Fan et al., 2016; Alhagry, 2017; Liu et al., 2017; Yenter, 2017; Zhang and Zhang, 2017; Dabas et al., 2018; Fonnegra, 2018; Hewitt and Gunes, 2018; Kwon et al., 2018; Shu et al., 2018a; Zhang et al., 2018; Bohlin et al., 2019; Soni et al., 2019; Yang et al., 2019; De Pessemier et al., 2020; Krishnamurthy, 2020; Mishra et al., 2020; Nie et al., 2020; Qi et al., 2021; Leite et al., 2022
Precision/recall/F1: Niu et al., 2013; Shi et al., 2013; Liu et al., 2017; Ogawa et al., 2018; Zhang et al., 2018; Tripathi et al., 2019; Yang et al., 2019; Krishnamurthy, 2020; Mishra et al., 2020; Cao et al., 2022
Session length: Niu et al., 2013; Niu et al., 2016
Confusion matrix: Tkalčič et al., 2011b, 2013b; Zhao et al., 2013; Boughrara et al., 2016; Fan et al., 2016
CACE: Leite et al., 2022
Sparsity impact, granularity of emotions, extensibility, recommendation quality, additional characteristics: Breitfuss et al., 2021
Valence, arousal: Wang and Cheong, 2006; Soleymani et al., 2009; Soleymani and Pantic, 2012; Oliveira et al., 2013; Tkalčič et al., 2013b; Liu et al., 2017; Kwon et al., 2018; Yang et al., 2019
... of recommender systems is not comprehensive. More complex expressions that are not easily exposed should be paid attention to, for example, micro-expression recognition (Ben et al., 2021). ...
Article
Full-text available
Traditional video recommendation provides the viewers with customized media content according to their historical records (e.g., ratings, reviews). However, such systems tend to generate terrible results if the data is insufficient, which leads to a cold-start problem. An affective video recommender system (AVRS) is a multidiscipline and multimodal human-robot interaction (HRI) system, and it incorporates physical, physiological, neuroscience, and computer science subjects and multimedia resources, including text, audio, and video. As a promising research domain, AVRS employs advanced affective analysis technologies in video resources; therefore, it can solve the cold-start problem. In AVRS, the viewers’ emotional responses can be obtained from various techniques, including physical signals (e.g., facial expression, gestures, and speech) and internal signals (e.g., physiological signals). The changes in these signals can be detected when the viewers face specific situations. The physiological signals are a response to central and autonomic nervous systems and are mostly involuntarily activated, which cannot be easily controlled. Therefore, it is suitable for reliable emotion analysis. The physical signals can be recorded by a webcam or recorder. In contrast, the physiological signals can be collected by various equipment, e.g., psychophysiological heart rate (HR) signals calculated by echocardiogram (ECG), electro-dermal activity (EDA), and brain activity (GA) from electroencephalography (EEG) signals, skin conductance response (SCR) by a galvanic skin response (GSR), and photoplethysmography (PPG) estimating users’ pulse. This survey aims to provide a comprehensive overview of the AVRS domain. To analyze the recent efforts in the field of affective video recommendation, we collected 92 relevant published articles from Google Scholar and summarized the articles and their key findings. In this survey, we feature these articles concerning AVRS from different perspectives, including various traditional recommendation algorithms and advanced deep learning-based algorithms, the commonly used affective video recommendation databases, audience response categories, and evaluation methods. Finally, we conclude the challenge of AVRS and provide the potential future research directions.
... For DL, a pre-trained ResNet50 [45] network architecture was used. For LDM, we used the Multilayer Perceptron (MLP) neural network [46] with three hidden layers. In each of the approaches, we performed training with and without data augmentation and cat face alignment to test the effect of each of these steps on performance. ...
... We adapt the notion of multi-region vectors of Qiu et al. [44] to the context of cat faces based on the Feline Grimace Scale [21] as follows: ... (defined by landmarks 3, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 33, 34, 35, 36, 43, 44, 45, 46, 47, 48, with central point on landmark 17); and forehead region (defined by landmarks 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, with central point on the median of landmarks 27 and 28). Figure 3 demonstrates the division. ...
Preprint
Full-text available
Facial expressions in non-human animals are closely linked to their internal affective states, with the majority of empirical work focusing on facial shape changes associated with pain. However, existing tools for facial expression analysis are prone to human subjectivity and bias, and in many cases also require special expertise and training. This paper presents the first comparative study of two different paths towards automatizing pain recognition in facial images of domestic short-haired cats (n = 29), captured during ovariohysterectomy at different time points corresponding to varying intensities of pain. One approach is based on convolutional neural networks (ResNet50), while the other is based on machine learning models using geometric landmark analysis inspired by species-specific Facial Action Coding Systems (i.e. catFACS). Both types of approaches reach comparable accuracy of above 72%, indicating their potential usefulness as a basis for automating cat pain detection from images.
... Moreover, based on the learning scheme, existing FER techniques are classified as traditional machine learning-based methods and deep learning-based methods. The machine learning-based approach for FER uses a combination of a handcrafted feature extractor and standard machine learning classifiers such as the Support Vector Machine (SVM) [13], Support Vector Neural Network (SVNN) [14], K-Nearest Neighbor (K-NN) [15], Sequential Minimal Optimization (SMO) [16], Classification Trees (CT) [17], Multi-layer Perceptron (MLP) [18], Neural Network (NN) [19], etc. In contrast to the methods based on traditional machine learning, FER methods based on deep learning techniques are end-to-end trainable. ...
... To solve the constrained optimization problem expressed in (18), the Lagrange multiplier technique is used [80]. The technique determines the value of the output weight β based on the nature of the matrix (I/C + H^T H). ...
Article
Full-text available
Rapid growth in advanced human-computer interaction (HCI) based applications has led to the immense popularity of facial expression recognition (FER) research among computer vision and pattern recognition researchers. Lately, a robust texture descriptor named Dynamic Local Ternary Pattern (DLTP) developed for face liveness detection has proved to be very useful in preserving facial texture information. The findings motivated us to investigate DLTP in more detail and examine its usefulness in the FER task. To this end, a FER pipeline is developed, which uses a sequence of steps to detect possible facial expressions in a given input image. Given an input image, the pipeline first locates and registers faces in it. In the next step, using an image enhancement operator, the FER pipeline enhances the facial images. Afterward, from the enhanced images, facial features are extracted using the DLTP descriptor. Subsequently, the pipeline reduces dimensions of the high-dimensional DLTP features via Principal Component Analysis (PCA). Finally, using the multi-class Kernel Extreme Learning Machine (K-ELM) classifier, the proposed FER scheme classifies the features into facial expressions. Extensive experiments performed on four in-the-lab and one in-the-wild FER datasets confirmed the superiority of the method. Besides, the cross-dataset experiments performed on different combinations of the FER datasets revealed its robustness. Comparison results with several state-of-the-art FER methods demonstrate the usefulness of the proposed FER scheme. The pipeline with a recognition accuracy of 99.76%, 99.72%, 93.98%, 96.71%, and 78.75%, respectively, on the CK+, RaF, KDEF, JAFFE, and RAF-DB datasets, outperformed the previous state-of-the-art.
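The closed-form output-weight solution cited in the excerpt above, β = (I/C + H^T H)^(-1) H^T T, can be sketched in a few lines of NumPy (our own hedged sketch, not the paper's implementation; H is the hidden-layer output matrix, T the target matrix, and C an assumed regularization constant; the kernel variant replaces the Gram matrix with a kernel matrix):

```python
import numpy as np

def elm_output_weights(H, T, C=100.0):
    """beta = (I/C + H^T H)^(-1) H^T T, solved without an explicit inverse."""
    L = H.shape[1]                       # number of hidden units
    A = np.eye(L) / C + H.T @ H          # regularized Gram matrix
    return np.linalg.solve(A, H.T @ T)

rng = np.random.default_rng(0)
H = np.tanh(rng.normal(size=(50, 20)))   # stand-in hidden-layer outputs
T = rng.normal(size=(50, 7))             # stand-in one-hot-like targets
print(elm_output_weights(H, T).shape)    # (20, 7)
```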
... In the presented method, three different feature selection techniques are used as an objective criterion to produce the most characteristic features of facial images. The combination of the first 10 ranked features from each feature selection technique is used to classify the facial expressions using multilayer perceptron (MLP) (Boughrara et al., 2016), Naïve Bayes (NB) (Lajevardi and Hussain, 2010), decision tree J48 (Yan Nie et al., 2015), and K-nearest neighbor (KNN) classifiers. ...
... Li et al., 2010, and Zhong et al., 2018, obtained a good recognition rate but used a larger number of features and recognized fewer facial expressions. Boughrara et al., 2016, Kumbhar et al., 2012, and Peng and Yin, 2018, achieved high accuracy using fewer features but also recognized fewer expressions. Table 1 shows the comparison summary of related works. ...
Article
Full-text available
Facial expression recognition (FER) has played a prominent role in research since the 1990s. This paper provides a comparison approach for FER based on three feature selection methods, namely correlation, gain ratio, and information gain, for determining the most distinguishing features of face images, together with multiple classification algorithms: multilayer perceptron, Naïve Bayes, decision tree, and K-nearest neighbor (KNN). These classifiers are used for the task of expression recognition and for comparing their relative performance. The main aim of the provided approach is to determine the most effective classifier based on a minimum acceptable number of features by analyzing and comparing their performance. The provided approach has been applied to the CK+ dataset. The experimental results show that KNN is the better classifier, with 91% accuracy using only 30 features.
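As a hedged sketch of the kind of pipeline this abstract describes (our own code; sklearn's mutual_info_classif stands in for information gain, and the data here is random placeholder data), selecting the 30 best-ranked features before a KNN classifier might look like:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 120))      # stand-in facial features
y = rng.integers(0, 7, 200)          # stand-in expression labels

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=30),  # keep 30 best-ranked features
    KNeighborsClassifier(n_neighbors=5),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```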
... Figure 4 presents a sample image predicted as happy with secondary emotion surprise, and Fig. 5 a sample image predicted as sad with secondary emotion angry. Based on the findings and evaluation using the FER2013 dataset, the proposed model is compared with the state of the art from Ionescu et al. [11], Boughrara et al. [3] and Christou et al. [7]. The proposed model is also compared with the state of the art from Cheng et al. [6], Shan et al. [26] and Minaee et al. [23] based on the findings using the JAFFE dataset. ...
... They adapted a bag of visual words model and tested over the FER2013 dataset resulting in 67.48% accuracy. Boughrara et al. [3] proposed a model using a multi-layer perceptron with a single hidden layer. The authors trained input patterns incrementally so that all data from the training dataset was learned and then tested on the FER2013 dataset with 84.58% accuracy. ...
Article
Full-text available
Humans use facial expressions as a tool to show emotional states. Facial expression recognition remains an interesting and challenging area of research in computer vision. An improved deep learning approach using a convolution neural network (CNN) is proposed in this paper to predict emotions by analysing facial expressions contained in an image. The model developed in this work consists of one CNN to analyse the primary emotion of the image as being happy or sad and a second CNN to predict the secondary emotion of the image. The proposed model was trained on the FER2013 and Japanese female facial expression (JAFFE) datasets with results suggesting its capability of predicting emotions from facial expressions better than existing state-of-the-art approaches.
... The choice of the final activation function in the output layer depends on whether the problem is a regression or a classification problem. This allows the network to calculate the errors in the outputs, which the network penalizes by adjusting its weights to minimize the errors (Boughrara et al., 2016). This is a mathematical update of the weights as in Equation 28: ...
Article
Full-text available
The main objective of this study is to contribute to the literature by forecasting a green bond index with different machine learning models supported by artificial intelligence. The data, from 1 June 2021 to 29 April 2024 and collected from many sources, was separated into training and test sets, and standard preparation was conducted for each. The model's dependent variable is the Global S&P Green Bond Index, which monitors the performance of green bonds in global financial markets and serves as a comprehensive benchmark for the study. To evaluate and compare the performance of the trained machine learning models (Random Forest, Linear Regression, Rational Quadratic Gaussian Process Regression (GPR), XGBoost, MLP, and Linear SVM), RMSE, MSE, MAE, MAPE, and R² were used as evaluation metrics, and the best-performing model was Rational Quadratic GPR. The concluding SHAP analysis reveals the primary factors influencing the model's forecasts: the model assigns considerable importance to macroeconomic indicators, including the DXY (US Dollar Index), XAU (Gold Spot Price), and MSCI (Morgan Stanley Capital International). This work is expected to enhance the literature, as directly comparable studies are limited in this field.
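A minimal sketch of the best-performing setup named above, Gaussian Process Regression with a Rational Quadratic kernel (our own code on synthetic stand-in features, not the study's data or exact configuration):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))        # stand-ins for e.g. DXY, XAU, MSCI
y = X @ np.array([0.5, -0.2, 0.3]) + rng.normal(0, 0.1, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
gpr = GaussianProcessRegressor(kernel=RationalQuadratic(), normalize_y=True)
gpr.fit(X_tr, y_tr)
pred = gpr.predict(X_te)
print(r2_score(y_te, pred), mean_squared_error(y_te, pred) ** 0.5)
```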
... Furthermore, as a practical application, consider the training of an artificial neural network for classification or data fitting problems [69,70]. Neural networks are non-linear parametric tools with many applications in real-world problems [71][72][73]. Neural networks can be defined as functions N(x, w). ...
Article
Full-text available
Global optimization is widely adopted presently in a variety of practical and scientific problems. In this context, a group of widely used techniques are evolutionary techniques. A relatively new evolutionary technique in this direction is that of Giant-Armadillo Optimization, which is based on the hunting strategy of giant armadillos. In this paper, modifications to this technique are proposed, such as the periodic application of a local minimization method as well as the use of modern termination techniques based on statistical observations. The proposed modifications have been tested on a wide series of test functions available from the relevant literature and compared against other evolutionary methods.
... To ensure a unified and fair investigation and to avoid confusion, we compare the performance of each model with the performance of other related methods that used the same dataset. For the FER-2013 dataset, Boughrara et al. [8] introduced a model employing a multi-layer perceptron featuring a single hidden layer. They adopted an incremental training approach to ensure comprehensive learning of all data from the training dataset, achieving an 84.58% accuracy rate on the FER2013 dataset. ...
... With MLP, the model recognized the emotion with an accuracy of 99.2%. Boughrara et al. [36] proposed a constructive algorithm for multilayer perceptron in which the network starts with a small structure and grows as per the requirement. The resulting classifier is applied for the recognition of facial expressions in different fields and tested on three different databases including GEMEP FERA 2011, the Cohn-Kanade facial expression, and the FER-2013 databases. ...
Article
Full-text available
Human ideas and sentiments are mirrored in facial expressions. Facial expression recognition (FER) is a crucial type of visual data analysis that can be utilized to deduce a person's emotional state. It gives the spectator a plethora of social cues, such as the viewer's focus of attention, emotion, motivation, and intention, and is said to be a powerful instrument for silent communication. AI-based facial recognition systems can be deployed in different areas like bus stations, railway stations, airports, or stadiums to help security forces identify potential threats. There has been a lot of research done in this area, but it lacks a detailed review of the literature that highlights and analyses the previous work in FER (including work on compound emotion and micro-expressions), a comparative analysis of different models applied to available datasets, and an identification of aligned future directions. So, this paper includes a comprehensive overview of different models that can be used in the field of FER and a comparative study of the traditional methods based on hand-crafted feature extraction and deep learning methods in terms of their advantages and disadvantages, which distinguishes our work from existing review studies. This paper also analyses different FER systems and the performance of different models on available datasets, and evaluates the classification performance of traditional and deep learning algorithms in the context of facial emotion recognition, which reveals a good understanding of each classifier's characteristics. Along with the proposed models, this study describes the commonly used datasets, showing the year-wise performance achieved by state-of-the-art methods, which is lacking in existing manuscripts. Finally, the authors itemize recognized research gaps and challenges encountered by researchers, which can be considered in future research work.
... Furthermore, neural networks have been used with success to solve differential equations [15][16][17], solar radiation prediction [18,19], spam detection [20][21][22], etc. Moreover, variations of artificial neural networks have been employed to solve agricultural problems [23,24], facial expression recognition [25], prediction of the speed of wind [26], the gas consumption problem [27], intrusion detection [28], hydrological systems [29], etc. Furthermore, Swales and Yoon discussed the application of artificial neural networks to investment analysis in their work [30]. ...
Article
Full-text available
Perhaps one of the best-known machine learning models is the artificial neural network, where a number of parameters must be adjusted to learn a wide range of practical problems from areas such as physics, chemistry, medicine, etc. Such problems can be reduced to pattern recognition problems, whether classification or regression, and then modeled with artificial neural networks. To achieve their goal, neural networks must be trained by appropriately adjusting their parameters using global optimization methods. In this work, the application of a recent global minimization technique is suggested for the adjustment of neural network parameters. In this technique, an approximation of the objective function to be minimized is created using artificial neural networks, and sampling is then performed from the approximation function rather than the original one. Therefore, in the present work, the parameters of artificial neural networks are learned using other neural networks. The new training method was tested on a series of well-known problems and compared against other neural network parameter tuning techniques, with results that were more than promising: the proposed technique outperformed the others by about 30% on classification datasets and up to 50% on regression problems. However, because the proposed technique presupposes the use of global optimization techniques involving artificial neural networks, it may require significantly higher execution time than other techniques.
... Hayet Boughrara et al. [19] proposed a constructive Multi-Layer Perceptron (MLP) training algorithm for face recognition applications. The constructive algorithm starts from a single layer of hidden neurons and a minimal set of training patterns. ...
Article
In this study, we present a multi-level fusion of deep learning technique for facial expression identification, with applications spanning the fields of cognitive science, personality development, and the detection and diagnosis of mental health disorders in humans. The suggested approach, named Deep Learning aided Hybridized Face Expression Recognition system (DLFERS), classifies human behavior from a single image frame through the use of feature extraction and a support vector machine. An information classification algorithm is incorporated into the methodology to generate a new fused image consisting of two integrated blocks of eyes and mouth, which are very sensitive to changes in human expression and relevant for interpreting emotional expressions. The Transformation of Invariant Structural Features (TISF) and the Transformation of Invariant Powerful Movement (TIPM) are utilized to extract features in the suggested method's Storage Pack of Features (SPOF). Multiple datasets are used to compare the effectiveness of different neural network algorithms for learning facial expressions. The study's major findings show that the suggested DLFERS approach achieves an overall classification accuracy of 93.96 percent and successfully displays a user's genuine emotions during common computer-based tasks.
... Artificial neural networks (ANNs) are parametric machine learning models [1,2] which have been widely used during the last decades in a series of practical problems from scientific fields such as physics [3][4][5], chemistry [6][7][8], medicine [9,10], economics [11][12][13], etc. In addition, ANNs have recently been applied to models solving differential equations [14,15], agricultural problems [16,17], facial expression recognition [18], wind speed prediction [19], the gas consumption problem [20], intrusion detection [21], etc. Usually, a neural network is defined as a function N(x, w), where the vector x is the input pattern to the network and the vector w stands for the weight vector. To estimate the weight vector, the so-called training error is minimized, which is defined as the sum: ...
Article
Full-text available
Artificial neural networks are machine learning models widely used in many sciences as well as in practical applications. The basic element of these models is a vector of parameters; the values of these parameters should be estimated using some computational method, and this process is called training. For effective training of the network, computational methods from the field of global minimization are often used. However, for global minimization techniques to be effective, the bounds of the objective function should also be clearly defined. In this paper, a two-stage global optimization technique is presented for efficient training of artificial neural networks. In the first stage, the bounds for the neural network parameters are estimated using Particle Swarm Optimization and, in the following phase, the parameters of the network are optimized within the bounds of the first phase using global optimization techniques. The suggested method was used on a series of well-known problems in the literature and the experimental results were more than encouraging.
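A hedged sketch of the two-stage idea (our own code, not the paper's algorithm): a crude PSO first spans a bounding box from its personal-best particles, and a global optimizer then searches within those bounds. The sphere function stands in for a network's training error.

```python
import numpy as np
from scipy.optimize import differential_evolution

def sphere(w):                     # stand-in for a network's training error
    return float(np.sum(np.asarray(w) ** 2))

def pso_bounds(f, dim, n=20, iters=50, span=10.0, seed=0):
    """Stage 1: crude PSO; return a box spanned by the personal bests."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-span, span, (n, dim)); v = np.zeros((n, dim))
    p, pf = x.copy(), np.array([f(xi) for xi in x])
    g = p[pf.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        v = 0.7 * v + 1.5 * r1 * (p - x) + 1.5 * r2 * (g - x)
        x = x + v
        fx = np.array([f(xi) for xi in x])
        better = fx < pf
        p[better], pf[better] = x[better], fx[better]
        g = p[pf.argmin()].copy()
    lo, hi = p.min(0), p.max(0)
    hi = np.maximum(hi, lo + 1e-3)  # keep each interval non-degenerate
    return list(zip(lo, hi))

bounds = pso_bounds(sphere, dim=5)
result = differential_evolution(sphere, bounds, seed=0)  # stage 2
print(result.x, result.fun)
```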
... They reported an average accuracy of 82.97% on the CK+ database when classifying eight basic expressions. Boughrara et al. [34] developed a new constructive training algorithm for the Multi-Layer Perceptron (MLP) applied to FER. Unlike most traditional algorithms, which fix the structure of the neural network before training, the proposed constructive training algorithm learns the architecture of the network and performs the learning process simultaneously. ...
Article
Full-text available
Facial Expression Recognition (FER) is a growing area of research due to its numerous applications in market research, video gaming, healthcare, security, e-learning, and robotics. One of the most common frameworks for recognizing facial expressions is by extracting facial features from an image and classifying them as one of several prototypic expressions. Despite the recent advances, it is still a challenging task to develop robust facial expression descriptors. This study aimed to analyze the performances of various local descriptors and classifiers in the FER problem. Several experiments were conducted under different settings, such as varied extraction parameters, different numbers of expressions, and two datasets, to discover the best combinations of local descriptors and classifiers. Of all the considered descriptors, HOG (Histogram of Oriented Gradients) and ALDP (Angled Local Directional Patterns) were some of the most promising, while SVM (Support Vector Machines) and MLP (Multi-Layer Perceptron) were the best among the considered classifiers. The results obtained signify that conventional FER approaches are still comparable to state-of-the-art methods based on deep learning.
... Due to the successful performance of previous hybrid approaches, the simultaneous use of deep and hand-crafted features is proposed as the basis of the proposed method. We used the combination of the feature vectors resulting from deep and traditional feature descriptors. With this method, deep layers are used to extract features from raw images. ...
(Comparison figures embedded in this excerpt, accuracy in %: Color features + Gabor transform [3]: 86; AUDN [23]: 93.7; Salient facial patches [15]: 94.1; CNN+SIFT [41]: 94.13; Distance signature on MLP [2]: 95.9; FER based on MLP using constructive training algorithm [4]: 96.6; Proposed method: 98.48.)
Article
Full-text available
Facial expression recognition is still one of the most attractive and challenging problems. This study designed a facial expression recognition approach based on a feature fusion strategy. In this proposed approach, two types of features are used to classify the facial expressions. The first type is deep learned features obtained from the CNN layers, and the other is hand-crafted features, for which a geometric approach called DAISY is used to obtain a more discriminative model. The DAISY descriptor is used to extract the features because of its efficiency and performance in many problems like object detection, image classification, etc. Besides, the Convolutional Neural Network (CNN) layers are used in both standard and custom structures. A robust and highly distinguishing feature vector is obtained when these two types of features are concatenated. This feature vector helps CNNs work in an enhanced manner. The extra information provided by DAISY makes it easier for the resulting model to make decisions because this feature descriptor does not require much data to work precisely. Finally, we used the Random Forest classifier for the classification task to make the proposed pipeline complete. To validate the efficiency of the proposed approach, two well-known facial expression datasets, CK+ and FER2013, are used. The proposed feature fusion-based method's accuracy is 98.48% on the CK+ dataset and 70% on FER2013. The results are compared with some newly proposed approaches in this field to validate our strategy. Since this performance is in the range of state-of-the-art systems, the proposed strategy that enhances CNN features with hand-crafted techniques can be presented as a suitable FER method.
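A hedged sketch of the fusion strategy described above (our own code; the CNN embeddings and face crops are random stand-ins, and the DAISY parameters are assumptions): hand-crafted DAISY descriptors are concatenated with deep features before a Random Forest classifier.

```python
import numpy as np
from skimage.feature import daisy
from sklearn.ensemble import RandomForestClassifier

def fused_features(gray_face, cnn_feat):
    """Concatenate hand-crafted DAISY descriptors with deep features."""
    d = daisy(gray_face, step=16, radius=15, rings=2,
              histograms=6, orientations=8)
    return np.concatenate([d.ravel(), cnn_feat])

rng = np.random.default_rng(0)
faces = rng.random((40, 48, 48))   # stand-in grayscale face crops
cnn = rng.random((40, 128))        # stand-in CNN embeddings
y = rng.integers(0, 7, 40)         # 7 expression classes

X = np.stack([fused_features(f, c) for f, c in zip(faces, cnn)])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.score(X, y))
```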
... The discrete wavelet transform function is a well-known feature extractor used in digital image processing. For the optimization of the features, a glowworm optimization algorithm is used [3,4,5]. The glowworm optimization algorithm is based on the concept of the neighbor selection process. ...
Article
Full-text available
The advancement of digital technology calls for biometric security systems. Face detection plays an essential role in the security of digital devices, and detection is based on the low-level content of the facial image. In this paper, the BP neural network model is modified for the detection of the human face. The modified face detection algorithm incorporates feature optimization, which reduces the distorted features of the facial image. The optimized features of the facial image enhance the performance of face detection; for the optimization of features, glowworm optimization algorithms are used. The glowworm optimization algorithm is a dynamic population-based search technique in which a glowworm selects its neighbors based on the luciferin process. For feature extraction, we use the discrete wavelet transform, which derives the feature components in terms of the low and high frequencies of the facial image. The proposed algorithm was simulated in MATLAB software on a reputed facial image dataset, CSV300. Our experimental results show a better detection rate than the BP neural network model.
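As a small illustration of the feature-extraction step described above (our own sketch using PyWavelets; pooling the high-frequency details into a single energy value is an assumption, not the paper's exact design), a one-level 2-D Haar DWT splits a face image into low- and high-frequency components:

```python
import numpy as np
import pywt

def dwt_features(gray_face):
    """One-level 2-D Haar DWT: approximation (low-frequency) coefficients
    plus a single high-frequency detail-energy value."""
    cA, (cH, cV, cD) = pywt.dwt2(gray_face, 'haar')
    details = np.concatenate([cH.ravel(), cV.ravel(), cD.ravel()])
    return np.concatenate([cA.ravel(), [np.mean(details ** 2)]])

face = np.random.default_rng(0).random((64, 64))  # stand-in face crop
print(dwt_features(face).shape)                   # 32*32 + 1 values
```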
... The confusion matrix results of this method show better accuracy (exceeding 80%) for all six emotions on both of the datasets used. The previous approaches and other existing works [7][8][9][10][11][12] use different CNN-based deep learning techniques for FER. These techniques give significant results, especially for emotion recognition. ...
Article
Full-text available
Generally, the process of detecting micro-expressions takes on significant importance because these expressions reflect hidden emotions even when the person tries to conceal them. In this paper, a new approach is proposed to estimate the percentage of sarcasm based on the detected degree of happiness of a facial expression using a fuzzy inference system. Five regions of a face (right/left brows, right/left eyes, and mouth) are considered to determine active distances from the detected outline points of these regions. The membership functions in the proposed fuzzy inference system are used as a first step to determine the degree of the happiness expression, based mainly on the computed distances, and another membership function is then used to estimate the percentage of sarcasm from the outcomes of the first step. The proposed approach is validated using face images collected from the SMIC, SAMM, and CAS(ME)2 standard datasets.
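A toy sketch of the two-stage fuzzy idea (our own code; the membership-function breakpoints, the choice of inputs, and the min-rule are all assumptions for illustration): triangular memberships first map normalized facial distances to a happiness degree, and a second membership maps that degree to a sarcasm percentage.

```python
def trimf(x, a, b, c):
    """Triangular membership function evaluated at x."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def happiness_degree(mouth_width, eye_opening):
    """Rule: wide mouth AND open eyes -> happy (min as fuzzy AND)."""
    return min(trimf(mouth_width, 0.4, 0.8, 1.0),
               trimf(eye_opening, 0.3, 0.6, 1.0))

def sarcasm_percentage(happy):
    """Second stage: moderate happiness maps to high sarcasm."""
    return 100.0 * trimf(happy, 0.1, 0.5, 0.9)

h = happiness_degree(0.7, 0.5)   # normalized facial distances in [0, 1]
print(h, sarcasm_percentage(h))
```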
... These extracted patterns were given as input to the relevance vector machine (RVM), which classified the different emotional states, achieving a recognition rate of 95.6%. A multi-layer perceptron [18] has also been presented to classify emotional states, where certain texture features of the facial skin were extracted using biologically inspired vision approaches. ...
Article
Full-text available
Deep multi-task learning is one of the most challenging research topics widely explored in the field of facial expression recognition. Most deep learning models rely on class label details while discarding the local information of the sample data, which deteriorates the performance of the recognition system. This paper proposes multi-feature-based deep convolutional neural networks (D-CNN) that identify the facial expression of the human face. To enhance the accuracy of recognition systems, a multi-feature learning model is employed in this study. The input images are preprocessed and enhanced via three filtering methods, i.e., Gaussian, Wiener, and adaptive mean filtering. The preprocessed image is then segmented using a face detection algorithm. The detected face is further processed with the local binary pattern (LBP) operator, which extracts the facial points of each facial expression. These are then fed into the D-CNN, which effectively recognizes the facial expression using the features of facial points. The proposed D-CNN is implemented, and the results are compared to the existing support vector machine (SVM). The analysis of deep features helps to extract the local information from the data without incurring a higher computational effort.
... Segmentation of local facial components leads to a significant reduction in the computation cost of both the feature extraction and the classification. Following Tian [21], Shan et al. [22] and Boughrara et al. [28], we used a fixed eye-distance-based approach to normalize the face. Shan et al. [22] fix the distance between eyeballs to 52 pixels, and the face is cropped and normalized to 150 × 110 pixels using prior knowledge of facial geometry. ...
Chapter
Full-text available
In this chapter, we investigated a computer vision technique for facial expression recognition, which increases both the recognition rate and computational efficiency. Local and global appearance-based features are combined in order to incorporate precise local texture and global shapes. We proposed a Multi-Level Haar (MLH) feature-based system, which is simple and fast in computation. The driving factors behind using the Haar wavelet were its two interesting properties: signal compression and energy preservation. To exploit facial geometry, we first segmented the facial components like eyebrows, eyes, and mouth, and then applied feature extraction on these facial components only. Experiments are conducted on three well-known publicly available expression datasets (CK, JAFFE, TFEID) and the in-house WESFED dataset. The performance is measured against various template matching and machine learning classifiers. We achieved the highest recognition rate for the proposed operator with the Discriminant Analysis classifier. We studied the performance of the proposed approach in several scenarios, like expression recognition from low resolution, recognition from a small training sample space, recognition in the presence of noise, and so forth.
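As a hedged sketch of multi-level Haar feature extraction on a segmented facial component (our own PyWavelets code; crop sizes and the decomposition depth are assumptions):

```python
import numpy as np
import pywt

def multi_level_haar(region, levels=3):
    """Multi-level 2-D Haar decomposition of one facial component crop;
    all subband coefficients are concatenated into one feature vector."""
    coeffs = pywt.wavedec2(region, 'haar', level=levels)
    parts = [coeffs[0].ravel()]                  # coarsest approximation
    for cH, cV, cD in coeffs[1:]:                # detail subbands per level
        parts += [cH.ravel(), cV.ravel(), cD.ravel()]
    return np.concatenate(parts)

mouth = np.random.default_rng(0).random((32, 64))  # stand-in mouth crop
print(multi_level_haar(mouth).size)
```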
... The neurons in different layers are connected to each other by weighted paths between layers. The multilayer perceptron (MLP) algorithm, one of the ANN procedures, was used in this study to build an efficient one-hidden-layer neural network [68,69]. ...
Article
Full-text available
Machine Learning (ML) algorithms provide an alternative for the prediction of pollutant concentration. We compared eight ML algorithms (Linear Regression (LR), uniform weighting k-Nearest Neighbor (UW-kNN), variable weighting k-Nearest Neighbor (VW-kNN), Support Vector Regression (SVR), Artificial Neural Network (ANN), Regression Tree (RT), Random Forest (RF), and Adaptive Boosting (AdB)) to evaluate the feasibility of ML approaches for estimation of Total Suspended Solids (TSS) using the national stormwater quality database. Six factors were used as features to train the algorithms with TSS concentration as the target parameter: drainage area, land use, percent of imperviousness, rainfall depth, runoff volume, and antecedent dry days. Comparisons among the ML methods demonstrated a high degree of variability in model performance, with the coefficient of determination (R²) and Nash–Sutcliffe (NSE) values ranging from 0.15 to 0.77. The Root Mean Square Error (RMSE) values ranged from 110 mg/L to 220 mg/L. The best fit was obtained using the AdB and RF models, with R² values of 0.77 and 0.74 in the training step and 0.67 and 0.64 in the prediction step. The NSE values were 0.76 and 0.72 in the training step and 0.67 and 0.62 in the prediction step. The predictions from AdB were sensitive to all six factors. However, the sensitivity level was variable.
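For reference, the Nash–Sutcliffe efficiency reported above is computed as NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))²; a minimal NumPy implementation:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated vs. observed values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

print(nse([10, 20, 30], [12, 19, 29]))
```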
... A feed-forward layered neural network called Multi-Layer Perceptron is used in this research as a baseline classifier. In the last decade, many researchers have found that MLP achieved good results for FER [1]. ...
Article
Full-text available
E-learning enables the dissemination of valuable academic information to all users regardless of where they are situated. One of the challenges faced by e-learning systems is the lack of constant interaction between the user and the system. Observability is an essential feature of a typical classroom setting and a means of detecting learners' reactions; such reactions, in the form of facial expressions, should therefore be incorporated into an e-learning platform. The proposed solution is the implementation of a deep-learning-based facial image analysis model to estimate the learning affect and to reflect the level of student engagement. This work proposes the use of a Temporal Relational Network (TRN) for identifying the changes in the emotions on students' faces during an e-learning session. TRN sparsely samples individual frames and then learns their causal relations, which is much more efficient than sampling dense frames and convolving them. In this paper, single-scale and multi-scale temporal relations are considered to achieve the proposed goal. Furthermore, a Multi-Layer Perceptron (MLP) is also tested as a baseline classifier. The proposed framework is end-to-end trainable for video-based Facial Emotion Recognition (FER) and was tested on the open-source DISFA+ database. The TRN-based model showed a significant reduction in the length of the feature set while remaining effective in recognizing expressions. The multi-scale TRN produced better accuracy than the single-scale TRN and the MLP, with accuracies of 92.7%, 89.4%, and 86.6%, respectively.
... Neural networks, a class of machine learning algorithms [1], were introduced to the scientific community a relatively long time ago. Notwithstanding their widespread use in fields such as economics [2] and the stock market [3], they have not been much pursued in geoscience, except for remote sensing image classification [4], [5], [6]. Recently, however, the geodetic community has been introduced to the potential applicability of machine learning methods to geodetic problems such as time series analysis [7], [8], [9], [10]. ...
Method
We investigate the efficiency of generalized neural networks for earthquake prediction, focusing in particular on the 2011 Tohoku earthquake. Using this machine learning method, we assess the prediction performance under different criteria.
... A Multi-Layer Perceptron (MLP) neural network is proposed for facial recognition applications. Compared to CNNs, the MLP technique increases the effectiveness and precision of the recognition process, because the neural network is assembled in a manner that efficiently rejects the majority of non-facial patterns [27]. A hybrid method of wavelet transform and ANN is proposed to recognize a human face. ...
Article
Full-text available
Face feature extraction and classification is an attractive research area with various applications. This paper proposes a hybrid technique based on a modified local binary pattern (MLBP) and a layered recurrent neural network (L-RNN) to recognize human faces. The proposed MLBP algorithm reduces the dimensions of the extracted face image features. The classification process is conducted using the L-RNN, trained with the quasi-Newton back-propagation algorithm. The proposed hybrid technique is examined on the MUCT database, and the performance analysis of different ANN techniques shows that the hybrid technique is robust and has superior performance over conventional methods. It achieves the best classification rate of 98%.
Article
Full-text available
In recent years, a series of environmental problems caused by air pollution have attracted widespread attention, and air quality forecasting has become an indispensable part of people's daily life. However, the traditional air quality prediction (AQP) model, the WRF-CMAQ simulation system, faces several challenges: (1) the fuzziness of the pollutant formation mechanism; (2) the difficulty of integrating the features of meteorological conditions; (3) the difficulty of cooperation among monitoring stations. To deal with these challenges, we propose a multi-task station-cooperative air quality prediction (MTSC-AQP) system for sustainable development. The MTSC-AQP system contains three modules: air quality analysis, a WRF-LSTM module for initial pollutant prediction, and multi-station cooperative AQP. Through air quality analysis, it can calculate the air quality index (AQI) and analyze the correlation between pollutants and meteorological conditions. Then, the WRF-LSTM module integrates spatio-temporal multi-source data to forecast the initial concentration of pollutants. Finally, the system incorporates the gravity model to predict the final AQI. Extensive experiments conducted on real-world datasets show the effectiveness of forecasting the AQI using the MTSC-AQP system.
Article
Video data is an asset that may be used in various settings, such as a live broadcast on a personal blog or a security camera at a manufacturing facility. It is becoming increasingly common across a wide range of applications to use machine learning as a tool for processing video. Recent years have seen significant advances in machine learning for computer vision: human-level performance is approached or even surpassed in areas such as object identification, object categorization, and image segmentation. Despite this, challenging problems remain, such as identifying human emotions. This study aims to recognize human emotions by analyzing still images and video frames taken from motion pictures using numerous machine learning procedures. To accomplish this, neural networks constructed based on Generative Adversarial Networks (GAN) were used to classify each face picture obtained from a frame into one of the seven chosen categories of facial emotions. To communicate feelings, videos are mined for informative aspects such as audio, single, and multiple video frames. During this processing stage, separate instances of the OpenSMILE and Inception-ResNet-v2 models extract feature vectors from the audio and frames. After that, numerous classification models are trained using stochastic gradient descent with the momentum approach (SGDMA). The findings from each of the pictures are compiled into a table, from which it is determined which facial expression was seen on-screen most often throughout the film. The classification of audio feature vectors is accomplished with the application of GAN-SGDMA. The Inception-ResNet-v2 algorithm is utilized to recognize feelings conveyed by still photographs. The findings of several experiments suggest that the presented distributed model GAN-SGDMA could significantly boost the speed at which facial expressions are detected and classified from video. We demonstrate the effectiveness of our GAN-SGDMA approach by applying it to GAN-structured face expression recognition datasets, and the results we obtain are remarkable.
Article
Full-text available
The decision of when to add a new hidden unit or layer is a fundamental challenge for constructive algorithms. It becomes even more complex in the context of multiple hidden layers. Growing both network width and depth offers a robust framework for leveraging the ability to capture more information from the data and model more complex representations. In the context of multiple hidden layers, should growing units occur sequentially with hidden units only being grown in one layer at a time or in parallel with hidden units growing across multiple layers simultaneously? The effects of growing sequentially or in parallel are investigated using a population dynamics-inspired growing algorithm in a multilayer context. A modified version of the constructive growing algorithm capable of growing in parallel is presented. Sequential and parallel growth methodologies are compared in a three-hidden layer multilayer perceptron on several benchmark classification tasks. Several variants of these approaches are developed for a more in-depth comparison based on the type of hidden layer initialization and the weight update methods employed. Comparisons are then made to another sequential growing approach, Dynamic Node Creation. Growing hidden layers in parallel resulted in comparable or higher performances than sequential approaches. Growing hidden layers in parallel promotes growing narrower deep architectures tailored to the task. Dynamic growth inspired by population dynamics offers the potential to grow the width and depth of deeper neural networks in either a sequential or parallel fashion.
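The sequential/parallel distinction can be illustrated in a few lines (our own sketch on width lists only, abstracting away the initialization and weight-update variants the paper compares):

```python
def grow_sequential(widths, layer):
    """Add one hidden unit to a single chosen layer."""
    grown = list(widths)
    grown[layer] += 1
    return grown

def grow_parallel(widths):
    """Add one hidden unit to every hidden layer at once."""
    return [units + 1 for units in widths]

print(grow_sequential([4, 4, 4], layer=0))  # -> [5, 4, 4]
print(grow_parallel([4, 4, 4]))             # -> [5, 5, 5]
```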
Article
Full-text available
Artificial Neural Networks (ANN) are an effective method widely used in machine learning that can provide successful forecasting results. ANNs build a mathematical model inspired by the biological nervous system. In this study, the Multilayer Perceptron (MLP) and Radial Basis Function (RBF) artificial neural network approaches were used to forecast Turkey's monthly passenger car exports. The developed neural network models were designed to forecast Turkey's monthly passenger car exports, with the passenger car export value as the dependent variable and, as independent variables, Turkey's monthly passenger car imports, the US Dollar exchange rate, Turkey's total imports, new car sales volume, the motor vehicle production index, and the foreign producer price index. Using monthly data obtained from the Turkish Statistical Institute and the Central Bank of the Republic of Turkey (January 2010 to November 2023, 167 months), passenger car export values for the 7 months from December 2023 to June 2024 were forecast. The performance of the two neural network models was compared, and the differences and results of the forecasts were analyzed. This study concluded that the MLP model gives better results than the RBF model. The obtained results provide important insights into how passenger car exports may evolve in the future.
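A hedged sketch of the MLP forecasting setup (our own sklearn code on random stand-in data; the six predictors and the 167-month/7-month split mirror the study's description, everything else is an assumption):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(167, 6))     # stand-ins for the six monthly predictors
y = X @ rng.normal(size=6) + rng.normal(0, 0.1, 167)   # toy export value

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                   random_state=0))
model.fit(X[:-7], y[:-7])         # train on all but the last 7 months
print(model.predict(X[-7:]))      # forecast the 7 held-out months
```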
Chapter
Naturalist Charles Darwin was the first to rationalize the importance of emotions. Emotions are key factors that prepare the body to adapt to states like apprehension and outrage and amplify one's chances of survival and achievement. The lockdown restrictions and the fear and consequences of infectious disease brought the whole world to a halt. Humans across the globe have realized the unpredictability of life and let go of the need to control it. This situation not only caused a surge in suicide rates, domestic violence, job insecurity, grief, and scarcity of basic needs, but also took a positive turn in human development. Many people felt calm, enjoyed the family bond, went back to their basic survival skills, came close to humanity, and had many other realizations. This chapter provides a critical analysis of human emotions in times of stress and the different coping mechanisms adopted by the most successful species on the planet. It throws light on the different spectrums of psychosocial behaviour. The authors provide a topography of human emotions during the COVID-19 pandemic.
Article
Full-text available
Background. The problem of automatic facial micro-expression recognition from image sequences can be solved using technologies based on computer vision methods and algorithms, and investigations of such technologies are currently being carried out. However, the accuracy of the recognition results depends essentially on the selection of methods, algorithms, and their parameters at each stage of the technology used. Correct facial micro-expression recognition is in turn the key factor in recognizing the hidden emotions experienced by a human. Facial micro-expressions are generated by combinations of facial micro-movements. The research objective is to investigate how facial micro-movement detection accuracy depends on the parameters of the Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) feature descriptor algorithm and on the machine learning method used to classify the feature vectors. Materials and methods. The Spontaneous Actions and Micro-Movements (SAMM) dataset is used as the initial data. The study examined changes in the detection accuracy of facial micro-movements for classifiers based on the SVM and the multilayer perceptron (MLP) when changing the parameters of the LBP-TOP algorithm. Results. The study found that the best result for the SVM classifier is 94% detection accuracy, and the best detection accuracy for the MLP classifier is 98%. Thus, with optimal selection of algorithm parameters, both classifiers can handle the problem of facial micro-movement detection. Conclusions. Both considered methods, MLP and SVM, show acceptable results on the facial micro-expression recognition problem, with a slight advantage for MLP over SVM.
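A hedged sketch of LBP-TOP feature extraction (our own code using scikit-image; a faithful LBP-TOP accumulates histograms over all planes in each direction, whereas this simplification uses only the three central planes of the clip):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top(volume, P=8, R=1):
    """Simplified LBP-TOP: LBP histograms from the central XY, XT and
    YT planes of a (T, H, W) image sequence, concatenated."""
    T, H, W = volume.shape
    planes = [volume[T // 2],         # XY plane (central frame)
              volume[:, H // 2, :],   # XT plane
              volume[:, :, W // 2]]   # YT plane
    hists = []
    for p in planes:
        img = (p * 255).astype(np.uint8)
        codes = local_binary_pattern(img, P, R, method='uniform')
        h, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2))
        hists.append(h / h.sum())
    return np.concatenate(hists)

clip = np.random.default_rng(0).random((20, 64, 64))  # stand-in sequence
print(lbp_top(clip).shape)                            # 3 * (P + 2) bins
```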
Article
Full-text available
While multi-layer perceptrons (MLPs) remain popular for various classification tasks, their application of gradient-based schemes for training leads to some drawbacks including getting trapped in local optima. To tackle this, population-based metaheuristic methods have been successfully employed. Among these, Lévy flight distribution (LFD), which explores the search space through random walks based on a Lévy distribution, has shown good potential to solve complex optimisation problems. LFD uses two main components, the step length of the walk and the movement direction, for random walk generation to explore the search space. In this paper, we propose a novel MLP training algorithm based on the Lévy flight distribution algorithm for neural network-based pattern classification. We encode the network’s parameters (i.e., its weights and bias terms) into a candidate solution for LFD and employ the classification error as fitness function. The network parameters are then optimised, using LFD, to yield an MLP that is trained to perform well on the classification task at hand. In an extensive set of experiments, we compare our proposed algorithm with a number of other approaches, including both classical algorithms and other metaheuristic approaches, on a number of benchmark classification problems. The obtained results clearly demonstrate the superiority of our LFD training algorithm.
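A hedged sketch of the population-based idea (our own code, loosely following the LFD scheme; the step scaling, move rule, and sphere stand-in fitness are assumptions): candidate weight vectors move by Lévy-distributed steps generated with Mantegna's algorithm, and the fitness function would decode each vector into MLP weights and return the classification error.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """Mantegna's algorithm for Levy-distributed step lengths."""
    rng = rng if rng is not None else np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def lfd_train(fitness, dim, agents=20, iters=200, seed=0):
    """Move candidate weight vectors by Levy flights, keep the best."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (agents, dim))
    fit = np.array([fitness(x) for x in X])
    best, best_f = X[fit.argmin()].copy(), float(fit.min())
    for _ in range(iters):
        for i in range(agents):
            cand = X[i] + 0.01 * levy_step(dim, rng=rng) * (X[i] - best)
            f = fitness(cand)
            if f < fit[i]:
                X[i], fit[i] = cand, f
                if f < best_f:
                    best, best_f = cand.copy(), f
    return best, best_f

# fitness would decode w into MLP weights/biases and return the
# classification error; a sphere function stands in here.
w, err = lfd_train(lambda w: float(np.sum(w ** 2)), dim=10)
print(err)
```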
Article
Full-text available
The detection of micro-expressions is accorded a high priority in the majority of settings, because despite a person's best attempts these expressions will expose the genuine sentiments buried under the surface. The purpose of this study is to provide a novel approach to measuring sarcasm using a fuzzy inference system. The method involves analysing a person's facial expressions to evaluate the degree to which they are taking pleasure in something. Five separate areas of a person's face are distinguished, and precise active distances are determined from the outline points of each of these regions: the brows on both sides of the face, the eyes, and the lips. To represent an individual's degree of happiness within the proposed fuzzy inference system, membership functions are first applied to the computed distances. The outputs of these membership functions are then fed to another membership function, from which an estimate of the sarcasm percentage is derived. The suggested method is validated on human face images taken from the standard SMIC, SAMM, and CAS(ME)2 datasets, which helps guarantee that the method is effective.
Chapter
In machine learning, multilayer perceptrons (MLPs) have long been one of the most popular and effective classifiers. Since training is the crucial process, the susceptibility of conventional algorithms such as back-propagation to getting stuck in local optima has encouraged many researchers to opt for metaheuristic algorithms instead. In this paper, we propose a novel population-based metaheuristic algorithm for MLP training using Lévy flight distribution (LFD). In our approach, the optimum weights of the network are found via a population of agents moving through the search space either by Lévy flight motions or by random walks. Comparing the results of this algorithm on several datasets from the UCI repository with other population-based metaheuristic algorithms shows excellent results and the superiority of the LFD algorithm.
Keywords: Neural network training, Multilayer perceptron, Lévy flight, Classification, Metaheuristic
Article
Clinical educators have used robotic and virtual patient simulator systems (RPS) for decades to help clinical learners (CL) gain key skills for avoiding future patient harm. These systems can simulate human physiological traits; however, they have static faces and lack a realistic depiction of facial cues, which limits CL engagement and immersion. In this article, we provide a detailed review of existing systems in use and describe the possibilities for new technologies from the human–robot interaction and intelligent virtual agents communities to push forward the state of the art. We also discuss our own work in this area, including new approaches for facial recognition and synthesis on RPS systems, such as the ability to realistically display patient facial cues including pain and stroke. Finally, we discuss future research directions for the field.
Article
Facial expressions are generally recognized based on hand-crafted and deep-learning-based features extracted from RGB facial images. However, such recognition methods suffer from illumination/pose variations; in particular, they fail to recognize expressions with weak emotion intensities. In this work, we propose a cross-modality attention-based convolutional neural network (CM-CNN) for facial expression recognition (FER). We extract expression-related features from complementary facial images (gray-scale, LBP, and depth images) to handle the illumination/pose variations and to capture appearance details that describe expressions with weak emotion intensities. Rather than directly concatenating the complementary features, we propose a novel cross-modality attention fusion network to enhance spatial correlations between any two types of facial images. Finally, the CM-CNN is optimized with an improved focal loss, which pays more attention to facial expressions with weak emotion intensities. The average classification accuracies on VT-KFER, BU-3DFE(P1), BU-3DFE(P2), and Bosphorus are 93.86%, 88.91%, 87.28%, and 85.16%, respectively. Evaluations on these databases demonstrate that our approach is competitive with state-of-the-art algorithms.
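A minimal sketch of cross-modality attention fusion between two feature streams (e.g., gray-scale and depth) might look like the following PyTorch module; the 1×1-convolution gating design and channel sizes are assumptions, not the exact CM-CNN block.

```python
import torch
import torch.nn as nn

class CrossModalityAttention(nn.Module):
    """Fuse two modality streams with learned spatial attention gates."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs turn the paired features into per-location gates.
        self.gate_a = nn.Conv2d(2 * channels, 1, kernel_size=1)
        self.gate_b = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, feat_a, feat_b):
        paired = torch.cat([feat_a, feat_b], dim=1)
        attn_a = torch.sigmoid(self.gate_a(paired))  # how much to trust A here
        attn_b = torch.sigmoid(self.gate_b(paired))  # how much to trust B here
        return feat_a * attn_a + feat_b * attn_b

gray_feats = torch.randn(4, 64, 28, 28)   # e.g., gray-scale stream output
depth_feats = torch.randn(4, 64, 28, 28)  # e.g., depth stream output
fused = CrossModalityAttention(64)(gray_feats, depth_feats)
```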
Article
UV-generated hydrated electrons play a critical role in the defluorination reaction of poly- and perfluoroalkyl substances (PFAS). However, limited experimental data hinder insight into the effects of the structural characteristics of emerging PFAS on their defluorination abilities. Therefore, in this study, we adopted quantitative structure–activity relationship (QSAR) models based on machine learning algorithms to develop predictive models of the relative defluorination ability of PFAS. Five-fold cross-validation was used to perform the hyperparameter tuning of the models, which suggested that the gradient boosting algorithm with PaDEL descriptors, as the best model, possessed superior predictive performance (R² = 0.944 and RMSE = 0.114 on the test set). The descriptor importances indicated that the electrostatic properties and topological structure of the compounds significantly affected the defluorination ability of the PFAS. For the emerging PFAS, the best model showed that most compounds, such as potential alternatives to perfluorooctane sulfonic acid, were recalcitrant to reductive defluorination, whereas perfluoroalkyl ether carboxylic acids had relatively stronger defluorination abilities than perfluorooctanoic acid. Theoretical calculations implied that additional electrons on PFAS could cause molecular deconstruction, such as changes in the dihedral angle of the carbon chain, as well as C–F bond and ether C–O bond cleavages. In general, the current computational models could be useful for screening emerging PFAS to assess their defluorination ability for the molecular design of fluorochemical structures.
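The modeling recipe described above can be sketched with scikit-learn as follows; X_desc and y_defluor are placeholders standing in for the PaDEL descriptor matrix and the measured relative defluorination ability, and the hyperparameter grid is illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholders for the PaDEL descriptor matrix and measured defluorination.
X_desc = np.random.rand(120, 30)
y_defluor = np.random.rand(120)

X_tr, X_te, y_tr, y_te = train_test_split(X_desc, y_defluor, random_state=0)
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [2, 3, 4],
                "learning_rate": [0.05, 0.1]},
    cv=5,                      # five-fold cross-validated tuning
    scoring="neg_root_mean_squared_error")
search.fit(X_tr, y_tr)
pred = search.best_estimator_.predict(X_te)
print("R2  (test):", r2_score(y_te, pred))
print("RMSE(test):", np.sqrt(mean_squared_error(y_te, pred)))
```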
Method
A study of state-of-the-art machine learning algorithms is presented for different GNSS time series across the globe. Using different criteria, the performance of the methods is checked against each other.
Article
Full-text available
In recent years, due to its great economic and social potential, the recognition of facial expressions linked to emotions has become one of the most flourishing applications in the field of artificial intelligence, and has been the subject of many developments. However, despite significant progress, this field is still subject to many theoretical debates and technical challenges. It therefore seems important to make a general inventory of the different lines of research and to present a synthesis of recent results in this field. To this end, we have carried out a systematic review of the literature according to the guidelines of the PRISMA method. A search of 13 documentary databases identified a total of 220 references over the period 2014–2019. After a global presentation of the current systems and their performance, we grouped and analyzed the selected articles in the light of the main problems encountered in the field of automated facial expression recognition. The conclusion of this review highlights the strengths, limitations and main directions for future research in this field.
Chapter
Human biometric traits play an important role in security and authentication processes. Nowadays, various smart devices use security patterns such as the face, fingerprint and iris. Face recognition and detection are important research areas concerned with recognizing faces using classification algorithms such as KNN, SVM and many others. The performance of these algorithms depends on the feature selection process applied to the human face. The ongoing research trend in biometric security adopts feature optimization to enhance facial detection techniques. Essentially, our faces carry three types of attributes: skin color, texture, and the shape and size of the face. Skin color and texture are the most informative features of a face, and this detection technique adopts the texture features of the face image. Partial feature extraction functions are used for texture extraction from the face image; these functions provide the most promising shape feature analysis. In this paper, we present a review of face recognition based on different classification and feature optimization algorithms.
Article
Full-text available
In this study, we propose a novel approach to facial expression recognition that capitalizes on the anatomical structure of the human face. We model the human face with a high-polygon wireframe model that embeds all major muscles. Influence regions of facial muscles are estimated through a semi-automatic customization process. These regions are projected to the image plane to determine feature points. The relative displacement of each feature point between two image frames is treated as evidence of muscular activity. Feature point displacements are projected back to 3D space to estimate the new coordinates of the wireframe vertices. The muscular activities that would produce the estimated deformation are solved through a least squares algorithm. We demonstrate the representative power of muscle force based features on three classifiers: NB, SVM and AdaBoost. The ability to extract the muscle forces that compose a facial expression will enable the detection of subtle expressions, the replication of an expression on animated characters and the exploration of psychologically unknown mechanisms of facial expressions.
Article
Full-text available
A lot of information is conveyed by human beings in the form of facial expressions, apart from just what is spoken. Proper recognition of such expressions has thus become important for any modern human computer interface. We present here a method of facial expression recognition based on Eigenfaces. The method modifies the original Eigenfaces approach [1] and takes human vision as its standard reference point: using the standard JAFFE database, it computes the expression contained in the image of a test face. It is a unique approach which directly classifies a test image as belonging to one of the six standard expressions (anger, disgust, fear, happy, sad or surprise) with great accuracy. In this paper, we present experimental proof of the accuracy of this strategy, with analysis and discussion of further ways to improve upon it.
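A minimal Eigenfaces-style sketch of this strategy: PCA learns the face subspace from flattened grayscale images and a nearest-neighbour rule assigns one of the six expressions; the image size, component count and random data are illustrative placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Flattened 64x64 gray faces with one of six expression labels (dummy data).
faces = np.random.rand(60, 64 * 64)
labels = np.repeat(np.arange(6), 10)   # anger, disgust, fear, happy, sad, surprise

pca = PCA(n_components=20).fit(faces)          # learn the "eigenfaces"
clf = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(faces), labels)

test_face = np.random.rand(1, 64 * 64)
print(clf.predict(pca.transform(test_face)))   # predicted expression id
```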
Conference Paper
Full-text available
Facial expression recognition is a challenging and interesting problem in computer vision and pattern recognition. Geometric variability in both the emotion expression and the neutral face is a fundamental challenge in the facial expression recognition problem. This variability not only directly affects geometric facial expression recognition methods, but is also a critical problem for appearance methods. To overcome this problem, this paper presents an approach which eliminates geometric variability in emotion expression; thus, appearance features can be accurately used for facial expression recognition. A fixed geometric model, defined as one of the emotional expressions, is used for geometric normalization of facial images. In addition, Local Binary Patterns are utilized to represent facial appearance features. Experimental results show that the proposed method is more accurate than existing works. Moreover, using geometric expression models with larger mouth/eye regions, such as Surprise, gives better results, indicating that the mouth and eyes are important regions in emotion expression.
Article
Full-text available
Evidence on universals in facial expression of emotion and renewed controversy about how to interpret that evidence is discussed. New findings on the capability of voluntary facial action to generate changes in both autonomic and central nervous system activity are presented, as well as a discussion of the possible mechanisms relevant to this phenomenon. Finally, new work on the nature of smiling is reviewed which shows that it is possible to distinguish the smile when enjoyment is occurring from other types of smiling. Implications for the differences between voluntary and involuntary expression are considered.
Article
Full-text available
This paper proposes a fundamentally novel extension, namely, flrFAM, of the fuzzy ARTMAP (FAM) neural classifier for incremental real-time learning and generalization based on fuzzy lattice reasoning techniques. FAM is enhanced first by a parameter optimization training (sub)phase, and then by a capacity to process partially ordered (non)numeric data including information granules. The interest here focuses on intervals' numbers (INs) data, where an IN represents a distribution of data samples. We describe the proposed flrFAM classifier as a fuzzy neural network that can induce descriptive as well as flexible (i.e., tunable) decision-making knowledge (rules) from the data. We demonstrate the capacity of the flrFAM classifier for human facial expression recognition on benchmark datasets. The novel feature extraction as well as knowledge-representation is based on orthogonal moments. The reported experimental results compare well with the results by alternative classifiers from the literature. The far-reaching potential of fuzzy lattice reasoning in human-machine interaction applications is discussed.
Conference Paper
Full-text available
Automatic analysis of human facial expression is one of the challenging problems in machine vision systems. The most expressive way humans display emotion is through facial expression. In this paper, we extend texture-based facial expression recognition with a 2D image processing method for feature extraction and new neural network based decision trees. The proposed algorithm applies a set of preprocessing steps and divides the image into two main parts (the eye and lip regions). The Discrete Cosine Transform (DCT) is then applied to each part to reduce the image data size in the different parts of the face. Different decision tree models were examined in order to find the best recognition rate. Experimental results show that combining decision trees with neural networks to identify different facial expressions can improve the recognition rate significantly.
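A hedged sketch of the described preprocessing: the face is split into eye and lip regions, a 2-D DCT is applied to each, and a small block of low-frequency coefficients forms the compact feature vector passed to the decision-tree stage. The crop boxes and the number of retained coefficients are assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """Separable 2-D type-II DCT."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def region_features(face, keep=8):
    """Low-frequency DCT coefficients from a crude eye/lip split."""
    h = face.shape[0]
    eyes, lips = face[: h // 2], face[h // 2 :]
    feats = [dct2(region)[:keep, :keep].ravel() for region in (eyes, lips)]
    return np.concatenate(feats)        # input to the decision-tree stage

face = np.random.rand(96, 96)
print(region_features(face).shape)      # (128,)
```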
Article
Full-text available
This study improves the recognition accuracy and execution time of a facial expression recognition system. Various techniques were utilized to achieve this. The face detection component is implemented by adopting the Viola–Jones descriptor. The detected face is then down-sampled by a Bessel transform to reduce the feature extraction space and improve processing time. Gabor feature extraction techniques were employed to extract thousands of facial features which represent various facial deformation patterns. An AdaBoost-based hypothesis is formulated to select a few hundred of the numerous extracted features to speed up classification. The selected features were fed into a well designed 3-layer neural network classifier that is trained by a back-propagation algorithm. The system is trained and tested with datasets from the JAFFE and Yale facial expression databases. Average recognition rates of 96.83% and 92.22% are registered on the JAFFE and Yale databases, respectively. The execution time for a 100 × 100 pixel image is 14.5 ms. The overall results of the proposed techniques are very encouraging when compared with others.
Article
Full-text available
Given the nonlinear manifold structure of facial images, a new kernel-based supervised manifold learning algorithm based on locally linear embedding (LLE), called discriminant kernel locally linear embedding (DKLLE), is proposed for facial expression recognition. The proposed DKLLE aims to nonlinearly extract the discriminant information by maximizing the interclass scatter while minimizing the intraclass scatter in a reproducing kernel Hilbert space. DKLLE is compared with LLE, supervised locally linear embedding (SLLE), principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA), and kernel linear discriminant analysis (KLDA). Experimental results on two benchmarking facial expression databases, i.e., the JAFFE database and the Cohn-Kanade database, demonstrate the effectiveness and promising performance of DKLLE.
Article
Full-text available
To make human–computer interaction more natural and friendly, computers must be able to understand human affective states the same way humans do. There are many modalities, such as the face, body gesture and speech, that people use to express their feelings. In this study, we simulate human perception of emotion by combining emotion-related information from facial expression and speech. The speech emotion recognition system is based on prosody features and mel-frequency cepstral coefficients (a representation of the short-term power spectrum of a sound), and the facial expression recognition is based on integrated time motion images and quantized image matrices, which can be seen as an extension of temporal templates. Experimental results showed that using the hybrid features and decision-level fusion improves on the outcomes of the unimodal systems. This method improves the recognition rate by about 15% with respect to the speech unimodal system and by about 30% with respect to the facial expression system. By using the proposed multi-classifier system, an improved hybrid system, the recognition rate increases by up to 7.5% over the hybrid features and decision-level fusion with RBF, up to 22.7% over the speech-based system and up to 38% over the facial expression-based system.
Article
Full-text available
Facial expression recognition is a useful feature in modern human computer interaction (HCI). In order to build efficient and reliable recognition systems, face detection, feature extraction and classification have to be robustly realised. Addressing the latter two issues, this work proposes a new method based on geometric and transient optical flow features and illustrates their comparison and integration for facial expression recognition. In the authors' method, photogrammetric techniques are used to extract three-dimensional (3-D) features from every image frame, which are regarded as a geometric feature vector. Additionally, optical flow-based motion detection is carried out between consecutive images, which yields the transient features. Artificial neural network and support vector machine classification results demonstrate the high performance of the proposed method. In particular, through the use of 3-D normalisation and colour information, the proposed method achieves an advanced feature representation for the accurate and robust classification of facial expressions.
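The transient-feature step might be sketched as follows, with Farnebäck dense flow standing in for whichever flow estimator the authors used; the direction-histogram summary is an illustrative reduction of the flow field.

```python
import cv2
import numpy as np

def transient_features(prev_gray, next_gray, bins=8):
    """Summarise dense flow between two uint8 gray frames as a
    magnitude-weighted histogram of motion directions."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)
```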
Conference Paper
Full-text available
This paper proposes a novel biological vision-based facial description, namely Perceived Facial Images (PFIs), aiming to highlight intra-class and inter-class variations of both facial range and texture images for textured 3D face recognition. These generated PFIs simulate the response of complex neurons to gradient information within a certain neighborhood and possess the properties of being highly distinctive and robust to affine illumination and geometric transformation. Based on such an intermediate facial representation, SIFT-based matching is further carried out to calculate similarity scores between a given probe face and the gallery ones. Because the facial description generates a PFI for each quantized gradient orientation of range and texture faces, we then propose a score level fusion strategy which optimizes the weights using a genetic algorithm in a learning step. Evaluated on the entire FRGC v2.0 database, the rank-one recognition rate using only 3D or 2D modality is 95.5% and 95.9%, respectively; while fusing both modalities, i.e. range and texture-based PFIs, the final accuracy is 98.0%, demonstrating the effectiveness of the proposed biological vision-based facial description and the optimized weighted sum fusion.
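Based on the description above, a hedged numpy approximation of a PFI-like representation could compute, for each quantized gradient orientation, a rectified projection of the image gradient followed by Gaussian pooling over a neighborhood; the orientation count and smoothing width are assumptions, not the authors' exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perceived_facial_images(img, n_orient=8, sigma=2.0):
    """One smoothed, rectified orientation-response map per quantized
    gradient orientation (a rough complex-neuron analogue)."""
    gy, gx = np.gradient(img.astype(float))
    maps = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        resp = np.maximum(gx * np.cos(theta) + gy * np.sin(theta), 0.0)
        maps.append(gaussian_filter(resp, sigma))   # neighborhood pooling
    return np.stack(maps)

pfis = perceived_facial_images(np.random.rand(128, 128))
print(pfis.shape)    # (8, 128, 128): inputs to the SIFT-based matching stage
```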
Conference Paper
Full-text available
This work details the authors' efforts to push the baseline of expression recognition performance on a realistic database. Both subject-dependent and subject-independent emotion recognition scenarios are addressed, as both occur frequently in real-life settings. The approach towards solving this problem involves face detection, followed by key point identification, then feature generation and finally classification. An ensemble of features comprising Hierarchical Gaussianization (HG), Scale Invariant Feature Transform (SIFT) and optic flow has been incorporated. In the classification stage we used SVMs. The classification task has been divided into person-specific and person-independent emotion recognition. Both manual labels and automatic algorithms for person verification have been attempted, and they give similar performance.
Article
Full-text available
Automatic facial expression analysis is an interesting and challenging problem, and impacts important applications in many areas such as human–computer interaction and data-driven animation. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. In this paper, we empirically evaluate facial representation based on statistical local features, Local Binary Patterns, for person-independent facial expression recognition. Different machine learning methods are systematically examined on several databases. Extensive experiments illustrate that LBP features are effective and efficient for facial expression recognition. We further formulate Boosted-LBP to extract the most discriminant LBP features, and the best recognition performance is obtained by using Support Vector Machine classifiers with Boosted-LBP features. Moreover, we investigate LBP features for low-resolution facial expression recognition, which is a critical problem but seldom addressed in the existing work. We observe in our experiments that LBP features perform stably and robustly over a useful range of low resolutions of face images, and yield promising performance in compressed low-resolution video sequences captured in real-world environments.
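The LBP pipeline described here (without the boosting-based feature selection) can be sketched as follows: uniform LBP codes are histogrammed over a grid of face regions and the concatenated histograms feed an SVM. The grid size and LBP radius are illustrative.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram_features(face, grid=4, P=8, R=1):
    """Concatenated uniform-LBP histograms over a grid of face regions."""
    codes = local_binary_pattern(face, P, R, method="uniform")
    n_bins = P + 2                      # uniform patterns plus the "other" bin
    h, w = codes.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = codes[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))
            feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
X = np.array([lbp_histogram_features(rng.integers(0, 256, (96, 96), dtype=np.uint8))
              for _ in range(20)])
y = np.repeat(np.arange(2), 10)          # two dummy expression classes
clf = SVC(kernel="rbf").fit(X, y)
```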
Conference Paper
Full-text available
A novel emotion recognition system has been proposed for classifying facial expressions in videos. Firstly, two types of basic facial appearance descriptors were extracted. The first type of descriptor, called the Motion History Histogram (MHH), was used to detect temporal changes of each pixel of the face. The second type of descriptor, a Histogram of Local Binary Patterns (LBP), was applied to each frame of the video and used to capture local textural patterns. Secondly, based on these two basic descriptors, two new dynamic facial expression features called MHH_EOH and LBP_MCF were proposed. These two features incorporate both dynamic and local information. Finally, a two-view SVM-2K classifier was built to integrate these two dynamic features in an efficient way. The experimental results showed that this method outperformed the baseline results set by the FERA'11 challenge.
Conference Paper
Full-text available
This paper assesses the performance of measures of facial expression dynamics derived from the Computer Expression Recognition Toolbox (CERT) for classifying emotions in the Facial Expression Recognition and Analysis (FERA) Challenge. The CERT system automatically estimates facial action intensity and head position using learned appearance-based models on single frames of video. CERT outputs were used to derive a representation of the intensity and motion in each video, consisting of the extremes of displacement, velocity and acceleration. Using this representation, emotion detectors were trained on the FERA training examples. Experiments on the released portion of the FERA dataset are presented, as well as results on the blind test. No consideration of subject identity was taken into account in the blind test. The F1 scores were well above the baseline criterion for success.
Conference Paper
Full-text available
We propose a method for automatic emotion recognition as part of the FERA 2011 competition. The system extracts pyramid of histogram of gradients (PHOG) and local phase quantisation (LPQ) features for encoding the shape and appearance information. For selecting the key frames, K-means clustering is applied to the normalised shape vectors derived from constraint local model (CLM) based face tracking on the image sequences. Shape vectors closest to the cluster centers are then used to extract the shape and appearance features. We demonstrate the results on the SSPNET GEMEP-FERA dataset, which comprises both person-specific and person-independent partitions. For emotion classification we use a support vector machine (SVM) and largest margin nearest neighbour (LMNN) classification, and compare our results to the pre-computed FERA 2011 emotion challenge baseline.
Conference Paper
Full-text available
Automatic Facial Expression Recognition and Analysis, in particular FACS Action Unit (AU) detection and discrete emotion detection, has been an active topic in computer science for over two decades. Standardisation and comparability have come some way; for instance, there exist a number of commonly used facial expression databases. However, the lack of a common evaluation protocol and the lack of sufficient details to reproduce the reported individual results make it difficult to compare systems to each other. This in turn hinders the progress of the field. A periodical challenge in Facial Expression Recognition and Analysis would allow this comparison in a fair manner, would clarify how far the field has come, and would allow us to identify new goals, challenges and targets. In this paper we present the first challenge in automatic recognition of facial expressions, to be held during the IEEE conference on Face and Gesture Recognition 2011 in Santa Barbara, California. Two sub-challenges are defined: one on AU detection and another on discrete emotion detection. The paper outlines the evaluation protocol, the data used, and the results of a baseline method for the two sub-challenges.
Article
Full-text available
This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the Integral Image which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a cascade which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
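The Integral Image contribution can be illustrated in a few lines: after one cumulative-sum pass, any rectangular sum costs four array lookups, which is what makes Haar-like features cheap to evaluate.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; padded so index 0 means 'empty'."""
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(ii, top, left, height, width):
    """Sum over img[top:top+height, left:left+width] in four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
```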
Article
The face is one of the best biometrics for person identification and verification related applications, because it is natural, non-intrusive, and socially well accepted. Unfortunately, all human faces are similar to each other and hence offer low distinctiveness compared with other biometrics, e.g., fingerprints and irises. Furthermore, when employing facial texture images, intra-class variations, due to factors as diverse as illumination and pose changes, are usually greater than inter-class ones, making 2D face recognition far from reliable in real conditions. Recently, 3D face data have been extensively investigated by the research community to deal with the unsolved issues in 2D face recognition, i.e., illumination and pose changes. This Ph.D. thesis is dedicated to robust face recognition based on three-dimensional data, including 3D shape-only face recognition, textured 3D face recognition, and asymmetric 3D-2D face recognition. In 3D shape-only face recognition, since 3D face data, such as facial point clouds and facial scans, are theoretically insensitive to lighting variations and generally allow easy pose correction using an ICP-based registration step, the key problem mainly lies in how to represent 3D facial surfaces accurately and achieve matching that is robust to facial expression changes. In this thesis, we design an effective and efficient approach to 3D shape-only face recognition. For facial description, we propose a novel geometric representation based on extended Local Binary Pattern (eLBP) depth maps, which can comprehensively describe local geometry changes of 3D facial surfaces, while a SIFT-based local matching process, further improved by facial component and configuration constraints, is proposed to associate keypoints between corresponding facial representations of different facial scans belonging to the same subject. Evaluated on the FRGC v2.0 and Gavab databases, the proposed approach proves its effectiveness. Furthermore, due to the use of local matching, it does not require registration for nearly frontal facial scans and only needs a coarse alignment for the ones with severe pose variations, in contrast to most related work, which is based on a time-consuming fine registration step. Considering that most current 3D imaging systems deliver 3D face models along with their aligned texture counterpart, a major trend in the literature is to adopt both the 3D shape and 2D texture based modalities, arguing that the joint use of both clues generally provides more accurate and robust performance than utilizing either single modality. Two important factors in this issue are the facial representation on both types of data as well as result fusion. In this thesis, we propose a biological vision-based facial representation, named Oriented Gradient Maps (OGMs), which can be applied to both facial range and texture images. The OGMs simulate the response of complex neurons to gradient information within a given neighborhood and have the properties of being highly distinctive and robust to affine illumination and geometric transformations. The previously proposed matching process is then adopted to calculate similarity measurements between probe and gallery faces. Because the biological vision-based facial representation produces an OGM for each quantized orientation of facial range and texture images, we finally use a score level fusion strategy that optimizes the weights by a genetic algorithm in a learning process.
The experimental results achieved on the FRGC v2.0 and 3DTEC datasets demonstrate the effectiveness of the proposed biological vision-based facial description and the optimized weighted sum fusion. [...]
Chapter
In our daily life, facial expressions convey important information in interactions with other people, and human facial expression recognition has been researched for years. This study adds facial muscle streaks, such as nasolabial folds and forehead lines, as additional recognition cues. We used traditional face detection to extract the face area from the original image, then extracted the positions of the eye, mouth and eyebrow outlines from the face area. Afterward, we extracted important contours from the different feature areas. Ultimately, these features were assembled into a feature vector, which was processed with a neural network to determine the user's facial expression. This study used the TFEID (Taiwanese Facial Expression Image Database) to evaluate both expression recognition and face recognition. The experimental results show that 96.2% and 92.8% of the TFEID database can be recognized in the personalized and full-member expression recognition experiments, respectively. In face recognition, 97.4% of the TFEID samples were recognized.
Conference Paper
Linear discriminant analysis (LDA) is an effective method for solving classification problems. Many discriminant-analysis-based approaches have been proposed to extract more discriminant information and to overcome the limitations of LDA. Local linear discriminant analysis (LLDA) was proposed to capture the local structure of samples; it can overcome the Gaussian-distribution assumption inherent in traditional LDA. In this paper, we propose a tensor version of LLDA. TensorLLDA not only avoids the undersampled problem which appears in LDA and LLDA, but also reduces the computational complexity. Experiments on the JAFFE and Cohn-Kanade facial expression databases show the effectiveness of tensorLLDA.
Conference Paper
In this paper, we present an effective method for facial expression recognition (FER) based on adaptive local binary patterns (ALBP) and sparse representation. The new algorithm first solves sparse representations on both raw gray facial expression images and the adaptive local binary patterns (ALBP) of these images, yielding two expression recognition results, one for each type of feature. The final expression recognition is then performed by fusing the two results through comparison of the residual ratios of the sparse representations. Experimental results on the Japanese Female Facial Expression (JAFFE) database demonstrate that the proposed fusion algorithm clearly outperforms other methods such as Linear Discriminant Analysis (LDA) + Support Vector Machine (SVM) and Kernel Principal Component Analysis (KPCA) + SVM, and also improves markedly on the direct sparse representation approach.
Conference Paper
Many different types of robots have been developed to facilitate our lives, such as those used in industrial production. However, most robots operate by following human instructions or programs rather than by acting naturally through learned behavior, and such scripted actions are not convincing to humans. Thus, the present research was undertaken as a step toward achieving natural robot action through consciousness-based architecture (CBA), a system that imitates animal consciousness. Here we present the implementation of a facial expression recognition system that uses constrained local models (CLMs) to fit facial features together with hidden Markov models (HMMs) to classify and recognize emotions. We propose an approach and present our CLM experimental results, including time efficiency and accuracy, together with emotion recognition results such as time efficiency and a confusion matrix. The experiments demonstrate that the proposed system is an efficient method for personal facial expression recognition, achieving a recognition accuracy of 96.43 percent.
Article
Artificial Neural Networks (ANN) are biologically inspired models of computation. They are networks of elementary processing units called neurons, massively interconnected by trainable connections called weights. ANN algorithms involve training the connection weights through a systematic procedure. Learning in an ANN refers to searching for an optimal network topology and weights so as to accomplish a given goal-dictated task. Learning can be categorized as supervised or unsupervised: supervised learning refers to the presence of inputs and desired outputs for training, while unsupervised learning refers to determining the output categories or correlations inherent in the inputs. ANNs are capable of generalization, adaptation and performing computation in parallel, resembling the human brain. A layer of neurons in an ANN has common functionality. The choice of the number of neurons in the input, hidden and output layers and their functionality depends on the learning algorithm and the task at hand. The number of input neurons is usually the same as the total number of attributes of the training patterns, and the number of output neurons is the same as the number of categories sought. Determination of the number of hidden layers and neurons depends on the inherent complexity of the task and the number of training examples available. A number of ANN architectures and algorithms have been proposed by researchers, of which Constructive Neural Networks (CoNN) offer an attractive framework for pattern classification problems. CoNN algorithms provide an optimal way to determine the architecture of a Multi Layer Perceptron network, trainable with learning algorithms to determine appropriate weights. These algorithms initially start with a small network (usually a single neuron) and dynamically allow the network to grow by adding and training neurons as needed until a satisfactory solution is found [1].
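A hedged sketch of the constructive idea (a simplification of typical CoNN algorithms, not any specific published one): start with one hidden neuron and retrain with a grown hidden layer until the training MSE falls below a predefined threshold. The retrain-from-scratch strategy and the thresholds are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def constructive_mlp(X, y, mse_target=0.05, max_hidden=64):
    """Grow the hidden layer one neuron at a time until the training MSE
    drops below mse_target; y is assumed to hold labels 0..K-1."""
    for n_hidden in range(1, max_hidden + 1):
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                            random_state=0).fit(X, y)
        onehot = np.eye(net.classes_.size)[y]
        mse = np.mean((net.predict_proba(X) - onehot) ** 2)
        if mse <= mse_target:
            break                      # satisfactory solution found
    return net, n_hidden, mse
```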
Conference Paper
Face recognition is one of the most efficient applications of computer authentication and pattern recognition, and it therefore attracts significant attention from researchers. In the past decades, many feature extraction algorithms have been proposed. In this paper, Gabor features and Zernike moments were used to extract features from human face images for the recognition application. This paper presents a new constructive training algorithm for the Multi Layer Perceptron (MLP), applied to face recognition. An incremental training procedure was employed in which the training patterns are learned incrementally. The algorithm starts with a single training pattern and a single hidden layer using one neuron. During neural network training, a hidden neuron is added when the Mean Square Error (MSE) on the Training Data (TD) is not reduced or the algorithm gets stuck in a local minimum. Input patterns are trained incrementally (one by one) until all patterns of the TD are selected and trained. A face recognition system based on the MLP neural network was then constructed and tested. The proposed approach was tested on the UMIST database. Experimental results indicate that an optimal neural network classifier architecture (with the least possible number of hidden neurons) can be obtained using the present constructive algorithm, and prove the effectiveness of the proposed method compared to an MLP architecture trained with the back-propagation algorithm.
Conference Paper
This paper presents a new facial expression recognition (FER) method which exploits the effectiveness of color information and sparse representation. To extract face features, we compute color vector differences between color pixels so that they can effectively capture changes in face appearance (e.g., skin texture). Through comparative and extensive experiments using two public FER databases (DBs), we validate that our color texture features are suited to sparse representation for improving FER accuracy. Specifically, our color texture features can considerably improve the recognition accuracy obtained by sparse representation compared with other features (e.g., Local Binary Patterns (LBP)) under realistic recognition conditions (e.g., low-resolution faces). It is also shown that the use of our features yields high discrimination capability and sparsity, justifying the high recognition accuracies obtained. Further, the proposed FER method outperforms five other state-of-the-art FER methods.
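Sparse-representation classification of this kind is commonly realised by coding a test feature vector over the training dictionary and comparing class-wise residuals; a minimal sketch using Lasso as the l1 solver follows, with the color texture descriptors replaced by generic feature vectors.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(X_train, y_train, x_test, alpha=0.01):
    """Code x_test over the training dictionary (columns = training samples)
    and return the class with the smallest reconstruction residual."""
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(X_train.T, x_test)
    coef = coder.coef_
    best_class, best_res = None, np.inf
    for c in np.unique(y_train):
        keep = np.where(y_train == c, coef, 0.0)   # class-c coefficients only
        res = np.linalg.norm(x_test - X_train.T @ keep)
        if res < best_res:
            best_class, best_res = c, res
    return best_class
```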
Article
Creating a large and natural facial expression database is a prerequisite for facial expression analysis and classification. It is, however, not only time consuming but also difficult to capture an adequately large number of spontaneous facial expression images and their meanings because no standard, uniform, and exact measurements are available for database collection and annotation. Thus, comprehensive first-hand data analyses of a spontaneous expression database may provide insight for future research on database construction, expression recognition, and emotion inference. This paper presents our analyses of a multimodal spontaneous facial expression database of natural visible and infrared facial expressions (NVIE). First, the effectiveness of emotion-eliciting videos in the database collection is analyzed with the mean and variance of the subjects' self-reported data. Second, an interrater reliability analysis of raters' subjective evaluations for apex expression images and sequences is conducted using Kappa and Kendall's coefficients. Third, we propose a matching rate matrix to explore the agreements between displayed spontaneous expressions and felt affective states. Lastly, the thermal differences between the posed and spontaneous facial expressions are analyzed using a paired-samples t-test. The results of these analyses demonstrate the effectiveness of our emotion-inducing experimental design, the gender difference in emotional responses, and the coexistence of multiple emotions/expressions. Facial image sequences are more informative than apex images for both expression and emotion recognition. Labeling an expression image or sequence with multiple categories together with their intensities could be a better approach than labeling the expression image or sequence with one dominant category. The results also demonstrate both the importance of facial expressions as a means of communication to convey affective states and the diversity of the displayed manifestations of felt emotions. There are indeed some significant differences between the temperature difference data of most posed and spontaneous facial expressions, many of which are found in the forehead and cheek regions.
Article
Facial expression recognition is an important part of the study of man-machine interfaces. Principal component analysis (PCA) is a statistical feature extraction method that captures the global grayscale features of the whole image. However, global grayscale features are sensitive to the environment, so a hybrid method of principal component analysis and local binary patterns (LBP) is introduced in this article. LBP extracts the local grayscale features of the mouth region, which contributes most to facial expression recognition, to complement the global grayscale features. The support vector machine (SVM) is used for facial expression recognition. Experimental results show that this method classifies different expressions more effectively and achieves a higher recognition rate than traditional recognition methods.
Article
Spontaneous facial expression recognition is significantly more challenging than recognizing posed ones. We focus on two issues that are still under-addressed in this area. First, due to the inherent subtlety, the geometric and appearance features of spontaneous expressions tend to overlap with each other, making it hard for classifiers to find effective separation boundaries. Second, the training set usually contains dubious class labels which can hurt the recognition performance if no countermeasure is taken. In this paper, we propose a spontaneous expression recognition method based on robust metric learning with the aim of alleviating these two problems. In particular, to increase the discrimination of different facial expressions, we learn a new metric space in which spatially close data points have a higher probability of being in the same class. In addition, instead of using the noisy labels directly for metric learning, we define sensitivity and specificity to characterize the annotation reliability of each annotator. Then the distance metric and annotators' reliability is jointly estimated by maximizing the likelihood of the observed class labels. With the introduction of latent variables representing the true class labels, the distance metric and annotators' reliability can be iteratively solved under the Expectation Maximization framework. Comparative experiments show that our method achieves better recognition accuracy on spontaneous expression recognition, and the learned metric can be reliably transferred to recognize posed expressions.
Article
This paper presents a novel emotion recognition model using the system identification approach. A comprehensive data-driven model using an extended Kohonen self-organizing map (KSOM) has been developed whose input is a 26-dimensional facial geometric feature vector comprising eye, lip and eyebrow feature points. This analytical face model based on the 26-dimensional geometric feature vector has been effectively used to describe the facial changes due to different expressions, and the paper thus includes an automated generation scheme for this geometric facial feature vector. The proposed non-heuristic model has been developed using training data from the MMI facial expression database. The emotion recognition accuracy of the proposed scheme has been compared with radial basis function network, multi-layered perceptron and support vector machine based recognition schemes. The experimental results show that the proposed model is very efficient in recognizing the six basic emotions while ensuring a significant increase in average classification accuracy over the radial basis function network and multi-layered perceptron. They also show that the average recognition rate of the proposed method is better than that of the multi-class support vector machine.
Article
In this paper, we focus on developing a novel framework which can be effectively used for both face detection (i.e. discriminate faces from non-face patterns) and facial expression recognition. The proposed statistical framework is based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions used to model local binary pattern (LBP) features. Our method is built on nonparametric Bayesian analysis where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. An unsupervised feature selection scheme is also integrated with the proposed nonparametric framework to improve modeling performance and generalization capabilities. By learning the proposed model using an expectation propagation (EP) inference approach, all the involved model parameters and feature saliencies can be evaluated simultaneously in a single optimization framework. Furthermore, the proposed framework is extended by adopting a localized feature selection scheme which has shown, according to our results, superior performance, to determine the most important facial features, as compared to the global one. The effectiveness and utility of the proposed method is illustrated through extensive empirical results using both synthetic data and two challenging applications involving face detection, and facial expression recognition.
Article
Automatic facial expression recognition has made considerable gains in the available body of research due to its vital role in human–computer interaction. So far, research on this and similar problems has proposed a wide variety of techniques and algorithms for both information representation and classification. Very recently, Farajzadeh et al. (Int J Pattern Recognit Artif Intell 25(8):1219–1241, 2011) proposed a novel information representation approach that uses machine-learning techniques to derive a set of new informative and descriptive features from the original features. The new features, so-called meta probability codes (MPC), have shown good performance in a wide range of domains. In this paper, we study the performance of the MPC features for facial expression recognition by proposing an MPC-based framework. In the proposed framework, any feature extractor and classifier can be incorporated using the meta-feature generation mechanism. In the experimental studies, we use four state-of-the-art information representation techniques (local binary patterns, Gabor wavelets, Zernike moments and facial fiducial points) as the original feature extractors, and four multiclass classifiers: support vector machine, k-nearest neighbor, radial basis function neural network, and sparse representation-based classifier. The results of extensive experiments conducted on three facial expression datasets, Cohn–Kanade, JAFFE, and TFEID, show that the MPC features inherently improve the performance of facial expression recognition.
Article
Engineered features have been heavily employed in computer vision. Recently, feature learning from unlabeled data for improving the performance of a given vision task has received increasing attention in both machine learning and computer vision. In this paper, we present the use of unlabeled video data to learn spatiotemporal features for video classification tasks. Specifically, we employ independent component analysis (ICA) to learn spatiotemporal filters from natural videos, and then construct feature representations for the input videos in classification tasks based on the learned filters. We test the performance of the proposed feature learning method with application to facial expression recognition. The experimental results on the well-known Cohn-Kanade database show that the learned features perform better than engineered features. The comparison experiments on recognition of low intensity expressions show that our method yields a better performance than spatiotemporal Gabor features.
Article
This paper presents a new dimensionality reduction algorithm for multi-dimensional data based on the tensor rank-one decomposition and graph preserving criterion. Through finding proper rank-one tensors, the algorithm effectively enhances the pairwise inter-class margins and meanwhile preserves the intra-class local manifold structure. In the algorithm, a novel marginal neighboring graph is devised to describe the pairwise inter-class boundaries, and a differential formed objective function is adopted to ensure convergence. Furthermore, the algorithm has less computation in comparison with the vector representation based and the tensor-to-tensor projection based algorithms. The experiments for the basic facial expressions recognition show its effectiveness, especially when it is followed by a neural network classifier.
Article
This paper presents an automatic way to discover the pixels in a face image that improve facial expression recognition results. The main contribution of our study is a practical method to improve the classification performance of classifiers by selecting the best pixels of interest. Our method exhaustively searches for the best and worst feature window positions in a set of face images among all possible combinations using an MLP. It then creates a non-rectangular emotion mask for feature selection in the supervised facial expression recognition problem, eliminating irrelevant data and improving classification performance through backward feature elimination. Experimental studies on the GENKI, JAFFE and FERET databases showed that the proposed system improves classification results by selecting the best pixels of interest.
Article
Most of the dynamics in real-world systems are driven by shifts and drifts, which are not easy for omnipresent neuro-fuzzy systems to overcome. Learning in a nonstationary environment requires a system with a high degree of flexibility, capable of assembling its rule base autonomously according to the degree of nonlinearity contained in the system. In practice, rule growing and pruning are carried out using only a small snapshot of the complete training data, keeping the computational load and memory demand low. To this end, a novel algorithm, namely the parsimonious network based on fuzzy inference system (PANFIS), is presented herein. PANFIS can commence its learning process from scratch with an empty rule base. Fuzzy rules are then added and expelled by virtue of the statistical contributions of the fuzzy rules and the injected data. Similar fuzzy sets may be merged into one fuzzy set in pursuit of a transparent rule base with improved human interpretability. The learning and modeling performance of the proposed PANFIS is numerically validated using several benchmark problems from real-world or synthetic datasets. The validation includes comparisons with state-of-the-art evolving neuro-fuzzy methods and showcases that our new method can compete with, and in some cases even outperform, these approaches in terms of predictive fidelity and model complexity.
Article
Face recognition is becoming a difficult process because of the generally similar shapes of faces and because of the numerous variations between images of the same face. A face recognition system aims at recognizing a face in a manner that is as independent as possible of these image variations. Such variations make face recognition, on the basis of appearance, a difficult task. This paper attempts to overcome the variations of facial expression and proposes a biological vision-based facial description, namely Perceived Facial Images (PFIs), applied to facial images for 2D face recognition. Based on the intermediate facial description, SIFT-based feature matching is then carried out to calculate similarity measures between a given probe face and the gallery ones. Because the proposed biological vision-based facial description generates a PFI for each quantized gradient orientation of facial images, we further propose a weighted sum rule based fusion scheme. The proposed approach was tested on three facial expression databases: the Cohn and Kanade Facial Expression Database, the Japanese Female Facial Expression (JAFFE) Database and the FEEDTUM Database. The experimental results demonstrate the effectiveness of the proposed method.
Article
In this paper, a hidden node pruning algorithm based on neural complexity is proposed. During training, the entropy of the neural network is calculated from the standard covariance matrix of the network's connection matrix, yielding the neural complexity. While ensuring that the information processing capacity of the network is not reduced, the least important hidden node is selected and deleted, achieving a simpler neural network architecture. It is not necessary to train the cost function of the neural network to a local minimum, and pre-processing of the network weights before architecture adjustment is avoided. Simulation results on nonlinear function approximation show that approximation performance is maintained while a simple network architecture is achieved.
Article
This paper describes an efficient constructive training algorithm using a Multi Layer Perceptron (MLP) neural network dedicated to Isolated Word Recognition (IWR) systems. An incremental training procedure was employed, based on a novel hidden neuron recruitment scheme for a single hidden layer. During the Neural Network (NN) training phase, the number of pronunciation samples extracted from the Training Data (TD) was sequentially increased. An optimal structure of the NN classifier with an optimized TD size was obtained using the proposed MLP constructive training algorithm. An isolated word recognition system based on the MLP neural network was then constructed and tested for recognizing ten words extracted from the TIMIT database. The Mel Frequency Cepstral Coefficient (MFCC) feature extraction method was employed, including energy and first and second derivative coefficients. A proposed Frame-by-Frame Neural Network (FFNN) classification method was explored and compared with the Conventional Neural Network (CNN) classification approach. The Principal Component Analysis (PCA) technique was also investigated in order to reduce both the TD size and the recognition system complexity. Experimental results showed the superior performance of the proposed FFNN classifier compared to the CNN counterpart, illustrated by the significant improvement obtained in terms of recognition rate.
Conference Paper
Recently, Viola et al. [2001] introduced a rapid object detection scheme based on a boosted cascade of simple feature classifiers. In this paper we introduce a novel set of rotated Haar-like features. These novel features significantly enrich the simple features of Viola et al. and can also be calculated efficiently. With these new rotated features, our sample face detector shows on average a 10% lower false alarm rate at a given hit rate. We also present a novel post optimization procedure for a given boosted cascade, improving the false alarm rate on average by a further 12.5%.
Article
In this paper, we review the major approaches to multimodal human–computer interaction, giving an overview of the field from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for multimodal human–computer interaction (MMHCI) research.
Conference Paper
In this paper, a novel pruning algorithm is proposed for self-organizing feed-forward neural networks based on sensitivity analysis, named the novel pruning feed-forward neural network (NP-FNN). In this study, the number of hidden neurons is determined by the output's sensitivity to the hidden nodes. The technique determines the relevance of the hidden nodes by analyzing the Fourier decomposition of the variance, so that each hidden node obtains a contribution ratio. The connection weights of hidden nodes with small ratios are set to zero; therefore, the computational cost of the training process is reduced significantly. It is clearly shown that the novel pruning algorithm minimizes the complexity of the final feed-forward neural network. Finally, computer simulation results demonstrate the effectiveness of the proposed algorithm.
Conference Paper
Most previous work focuses on learning discriminating appearance features over the whole face, without considering that each facial expression is physically composed of several related action units (AUs). However, the definition of an AU in the Facial Action Coding System (FACS) is an ambiguous semantic description, which makes accurate AU detection very difficult. In this paper, we adopt a compromise scheme that avoids AU detection and instead interprets facial expressions by learning compositional appearance features around AU areas. We first divide the face image into local patches according to the locations of AUs, and then extract local appearance features from each patch. A minimum-error-based optimization strategy is adopted to build compositional features from the local appearance features, and this process is embedded into a Boosting learning structure. Experiments on the Cohn-Kanade database show that the proposed method achieves promising performance and that the built compositional features are basically consistent with FACS.
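The patch step can be sketched compactly. The fragment below crops fixed-size windows around assumed AU-related landmark coordinates and flattens each into a crude local descriptor; the landmark list, patch size, and descriptor are placeholders, and the compositional/Boosting stage is not reproduced.

```python
# Hypothetical sketch of AU-area patch extraction. Assumes landmarks lie
# at least size/2 pixels from the image border; a real pipeline would
# use a proper local descriptor (e.g., Gabor or LBP) per patch.
import numpy as np

def extract_au_patches(face, au_landmarks, size=24):
    """face: grayscale image; au_landmarks: list of (row, col) AU centers."""
    half = size // 2
    patches = []
    for r, c in au_landmarks:
        patch = face[r - half:r + half, c - half:c + half]
        patches.append(patch.ravel().astype(np.float32))  # crude local feature
    return np.concatenate(patches)
```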
Conference Paper
http://vislab.ucr.edu/PUBLICATIONS/pubs/Journal%20and%20Conference%20Papers/after10-1-1997/Conference/2011/Facial%20expression%20recognition%20using11.pdf Existing facial expression recognition techniques analyze the spatial and temporal information of every single frame in a human emotion video. On the contrary, we create the Emotion Avatar Image (EAI) as a single good representation of each video or image sequence for emotion recognition. In this paper, we adopt the recently introduced SIFT flow algorithm to register every frame with respect to an Avatar reference face model. An iterative algorithm is then used not only to super-resolve the EAI representation for each video and the Avatar reference, but also to improve the recognition performance. Subsequently, we extract features from EAIs using both Local Binary Pattern (LBP) and Local Phase Quantization (LPQ). The results from both texture descriptors are tested on the Facial Expression Recognition and Analysis Challenge (FERA2011) data, the GEMEP-FERA dataset. To evaluate this simple yet powerful idea, we train our algorithm using only the 155 given training videos of the GEMEP-FERA dataset. The results show that our algorithm eliminates person-specific information for emotion and performs well on unseen data. Keywords: face registration; level of Emotion Avatar Image; person-independent emotion recognition; SIFT flow.
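The LBP texture step is standard enough to sketch. The fragment below computes a normalized uniform-LBP histogram for an EAI-like grayscale face with scikit-image; the (P, R) settings are common defaults rather than the paper's exact choices, and the SIFT-flow registration and LPQ descriptor are omitted.

```python
# Normalized uniform-LBP histogram as a texture descriptor for a
# registered face image. Uniform LBP with P neighbors yields P + 2
# distinct codes, hence the bin count below.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, P=8, R=1):
    codes = local_binary_pattern(gray_face, P, R, method="uniform")
    n_bins = P + 2                                  # uniform patterns + rest
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()                        # normalized descriptor
```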
Conference Paper
Automatic facial expression analysis is the most commonly studied aspect of behavior understanding and human-computer interfaces. The main difficulty for a facial emotion recognition system is implementing general expression models: the same facial expression may vary considerably across humans, and even for the same person when the expression is displayed in different contexts. These factors present a significant challenge for the recognition task. The method we applied, which is reminiscent of the "baseline method", utilizes dynamic dense appearance descriptors and statistical machine learning techniques. Histograms of oriented gradients (HoG) are used to extract the appearance features by accumulating the gradient magnitudes for a set of orientations in 1-D histograms defined over a size-adaptive dense grid, and Support Vector Machines with Radial Basis Function kernels are the base learners of emotions. The overall classification performance of the emotion detection reached 70%, which is better than the 56% accuracy achieved by the "baseline method" presented by the challenge organizers.
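The described pipeline maps naturally onto standard libraries. A hedged sketch using scikit-image's HoG and scikit-learn's RBF-kernel SVM follows; the cell/block sizes and SVM hyper-parameters are illustrative, not the authors' tuned values, and all face crops are assumed to share the same dimensions so feature vectors align.

```python
# HoG appearance features fed to an RBF-kernel SVM, mirroring the
# structure (not the exact settings) of the pipeline described above.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def train_emotion_svm(face_images, labels):
    feats = [hog(img, orientations=8, pixels_per_cell=(16, 16),
                 cells_per_block=(1, 1)) for img in face_images]
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # illustrative values
    clf.fit(np.array(feats), labels)
    return clf
```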
Conference Paper
This paper details the method and experiments conducted for our submission to the FERA 2011 facial expression recognition benchmarking evaluations. The benchmarking evaluation task involves recognizing 5 emotion classes in videos. Our method for detecting facial expressions is a fusion of the decisions of two FER approaches based on two different feature representations, namely motion information from facial regions and facial feature point displacement information. The main observation motivating our approach is that different feature representations are discriminative for detecting different facial expressions; hence a fusion approach can complement each of them to improve recognition performance. Experiments were conducted on the GEMEP-FERA data set provided by the organizers. One difficulty of this data set is the absence of a neutral face in the clips, which makes it difficult to apply Facial Expression Recognition (FER) approaches based on the motion of facial feature points with a neutral face as reference. Among the different problems expected in spontaneous expression recognition, the proposed work deals with lip motions due to speech, head motion, and the absence of a neutral face. The concept of Accumulated Motion Images (AMIs) is utilized to capture motion patterns in facial regions of interest. AMIs have recently been proposed for Human Action Recognition (5) and can also be used for FER. We use AMIs as one of the feature representations, along with a geometrical feature representation, and explore which representation is suitable for classifying a particular expression. We present an evaluation on the benchmark GEMEP-FERA dataset (6), comprising video clips containing lip motions due to speech and head motion. In addition, most of the clips
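The decision-fusion idea can be illustrated with a toy sketch: given per-clip class scores from the motion-based and point-displacement-based classifiers, a weighted sum is one simple fusion rule. The weighting and the score-level formulation are assumptions; the excerpt does not specify the exact rule used.

```python
# Toy score-level fusion of two facial-expression classifiers; the
# weight w and the weighted-sum rule are illustrative assumptions.
import numpy as np

def fuse_decisions(scores_motion, scores_points, w=0.5):
    """Each input: (n_clips x n_emotions) class-score matrix."""
    fused = w * scores_motion + (1 - w) * scores_points
    return fused.argmax(axis=1)                    # predicted emotion per clip
```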