
Poster for "Generalizing to Unseen Head Poses in Facial Expression Recognition and Action Unit Intensity Estimation"

Authors: Philipp Werner, Frerk Saxen, Ayoub Al-Hamadi, Hui Yu

Abstract

Facial expression analysis is challenged by the numerous degrees of freedom regarding head pose, identity, illumination, occlusions, and the expressions themselves. It currently seems hardly possible to densely cover this enormous space with data for training a universal, well-performing expression recognition system. In this paper we address the sub-challenge of generalizing to head poses that were not seen in the training data, aiming to cope with sparse coverage of the pose subspace. For this purpose we (1) propose a novel face normalization method called FaNC that massively reduces pose-induced image variance; (2) we compare the impact of the proposed and other normalization methods on (a) action unit intensity estimation with the FERA 2017 challenge data (achieving new state of the art) and (b) facial expression recognition with the Multi-PIE dataset; and (3) we discuss the head pose distribution needed to train a pose-invariant CNN-based recognition system. The proposed FaNC method normalizes pose and facial proportions while retaining expression information and runs in less than 2 ms. When comparing results achieved by training a CNN on the output images of FaNC and other normalization methods, FaNC generalizes significantly better than the others to unseen poses if they deviate more than 20° from the poses available during training.
Generalizing to Unseen Head Poses in Facial Expression
Recognition and Action Unit Intensity Estimation
Philipp Werner1, Frerk Saxen1, Ayoub Al-Hamadi1, Hui Yu2
1 Neuro-Information Technology, University of Magdeburg, Germany
2 Visual Computing, University of Portsmouth, UK
Contact: Philipp.Werner@ovgu.de
References
[2] Amirian et al. Support vector regression of sparse dictionary-based features for view-independent action unit intensity estimation. In FG, 2017.
[3] Baltrusaitis et al. Cross-dataset learning and person-specific normalisation for automatic action unit detection. In FG, 2015. (as in OpenFace)
[5] Batista et al. Simultaneous action units detection and intensity estimation on multipose facial images using a single convolutional neural network. In FG, 2017.
[17] Hassner et al. Effective Face Frontalization in Unconstrained Images. In CVPR, 2015.
[39] Valstar et al. FERA 2017 - Addressing Head Pose in the Third Facial Expression Recognition and Analysis Challenge. In FG, 2017.
[43] Werner et al. Facial action unit intensity estimation and feature relevance visualization with random regression forests. In ACII, 2017.
[51] Zhou et al. Pose-independent facial action unit intensity regression based on multi-task deep transfer learning. In FG, 2017. (FERA 2017 challenge winner)
[55] Zoph et al. Learning Transferable Architectures for Scalable Image Recognition. In CVPR, 2018. (NASNet CNN architecture)
This work was funded by the German Federal Ministry of Education and Research (BMBF).
Data, source code, and models available!
1. Background
• Limited amount of data is a major issue in Facial Expression Recognition (FER)
• Low head pose (HP) variance in most datasets (mostly frontal) → HP is a challenge
• Recognition performance should be invariant to HP
2. Contributions
• Novel face normalization method “FaNC”
• Comparison of normalization methods' impact on CNN-based facial AU intensity estimation (FERA 2017) and FER (Multi-PIE)
• Analysis of generalization to unseen head poses: which poses are needed for training?
• Cross-database evaluation / new state of the art on FERA 2017 dataset
• Source code and data available: http://iikt.ovgu.de/FaNC.html
3. Face Normalization based on Learning Correspondences - FaNC (fancy!)
• Learntopredict correspondence point
coordinates and visibilities from facial
landmarks(trainingdata:SyLaFaN)
• Synthesize frontal view by warping,
mirroring, and blending
• Fast (2 ms on 2012 onboard hardware)
• Preserves expression, normalizes pose
and facial proportions
[Fig. 1 pipeline: input image with landmarks → in-plane normalization (source domain) → prediction of correspondence points and visibilities (source domain → target domain) → texture warping → output]
Fig. 1: Overview of the FaNC method. Fig. 2: Synthetic SyLaFaN database.
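The warping stage can be pictured with a short Python sketch. Everything below is a hypothetical illustration, not the released code: the function name, the use of scikit-image's PiecewiseAffineTransform, and the crude global visibility blend are all assumptions. The actual implementation is available at http://iikt.ovgu.de/FaNC.html.

```python
# Hypothetical sketch of a FaNC-style warping stage (assumed, simplified).
# Correspondence points and visibilities are assumed to have been
# predicted from the facial landmarks by a learned regressor.
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def frontalize(image, src_points, dst_points, visibility):
    """Warp an in-plane-normalized face onto a frontal reference shape.

    image      : (H, W, 3) in-plane-normalized input face
    src_points : (N, 2) predicted correspondence points in the input
    dst_points : (N, 2) fixed target positions in the frontal frame
    visibility : (N,) predicted per-point visibility in [0, 1]
    """
    # Piecewise affine warp driven by the point correspondences; the
    # estimated transform maps output (frontal) coordinates back to
    # input coordinates, as warp() expects for its inverse map.
    tform = PiecewiseAffineTransform()
    tform.estimate(dst_points, src_points)
    frontal = warp(image, tform)

    # Self-occluded regions are filled from the mirrored face half.
    # A single global blend weight stands in for the per-pixel
    # visibility mask a real implementation would use.
    alpha = float(np.clip(visibility.mean(), 0.0, 1.0))
    return alpha * frontal + (1.0 - alpha) * frontal[:, ::-1]
```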
4. Qualitative Comparison
• Problems: occlusions at high pitch angles (nose and brow ridge), landmark errors
5. Action Unit Intensity Estimation
• FERA 2017 Challenge Dataset [39]
• TransferLearningwithNASNet[55]
• Regression of 7 AUs
Fig. 6: Results on unseen poses (solid) and seen poses (non-solid). Training was done with view 6 only (top) and with views 3, 6, and 9 (bottom).
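A minimal sketch of such a transfer-learning setup is given below. It assumes a Keras NASNetMobile backbone and illustrative hyperparameters; the paper's exact NASNet variant and training configuration may differ.

```python
# Minimal transfer-learning sketch (assumed setup, not the paper's exact
# configuration): ImageNet-pretrained NASNet [55] backbone plus a linear
# head regressing the intensities of the 7 AUs.
import tensorflow as tf

backbone = tf.keras.applications.NASNetMobile(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.5),               # illustrative regularization
    tf.keras.layers.Dense(7, activation=None),  # one intensity per AU
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
# model.fit(normalized_faces, au_intensities, ...)  # e.g., FaNC output images
```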
6. Expression Recognition
• Multi-PIE Dataset
• TransferLearningwithNASNet[55]
• Classicationof6expressions
7. Conclusions
• Performancedecreasessignicantly
if tested poses deviate more than 20°
from training pose(s)
• Perform. decreases less with FaNC
• Simple registration methods (e.g.
SimStable)seemsufcientiftrain-
ing data cover pose space in steps of
about 40°
• But: FER with frontalization...
• benetsfromexistingfrontaldata
• requires less pose variation / frees
resources for varying other factors
• Room for improving frontalization
Fig.4:NormalizedandinputimagesfromLFWdatabase.
Table1:Landmark-basedfacenormalizationmethods.
Fig.5:NormalizedandinputimagesfromFERA2017data-
set. Bottom row are FaNC failure cases
Fig.3:NormalizedandinputimagesfromSyLaFaNdataset.
Fig. 7: AU intensity estimation results compared to state of the art. Mean
ICC over all views and AUs depending on number of views used for training
(outer plot) and number of model parameters (inner plot).
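As a reading aid for these numbers, the challenge metric ICC(3,1) used in FERA 2017 [39] can be computed as follows; a minimal NumPy sketch under the usual Shrout and Fleiss definition, treating ground truth and prediction as two raters.

```python
import numpy as np

def icc_3_1(labels, preds):
    """ICC(3,1) consistency between ground-truth and predicted intensities,
    the FERA 2017 metric [39]; labels/preds are 1-D arrays of equal length."""
    x = np.stack([np.asarray(labels, float), np.asarray(preds, float)], axis=1)
    n, k = x.shape                          # n targets (frames), k = 2 raters
    target_means = x.mean(axis=1)
    rater_means = x.mean(axis=0)
    grand_mean = x.mean()
    # Between-target and residual mean squares of the two-way model.
    bms = k * np.sum((target_means - grand_mean) ** 2) / (n - 1)
    resid = x - target_means[:, None] - rater_means[None, :] + grand_mean
    ems = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)
```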
Fig. 8: Facial expression recognition results on unseen poses
(without shading). Training was done with view 0° only (top) and
with views −45°, 0°, and +45° (bottom). Mean accuracies across
all views in brackets. Trivial classifier achieves 0.277.
P. Werner, F. Saxen, A. Al-Hamadi, H. Yu, "Generalizing to Unseen Head Poses in Facial Expression Recognition and Action Unit Intensity Estimation", in IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2019.
(C) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective
works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Conference Paper, full-text available: Werner et al. [43]
Automatic facial action unit intensity estimation can be useful for various applications in affective computing. In this paper, we apply random regression forests for this task and propose modifications that improve predictive performance compared to the original random forest. Further, we introduce a way to estimate and visualize the relevance of the features for an individual prediction and the forest in general. We conduct experiments on the FERA 2017 challenge dataset (outperforming the FERA baseline results), show the performance gain from the modifications, and illustrate feature relevance.
Conference Paper, full-text available: Batista et al. [5]
This paper presents a unified convolutional neural network (CNN), named AUMPNet, to perform both Action Units (AUs) detection and intensity estimation on facial images with multiple poses. Although there are a variety of methods in the literature designed for facial expression analysis, only few of them can handle head pose variations. Therefore, it is essential to develop new models to work on non-frontal face images, for instance, those obtained from unconstrained environments. In order to cope with problems raised by pose variations, a single CNN, based on region and multitask learning, is proposed for both AU detection and intensity estimation tasks. Also, the available head pose information was added to the multitask loss as a constraint to the network optimization, pushing the network towards learning better representations. As opposed to current approaches that require ad hoc models for every single AU in each task, the proposed network simultaneously learns AU occurrence and intensity levels for all AUs. The AUMPNet was evaluated on an extended version of the BP4D-Spontaneous database, which was synthesized into nine different head poses and made available to FG 2017 Facial Expression Recognition and Analysis Challenge (FERA 2017) participants. The achieved results surpass the FERA 2017 baseline, using the challenge metrics, by 0.054 in F1-score for AU detection and 0.182 in ICC(3,1) for intensity estimation.
Conference Paper, full-text available: Baltrusaitis et al. [3]
Automatic detection of Facial Action Units (AUs) is crucial for facial analysis systems. Due to the large individual differences, performance of AU classifiers depends largely on training data and the ability to estimate facial expressions of a neutral face. In this paper, we present a real-time Facial Action Unit intensity estimation and occurrence detection system based on appearance (Histograms of Oriented Gradients) and geometry features (shape parameters and landmark locations). Our experiments show the benefits of using additional labelled data from different datasets, which demonstrates the generalisability of our approach. This holds both when training for a specific dataset and when a generic model is needed. We also demonstrate the benefits of using a simple and efficient median-based feature normalisation technique that accounts for person-specific neutral expressions. Finally, we show that our results outperform the FERA 2015 baselines in all three challenge tasks - AU occurrence detection, fully automatic AU intensity and pre-segmented AU intensity estimation.
Conference Paper: Amirian et al. [2]
In this paper, a robust system for view-independent action unit intensity estimation is presented. Based on the theory of sparse coding, region-specific dictionaries are trained to approximate the characteristic of the individual action units. The system incorporates landmark detection, face alignment and contrast normalization to handle a large variety of different scenes. Coupled with head pose estimation, an ensemble of large margin classifiers is used to detect the individual action units. The experimental validation shows that our system is robust against pose variations and able to outperform the challenge baseline by more than 35%.
Article: Hassner et al. [17]
"Frontalization" is the process of synthesizing frontal facing views of faces appearing in single unconstrained photos. Recent reports have suggested that this process may substantially boost the performance of face recognition systems. This, by transforming the challenging problem of recognizing faces viewed from unconstrained viewpoints to the easier problem of recognizing faces in constrained, forward facing poses. Previous frontalization methods did this by attempting to approximate 3D facial shapes for each query image. We observe that 3D face shape estimation from unconstrained photos may be a harder problem than frontalization and can potentially introduce facial misalignments. Instead, we explore the simpler approach of using a single, unmodified, 3D surface as an approximation to the shape of all input faces. We show that this leads to a straightforward, efficient and easy to implement method for frontalization. More importantly, it produces aesthetic new frontal views and is surprisingly effective when used for face recognition and gender estimation.