Generalizing to Unseen Head Poses in Facial Expression
Recognition and Action Unit Intensity Estimation
Philipp Werner1, Frerk Saxen1, Ayoub Al-Hamadi1, Hui Yu2
1 Neuro-Information Technology, University of Magdeburg, Germany
2 Visual Computing, University of Portsmouth, UK
Contact: Philipp.Werner@ovgu.de

References
[2] Amirian et al. Support vector regression of sparse dictionary-based features for view-independent action unit intensity estimation. In FG, 2017.
[3] Baltrusaitis et al. Cross-dataset learning and person-specific normalisation for automatic action unit detection. In FG, 2015. (as in OpenFace)
[5] Batista et al. Simultaneous action units detection and intensity estimation on multipose facial images using a single convolutional neural network. In FG, 2017.
[17] Hassner et al. Effective face frontalization in unconstrained images. In CVPR, 2015.
[39] Valstar et al. FERA 2017 - addressing head pose in the third facial expression recognition and analysis challenge. In FG, 2017.
[43] Werner et al. Facial action unit intensity estimation and feature relevance visualization with random regression forests. In ACII, 2017.
[51] Zhou et al. Pose-independent facial action unit intensity regression based on multi-task deep transfer learning. In FG, 2017. (FERA 2017 challenge winner)
[55] Zoph et al. Learning transferable architectures for scalable image recognition. In CVPR, 2018. (NASNet CNN architecture)
This work was funded by the German Federal Ministry of Education and Research (BMBF).
Data, source code, and models available!
1. Background
• Limited amount of data is a major issue in Facial Expression Recognition (FER)
• Low head pose (HP) variance in most datasets (mostly frontal) → HP is a challenge
• Recognition performance should be invariant to HP
2. Contributions
• Novel face normalization method “FaNC”
• Comparison of normalization methods’ impact on CNN-based facial AU intensity estimation (FERA 2017) and FER (Multi-PIE)
• Analysis of generalization to unseen head poses - which poses are needed for training?
• Cross-database evaluation / new state of the art on FERA 2017 dataset
• Source code and data available: http://iikt.ovgu.de/FaNC.html
3. Face Normalization based on Learning Correspondences - FaNC (fancy!)
• Learn to predict correspondence point coordinates and visibilities from facial landmarks (training data: SyLaFaN)
• Synthesize frontal view by warping, mirroring, and blending (see the sketch below Fig. 1)
• Fast (2 ms on 2012 onboard hardware)
• Preserves expression, normalizes pose and facial proportions
Fig. 1: Overview of FaNC method: input image with landmarks → in-plane normalization (source domain) → prediction of correspondence points and visibilities (source domain → target domain) → texture warping → output.
Fig. 2: Synthetic SyLaFaN database.
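A minimal Python sketch of the Fig. 1 pipeline, using scikit-image for the piecewise-affine texture warp. Here predict_correspondences is a hypothetical stand-in for the learned regressor, and the mirror/blend step is simplified; the released code (URL above) is the authoritative implementation.

import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def frontalize(image, landmarks, predict_correspondences):
    # Hypothetical regressor interface: maps facial landmarks to source
    # points (input image), target points (frontal view), and a boolean
    # visibility flag per correspondence point.
    source_pts, target_pts, visible = predict_correspondences(landmarks)

    # skimage's warp() expects the inverse map, i.e. a transform from
    # output (frontal) coordinates back to input image coordinates.
    tform = PiecewiseAffineTransform()
    tform.estimate(target_pts[visible], source_pts[visible])
    frontal = warp(image, tform)

    # Crude mirror-and-blend: fill pixels that received no texture
    # (self-occluded regions) from the mirrored half.
    mirrored = frontal[:, ::-1]
    missing = frontal.sum(axis=2, keepdims=True) == 0
    return np.where(missing, mirrored, frontal)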
4. Qualitative Comparison
• Problems: occlusions at high pitch angles (nose and brow ridge), landmark errors
5. Action Unit Intensity Estimation
• FERA 2017 Challenge Dataset [39]
• Transfer learning with NASNet [55]
• Regression of 7 AUs
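A hedged Keras sketch of this setup, assuming the NASNet-Mobile variant, 224×224 inputs, and an MSE loss (the authors’ exact configuration may differ):

import tensorflow as tf

# ImageNet-pretrained NASNet backbone with global average pooling
base = tf.keras.applications.NASNetMobile(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze first; optionally fine-tune later

# Linear head regressing the 7 AU intensities (0-5 scale in FERA 2017)
au_out = tf.keras.layers.Dense(7, name="au_intensities")(base.output)
model = tf.keras.Model(base.input, au_out)
model.compile(optimizer="adam", loss="mse")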
Fig. 6: Results on unseen poses (solid) and seen poses (non-solid). Training was done with view 6 only (top) and with views 3, 6, and 9 (bottom).
6. Expression Recognition
• Multi-PIE Dataset
• Transfer learning with NASNet [55]
• Classification of 6 expressions
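The same backbone can be reused with a 6-way softmax head; again a sketch under the assumptions of the Section 5 snippet:

# 6-way softmax head for the Multi-PIE expression classes
expr_out = tf.keras.layers.Dense(6, activation="softmax",
                                 name="expressions")(base.output)
clf = tf.keras.Model(base.input, expr_out)
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])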
7. Conclusions
• Performance decreases significantly if tested poses deviate more than 20° from the training pose(s)
• Performance decreases less with FaNC
• Simple registration methods (e.g. SimStable) seem sufficient if training data cover the pose space in steps of about 40°
• But: FER with frontalization...
• benefits from existing frontal data
• requires less pose variation / frees resources for varying other factors
• Room for improving frontalization
Fig. 3: Normalized and input images from SyLaFaN dataset.
Fig. 4: Normalized and input images from LFW database.
Fig. 5: Normalized and input images from FERA 2017 dataset. Bottom row shows FaNC failure cases.
Table 1: Landmark-based face normalization methods.
Fig. 7: AU intensity estimation results compared to state of the art. Mean
ICC over all views and AUs depending on number of views used for training
(outer plot) and number of model parameters (inner plot).
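The ICC reported in Fig. 7 is the intraclass correlation coefficient used by the FERA challenges; below is a minimal NumPy sketch for the two-rater case (ground truth vs. prediction), assuming the ICC(3,1) variant of Shrout and Fleiss:

import numpy as np

def icc_31(labels, preds):
    # n frames (targets) x k=2 raters (ground truth, prediction)
    x = np.stack([labels, preds], axis=1).astype(float)
    n, k = x.shape
    target_means = x.mean(axis=1, keepdims=True)
    rater_means = x.mean(axis=0, keepdims=True)
    grand_mean = x.mean()
    # Mean squares from a two-way ANOVA without interaction
    bms = k * ((target_means - grand_mean) ** 2).sum() / (n - 1)
    ems = ((x - target_means - rater_means + grand_mean) ** 2).sum() \
          / ((n - 1) * (k - 1))
    # ICC(3,1): consistency of single ratings, raters fixed
    return (bms - ems) / (bms + (k - 1) * ems)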
Fig. 8: Facial expression recognition results on unseen poses (without shading). Training was done with view 0° only (top) and with views −45°, 0°, and +45° (bottom). Mean accuracies across all views in brackets. Trivial classifier achieves 0.277.
P. Werner, F. Saxen, A. Al-Hamadi, H. Yu, "Generalizing to Unseen Head Poses in Facial Expression Recognition and Action Unit Intensity Estimation", in IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2019.
(C) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective
works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.