EUROGRAPHICS 2022 / R. Chaine and M. H. Kim
(Guest Editors)
Volume 41 (2022), Number 2, DOI: 10.1111/cgf.14463
Compact Facial Landmark Layouts for Performance Capture
E. Zell1,2 and R. McDonnell1
1Trinity College Dublin
2University of Bonn
[Figure 1: pipeline panels per character — Facial Rig, Compact Facial Landmarks, and Capturing (marker-based data annotation) — with landmark counts M = 28, 32, 36 and M = 24, 29, 33.]
Figure 1: In contrast to previous work, we suggest deriving facial landmarks from a low-dimensional facial rig by analyzing its degrees of freedom. Our method (red) is purely based on the existing animation model and does not require large character databases or person-specific 4D sequences. Different compact layouts are computed by our method for two of Epic's MetaHuman characters, with ε = 0.3, 0.5 and 0.7 for the female and ε = 0.5, 0.7 and 0.8 for the male character.
Abstract
An abundance of older as well as recent work exists at the intersection of computer vision and computer graphics on the accurate estimation of dynamic facial landmarks, with applications in facial animation, emotion recognition, and beyond. However, only a few publications optimize the actual layout of facial landmarks to ensure an optimal trade-off between compact layouts and detailed capturing. At the same time, we observe that applications like social games prefer simplicity and performance over detail to reduce the computational budget, especially on mobile devices. Other common attributes of such applications are predefined low-dimensional models to animate and a large, diverse user base. In contrast to existing methods that focus on creating person-specific facial landmarks, we suggest deriving application-specific facial landmarks. We formulate our optimization method on the widely adopted blendshape model. First, a score is defined that is suitable to compute a characteristic landmark for each blendshape. In a subsequent step, we optimize a global function which mimics merging similar landmarks into one. The optimization is solved in less than a second using integer linear programming and guarantees a globally optimal solution to an NP-hard problem. Our application-specific approach is faster than and fundamentally different from previous, actor-specific methods. The resulting layouts are more similar to empirical layouts. Compared to empirical landmarks, our layouts require only a fraction of the landmarks to achieve the same numerical error when reconstructing the animation from landmarks. The method is compared against previous work and tested on various blendshape models representing a wide spectrum of applications.
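The pipeline sketched in the abstract — one characteristic candidate landmark per blendshape, followed by an integer linear program that merges similar candidates — can be illustrated as a small set-cover-style program. The following is a minimal sketch using scipy's MILP interface; the function name, inputs, similarity matrix and objective are illustrative assumptions, not the authors' exact formulation:

```python
# Minimal sketch: merge per-blendshape candidate landmarks via an ILP.
# Assumes one candidate landmark per blendshape and a precomputed pairwise
# similarity matrix `sim` (with sim[i, i] = 1, so the problem is feasible).
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def merge_landmarks(sim, eps):
    """Select a minimal subset of candidates such that every blendshape i is
    represented by some selected candidate j with sim[i, j] >= eps.
    Returns the indices of the selected landmarks."""
    n = sim.shape[0]
    cover = (sim >= eps).astype(float)   # cover[i, j] = 1 if j may represent i
    c = np.ones(n)                       # objective: minimize #selected landmarks
    cons = LinearConstraint(cover, lb=np.ones(n), ub=np.full(n, np.inf))
    res = milp(c, constraints=cons,
               integrality=np.ones(n),   # x_j in {0, 1}
               bounds=Bounds(0, 1))
    return np.flatnonzero(res.x > 0.5)
```

Set cover is NP-hard in general, which matches the abstract's framing; delegating the selection to an ILP solver yields a globally optimal solution, and for problems of facial-rig size such solves finish well within a second.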
1. Introduction
Over the last two decades, facial animation capturing evolved from a research topic relevant only for high-end VFX applications into a widely accessible technology and is nowadays even integrated into smartphones. Current applications span from highly detailed captures of digital doubles to simple emoji animation, and from highly actor-specific solutions to a nearly unlimited user base. Besides capturing technology, best practices evolved for character creation pipelines, paving the way for parametric character configurators like Epic's MetaHuman, Daz3D Genesis or Polywink. The originally linear workflow, which started with motion capturing and moved afterwards to character creation and animation retargeting, became more and more non-linear due to convenient access to and compelling prices of pre-built characters. But if the character to animate exists before the actual capturing, is it possible to limit the captured data and minimize data and processing time? We investigate the question of how to distinguish between relevant and non-relevant information.
... MoCap systems have been used to capture detailed human motions, such as hands [20,21] or facial expressions [22,23], with applications in emotion recognition, grasping, facial animation, and others. Zell and McDonnell [23] proposed a novel algorithm for computing minimalistic facial landmark layouts specific to a blendshape model. Blendshape interpolation is the dominant approach to facial animation and provides the model's degrees of freedom. ...
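For context, the blendshape model referenced here expresses an animated face as the neutral geometry plus a weighted sum of per-shape vertex offsets, and those weights are the degrees of freedom a landmark layout must observe. A minimal sketch (array shapes and names assumed):

```python
# Delta-blendshape evaluation: neutral vertices plus weighted vertex offsets.
import numpy as np

def evaluate_blendshapes(neutral, deltas, weights):
    """neutral : (V, 3) neutral-pose vertices
    deltas  : (K, V, 3) per-blendshape offsets from the neutral pose
    weights : (K,) activation weights, typically clamped to [0, 1]"""
    return neutral + np.tensordot(weights, deltas, axes=1)   # -> (V, 3)
```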
Article
When considering sparse motion capture marker data, one typically struggles to balance overfitting via a high-dimensional blendshape system against underfitting caused by smoothness constraints. With the current trend towards using more and more data, our aim is not to fit the motion capture markers with a parameterized (blendshape) model or to smoothly interpolate a surface through the marker positions, but rather to find an instance in the high-resolution dataset that contains local geometry to fit each marker. Just as is true for typical machine learning applications, this approach benefits from a plethora of data, and thus we also consider augmenting the dataset via specially designed physical simulations that target the high-resolution dataset such that the simulation output lies on the same so-called manifold as the targeted data.
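The per-marker dataset lookup described in this abstract can be pictured with a toy example: for each captured marker, choose the dataset instance whose corresponding local geometry lies closest to the observation. The shapes, names and the assumption of precomputed correspondences below are illustrative, not the paper's actual machinery:

```python
# Toy per-marker lookup: pick, for each marker, the dataset instance whose
# corresponding surface point best fits the observed marker position.
import numpy as np

def fit_markers_to_dataset(markers, dataset_points):
    """markers        : (M, 3) captured marker positions
    dataset_points : (S, M, 3) the same M surface points sampled on each of
                     S high-resolution dataset instances (correspondence assumed)
    Returns, per marker, the index of the best-fitting instance."""
    d2 = np.sum((dataset_points - markers[None]) ** 2, axis=2)  # (S, M)
    return np.argmin(d2, axis=0)                                # (M,)
```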
Article
Complex deformable face-rigs have many independent parameters that control the shape of the object. A human face has upwards of 50 parameters (FACS Action Units), making conventional UI controls hard to find and operate. Animators address this problem by tediously hand-crafting in-situ layouts of UI controls that serve as visual deformation proxies, and facilitate rapid shape exploration. We propose the automatic creation of such in-situ UI control layouts. We distill the design choices made by animators into mathematical objectives that we optimize as the solution to an integer quadratic programming problem. Our evaluation is three-fold: we show the impact of our design principles on the resulting layouts; we show automated UI layouts for complex and diverse face rigs, comparable to animator handcrafted layouts; and we conduct a user study showing our UI layout to be an effective approach to face-rig manipulation, preferable to a baseline slider interface.
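As a rough illustration of the layout idea in this abstract, the following sketch assigns each rig control to one candidate on-screen anchor near the face region it deforms. The paper solves an integer quadratic program with richer, animator-derived objectives; dropping the pairwise terms reduces this toy version to a linear assignment problem, and all names below are assumptions:

```python
# Simplified in-situ layout: place each control at the candidate anchor
# closest to its region of influence, one control per anchor (no overlaps).
import numpy as np
from scipy.optimize import linear_sum_assignment

def layout_controls(control_centers, anchors):
    """control_centers : (C, 2) 2D centroids of each control's influence region
    anchors         : (A, 2) candidate anchor slots on screen, with A >= C
    Returns one anchor index per control."""
    cost = np.sum((control_centers[:, None] - anchors[None]) ** 2, axis=2)  # (C, A)
    _, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return cols
```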
Article
Face2Face is an approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track the facial expressions of both the source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup where YouTube videos are reenacted in real time. This live setup was also shown at SIGGRAPH Emerging Technologies 2016 by Thies et al., where it won the Best in Show Award.
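The dense photometric consistency measure used for tracking can be written, in simplified form, as an analysis-by-synthesis energy over the visible face pixels V, comparing the synthesized rendering C_S under model parameters P (identity, expression, pose, illumination) with the input frame C_I; the paper's actual formulation adds robust norms and regularizers:

E_{\text{photo}}(P) \;=\; \sum_{p \in V} \big\lVert C_S(P, p) - C_I(p) \big\rVert_2^2

Minimizing E_photo over P per frame yields the tracked expression parameters that then drive the deformation transfer between source and target.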