All content in this area was uploaded by Ruizhe Wang on May 06, 2016
Rapid Photorealistic Blendshapes from Commodity RGB-D Sensors
Dan Casas∗1, Oleg Alexander†1, Andrew W. Feng‡1, Graham Fyffe§1, Ryosuke Ichikari1, Paul Debevec¶1, Rhuizhe Wang‖2,
Evan Suma∗∗1, and Ari Shapiro††1
1Institute for Creative Technologies, University of Southern California
2University of Southern California
Figure 1: We describe an end-to-end method for scanning and processing a set of facial scans from a commodity depth scanner.
Abstract
Creating and animating a realistic 3D human face has been an
important task in computer graphics. The capability to capture
the 3D face of a human subject and reanimate it quickly will find
many applications in games, training simulations, and interactive
3D graphics. In this paper, we propose a system to capture
photorealistic 3D faces and generate the blendshape models
automatically using only a single commodity RGB-D sensor. Our method
can rapidly generate a set of expressive facial poses from a single
Microsoft Kinect and requires no artistic expertise on the part of
the capture subject. The system takes only a matter of seconds to
capture and produce a 3D facial pose and requires only 4 minutes
of processing time to transform it into a blendshape model. Our
main contributions include an end-to-end pipeline for capturing and
generating face blendshape models automatically, and a registration
method that solves dense correspondences between two face scans
by utilizing facial landmark detection and optical flow. We
demonstrate the effectiveness of the proposed method by capturing 3D
facial models of different human subjects and puppeteering their
models in an animation system with real-time facial performance
retargeting.
CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism—Animation;
Keywords: animation, blendshapes, faces, 3D scanning, Kinect
∗casas@ict.usc.edu
†oalexander@ict.usc.edu
‡feng@ict.usc.edu
§fyffe@ict.usc.edu
¶debevec@ict.usc.edu
‖rhuizewa@usc.edu
∗∗suma@ict.usc.edu
††shapiro@ict.usc.edu
1 System Overview
The goal of our work is to build an end-to-end system that can
quickly capture a user’s face geometry using a low-cost commodity
sensor and convert the raw scans into a blendshape model
automatically, without the need for artist intervention. Since the raw
face scans have different positions and orientations, we run rigid
alignment between expressions using iterative closest point (ICP) to
obtain a set of aligned scans. These scans are then unwrapped into a
2D representation of the point cloud and texture UV map and stored
in EXR float images to be used for surface tracking. The surface
tracking then uses the 2D representation of the face scans to find
correspondences from a source face pose to the target neutral face
pose. To guide the surface tracking, we first apply face feature
detection to find a set of facial landmark points on each scan. These
feature points are used to build a Delaunay triangulation on the UV
map as the initial constraints. This triangulation is used to pre-warp
the 2D map of each face scan to the target neutral face pose. Then
a dense image warp is computed using optical flow to transform the
source image to the target image. Once the dense correspondences
are established, the blendshape models can be produced by extracting
a consistent mesh from each face point cloud image using an
artist mesh.
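The landmark-driven pre-warp described above can be illustrated with a small
piecewise-affine mapping. The following is a minimal sketch, not the
implementation used in this work: it assumes the Delaunay triangles are
already given as index triples into the landmark lists, and the names
`barycentric` and `prewarp_point` are hypothetical. A UV coordinate in the
source expression is located in a landmark triangle and carried to the
neutral pose through its barycentric coordinates.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[int, int, int]  # indices into the landmark lists


def barycentric(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Return the barycentric coordinates of p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def prewarp_point(
    p: Point,
    src_landmarks: Sequence[Point],
    dst_landmarks: Sequence[Point],
    triangles: Sequence[Triangle],
    eps: float = 1e-9,
) -> Optional[Point]:
    """Map p from the source UV map to the target (neutral) UV map.

    Assumption: `triangles` comes from a Delaunay triangulation of the
    landmarks, computed elsewhere. Returns None if p lies outside all triangles.
    """
    for i, j, k in triangles:
        w = barycentric(p, src_landmarks[i], src_landmarks[j], src_landmarks[k])
        if min(w) >= -eps:  # p is inside (or on the edge of) this triangle
            u = w[0] * dst_landmarks[i][0] + w[1] * dst_landmarks[j][0] + w[2] * dst_landmarks[k][0]
            v = w[0] * dst_landmarks[i][1] + w[1] * dst_landmarks[j][1] + w[2] * dst_landmarks[k][1]
            return (u, v)
    return None


# Toy example: four landmarks on a unit square, one of which moves in the target.
src: List[Point] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst: List[Point] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.2, 1.2)]
tris: List[Triangle] = [(0, 1, 2), (1, 3, 2)]
warped = prewarp_point((0.25, 0.25), src, dst, tris)
```

Such a pre-warp only brings the landmarks into agreement; the remaining
per-pixel differences are what the optical flow step is left to resolve.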
Results generated by the proposed method can be used in many
animation and simulation environments that utilize blendshapes.
Figure 1 presents results of facial animations created using the extracted
blendshapes. The character rig was created in less than an hour using the
proposed pipeline, including capturing and processing.
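To make concrete how an animation environment might consume such
blendshapes, the sketch below evaluates a face pose as the neutral mesh plus
weighted per-vertex offsets. This is the common delta-blendshape formulation
and is an assumption here; the text does not state which formulation the
animation system uses. The function name `blend` and the toy data are
hypothetical. The sketch relies on every expression sharing the same vertex
order, which is what a consistent mesh provides.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def blend(
    neutral: Sequence[Vertex],
    targets: Dict[str, Sequence[Vertex]],
    weights: Dict[str, float],
) -> List[Vertex]:
    """Evaluate v = v_neutral + sum_i w_i * (v_i - v_neutral) for every vertex."""
    result = [list(v) for v in neutral]
    for name, weight in weights.items():
        target = targets[name]
        if len(target) != len(neutral):
            raise ValueError(f"blendshape '{name}' does not share the neutral topology")
        for idx, (t, n) in enumerate(zip(target, neutral)):
            for axis in range(3):
                result[idx][axis] += weight * (t[axis] - n[axis])
    return [(v[0], v[1], v[2]) for v in result]


# Toy example with a two-vertex "mesh" and two hypothetical expressions.
neutral_mesh: List[Vertex] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shapes: Dict[str, List[Vertex]] = {
    "smile": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    "blink": [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
}
pose = blend(neutral_mesh, shapes, {"smile": 0.5, "blink": 0.25})
```

With all weights at zero the result is the neutral face, and a single weight
of one reproduces the corresponding captured expression.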
The accompanying videos demonstrate the use of the data in an
animation system and through puppeteering with an online facial
retargeting system. We believe that the quality generated is
sufficient for many uses, such as on a 3D character in a video game, or
for live video conferencing.