Rapid Photorealistic Blendshapes from Commodity RGB-D Sensors
Dan Casas1, Oleg Alexander1, Andrew W. Feng1, Graham Fyffe1, Ryosuke Ichikari1, Paul Debevec1, Rhuizhe Wang2, Evan Suma1, and Ari Shapiro1
1Institute for Creative Technologies, University of Southern California
2University of Southern California
Figure 1: We describe an end-to-end method for scanning and processing a set of facial scans from a commodity depth scanner.
Abstract
Creating and animating a realistic 3D human face has been an important task in computer graphics. The capability of capturing the 3D face of a human subject and reanimating it quickly will find many applications in games, training simulations, and interactive 3D graphics. In this paper, we propose a system to capture photorealistic 3D faces and generate the blendshape models automatically using only a single commodity RGB-D sensor. Our method can rapidly generate a set of expressive facial poses from a single Microsoft Kinect and requires no artistic expertise on the part of the capture subject. The system takes only a matter of seconds to capture and produce a 3D facial pose and only requires 4 minutes of processing time to transform it into a blendshape model. Our main contributions include an end-to-end pipeline for capturing and generating face blendshape models automatically, and a registration method that solves dense correspondences between two face scans by utilizing facial landmark detection and optical flow. We demonstrate the effectiveness of the proposed method by capturing 3D facial models of different human subjects and puppeteering their models in an animation system with real-time facial performance retargeting.
CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism—Animation;
Keywords: animation, blendshapes, faces, 3D scanning, Kinect
casas@ict.usc.edu, oalexander@ict.usc.edu, feng@ict.usc.edu, fyffe@ict.usc.edu, debevec@ict.usc.edu, rhuizewa@usc.edu, suma@ict.usc.edu, shapiro@ict.usc.edu
1 System Overview
The goal of our work is to build an end-to-end system that can quickly capture a user's face geometry using a low-cost commodity sensor and convert the raw scans into a blendshape model automatically, without the need for artist intervention. Since the raw face scans have different positions and orientations, we run rigid alignment between expressions using iterative closest point (ICP) to obtain a set of aligned scans; a minimal sketch of this step is shown below.
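The following sketch illustrates pairwise rigid alignment of one expression scan to the neutral scan with point-to-point ICP. The library choice (Open3D) and the correspondence-distance threshold are illustrative assumptions, not the system's actual implementation.

# Sketch: rigidly align an expression scan to the neutral scan with point-to-point ICP.
# Open3D and the distance threshold are assumptions for illustration only.
import numpy as np
import open3d as o3d

def align_scan_to_neutral(source_xyz, neutral_xyz, max_corr_dist=5.0):
    """source_xyz, neutral_xyz: (N, 3) float arrays of scan points.
    max_corr_dist: maximum correspondence distance for ICP, in the scan's units.
    Returns the aligned source points and the 4x4 rigid transform."""
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(source_xyz)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(neutral_xyz)

    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    source.transform(result.transformation)  # apply the estimated rigid motion
    return np.asarray(source.points), result.transformation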
The aligned scans are then unwrapped into a 2D representation, consisting of a point-cloud (per-texel 3D position) map and a texture UV map, which are stored as floating-point EXR images for use in surface tracking; a small I/O sketch follows.
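As an illustration of the storage format (the I/O library is an assumption; the system does not prescribe one), float position and texture maps can be written as EXR images with OpenCV. Note that some opencv-python builds require OpenEXR support to be enabled via an environment variable before import.

# Sketch: store an unwrapped scan as float EXR images (OpenCV is an assumed choice).
# The position map holds one XYZ point per texel; the texture map holds color.
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # required by some opencv-python builds
import cv2
import numpy as np

def save_unwrapped_scan(prefix, position_map, texture_map):
    """position_map: (H, W, 3) float32 XYZ per texel; texture_map: (H, W, 3) float32 color.
    OpenCV treats 3-channel images as BGR; keep the convention consistent on load."""
    cv2.imwrite(prefix + "_position.exr", position_map.astype(np.float32))
    cv2.imwrite(prefix + "_texture.exr", texture_map.astype(np.float32))

def load_unwrapped_scan(prefix):
    pos = cv2.imread(prefix + "_position.exr", cv2.IMREAD_UNCHANGED)
    tex = cv2.imread(prefix + "_texture.exr", cv2.IMREAD_UNCHANGED)
    return pos, tex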
The surface tracking then utilizes this 2D representation of the face scans and finds correspondences from a source face pose to the target neutral face pose. To guide the surface tracking, we first apply face feature detection to find a set of facial landmark points on each scan. These feature points are used to build a Delaunay triangulation on the UV map as the initial constraints, and this triangulation is used to pre-warp the 2D map of each face scan to the target neutral face pose (see the pre-warp sketch below).
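A condensed sketch of this landmark-guided pre-warp follows. The library choices (dlib for landmarks, OpenCV for the Delaunay triangulation and per-triangle affine warp) and the landmark-model path are assumptions standing in for the system's own detector and warping code.

# Sketch: landmark detection, Delaunay triangulation in UV space, and a
# piecewise-affine pre-warp of the source map onto the target landmark layout.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model path

def detect_landmarks(image_u8):
    """Return (68, 2) landmark coordinates for the first detected face."""
    rects = detector(image_u8, 1)
    shape = predictor(image_u8, rects[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)

def delaunay_triangles(points, width, height):
    """Index triples of a Delaunay triangulation over the landmark points."""
    subdiv = cv2.Subdiv2D((0, 0, width, height))
    for x, y in points:
        subdiv.insert((float(x), float(y)))
    triangles = []
    for x1, y1, x2, y2, x3, y3 in subdiv.getTriangleList():
        verts = ((x1, y1), (x2, y2), (x3, y3))
        if any(not (0 <= x < width and 0 <= y < height) for x, y in verts):
            continue  # skip triangles touching Subdiv2D's virtual outer vertices
        triangles.append([int(np.argmin(np.linalg.norm(points - np.float32(v), axis=1)))
                          for v in verts])
    return triangles

def prewarp(source_img, src_pts, dst_pts, triangles):
    """Warp the source map triangle by triangle onto the target landmark positions."""
    h, w = source_img.shape[:2]
    out = np.zeros_like(source_img)
    for tri in triangles:
        src_tri = np.float32(src_pts[tri])
        dst_tri = np.float32(dst_pts[tri])
        M = cv2.getAffineTransform(src_tri, dst_tri)
        warped = cv2.warpAffine(source_img, M, (w, h))
        mask = np.zeros((h, w), dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst_tri), 1)
        out[mask > 0] = warped[mask > 0]  # keep only the pixels inside this triangle
    return out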
Then a dense image warping is done using optical flow to transform the source image to the target image. Once the dense correspondences are established, the blendshape models can be produced by extracting a consistent mesh from each face point-cloud image using an artist mesh; both steps are sketched below.
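The sketch below illustrates the dense refinement and the mesh extraction. Farneback optical flow stands in for the (unspecified) flow method, and the artist mesh is assumed to supply per-vertex UV coordinates in [0, 1]; both are assumptions for illustration.

# Sketch: dense correspondences via optical flow, then sampling a consistent mesh.
import cv2
import numpy as np

def dense_correspondence(target_tex, prewarped_tex, prewarped_positions):
    """For every target texel, look up the corresponding 3D point on the source scan.
    target_tex, prewarped_tex: (H, W, 3) float32 textures in [0, 1].
    prewarped_positions: (H, W, 3) float32 XYZ position map of the pre-warped source.
    Returns an (H, W, 3) position map resampled into the target's texel layout."""
    to_gray = lambda img: cv2.cvtColor(np.clip(img * 255, 0, 255).astype(np.uint8),
                                       cv2.COLOR_BGR2GRAY)
    # Flow from target to pre-warped source: target[y, x] matches source[y + fy, x + fx].
    flow = cv2.calcOpticalFlowFarneback(to_gray(target_tex), to_gray(prewarped_tex),
                                        None, 0.5, 3, 21, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
    return cv2.remap(prewarped_positions, gx + flow[..., 0], gy + flow[..., 1],
                     cv2.INTER_LINEAR)

def extract_consistent_mesh(position_map, artist_uvs):
    """Sample the position map at the artist mesh's UVs to get per-vertex positions."""
    h, w = position_map.shape[:2]
    map_x = (artist_uvs[:, 0] * (w - 1)).astype(np.float32).reshape(-1, 1)
    map_y = ((1.0 - artist_uvs[:, 1]) * (h - 1)).astype(np.float32).reshape(-1, 1)  # V flip is convention-dependent
    verts = cv2.remap(position_map, map_x, map_y, cv2.INTER_LINEAR)
    return verts.reshape(-1, 3)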
Results generated by the proposed method can be used in many animation and simulation environments that utilize blendshapes. Figure 1 presents results of facial animations created using the extracted blendshapes. The character rig was produced in less than an hour using the proposed pipeline, including capturing and processing.
The accompanying videos demonstrate the use of the data in an animation system and through puppeteering with an online facial retargeting system. We believe that the generated quality is sufficient for many uses, such as for a 3D character in a video game or for live video conferencing.
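At playback time, the captured rig can be driven by the standard delta-blendshape combination, B(w) = B0 + sum_i w_i (Bi - B0), with weights streamed from a face tracker; a minimal NumPy sketch (names are illustrative, the retargeting system itself is not shown):

# Sketch: delta-blendshape evaluation; the animated face is the neutral mesh plus a
# weighted sum of per-expression vertex offsets.
import numpy as np

def evaluate_blendshapes(neutral, expressions, weights):
    """neutral: (V, 3); expressions: (K, V, 3) expression meshes in shared topology;
    weights: (K,) blend weights, typically in [0, 1] from a real-time face tracker."""
    deltas = expressions - neutral[None, :, :]          # (K, V, 3) per-expression offsets
    return neutral + np.tensordot(weights, deltas, 1)   # (V, 3) animated vertex positions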