Figure 1: (Left) Three of eight high-resolution (0.1 mm) light stage scans of the actor in static expressions. (Middle) Seven-camera HD performance recording. (Right) 180 Hz video-driven blendshape model with screen-space subsurface scattering and advanced eye shading effects.
Source publication
Overview
In 2008, the "Digital Emily" project [Alexander et al. 2009] showed how a set of high-resolution facial expressions scanned in a light stage could be rigged into a real-time photoreal digital character and driven with video-based facial animation techniques. However, Digital Emily was rendered offline, involved just the front of the face,...
Similar publications
With the introduction of concepts for virtual interaction and digital doubles, a rich scenario has been created for embodied avatars to thrive. These avatars, more recently referred to as digital humans, have become a popular area of research, resulting in various techniques and methods that focus on improving the perception of their realism, fidel...
Citations
... Furthermore, to map information over the entire scalp, such as hair or skin variations, and to track changes over time, we need to recover the alignment of the head for the entire pose range, including top and back views where the face is occluded and traditional facial landmark detectors cannot be applied. This has been achieved with high-end multi-camera setups [2,12,17] or 3D scanners [16,29]. However, reconstructing an accurate head shape requires setups that are not easily scalable to clinical practice or to the general population. ...
We address the problem of estimating the shape of a person's head, defined as the geometry of the complete head surface, from a video taken with a single moving camera, and of determining the alignment of the fitted 3D head for all video frames, irrespective of the person's pose. 3D head reconstruction methods commonly focus on perfecting the face, leaving the scalp to a statistical approximation. Our goal is to reconstruct each person's head model to enable future mixed reality applications. To do this, we recover a dense 3D reconstruction and camera information via structure-from-motion and multi-view stereo. These are then used in a new two-stage fitting process that recovers the 3D head shape by iteratively fitting a 3D morphable head model to the dense reconstruction in canonical space, and then fitting it to each person's head using both traditional facial landmarks and scalp features extracted from the head's segmentation mask. Our approach recovers consistent geometry for varying head shapes, from videos taken by different people, with different smartphones, and in a variety of environments from living rooms to outdoor spaces.
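At its core, the fitting stage described above reduces to a regularized linear least-squares problem over the morphable model's shape coefficients. Below is a minimal single-stage sketch in Python using synthetic stand-ins for the model; the array names, dimensions, landmark count, and regularizer are illustrative assumptions, not the paper's actual pipeline.

import numpy as np

# Synthetic stand-ins for a PCA head model: V vertices, K shape bases,
# and indices of the vertices used as landmarks (all hypothetical).
V, K = 5000, 40
rng = np.random.default_rng(0)
mean_shape = rng.normal(size=(V, 3))
bases = rng.normal(size=(K, V, 3))
landmark_idx = rng.choice(V, size=68, replace=False)

def fit_shape_coeffs(target_landmarks, lam=1e-2):
    # Solve min_c ||A c - b||^2 + lam ||c||^2, where A maps shape
    # coefficients to landmark displacements from the mean shape.
    A = bases[:, landmark_idx, :].reshape(K, -1).T   # (68*3, K)
    b = (target_landmarks - mean_shape[landmark_idx]).ravel()
    return np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)

def reconstruct(coeffs):
    # Mean head plus the weighted sum of shape bases.
    return mean_shape + np.tensordot(coeffs, bases, axes=1)

# Toy usage: recover coefficients from the landmarks of a known head.
true_c = rng.normal(size=K)
est_c = fit_shape_coeffs(reconstruct(true_c)[landmark_idx])

The Tikhonov term keeps the coefficients near the model mean where observations are sparse or noisy, which matters in exactly the occluded views where the paper falls back on scalp features instead of facial landmarks.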
... Most light stages consist of room-scale, spherical arrays of brightly flashing colored lights and cameras. They are used widely for movie special effects [5,2,1,21,15], volumetric media [25,6], presidential portraits [12], and to provide rich data for training computer vision relighting algorithms [18,24,19,9,13,20,10]. ...
Every time you sit in front of a TV or monitor, your face is actively illuminated by time-varying patterns of light. This paper proposes to use this time-varying illumination for synthetic relighting of your face with any new illumination condition. In doing so, we take inspiration from the light stage work of Debevec et al., who first demonstrated the ability to relight people captured in a controlled lighting environment. Whereas existing light stages require expensive, room-scale spherical capture gantries and exist in only a few labs in the world, we demonstrate how to acquire useful data from a normal TV or desktop monitor. Instead of subjecting the user to uncomfortable rapidly flashing light patterns, we operate on images of the user watching a YouTube video or other standard content. We train a deep network on images plus monitor patterns of a given user and learn to predict images of that user under any target illumination (monitor pattern). Experimental evaluation shows that our method produces realistic relighting results. Video results are available at http://grail.cs.washington.edu/projects/Light_Stage_on_Every_Desk/.
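The supervision described here amounts to pairs of (face frame, monitor pattern): the network observes the user under one pattern and must predict the same user under another. A toy PyTorch sketch of that setup, where the architecture, tensor sizes, and L1 loss are illustrative assumptions rather than the authors' network:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelightNet(nn.Module):
    # Maps (current frame, target monitor pattern) -> frame relit by
    # that pattern; a deliberately tiny stand-in for the real model.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(              # 3 frame + 3 pattern channels
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, frame, pattern):
        # Upsample the pattern thumbnail to image resolution and concatenate.
        pattern = F.interpolate(pattern, size=frame.shape[-2:])
        return self.net(torch.cat([frame, pattern], dim=1))

model = RelightNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a synthetic batch: a frame of the user is paired
# with a target pattern and supervised by the frame seen under it.
frame_a = torch.rand(4, 3, 128, 128)
pattern_b = torch.rand(4, 3, 16, 16)
frame_b = torch.rand(4, 3, 128, 128)
loss = F.l1_loss(model(frame_a, pattern_b), frame_b)
opt.zero_grad(); loss.backward(); opt.step()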
... To ensure efficient learning, we aligned the renderings by matching the inner corner of each subject's right eye to a fixed position. We use the techniques of Alexander et al. [2] and Chiang and Fyffe [16] to render photo-realistic portraits in real-time using OpenGL shaders, which include separable subsurface scattering in screen-space and photo-real eye rendering. ...
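The screen-space separable subsurface scattering mentioned in this excerpt approximates a skin diffusion profile by a sum of Gaussians, each applied as a horizontal then a vertical 1D blur over the lit diffuse buffer, which is what makes it cheap enough for real time. A NumPy sketch of the idea; the profile widths and per-channel weights below are illustrative guesses, not measured skin parameters:

import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def separable_sss(irradiance):
    # Sum-of-Gaussians diffusion profile; wider Gaussians weight red more,
    # since red light scatters furthest in skin (weights are illustrative).
    sigmas = (1.0, 3.0, 8.0)
    weights = ((0.1, 0.3, 0.6),   # narrow: blue stays local
               (0.3, 0.4, 0.3),
               (0.6, 0.3, 0.1))   # wide: red travels far
    out = np.zeros_like(irradiance)
    for sigma, w in zip(sigmas, weights):
        k = gaussian_kernel(sigma, max(1, int(3 * sigma)))
        blurred = convolve1d(irradiance, k, axis=1)   # horizontal pass
        blurred = convolve1d(blurred, k, axis=0)      # vertical pass
        out += blurred * np.asarray(w)                # per-RGB-channel weight
    return out

diffuse = np.random.rand(256, 256, 3)  # stand-in for the diffuse lighting buffer
soft = separable_sss(diffuse)

A production shader would additionally scale the blur radius by depth and avoid blurring across silhouette edges; both are omitted here for brevity.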
Near-range portrait photographs often contain perspective distortion artifacts that bias human perception and challenge both facial recognition and reconstruction techniques. We present the first deep-learning-based approach to remove such artifacts from unconstrained portraits. In contrast to the previous state-of-the-art approach, our method handles even portraits with extreme perspective distortion, as we avoid the inaccurate and error-prone step of first fitting a 3D face model. Instead, we predict a distortion correction flow map that encodes a per-pixel displacement which removes distortion artifacts when applied to the input image. Our method also automatically infers missing facial features, e.g., occluded ears caused by strong perspective distortion, with coherent details. We demonstrate that our approach significantly outperforms the previous state of the art both qualitatively and quantitatively, particularly for portraits with extreme perspective distortion or facial expressions. We further show that our technique benefits a number of fundamental tasks, significantly improving the accuracy of both face recognition and 3D reconstruction, and enables a novel camera calibration technique from a single portrait. Moreover, we also build the first perspective portrait database with a large diversity in identities, expressions, and poses, which will benefit related research in this area.
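The prediction network itself is beyond a snippet, but the final step this abstract describes, applying a per-pixel displacement field to the input image, is a plain backward warp. A sketch assuming the flow stores (dy, dx) displacements in pixels (that convention is ours, not necessarily the paper's):

import numpy as np
from scipy.ndimage import map_coordinates

def apply_flow(image, flow):
    # The corrected pixel at (y, x) is sampled from (y + dy, x + dx)
    # in the distorted input, with bilinear interpolation per channel.
    h, w = image.shape[:2]
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + flow[..., 0], xx + flow[..., 1]])
    return np.stack([map_coordinates(image[..., c], coords,
                                     order=1, mode="nearest")
                     for c in range(image.shape[2])], axis=-1)

# Toy check: an all-zero flow map leaves the image unchanged.
img = np.random.rand(64, 64, 3)
assert np.allclose(apply_flow(img, np.zeros((64, 64, 2))), img)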
... Face modeling (no hair or head) has progressed tremendously in the last decade. Beginning with high-detail head geometry from stereo capture systems [Alexander et al. 2013; Beeler et al. 2010; Debevec 2012], RGB-D-based methods like dynamic fusion [Newcombe et al. 2015] and non-rigid reconstruction methods [Thies et al. 2015; Zollhöfer et al. 2014] then made capture real-time and much easier with off-the-shelf devices. [Blanz and Vetter 1999] proposed a 3D morphable face model to represent any person's face shape using a linear combination of face bases, [Richardson et al. 2016, 2017; Tran et al. 2017] proposed CNN-based systems, and [Kemelmacher-Shlizerman and Basri 2011; Kemelmacher-Shlizerman and Seitz 2011; Suwajanakorn et al. 2014, 2015] showed how to estimate highly detailed shapes from Internet photos and videos. ...
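For reference, the [Blanz and Vetter 1999] morphable model cited in this excerpt writes any face as the model mean plus a linear combination of learned bases; in symbols (notation ours):

S(\alpha) = \bar{S} + \sum_{i=1}^{K} \alpha_i S_i, \qquad T(\beta) = \bar{T} + \sum_{i=1}^{K} \beta_i T_i

where \bar{S} and \bar{T} are the mean shape and texture, S_i and T_i are PCA bases learned from scans, and the coefficient vectors \alpha and \beta identify a particular face.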
Imagine taking a selfie video with your mobile phone and getting as output a 3D model of your head (face and 3D hair strands) that can be later used in VR, AR, and any other domain. State of the art hair reconstruction methods allow either a single photo (thus compromising 3D quality) or multiple views, but they require manual user interaction (manual hair segmentation and capture of fixed camera views that span full 360°). In this paper, we describe a system that can completely automatically create a reconstruction from any video (even a selfie video), and we don't require specific views, since taking your -90°, 90°, and full back views is not feasible in a selfie capture.
At the core of our system, in addition to the automation components, hair strands are estimated and deformed in 3D (rather than in 2D as in the state of the art), thus enabling superior results. We provide qualitative, quantitative, and Mechanical Turk human studies that support the proposed system, and show results on a diverse variety of videos (8 different celebrity videos and 9 selfie mobile videos, spanning age, gender, hair length, type, and styling).
... Calibrated head modeling has achieved amazing results over the last decade [4,5,6]. Calibrated methods require a person to participate in a capturing session to achieve good results. These typically take as input a video with relatively constant lighting and large pose variation across the video. ...
3D face reconstruction from Internet photos has recently produced exciting results. A person's face, e.g., Tom Hanks, can be modeled and animated in 3D from a completely uncalibrated photo collection. Most methods, however, focus solely on face area and mask out the rest of the head. This paper proposes that head modeling from the Internet is a problem we can solve. We target reconstruction of the rough shape of the head. Our method is to gradually "grow" the head mesh starting from the frontal face and extending to the rest of views using photometric stereo constraints. We call our method boundary-value growing algorithm. Results on photos of celebrities downloaded from the Internet are presented.
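The photometric stereo constraint behind the growing step is the textbook Lambertian one: per pixel, intensity equals albedo times the dot product of surface normal and light direction, so three or more images under known lights give a linear system for the albedo-scaled normal. A NumPy sketch under those idealized assumptions (the synthetic lights and data are purely illustrative; the paper works from uncalibrated Internet photos):

import numpy as np

def photometric_stereo(images, lights):
    # images: (m, H, W) grayscale observations; lights: (m, 3) directions.
    # Per pixel, solve lights @ g = intensities for g = albedo * normal.
    m, h, w = images.shape
    I = images.reshape(m, -1)                        # (m, H*W)
    g, *_ = np.linalg.lstsq(lights, I, rcond=None)   # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)
    return normals.reshape(3, h, w), albedo.reshape(h, w)

# Toy usage: a flat patch facing the camera, lit from four directions.
lights = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.87],
                   [0.0, 0.5, 0.87], [-0.5, 0.0, 0.87]])
true_n = np.dstack([np.zeros((8, 8)), np.zeros((8, 8)), np.ones((8, 8))])
images = np.stack([np.clip(true_n @ l, 0, None) for l in lights])
normals, albedo = photometric_stereo(images, lights)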
... Acquiring high-detail 3D face meshes is challenging due to the highly non-rigid nature of human faces. High-detail reconstruction methods currently require the subject to come to a lab equipped with a calibrated set of cameras and/or lights, e.g., multi-view stereo approaches [6,7,11], structured light [32], and light stages [1,2,16]. For many applications, however, we would like to enable scanning capabilities anywhere. ...
We present an algorithm that takes a single frame of a person's face from a depth camera, e.g., Kinect, and produces a high-resolution 3D mesh of the input face. We leverage a dataset of 3D face meshes of 1204 distinct individuals ranging from age 3 to 40, captured in a neutral expression. We divide the input depth frame into semantically significant regions (eyes, nose, mouth, cheeks) and search the database for the best matching shape per region. We further combine the input depth frame with the matched database shapes into a single mesh that results in a high-resolution shape of the input person. Our system is fully automatic and uses only depth data for matching, making it invariant to imaging conditions. We evaluate our results using ground truth shapes, as well as compare to state-of-the-art shape estimation methods. We demonstrate the robustness of our local matching approach with high-quality reconstruction of faces that fall outside of the dataset span, e.g., faces older than 40 years old, facial expressions, and different ethnicities.
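The search this abstract describes is, at heart, a nearest-neighbor lookup run independently per semantic region of the depth frame. A toy NumPy sketch; the region boxes, random database, and the hard patch copy standing in for the paper's blending are all placeholder assumptions:

import numpy as np

rng = np.random.default_rng(1)
H, W, N = 96, 96, 1204                    # depth frame size; database size
database = rng.normal(size=(N, H, W))     # stand-in for aligned depth maps
regions = {                               # semantic regions as (y0, y1, x0, x1)
    "eyes":  (20, 40, 15, 80),
    "nose":  (35, 65, 35, 60),
    "mouth": (60, 80, 25, 70),
}

def best_match(depth, box):
    # Nearest neighbor under L2 distance, restricted to one region.
    y0, y1, x0, x1 = box
    diff = database[:, y0:y1, x0:x1] - depth[y0:y1, x0:x1]
    return int(np.argmin((diff ** 2).sum(axis=(1, 2))))

def reconstruct(depth):
    out = depth.copy()
    for box in regions.values():
        y0, y1, x0, x1 = box
        out[y0:y1, x0:x1] = database[best_match(depth, box), y0:y1, x0:x1]
    return out  # the paper blends the matched shapes into one seamless mesh

result = reconstruct(rng.normal(size=(H, W)))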
... In high-end production, special hardware setups (e.g., the Light Stage system) have been used to create photorealistic dynamic avatars with fine-scale skin details [Alexander et al. 2009; Alexander et al. 2013]. Jimenez et al. [2010] compute dynamic skin appearances by blending hemoglobin distributions captured with different expressions. ...
We present a novel image-based representation for dynamic 3D avatars, which allows effective handling of various hairstyles and headwear, and can generate expressive facial animations with fine-scale details in real-time. We develop algorithms for creating an image-based avatar from a set of sparsely captured images of a user, using an off-the-shelf web camera at home. An optimization method is proposed to construct a topologically consistent morphable model that approximates the dynamic hair geometry in the captured images. We also design a real-time algorithm for synthesizing novel views of an image-based avatar, so that the avatar follows the facial motions of an arbitrary actor. Compelling results from our pipeline are demonstrated on a variety of cases.
... Remarkable work has been carried out on real-time photoreal rendering in order to portray realistic reflections and wetness of the eye [Jimenez, 2012] (see Figure 2.1, whose panels show [Parke, 1974], the Light Stage model 6, Digital Ira [Alexander et al., 2013] obtained with the Light Stage technology, and real-time photorealistic eye rendering [Jimenez, 2012]). ...
The work presented in this thesis addresses the problem of generating audio-visual expressive performances for virtual actors. A virtual actor is represented by a 3D talking head, and an audio-visual performance refers to facial expressions, head movements, gaze direction, and the speech signal. While a substantial amount of work has been dedicated to emotions, we explore here expressive verbal behaviors that signal mental states, i.e., "how speakers feel about what they say". We explore the characteristics of these so-called dramatic attitudes and the way they are encoded with speaker-specific prosodic signatures, i.e., mental-state-specific patterns of trajectories of audio-visual prosodic parameters.