Figure 14
The shrink wrap term conforms the artist mesh to the static scan geometry, and also improves the transfer of expressive details to the dynamic performance. The registered artist mesh is shown for two static poses in (a) and (b), and a dynamic pose that borrows brow detail from (a) and mouth detail from (b) is shown in (c). Without the shrink wrap term, the registration to the static poses suffers (d, e) and the detail transfer to the dynamic performance is less successful (f). Fine-scale details are still transferred via displacement maps, but medium-scale expressive details are lost.
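The excerpt does not spell out the shrink wrap term itself; as a rough illustration only, a closest-point penalty of the kind such registration energies typically use could look like the sketch below (the kd-tree lookup, the weight w_shrink, and the per-vertex gradient are assumptions, not the paper's exact formulation).

```python
import numpy as np
from scipy.spatial import cKDTree

def shrink_wrap_energy(artist_verts, scan_verts, w_shrink=1.0):
    """Closest-point penalty pulling the artist mesh onto the scan surface.

    artist_verts: (N, 3) vertices of the registered artist mesh.
    scan_verts:   (M, 3) vertices (or dense samples) of the static scan.
    Returns the scalar energy and its gradient w.r.t. artist_verts.
    """
    tree = cKDTree(scan_verts)
    dists, idx = tree.query(artist_verts)          # nearest scan point per vertex
    residuals = artist_verts - scan_verts[idx]     # (N, 3) offsets to the scan
    energy = w_shrink * np.sum(dists ** 2)
    grad = 2.0 * w_shrink * residuals              # treating closest points as fixed
    return energy, grad
```

In an actual registration solver this term would be minimized jointly with data and smoothness terms; the sketch only shows the geometric idea of pulling each artist-mesh vertex toward the scanned surface.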
Source publication
We present a technique for creating realistic facial animation from a set of high-resolution static scans of an actor's face driven by passive video of the actor from one or more viewpoints. We capture high-resolution static geometry using multi-view stereo and gradient-based photometric stereo [Ghosh et al. 2011]. The scan set includes around 30 e...
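The abstract is truncated here, but Figure 14 above suggests that expressive detail is combined per facial region from different static scans when synthesizing a dynamic frame. A hypothetical sketch of that kind of regional blend (the masks, weights, and function names below are illustrative, not the paper's actual solver) might be:

```python
import numpy as np

def blend_regional_detail(base_verts, scan_deltas, region_masks, weights):
    """Combine per-region detail from several static scans onto a base mesh.

    base_verts:   (N, 3) neutral / tracked mesh for the current frame.
    scan_deltas:  dict scan_name -> (N, 3) vertex offsets of that scan vs. neutral.
    region_masks: dict region_name -> (N,) soft mask in [0, 1] over vertices.
    weights:      dict region_name -> (scan_name, blend_weight) chosen per frame.
    """
    result = base_verts.copy()
    for region, (scan_name, w) in weights.items():
        mask = region_masks[region][:, None]          # broadcast mask over xyz
        result += w * mask * scan_deltas[scan_name]   # add that scan's detail locally
    return result

# e.g. brow detail from scan "a", mouth detail from scan "b", as in Figure 14(c):
# blended = blend_regional_detail(base, deltas, masks,
#                                 {"brow": ("a", 1.0), "mouth": ("b", 1.0)})
```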
Citations
... Manually and Video-driven Avatars: Although there has been an increasing trend toward sophisticated face and motion tracking systems, the cost of production remains high. In many production houses today, high-fidelity speech animation is either created manually by an animator or using facial motion capture of human actors [4,6,14,40,42]. Both manual and performance-driven approaches require a skilled animator to precisely edit the resulting complex animation parameters. Such approaches are extremely time consuming and expensive, especially in applications such as computer generated movies, digital game production, and video conferencing, where talking faces for tens of hours of dialogue are required. ...
Audiovisual speech synthesis is the problem of synthesizing a talking face while maximizing the coherency of the acoustic and visual speech. In this paper, we propose and compare two audiovisual speech synthesis systems for 3D face models. The first system is the AVTacotron2, which is an end-to-end text-to-audiovisual speech synthesizer based on the Tacotron2 architecture. AVTacotron2 converts a sequence of phonemes representing the sentence to synthesize into a sequence of acoustic features and the corresponding controllers of a face model. The output acoustic features are used to condition a WaveRNN to reconstruct the speech waveform, and the output facial controllers are used to generate the corresponding video of the talking face. The second audiovisual speech synthesis system is modular, where acoustic speech is synthesized from text using the traditional Tacotron2. The reconstructed acoustic speech signal is then used to drive the facial controls of the face model using an independently trained audio-to-facial-animation neural network. We further condition both the end-to-end and modular approaches on emotion embeddings that encode the required prosody to generate emotional audiovisual speech. We analyze the performance of the two systems and compare them to the ground truth videos using subjective evaluation tests. The end-to-end and modular systems are able to synthesize close to human-like audiovisual speech with mean opinion scores (MOS) of 4.1 and 3.9, respectively, compared to a MOS of 4.1 for the ground truth generated from professionally recorded videos. While the end-to-end system gives a better overall quality, the modular approach is more flexible, and the qualities of acoustic and visual speech synthesis are almost independent of each other.
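The audio-to-facial-animation network of the modular variant is only described at a high level; a minimal recurrent regressor in that spirit (mel-spectrogram frames plus an emotion embedding in, face-model controller values out; the GRU architecture and layer sizes are assumptions, not the paper's design) could be sketched as:

```python
import torch
import torch.nn as nn

class AudioToFaceControllers(nn.Module):
    """Maps a sequence of acoustic features to face-model controller values."""

    def __init__(self, n_mels=80, n_controllers=51, hidden=256, emotion_dim=16):
        super().__init__()
        self.gru = nn.GRU(n_mels + emotion_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_controllers)

    def forward(self, mel, emotion):
        # mel: (B, T, n_mels); emotion: (B, emotion_dim), tiled over time
        emo = emotion.unsqueeze(1).expand(-1, mel.size(1), -1)
        h, _ = self.gru(torch.cat([mel, emo], dim=-1))
        return self.head(h)                     # (B, T, n_controllers)

# controllers = AudioToFaceControllers()(mel_batch, emotion_batch)
```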
... Modeling methods are reviewed in Hartley and Zisserman (2004), Esteban et al. (2008) and Furukawa and Hernandez (2015). There is a highly developed literature for the very important case of faces, recently reviewed in Ghosh et al. (2011), Fyffe et al. (2013) and Fyffe et al. (2017). An alternative, which we believe has not been explored in the literature, would be to recover an illumination cone for the fragment from multiple images. ...
We present an object relighting system that allows an artist to select an object from an image and insert it into a target scene. Through simple interactions, the system can adjust illumination on the inserted object so that it appears naturally in the scene. To support image-based relighting, we build an object model from the image, and propose a perceptually inspired approximate shading model for the relighting. It decomposes the shading field into (a) a rough shape term that can be reshaded, (b) a parametric shading detail that encodes missing features from the first term, and (c) a geometric detail term that captures fine-scale material properties. With this decomposition, the shading model combines 3D rendering and image-based composition and allows more flexible compositing than image-based methods. Quantitative evaluation and a set of user studies suggest our method is a promising alternative to existing methods of object insertion.
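The paper's fitting procedure for the three terms is not given in this abstract; a toy multiplicative band-pass separation can at least illustrate the idea of a smooth reshadeable term plus medium- and fine-scale detail layers (the Gaussian scales and the multiplicative form are assumptions, not the authors' model):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose_shading(shading, coarse_sigma=15.0, medium_sigma=3.0, eps=1e-4):
    """Split a shading field S into rough * parametric-detail * geometric-detail.

    shading: (H, W) positive shading image.
    Returns (rough, parametric_detail, geometric_detail) such that
    shading ~= rough * parametric_detail * geometric_detail.
    """
    rough = gaussian_filter(shading, coarse_sigma)      # reshadeable low-frequency term
    medium = gaussian_filter(shading, medium_sigma)
    parametric_detail = medium / (rough + eps)          # mid-scale features missing from rough
    geometric_detail = shading / (medium + eps)         # fine-scale material detail
    return rough, parametric_detail, geometric_detail
```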
... Our solution to the stabilization problem requires a smooth set of rigid transforms that, when applied to a "4D tracked" mesh, will cause the motion of many vertices to have an approximate but clear mode. While our 4D tracking system is proprietary, the stabilization algorithm presented here does not rely on details of the tracking and could be used with published algorithms [17]-[19]; see [5] for one recent survey. ...
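One standard, published way to obtain such rigid transforms (explicitly not the authors' proprietary tracker) is per-frame Procrustes/Kabsch alignment of a set of assumed-rigid vertices to a reference frame; a sketch:

```python
import numpy as np

def rigid_align(frame_pts, ref_pts):
    """Kabsch/Procrustes: rotation R and translation t with R @ p + t ~= q."""
    mu_f, mu_r = frame_pts.mean(0), ref_pts.mean(0)
    H = (frame_pts - mu_f).T @ (ref_pts - mu_r)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = mu_r - R @ mu_f
    return R, t

def stabilize(frames, ref, rigid_idx):
    """Rigidly align each tracked frame to the reference using assumed-rigid vertices."""
    out = []
    for verts in frames:                       # verts: (N, 3) tracked mesh for one frame
        R, t = rigid_align(verts[rigid_idx], ref[rigid_idx])
        out.append(verts @ R.T + t)
    return out
```

A temporal filter over the recovered rotations and translations would then yield the smooth set of transforms the excerpt calls for.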
... Most of the recent work has been on data-driven methods. Some state-of-the-art methods focus on obtaining realism based on multiview stereo [Wu et al. 2011; Ghosh et al. 2011; Beeler et al. 2011; Furukawa and Ponce 2010; Bickel et al. 2007]; this data can be used to drive blendshapes [Fyffe et al. 2013]. Some of the work is based on binocular [Valgaerts et al. 2012] and monocular [Garrido et al. 2013; Shi et al. 2014] videos. ...
We propose a system for real-time animation of eyes that can be interactively controlled in a WebGL-enabled device using a small number of animation parameters, including gaze. These animation parameters can be obtained using traditional keyframed animation curves, measured from an actor's performance using off-the-shelf eye tracking methods, or estimated from the scene observed by the character, using behavioral models of human vision. We present a model of eye movement that includes not only movement of the globes, but also of the eyelids and other soft tissues in the eye region. The model includes formation of expression wrinkles in soft tissues. To our knowledge this is the first system for real-time animation of soft tissue movement around the eyes based on gaze input.
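The eyelid and soft-tissue model itself is not given in this abstract; as a loose illustration of gaze-coupled lid motion only (the linear coefficients below are invented placeholders, not the paper's fitted model), an upper-lid controller driven by vertical gaze and a blink parameter could look like:

```python
def upper_eyelid(gaze_pitch_deg, blink=0.0):
    """Toy coupling of upper-eyelid openness to vertical gaze and a blink input.

    gaze_pitch_deg: vertical gaze angle in degrees, positive when looking up.
    blink:          0 = eye open, 1 = fully closed.
    Returns an openness value in [0, 1] for the upper lid (1 = fully open).
    """
    openness = 0.75 + 0.01 * gaze_pitch_deg   # lid roughly follows the globe vertically
    openness *= (1.0 - blink)                 # blink drives the lid toward closed
    return max(0.0, min(1.0, openness))
```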
In this paper, we investigate a new visual restoration task, termed hallucinating visual instances in total absentia (HVITA). Unlike the conventional image inpainting task, which works on images with only part of a visual instance missing, HVITA concerns scenarios where an object is completely absent from the scene. This seemingly minor difference in fact makes HVITA a much more challenging task, as the restoration algorithm has to not only infer the category of the object in total absentia, but also hallucinate an object whose appearance is consistent with the background. Towards solving HVITA, we propose an end-to-end deep approach that explicitly looks into the global semantics within the image. Specifically, we transform the input image into a semantic graph, wherein each node corresponds to a detected object in the scene. We then adopt a Graph Convolutional Network on top of the scene graph to estimate the category of the missing object in the masked region, and finally introduce a Generative Adversarial Module to carry out the hallucination. Experiments on the COCO, Visual Genome and NYU Depth v2 datasets demonstrate that the proposed approach yields truly encouraging and visually plausible results.
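The graph-convolution step can be sketched independently of the full pipeline; a two-layer GCN over the scene graph's node features, read out at the masked node to score candidate categories (the Kipf-and-Welling-style normalization and the weight shapes are assumptions about unstated details), might look like:

```python
import numpy as np

def normalized_adj(A):
    """Symmetrically normalized adjacency with self-loops (Kipf & Welling style)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def predict_missing_category(A, X, W1, W2, masked_node):
    """Two-layer GCN over the scene graph; read out class scores at the masked node.

    A: (N, N) adjacency of detected objects, X: (N, F) node features,
    W1: (F, H), W2: (H, C) learned weights, masked_node: index of the absent object.
    """
    A_norm = normalized_adj(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)     # hidden layer with ReLU
    logits = A_norm @ H @ W2                 # (N, C) category scores per node
    return int(np.argmax(logits[masked_node]))
```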
Humans are extremely sensitive to facial realism and spend a surprising amount of time focusing their attention on other people's faces. Thus, believable human character animation requires realistic facial performance. Various techniques have been developed to capture highly detailed actor performances or to help drive facial animation. However, the eye region remains a largely unexplored field, and automatic animation of this region is still an open problem. We tackle two different aspects of automatically generating facial features, aiming to recreate the small intricacies of the eye region in real time. First, we present a system for real-time animation of eyes that can be interactively controlled using a small number of animation parameters, including gaze. These parameters can be obtained using traditional animation curves, measured from an actor's performance using off-the-shelf eye tracking methods, or estimated from the scene observed by the character using behavioral models of human vision. We present a model of eye movement that includes not only movement of the globes, but also of the eyelids and other soft tissues in the eye region. To our knowledge this is the first system for real-time animation of soft tissue movement around the eyes based on gaze input. Second, we present a method for real-time generation of distance fields for any mesh in screen space. This method does not depend on object complexity or shape, being constrained only by the intended field resolution. We procedurally generate lacrimal lakes on a human character using the generated distance field as input. We present different sampling algorithms for surface exploration and distance estimation, and compare their performance. To our knowledge this is the first method for real-time or screen-space generation of distance fields.
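The screen-space distance-field part is the most implementation-flavored claim; a brute-force CPU stand-in (a real-time version would more plausibly use a GPU technique such as jump flooding, and scipy's Euclidean distance transform over the mesh's pixel coverage mask is used here purely for illustration) could be:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def screen_space_distance_field(coverage_mask):
    """Signed screen-space distance (in pixels) to the silhouette of a rasterized mesh.

    coverage_mask: (H, W) boolean array, True where the mesh covers the pixel.
    Positive outside the silhouette, negative inside.
    """
    outside = distance_transform_edt(~coverage_mask)   # distance to nearest covered pixel
    inside = distance_transform_edt(coverage_mask)     # distance to nearest empty pixel
    return outside - inside
```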
Facial modelling is a fundamental technique in a variety of applications in computer graphics, computer vision and pattern recognition areas. As 3D technologies evolved over the years, the quality of facial modelling greatly improved. To enhance the modelling quality and controllability of the model further, parametric methods, which represent or manipulate facial attributes (e.g. identity, expression, viseme) with a set of control parameters, have been proposed in recent years. The aim of this chapter is to give a comprehensive overview of current state-of-the-art parametric methods for realistic facial modelling and animation.