Article

Compact Facial Landmark Layouts for Performance Capture

Abstract

An abundance of older as well as recent work at the intersection of computer vision and computer graphics addresses accurate estimation of dynamic facial landmarks, with applications in facial animation, emotion recognition, and beyond. However, only a few publications optimize the actual layout of facial landmarks to ensure an optimal trade-off between compact layouts and detailed capture. At the same time, we observe that applications like social games prefer simplicity and performance over detail to reduce the computational budget, especially on mobile devices. Other common attributes of such applications are predefined low-dimensional models to animate and a large, diverse user base. In contrast to existing methods that focus on creating person-specific facial landmarks, we suggest deriving application-specific facial landmarks. We formulate our optimization method on the widely adopted blendshape model. First, a score is defined that is suitable to compute a characteristic landmark for each blendshape. In a subsequent step, we optimize a global function that mimics merging similar landmarks into one. The optimization is solved in less than a second using integer linear programming and guarantees a globally optimal solution to an NP-hard problem. Our application-specific approach is faster than and fundamentally different from previous, actor-specific methods. The resulting layouts are more similar to empirical layouts. Compared to empirical landmarks, our layouts require only a fraction of the landmarks to achieve the same numerical error when reconstructing the animation from landmarks. The method is compared against previous work and tested on various blendshape models, representing a wide spectrum of applications.
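To illustrate how such a layout selection can be posed as an integer linear program, the sketch below keeps as few landmarks as possible while requiring that every blendshape's characteristic landmark lies within a radius `tau` of a kept one. The score, the merge radius, and this set-cover style formulation are assumptions made purely for illustration; they are not the paper's actual objective.

```python
# Illustrative sketch only: a set-cover style ILP that merges nearby
# per-blendshape candidate landmarks into a compact layout. The objective,
# the merge radius tau, and the coverage constraint are assumptions made for
# illustration; they are not the paper's actual score or formulation.
import numpy as np
import pulp

def compact_layout(P, tau):
    """P: (n, 3) characteristic landmark per blendshape; tau: merge radius."""
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    covers = dist <= tau          # covers[i, j]: candidate j can represent blendshape i

    prob = pulp.LpProblem("compact_landmark_layout", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(n)]
    prob += pulp.lpSum(x)         # minimise the number of kept landmarks
    for i in range(n):            # every blendshape needs a nearby representative
        prob += pulp.lpSum(x[j] for j in range(n) if covers[i, j]) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [j for j in range(n) if x[j].value() > 0.5]

# Example: candidate landmarks of 5 blendshapes, merge radius 0.1
layout = compact_layout(np.random.rand(5, 3), tau=0.1)
```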

References
Conference Paper
Full-text available
Recent advances in facial landmark detection achieve success by learning discriminative features from rich deformation of face shapes and poses. Besides the variance of faces themselves, the intrinsic variance of image styles, e.g., grayscale vs. color images, light vs. dark, intense vs. dull, and so on, has constantly been overlooked. This issue becomes inevitable as more and more web images are collected from various sources for training neural networks. In this work, we propose a style-aggregated approach to deal with the large intrinsic variance of image styles for facial landmark detection. Our method transforms original face images into style-aggregated images using a generative adversarial module. The proposed scheme uses the style-aggregated images to provide face images that are more robust to environmental changes. The original face images and the style-aggregated ones are then used together to train a landmark detector, with the two styles complementing each other. In this way, for each face, our method takes two images as input, i.e., one in its original style and the other in the aggregated style. In experiments, we observe that the large variance of image styles degrades the performance of facial landmark detectors. Moreover, we show the robustness of our method to the large variance of image styles by comparing against a variant of our approach in which the generative adversarial module is removed and no style-aggregated images are used. Our approach is demonstrated to perform well when compared with state-of-the-art algorithms on the benchmark datasets AFLW and 300-W. Code is publicly available on GitHub: https://github.com/D-X-Y/SAN.
Article
Full-text available
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes. The pose and expression dependent articulations are learned from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).
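The linear part of such a model can be evaluated in a few lines. The sketch below uses generic, made-up array names and omits FLAME's articulated jaw, neck, eyeballs, and pose-dependent correctives.

```python
# Minimal sketch of evaluating the linear part of a FLAME-style face model:
# vertices = template + shape_basis @ shape_coeffs + expr_basis @ expr_coeffs.
# Array names/shapes are illustrative; the real model additionally applies
# pose-dependent correctives and linear blend skinning (jaw, neck, eyeballs).
import numpy as np

def eval_linear_face(template, shape_basis, expr_basis, beta, psi):
    """template: (V, 3); shape_basis: (3V, S); expr_basis: (3V, E)."""
    offsets = shape_basis @ beta + expr_basis @ psi     # (3V,)
    return template + offsets.reshape(-1, 3)            # (V, 3)

# Example with random placeholder data: 100 vertices, 10 shape / 5 expression dims
verts = eval_linear_face(np.zeros((100, 3)),
                         np.random.randn(300, 10), np.random.randn(300, 5),
                         np.random.randn(10), np.random.randn(5))
```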
Article
Full-text available
Spatially localized deformation components are very useful for shape analysis and synthesis in 3D geometry processing. Several methods have recently been developed, with an aim to extract intuitive and interpretable deformation components. However, these techniques suffer from fundamental limitations especially for meshes with noise or large-scale deformations, and may not always be able to identify important deformation components. In this paper we propose a novel mesh-based autoencoder architecture that is able to cope with meshes with irregular topology. We introduce sparse regularization in this framework, which along with convolutional operations, helps localize deformations. Our framework is capable of extracting localized deformation components from mesh data sets with large-scale deformations and is robust to noise. It also provides a nonlinear approach to reconstruction of meshes using the extracted basis, which is more effective than the current linear combination approach. Extensive experiments show that our method outperforms state-of-the-art methods in both qualitative and quantitative evaluations.
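As a rough illustration of the kind of sparsity regularization that encourages spatially localized deformation components, the snippet below computes a group (L2,1) penalty over per-vertex deformation vectors; the paper's convolutional mesh autoencoder itself is not reproduced here.

```python
# Sketch of a group-sparsity (L2,1) penalty over per-vertex deformation
# components, the kind of regulariser that switches whole vertices off together
# and thereby localises components. Standalone and purely illustrative.
import numpy as np

def group_sparsity(components):
    """components: (K, V, 3) deformation basis, K components over V vertices."""
    return np.linalg.norm(components, axis=-1).sum()

# Example: penalty for 8 random components on a 500-vertex mesh
penalty = group_sparsity(np.random.randn(8, 500, 3))
```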
Conference Paper
Full-text available
We present an approach to simultaneously solve the two problems of face alignment and 3D face reconstruction from an input 2D face image of arbitrary poses and expressions. The proposed method iteratively and alternately applies two sets of cascaded regressors, one for updating 2D landmarks and the other for updating the reconstructed pose-expression-normalized (PEN) 3D face shape. The 3D face shape and the landmarks are correlated via a 3D-to-2D mapping matrix. In each iteration, adjustment to the landmarks is first estimated via a landmark regressor, and this landmark adjustment is also used to estimate 3D face shape adjustment via a shape regressor. The 3D-to-2D mapping is then computed based on the adjusted 3D face shape and 2D landmarks, and it further refines the 2D landmarks. An effective algorithm is devised to learn these regressors based on a training dataset of paired annotated 3D face shapes and 2D face images. Compared with existing methods, the proposed method can fully automatically generate PEN 3D face shapes in real time from a single 2D face image and locate both visible and invisible 2D landmarks. Extensive experiments show that the proposed method can achieve state-of-the-art accuracy in both face alignment and 3D face reconstruction, and benefit face recognition owing to its reconstructed PEN 3D face shapes.
Conference Paper
Full-text available
We seek to determine an optimal set of markers for marker-based facial motion capture and animation control. The problem is addressed in two different ways: on the one hand, different sets of empirical markers classically used in computer animation are evaluated; on the other hand, a clustering method that automatically determines optimal marker sets is proposed and compared with the empirical marker sets. To evaluate the quality of a set of markers, we use a blendshape-based synthesis technique that learns the mapping between marker positions and blendshape weights, and we calculate the reconstruction error of various animated sequences created from the considered set of markers in comparison to ground truth data. Our results show that the clustering method outperforms the heuristic approach.
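The evaluation protocol described above, learning a mapping from marker positions to blendshape weights and measuring reconstruction error on held-out animation, can be sketched with a plain linear least-squares map; the mapping actually used in the paper may differ.

```python
# Sketch of the evaluation idea: learn a map from marker positions to blendshape
# weights on training frames, then measure how well held-out frames are
# reconstructed. A linear least-squares map is used purely for illustration.
import numpy as np

def fit_marker_to_weights(markers_train, weights_train):
    """markers_train: (T, 3m) stacked marker coords; weights_train: (T, B)."""
    M, _, _, _ = np.linalg.lstsq(markers_train, weights_train, rcond=None)
    return M                                             # (3m, B)

def reconstruction_error(markers_test, weights_test, M, blendshape_matrix):
    """blendshape_matrix: (B, 3V) vertex deltas; returns mean vertex error."""
    pred_verts = (markers_test @ M) @ blendshape_matrix
    true_verts = weights_test @ blendshape_matrix
    diff = (pred_verts - true_verts).reshape(len(markers_test), -1, 3)
    return np.linalg.norm(diff, axis=-1).mean()
```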
Article
Full-text available
The facial performance of an individual is inherently rich in subtle deformation and timing details. Although these subtleties make the performance realistic and compelling, they often elude both motion capture and hand animation. We present a technique for adding fine-scale details and expressiveness to low-resolution art-directed facial performances, such as those created manually using a rig, via marker-based capture, by fitting a morphable model to a video, or through Kinect reconstruction using recent faceshift technology. We employ a high-resolution facial performance capture system to acquire a representative performance of an individual in which he or she explores the full range of facial expressiveness. From the captured data, our system extracts an expressiveness model that encodes subtle spatial and temporal deformation details specific to that particular individual. Once this model has been built, these details can be transferred to low-resolution art-directed performances. We demonstrate results on various forms of input; after our enhancement, the resulting animations exhibit the same nuances and fine spatial details as the captured performance, with optional temporal enhancement to match the dynamics of the actor. Finally, we show that our technique outperforms the current state-of-the-art in example-based facial animation.
Article
Full-text available
The goal of a practical facial animation retargeting system is to reproduce the character of a source animation on a target face while providing room for additional creative control by the animator. This article presents a novel spacetime facial animation retargeting method for blendshape face models. Our approach starts from the basic principle that the source and target movements should be similar. By interpreting movement as the derivative of position with time, and adding suitable boundary conditions, we formulate the retargeting problem as a Poisson equation. Specified (e.g., neutral) expressions at the beginning and end of the animation as well as any user-specified constraints in the middle of the animation serve as boundary conditions. In addition, a model-specific prior is constructed to represent the plausible expression space of the target face during retargeting. A Bayesian formulation is then employed to produce target animation that is consistent with the source movements while satisfying the prior constraints. Since the preservation of temporal derivatives is the primary goal of the optimization, the retargeted motion preserves the rhythm and character of the source movement and is free of temporal jitter. More importantly, our approach provides spacetime editing for the popular blendshape representation of facial models, exhibiting smooth and controlled propagation of user edits across surrounding frames.
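The core "preserve temporal derivatives" idea can be illustrated on a single blendshape weight curve: fixing the first and last frames as boundary conditions and asking the target's frame-to-frame differences to match the source's yields a small tridiagonal (discrete Poisson) system. The sketch below solves it with dense linear algebra and omits the paper's Bayesian prior and user constraints.

```python
# Sketch: solve for a target weight curve whose temporal differences match the
# source's, with fixed first/last frames. Minimising sum_t ||dw_t - ds_t||^2
# leads to w_{t-1} - 2 w_t + w_{t+1} = s_{t-1} - 2 s_t + s_{t+1} at interior
# frames, i.e. a discrete Poisson equation. Prior terms are omitted.
import numpy as np

def poisson_retarget(source, w_start, w_end):
    """source: (T,) source weight curve; w_start/w_end: target boundary values."""
    T = len(source)
    n = T - 2                                              # interior frames
    A = (np.diag(np.full(n, -2.0))
         + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
    rhs = source[:-2] - 2.0 * source[1:-1] + source[2:]    # source 2nd differences
    rhs[0] -= w_start                                      # boundary conditions
    rhs[-1] -= w_end
    interior = np.linalg.solve(A, rhs)
    return np.concatenate(([w_start], interior, [w_end]))
```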
Conference Paper
Full-text available
Face alignment is a crucial step in face recognition tasks. Especially, using landmark localization for geometric face normalization has shown to be very effective, clearly improving the recognition results. However, no adequate databases exist that provide a sufficient number of annotated facial landmarks. The databases are either limited to frontal views, provide only a small number of annotated images or have been acquired under controlled conditions. Hence, we introduce a novel database overcoming these limitations: Annotated Facial Landmarks in the Wild (AFLW). AFLW provides a large-scale collection of images gathered from Flickr, exhibiting a large variety in face appearance (e.g., pose, expression, ethnicity, age, gender) as well as general imaging and environmental conditions. In total 25,993 faces in 21,997 real-world images are annotated with up to 21 landmarks per image. Due to the comprehensive set of annotations AFLW is well suited to train and test algorithms for multi-view face detection, facial landmark localization and face pose estimation. Further, we offer a rich set of tools that ease the integration of other face databases and associated annotations into our joint framework.
Conference Paper
Full-text available
We develop an automatic system to analyze subtle changes in upper face expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in a nearly frontal image sequence. Our system recognizes fine-grained changes in facial expression based on Facial Action Coding System (FACS) action units (AUs). Multi-state facial component models are proposed for tracking and modeling different facial features, including eyes, brows, cheeks, and furrows. Then we convert the results of tracking to detailed parametric descriptions of the facial features. These feature parameters are fed to a neural network which recognizes 7 upper face action units. A recognition rate of 95% is obtained for test data that include both single action units and AU combinations.
Conference Paper
Full-text available
Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions (e.g., happiness and anger). Such prototypic expressions, however, occur infrequently. Human emotions and intentions are communicated more often by changes in one or two discrete facial features. We develop an automatic system to analyze subtle changes in facial expressions based on both permanent (e.g., mouth, eye, and brow) and transient (e.g., furrows and wrinkles) facial features in a nearly frontal image sequence. Multi-state facial component models are proposed for tracking and modeling different facial features. Based on these multi-state models, and without artificial enhancement, we detect and track the facial features, including mouth, eyes, brows, cheeks, and their related wrinkles and facial furrows. Moreover, we recover detailed parametric descriptions of the facial features. With these features as the inputs, 11 individual action units or action unit combinations are recognized by a neural network algorithm. A recognition rate of 96.7% is obtained. The recognition results indicate that our system can identify action units regardless of whether they occur singly or in combinations.
Conference Paper
Full-text available
This paper presents the first dynamic 3D FACS data set for facial expression research, containing 10 subjects performing between 19 and 97 different AUs both individually and in combination. In total the corpus contains 519 AU sequences. The peak expression frame of each sequence has been manually FACS coded by certified FACS experts. This provides a ground truth for 3D FACS based AU recognition systems. In order to use this data, we describe the first framework for building dynamic 3D morphable models. This includes a novel Active Appearance Model (AAM) based 3D facial registration and mesh correspondence scheme. The approach overcomes limitations in existing methods that require facial markers or are prone to optical flow drift. We provide the first quantitative assessment of such 3D facial mesh registration techniques and show how our proposed method provides more reliable correspondence.
Article
Full-text available
We present a novel method for acquisition, modeling, compression, and synthesis of realistic facial deformations using polynomial displacement maps. Our method consists of an analysis phase where the relationship between motion capture markers and detailed facial geometry is inferred, and a synthesis phase where novel detailed animated facial geometry is driven solely by a sparse set of motion capture markers. For analysis, we record the actor wearing facial markers while performing a set of training expression clips. We capture real-time high-resolution facial deformations, including dynamic wrinkle and pore detail, using interleaved structured light 3D scanning and photometric stereo. Next, we compute displacements between a neutral mesh driven by the motion capture markers and the high-resolution captured expressions. These geometric displacements are stored in a polynomial displacement map which is parameterized according to the local deformations of the motion capture dots. For synthesis, we drive the polynomial displacement map with new motion capture data. This allows the recreation of large-scale muscle deformation, medium and fine wrinkles, and dynamic skin pore detail. Applications include the compression of existing performance data and the synthesis of new performances. Our technique is independent of the underlying geometry capture system and can be used to automatically generate high-frequency wrinkle and pore details on top of many existing facial animation systems.
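The analysis step, regressing per-texel displacements against the local deformation of the motion-capture markers, can be sketched as a polynomial least-squares fit. The quadratic basis below is a generic choice for illustration, not necessarily the parameterization used in the paper.

```python
# Sketch of fitting, per displacement-map texel, a polynomial in local
# motion-capture deformation parameters that predicts the measured displacement.
# A generic quadratic basis and dense least squares are used for illustration.
import numpy as np

def quadratic_basis(d):
    """d: (F, k) local deformation parameters per frame -> (F, monomials)."""
    F, k = d.shape
    cross = np.stack([d[:, i] * d[:, j]
                      for i in range(k) for j in range(i, k)], axis=1)
    return np.concatenate([np.ones((F, 1)), d, cross], axis=1)

def fit_polynomial_displacement(d, displacement):
    """displacement: (F, N) per-texel displacements over F training frames."""
    B = quadratic_basis(d)
    coeffs, _, _, _ = np.linalg.lstsq(B, displacement, rcond=None)
    return coeffs          # apply later as quadratic_basis(d_new) @ coeffs
```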
Article
Complex deformable face-rigs have many independent parameters that control the shape of the object. A human face has upwards of 50 parameters (FACS Action Units), making conventional UI controls hard to find and operate. Animators address this problem by tediously hand-crafting in-situ layouts of UI controls that serve as visual deformation proxies, and facilitate rapid shape exploration. We propose the automatic creation of such in-situ UI control layouts. We distill the design choices made by animators into mathematical objectives that we optimize as the solution to an integer quadratic programming problem. Our evaluation is three-fold: we show the impact of our design principles on the resulting layouts; we show automated UI layouts for complex and diverse face rigs, comparable to animator handcrafted layouts; and we conduct a user study showing our UI layout to be an effective approach to face-rig manipulation, preferable to a baseline slider interface.
Article
The computer graphics and vision communities have dedicated long standing efforts in building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Over the past years rapid progress has been made, which led to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB‐D camera. The range of applications is vast and steadily growing as these technologies are further improving in speed, accuracy, and ease of use. Motivated by this rapid progress, this state‐of‐the‐art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance‐based animation to real‐time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three dimensional model of the human face using optimization‐based reconstruction algorithms. We provide an in‐depth overview of the underlying concepts of real‐world image formation, and we discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors that are used to better constrain the under‐constrained monocular reconstruction problem, and discuss the optimization techniques that are employed to recover dense, photo‐geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, as well as image and video editing.
Article
While facial capturing focuses on accurate reconstruction of an actor's performance, facial animation retargeting aims to transfer the animation to another character such that the semantic meaning of the animation remains. Because of the popularity of blendshape animation, this effectively means computing suitable blendshape weights for the given target character. Current methods either require manually created examples of matching expressions of actor and target character, or are limited to characters with similar facial proportions (i.e., realistic models). In contrast, our approach can automatically retarget facial animations from a real actor to stylized characters. We formulate the problem of transferring the blendshapes of a facial rig to an actor as a special case of manifold alignment, by exploring the similarities of the motion spaces defined by the blendshapes and by an expressive training sequence of the actor. In addition, we incorporate a simple, yet elegant facial prior based on discrete differential properties to guarantee smooth mesh deformation. Our method requires only sparse correspondences between characters and is thus suitable for retargeting marker-less and marker-based motion capture as well as animation transfer between virtual characters.
Article
We present a method for automatically generating reduced marker layouts for marker-based optical motion capture of human hands. The employed motion reconstruction method is based on subspace-constrained inverse kinematics, which allows for the recovery of realistic hand movements even from sparse input data. We additionally present a user-specific hand model calibration procedure that fits an articulated hand model to point cloud data of the user's hand. Our marker layout optimization is sensitive to the kinematic structure and the subspace representations of hand articulations utilized in the reconstruction method, in order to generate sparse marker configurations that are optimal for solving the constrained inverse kinematics problem. We propose specific quality criteria for reduced marker sets that combine numerical stability with geometric feasibility of the resulting layout. These criteria are combined in an objective function that is minimized using a specialized surface-constrained particle swarm optimization scheme, which generates marker layouts bound to the surface of an animated hand model. Our method provides a principled way for determining reduced marker layouts based on subspace representations of hand articulations. We demonstrate the effectiveness of our motion reconstruction and model calibration methods in a thorough evaluation.
Conference Paper
Creating a high quality blendshape rig usually involves a large amount of effort from skilled artists. Although current 3D reconstruction technologies are able to capture accurate facial geometry of the actor, it is still very difficult to build a production-ready blendshape rig from unorganized scans. Removing rigid head motion and separating mixed expressions from the captures are two of the major challenges in this process. We present a technique that creates a facial blendshape rig based on performance capture and a generic face rig. The customized rig accurately captures actor-specific face details while producing a semantically meaningful FACS basis. The resulting rig faithfully serves both artist friendly keyframe animation and high quality facial motion retargeting in production.
Article
Computer Vision has recently witnessed great research advance towards automatic facial points detection. Numerous methodologies have been proposed during the last few years that achieve accurate and efficient performance. However, fair comparison between these methodologies is infeasible mainly due to two issues. (a) Most existing databases, captured under both constrained and unconstrained (in-the-wild) conditions have been annotated using different mark-ups and, in most cases, the accuracy of the annotations is low. (b) Most published works report experimental results using different training/testing sets, different error metrics and, of course, landmark points with semantically different locations. In this paper, we aim to overcome the aforementioned problems by (a) proposing a semi-automatic annotation technique that was employed to re-annotate most existing facial databases under a unified protocol, and (b) presenting the 300 Faces In-The-Wild Challenge (300-W), the first facial landmark localization challenge that was organized twice, in 2013 and 2015. To the best of our knowledge, this is the first effort towards a unified annotation scheme of massive databases and a fair experimental comparison of existing facial landmark localization systems. The images and annotations of the new testing database that was used in the 300-W challenge are available from http://ibug.doc.ic.ac.uk/resources/facial-point-annotations/
Conference Paper
We present a novel multi-scale representation and acquisition method for the animation of high-resolution facial geometry and wrinkles. We first acquire a static scan of the face including reflectance data at the highest possible quality. We then augment a traditional marker-based facial motion-capture system by two synchronized video cameras to track expression wrinkles. The resulting model consists of high-resolution geometry, motion-capture data, and expression wrinkles in 2D parametric form. This combination represents the facial shape and its salient features at multiple scales. During motion synthesis the motion-capture data deforms the high-resolution geometry using a linear shell-based mesh-deformation method. The wrinkle geometry is added to the facial base mesh using nonlinear energy optimization. We present the results of our approach for performance replay as well as for wrinkle editing.
Article
Deformation transfer applies the deformation exhibited by a source triangle mesh onto a different target triangle mesh. Our approach is general and does not require the source and target to share the same number of vertices or triangles, or to have identical connectivity. The user builds a correspondence map between the triangles of the source and those of the target by specifying a small set of vertex markers. Deformation transfer computes the set of transformations induced by the deformation of the source mesh, maps the transformations through the correspondence from the source to the target, and solves an optimization problem to consistently apply the transformations to the target shape. The resulting system of linear equations can be factored once, after which transferring a new deformation to the target mesh requires only a backsubstitution step. Global properties such as foot placement can be achieved by constraining vertex positions. We demonstrate our method by retargeting full body key poses, applying scanned facial deformations onto a digital character, and remapping rigid and non-rigid animation sequences from one mesh onto another.
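The "factor once, backsubstitute per deformation" pattern mentioned above is illustrated below with SciPy's sparse LU. The matrix here is only a stand-in for the fixed linear system relating target geometry to mapped per-triangle transformations; building the actual gradient-based deformation-transfer system is omitted.

```python
# Sketch of the "factor once, backsubstitute many times" pattern: the system
# matrix is fixed by the target mesh and correspondence, so its factorisation
# can be reused for every new source deformation. The matrix below is a random
# placeholder, not the real deformation-transfer system.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = (sp.random(n, n, density=0.01) + sp.identity(n)).tocsc()
lu = spla.splu(A)                    # expensive factorisation, done once

for _ in range(5):                   # each new source deformation -> new rhs
    b = np.random.randn(n)           # stand-in for mapped transformations
    x = lu.solve(b)                  # cheap backsubstitution per transfer
```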
Article
Facial scanning has become the industry-standard approach for creating digital doubles in movies and video games. This involves capturing an actor while they perform different expressions that span their range of facial motion. Unfortunately, the scans typically contain a superposition of the desired expression on top of unwanted rigid head movement. In order to extract true expression deformations, it is essential to factor out the rigid head movement for each expression, a process referred to as rigid stabilization. In order to achieve production quality in industry, face stabilization is usually performed through a tedious and error-prone manual process. In this paper we present the first automatic face stabilization method that achieves professional-quality results on large sets of facial expressions. Since human faces can undergo a wide range of deformation, there is not a single point on the skin surface that moves rigidly with the underlying skull. Consequently, computing the rigid transformation from direct observation, a common approach in previous methods, is error prone and leads to inaccurate results. Instead, we propose to indirectly stabilize the expressions by explicitly aligning them to an estimate of the underlying skull using anatomically-motivated constraints. We show that the proposed method not only outperforms existing techniques but is also on par with manual stabilization, yet requires less than a second of computation time.
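For reference, the basic rigid-registration building block (Procrustes/Kabsch alignment between two point sets) is sketched below; note that the paper argues this kind of direct skin-surface alignment is insufficient, and instead aligns each expression to an estimated skull with anatomically motivated constraints.

```python
# Generic rigid (Kabsch/Procrustes) alignment of one point set to another,
# shown only as the basic registration building block; not the paper's
# anatomically constrained stabilisation method.
import numpy as np

def rigid_align(src, dst):
    """Return R (3x3) and t (3,) such that R @ src[i] + t approximates dst[i]."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```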
Article
We present a fully automatic approach to real-time facial tracking and animation with a single video camera. Our approach does not need any calibration for each individual user. It learns a generic regressor from public image datasets, which can be applied to any user and arbitrary video cameras to infer accurate 2D facial landmarks as well as the 3D facial shape from 2D video frames. The inferred 2D landmarks are then used to adapt the camera matrix and the user identity to better match the facial expressions of the current user. The regression and adaptation are performed in an alternating manner. With more and more facial expressions observed in the video, the whole process converges quickly with accurate facial tracking and animation. In experiments, our approach demonstrates a level of robustness and accuracy on par with state-of-the-art techniques that require a time-consuming calibration step for each individual user, while running at 28 fps on average. We consider our approach to be an attractive solution for wide deployment in consumer-level applications.
Article
We present an end-to-end system that goes from video sequences to high resolution, editable, dynamically controllable face models. The capture system employs synchronized video cameras and structured light projectors to record videos of a moving face from multiple viewpoints. A novel spacetime stereo algorithm is introduced to compute depth maps accurately and overcome over-fitting deficiencies in prior work. A new template fitting and tracking procedure fills in missing data and yields point correspondence across the entire sequence without using markers. We demonstrate a data-driven, interactive method for inverse kinematics that draws on the large set of fitted templates and allows for posing new expressions by dragging surface points directly. Finally, we describe new tools that model the dynamics in the input sequence to enable new animations, created via key-framing or texture-synthesis techniques.
Article
We propose a method that extracts sparse and spatially localized deformation modes from an animated mesh sequence. To this end, we propose a new way to extend the theory of sparse matrix decompositions to 3D mesh sequence processing, and further contribute with an automatic way to ensure spatial locality of the decomposition in a new optimization framework. The extracted dimensions often have an intuitive and clear interpretable meaning. Our method optionally accepts user-constraints to guide the process of discovering the underlying latent deformation space. The capabilities of our efficient, versatile, and easy-to-implement method are extensively demonstrated on a variety of data sets and application contexts. We demonstrate its power for user friendly intuitive editing of captured mesh animations, such as faces, full body motion, cloth animations, and muscle deformations. We further show its benefit for statistical geometry processing and biomechanically meaningful animation editing. It is further shown qualitatively and quantitatively that our method outperforms other unsupervised decomposition methods and other animation parameterization approaches in the above use cases.
Conference Paper
This paper introduces a method for producing high quality hand motion using a small number of markers. The proposed "handover" animation technique constructs joint angle trajectories with the help of a reference database. Utilizing principal component analysis (PCA) applied to the database, the system automatically determines the sparse marker set to record. Further, to produce hand animation, PCA is used along with a locally weighted regression (LWR) model to reconstruct joint angles. The resulting animation is a full-resolution hand which reflects the original motion without the need for capturing a full marker set. Comparing the technique to other methods reveals improvement over the state of the art in terms of the marker set selection. In addition, the results highlight the ability to generalize the motion synthesized, both by extending the use of a single reference database to new motions, and from distinct reference datasets, over a variety of freehand motions.
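A much-simplified stand-in for PCA-driven marker selection is sketched below: markers whose coordinates carry the most variance in the leading principal components are retained. The paper's actual selection criterion and the locally weighted regression reconstruction are not reproduced here.

```python
# Simplified sketch of choosing a sparse marker subset from full-marker data via
# PCA loadings. Illustrative only; not the paper's selection criterion.
import numpy as np

def select_markers(frames, n_markers, n_components=8):
    """frames: (T, M, 3) full marker trajectories; returns indices of kept markers."""
    T, M, _ = frames.shape
    X = frames.reshape(T, M * 3)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    loadings = Vt[:n_components].reshape(n_components, M, 3)
    score = np.linalg.norm(loadings, axis=(0, 2))   # importance per marker
    return np.argsort(score)[::-1][:n_markers]

# Example: keep 6 markers out of 20, from 200 random frames
kept = select_markers(np.random.randn(200, 20, 3), n_markers=6)
```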
We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. The experiments, conducted on the Multi-PIE, XM2VTS and LFPW databases, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code and the pre-trained models for research purposes.
Article
A long-standing problem in marker-based facial motion capture is determining the optimal mocap marker layout. Despite its wide range of potential applications, this problem has not yet been systematically explored. This paper describes an approach to compute optimized marker layouts for facial motion acquisition as optimization of characteristic control points from a set of high-resolution, ground-truth facial mesh sequences. Specifically, the thin-shell linear deformation model is imposed onto the example pose reconstruction process via optional hard constraints such as symmetry and multiresolution constraints. Through our experiments and comparisons, we validate the effectiveness, robustness, and accuracy of our approach. Besides guiding minimal yet effective placement of facial mocap markers, we also describe and demonstrate two selected applications: marker-based facial mesh skinning and multiresolution facial performance capture.
Article
This paper investigates "Schelling points" on 3D meshes, feature points selected by people in a pure coordination game due to their salience. To collect data for this investigation, we designed an online experiment that asked people to select points on 3D surfaces that they expect will be selected by other people. We then analyzed properties of the selected points, finding that: 1) Schelling point sets are usually highly symmetric, and 2) local curvature properties (e.g., Gauss curvature) are most helpful for identifying obvious Schelling points (tips of protrusions), but 3) global properties (e.g., segment centeredness, proximity to a symmetry axis, etc.) are required to explain more subtle features. Based on these observations, we use regression analysis to combine multiple properties into an analytical model that predicts where Schelling points are likely to be on new meshes. We find that this model benefits from a variety of surface properties, particularly when training data comes from examples in the same object class.
Article
Asymmetries of the smiling facial movement were more frequent in deliberate imitations than in spontaneous emotional expressions. When asymmetries did occur they were usually stronger on the left side of the face if the smile was deliberate. Asymmetrical emotional expressions, however, were about equally divided between those stronger on the left side of the face and those stronger on the right. Similar findings were obtained for the actions involved in negative emotions, but a small database made these results tentative.
Chapter
We demonstrate a novel method of interpreting images using an Active Appearance Model (AAM). An AAM contains a statistical model of the shape and grey-level appearance of the object of interest which can generalise to almost any valid example. During a training phase we learn the relationship between model parameter displacements and the residual errors induced between a training image and a synthesised model example. To match to an image we measure the current residuals and use the model to predict changes to the current parameters, leading to a better fit. A good overall match is obtained in a few iterations, even from poor starting estimates. We describe the technique in detail and give results of quantitative performance tests. We anticipate that the AAM algorithm will be an important method for locating deformable objects in many applications.
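The core AAM search loop can be sketched as follows. It assumes a hypothetical render(params) model-synthesis function and an image_sample(params) warp of the input image, and learns the residual-to-update matrix by linear regression from synthetic parameter perturbations; this is a simplified illustration of the technique described above.

```python
# Minimal sketch of the AAM search idea: learn a linear map R from texture
# residuals to parameter updates using known training perturbations, then
# iterate "measure residual -> predict correction" at fit time. The functions
# render(params) and image_sample(params) are assumed placeholders.
import numpy as np

def learn_update_matrix(residuals, displacements):
    """residuals: (N, P) residual textures; displacements: (N, K) known offsets."""
    R, _, _, _ = np.linalg.lstsq(residuals, displacements, rcond=None)
    return R                                              # (P, K)

def fit(image_sample, render, params, R, n_iters=10):
    for _ in range(n_iters):
        residual = image_sample(params) - render(params)  # current model error
        params = params - residual @ R                    # predicted correction
    return params
```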
We present a novel approach to localizing parts in images of human faces. The approach combines the output of local detectors with a non-parametric set of global models for the part locations based on over one thousand hand-labeled exemplar images. By assuming that the global models generate the part locations as hidden variables, we derive a Bayesian objective function. This function is optimized using a consensus of models for these hidden variables. The resulting localizer handles a much wider range of expression, pose, lighting and occlusion than prior ones. We show excellent performance on a new dataset gathered from the internet and show that our localizer achieves state-of-the-art performance on the less challenging BioID dataset.
Conference Paper
Facial motion retargeting approaches often transfer expressions by establishing correspondences between shared units of motion, such as action units, or spatial correspondences of landmarks between the source actor and target character faces. When the actor and character are structurally dissimilar, shared units of motion or spatial landmarks may not exist, and subtle styles of performance may differ. We present a method to deconstruct the content of an actor's facial expression into three parameter-parallel layers using a composition function, transfer the content to equivalent parameter-parallel layers for the character, and reconstruct the character's expression using the same composition function. Our algorithm uses the same parameter-parallel layered model of facial expression for both the actor and character, separating the content of facial expressions into emotion, speech, and eye-blink layers. Facial motion in each layer is embedded in simplicial bases, each of which encodes semantically significant configurations of the face. We show the transfer of facial motion capture and video-based tracking of the eyes and mouth of an actor to a number of faces with dissimilar facial structure and expressive disposition.
Article
Linear models, particularly those based on principal component analysis (PCA), have been used successfully on a broad range of human face-related applications. Although PCA models achieve high compression, they have not been widely used for animation in a production environment because their bases lack a semantic interpretation. Their parameters are not an intuitive set for animators to work with. In this paper we present a linear face modelling approach that generalises to unseen data better than the traditional holistic approach while also allowing click-and-drag interaction for animation. Our model is composed of a collection of PCA sub-models that are independently trained but share boundaries. Boundary consistency and user-given constraints are enforced in a soft least mean squares sense to give flexibility to the model while maintaining coherence. Our results show that the region-based model generalises better than its holistic counterpart when describing previously unseen motion capture data from multiple subjects. The decomposition of the face into several regions, which we determine automatically from training data, gives the user localised manipulation control. This feature allows the model to be used for face posing and animation in an intuitive manner.
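One possible way to combine independently trained per-region PCA sub-models with soft boundary consistency is a joint least-squares solve over all region coefficients, sketched below; the region split, bases, and boundary weighting here are placeholder assumptions rather than the paper's exact formulation.

```python
# Sketch: reconstruct per-region PCA coefficients jointly so that shared
# boundary coordinates (softly) agree. Region bases/targets are placeholders.
import numpy as np

def solve_region_coeffs(bases, means, targets, boundary_pairs, w=10.0):
    """bases[r]: (d_r, k_r); means/targets[r]: (d_r,);
    boundary_pairs: list of (r1, i1, r2, i2) coordinate indices that should agree."""
    offsets = np.cumsum([0] + [B.shape[1] for B in bases])   # coefficient layout
    rows, rhs = [], []
    for r, B in enumerate(bases):                            # per-region data terms
        row = np.zeros((B.shape[0], offsets[-1]))
        row[:, offsets[r]:offsets[r + 1]] = B
        rows.append(row)
        rhs.append(targets[r] - means[r])
    for (r1, i1, r2, i2) in boundary_pairs:                  # soft boundary consistency
        row = np.zeros(offsets[-1])
        row[offsets[r1]:offsets[r1 + 1]] = w * bases[r1][i1]
        row[offsets[r2]:offsets[r2 + 1]] -= w * bases[r2][i2]
        rows.append(row[None, :])
        rhs.append(np.array([w * (means[r2][i2] - means[r1][i1])]))
    A, b = np.vstack(rows), np.concatenate(rhs)
    coeffs, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    return [coeffs[offsets[r]:offsets[r + 1]] for r in range(len(bases))]
```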
Book
Physically Based Rendering, 2nd Edition describes both the mathematical theory behind a modern photorealistic rendering system and its practical implementation. A method known as 'literate programming' combines human-readable documentation and source code into a single reference that is specifically designed to aid comprehension. The result is a stunning achievement in graphics education. Through the ideas and software in this book, you will learn to design and employ a full-featured rendering system for creating stunning imagery.