Copyright © 2011 by the Association for Computing Machinery, Inc.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org.
SCA 2011, Vancouver, British Columbia, Canada, August 5–7, 2011.
© 2011 ACM 978-1-4503-0923-3/11/0008 $10.00
Eurographics/ ACM SIGGRAPH Symposium on Computer Animation (2011)
A. Bargteil and M. van de Panne (Editors)
Facial Cartography: Interactive Scan Correspondence

Cyrus A. Wilson¹, Oleg Alexander¹, Borom Tunwattanapong¹, Pieter Peers¹,², Abhijeet Ghosh¹, Jay Busch¹, Arno Hartholt¹, Paul Debevec¹

¹ USC Institute for Creative Technologies    ² The College of William & Mary
Abstract
We present a semi-automatic technique for computing surface correspondences between 3D facial scans in differ-
ent expressions, such that scan data can be mapped into a common domain for facial animation. The technique can
accurately correspond high-resolution scans of widely differing expressions – without requiring intermediate pose
sequences – such that they can be used, together with reflectance maps, to create high-quality blendshape-based
facial animation. We optimize correspondences through a combination of Image, Shape, and Internal forces, as
well as Directable forces to allow a user to interactively guide and refine the solution. Key to our method is a novel
representation, called an Active Visage, that balances the advantages of both deformable templates and corre-
spondence computation in a 2D canonical domain. We show that our semi-automatic technique achieves more
robust results than automated correspondence alone, and is more precise than is practical with unaided manual
input.
1. Introduction
Just as portraiture is one of the most challenging but im-
portant aspects of painting, rendering human faces is one
of the most challenging but important aspects of computer
graphics. Progress in this area has accelerated greatly, with
recent results in research, movies, and video games building
the first bridges across the Uncanny Valley towards believ-
ably realistic digitally rendered faces. However, creating a
photorealistic digital actor remains complicated and time-
consuming [ARL*10], which prevents their widespread use.
Recent 3D scanning techniques provide useful data for
creating digital faces based on real people, able to be ren-
dered with any viewpoint and lighting. Since the appearance
and deformation of emotive faces is complex, many digital
characters are built from numerous scans of an actor making
different expressions called blendshapes, interpolating be-
tween them to create facial animation. Blending between ex-
pressions, however, requires knowing how the surface points
in each scan correspond to each other. Determining such cor-
respondences can be made easier by placing markers on the
actor’s face, but this is time consuming and mars the ap-
pearance of the actor; moreover, even hundreds of markers
yield little information about dynamic skin wrinkle behav-
ior. A dynamic facial scanning system can also make the
task easier, since the transition from one expression to an-
other is recorded as a sequence of scans with little motion
between them. However, dynamic scanning systems are typ-
ically very data intensive and provide far lower resolution
geometry and reflectance than systems which record static
facial expressions. As a result, building the highest-quality
digital characters requires determining accurate correspon-
dences between scans of significantly differing expressions
without the aid of facial markers. And typically, this is one
of the most difficult stages of the character creation process.
An important step in creating an animatable character is
to create a facial animation rig, often built on top of an ani-
mation mesh of moderate resolution augmented by high res-
olution detail textures (e.g., albedo, normals, etc.). Comput-
ing correspondences on the high resolution scan meshes, and
subsequently downsampling to the animation mesh produces
suboptimal results. To obtain the highest quality blend-
shapes, both the animation mesh and the detail textures need
to be optimally corresponded. Downsampling a high reso-
lution mesh does not guarantee the latter because important
visual details in the textures do not necessarily align with
vertices.
While exact 1 : 1 correspondences exist between the phys-
ical expressions, they might be difficult to uniquely identify
between facial scans (e.g., appearance and disappearance of
wrinkles). In such a case it is often difficult to find consis-
tent 1 : 1 correspondences. Automated methods often rely
on heuristics (e.g., smoothness) to handle such ambiguous
cases. A more robust solution is to add case-specific user-
defined constraints to these automatic methods. However,
such an approach often results in a trial-and-error procedure,
where the focus lies on avoiding undesired behavior of the
automatic algorithms by tweaking and adding non-intuitive
constraints. Instead, it would be better to have the user par-
ticipate in the correspondence estimation process in a con-
structive manner by directing the computations rather than
constraining and/or correcting them.
We propose a novel correspondence method, which we
call Facial Cartography, for computing correspondences be-
tween high resolution scans of an actor making different
expressions, to produce mappings between a user-specified
neutral mesh and all other expressions. A key difference
between the proposed method and prior work is that we
allow the user to participate and provide direction during
the correspondence computations. Additionally, we com-
pute correspondences on the final animation mesh and de-
tail textures. To ensure optimal detail texture alignment
and interactive user participation, we leverage the GPU, and
adopt an analysis-by-synthesis approach. A key component
in our system is an Active Visage: a proxy for represent-
ing corresponded blendshapes that can be visualized from
one or more viewpoints to evaluate the difference (i.e., er-
ror) with ground-truth visualizations of the surface proper-
ties of the target expression. While conceptually similar to
a deformable template, a key difference is that a deformable
template deforms a 3D mesh, while an active visage is con-
strained to the manifold of the non-neutral expression geom-
etry, and hence only deforms the correspondences and not
the 3D shape.
We have developed a modular optimization framework in
which four different forces act on these active visages to
compute accurate correspondences. These four forces are:
1. Image Forces: favoring correspondences providing the
best alignment of fine-scale features in the detail maps;
2. Shape Forces: providing soft constraints on the 3D ver-
tex positions of the optimization estimate;
3. Internal Forces: avoiding implausible deformations by
promoting an as-rigid-as-possible deformation; and
4. Directable Forces: in conjunction with a GPU implemen-
tation of the other forces, these enable user participation
in the optimization process and direct the optimization.
2. Related Work
Computing correspondences between two surfaces is a clas-
sical problem in computer graphics and computer vision, and
a large body of prior work investigates variants of this chal-
lenging problem. The ability to establish accurate surface
correspondences enables a variety of applications such as the
animation and morphing of shapes, shape recognition, and
compression. Describing all prior work in correspondence
computation is beyond the scope of this paper; we thus fo-
cus on the most relevant work to compute correspondences
between a scan of a neutral reference pose and scans of a
small set of extreme poses of a subject.
3D Methods. Allen et al. [ACP02] construct a kinematic
skeleton model for articulated body deformations from range
scan data and markers. Chang and Zwicker [CZ09] propose
a markerless registration method that employs a reduced de-
formable model, and they decouple deformation from sur-
face representation by formulating weight functions on a
regular grid. Both methods are geared towards articulated
objects, and are not well suited for modeling facial anima-
tions.
Brown and Rusinkiewicz [BR04, BR07] and Amberg et
al. [ARV07] develop a non-rigid variant of the well known
Iterative Closest Points (ICP) algorithm [BM92] to align
rigid objects where calibration errors introduce small non-
rigid deformations in the scans. A disadvantage of these ap-
proaches is that they can easily converge to a suboptimal
solution due to their greedy nature.
Li et al. [LSP08] form correspondences by registering an
embedded deformation graph [SP04] on the source surface
to match the target surface. Key to their method is an opti-
mization that robustly handles missing data, favors natural
deformation, and maximizes rigidity and consistency. How-
ever, this method is limited to the resolution of the defor-
mation graph, and cannot handle very small non-rigid de-
formations. [LAGP09] alleviates this for temporally dense
input sequences by computing a displacement projection to
the nearest surface point. However, this projection does not
rely on surface features, and thus can introduce small errors.
2D Methods. Another strategy for establishing corre-
spondences between two non-rigidly deformed shapes is
to embed them in a 2D canonical domain, reducing the
correspondence problem to a simpler 2D image match-
ing problem. Anguelov et al. [AKS*05] introduce corre-
lated correspondences to compute an embedding of each
shape that preserves geodesics and thus minimizes differ-
ences due to deformations. Anguelov et al. employ this al-
gorithm to create SCAPE [ASK*05], a data-driven model
that spans both deformation as well as shape of a hu-
man body. Wang et al. [WWJ*07] investigate several differ-
ent types of quasi-conformal mappings with regard to 3D
surface matching, and conclude that least squares confor-
mal mapping is the best choice for this purpose. Zeng et
al. [ZZW*08] observe that most conformal mapping meth-
ods are not suited to deal with captured data due to inconsis-
tent boundaries, complex topology, and distortions. To ad-
dress these issues, they compute correspondences from mul-
tiple boundary-constraint conformal mappings. Recently,
Lipman and Funkhouser [LF09] proposed Möbius voting,
an algorithm for finding point correspondences between two
near-isometric surfaces in polynomial-time. It selects the
Möbius transform that minimizes deformation error of the
two surfaces in a canonical domain. Besides the sensitivity
to scanning noise and inconsistent boundaries, these embed-
ding based strategies also assume that the deformed meshes
are isometric, which is not truly the case for most surface
deformations. Consequently, the resulting correspondences
may be affected. To handle non-isometric surfaces robustly,
Zeng et al. [ZWW*10] apply a higher-order graph match-
ing scheme on the combined embedding space determined
by the Möbius transform and the Gaussian map of the sur-
face. While mathematically elegant, the exact mapping to
these embedded spaces is non-intuitive, making it difficult
for users to manipulate and correct the computed correspon-
dences.
Litke et al. [LDRS05] propose a variational method that
matches surfaces in a 2D canonical domain. Similar to the
proposed technique, their matching energy takes into ac-
count various cues such as curvature and texture. Further-
more, it allows user-control in the form of feature lines (as
opposed to point-wise constraints). While their system can
produce impressive results, it is limited to surfaces that are
homeomorphic to a disk. Eyes and mouth need to be manually
segmented out.
Blanz and Vetter [BV99] create a morphable face model
from a database of static laser scans of several individ-
uals in different expressions. To correspond the different
expressions, a combination of optical flow and smoothing
is employed, exploiting the native cylindrical output pa-
rameterization of the laser scanner. In subsequent work, a
similar approach was used to create a morphable mouth
model [BBVP03]. Huang et al. [HCTW11] employ a sim-
ilar method for aligning small scale details, while large scale
deformations are registered using a marker-based approach.
A general problem with optical flow based methods is that
they succeed in some cases but fail entirely on seemingly
similar cases. User input is often employed to constrain the
optical flow algorithm to produce a suitable solution. How-
ever, it is not always intuitive how a particular optical flow
optimization trajectory will respond to specific constraints,
leading to a trial-and-error procedure.
Analysis-by-synthesis. A key component in our system is
the GPU-accelerated real-time visualization and feedback
system. However, we are not the first to include graphics
hardware in a registration pipeline. Pighin et al. [PSS99] and
Pons et al. [PKF07] iteratively refine the correspondences by
evaluating the error on the deformed target surface and the
predicted deformed source surface. Blanz and Vetter [BV99]
also employ an analysis-by-synthesis loop to match a mor-
phable face model to one or more photographs of a subject.
However, none of these methods provide a way for the user
to correct erroneous correspondence estimates.
User-interaction. Finally, all prior methods have aimed at
making the registration process as automated as possible.
User interaction has only been employed to either initial-
ize the computations [DM96, ACP02, LDRS05, ZZW*08]
or to constrain the solution space to avoid local min-
ima [BBVP03, HCTW11]. However, none of the previous
methods actually allows the user to direct the correspon-
dence computations as part of the main optimization loop.
3. Algorithm
Input. Our correspondence algorithm takes as input a set of
scanned meshes and high resolution detail maps of a subject,
each exhibiting a different expression. One of these expres-
sions, the most neutral one, is selected as a reference pose,
and an animation mesh is created for this expression, which
can serve as the basis for an animation rig. This animation
mesh can either be created by an artist based on the scanned
mesh or can be a direct copy of the acquired neutral mesh.
Additionally, the animation mesh is augmented by high res-
olution detail textures that can contain diffuse and specular
albedo information, surface normal information (to compen-
sate for the differences in fine geometric details between the
animation mesh and the scanned mesh), etc. This animation
mesh is subsequently deformed to roughly match the target
mesh. This can either be done by an artist, or by using any
suitable automatic method discussed in Section 2. This de-
formation does not need to be exact; it is only used to boot-
strap the optimization.
Goal. The goal of Facial Cartography is to create dense
mappings between the user selected neutral mesh and all
other expressions. A distinctive feature of Facial Cartogra-
phy is that it was developed with the following two points
in mind. First, to provide an intuitive and effective user ex-
perience, Facial Cartography allows the user to interactively
participate during the computations of the correspondences.
This allows the user to direct the computations away from lo-
cal minima and even bias the computations to a subjectively
preferred solution. Second, the need to correspond scans of
different expressions is to aid in the creation of animation
rigs. Therefore, to obtain high quality animation rigs, com-
puted correspondences need to be optimal with respect to
the actual animation mesh, the detail textures and their cor-
responding UV mappings.
Optimization. We frame the computation of correspon-
dences between two surfaces as an energy-minimization op-
timization in which the user can participate via a live, in-
teractive simulation while the optimization is in progress.
To achieve this we consider the objective function as an en-
ergy potential function U, indicating how well the two sur-
faces are in correspondence, associated with a conservative
force f, where f = −∇U. This objective function can be min-
imized using a gradient-descent optimization, displaced by f
at every iteration. We identify the following four forces that
play a role in the correspondence optimization:
Image Forces: ensure proper registration of the features
in the high resolution detail maps. Image forces are com-
puted via an analysis-by-synthesis approach (Section 5).
Shape Forces: allow the user to provide soft constraints
on the 3D vertex positions of the animation mesh (Sec-
tion 6).
Internal Forces: constrain the correspondence solution
such that undesirable or impossible deformations (e.g.,
collapsing of triangles) are avoided by enforcing an as-
rigid-as-possible deformation (Section 7).
Directable Forces: allow the user to direct the optimiza-
tion (Section 8).
The key distinctive feature of Facial Cartography is not
the general optimization above, but the combination of the
forces and the domain on which we apply the optimization.
We define the optimization domain directly onto the target
manifold (i.e., the non-neutral expression mesh). To facili-
tate the correspondence computations over this domain, tak-
ing into account the animation mesh and detail textures, and
supporting the analysis-by-synthesis approach for the image
forces, we create a proxy called the Active Visage that repre-
sents one of the possible deformations of the neutral expres-
sion constrained to the target surface (Section 4).
At every iteration in the optimization, we remap (and re-
sample if necessary) the four forces to the vertices in the 2D
optimization domain that comprise the active visage. The fi-
nal resulting force acting on each vertex is then a weighted
sum of each of the remapped forces. The user can enable and
disable different forces during the optimization, and mod-
ulate their weights: typically we use an image weight be-
tween 4 and 8, shape weight = 1, internal weight = 1, and
directable weight = 1.
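As an illustrative sketch of this weighted-force update (assuming NumPy arrays and hypothetical names; the step size and data layout are assumptions, not the authors' GPU implementation), one gradient-descent iteration could look like:

```python
import numpy as np

def optimization_step(uv, forces, weights, step_size=0.1):
    """One gradient-descent iteration of the correspondence optimization.

    uv      : (V, 2) per-vertex correspondence estimates in the 2D
              optimization domain (target-mesh texture coordinates).
    forces  : dict mapping force name -> (V, 2) array, already remapped
              into the optimization domain.
    weights : dict mapping force name -> scalar weight (e.g. image: 4-8,
              shape: 1, internal: 1, directable: 1).
    """
    # Net force is a weighted sum of the enabled forces (f = -grad U).
    net = np.zeros_like(uv)
    for name, f in forces.items():
        net += weights.get(name, 0.0) * f
    # Displace the correspondence estimates along the net force.
    return uv + step_size * net

# Hypothetical usage with 4 vertices and dummy force fields:
uv = np.zeros((4, 2))
forces = {"image": np.ones((4, 2)), "shape": np.zeros((4, 2)),
          "internal": np.zeros((4, 2)), "directable": np.zeros((4, 2))}
weights = {"image": 6.0, "shape": 1.0, "internal": 1.0, "directable": 1.0}
uv = optimization_step(uv, forces, weights)
```

Disabling a force during interaction amounts to setting its weight to zero for some iterations.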
4. Active Visage
As noted in the previous section (Section 3), we define the
optimization domain directly onto the target manifold, and
optimize correspondences only, not shape. Intuitively, one
can visualize this as gliding and stretching a rubber sheet
over the target surface (i.e., maintaining contact with the tar-
get surface) until specific constraints are fulfilled (specified
by forces in our optimization framework).
While it makes sense to define the optimization domain
directly on the target manifold, it is not always the most
straightforward parameterization to compute the forces that
drive the optimization. To facilitate the force computations,
we create a proxy called the Active Visage, where every point
in the optimization domain has an associated 3D position
(constrained to the target manifold) and corresponding sur-
face properties borrowed from the neutral expression. An ac-
tive visage represents one of the possible deformations of
the neutral expression onto the target surface. As such, an
active visage supports all operations that otherwise would
have been performed on the neutral and/or target expression
(e.g., visualization).
Implementation. Practically, we implement an active vis-
age as follows. An active visage is a mesh that has the
same connectivity as the animation mesh. Every vertex has
a texture coordinate, which stays constant during optimiza-
tion, that maps to the high resolution detail textures (ob-
tained from the scanned neutral expression). Furthermore,
every vertex has a correspondence estimate. This estimate
is defined as a texture coordinate in the target expression
mesh’s native texture parameterization or any other suitable
2D mapping (e.g., conformal mappings, or in case of 2.5D
scans, this parameterization can be in the captured image
space). The exact form of the 2D map is less important as
long as we can map between surface coordinates and tex-
ture coordinates. For every correspondence estimate, we also
store its 3D coordinate on the target manifold and update it
at every iteration of the optimization using a reverse lookup.
The active visage can be easily rendered, mapped with any
channel of information from the neutral expression and from
any viewpoint.
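A minimal data-structure sketch of this representation (field names and the reverse-lookup helper are hypothetical, not the authors' code) could be:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ActiveVisage:
    """Illustrative sketch of the Active Visage proxy.

    faces      : (F, 3) triangle indices, identical to the animation mesh.
    neutral_uv : (V, 2) texture coordinates into the neutral detail maps;
                 these stay constant during optimization.
    target_uv  : (V, 2) per-vertex correspondence estimates, expressed in
                 the target expression's 2D parameterization; these are
                 the quantities being optimized.
    """
    faces: np.ndarray
    neutral_uv: np.ndarray
    target_uv: np.ndarray

    def positions_on_target(self, target_lookup):
        """Reverse lookup: map each 2D correspondence estimate back to a
        3D point constrained to the target manifold. `target_lookup` is a
        caller-supplied function from (u, v) to a 3D point on the scan."""
        return np.array([target_lookup(u, v) for u, v in self.target_uv])
```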
Discussion. Conceptually, the active visage representation
falls in between computing correspondences using a de-
formable template mesh and computing correspondences in
an intermediate 2D domain.
The key difference between a deformable template and
an active visage is that a deformable template deforms a
3D mesh, while an active visage is constrained to the mani-
fold of the non-neutral expression geometry, and hence only
deforms the correspondences. Solving the correspondence
problem by finding a suitable deformation of an input or
template mesh such that it matches a given target mesh sub-
ject to method-specific constraints (e.g., as-rigid-as-possible
deformation, map constraints, etc.) spends significant re-
sources in optimizing the shape (a 3D problem), and not the
correspondences (a 2D problem).
At a high level, computing correspondences using an ac-
tive visage is similar to computing correspondences in an
intermediate 2D domain (e.g., conformal mapping): in both
cases the optimization domain is 2D. However, a distinct dif-
ference is that an active visage represents a 2D manifold in
3D space defined by the final animation mesh. Often, the
final animation mesh is an artist-tuned mesh where edges
are aligned with specific features (e.g., ring-like structure
around eyes and mouth), hence alignment of these features
is important. Many intermediate 2D parameterizations are
defined by optimizing vertex positions in a 2D domain to
satisfy some preset constraint. Consequently, straight edges
in 3D on the target manifold do not necessarily correspond
to straight edges in the 2D parameterization, significantly
complicating the alignment of edge features especially when
the mesh resolution of the animation mesh and the scan dif-
fer. A similar problem occurs for the high resolution detail
maps. Such detail textures are commonly employed to com-
pensate for the lack of detail in the artist designed anima-
tion mesh. Since these detail maps greatly influence the ap-
pearance (e.g., normal maps) of the visualizations of the an-
imated mesh, optimally corresponding the features in these
maps is of utmost importance. Active visages provide a nat-
ural means of dealing with these issues. Although the opti-
mization domain is 2D, edge and texture projections do not
introduce discrepancies because the active visage is defined
on a 3D manifold that corresponds to the target manifold. More-
over, the active visage, in conjunction with the analysis-by-
synthesis approach, allows Facial Cartography to take into
account, during the correspondence computations, any “errors” in
shape due to the lower resolution of the animation meshes.
By employing an analysis-by-synthesis approach, we ensure
that the features in the detail textures are visually registered
consistently for the whole animation rig.
Figure 1: Forces acting on an active visage (image, shape, internal, and directable forces), shown in their
native formulations, and after remapping (“Projections”).
Arrows have been scaled up for visibility.
While a template or an intermediate 2D projection could
arguably be extended to correctly handle all these condi-
tions, we believe that the “active visage” makes this more
convenient.
5. Image Forces
Image forces provide a mechanism to incorporate non-
geometric cues, such as texture information, which is often
available at a much finer granularity than
the mesh resolution. Resampling this texture information on
the mesh can result in a loss of important high frequency
information. Instead, it is better to employ an analysis-by-
synthesis approach, where image forces are computed on 2D
visualizations of this high resolution data.
Input. The computation of an image force takes as input
two 2D visualizations: a visualization of the active visage
(i.e., the animation mesh deformed according to the current
correspondence estimate) and a visualization of the target
mesh. Both visualizations are from the same viewpoint, and
display the same set of surface properties.
Force Computation. The main goal of the image force is to
express how the active visage should be warped such that its
visualization better matches the corresponding visualization
of the target expression. There are many possible methods
for computing such image forces (e.g., optical flow). How-
ever, to maintain interactive rates suitable for user participa-
tion, we harness the computing power of the GPU, and opt
for a local cross-correlation window-matching approach.
A typical local cross-correlation window-matching ap-
proach works as follows. First, a number of feature loca-
tions are selected in the active visage visualization A. Then,
a mean-subtracted discrete cross-correlation is computed be-
tween a given window centered around each feature location
in A and in the target visualization B:
$$
(A_s \star B_s)(p, q) \;=\; \sum_{\tau=-\eta}^{\eta} \sum_{\upsilon=-\eta}^{\eta}
\Big[ \big( A(s_u + \tau,\, s_v + \upsilon) - \langle A_s \rangle \big) \cdot
\big( B(s_u + \tau + p,\, s_v + \upsilon + q) - \langle A_s \rangle \big) \Big],
\tag{1}
$$

where $s = [s_u, s_v]^T$ is the window center (i.e., the feature point
location), $\langle A_s \rangle$ is the mean value of $A$ within the window, and
the window size is $(2\eta + 1) \times (2\eta + 1)$. We found that an $\eta$
of either 7 or 15 gave the best quality versus computational
cost ratio. If the cross-correlation $A_s \star B_s$ yields a significant
peak in the window, then we compute the centroid $[p, q]^T$
of the peak, which indicates the displacement of the feature
between the visualizations.
We can optimize this computation by observing that both
the texture coordinates and the content of the detail maps
of the active visage do not change during the optimization.
Hence, we can precompute the feature locations in texture
space, and at each iteration project the precomputed feature
locations into the visualization viewpoint, which can be done
very efficiently.
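As an illustrative CPU sketch of this window-matching step (NumPy rather than the paper's GPU implementation; the search radius, significance test, and centroid weighting are assumptions), the displacement for one feature could be computed as:

```python
import numpy as np

def image_force_at_feature(A, B, s, eta=7, search=10):
    """Mean-subtracted window cross-correlation around one feature (sketch).

    A, B   : 2D float arrays (visualizations of the active visage and of
             the target expression, same viewpoint and surface property).
    s      : (row, col) feature location in A; the window and search
             region are assumed to lie inside both images.
    eta    : half window size; window is (2*eta+1) x (2*eta+1).
    search : maximum displacement (p, q) examined in each direction.
    Returns the feature displacement, or None when no peak is found.
    """
    r, c = s
    winA = A[r - eta:r + eta + 1, c - eta:c + eta + 1]
    meanA = winA.mean()
    corr = np.zeros((2 * search + 1, 2 * search + 1))
    for p in range(-search, search + 1):
        for q in range(-search, search + 1):
            winB = B[r + p - eta:r + p + eta + 1, c + q - eta:c + q + eta + 1]
            # Eq. (1): correlate the mean-subtracted windows.
            corr[p + search, q + search] = np.sum((winA - meanA) * (winB - meanA))
    if corr.max() <= 0:
        return None  # no significant peak (illustrative test)
    # Weighted centroid of the positive part of the correlation surface
    # approximates the peak location, i.e. the feature displacement.
    w = np.clip(corr, 0.0, None)
    ps, qs = np.meshgrid(np.arange(-search, search + 1),
                         np.arange(-search, search + 1), indexing="ij")
    return np.array([np.sum(w * ps) / np.sum(w), np.sum(w * qs) / np.sum(w)])
```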
Surface Properties and Feature Selection. The exact
choice of surface properties should provide a maximum
number of distinctive cues, and thus good image forces, to
drive the correspondence optimization. In our implementa-
tion we employ a high-pass filtered version of the skin tex-
ture, and use Shi and Tomasi’s method to detect good fea-
tures [ST94]. This ensures that significant changes in skin-
texture are well aligned. These major changes in skin-texture
are well located in space (i.e., a high-frequency peak) and
correspond to the same surface location. However, other sur-
face properties, such as Gauss curvature, can also provide
additional cues in conjunction with the high-pass filtered
skin-texture.
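A small sketch of this feature-selection step, assuming OpenCV's Shi-Tomasi detector and a Gaussian-difference high-pass filter (the thresholds and filter radius are assumptions):

```python
import cv2
import numpy as np

def detect_feature_locations(texture_gray, max_features=2000, sigma=5.0):
    """Select feature locations on a high-pass filtered skin texture (sketch).

    texture_gray : single-channel float32 image (e.g. the neutral diffuse
                   texture converted to grayscale).
    Returns an (N, 2) array of (x, y) feature locations in texture space.
    """
    # High-pass filter: subtract a blurred copy to keep pore-scale detail.
    low = cv2.GaussianBlur(texture_gray, (0, 0), sigma)
    highpass = (texture_gray - low).astype(np.float32)
    # Shi-Tomasi "good features to track" on the high-pass image [ST94].
    corners = cv2.goodFeaturesToTrack(highpass,
                                      maxCorners=max_features,
                                      qualityLevel=0.01,
                                      minDistance=8)
    return corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))
```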
Visualization Viewpoints. While the vertices of the anima-
tion mesh are constrained to the target mesh, the two meshes
do not necessarily form the same manifold in 3D space,
due to the differences in resolution between the animation
mesh and the target mesh. In order to minimize visual arti-
facts because of differences in shape, we employ an analysis-
by-synthesis (i.e., inverse rendering) approach for comput-
ing the image forces. In particular, for good coverage of the
face, we compute image forces for five different viewpoints:
frontal, upper left, upper right, lower left, and lower right.
Aggregate Image Force. Finally, we need to resample,
remap and combine all computed image forces to the op-
timization domain (i.e., the target manifold):
Resample: the locations of the sparse features at which
the image forces are computed are most likely not the
same as the locations of the mesh vertices (projected into
the virtual viewpoint). We therefore first need to resample
the computed image forces to each vertex location. For
this we employ a Gaussian weighted radial basis function
based on the distance between the sparse feature location
(on the mesh) and the vertex location.
Remapping: Next we need to remap each of the com-
puted (resampled) forces from the visualization domain
to the optimization domain. Care has to be taken to cor-
rectly account for the change in differential measure by
multiplying the resampled image forces by the Jacobian
of this transform (i.e., from image pixels to a local tan-
gential field around the target vertex).
Combine: Finally, we add all image forces from the dif-
ferent virtual viewpoints into a single net image force.
Figure 1 (left) shows the visualizations of an active visage
and corresponding target expression for the five different
viewpoints, the image forces in the visualization domain and
the corresponding remappings onto the animation mesh in
the optimization domain.
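The resampling step above could be sketched as follows (a Gaussian-weighted radial basis function as described; the radius and the normalization of the weights are assumptions):

```python
import numpy as np

def resample_forces_to_vertices(feat_pos, feat_force, vert_pos, sigma=0.01):
    """Resample sparse per-feature image forces onto mesh vertices (sketch).

    feat_pos   : (F, 3) feature locations on the mesh surface.
    feat_force : (F, 2) forces computed at those features.
    vert_pos   : (V, 3) vertex positions of the active visage.
    sigma      : Gaussian radius of influence (assumed value).
    Returns a (V, 2) array of resampled forces, one per vertex.
    """
    # Squared distances between every vertex and every feature location.
    d2 = np.sum((vert_pos[:, None, :] - feat_pos[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))               # Gaussian RBF weights
    w_sum = np.maximum(w.sum(axis=1, keepdims=True), 1e-12)
    return (w @ feat_force) / w_sum                    # normalized weighted average
```

The remapping step then multiplies each resampled force by the per-vertex Jacobian from image pixels to the local tangential field, and the per-viewpoint results are summed into the net image force.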
6. Shape Forces
Shape forces allow the user to provide soft constraints on the
3D vertex positions of the active visage. This is an ancillary
soft constraint as opposed to the primary hard constraint that
the vertices of the active visage lie on the target manifold.
Input. The computation of a shape force takes as input a
deformed 3D mesh that has the same connectivity as the an-
imation mesh (and thus the active visage). While the 3D ver-
tices of the active visage are constrained to the target man-
ifold, we do not require the same from the input deformed
mesh, because it is often more convenient for the user to
specify constraints on the vertex positions in the form of a
deformed animation mesh or by deforming the 3D shape of
the (current) active visage.
Force Computation. Because we require the input de-
formed mesh to have the same connectivity, computation of
this force is trivial. The shape force on a vertex of the active
visage is proportional to the distance to the corresponding
vertex in the input deformed mesh. The resulting 3D force
is subsequently remapped onto the 2D optimization domain.
For this we need to multiply the resulting 3D shape force
vector with the Jacobian describing the difference in differ-
ential measure from 3D to the local tangent plane around the
vertex on which the shape force acts. Figure 1, 2nd row right,
shows the 3D shape force and its remapping.
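A minimal sketch of this computation (assuming a simple spring-to-target form and precomputed per-vertex Jacobians; names and the proportionality constant are hypothetical):

```python
import numpy as np

def shape_forces(visage_pos, constraint_pos, jac_3d_to_2d, k=1.0):
    """Shape force per vertex (sketch).

    visage_pos     : (V, 3) current 3D positions of the active visage vertices.
    constraint_pos : (V, 3) vertex positions of the user-supplied deformed
                     mesh (same connectivity as the animation mesh).
    jac_3d_to_2d   : (V, 2, 3) per-vertex Jacobians mapping a 3D displacement
                     into the local tangent plane / 2D optimization domain.
    k              : proportionality constant (assumed).
    """
    f3d = k * (constraint_pos - visage_pos)            # proportional to distance
    # Remap each 3D force into the 2D optimization domain via the Jacobian.
    return np.einsum('vij,vj->vi', jac_3d_to_2d, f3d)
```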
Application. In our implementation, we allow the user to
exert shape forces in two instances:
1. To bootstrap the method, a deformed version of the an-
imation mesh, roughly matching the target shape, serves
as input to the shape force. Depending on the accuracy of
this initial guess, the user may opt to reduce the influence
of this initial guess by gradually lowering the weight of
the resulting shape force as the optimization progresses.
2. We furthermore allow the user to select an intermediate
solution at any iteration as a soft constraint. This is useful
when the user decides that the current solution provides
a good base configuration; it can also serve as a tool to
avoid getting stuck in a local optimum.
7. Internal Force
The above shape and image forces encourage similarity be-
tween corresponded points. However, by doing so, spatial or-
dering relationships are ignored. For example, the spatial re-
lationship between two surface points (e.g., point A is above
point B), is unlikely to reverse on a physical subject between
different expressions, but swapping their position might (de-
pending on the input data) numerically minimize the shape
and image force. To avoid such physically implausible de-
formations, we introduce an Internal Force that promotes an
as-rigid-as-possible deformation.
Input. The internal force only depends on the geometric
ordering of the vertices in the animation mesh corresponding
to the neutral expression (i.e., “resting state”), and on the
current ordering in the active visage.
Force Computation. We define a stress imposed by
the deformation of the active visage relative to the “resting
state”. We treat edges of the active visage as springs, with
equilibrium length defined as the length of the edges in their
“resting state”. Displacements of mesh vertices representing
non-rigid deformations will change the edge lengths, and re-
sult in a restoring force:
$$
\rho_j \;=\;
\begin{cases}
\dfrac{\varepsilon_j}{\|\varepsilon_j\|}\, \kappa \big( \|\varepsilon_j\| - \lambda_j \big) & \text{if } \|\varepsilon_j\| > \lambda_j, \\
0 & \text{if } \|\varepsilon_j\| \leq \lambda_j.
\end{cases}
\tag{2}
$$

Here, $\lambda_j$ is the equilibrium length of the edge, and $\varepsilon_j$ is
the edge vector, and κ is a spring constant. The net resulting
3D internal force per vertex is then the sum of all restoring
forces acting on that vertex. This 3D internal force can be
easily and efficiently computed on the GPU. Finally, similar
to the shape force, this 3D force vector needs to be remapped
to the 2D optimization domain (Section 6).
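A CPU sketch of Eq. (2) follows (the sign convention applied to each edge endpoint and the spring constant are assumptions; the paper computes this on the GPU):

```python
import numpy as np

def internal_forces(vert_pos, edges, rest_len, kappa=1.0):
    """3D internal (as-rigid-as-possible) restoring forces per Eq. (2) (sketch).

    vert_pos : (V, 3) current vertex positions of the active visage.
    edges    : (E, 2) vertex index pairs defining mesh edges.
    rest_len : (E,) equilibrium edge lengths, measured on the neutral
               animation mesh ("resting state").
    kappa    : spring constant (assumed value).
    """
    forces = np.zeros_like(vert_pos)
    for (i, j), lam in zip(edges, rest_len):
        eps = vert_pos[j] - vert_pos[i]                # edge vector
        length = np.linalg.norm(eps)
        if length > lam:                               # only stretched edges respond
            rho = (eps / length) * kappa * (length - lam)
            forces[i] += rho                           # pull i toward j
            forces[j] -= rho                           # and j toward i
    return forces                                      # net force = sum over incident edges
```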
Orthogonal Forces. In practice there are instances in which
the computed 3D internal force is nearly orthogonal to the
local tangent plane around the vertex on the target manifold
on which the internal force acts, resulting in a negligible in-
ternal force (when mapped to the optimization domain) de-
spite a potentially large deformation strain. We there-
fore also compute 2D internal forces directly on the 2D cor-
respondence estimate (i.e., a texture coordinate in the tar-
get expression mesh’s native texture parameterization (Sec-
tion 4-Implementation)). This 2D internal force is remapped
to the optimization domain and added to the (remapped) 3D in-
ternal force. Figure 1, 3rd row right, shows the 3D internal
force and its 2D mapping.
Figure 2: Intermediate states of the Active Visage as the simulation progresses, shown at an interval of 70 iteration steps. Top:
3D visualization of the active visage; Middle: active visage and corresponding forces overlaid on the frontal camera view of the
target subject; Bottom: band-pass filtered texture overlaid with the aggregate forces (scaled up for visibility).
8. Directable Force
The final forces that drive the correspondence optimization
are the user directable forces. These forces, in conjunction
with the GPU implementation of the other forces, enable
the user to participate in the optimization process, and guide
the optimization. The motivation for bringing the user in the
loop is two-fold:
1. It allows the user to prevent the correspondence compu-
tation from getting caught in a local optimum. In such a
case, the user can pause the simulation (disabling other
forces) and interact with the active visage, updating the
optimization estimate through the directable force alone.
Once the estimate is free of the local optimum, the opti-
mization can be resumed to snap the solution into place
for optimal correspondence.
2. It furthermore allows the user to steer the solution (as the
simulation is running) to a subjectively preferred solution:
the correspondence with the lowest error does not neces-
sarily correspond to artifact-free blendshapes.
While there are many possible ways for the user to spec-
ify directional forces, we have implemented two: dragging
of points on the manifold, and pinning (i.e., temporarily fix-
ing) points on the manifold. As was the case with any of
the prior forces, care has to be taken when remapping the
corresponding directional force to the optimization domain.
For example, if the user specifies directable forces in visu-
alizations of the active visage, then the same Jacobian as in
Section 5 needs to be applied. Figure 1, last row right, shows
an example of a directable force and its remapping to the tar-
get manifold. We refer the reader to the accompanying video
and supplemental material for a demonstration of the user-
interaction in Facial Cartography.
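The paper does not give the exact force form for dragging and pinning; as an illustrative sketch under the assumption of a simple spring-to-target formulation in the 2D optimization domain (all names and gains hypothetical):

```python
import numpy as np

def directable_forces(uv, drag_targets, pinned, k_drag=1.0, k_pin=10.0):
    """User-directable forces in the 2D optimization domain (sketch).

    uv           : (V, 2) current correspondence estimates.
    drag_targets : dict {vertex index: (u, v)} of positions the user is
                   dragging selected vertices toward.
    pinned       : dict {vertex index: (u, v)} of temporarily fixed positions.
    k_drag/k_pin : gains (assumed values); pinning uses a stiffer pull.
    """
    f = np.zeros_like(uv)
    for v, target in drag_targets.items():
        f[v] += k_drag * (np.asarray(target) - uv[v])   # pull toward the cursor
    for v, anchor in pinned.items():
        f[v] += k_pin * (np.asarray(anchor) - uv[v])    # hold pinned points in place
    return f
```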
9. Results
We applied our technique to two facial expression datasets,
each containing approximately 30 high resolution scans of
a subject in various facial expressions. Each scan, acquired
using the technique of [MHP*07], includes high-resolution
geometry (approximately 1.5 M polygons) and photometric
textures and normal maps (approximately 2K resolution)
of diffuse and specular components. Note that our corre-
spondence technique is equally suited to data obtained us-
ing other high-quality scanning methods such as [WMP*06, BBB*10].
Each dataset includes a neutral pose, which an artist re-
meshed into a low-polygon (approximately 2000 polygons)
animation mesh along with a UV texture space layout for the
detail maps. We then used our technique on each expression
to find the correspondences to neutral which would provide
the best visual consistency for the desired animation mesh.
Figure 2 shows a sequence of intermediate results (visual-
ized using active visages) of the correspondence optimiza-
tion for a single expression at a selected number of itera-
tions.
Figure 3: Facial expression corresponded to a subject’s neutral scan. Original captured maps (1st column) are mapped
to the artist-defined common space (2nd column) using the computed correspondences. The animation mesh (3rd column),
deformed according to the correspondences, is rendered with neutral and expression maps, producing a consistent result
which is faithful to the rendering of the full-res scan data (lower right). Diffuse color texture maps are shown; renderings
are performed using texture and photometric normal maps.
The obtained correspondences allow us to deform the an-
imation mesh into each expression; remap the detail maps
from each expression into the artist-defined UV texture
space (Figure 3); and therefore blend high-resolution details
from different expressions as we deform the low-resolution
animation mesh (Figure 4). In facial animation applications,
the interpolation weights (for both vertex displacements and
detail maps) would come from rig controls and their associ-
ated weight maps, as in [ARL*10].
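A minimal sketch of this blending, assuming a standard delta-blendshape formulation over the corresponded data (function and argument names are hypothetical):

```python
import numpy as np

def blend_expression(neutral_verts, deltas, neutral_maps, expr_maps, weights):
    """Blend corresponded expressions into an intermediate pose (sketch).

    neutral_verts : (V, 3) animation-mesh vertices in the neutral pose.
    deltas        : list of (V, 3) per-expression vertex offsets, obtained by
                    deforming the animation mesh into each expression via the
                    computed correspondences.
    neutral_maps  : (H, W, C) detail map (e.g. diffuse albedo) of the neutral pose.
    expr_maps     : list of (H, W, C) detail maps remapped into the common
                    artist-defined UV space.
    weights       : per-expression blend weights (e.g. from rig controls).
    """
    verts = neutral_verts.astype(np.float64)
    base = neutral_maps.astype(np.float64)
    maps = base.copy()
    for w, d, m in zip(weights, deltas, expr_maps):
        verts = verts + w * np.asarray(d, dtype=np.float64)      # blend vertex displacements
        maps += w * (np.asarray(m, dtype=np.float64) - base)     # blend detail maps
    return verts, maps
```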
Comparison to other techniques. We compare our tech-
nique with two alternatives for computing high-quality facial
scan correspondences:
1. optical flow based alignment (based on [BBPW04]), and
2. manual (skilled) artistic alignment.
We compute correspondences on four different extreme fa-
cial expressions, each having one or more regions of pro-
nounced deformation, as well as a change in topology com-
pared to the neutral mesh (i.e., opening or closing of eyes
and mouth). To evaluate how well the three correspon-
dence methods line up fine-scale details, we remap the
bandpass-filtered detail texture into a common, artist-defined
UV texture space, and compute the difference between the
remapped and the neutral detail texture. A visual compari-
son of the three different methods, applied to three of the se-
lected extreme expressions, is shown in Figure 5. The aver-
age RMS errors, from the four selected extreme expressions,
are summarized in Table 1.
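The error measure could be sketched as follows (a plain RMS over the bandpass-filtered texture difference; the masking of valid texels is an assumption, as the paper does not specify it):

```python
import numpy as np

def rms_alignment_error(remapped_bp, neutral_bp, mask=None):
    """RMS difference between bandpass-filtered textures in the common UV space (sketch).

    remapped_bp : (H, W) bandpass-filtered expression texture, remapped into
                  the common UV space using the computed correspondences.
    neutral_bp  : (H, W) bandpass-filtered neutral texture in the same space.
    mask        : optional (H, W) boolean mask of valid texels (e.g. the
                  face region covered by the UV layout).
    """
    diff = remapped_bp.astype(np.float64) - neutral_bp.astype(np.float64)
    if mask is not None:
        diff = diff[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```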
Figure 4: Synthesis of intermediate expressions (from neutral to extreme) by combining scans which have been mapped into a common domain.

While the optical flow approach is automatic and offers
precise alignment (low average RMS error), it can fail on
particularly challenging cases such as those in Figure 5. We
used the optical flow algorithm of Brox et al. [BBPW04],
which we have found to be accurate and robust, albeit com-
putationally intensive. Optical flow systems can be finicky
in general, succeeding in some cases but failing entirely on
seemingly similar cases. Typical failure cases are expres-
sions in which substantial areas of one image (e.g., mouth
interior) are not present in the target image (e.g., mouth
closed).
The manual approach consistently achieves good corre-
Figure 5: Comparison. The diffuse texture of each expression is mapped to the common UV space using each technique
(1st row). We assess fine-scale alignment by high-pass filtering the textures (2nd row) and computing the difference
between expression and neutral textures after registration (rows 3–5).
spondences even in challenging cases. However with a man-
ual approach it is impractical to precisely align fine-scale de-
tails (such as skin pores), as seen in Figure 5. Alignment of
such features is important if high-resolution detail maps are
to be blended seamlessly, without introducing ghosting arti-
facts (see video). The results shown for the manual approach
were obtained after approximately 30 minutes of work per
expression by a skilled artist.

Table 1: Average RMS error on corresponded expressions computed using three different registration algorithms.

  method                 average RMS error
  optical flow           0.16
  manual                 0.21
  Facial Cartography     0.16

An interesting observation is that the artist reported diminishing returns: additional small
improvements would require increased amounts of time, in-
dicating that there is a practical upper-limit to how well an
artist can correspond high resolution blendshapes.
Like the manual approach, the interactive Facial Cartogra-
phy approach achieves good correspondences even in chal-
lenging cases. Furthermore it greatly outperforms the man-
ual approach on precision, achieving significantly lower av-
erage RMS error, and doing so in a fraction of the time (ap-
proximately 8 minutes per expression).
Limitations. Compared to a fully automatic method, giving
the user more control, and thus responsibility, results in a
tradeoff. On one hand, it is now up to the user to decide
whether convergence is reached, as it is hard to formulate a
good metric given the element of unpredictability introduced
by bringing the user in the loop. On the other hand, this also
implies that the user can decide to continue working on the
result until a satisfactory result is reached. Furthermore, we
also rely on the user to help the algorithm to do the “right
thing” when the algorithm does not “know” what the right
thing is. However, in a production context, having this ability
to correct and direct is preferable to being fully automatic
but unable to change a “wrong” result.
Another limitation of Facial Cartography is that deep
wrinkles can occlude small areas of the face for some image
force viewpoints. Consequently, the image forces in these
hidden areas may not constrain the correspondence compu-
tations sufficiently to obtain preferable results. Using more
viewpoints or using an advanced physical model of skin
buckling as a prior could mitigate this.
Finally, on less-challenging correspondence cases, fully
automated methods are more likely to converge to a desir-
able result faster than Facial Cartography. The current opti-
mization scheme employed in Facial Cartography is not as
sophisticated as some of the schemes used in advanced auto-
matic correspondence methods. However, the goal of Facial
Cartography is not to obtain correspondences in the fewest
number of iterations, but to make the optimization accessi-
ble, intuitive, and directable.
10. Conclusion
In this work we introduced a novel technique, called Facial
Cartography, for determining correspondences between fa-
cial scans such that they can be mapped to a common do-
main for use in animation. Our approach allows the user to
participate, interactively, in the optimization as an integral
part of the computations. This provides practical benefits:
the artist is empowered to guide the result toward a subjectively
preferred solution, assisted by computation that maintains a
consistent mapping and retains detail from the original
measurements.
To make this interplay possible, we propose a novel rep-
resentation, called the Active Visage, which maintains the
advantages of deformable template meshes and computa-
tions in a 2D canonical domain, while avoiding their disad-
vantages. Furthermore, the components of our system such
as the analysis-by-synthesis component, together with the
tightly coupled interaction, may prove useful for a variety of
applications. Therefore in future work we plan to investigate
how our flexible framework can be extended to related prob-
lems in facial animation, such as using captured performance
data to generate control curves for a specific animation rig.
Acknowledgments. We thank N. Palmer-Kelly, M. Liewer, G.
Storm, B. Garcia, and T. Jones for assistance; and S. Mordijck,
K. Haase, G. Benn, J. Williams, M. Trimmer, K. LeMasters, B.
Swartout, R. Hill, and R. Hall for generous support. The work was
partly supported by the Office of Naval Research, NSF grant IIS-
1016703, the University of Southern California Office of the Provost
and the U.S. Army Research, Development, and Engineering Com-
mand (RDECOM). The content of the information does not neces-
sarily reflect the position or the policy of the US Government, and
no official endorsement should be inferred.
References

[ACP02] ALLEN B., CURLESS B., POPOVIĆ Z.: Articulated body deformation from range scan data. ACM Trans. Graph. 21, 3 (2002), 612–619.

[AKS*05] ANGUELOV D., KOLLER D., SRINIVASAN P., THRUN S., PANG H.-C., DAVIS J.: The correlated correspondence algorithm for unsupervised registration of nonrigid surfaces. In NIPS (2005).

[ARL*10] ALEXANDER O., ROGERS M., LAMBETH W., CHIANG M., MA W.-C., WANG C., DEBEVEC P.: The Digital Emily project: Achieving a photorealistic digital actor. IEEE Comp. Graph. and App. 30 (July/Aug. 2010).

[ARV07] AMBERG B., ROMDHANI S., VETTER T.: Optimal step nonrigid ICP algorithms for surface registration. In IEEE CVPR (2007), pp. 1–8.

[ASK*05] ANGUELOV D., SRINIVASAN P., KOLLER D., THRUN S., RODGERS J., DAVIS J.: SCAPE: shape completion and animation of people. ACM Trans. Graph. 24, 3 (2005), 408–416.

[BBB*10] BEELER T., BICKEL B., BEARDSLEY P., SUMNER B., GROSS M.: High-quality single-shot capture of facial geometry. ACM Trans. Graph. 29, 4 (July 2010), 40:1–40:9.

[BBPW04] BROX T., BRUHN A., PAPENBERG N., WEICKERT J.: High accuracy optical flow estimation based on a theory for warping. In ECCV (2004), pp. 25–36.

[BBVP03] BLANZ V., BASSO C., VETTER T., POGGIO T.: Reanimating faces in images and video. Comp. Graph. Forum 22, 3 (2003), 641–650.

[BM92] BESL P. J., MCKAY N. D.: A method for registration of 3-D shapes. IEEE PAMI 14 (Feb. 1992), 239–256.

[BR04] BROWN B. J., RUSINKIEWICZ S.: Non-rigid range-scan alignment using thin-plate splines. In 3DPVT (2004), pp. 759–765.

[BR07] BROWN B., RUSINKIEWICZ S.: Global non-rigid alignment of 3-D scans. ACM Trans. Graph. 26, 3 (Aug. 2007).

[BV99] BLANZ V., VETTER T.: A morphable model for the synthesis of 3D faces. In Proc. SIGGRAPH (1999), pp. 187–194.

[CZ09] CHANG W., ZWICKER M.: Range scan registration using reduced deformable models. Comp. Graph. Forum 28, 2 (2009), 447–456.

[DM96] DECARLO D., METAXAS D.: The integration of optical flow and deformable models with applications to human face shape and motion estimation. In IEEE CVPR (1996), p. 231.

[HCTW11] HUANG H., CHAI J.-X., TONG X., WU H.-T.: Leveraging motion capture and 3D scanning for high-fidelity performance acquisition. ACM Trans. Graph. 30, 4 (Aug. 2011), 74:1–74:10.

[LAGP09] LI H., ADAMS B., GUIBAS L. J., PAULY M.: Robust single-view geometry and motion reconstruction. ACM Trans. Graph. 28, 5 (Dec. 2009).

[LDRS05] LITKE N., DROSKE M., RUMPF M., SCHRÖDER P.: An image processing approach to surface matching. In Proc. SGP (2005).

[LF09] LIPMAN Y., FUNKHOUSER T.: Möbius voting for surface correspondence. ACM Trans. Graph. 28, 3 (Aug. 2009).

[LSP08] LI H., SUMNER R. W., PAULY M.: Global correspondence optimization for non-rigid registration of depth scans. Proc. SGP 27, 5 (July 2008).

[MHP*07] MA W.-C., HAWKINS T., PEERS P., CHABERT C.-F., WEISS M., DEBEVEC P.: Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In Rendering Techniques (2007), pp. 183–194.

[PKF07] PONS J.-P., KERIVEN R., FAUGERAS O.: Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. IJCV 72 (April 2007), 179–193.

[PSS99] PIGHIN F., SZELISKI R., SALESIN D. H.: Resynthesizing facial animation through 3D model-based tracking. In ICCV (1999), pp. 143–150.

[SP04] SUMNER R. W., POPOVIĆ J.: Deformation transfer for triangle meshes. ACM Trans. Graph. 23 (Aug. 2004), 399–405.

[ST94] SHI J., TOMASI C.: Good features to track. In IEEE CVPR (1994), pp. 593–600.

[WMP*06] WEYRICH T., MATUSIK W., PFISTER H., BICKEL B., DONNER C., TU C., MCANDLESS J., LEE J., NGAN A., JENSEN H. W., GROSS M.: Analysis of human faces using a measurement-based skin reflectance model. ACM Trans. Graph. 25, 3 (2006), 1013–1024.

[WWJ*07] WANG S., WANG Y., JIN M., GU X. D., SAMARAS D.: Conformal geometry and its applications on 3D shape matching, recognition, and stitching. IEEE PAMI 29 (2007).

[ZWW*10] ZENG Y., WANG C., WANG Y., GU X., SAMARAS D., PARAGIOS N.: Dense non-rigid surface registration using high-order graph matching. In CVPR (2010), pp. 382–389.

[ZZW*08] ZENG W., ZENG Y., WANG Y., YIN X., GU X., SAMARAS D.: 3D non-rigid surface matching and registration based on holomorphic differentials. In ECCV (2008), pp. 1–14.
214
... Texture alignment to synthesize parametric textures for a 3D model has been recently tackled by using online optical flow computed from the virtual view-point [29]. In addition, a combination of image, shape and directable forces has been used to create scan correspondences in [30] . Similarly, multicamera setups also use texture alignment techniques to synthesize blended view-dependent appearances [31, 32]. ...
Conference Paper
Full-text available
Creating and animating realistic 3D human faces is an important element of virtual reality, video games, and other areas that involve interactive 3D graphics. In this paper, we propose a system to generate photorealistic 3D blendshape-based face models automatically using only a single consumer RGB-D sensor. The capture and processing requires no artistic expertise to operate, takes 15 seconds to capture and generate a single facial expression, and approximately 1 minute of processing time per expression to transform it into a blendshape model. Our main contributions include a complete end-to-end pipeline for capturing and generating photorealistic blendshape models automatically and a registration method that solves dense correspondences between two face scans by utilizing facial landmarks detection and optical flows. We demonstrate the effectiveness of the proposed method by capturing different human subjects with a variety of sensors and puppeteering their 3D faces with real-time facial performance retargeting. The rapid nature of our method allows for just-in-time construction of a digital face. To that end, we also integrated our pipeline with a virtual reality facial performance capture system that allows dynamic embodiment of the generated faces despite partial occlusion of the user's real face by the head-mounted display.
... Liu et al. [13] raised an optimization scheme that automatically discovered the non-linear relationship of blend shapes in facial animation. Wilson et al. [14] proposed to construct correspondences between detailed blend shapes to acquire more realistic digital animation. As the foundation role of blend shapes, the tedious work of discovering the proper blend shapes is time-consuming and even a portion of efforts is made on the compression of complex blend shape models [15]. ...
Article
Full-text available
This paper presents a hybrid method for synthesizing natural animation of facial expression with data from motion capture. The captured expression was transferred from the space of source performance to that of a 3D target face using an accurate mapping process in order to realize the reuse of motion data. The transferred animation was then applied to synthesize the expression of the target model through a framework of two-stage deformation. A local deformation technique preliminarily considered a set of neighbor feature points for every vertex and their impact on the vertex. Furthermore, the global deformation was exploited to ensure the smoothness of the whole facial mesh. The experimental results show our hybrid mesh deformation strategy was effective, which could animate different target face without complicated manual efforts required by most of facial animation approaches.
Article
Parametric 3D shape models are heavily utilized in computer graphics and vision applications to provide priors on the observed variability of an object's geometry (e.g., for faces). Original models were linear and operated on the entire shape at once. They were later enhanced to provide localized control on different shape parts separately. In deep shape models, nonlinearity was introduced via a sequence of fully‐connected layers and activation functions, and locality was introduced in recent models that use mesh convolution networks. As common limitations, these models often dictate, in one way or another, the allowed extent of spatial correlations and also require that a fixed mesh topology be specified ahead of time. To overcome these limitations, we present Shape Transformers, a new nonlinear parametric 3D shape model based on transformer architectures. A key benefit of this new model comes from using the transformer's self‐attention mechanism to automatically learn nonlinear spatial correlations for a class of 3D shapes. This is in contrast to global models that correlate everything and local models that dictate the correlation extent. Our transformer 3D shape autoencoder is a better alternative to mesh convolution models, which require specially‐crafted convolution, and down/up‐sampling operators that can be difficult to design. Our model is also topologically independent: it can be trained once and then evaluated on any mesh topology, unlike most previous methods. We demonstrate the application of our model to different datasets, including 3D faces, 3D hand shapes and full human bodies. Our experiments demonstrate the strong potential of our Shape Transformer model in several applications in computer graphics and vision.
Article
Full-text available
We introduce the SCAPE method (Shape Completion and Animation for PEople)---a data-driven method for building a human shape model that spans variation in both subject shape and pose. The method is based on a representation that incorporates both articulated and non-rigid deformations. We learn a pose deformation model that derives the non-rigid surface deformation as a function of the pose of the articulated skeleton. We also learn a separate model of variation based on body shape. Our two models can be combined to produce 3D surface models with realistic muscle deformation for different people in different poses, when neither appear in the training set. We show how the model can be used for shape completion --- generating a complete surface mesh given a limited set of markers specifying the target shape. We present applications of shape completion to partial view completion and motion capture animation. In particular, our method is capable of constructing a high-quality animated surface model of a moving person, with realistic muscle deformation, using just a single static scan and a marker motion capture sequence of the person.
Conference Paper
Full-text available
We present an unsupervised algorithm for registering 3D surface scans of an object undergoing signicant deformations. Our algorithm does not use markers, nor does it assume prior knowledge about object shape, the dynamics of its deformation, or scan alignment. The algorithm registers two meshes by optimizing a joint probabilistic model over all point-to- point correspondences between them. This model enforces preservation of local mesh geometry, as well as more global constraints that capture the preservation of geodesic distance between corresponding point pairs. The algorithm applies even when one of the meshes is an incomplete range scan; thus, it can be used to automatically ll in the remaining sur- faces for this partial scan, even if those surfaces were previously only seen in a different conguration. We evaluate the algorithm on several real-world datasets, where we demonstrate good results in the presence of signicant movement of articulated parts and non-rigid surface defor- mation. Finally, we show that the output of the algorithm can be used for compelling computer graphics tasks such as interpolation between two scans of a non-rigid object and automatic recovery of articulated object models.
Conference Paper
In this paper, we propose a high-order graph matching formulation to address non-rigid surface matching. The singleton terms capture the geometric and appearance similarities (e.g., curvature and texture) while the high-order terms model the intrinsic embedding energy. The novelty of this paper includes: (1) casting 3D surface registration into a graph matching problem that combines both geometric and appearance similarities and intrinsic embedding information, (2) the first implementation of a high-order graph matching algorithm that solves a non-convex optimization problem, and (3) an efficient two-stage optimization approach to constrain the search space for dense surface registration. Our method is validated through a series of experiments demonstrating its accuracy and efficiency, notably in challenging cases of large and/or non-isometric deformations, or meshes that are partially occluded.
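In schematic form (our notation), such an objective over binary assignment variables can be written as

\[ E(\mathbf{x}) \;=\; \sum_{a} \theta_a\, x_a \;+\; \sum_{(a,b,c)} \theta_{abc}\, x_a x_b x_c, \qquad x_a \in \{0,1\}, \]

where each index \(a\) denotes a candidate match between a source and a target point, the singleton potentials \(\theta_a\) encode geometric and appearance dissimilarity (curvature, texture), the triple potentials \(\theta_{abc}\) measure how well a triplet of matches preserves the intrinsic embedding, and one-to-one assignment constraints restrict \(\mathbf{x}\). The two-stage strategy first solves a sparse instance of this non-convex problem and then uses its result to constrain the dense registration.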
Conference Paper
We present a formal methodology for the integration of optical flow and deformable models. The optical flow constraint equation provides a non-holonomic constraint on the motion of the deformable model. In this augmented system, forces computed from edges and optical flow are used simultaneously. When this dynamic system is solved, a model-based least-squares solution for the optical flow is obtained and improved estimation results are achieved. The use of a 3-D model reduces or eliminates problems associated with optical flow computation. This approach instantiates a general methodology for treating visual cues as constraints on deformable models. We apply this framework to human face shape and motion estimation. Our 3-D deformable face model uses a small number of parameters to describe a rich variety of face shapes and facial expressions. We present experiments in extracting the shape and motion of a face from image sequences.
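Roughly, and in our own notation, the coupling works by expressing image motion through the model parameters \(\mathbf{q}\): a surface point moves as \(\dot{\mathbf{x}} = L(\mathbf{q})\,\dot{\mathbf{q}}\) with model Jacobian \(L = \partial\mathbf{x}/\partial\mathbf{q}\), so the brightness-constancy (optical flow) constraint becomes a linear constraint on the parameter velocities,

\[ \nabla I^{\top}\, \Pi\, L(\mathbf{q})\, \dot{\mathbf{q}} \;+\; I_t \;=\; 0, \]

with \(\Pi\) the Jacobian of the camera projection and \(I_t\) the temporal image derivative. Stacking this constraint over many pixels and solving it together with the edge forces gives the model-based least-squares flow estimate mentioned above; the low-dimensional parameterization is what regularizes the otherwise ill-posed flow computation.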
Conference Paper
3D surface matching is fundamental for shape registration, deformable 3D non-rigid tracking, recognition and classification. In this paper we describe a novel approach for generating an efficient and optimal combined matching from multiple boundary-constrained conformal parameterizations for multiply connected domains (i.e., genus-zero open surfaces with multiple boundaries), which commonly arise from imperfect 3D data acquisition (holes, partial occlusions, change of pose and non-rigid deformation between scans). This optimality criterion is also used to assess how consistent each boundary is, and thus to decide whether to enforce or relax boundary constraints across the two surfaces to be matched. The linear boundary-constrained conformal parameterization is based on holomorphic differential forms, which map a surface with n boundaries conformally to a planar rectangle with (n - 2) horizontal slits, with the other two boundaries serving as constraints. The mapping is a diffeomorphism, is intrinsic to the geometry, handles an open surface with an arbitrary number of boundaries, and can be implemented as a linear system. Experimental results are given for real facial surface matching and deformable non-rigid cloth tracking, which demonstrate the efficiency of our method, especially for 3D non-rigid surfaces with significantly inconsistent boundaries.
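In rough terms (our notation), the parameterization is obtained by integrating a holomorphic 1-form over the surface,

\[ \phi(p) \;=\; \int_{p_0}^{p} \omega, \qquad \omega \;=\; \sum_i \lambda_i\, \omega_i, \]

where the \(\omega_i\) form a basis of holomorphic differentials and the coefficients \(\lambda_i\) are found by solving a linear system so that the two designated boundaries map to the sides of the rectangle and the remaining \(n - 2\) boundaries map to horizontal slits. Matching then proceeds between the flattened 2D domains, and the per-boundary consistency scores determine which boundary constraints to enforce or relax.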
Article
Deformation transfer applies the deformation exhibited by a source triangle mesh onto a different target triangle mesh. Our approach is general and does not require the source and target to share the same number of vertices or triangles, or to have identical connectivity. The user builds a correspondence map between the triangles of the source and those of the target by specifying a small set of vertex markers. Deformation transfer computes the set of transformations induced by the deformation of the source mesh, maps the transformations through the correspondence from the source to the target, and solves an optimization problem to consistently apply the transformations to the target shape. The resulting system of linear equations can be factored once, after which transferring a new deformation to the target mesh requires only a back-substitution step. Global properties such as foot placement can be achieved by constraining vertex positions. We demonstrate our method by retargeting full-body key poses, applying scanned facial deformations onto a digital character, and remapping rigid and non-rigid animation sequences from one mesh onto another.
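Schematically (notation ours), with per-triangle source deformation gradients \(\mathbf{S}_j\) and the user-built triangle correspondence map \(c\), the target's new vertex positions \(\tilde{\mathbf{v}}\) solve

\[ \min_{\tilde{\mathbf{v}}_1,\dots,\tilde{\mathbf{v}}_n} \; \sum_{j} \big\| \mathbf{T}_j(\tilde{\mathbf{v}}) - \mathbf{S}_{c(j)} \big\|_F^2, \]

where \(\mathbf{T}_j\), the deformation gradient of target triangle \(j\), is a linear function of the unknown vertices. The problem is therefore a sparse linear least-squares system whose matrix depends only on the target's rest geometry; it can be factored once, and each new source pose costs only a back-substitution, as the abstract notes. Positional constraints such as foot placement enter as fixed vertices.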
Article
We present an unsupervised method for registering range scans of deforming, articulated shapes. The key idea is to model the motion of the underlying object using a reduced deformable model. We use a linear skinning model for its simplicity and represent the weight functions on a regular grid localized to the surface geometry. This decouples the deformation model from the surface representation and allows us to deal with the severe occlusion and missing data that are inherent in range scan data. We formulate the registration problem using an objective function that enforces close alignment of the 3D data and includes an intuitive notion of joints. This leads to an optimization problem that we solve using an efficient EM-type algorithm. With our algorithm we obtain smooth deformations that accurately register pairs of range scans with significant motion and occlusion. The main advantages of our approach are that it does not require user-specified markers, a template, or manual segmentation of the surface geometry into rigid parts.
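In our notation, the reduced deformable model is a linear-blend-skinning deformation

\[ \mathbf{y}(\mathbf{x}) \;=\; \sum_{b=1}^{B} w_b(\mathbf{x})\, \big( \mathbf{R}_b\, \mathbf{x} + \mathbf{t}_b \big), \qquad \sum_{b} w_b(\mathbf{x}) = 1, \]

where the weight functions \(w_b\) are stored on a regular grid restricted to a band around the surface and interpolated at each point, which is what decouples the deformation from the particular surface representation. Registration then alternates, EM-style, between soft point correspondences from the deformed source to the target scan and updates of the transformations \((\mathbf{R}_b, \mathbf{t}_b)\), the weights, and the joint terms.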
Conference Paper
Establishing a correspondence between two surfaces is a basic ingredient in many geometry processing applications. Existing approaches, which attempt to match two meshes directly in 3D, can be cumbersome to implement and it is often hard to produce accurate results in a reasonable amount of time. In this paper, we present a new variational method for matching surfaces that addresses these issues. Instead of matching two surfaces directly in 3D, we apply well-established matching methods from image processing in the parameter domains of the surfaces. A matching energy is introduced that can depend on curvature, feature demarcations or surface textures, and a regularization energy controls length and area changes in the induced non-rigid deformation between the two surfaces. The metric on both surfaces is properly incorporated into the formulation of the energy. This approach reduces all computations to the 2D setting while accounting for the original geometries. Consequently a fast multiresolution numerical algorithm for regular image grids can be used to solve the global optimization problem. The final algorithm is robust, generically much simpler than direct matching methods, and very fast for highly resolved triangle meshes.
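Schematically (our notation), the sought deformation \(\phi\) between the two parameter domains minimizes an energy of the form

\[ E[\phi] \;=\; \int_{\Omega} \big| F_2(\phi(u)) - F_1(u) \big|^2 \sqrt{\det g_1(u)}\; du \;+\; \alpha\, E_{\mathrm{reg}}[\phi], \]

where \(F_1, F_2\) are feature maps (curvature, feature demarcations, or textures) on the two parameter domains, the first fundamental forms \(g_1, g_2\) of the parameterizations appear in both terms so that lengths and areas are measured on the original surfaces rather than in the flattened charts, and \(E_{\mathrm{reg}}\) penalizes length and area distortion of the induced 3D deformation. Since \(\phi\) lives on a regular 2D grid, the whole problem can be solved with a fast coarse-to-fine image-matching algorithm.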
Conference Paper
We study an energy functional for computing optical flow that combines three assumptions: a brightness constancy assumption, a gradient constancy assumption, and a discontinuity-preserving spatio-temporal smoothness constraint. In order to allow for large displacements, linearisations in the two data terms are strictly avoided. We present a consistent numerical scheme based on two nested fixed point iterations. By proving that this scheme implements a coarse-to-fine warping strategy, we give a theoretical foundation for warping which has been used on a mainly experimental basis so far. Our evaluation demonstrates that the novel method gives significantly smaller angular errors than previous techniques for optical flow estimation. We show that it is fairly insensitive to parameter variations, and we demonstrate its excellent robustness under noise.
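Up to notation, the energy combines these assumptions as

\[ E(u,v) \;=\; \int_{\Omega} \Psi\!\Big( |I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x})|^2 + \gamma\, |\nabla I(\mathbf{x}+\mathbf{w}) - \nabla I(\mathbf{x})|^2 \Big)\, d\mathbf{x} \;+\; \alpha \int_{\Omega} \Psi\!\big( |\nabla_3 u|^2 + |\nabla_3 v|^2 \big)\, d\mathbf{x}, \]

with flow \(\mathbf{w} = (u, v, 1)^{\top}\), a robust penalizer such as \(\Psi(s^2) = \sqrt{s^2 + \varepsilon^2}\), and \(\nabla_3\) the spatio-temporal gradient. Keeping the data terms non-linearized is what permits large displacements, and the nested fixed-point iterations used for minimization are exactly what realizes the coarse-to-fine warping strategy.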