
Adaptation of a Generic Face Model to a 3D Scan

Axel Weissenfeld, Nikolče Stefanoski, Shen Qiuqiong and Joern Ostermann

Institute of Information Technology

University of Hannover

Appelstr. 9A, 30167 Hannover, Germany

{aweissen, stefanos, qshen, ostermann}@tnt.uni-hannover.de

Abstract—In this paper we propose a fast and simple adaptation of a generic face model to a 3D scan. The adaptation is divided into global and local adaptation. The global adaptation is based on radial basis functions (RBFs), whereas the local adaptation refines the mesh of the face model to achieve better adaptation results. The 3D scan is low-pass filtered several times before the mesh of the generic face model is iteratively adapted, first to the most strongly filtered scan and then to progressively less filtered ones. Finally, the generic model is adapted to the original scan. In this way the face model is adapted to the scan precisely and efficiently.

I. INTRODUCTION

In computer vision, generic 3D models are frequently used for many tasks, such as 3D motion estimation of a human face, model-based coding, or facial animation (FA). A model-based algorithm for tracking a human face, which relies on adapting a generic face model to the human subject, is described in [1]. Another application of model adaptation is facial animation. Research in FA started in the early 1970s [2]. Since then, different animation techniques [3], [4] have been developed, which have continuously improved the animation. Finally, MPEG-4 standardized a Facial Animation specification [5] that enables the use of a generic face model. In order to synthesize personalized facial animation, the generic face model has to be adapted to the geometric shape of an individual human subject.

For the described tasks, a precise adaptation is necessary to achieve good tracking or animation results. Since faces have a highly complex 3D shape, precise adaptation is a challenging task. Classically, a generic face model is adapted to an image in two steps: first, facial feature points such as eye corners and nostrils are detected; then the generic model is adapted to these features. These approaches range from a single view [6] to multiple views [3]. In a single view, depth information is not available, so the model cannot be adapted precisely. Both kinds of methods attempt to detect facial features automatically. However, they lack robustness, so the reconstructed 3D shape is often imprecise. Even manual assistance does not lead to precise 3D shapes. 3D shape acquisition systems such as laser scanners, in contrast, offer the opportunity to capture precise 3D shapes of objects automatically.

Acquired 3D shapes are usually represented by triangle meshes with a large number of vertices. Since these mesh vertices carry no semantic meaning, facial animation parameters as defined in MPEG-4 cannot be extracted directly from the scan. Furthermore, 3D scans have such a high resolution that the computational effort is too large for real-time applications. Consequently, generic face models are indispensable for tracking and animation.

Fig. 1. Pipeline of adapting a generic face model to a 3D scan: the generic face model and the 3D scan, together with N manually selected feature points, pass through global adaptation followed by local adaptation.

A precise adaptation of generic models to 3D scans is possible. Earlier approaches [7]–[9] discussed the adaptation of a generic face model to a 3D scan for facial animation. However, these papers did not investigate the adaptation error, whereas we address the problem of adapting generic face models as precisely as possible. Our adaptation process is divided into global and local adaptation. The global adaptation roughly adapts the geometric shape of the generic face model to the 3D scan via RBFs. This paper describes a novel approach to locally adapting a generic face model to a 3D scan by low-pass filtering the original scan a few times before the face model is iteratively adapted. Low-pass filters for 3D objects are simple to implement, require low computational effort, and are very effective for a precise adaptation.

In the following, some notation is introduced that will be used in the remainder of this paper. A mesh $M = (V, E)$ is a tuple consisting of a set of vertices $V$ and a set of edges $E$; $E$ is the connectivity of mesh $M$. $p_i \in V$ denotes a vertex of $M$, and $e_{ij} \in E$ denotes an edge connecting $p_i$ and $p_j$. The set of neighbors $N(i)$ of vertex $p_i$ consists of all vertices $p_j$ that share an edge $e_{ij}$ with $p_i$; $N(i)$ is also called the 1-neighborhood of vertex $p_i$. We denote the face model by $M^m = (V^m, E^m)$ and the 3D scan by $M^s = (V^s, E^s)$. The cardinality of an arbitrary set $S$ is denoted by $|S|$.

Fig. 2. Mesh of the generic face model before (a) and after (b) uniform plain subdivision.
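A minimal Python sketch of this notation, assuming the mesh is stored as a list of vertex positions plus a list of triangles given as vertex-index triples (a storage layout we assume here, not one the paper prescribes). It derives the edge set $E$ from the triangles and the 1-neighborhoods $N(i)$ from $E$; the function names are hypothetical.

```python
def edges_from_triangles(triangles):
    """Connectivity E: undirected edges (i, j) with i < j, taken from triangle sides."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))
    return edges


def one_neighborhoods(num_vertices, edges):
    """N(i): indices of all vertices p_j that share an edge e_ij with p_i."""
    neighbors = [set() for _ in range(num_vertices)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return neighbors


# Two triangles forming a quad: vertex 0 is connected to 1, 2 and 3.
triangles = [(0, 1, 2), (0, 2, 3)]
E = edges_from_triangles(triangles)
N = one_neighborhoods(4, E)
print(len(E), sorted(N[0]), len(N[1]))  # 5 [1, 2, 3] 2
```

The shared diagonal (0, 2) is stored only once, so $|E| = 5$ for the two triangles.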

The remainder of this paper is organized as follows. Section II gives an overview of our system. Sections III and IV discuss the global and the local adaptation, respectively. The error measure is explained in Section V, and adaptation results are presented in Section VI.

II. SYSTEM OVERVIEW

The overall framework of our face model adaptation system is shown in Fig. 1. The inputs are a 3D scan acquired by a Cyberware scanner and a generic face model. As face models, we selected a simplified Candide-1 model for rigid 3D motion estimation and a Candide-3 model, which is compliant with MPEG-4 Face Animation, for animation. The Candide-1 model consists of 76 vertices and 99 triangles, while the Candide-3 model consists of 113 vertices and 168 triangles. Note that our face model adaptation system is not limited to these two models.

First, the generic face model $M^m$ is globally adapted to the 3D scan $M^s$ using N manually selected corresponding points in both meshes, resulting in the mesh $\bar{M}^m$. This adaptation scales the face model and positions its facial features on the 3D scan.

Second, the globally adapted face model $\bar{M}^m$ is further refined locally by successively shifting vertices within a small domain.

The number of vertices of the scan is much higher than the number of vertices of the face model, i.e., the number of degrees of freedom available for adaptation is strongly limited. This restricts the overall adaptation to coarse features of the scan, such as the nose, mouth, eyes, and eyebrows. In order to increase the accuracy of the adaptation with respect to finer details, the user has the option to increase the resolution, i.e., the number of vertices, of the generic face model. For this purpose we use uniform plain subdivision, which divides each mesh triangle into four smaller triangles (Fig. 2).
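The paper does not spell out where the new vertices are placed. A minimal sketch, assuming "plain" means that each new vertex sits at an edge midpoint and the original vertices are not moved (no smoothing step as in Loop subdivision); the function name is hypothetical. Edges shared by two triangles receive a single midpoint, so the subdivided mesh stays connected.

```python
def subdivide(vertices, triangles):
    """One step of uniform 1-to-4 subdivision with new vertices at edge midpoints."""
    vertices = [tuple(float(x) for x in v) for v in vertices]
    midpoint_of = {}  # undirected edge (i, j) -> index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            a, b = vertices[i], vertices[j]
            vertices.append(tuple((x + y) / 2.0 for x, y in zip(a, b)))
            midpoint_of[key] = len(vertices) - 1
        return midpoint_of[key]

    new_triangles = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # three corner triangles plus the central one, orientation preserved
        new_triangles += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return vertices, new_triangles


verts, tris = subdivide([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
print(len(verts), len(tris), verts[3])  # 6 4 (0.5, 0.0, 0.0)
```

Each step quadruples the triangle count, so Candide-1 grows from 99 to 396 and then 1584 triangles, and Candide-3 from 168 to 672 and then 2688 triangles.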

III. GLOBAL ADAPTATION

The face model is globally adapted to the 3D scan by manually selecting N facial feature points in the 3D scan; e.g., N = 16 for Candide-1. This is the only manual interaction required from the user. Then the Candide mask is adapted to the 3D scan by interpolation based on radial basis functions [10]. RBFs are well known in the head animation community for adapting generic head models [3] and even for animating facial parts [11]. We seek a function $f$ with $f(p^m_i) = p^m_{i,\mathrm{new}}$ that maps the N 3D feature points of the Candide mask onto the corresponding points of the 3D scan. This transformation can be described as follows:

$$f(p^m_i) = \sum_{j=1}^{N} b_j\, \Phi\big(p^m_i - p^m_j\big) + \sum_{l=1}^{4} c_l\, g_l(p^m_i), \tag{1}$$

from which the coefficients $b_j$ and $c_l$ are determined by requiring (1) to hold at the N feature points. For geometric applications such as this one, the first term $\Phi$ is a radial basis function, which is shift and rotation invariant. Many RBFs have been proposed in the literature [12]. In our work we selected $\varphi(r) = (1 - r)_+^3\,(8 + 9r + r^2)$ with $\Phi(r) = \varphi(\lVert r \rVert_2)$ for adapting the Candide mask to a 3D scan. The second term is a polynomial of degree one in the x, y, and z coordinates of 3D space, spanned by the four basis functions $g_l \in \{1, x, y, z\}$.
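A minimal NumPy sketch of fitting and evaluating Eq. (1). Two assumptions go beyond the paper: the usual side conditions $\sum_j b_j\, g_l(p^m_j) = 0$ are added so that the $N + 4$ unknowns per coordinate are determined, and distances are divided by a hypothetical `support` radius, because $\varphi$ vanishes for $r \ge 1$ and the paper does not state how $r$ is normalized. The function names and the random feature points are illustrative only.

```python
import numpy as np


def phi(r):
    """phi(r) = (1 - r)_+^3 (8 + 9r + r^2), the radial function chosen in the paper."""
    return np.clip(1.0 - r, 0.0, None) ** 3 * (8.0 + 9.0 * r + r * r)


def poly_basis(points):
    """g_1..g_4 = 1, x, y, z evaluated at each point (shape: n x 4)."""
    return np.hstack([np.ones((len(points), 1)), points])


def fit_rbf(src, dst, support):
    """Solve for b (N x 3) and c (4 x 3) such that f(src_j) = dst_j."""
    n = len(src)
    r = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1) / support
    P = poly_basis(src)
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = phi(r)
    A[:n, n:] = P
    A[n:, :n] = P.T  # assumed side conditions: sum_j b_j g_l(p_j) = 0
    rhs = np.vstack([dst, np.zeros((4, 3))])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]


def apply_rbf(points, src, b, c, support):
    """Evaluate f from Eq. (1) at arbitrary model vertices."""
    r = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1) / support
    return phi(r) @ b + poly_basis(points) @ c


rng = np.random.default_rng(0)
features = rng.uniform(-1.0, 1.0, size=(16, 3))  # N = 16 feature points, as for Candide-1
targets = 1.1 * features + np.array([0.0, 0.5, 2.0]) + 0.05 * rng.standard_normal((16, 3))
b, c = fit_rbf(features, targets, support=4.0)
model = rng.uniform(-1.0, 1.0, size=(76, 3))  # stand-in for the remaining model vertices
adapted = apply_rbf(model, features, b, c, support=4.0)
```

With these side conditions, an exactly affine correspondence is reproduced exactly (all $b_j = 0$), so the radial term only models the non-affine part of the deformation.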

Note that after the global adaptation, only the N corresponding points of the generic face model are located on the scan. Since the positions of the other vertices are calculated by interpolating the corresponding points via (1), these vertices may not lie on the 3D scan. Thus, a local adaptation is necessary.

IV. LOCAL ADAPTATION

The process of local adaptation refines the globally adapted model $\bar{M}^m$ to achieve a more precise adaptation of the face model $M^m$ to the 3D scan $M^s$. This local adaptation is achieved by gradually adapting the face model $\bar{M}^m$ to a series of meshes $M^s_1, \ldots, M^s_l, \ldots, M^s_L = M^s$ generated from the 3D scan. Here the mesh $M^s_1$ captures the low-"frequency" part of the scan, while the meshes $M^s_l$ with increasing index $l$ possess more and more detail, leading to the scan $M^s_L = M^s$ itself, the most detailed mesh (Fig. 3). It is assumed that in this series only the vertex positions change, while the connectivity $E^s_l$, which is given by the connectivity $E^s$ of the scan, remains the same for all meshes $M^s_l$. Local adaptation is performed by iterative application of an operator $T$, i.e.,

$$M^m_l := T\big(M^m_{l-1}, M^s_l\big) \quad \text{for } l = 1, \ldots, L, \qquad M^m_0 := \bar{M}^m. \tag{2}$$

After a series of gradual face model adaptations, the final adapted face model $M^m_L$ is obtained. In order to obtain an overall accurate adaptation of the face model to the details of the 3D scan, the gradual adaptations performed by the operator $T$ in (2) have to be accurate. In our framework we employ the nearest-neighbor operator $T_{nn}$. $T_{nn}(M^m_{l-1}, M^s_l)$ updates each vertex $p^m_i$ of the face model $M^m_{l-1}$ with the nearest vertex $p^s_{j_0}$ of the 3D scan $M^s_l$, i.e.,

$$p^m_i := p^s_{j_0} \quad \text{with} \quad p^s_{j_0} = \operatorname*{arg\,min}_{p^s_j \in V^s_l} \big\lVert p^m_i - p^s_j \big\rVert_2 .$$
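A brute-force NumPy sketch of $T_{nn}$, assuming vertex positions are stored as $n \times 3$ arrays; the function name is hypothetical. Only positions change; the connectivity $E^m$ of the model is left untouched. The pairwise distance matrix has $|V^m| \cdot |V^s|$ entries, so for full-resolution scans a spatial index such as SciPy's `cKDTree` would be the practical choice.

```python
import numpy as np


def nearest_neighbor_operator(model_vertices, scan_vertices):
    """T_nn: move every model vertex p_i^m onto its nearest scan vertex p_j0^s."""
    diff = model_vertices[:, None, :] - scan_vertices[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    j0 = np.argmin(sq_dist, axis=1)                 # argmin over V^s for each p_i^m
    return scan_vertices[j0]


scan = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
model = np.array([[0.9, 0.1, 0.0], [0.1, 0.2, 0.0]])
print(nearest_neighbor_operator(model, scan))
```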

The series of meshes $(M^s_l)_{l=1,\ldots,L}$ induces a hierarchy in the details of the 3D scan. Such a hierarchy can be obtained by low-pass filtering. A hierarchy is required in order to allow accurate gradual adaptations while repeatedly applying the operator $T$ in (2); i.e., the face model is successively adapted to the details of the scan from coarse to fine.
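The iteration in Eq. (2) can be sketched as a loop over the scan hierarchy, ordered from the most strongly filtered mesh $M^s_1$ to the original scan $M^s_L$. This minimal sketch assumes each level is given as an $n \times 3$ array of scan vertex positions (all levels share the connectivity $E^s$, so positions suffice); it re-declares a brute-force nearest-neighbor operator so that it runs on its own, and all names are hypothetical.

```python
import numpy as np


def nearest_vertices(model, scan):
    """T_nn: snap each model vertex to its nearest vertex of the given scan level."""
    sq_dist = ((model[:, None, :] - scan[None, :, :]) ** 2).sum(axis=-1)
    return scan[np.argmin(sq_dist, axis=1)]


def local_adaptation(global_model, scan_levels, operator=nearest_vertices):
    """Eq. (2): M_0^m = global model, M_l^m = T(M_{l-1}^m, M_l^s) for l = 1..L."""
    model = np.asarray(global_model, dtype=float)
    for scan in scan_levels:  # scan_levels[0] = M_1^s (coarsest), scan_levels[-1] = M_L^s
        model = operator(model, np.asarray(scan, dtype=float))
    return model


levels = [np.array([[0.9, 0.0, 0.0], [1.1, 0.0, 0.0]]),  # M_1^s (filtered)
          np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])]  # M_2^s = M^s (original)
print(local_adaptation([[1.05, 0.0, 0.0]], levels))
```

Passing the operator as a parameter keeps the loop independent of the choice of $T$.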

Fig. 3. Local adaptation: first the scan $M^s_L$ is low-pass filtered down to $M^s_1$, then the generic face model is iteratively adapted from $M^m_0$ to $M^m_L$.

A. Low-pass filtering

In order to obtain a hierarchy of meshes with increasing detail, low-pass filtering is applied. Several filtering techniques for 2-manifold meshes are known in the literature [13]–[15]. In our mask adaptation framework we employ a filter based on a discretized version $\Delta_d$ of the Laplace operator [13], which is defined as

$$\Delta_d(p_i) = \frac{1}{|N(i)|} \sum_{p_j \in N(i)} \big(p_j - p_i\big).$$

A low-pass filtered version $M_l = H_\lambda(M_{l+1})$ of mesh $M_{l+1}$ is obtained by updating all vertices $\tilde{p}_i$ of $M_l$ according to

$$\tilde{p}_i = p_i + \lambda\, \Delta_d(p_i) \quad \text{with} \quad \tilde{p}_i \in V_l,\ p_i \in V_{l+1} \tag{3}$$

for $\lambda \in [0, 2]$. The parameter $\lambda$ controls the amount of attenuation of the details of the scan; e.g., for $\lambda = 1$, $H_1$ shifts every vertex $p_i$ into the barycenter of its 1-neighborhood. For values of $\lambda$ outside $[0, 2]$, $H_\lambda$ no longer has low-pass filtering properties. Thus, by successive application of the low-pass filter $H_\lambda$ to the 3D scan $M^s$, we obtain a series of meshes $M^s_L, \ldots, M^s_1$ with increasingly attenuated high "frequencies":

$$M^s_l := H_\lambda\big(M^s_{l+1}\big) \quad \text{for } l = 1, \ldots, L-1, \qquad M^s_L := M^s. \tag{4}$$
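A minimal NumPy sketch of the umbrella operator $\Delta_d$, the filter step $H_\lambda$ of Eq. (3), and the series of Eq. (4) for a fixed number of levels $L$. It assumes vertices as an $n \times 3$ array and the 1-neighborhoods $N(i)$ as a list of index sets; the function names are hypothetical. The returned list is ordered coarse to fine, $[M^s_1, \ldots, M^s_L]$.

```python
import numpy as np


def umbrella_laplacian(vertices, neighbors):
    """Delta_d(p_i) = (1/|N(i)|) * sum_{p_j in N(i)} (p_j - p_i); zero for isolated vertices."""
    vertices = np.asarray(vertices, dtype=float)
    lap = np.zeros_like(vertices)
    for i, nbrs in enumerate(neighbors):
        if nbrs:
            lap[i] = vertices[list(nbrs)].mean(axis=0) - vertices[i]
    return lap


def low_pass(vertices, neighbors, lam):
    """H_lambda, Eq. (3): p~_i = p_i + lambda * Delta_d(p_i), intended for lambda in [0, 2]."""
    vertices = np.asarray(vertices, dtype=float)
    return vertices + lam * umbrella_laplacian(vertices, neighbors)


def filter_series(scan_vertices, neighbors, lam, L):
    """Eq. (4): M_L^s = M^s and M_l^s = H_lambda(M_{l+1}^s); returned as [M_1^s, ..., M_L^s]."""
    levels = [np.asarray(scan_vertices, dtype=float)]
    for _ in range(L - 1):
        levels.append(low_pass(levels[-1], neighbors, lam))
    return levels[::-1]


# A vertex raised above four neighbors: with lambda = 1 it moves to their barycenter.
pts = np.array([[0, 0, 1], [1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]], dtype=float)
nbrs = [{1, 2, 3, 4}, {0}, {0}, {0}, {0}]
print(low_pass(pts, nbrs, 1.0)[0])
```

Since all levels share the scan's connectivity, the same `neighbors` list is reused at every step.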

The number of vertices and the connectivity of all low-pass filtered scans remain the same as in the original; vertices are only shifted, as described by Eq. (3). Hence, the mean Euclidean distance between the vertices of two consecutive low-pass filtered scans $M^s_l$ and $M^s_{l+1}$ can easily be calculated. If this distance is smaller than a threshold $d_{th}$, which is defined relative to the length of the bounding-box diagonal of the scan, the low-pass filtering is stopped, which determines the number $L$ of low-pass filtered scans in (4).
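The stopping rule can be sketched as follows. The relative threshold value and the `max_levels` safeguard are hypothetical parameters, and we assume that the first filtered level whose mean displacement falls below $d_{th}$ is discarded rather than kept; the paper specifies neither detail. The filter step of Eq. (3) is repeated inline so that the block runs on its own.

```python
import numpy as np


def build_hierarchy(scan_vertices, neighbors, lam, rel_threshold, max_levels=100):
    """Filter with H_lambda until the mean vertex displacement between two consecutive
    levels drops below d_th = rel_threshold * (bounding-box diagonal of the scan)."""
    current = np.asarray(scan_vertices, dtype=float)
    d_th = rel_threshold * np.linalg.norm(current.max(axis=0) - current.min(axis=0))
    levels = [current]
    while len(levels) < max_levels:
        # Eq. (3): move each vertex by lambda times its umbrella Laplacian
        barycenters = np.array([
            current[list(n)].mean(axis=0) if n else current[i]
            for i, n in enumerate(neighbors)
        ])
        filtered = current + lam * (barycenters - current)
        if np.linalg.norm(filtered - current, axis=1).mean() < d_th:
            break  # consecutive scans barely differ: L is determined
        levels.append(filtered)
        current = filtered
    return levels[::-1]  # [M_1^s, ..., M_L^s], so L = len(result)


# A closed ring of 8 vertices, each connected to its two neighbors.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.stack([np.cos(angles), np.sin(angles), np.zeros(8)], axis=1)
ring_nbrs = [{(i - 1) % 8, (i + 1) % 8} for i in range(8)]
hierarchy = build_hierarchy(ring, ring_nbrs, lam=0.5, rel_threshold=0.01)
print(len(hierarchy))
```

On the ring every filter step shrinks the radius by the same factor, so the displacement decays geometrically and the loop terminates after a dozen levels.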

Equations (3) and (4) are a generalization of the classical diffusion equation $\partial_t \rho = \lambda \cdot \Delta\rho$, which describes the flow of heat in Euclidean space [14]. In fact, the mesh series $M^s_L, \ldots, M^s_1$ represents consecutive states of a discretized diffusion process, in which vertices are shifted (i.e., diffuse) in order to reduce local curvature. In this context, the local adaptation process in Eq. (2) can be interpreted as an approximation of the reverse diffusion $M^s_1, \ldots, M^s_L$ derived from the 3D scan.
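The link to the diffusion equation can be made explicit with an explicit Euler step. Assuming a unit time step, and replacing the continuous Laplacian by $\Delta_d$ and the quantity $\rho$ by the vertex positions, one step of the discretized diffusion is exactly the filter update of Eq. (3):

$$
\frac{\partial \rho}{\partial t} = \lambda\,\Delta \rho
\quad\Longrightarrow\quad
\rho^{(k+1)} \approx \rho^{(k)} + \lambda\,\Delta \rho^{(k)}
\quad\Longrightarrow\quad
p_i^{(k+1)} = p_i^{(k)} + \lambda\,\Delta_d\big(p_i^{(k)}\big),
\qquad p_i^{(k)} \in V^s_{L-k}.
$$

Thus $k$ filter steps turn $M^s_L = M^s$ into $M^s_{L-k}$, and under this reading $\lambda$ plays the role of the diffusion time per step.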

V. ERROR MEASURE

For 1D and 2D signals, a great number of error or distortion measures are known. They range from simple objective measures such as the mean squared error to more elaborate subjective measures that take human perception into account [16]. Error measurement for 3D data sets is much more complex, since comparing two data sets with different numbers of vertices is not straightforward. We choose the Hausdorff distance as an objective measure of the distortion between two 3D data sets [17]. Because it does not require a one-to-one correspondence between vertices, this distance is more appropriate as an error measure than a simple vertex-to-vertex metric.

The distance between a point $p^m_i \in V^m$ of the face model and the scan is defined as

$$d\big(p^m_i, M^s\big) = \min_{p^s_j \in V^s} \big\lVert p^m_i - p^s_j \big\rVert_2 . \tag{5}$$

The root mean square Hausdorff distance between the face model and the scan is defined as

$$d_{RMS}\big(M^m, M^s\big) = \sqrt{\frac{1}{|V^m|} \sum_{p^m_i \in V^m} d\big(p^m_i, M^s\big)^2 } . \tag{6}$$

It is important to note that $d_{RMS}$ is, in general, not symmetric. Hence, the symmetric Hausdorff distance is defined as

$$d_s\big(M^m, M^s\big) = \max\big[d_{RMS}(M^m, M^s),\ d_{RMS}(M^s, M^m)\big]. \tag{7}$$

The symmetric distance $d_s(M^m, M^s)$, which describes the error between the two data sets, is used to evaluate our adaptation algorithm. For this we use the tool 'M.E.S.H.', which is publicly available on the Web at http://mesh.epfl.ch.
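A vertex-based NumPy sketch of Eqs. (5) to (7), assuming both meshes are given as $n \times 3$ vertex arrays; the function names are hypothetical. It follows the definitions above literally (distance to the nearest vertex), whereas M.E.S.H. [17] samples the surfaces and measures point-to-surface distances, so its numbers will generally differ from this sketch.

```python
import numpy as np


def point_to_mesh(a, b):
    """Eq. (5): distance from every vertex of mesh a to its nearest vertex of mesh b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.min(np.einsum("ijk,ijk->ij", diff, diff), axis=1))


def d_rms(a, b):
    """Eq. (6): root mean square of the one-sided distances from a to b."""
    return float(np.sqrt(np.mean(point_to_mesh(a, b) ** 2)))


def d_sym(a, b):
    """Eq. (7): symmetric distance, the larger of the two one-sided RMS values."""
    return max(d_rms(a, b), d_rms(b, a))


model = np.array([[0.0, 0.0, 0.0]])
scan = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
print(d_rms(model, scan), d_rms(scan, model), d_sym(model, scan))
```

The toy example shows why symmetry matters: the model vertex lies on the scan, so $d_{RMS}(M^m, M^s) = 0$, but the scan vertex far from the model makes $d_{RMS}(M^s, M^m) = \sqrt{12.5}$.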

VI. RESULTS

We have implemented the complete system described in this paper and adapted both generic face models to different 3D scans. To evaluate the local adaptation, the symmetric Hausdorff distance (7) between the original 3D scan $M^s_L$ and both the globally adapted face model $\bar{M}^m$ and the fully adapted face model $M^m_L$ is calculated. The resulting distances $d_s(M^m, M^s)$ for the different adaptations are presented for both Candide models in Tables I and II. Not only the original Candide models are investigated, but also the models after one (subdivision 1) and two (subdivision 2) uniform plain subdivisions. The columns of both tables compare the following adaptations: global adaptation, local adaptation without filtering ($\lambda = 0$), and local adaptation with filtering using $\lambda = 0.25$, $\lambda = 0.5$, $\lambda = 0.75$, and $\lambda = 1$.

TABLE I
HAUSDORFF DISTANCE BETWEEN CANDIDE-1 AND 3D SCAN.

| Face model | Global | Local, λ = 0 | λ = 0.25 | λ = 0.5 | λ = 0.75 | λ = 1 |
|---|---|---|---|---|---|---|
| original | 3.238 | 1.452 | 1.362 | 1.316 | 1.313 | 1.338 |
| subdivision 1 | 3.238 | 0.837 | 0.682 | 0.667 | 0.655 | 0.671 |
| subdivision 2 | 3.238 | 0.665 | 0.510 | 0.489 | 0.478 | 0.490 |

TABLE II
HAUSDORFF DISTANCE BETWEEN CANDIDE-3 AND 3D SCAN.

| Face model | Global | Local, λ = 0 | λ = 0.25 | λ = 0.5 | λ = 0.75 | λ = 1 |
|---|---|---|---|---|---|---|
| original | 1.066 | 0.890 | 0.906 | 0.898 | 0.877 | 0.913 |
| subdivision 1 | 1.066 | 0.539 | 0.504 | 0.495 | 0.485 | 0.493 |
| subdivision 2 | 1.066 | 0.445 | 0.388 | 0.381 | 0.382 | 0.387 |

The Hausdorff distance is significantly reduced by the local adaptation with respect to the globally adapted model $\bar{M}^m$. Hence, a precise adaptation requires a local adaptation. Moreover, subdivision also significantly improves the adaptation results; e.g., in Table I the Hausdorff distance is almost halved between the original mesh and the once-subdivided mesh. This is because the original model has too few vertices for a more precise adaptation. Hence, a larger number of vertices describing the geometric shape significantly improves the adaptation.
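The halving can be checked directly from Table I. This short snippet (values copied from the table) divides the locally adapted distances of the once-subdivided Candide-1 model by those of the original model; the ratios range from about 0.50 to 0.58, i.e., the error drops to roughly half for every filter setting.

```python
# Local-adaptation columns of Table I (Candide-1): lambda = 0, 0.25, 0.5, 0.75, 1
original = [1.452, 1.362, 1.316, 1.313, 1.338]
subdivision_1 = [0.837, 0.682, 0.667, 0.655, 0.671]

ratios = [s / o for s, o in zip(subdivision_1, original)]
print([f"{r:.2f}" for r in ratios])
```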

The best adaptation results are obtained with a low-pass filter using approximately $\lambda = 0.75$ for both generic models. This filter seems to reduce local curvature in the way most beneficial for adapting the generic models to a human face. The Hausdorff distance between the Candide-1 model adapted without filtering ($\lambda = 0$) and the one adapted with filtering ($\lambda = 0.75$) is nonzero in areas with small details. There, low-pass filtering improves the adaptation, because the generic model is iteratively adapted first to the most strongly filtered scan and then to progressively less filtered ones.
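Reading off the best filter per row of Tables I and II makes the "approximately" concrete: in five of the six rows the minimum lies at λ = 0.75, and for the twice-subdivided Candide-3 model it lies at λ = 0.5, only 0.001 below the value at λ = 0.75. The following snippet (values copied from the tables) performs this check.

```python
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
local_results = {  # local-adaptation columns of Tables I and II
    "Candide-1 original": [1.452, 1.362, 1.316, 1.313, 1.338],
    "Candide-1 subdivision 1": [0.837, 0.682, 0.667, 0.655, 0.671],
    "Candide-1 subdivision 2": [0.665, 0.510, 0.489, 0.478, 0.490],
    "Candide-3 original": [0.890, 0.906, 0.898, 0.877, 0.913],
    "Candide-3 subdivision 1": [0.539, 0.504, 0.495, 0.485, 0.493],
    "Candide-3 subdivision 2": [0.445, 0.388, 0.381, 0.382, 0.387],
}

# lambda with the smallest Hausdorff distance in each row
best_lambda = {name: lambdas[d.index(min(d))] for name, d in local_results.items()}
for name, lam in best_lambda.items():
    print(f"{name}: best lambda = {lam}")
```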

We tested our adaptation algorithm on other scans and obtained similar results, which lead to the same conclusions.

VII. CONCLUSIONS

For an adaptation system, we discussed the global and local adaptation of a generic face model to a 3D scan. The global adaptation is based on RBFs, which scale and orient the face model's mesh to the 3D scan. For the local adaptation, different low-pass filters were investigated, which filter the original scan before the mesh of the generic face model is iteratively adapted to the scan. We explicitly addressed the problem of adapting generic models as precisely as possible, which previously proposed algorithms did not. The low-pass filter with $\lambda = 0.75$ leads to the smallest Hausdorff distances. Moreover, the number of vertices can be increased by uniform plain subdivision, and face models with a higher number of vertices lead to much better adaptation results. Hence, a precise adaptation is achievable with the introduced algorithm. Furthermore, our algorithm has the advantages that it is simple to implement and requires only low computational effort. The proposed algorithm is not limited to generic face models; arbitrary generic models can be adapted to 3D scans.

VIII. ACKNOWLEDGEMENTS

This work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.

REFERENCES

[1] J. Ahlberg, “Extracting MPEG-4 FAPs from video,” in MPEG-4 Facial Animation: The Standard, Implementation and Applications, I. S. Pandzic and R. Forchheimer, Eds. Chichester, England: Wiley, 2002, pp. 17–56.

[2] F. I. Parke, “Computer generated animation of faces,” Proc. ACM Annual Conf., 1972.

[3] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin, “Synthesizing realistic facial expressions from photographs,” Computer Graphics, vol. 32, no. Annual Conference Series, pp. 75–84, 1998.

[4] G. A. Kalberer and L. Van Gool, “Face animation based on observed 3D speech dynamics,” in Proceedings of Computer Animation (CA2001), November 2001, pp. 20–27.

[5] J. Ostermann, “Face animation in MPEG-4,” in MPEG-4 Facial Animation: The Standard, Implementation and Applications, I. S. Pandzic and R. Forchheimer, Eds. Chichester, England: Wiley, 2002, pp. 17–56.

[6] M. Kampmann and R. Farhoud, “Precise face model adaptation for semantic coding of videophone sequences,” in Picture Coding Symposium (PCS 97), Berlin, Germany, 10–12 Sept. 1997.

[7] J. Ostermann, L. S. Chen, and T. S. Huang, “Animated talking head with personalized 3D head model,” J. VLSI Signal Process. Syst., vol. 20, no. 1-2, pp. 97–105, 1998.

[8] Y. Zhang, T. Sim, and C. L. Tan, “Rapid modeling of 3D faces for animation using an efficient adaptation algorithm,” in GRAPHITE ’04: Proceedings of the 2nd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia. New York, NY, USA: ACM Press, 2004, pp. 173–181.

[9] Q. Wang, H. Zhang, T. Riegel, E. Hundt, and G. Xu, “Creating animatable MPEG4 face,” EUROIMAGE ICAV3D, 2001.

[10] R. Schaback, “Comparison of radial basis function interpolants,” in Multivariate Approximation: From CAGD to Wavelets, K. Jetter and F. I. Utreras, Eds. Singapore: World Scientific, 1993, pp. 293–305.

[11] N. Arad, N. Dyn, D. Reisfeld, and Y. Yeshurun, “Image warping by radial basis functions: applications to facial expressions,” CVGIP: Graphical Models and Image Processing, vol. 56, pp. 161–172, 1994.

[12] Z. Wu, “Multivariate Compactly Supported Positive Definite Radial Functions,” Advances in Computational Mathematics, vol. 4, pp. 283–292, 1995.

[13] G. Taubin, “A signal processing approach to fair surface design,” in Proceedings of the Conference on Computer Graphics (SIGGRAPH-95), R. Cook, Ed. New York: ACM Press, Aug. 6–11 1995, pp. 351–358.

[14] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, “Implicit fairing of irregular meshes using diffusion and curvature flow,” Computer Graphics, vol. 33, no. Annual Conference Series, pp. 317–324, 1999.

[15] H. Yagou, A. Belyaev, and D. Wei, “Mesh median filter for smoothing 3-D polygonal surfaces,” in Proceedings of the First International Symposium on Cyber Worlds (CW’02), 2002, pp. 488–498.

[16] S. Winkler, “Visual fidelity and perceived quality: Towards comprehensive metrics,” in Proc. SPIE Human Vision and Electronic Imaging, vol. 4299, San Jose, California, 2001, pp. 114–125.

[17] N. Aspert, D. Santa-Cruz, and T. Ebrahimi, “MESH: Measuring errors between surfaces using the Hausdorff distance,” in Proceedings of the IEEE International Conference on Multimedia and Expo, vol. I, 2002, pp. 705–708.