Automatic Face Replacement in Video Based on 2D Morphable Model
Feng Min1, Nong Sang2, Zhefu Wang1
1Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, China
2Institute of Pattern Recognition and Artificial Intelligence, Huazhong University of Science and
Technology, Wuhan, China
fmin.lhi@gmail.com,nsang@hust.edu.cn,wzf.wit@gmail.com
Abstract
This paper presents an automatic approach to face
replacement in video based on a 2D morphable model.
Our approach includes three main modules: face
alignment, face morph, and face fusion. Given a
source image and a target video, Active Shape
Models (ASM) are applied to the source image and the
target frames for face alignment. The source face shape
is then warped to match the target face shape by a 2D
morphable model. The color and lighting of the source
face are adjusted to be consistent with those of the
target face, and the source face is seamlessly blended
into the target face. Our approach is fully automatic,
requires no user intervention, and generates natural
and realistic results.
1. Introduction
With the development in digital technology, users
can access a number of videos from the camera or the
Internet. Editing and rewriting digital videos becomes
an interesting and fashionable topic. The ability to
replace an individual’s face in a video with that of
another person has many applications in the
entertainment and special effect industries. For
example, users can become the lead actor in their
favorite movies by replacing the actor’s face with their
own face. Users can perform face replacement in
photographs with image-processing software, but the
manual process is often time-consuming and
labor-intensive, and applying it to every frame of a
video is almost impossible in practice. We believe
that an automatic face replacement approach is an
attractive solution.
There have been a few attempts at face replacement
in images or videos. For example, Blanz et al. [1]
present a morphable model for the synthesis of 3D
faces from one or more photographs. Later they apply
the model to face replacement [2], in which they
replace the face region with the source subject's model
rendered under the pose and lighting parameters of the
target subject. The major drawback of this approach is
that it requires manual initialization in order to obtain
accurate face alignment. Based on Blanz and Vetter's
work [1], Cheng et al. [3] present a system that
replaces the target subject's face in video with the
source subject's face, under similar pose, expression,
and illumination. These methods are based on a 3D
morphable model derived from a dataset of
prototypical 3D scans of faces. In [3], each 3D face
model is represented by 65,536 vertices with textures.
Building such a complex model can be tedious and
time-consuming. Bitouk et al. [4] present a system for
automatic face replacement in images using an
image-based method. They select candidate face images
from a large face library that are similar to the target
face in appearance and pose. However, it is hard for
users to find, among their own images, a candidate face
that matches the target face in appearance and pose.
In this paper, we present an automatic face
replacement approach for videos based on a 2D
morphable model. The face replacement procedure is
fully automatic and requires no user intervention.
Given a source face image and a target video, we use
Active Shape Models (ASM) [5] to align the morphable
model, frame by frame, with the faces in the source
image and in the target frames.
Then we warp the source face shape to match the
target face shape by a 2D morphable model. Finally,
we adjust the lighting and color of the source face
image to be consistent with those of the target image,
and seamlessly blend it into the target face using
Poisson image editing [6].
2010 International Conference on Pattern Recognition
1051-4651/10 $26.00 © 2010 IEEE
DOI 10.1109/ICPR.2010.551
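Figure 1. The overview of our approach. [Flowchart: target video and source image; face alignment (facial features localization, alignment); face morph (morphable model fitting, face warp); face fusion (recolor and relighting, seamless blending); output video.]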
The rest of this paper is organized as follows. The
overview of our method is introduced in Section 2.
The fundamentals of standard ASM are described in
Section 3. The 2D morphable model is presented in
Section 4. The detailed procedure of face replacement
is illuminated in Section 5. Some results are shown in
Section 6. The paper is concluded in Section 7 with a
discussion of future work.
2. Approach overview
The overview of our approach is shown in Figure
1. The input is a target face video and a source face
image, and the output is a composite video in which
the target face is replaced with the source face. Our
approach includes three main modules: face alignment,
face morph, and face fusion. In face alignment, the
ASM algorithm is applied to the target frame and the
source image to locate the detailed facial features, and
the faces are aligned according to these features.
In face morph, a 2D morphable model is fitted to the
target and source faces; by adjusting the parameters of
the morphable model, the shape of the source face is
warped to match that of the target face. In face fusion,
a relighting and recoloring algorithm is applied to the
source face to keep its illumination and color consistent
with the target face, and the source face is seamlessly
blended into the target face using Poisson image
editing [6].
Figure 2. The results of facial feature localization by
ASM on (a) the target image and (b) the source image.
3. Standard ASM
ASM is one of the most popular methods for face
alignment. The advantage of ASM is that it learns the
prior knowledge and variation of objects based on
statistical models. The following is a brief description
of the standard ASM technique.
Given a set of annotated images, the $n$ landmark
points in each image can be represented as a vector

$$X = [x_0, y_0, x_1, y_1, \ldots, x_{n-1}, y_{n-1}]^T \qquad (1)$$
After aligning these vectors into a common
coordinate frame, Principal Component Analysis (PCA) is
applied to obtain a set of orthogonal basis vectors $P$.
Every aligned shape can be approximately represented as

$$X \approx \bar{X} + Pb \qquad (2)$$

where $\bar{X}$ is the mean shape vector and $b$ is the
vector of shape parameters. In addition, the standard
ASM uses a normalized derivative profile to build a local
texture model for each landmark point.
Based on the Point Distribution Model (PDM) [5]
and the local texture model, the search process can be
carried out. After initialization, each landmark point in
the model is updated by selecting, within a certain
range along the direction perpendicular to the contour,
the point with the minimum distance to the local texture
model. Because the resulting shape may be implausible,
the PDM is used to adjust the shape parameters. This
procedure is repeated until no considerable change is
observed. Figure 2 shows the results of facial feature
localization by ASM.
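The shape-adjustment step of the search can be illustrated with a small sketch of Equation (2). It projects a candidate shape onto the PCA basis, limits each shape parameter, and rebuilds the shape. The function name `constrain_shape`, the toy two-landmark model, and the limit of three standard deviations per mode are assumptions made for illustration; the paper does not state which limit it uses.

```python
import math

def constrain_shape(shape, mean, basis, eigenvalues, limit=3.0):
    """Project a shape onto the PCA basis, clamp b, and rebuild the shape."""
    # b = P^T (X - mean); each entry of `basis` is one orthonormal column of P.
    diff = [x - m for x, m in zip(shape, mean)]
    b = [sum(p * d for p, d in zip(vec, diff)) for vec in basis]
    # Assumed plausibility rule: |b_k| <= limit * sqrt(lambda_k).
    clamped = []
    for bk, lam in zip(b, eigenvalues):
        bound = limit * math.sqrt(lam)
        clamped.append(max(-bound, min(bound, bk)))
    # X ~ mean + P b
    rebuilt = list(mean)
    for bk, vec in zip(clamped, basis):
        rebuilt = [r + bk * p for r, p in zip(rebuilt, vec)]
    return rebuilt, clamped

# Toy model: two landmarks (x0, y0, x1, y1) and one mode that moves x1.
mean = [0.0, 0.0, 10.0, 0.0]
basis = [[0.0, 0.0, 1.0, 0.0]]
eigenvalues = [4.0]  # standard deviation of 2 along the mode

# x1 = 30 lies 20 units from the mean, beyond the bound of 3 * 2 = 6.
rebuilt, b = constrain_shape([0.0, 5.0, 30.0, 0.0], mean, basis, eigenvalues)
print(rebuilt, b)
```

In this toy case the offset in `y0` is discarded because no mode of the model explains it, and the displacement of `x1` is pulled back to the bound.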
4. Morphable model
The morphable model is widely applied to face
analysis and synthesis [1]. The geometry of a face is
represented with a shape vector $S$ that contains the
$X$, $Y$ coordinates of its $n$ vertices. The texture of a
face is represented with a texture vector $T$ that
contains the R, G, B color values of the $n$
corresponding vertices. A morphable face model is
then constructed using a set of example faces, each
represented by its shape vector $S_i$ and texture vector
$T_i$. A new shape $S_{mod}$ and a new texture $T_{mod}$
can be expressed in barycentric coordinates as a linear
combination of the shapes and textures of the example
faces.
$$S_{mod} = \sum_{i=1}^{m} a_i S_i, \qquad T_{mod} = \sum_{i=1}^{m} b_i T_i, \qquad \sum_{i=1}^{m} a_i = \sum_{i=1}^{m} b_i = 1 \qquad (3)$$
The morphable model is defined as the set of faces
$(S_{mod}(\vec{a}), T_{mod}(\vec{b}))$, parameterized by the
coefficients $\vec{a} = (a_1, a_2, \ldots, a_m)^T$ and
$\vec{b} = (b_1, b_2, \ldots, b_m)^T$. Arbitrary new
faces can be generated by varying the parameters
$\vec{a}$ and $\vec{b}$ that control shape and texture.
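As a minimal sketch of Equation (3), the snippet below builds a new shape as a barycentric combination of example shapes. The function name `morph` and the toy two-vertex shapes are hypothetical and serve only to show the arithmetic; the same function applies unchanged to texture vectors with the coefficients $b_i$.

```python
def morph(examples, coeffs, tol=1e-9):
    """Return sum_i coeffs[i] * examples[i]; the coefficients must sum to 1."""
    if len(examples) != len(coeffs):
        raise ValueError("one coefficient is needed per example")
    if abs(sum(coeffs) - 1.0) > tol:
        raise ValueError("coefficients must sum to 1")
    length = len(examples[0])
    return [sum(c * ex[k] for c, ex in zip(coeffs, examples))
            for k in range(length)]

# Two toy example shapes, each with two vertices (x0, y0, x1, y1).
S1 = [0.0, 0.0, 4.0, 0.0]
S2 = [0.0, 2.0, 8.0, 2.0]

# A shape a quarter of the way from S2 towards S1.
S_mod = morph([S1, S2], [0.25, 0.75])
print(S_mod)
```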
In this paper, we only require the shape of the source
face to match that of the target face. Given a source shape
$S_{source}(x, y)$, the reconstructed shape should be as
close as possible to the target shape $S_{target}(x, y)$ in
terms of Euclidean distance:

$$E_S = \sum_{x,y} \| S_{source}(x, y) - S_{target}(x, y) \|^2 \qquad (4)$$

By minimizing Equation (4), we obtain the morphed
face of the source image, whose shape is very similar to
the shape of the target face.
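One possible reading of this minimization, offered only as a sketch, is a least-squares fit of the shape coefficients of Equation (3) to the target shape. The paper does not specify its solver, so everything below is an assumption: the helper names `solve` and `fit_coefficients`, the use of normal equations, and the substitution that removes the sum-to-one constraint. The example shapes must be affinely independent, otherwise the linear system is singular.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_coefficients(examples, target):
    """Find a with sum(a) == 1 minimising ||sum_i a_i S_i - target||^2."""
    last = examples[-1]
    # Substitute a_m = 1 - (a_1 + ... + a_{m-1}) to remove the constraint.
    D = [[s - l for s, l in zip(ex, last)] for ex in examples[:-1]]
    r = [t - l for t, l in zip(target, last)]
    # Normal equations: (D D^T) a = D r.
    A = [[sum(p * q for p, q in zip(di, dj)) for dj in D] for di in D]
    y = [sum(p * q for p, q in zip(di, r)) for di in D]
    a = solve(A, y)
    return a + [1.0 - sum(a)]

# Three toy example shapes and a target that is an exact combination of them.
S1 = [0.0, 0.0, 4.0, 0.0]
S2 = [0.0, 2.0, 8.0, 2.0]
S3 = [1.0, 1.0, 5.0, 3.0]
target = [0.2 * p + 0.3 * q + 0.5 * s for p, q, s in zip(S1, S2, S3)]

a = fit_coefficients([S1, S2, S3], target)
print(a)
```

Because the toy target lies exactly in the span of the examples, the fit recovers the mixing weights; for a real target shape the result is the closest shape the model can express.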
5. Face replacement
Figure 3 illustrates the procedure of face
replacement. After facial feature localization, we
obtain the facial area and size of the source and target
faces. We resize the source face to match the size of the
target face, and merge the adjusted source facial area
into the target facial area. The result is shown in
Figure 3(b). Because the source face shape differs from
the target face shape, the result looks artificial, so we
warp the source face shape to match the target face
shape by a 2D morphable model. The result is shown
in Figure 3(c). Because the illumination and skin color
are still inconsistent, the result appears perceptually
incorrect. To improve it, we adopt the Poisson editing
technique [6] to seamlessly blend the morphed face and
the target face, generating the fusion result shown in
Figure 3(d). Our editing problem can be formulated as
follows:
Let $S$ denote the domain of the fusion result image $I$,
let $\Omega$ denote the point set uniformly sampled from
the source image $I_s$, and let $I_t$ denote the target
image. Our objective is to find an interpolation of $I$
over $S \setminus \Omega$ whose gradient field is close
to that of $I_s$ and which satisfies the boundary
condition in $\Omega$:

$$\min \iint_{S \setminus \Omega} \| \nabla I - \nabla I_t \| \, dx \, dy \quad \text{with} \quad K(I)\big|_{\Omega} = I_s\big|_{\Omega} \qquad (5)$$

where $\nabla$ is the gradient operator and $K$ is the
Gaussian degradation operator. The first term requires
the gradient of the resultant image to be consistent
with that of the target image $I_t$, and the second term
is the soft constraint added as the boundary condition.
It forces the resultant image to be equal to $I_s$ when
degraded.

Figure 3. The procedure of face replacement. (a) is the
target face image, (b) is the result of face alignment, (c)
is the result of face morph, (d) is the improved
replacement result with recolor and relighting, (e) is a
mask image which blurs the face boundary, (f) is the
final replacement result which looks natural.
After seamless blending by Poisson editing, the
lighting and color of the source face are consistent
with those of the target face. However, the resulting
face boundary is hard, so the replacement result still
looks slightly artificial. We therefore use a mask image,
shown in Figure 3(e), to blur the face boundary. We
obtain the final replacement result, shown in
Figure 3(f), which looks highly realistic.
The blur algorithm can be formulated as follows:
$$I^p = \alpha I_s^p + (1 - \alpha) I_t^p, \qquad \alpha = I_m^p / 255 \qquad (6)$$

where $I^p$, $I_t^p$, $I_s^p$, and $I_m^p$ denote the color
value of pixel $p$ on the result image $I$, the target
image $I_t$, the source image $I_s$, and the mask image
$I_m$ respectively.
6. Results
To show the effectiveness of our face replacement
approach, we test our method on video clips.
The test video LiuXiang is a video clip that we
downloaded from the Internet. As Figure 4 shows, the
target face is blurry and of low resolution, whereas the
source face, shown in Figure 2(b), is clear and of high
resolution. This makes the test data difficult, because
a blurry target face must be replaced with a clear source
face. Some results generated by our approach are shown
in Figure 4, and they look natural. It takes about twenty
minutes on a 1.6 GHz Core Duo PC with 1 GB of memory
to generate a composite video clip of 100 frames.
7. Conclusions
In this paper, we present a novel approach that
replaces the target face in a video with a source face.
Our approach is fully automatic and requires no user
intervention. By combining face alignment, face
morph, and face fusion, we can replace the face in
videos with color and illumination similar to those of
the target face. The final results look realistic and natural.
Our approach works well in some cases. However,
its tolerance to pose and expression variation is
limited by the robustness of ASM. In addition, harsh
lighting and rapid motion in videos may degrade
the final result. In future work, we plan to make our
approach more robust to pose and expression
variation, improve the accuracy of ASM, and
accelerate the Poisson interpolation algorithm.
Figure 4. The results of face replacement in video. The
first column shows the target frames, and the
second column shows the face replacement results
generated by our approach.
References
[1] V. Blanz, T. Vetter. A morphable model for the synthesis
of 3D faces. In Proceedings of ACM SIGGRAPH, 187-194, 1999
[2] V. Blanz, K. Scherbaum, et al. Exchanging faces in images.
Computer Graphics Forum, 23(3):669-676, 2004
[3] Y. T. Cheng, V. Tzeng, et al. 3D morphable model based
face replacement in video. In Proceedings of ACM
SIGGRAPH, 2009
[4] D. Bitouk, N. Kumar, et al. Face swapping: automatically
replacing faces in photographs. In Proceedings of ACM
SIGGRAPH, 27(3), 2008
[5] T. F. Cootes, C. J. Taylor, et al. Active shape models: their
training and application. Computer Vision and Image
Understanding, 61(1):38-59, 1995
[6] P. Perez, M. Gangnet, et al. Poisson image editing. In
Proceedings of ACM SIGGRAPH, 22(3):313-318, 2003