To appear in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Madison, Wisconsin, June 2003.

A Variational Framework for Image Segmentation

Combining Motion Estimation and Shape Regularization

Daniel Cremers

Department of Computer Science

University of California, Los Angeles – CA 90095

cremers@ucla.edu

Abstract

Based on a geometric interpretation of the optic flow con-

straint equation, we propose a conditional probability on

the spatio-temporal image gradient. We consistently derive

a variational approach for the segmentation of the image

domain into regions of homogeneous motion.

The proposed energy functional extends the Mumford-

Shah functional from gray value segmentation to motion

segmentation. It depends on the spatio-temporal image gra-

dient calculated from only two consecutive images of an im-

age sequence. Moreover, it depends on motion vectors for

a set of regions and a boundary separating these regions.

In contrast to most alternative approaches, the problems

of motion estimation and motion segmentation are jointly

solved by minimizing a single functional.

Numerical evaluation with both explicit and implicit

(level set based) representations of the boundary shows the

strengths and limitations of our approach.

1. Introduction and Related Work

The estimation of motion from image sequences has a long

tradition in computer vision. In recent years, many ap-

proaches have been proposed to segment the image plane

on the basis of this motion information. The fields of im-

age sequence analysis and video compression provide nu-

merous applications. In some approaches, motion discon-

tinuities are modeled implicitly [1, 10, 9, 15]. Other ap-

proaches treat the problems of motion estimation in disjoint

sets and optimization of the motion boundaries separately

[14, 2, 12, 13, 7].

In [5], we presented a variational approach to motion

segmentation with an explicit contour where both the mo-

tion estimation and the boundary optimization are derived

from minimizing a single energy functional. Yet, this ap-

proach had an important drawback: Satisfactory results

were only obtained upon applying two posterior normaliza-

tions to the terms driving the evolution of the motion bound-

ary.

In the present paper, we derive a novel variational formu-

lation for segmenting the image plane into regions of homo-

geneous motion. It is based on a simple probabilistic model

for the spatio-temporal image gradient determined from two

consecutive images of a sequence. We show that local min-

imization of an appropriate energy functional leads to an

eigenvalue problem for the motion parameters and to a gra-

dient descent evolution for the motion boundaries. In con-

trast to our previous approach, all normalizations comprised

in the evolution equation are derived in a consistent manner

by minimizing the proposed functional.

We present numerical results for two implementations of

the functional: one with an explicit spline based representa-

tion of the contour, and one with an implicit level set based

representation. In particular, these results cover the cases of

a moving object on a moving background and of multiple

moving regions.

2. From the Optic Flow Constraint ...

Let $f : \Omega \times \mathbb{R}^+ \to \mathbb{R}^+$ be a gray value image sequence.

We assume the intensity of a moving point to be constant

throughout time. This induces the optic flow constraint

equation:

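$$\frac{d}{dt} f(x,t) = \frac{\partial f}{\partial x}\, u + \frac{\partial f}{\partial y}\, w + \frac{\partial f}{\partial t} = 0, \tag{1}$$

where $(u, w)^t$ is the local velocity vector. Geometrically, this constraint states that the spatio-temporal image gradient

$$\nabla_3 f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial t} \right)^t \tag{2}$$

must either vanish or be orthogonal to the homogeneous velocity vector $v = (u, w, 1)^t$:

$$\nabla_3 f^t \, v = 0. \tag{3}$$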
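As a quick numerical check of the orthogonality statement in (3), the following sketch uses a hypothetical translating pattern f(x, y, t) = sin(a(x - ut) + b(y - wt)). The pattern, the constants a and b, and the function names are assumptions chosen for illustration only; they are not part of the proposed method.

```python
import math

def spatio_temporal_gradient(x, y, t, u, w, a=0.7, b=0.3):
    """Analytic gradient (f_x, f_y, f_t) of f = sin(a*(x - u*t) + b*(y - w*t))."""
    phase = a * (x - u * t) + b * (y - w * t)
    c = math.cos(phase)
    return (a * c, b * c, -(a * u + b * w) * c)

def constraint_residual(grad, u, w):
    """Left-hand side of (3): inner product of the gradient with v = (u, w, 1)."""
    return grad[0] * u + grad[1] * w + grad[2]

# The pattern translates with velocity (u, w), so the residual should vanish.
u, w = 1.5, -0.5
g = spatio_temporal_gradient(2.0, 1.0, 0.25, u, w)
residual = constraint_residual(g, u, w)
```

For the true velocity the residual vanishes up to rounding, whereas a wrong velocity hypothesis such as (0, 0) generally leaves a non-zero residual.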

This constraint has been employed in many motion estima-

tion approaches. Commonly — for example in the seminal

work of Horn and Schunck [8] and many subsequent works

— the square of this constraint is used as a fidelity term.

In contrast, we suggest using the angle α between the two

vectors as a measure of the orthogonality.

To this end, let x be a point with spatio-temporal deriva-

tive ∇3f, and let R be a region of the image with homo-

geneous velocity v. Then we model the probability that the

point x is part of the region R by:

$$P(\nabla_3 f \mid v) \propto e^{-\cos^2(\alpha)} = \exp\left( -\frac{(v^t \nabla_3 f)^2}{|v|^2 \, |\nabla_3 f|^2} \right). \tag{4}$$

This probability has the following favorable properties:

• It is maximal if the vectors v and ∇3f are orthogonal.

• It is minimal if the two vectors are parallel.

• It only depends on the angle between the spatio-

temporal image gradient and the homogeneous veloc-

ity vector, and not on the magnitude of these vectors.
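The three properties listed above can be checked numerically. The following is a minimal sketch of the unnormalized model (4); the function name and the example vectors are hypothetical.

```python
import math

def motion_likelihood(grad, v):
    """Unnormalized P(grad | v) = exp(-cos^2(alpha)) from (4)."""
    dot = sum(g * c for g, c in zip(grad, v))
    norm2 = sum(g * g for g in grad) * sum(c * c for c in v)
    if norm2 == 0.0:
        raise ValueError("both vectors must be non-zero")
    return math.exp(-(dot * dot) / norm2)

v = (1.0, 2.0, 1.0)                                     # homogeneous velocity (u, w, 1)
p_orthogonal = motion_likelihood((2.0, -1.0, 0.0), v)   # gradient orthogonal to v
p_parallel = motion_likelihood((3.0, 6.0, 3.0), v)      # gradient parallel to v
p_rescaled = motion_likelihood((20.0, -10.0, 0.0), v)   # orthogonal gradient, 10x magnitude
```

Here the orthogonal gradient attains the maximal value 1, the parallel gradient attains the minimal value exp(-1), and rescaling the gradient leaves the value unchanged.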

3. ... to Motion Segmentation

Based on the probability measure in equation (4), we will

now define a variational approach for motion segmenta-

tion. We suggest to segment the image plane Ω into a set

of pairwise disjoint regions $R_i$ of homogeneous velocity $v_i$ by minimizing the functional

$$E(\{v_i\}, C) = \sum_i \int_{R_i} \bigl( -\log P(\nabla_3 f \mid v_i) \bigr)\, dx + \nu L(C) \tag{5}$$

simultaneously with respect to the motion vectors $v_i$ of each region $R_i$ and with respect to the motion boundary $C$ separating these regions. The term $L(C)$ denotes the length of this boundary.

Inserting the model (4), we get (up to a constant):

$$E(\{v_i\}, C) = \sum_i \int_{R_i} \frac{|v_i^t \nabla_3 f|^2}{|v_i|^2 \, |\nabla_3 f|^2}\, dx + \nu L(C). \tag{6}$$

This functional (6) has the simple form

$$E(\{v_i\}, C) = \sum_i \frac{v_i^t M_i v_i}{|v_i|^2} + \nu L(C), \tag{7}$$

where

$$M_i = \int_{R_i} \frac{\nabla_3 f \, \nabla_3 f^t}{|\nabla_3 f|^2}\, dx. \tag{8}$$

In numerical implementations, we replace the term $|\nabla_3 f|^2$ in the denominator by $|\nabla_3 f|^2 + \epsilon^2$ with a small constant $\epsilon$. This guarantees that the matrix $M_i$ is always well defined. Moreover, points with very small spatio-temporal gradient (in the order of $\epsilon$), i.e. weak motion information, will contribute less to the segmentation process.

The functional (6) can be considered a generalization of the Mumford–Shah functional [11, 16] from gray value segmentation to motion segmentation.
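The step from (6) to (7) rests on rewriting the squared inner product as a quadratic form and on the fact that $v_i$ is constant within the region $R_i$. Written out for a single region, a sketch of this intermediate step reads:

```latex
\begin{aligned}
\int_{R_i} \frac{|v_i^t \nabla_3 f|^2}{|v_i|^2 \, |\nabla_3 f|^2}\, dx
  &= \int_{R_i} \frac{v_i^t \, \nabla_3 f \, \nabla_3 f^t \, v_i}{|v_i|^2 \, |\nabla_3 f|^2}\, dx \\
  &= \frac{1}{|v_i|^2}\; v_i^t \left( \int_{R_i} \frac{\nabla_3 f \, \nabla_3 f^t}{|\nabla_3 f|^2}\, dx \right) v_i
   = \frac{v_i^t M_i v_i}{|v_i|^2}.
\end{aligned}
```

The matrix in parentheses is exactly $M_i$ from (8), so the data term of each region is a Rayleigh quotient in $v_i$.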

4. Energy Minimization

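Given two consecutive images $f_1$ and $f_2$ from an image sequence, we approximate the spatio-temporal gradient by:

$$\nabla_3 f \approx \begin{pmatrix} \nabla \dfrac{f_1 + f_2}{2} \\[4pt] f_2 - f_1 \end{pmatrix}. \tag{9}$$

From this, we determine the matrix

$$M = \frac{\nabla_3 f \, \nabla_3 f^t}{|\nabla_3 f|^2} \tag{10}$$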

for each point in the image plane.
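The approximation (9) and the per-pixel matrix (10) can be sketched as follows. The use of central differences for the spatial derivatives, the helper names, and the synthetic test frames are assumptions made for illustration; the paper does not specify the discretization.

```python
def gradient3(f1, f2, r, c):
    """Approximate (f_x, f_y, f_t) at interior pixel (r, c) following (9)."""
    def avg(i, j):
        # Average of the two frames, on which the spatial gradient is taken.
        return 0.5 * (f1[i][j] + f2[i][j])
    fx = 0.5 * (avg(r, c + 1) - avg(r, c - 1))   # central difference along columns (x)
    fy = 0.5 * (avg(r + 1, c) - avg(r - 1, c))   # central difference along rows (y)
    ft = f2[r][c] - f1[r][c]                     # temporal difference
    return (fx, fy, ft)

def local_matrix(g, eps=1e-3):
    """Per-pixel matrix M from (10), with |grad|^2 replaced by |grad|^2 + eps^2."""
    denom = sum(x * x for x in g) + eps * eps
    return [[a * b / denom for b in g] for a in g]

# Hypothetical test frames: a linear ramp shifted by one pixel in x.
f1 = [[float(c) for c in range(5)] for _ in range(5)]
f2 = [[float(c - 1) for c in range(5)] for _ in range(5)]
g = gradient3(f1, f2, 2, 2)
M = local_matrix(g)
```

For this ramp the gradient is (1, 0, -1), which is orthogonal to the homogeneous velocity (1, 0, 1) of a one-pixel shift in x.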

Given an initial contour $C$, we minimize the energy (6) by alternating the two fractional steps of updating the motion parameters $v_i$ for the regions $R_i$, and of iterating a gradient descent evolution for the boundary $C$ separating these regions. This will be detailed in the following.

4.1. An Eigenvalue Problem

For a fixed boundary C, minimization of the functional (6)

results in the eigenvalue problem

$$v_i = \arg\min_{v} \frac{v^t M_i v}{v^t v}. \tag{11}$$

The homogeneous velocity vector $v_i$ for each region $R_i$ is therefore given by the eigenvector corresponding to the smallest eigenvalue of the $3 \times 3$ matrix $M_i$ defined in (8). It is normalized such that the third component is 1.
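The eigenvalue problem (11) can be illustrated on synthetic data. The sketch below is dependency-free and therefore swaps in a simple power iteration on trace(M)·I − M to find the eigenvector of the smallest eigenvalue; a practical implementation would more likely call a library eigensolver. All function names and the sample gradients are hypothetical, and the region is assumed to contain non-zero gradients and a velocity with non-zero third component.

```python
def region_matrix(gradients, eps=1e-3):
    """Sum of g g^t / (|g|^2 + eps^2) over a region: a discrete stand-in for (8)."""
    M = [[0.0] * 3 for _ in range(3)]
    for g in gradients:
        denom = sum(x * x for x in g) + eps * eps
        for i in range(3):
            for j in range(3):
                M[i][j] += g[i] * g[j] / denom
    return M

def smallest_eigenvector(M, iterations=200):
    """Eigenvector of a symmetric positive semi-definite 3x3 matrix for its smallest eigenvalue.

    Power iteration on B = trace(M) * I - M: the dominant eigenvector of B
    is the eigenvector of M belonging to the smallest eigenvalue.
    """
    tr = M[0][0] + M[1][1] + M[2][2]
    B = [[(tr if i == j else 0.0) - M[i][j] for j in range(3)] for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("region carries no gradient information")
        v = [x / norm for x in w]
    return v

def estimate_velocity(gradients):
    """Return (u, w) after normalizing the third component of the eigenvector to 1."""
    v = smallest_eigenvector(region_matrix(gradients))
    return (v[0] / v[2], v[1] / v[2])

# Hypothetical gradients, all orthogonal to the homogeneous velocity (1, 2, 1).
gradients = [(2.0, -1.0, 0.0), (1.0, 0.0, -1.0), (0.0, 1.0, -2.0)]
u, w = estimate_velocity(gradients)
```

Since every sample gradient is orthogonal to (1, 2, 1), the recovered velocity is (u, w) = (1, 2) up to rounding.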

4.2. Motion Competition

For fixed velocity vectors $v_i$, a gradient descent on the energy (6) for the boundary $C$ results in the evolution equation:

$$\frac{dC}{dt} = -\frac{dE}{dC} = (e_j - e_k) \cdot n - \nu \frac{dL(C)}{dC}, \tag{12}$$

where $n$ denotes the normal vector on the boundary, the indices ‘k’ and ‘j’ refer to the regions adjoining the contour, and

$$e_i = \frac{v_i^t M v_i}{v_i^t v_i} \tag{13}$$

is an energy density.

The two terms in the contour evolution (12) have the fol-

lowing intuitive interpretation:

• The first term is proportional to the difference of the

energy densities $e_i$ in the regions neighboring the

boundary. The neighboring regions compete for the

boundary in terms of their motion energy, thereby

maximizing the motion homogeneity. For this reason

we refer to this process as motion competition.

• The second term minimizes the length L of the sepa-

rating motion boundary.
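The competition term can be made concrete at a single boundary pixel. The sketch below evaluates the energy density (13) for two velocity hypotheses, using the ε-regularized denominator described in Section 3; the function names and the numbers are hypothetical. The sign of the difference only determines the direction of motion along the normal once an orientation of $n$ has been fixed, which is an assumption left open here.

```python
def energy_density(grad, v, eps=1e-3):
    """e_i from (13) at one pixel, with the eps-regularized denominator."""
    dot = sum(g * c for g, c in zip(grad, v))
    v_norm2 = sum(c * c for c in v)
    g_norm2 = sum(g * g for g in grad) + eps * eps
    return dot * dot / (v_norm2 * g_norm2)

def competition_speed(grad, v_j, v_k, eps=1e-3):
    """Data term e_j - e_k of (12) at a boundary pixel."""
    return energy_density(grad, v_j, eps) - energy_density(grad, v_k, eps)

v_j = (1.0, 0.0, 1.0)     # velocity hypothesis of region j
v_k = (0.0, 2.0, 1.0)     # velocity hypothesis of region k
grad = (1.0, 0.0, -1.0)   # spatio-temporal gradient at the boundary pixel
speed = competition_speed(grad, v_j, v_k)
```

The gradient is orthogonal to `v_j`, so region j explains this pixel with zero energy, while region k incurs a density of roughly 0.1; the negative difference reflects that the pixel is better described by region j.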

5. A Spline Based Implementation

In this section, we propose an implementation of the con-

tour evolution (12) with an explicit closed spline curve:

$$C : [0,1] \times \mathbb{R}^+ \to \Omega, \quad C(s,t) = \sum_{n=1}^{N} p_n(t)\, B_n(s), \tag{14}$$

with quadratic periodic B-spline basis functions $B_n$ and control points $p_n = (x_n, y_n)^t$.

One difficulty of explicit contour parameterizations is

the fact that control points may cluster in one point. This

causes the normal vector to become ill-defined and consequently the evolution along the normal becomes unstable. To prevent this behaviour, we use the length measure $L(C) = \int_0^1 \left( \frac{\partial C}{\partial s} \right)^2 ds$. As discussed in [6], minimizing this constraint enforces an equidistant spacing of control points which strongly improves the numerical stability. The contour evolution then reads:

$$\frac{\partial C}{\partial t} = (e_j - e_k) \cdot n - \nu \frac{\partial^2 C}{\partial s^2}. \tag{15}$$

By inserting the spline definition (14), this equation is easily converted to an evolution for the control points $p_n(t)$.

In practice, we iterate this gradient descent for the control points $p_n(t)$, and update in alternation the motion energy densities $e_i$ according to (13) and (11).
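To make the contour representation (14) concrete, the following sketch evaluates a closed curve from its control points using uniform quadratic B-spline weights. The uniform knot spacing, the parameterization of s over N equal segments, and the function names are assumptions; the paper only states that the basis is quadratic and periodic.

```python
def quadratic_bspline_weights(u):
    """Weights of the three active uniform quadratic B-spline basis functions at u in [0, 1]."""
    return (0.5 * (1.0 - u) ** 2,
            0.5 * (-2.0 * u * u + 2.0 * u + 1.0),
            0.5 * u * u)

def closed_spline_point(control_points, s):
    """Evaluate C(s) on a closed curve, as in (14), with s taken modulo 1."""
    n = len(control_points)
    pos = (s % 1.0) * n
    seg = int(pos)            # index of the active spline segment
    u = pos - seg             # local parameter within that segment
    x = y = 0.0
    for k, wgt in enumerate(quadratic_bspline_weights(u)):
        px, py = control_points[(seg + k) % n]   # periodic indexing closes the curve
        x += wgt * px
        y += wgt * py
    return (x, y)

# Hypothetical control polygon: a square with side length 2.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
start = closed_spline_point(square, 0.0)
```

Because the curve depends linearly on the control points, an evolution of the curve such as (15) translates into a linear system for the control point velocities.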

6. A Level Set Implementation

The explicit contour implementation presented in the previ-

ous section is quite fast, because the contour evolution only

involves the update of a small number of control point co-

ordinates. Yet, explicit contour representations have certain

disadvantages. Firstly, one needs to take care of a regrid-

ding of control points which are not intrinsic to the contour.

And secondly, the contour topology is fixed, such that no

contour splitting or merging is possible unless it is modeled

explicitly by some (inevitably) heuristic method.

We therefore propose an implicit level set based imple-

mentation of the functional (6). It is based on the analogous

gray value approach proposed in [3]. The idea of all level

set contour descriptions is to define the contour C as the

zero level set of a function φ : Ω → R:

C = {x ∈ Ω | φ(x) = 0}.

(16)

Using the Heaviside step function

$$H(\phi) = \begin{cases} 1 & \text{if } \phi \ge 0 \\ 0 & \text{if } \phi < 0 \end{cases}, \tag{17}$$

we can embed the motion energy (6) by the two-phase func-

tional:

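$$E(v_1, v_2, \phi) = \int_\Omega \frac{|v_1^t \nabla_3 f|^2}{|v_1|^2 \, |\nabla_3 f|^2}\, H(\phi)\, dx + \int_\Omega \frac{|v_2^t \nabla_3 f|^2}{|v_2|^2 \, |\nabla_3 f|^2}\, \bigl( 1 - H(\phi) \bigr)\, dx + \nu \int_\Omega \bigl| \nabla H(\phi) \bigr|\, dx. \tag{18}$$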

The contour evolution (12) then corresponds to the gradient

descent evolution on the embedding function φ:

$$\frac{\partial \phi}{\partial t} = \delta(\phi) \left[ \nu \, \mathrm{div}\!\left( \frac{\nabla \phi}{|\nabla \phi|} \right) + e_2 - e_1 \right], \tag{19}$$

with $e_i$ as defined in (13). In numerical implementations, we use for the delta function $\delta(\phi) = \frac{d}{d\phi} H(\phi)$ a smoothed approximation of finite width $\tau$, as suggested in [3]. Depending on the size of $\tau$, this permits the detection of interior contours, as shown in the results of Figure 5.

Minimization with respect to the motion parameters $v_1$ and $v_2$ will again lead to an eigenvalue problem of the form (11) for the $3 \times 3$ matrices

$$M_1 = \int_\Omega \frac{\nabla_3 f \, \nabla_3 f^t}{|\nabla_3 f|^2}\, H(\phi)\, dx, \qquad M_2 = \int_\Omega \frac{\nabla_3 f \, \nabla_3 f^t}{|\nabla_3 f|^2}\, \bar{H}(\phi)\, dx,$$

where $\bar{H} = 1 - H$. As in the explicit scheme, we minimize the functional (18) by alternating the update of the motion vectors $v_i$ with the iteration of the contour evolution (19).
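A pointwise sketch of the level set update (19) is given below. It assumes the arctan-based smoothing of the Heaviside function that is common in level set segmentation; whether this is exactly the form used in [3] is an assumption, as are the function names, the explicit Euler step, and the parameter values. The curvature term div(∇φ/|∇φ|) is passed in as a precomputed number rather than discretized here.

```python
import math

def heaviside_smooth(phi, tau=1.0):
    """Smoothed Heaviside function of width tau (assumed arctan form)."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(phi / tau))

def delta_smooth(phi, tau=1.0):
    """Derivative of heaviside_smooth with respect to phi."""
    return (tau / math.pi) / (tau * tau + phi * phi)

def phi_update(phi, curvature, e1, e2, nu=0.1, tau=1.0, dt=0.5):
    """One explicit Euler step of (19) at a single pixel."""
    return phi + dt * delta_smooth(phi, tau) * (nu * curvature + e2 - e1)

# A pixel on the contour (phi = 0) that fits hypothesis 1 better (e1 < e2).
phi_new = phi_update(0.0, curvature=0.0, e1=0.2, e2=0.7)
```

Since the smoothed delta function is non-zero everywhere, pixels far from the current contour are also updated, which is what allows new interior contours to emerge for sufficiently large τ.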

7. Numerical Results

7.1. Segmenting Multiple Motion

To evaluate the explicit scheme introduced in Section 5, we

used the Avengers sequence.¹ A moving car is captured by

a moving camera.

Figure 1 shows segmentation results obtained on frames

18 through 34. We fixed an initial contour (left), deter-

mined the spatio-temporal derivative for the given pair of

consecutive frames and iterated the minimization of energy

(6), alternating the motion estimation (11) and the contour

evolution (15). Despite the model hypothesis of constant

motion per region, the segmentation is fairly robust to non-translatory motion. Once the car starts turning, the segmentation slowly degrades (see the last image in Figure 1).

Minimizing energy (6) simultaneously generates a seg-

mentation of the image plane and an estimate for the mo-

tion in the separate regions. The motion estimated for the

first two frames in the sequence is shown in Figure 3. Both the car and the background are moving, with velocities of different direction and magnitude.

¹ We thank P. Bouthemy and his group for providing us with the image data from the Avengers sequence.

Figure 1: Segmentation of multiple motion for the frames 18–34 from the Avengers sequence: Contour evolution for the functional (6) with an explicit contour initialized as shown in the top left image. The first two images show the evolution of the contour for the first pair of frames, the following images show the segmentation obtained for consecutive frames. Both the car and the background are moving. Despite the model hypothesis of constant motion per region, the segmentation is fairly robust to non-translatory motion and only slowly degrades once the car starts moving perpendicular to the viewing plane (right).

Figure 2: Motion segmentation with an explicit contour for the frames 35–45 from the Avengers sequence. The contour is initialized as shown on the left, then the minimization of (6) is iterated a fixed number of steps on each pair of consecutive frames (the first two images showing frame 35). The comoving shadow is initially associated with the car, but attributed to the background later on. Indeed, due to its semi-transparency, it is unclear whether the shadow is part of the car or not. There is no hypothesis of motion continuity, therefore our approach can also be used for estimating temporally discontinuous motion and for tracking.

Figure 3: Motion estimate generated by minimizing energy (6) for the first two frames from Figure 1. Both car and background move at different velocities (cf. Fig. 1, 2nd image).

Figure 2 shows similar results for the frames 35–45 of

the Avengers sequence. The proposed method always uses

only two consecutive frames. Although using more than

two frames has been shown to stabilize the problem of mo-

tion estimation, we believe that using only two frames has

several advantages, in particular:

• No hypothesis is made on temporal continuity of the

motion. Therefore temporally discontinuous motion

can be estimated and segmented as well.

• Because only two consecutive frames are used, motion estimation reduces to a simple eigenvalue problem and the contour evolution to an update of a few control points. Therefore the proposed method is amenable to real-time implementations for online tracking.

7.2. Segmenting Multiply Connected Regions

To evaluate the implicit scheme introduced in Section 6, we

used two consecutive frames from a sequence showing a

moving object which is not simply connected: A roll of

scotch tape is moving on a newspaper.

Figure 4 shows the initial contour and the contour evolu-

tion obtained by minimizing energy (18) which amounts to

alternating the gradient descent (19) and the motion param-

eter update (11). The images in the top row show one of the

two consecutive frames with the evolving contour and the

estimated motion superimposed.

The figures in the bottom row show the corresponding

evolution of the embedding surface φ, underlying the con-

tour evolution. It explains the change of contour topology

from the fourth to the fifth image.

Figure 5 shows the same segmentation process for a different initialization. These images demonstrate that the contour converges over fairly large distances. Moreover, our

numerical scheme is capable of detecting interior motion

boundaries.

Figure 4: Level set motion segmentation for the energy (18). Top row: One of the two input images (showing a scotch tape

moving on a newspaper) with the evolving contour and the estimated motion superimposed. Note that the object of interest is

hardly distinguishable from the background on the basis of its intensity. Yet, the minimization of the energy (18) generates both

a segmentation of the image plane and an estimate of the motion in each region. Bottom row: Corresponding evolution of the

embedding surface φ. The evolving surface induces a change of the contour topology from the fourth to the fifth image. Moreover,

the embedding surface is less negative in regions of weak gray value structure because these are less easily ascribed to one or the

other motion hypothesis.

Figure 5: Detecting interior motion boundaries. Top row: Evolving motion boundary and estimated motion superimposed on the

first of the two input images. Bottom row: Corresponding evolution of the embedding level set function φ. The transition from the

third to the fourth image illustrates the process of detecting interior motion boundaries.

7.3. Segmenting Several Moving Objects

The following example presents an application of the level

set framework (18) in a real-world traffic scene showing

several moving objects with a differently moving background. We used two consecutive images from a sequence recorded by D. Koller and H. Nagel (KOGS/IAKS, University of Karlsruhe).² The sequence shows several cars moving in the same direction, filmed by a static camera. In order to increase the complexity of the scene, we artificially induced a background motion by shifting one of the two frames, thereby simulating a moving camera.

The images in Figure 6, top row, show the contour evolution with the corresponding motion estimates superimposed on one of the two frames. The bottom row shows the evolution of the underlying level set function. Due to the level set representation, the boundary can undergo topological changes such as the split from the third to the fourth frame. Therefore this framework permits segmenting multiple moving objects against a differently moving background.

² http://i21www.ira.uka.de/image_sequences/

8. Summary and Conclusions

We presented a probabilistic approach to the problem of

segmenting images on the basis of motion information.

Starting from a geometric interpretation of the well-known

optic flow constraint, we proposed a conditional probability

on the spatio-temporal image gradient at a point x, given the

velocity v. From this probability model we derived a novel

variational framework for segmenting the image plane into

regions of homogeneous motion.

We showed that minimizing the proposed energy leads

to an eigenvalue problem for the motion parameters and

to an evolution for the separating motion boundary. We

demonstrated the generality of our approach by detailing

two implementations of this functional — one with an ex-

plicit spline representation of the motion boundary, and one

with an implicit level set based representation.

Numerical results on real-world image sequences

demonstrate the capacity of our approach to segment mul-

tiply moving regions (moving cars captured by a mov-

ing camera), and to segment multiply connected moving
