
Review of Statistical Approaches to Level Set Segmentation:

Integrating Color, Texture, Motion and Shape

Daniel Cremers and Mikael Rousson

Siemens Corporate Research

755 College Road East, Princeton, NJ 08540

{daniel.cremers,mikael.rousson}@siemens.com

Rachid Deriche

INRIA, Sophia-Antipolis, France

{rachid.deriche}@sophia.inria.fr

Abstract. Since their introduction as a means of front propagation and their first

application to edge-based segmentation in the early 1990s, level set methods have

become increasingly popular as a general framework for image segmentation. In this

paper, we present a survey of a specific class of region-based level set segmenta-

tion methods and clarify how they can all be derived from a common statistical

framework.

Region-based segmentation schemes aim at partitioning the image domain by

progressively fitting statistical models to the intensity, color, texture or motion in

each of a set of regions. In contrast to edge-based schemes such as the classical

Snakes, region-based methods tend to be less sensitive to noise. For typical images,

the respective cost functionals tend to have fewer local minima, which makes them particularly well-suited for local optimization methods such as the level set method.

We detail a general statistical formulation for level set segmentation. Subse-

quently, we clarify how the integration of various low level criteria leads to a set of

cost functionals and point out relations between the different segmentation schemes.

In experimental results, we demonstrate how the level set function is driven to

partition the image plane into domains of coherent color, texture, dynamic texture

or motion. Moreover, the Bayesian formulation allows prior shape knowledge to be introduced into the level set method. We briefly review a number of advances in

this domain.

Keywords: Image segmentation, level set methods, Bayesian inference, color, tex-

ture, motion

1. Introduction

The goal of image segmentation is to partition the image plane into

meaningful areas, where meaningful typically refers to a separation of

areas corresponding to different objects in the observed scene from the

area corresponding to the background.

A large variety of segmentation algorithms have been proposed over

the last few decades. While earlier approaches were often based on a

set of rather heuristic processing steps (cf. [69]), optimization methods

© 2005 Kluwer Academic Publishers. Printed in the Netherlands.

IJCV.tex; 19/09/2005; 19:10; p.1


have become established as more principled and transparent methods:

Segmentations of a given image are obtained by minimizing appropri-

ate cost functionals. Among optimization methods, one can distinguish

between spatially discrete and spatially continuous representations.

In spatially discrete approaches, the pixels of the image are usu-

ally considered as the nodes of a graph, and the aim of segmentation

is to find cuts of this graph which have a minimal cost. Optimiza-

tion algorithms for these problems include greedy approaches such

as the Iterated Conditional Modes (ICM) [2] and continuation meth-

ods such as Simulated Annealing [35] or Graduated Non-convexity [5].

Specific classes of graph cut approaches gained in popularity with the

re-discovery of efficient global optimization methods, which are based

on concepts of dynamic programming [6], on spectral methods [82, 56]

or on semidefinite programming techniques [45].

In spatially continuous approaches, the segmentation of the image plane Ω ⊂ ℝ² is considered as a problem of infinite-dimensional optimization. Using variational methods, one computes segmentations of a given image I : Ω → ℝ by evolving contours in the direction of the

negative energy gradient using appropriate partial differential equations

(pdes). Such pde-based segmentation methods became popular with the

seminal paper on Snakes by Kass et al. [44]. In that work, the contour is represented by an explicit (parametric) curve C : [0,1] → Ω which

is evolved by locally minimizing the functional

E(C) = −∫ |∇I(C(s))|² ds + ν₁ ∫ |C_s|² ds + ν₂ ∫ |C_ss|² ds,   (1)

where C_s and C_ss denote the first and second derivative with respect to the curve parameter s. The first term in (1) is the external energy which accounts for the image information, in the sense that the minimizing contour will favor locations of large image gradient. The last two terms – weighted by nonnegative parameters ν₁ and ν₂ – can be interpreted as an internal energy of the contour, measuring the length of the contour and its stiffness or rigidity.¹

The Snakes approach had an enormous impact in the segmentation community (with over 3000 citations to date). Yet, it suffers from several drawbacks:

1. The implementation of contour evolutions based on an explicit pa-

rameterization requires a delicate regriding (or reparameterization)

process to avoid self-intersection and overlap of control or marker

points.

¹ From a survey of a number of related publications and from our personal experience, it appears that the rigidity term is not particularly important, so that one commonly sets ν₂ = 0.


2. The explicit representation by default does not allow the evolving contour to undergo topological changes, so that the segmentation of several objects or of multiply connected objects is not straightforward.²

3. The segmentations obtained by a local optimization method are

bound to depend on the initialization. The Snake algorithm is

known to be quite sensitive to the initialization. For many realistic

images, the segmentation algorithm tends to get stuck in undesired

local minima – in particular in the presence of noise.

4. The Snakes approach lacks a meaningful probabilistic interpretation. Extensions to other segmentation criteria – such as color, texture or motion – are not straightforward.

In the present paper, we will review recent developments in the

segmentation community which aim at resolving the above problems.

We will review the level set method for front propagation as a means

to handle topological changes of evolving interfaces and to remove the

issues of contour parameterization and control point regriding. Among

the level set methods, we will focus on statistical region-based methods,

where the contour is not evolved by fitting to local edge information (as

in the Snakes) but rather by fitting statistical models to intensity, color,

texture or motion within each of the separated regions. The respective

cost functionals tend to have fewer local minima for most realistic images.

As a consequence, the segmentation schemes are far less sensitive to

noise and to varying initialization.

The outline of the paper is as follows: In Section 2, we will review

the general idea of level set based boundary propagation and its first

applications to image segmentation. In Section 3, we will then review

a probabilistic formulation of region-based segmentation. In particular,

we will make explicit the assumptions underlying the derivation of appropriate cost functionals. In the subsequent sections,

we then detail how to adapt the probabilistic level set framework to

different segmentation criteria: In Section 4, we present probabilistic

models which drive the segmentation process to group regions of ho-

mogeneous intensity, color or texture. In Section 5, we briefly present

extensions of this framework to Diffusion Tensor Images. In Section 6,

we discuss a further extension which exploits spatio-temporal

dynamics to drive a segmentation process, given an entire sequence of

images. In particular, this approach can separate textures which

² It should be pointed out that, based on various heuristics, one can successfully incorporate regriding mechanisms and topological changes into explicit representations – cf. [62, 48, 28, 25].


have identical spatial characteristics but differ in their temporal dy-

namics. In Section 7 we detail how to integrate motion information

as a criterion for segmentation, leading to a partitioning of the image

plane into areas of piecewise parametric motion. Finally, in Section

8, we briefly discuss numerous efforts to introduce statistical shape

knowledge in level set based image segmentation in order to cope with

missing or misleading low-level information.

2. Level Set Methods for Image Segmentation

In the variational framework, a segmentation of the image plane Ω is

computed by locally minimizing an appropriate energy functional, such

as the functional (1). The key idea is to evolve the boundary C from

some initialization in direction of the negative energy gradient, which

is done by implementing the gradient descent equation:

∂C/∂t = −∂E(C)/∂C = F · n,   (2)

modeling an evolution along the normal n with a speed function F.³

In general, one can distinguish between explicit (parametric) and

implicit representations of contours. In explicit representations – such

as splines or polygons – a contour is defined as a mapping from an

interval to the image domain: C : [0,1] → Ω. The propagation of an

explicit contour is typically implemented by a set of ordinary differ-

ential equations acting on the control or marker points. In order to

guarantee stability of the contour evolution (i.e. preserve well-defined

normal vectors), one needs to introduce certain regriding mechanisms to

avoid overlap of control points, for example by numerically resampling

the marker points every few iterations, by imposing in the variational

formulation a rubber-band like attraction between neighboring points

[25], or by introducing electrostatic repulsion [91]. Moreover, in order

to segment several objects or multiply connected objects, one needs

to introduce numerical tests to enable splitting and remerging of con-

tours during the evolution. Successful advances in this direction were

proposed among others by [50, 62, 48, 28].

In implicit contour representations, contours are represented as the

(zero) level line of some embedding function φ : Ω → ℝ:

C = {x ∈ Ω | φ(x) = 0}.   (3)

³ Most meaningful contour evolutions do not contain a tangential component, as the latter does not affect the contour, but only its parameterization.


There are various methods to evolve implicitly represented contours.

The most popular among these is the level set method [29, 30, 65], in

which a contour is propagated by evolving a time-dependent embedding

function φ(x,t) according to an appropriate partial differential equa-

tion. In the following, we will briefly sketch two alternative methods

to derive a level set evolution implementing the minimization of the

energy E(C).

For a contour which evolves along the normal n with a speed F –

see equation (2) – one can derive a corresponding partial differential

equation for the embedding function φ in the following way. Since

φ(C(t),t) = 0 at all times, the total time derivative of φ at locations of

the contour must vanish:

d/dt φ(C(t),t) = ∇φ · ∂C/∂t + ∂φ/∂t = F ∇φ · n + ∂φ/∂t = 0.   (4)

Inserting the definition of the normal n = ∇φ/|∇φ|, we get the evolution equation for φ:

∂φ/∂t = −|∇φ| F.   (5)

By derivation, this equation only specifies the evolution of φ (and the

values of the speed function F) at the location of the contour. For a

numerical implementation one needs to extend the right-hand side of

(5) to the image domain away from the contour.
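As an illustration of equation (5), the following minimal numpy sketch (our own, not from the paper) performs explicit Euler steps with the first-order upwind scheme of Osher and Sethian; it assumes the speed F has already been extended to the whole domain (here a constant) and uses periodic boundary handling via np.roll for brevity:

```python
import numpy as np

def upwind_step(phi, F, dt=0.5, h=1.0):
    """One explicit Euler step of  d(phi)/dt = -|grad(phi)| * F  using the
    first-order upwind scheme. F may be a scalar or an array of the same
    shape as phi (already extended away from the contour)."""
    # One-sided differences (backward/forward) in each direction.
    dxm = (phi - np.roll(phi,  1, axis=1)) / h   # backward x
    dxp = (np.roll(phi, -1, axis=1) - phi) / h   # forward  x
    dym = (phi - np.roll(phi,  1, axis=0)) / h   # backward y
    dyp = (np.roll(phi, -1, axis=0) - phi) / h   # forward  y

    # Upwind gradient magnitudes, selected by the sign of F.
    grad_plus = np.sqrt(np.maximum(dxm, 0)**2 + np.minimum(dxp, 0)**2 +
                        np.maximum(dym, 0)**2 + np.minimum(dyp, 0)**2)
    grad_minus = np.sqrt(np.minimum(dxm, 0)**2 + np.maximum(dxp, 0)**2 +
                         np.minimum(dym, 0)**2 + np.maximum(dyp, 0)**2)

    F = np.asarray(F, dtype=float)
    return phi - dt * (np.maximum(F, 0) * grad_plus + np.minimum(F, 0) * grad_minus)

# Example: a circle of radius 10 expanding with unit speed F = 1.
y, x = np.mgrid[0:64, 0:64]
phi = np.sqrt((x - 32.0)**2 + (y - 32.0)**2) - 10.0  # signed distance function
for _ in range(10):
    phi = upwind_step(phi, F=1.0)
```

After ten steps with dt = 0.5, the zero level set has moved roughly five pixels outward along the normal, as predicted by (5).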

Alternatively to the above derivation, one can obtain a level set equa-

tion from a variational formulation (cf. [98, 12]): Rather than deriving

an appropriate partial differential equation for φ which implements the

contour evolution equation (2), one can embed a variational principle

E(C) defined on the space of contours by a variational principle E(φ)

defined on the space of level set functions:

E(C)

−→

E(φ)

Subsequently, one can derive the Euler-Lagrange equation which min-

imizes E(φ):

∂φ/∂t = −∂E(φ)/∂φ.   (6)

In both cases, the embedding is not uniquely defined. Depending on the chosen embedding, one can obtain slightly different evolution equations for φ(x,t).

The first applications of this level set formalism for the purpose of image segmentation were proposed in [10, 58, 57]. Independently, Caselles et al. [11] and Kichenassamy et al. [46] proposed a level set formulation


for the Snake energy (1) given by:

∂φ/∂t = |∇φ| div( g(I) ∇φ/|∇φ| ) = g(I) |∇φ| div( ∇φ/|∇φ| ) + ∇g(I) · ∇φ,   (7)

where the gradient |∇I| in functional (1) was replaced by a more general

edge function g(I). This approach is known as Geodesic Active Con-

tours, because the underlying energy can be interpreted as the length

of a contour in a Riemannian space with a metric induced by the image

intensity. See [11, 46] for details.

Local optimization methods such as the Snakes have been heavily

criticized because the computed segmentations depend on the initial-

ization and because algorithms are easily trapped in undesired local

minima for many realistic images. In particular in the presence of

noise, numerous local minima of the cost functional (1) are created by

local maxima of the image gradient. To overcome these local minima

and to drive the contour toward the boundaries of objects of interest,

researchers have introduced an additional balloon force [16] which leads

to either a shrinking or an expansion of contours. Unfortunately this

requires prior knowledge about whether the object of interest is inside

or outside the initial contour. Moreover, the final segmentation will be

biased toward smaller or larger segmentations.

In the following, we will review a probabilistic formulation of the

segmentation problem which leads to region-based functionals rather

than edge-based functionals such as the Snakes. Moreover, we will pro-

vide numerous experiments which demonstrate that such probabilistic

region-based segmentation schemes do not suffer from the above draw-

backs. While optimization is still done in a local manner, the respective

functionals tend to have few local minima and segmentation results

tend to be very robust to noise and varying initialization.


3. Statistical Formulation of Region-Based Segmentation

3.1. Image Segmentation as Bayesian Inference

Statistical approaches to image segmentation have a long tradition; they can be traced back to models of magnetism in physics, such as the Ising model [41]. Pioneering works in the field of image processing include spatially discrete formulations such as those of Geman and Geman [35] and Besag [2], and spatially continuous formulations such as the ones of Mumford and Shah [63] and Zhu and Yuille [100].

The probabilistic formulation of the segmentation problem presented

in the following extends the statistical approaches pioneered in [49, 100,


66, 87]. In particular, this extension allows the probabilistic framework

to be applied to segmentation criteria such as texture and motion,

which will be detailed in subsequent sections. In [49], a segmentation

functional is obtained from a Minimum Description Length (MDL)

criterion. The link with the Mumford-Shah functional and the equiva-

lence to Bayesian maximum a posteriori (MAP) estimation is provided

in [100]. Following [66], an optimal partition P(Ω) of the image plane

Ω (i.e. a partition of the image plane into pairwise disjoint regions) can

be computed by maximizing the a posteriori probability p(P(Ω)|I) for a given image I.⁴ Bayes' rule allows us to express this conditional probability as

p(P(Ω)|I) ∝ p(I |P(Ω)) p(P(Ω)),   (8)

thereby separating image-based cues (first term) from geometric properties of the partition (second term). The Bayesian framework has become increasingly popular for tackling ill-posed problems in computer vision. Firstly, the conditional probability p(I |P(Ω)) of an observation given a model state is often easier to model than the posterior distribution; it typically follows from a generative model of the image formation process. Secondly, the term p(P(Ω)) in (8) allows one to introduce prior knowledge stating which interpretations of the data are a priori more or less likely. Wherever available, such a priori knowledge may help to cope with missing low-level information.

One can distinguish between generic priors and object specific priors.

Object specific priors can be computed from a set of sample segmen-

tations of an object of interest. In Section 8, we will briefly review a

number of recent advances regarding the incorporation of statistically

learnt priors into the level set framework.

In this section, we will focus on generic (often called geometric)

priors. The most commonly used regularization constraint is a prior

which favors a short length |C| of the partition boundary:

p(P(Ω)) ∝ e^(−ν |C|),   ν > 0.   (9)

Higher-order constraints may be of interest for specific applications such as the segmentation of thin elongated structures [71, 64].

To further specify the image term p(I |P(Ω)) in (8), we make the following hypotheses. Following [66], we assume the image partition to be composed of N regions without correlation between the labellings. This gives the simplified expression:

p(I |P(Ω)) = p(I |{Ω₁,…,Ω_N}) = ∏ᵢ₌₁ᴺ p(I |Ωᵢ),   (10)

⁴ In the following, I can refer to a single image or to an entire image sequence.


where p(I |Ωi) denotes the probability of observing an image I when Ωi

is a region of interest. Let us assume that regions of interest are char-

acterized by a given feature f(x) associated with each image location.

This feature may be a scalar quantity (such as the image intensity), a

vector quantity (such as color or the spatio-temporal image gradient),

or a tensor (such as a structure tensor or a diffusion tensor).

For the features presented in this paper, we make the assumption

that the values of f at different locations of the same region can be

modeled as independent and identically distributed realizations of the

same random process.⁵ Let pᵢ be the probability density function (pdf) of this random process in Ωᵢ. Expression (10) then reads

p(I |P(Ω)) = ∏ᵢ₌₁ᴺ ∏_{x∈Ωᵢ} pᵢ(f(x))^dx,   (11)

where the bin volume dx is introduced to guarantee the correct con-

tinuum limit. Approximation (11) is not valid in general since image

features (such as spatial gradients) are computed on a neighborhood

structure and may therefore exhibit local spatial correlations. More

importantly, one should expect to find spatial correlations of features

when modeling textured regions. However, one can capture certain spa-

tial correlations in the above model by computing appropriate features

such as the structure tensor.

Maximization of the a posteriori probability (8) is equivalent to

minimizing its negative logarithm. Integrating the regularity constraint

(9) and the region-based image term (11), we end up with the following energy:

E({Ω₁,…,Ω_N}) = −∑ᵢ ∫_{Ωᵢ} log pᵢ(f(x)) dx + ν |C|.   (12)

In the context of intensity segmentation (i.e. f = I), this energy is the

basis of several works [49, 100, 80, 66]. The region statistics are typically

computed interlaced with the estimation of the boundary C [100], yet

one can also compute appropriate intensity histograms beforehand [66].
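To make the pieces of (12) concrete, here is a hypothetical numpy sketch (the function name and the crude transition-count approximation of |C| are ours) that evaluates the energy for a fixed binary labeling, with each pᵢ estimated as a normalized intensity histogram, in the spirit of the precomputed-histogram variant of [66]:

```python
import numpy as np

def region_energy(image, labels, nu=1.0, bins=32):
    """Evaluate E = -sum_i int_{Omega_i} log p_i(f(x)) dx + nu*|C|  (eq. 12)
    for a fixed binary labeling; each p_i is a normalized histogram of the
    intensities inside region i (nonparametric region model)."""
    f = image.ravel()
    lab = labels.ravel().astype(bool)
    edges = np.linspace(f.min(), f.max() + 1e-9, bins + 1)
    energy = 0.0
    for mask in (lab, ~lab):
        hist, _ = np.histogram(f[mask], bins=edges)
        p = hist / max(hist.sum(), 1)
        idx = np.clip(np.digitize(f[mask], edges) - 1, 0, bins - 1)
        energy -= np.sum(np.log(p[idx] + 1e-12))
    # Crude length term: count horizontal/vertical label transitions.
    length = (np.sum(labels[:, 1:] != labels[:, :-1]) +
              np.sum(labels[1:, :] != labels[:-1, :]))
    return energy + nu * length
```

On a two-region synthetic image, a labeling aligned with the true region boundary yields a lower energy than a shifted one, since the mixed region then has a broader (higher-entropy) histogram.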

In this paper, we will focus on the case where distributions and segmentation are computed jointly. Distributions can be modeled either parametrically or non-parametrically. Upon insertion of parametric representations for pᵢ with parameters θᵢ, the energy (12) takes on the

⁵ In Section 7, we will consider a generalization in which the underlying random processes are assumed to be space-varying. The distributions pᵢ in (11) then contain an explicit space dependency pᵢ(f(x),x), which allows modeling spatially varying statistical distributions of features.


form

E({Ωᵢ, θᵢ}ᵢ₌₁..N) = −∑ᵢ ∫_{Ωᵢ} log p(f(x)|θᵢ) dx + ν |C|.   (13)

For particular choices of parametric densities, the optimal parameters can be expressed as functions of the corresponding domains, such that only the regions remain as unknowns of the new energy

Ê({Ωᵢ}ᵢ) ≡ min_{θᵢ} E({Ωᵢ, θᵢ}ᵢ) = −∑ᵢ ∫_{Ωᵢ} log p(f(x)|θ̂ᵢ) dx + ν |C|,   (14)

where

θ̂ᵢ = argmin_θ ( −∫_{Ωᵢ} log p(f(x)|θ) dx ).   (15)

In this case, the optimal model parameters θ̂ᵢ typically depend on the regions Ωᵢ. As pointed out by several authors [85, 81, 1], this region-dependence can be taken into account in the computation of accurate shape gradients. Exact shape gradients can also be applied with non-parametric density estimation techniques like the Parzen window method [47, 73, 40]. In [75], it is shown that no additional terms arise in the shape gradient if the distributions pᵢ are assumed to be Gaussian, and in [39], the authors point out that the additional terms are negligible in the case of Laplacian distributions.⁶ We will therefore neglect higher-order terms in the computation of shape gradients and simply perform an alternating minimization of the energy (13) with respect to region boundaries and region models.

3.2. Two-Phase Level Set Formulation

Let us for the moment assume that the solution to (13) is in the class of binary (two-phase) segmentations, i.e. a partitioning of the domain Ω such that each pixel is ascribed to one of two possible phases. Extending the approach of Chan and Vese [12], one can implement the functional (13) by:

E(φ, {θᵢ}) = ∫_Ω ( −Hφ log p(f|θ₁) − (1−Hφ) log p(f|θ₂) + ν |∇Hφ| ) dx,   (16)

where Hφ denotes the Heaviside step function defined as:

Hφ ≡ H(φ) = { 1 if φ ≥ 0, 0 else }.   (17)
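In numerical practice, H and its derivative are replaced by smooth approximations; the following small sketch implements the regularized pair proposed by Chan and Vese [12], where ε controls the width of the smoothing:

```python
import numpy as np

def heaviside(z, eps=1.0):
    """Smooth approximation of the Heaviside step (17), following [12]."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(z / eps))

def delta(z, eps=1.0):
    """Derivative of the smooth Heaviside: a regularized Dirac delta."""
    return eps / (np.pi * (z**2 + eps**2))
```

Because this approximation has non-compact support, it also gives far-away level lines a (small) chance to move, which helps in detecting interior contours.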

⁶ For a recent study of various noise models in level set segmentation, see [61].


The first two terms in (16) model the areas inside and outside the

contour while the last term represents the length of the separating

interface.

Minimization is done by alternating a gradient descent for the embedding function φ (for fixed parameters θᵢ):

∂φ/∂t = δ(φ) [ ν div( ∇φ/|∇φ| ) + log ( p(f(x)|θ₁) / p(f(x)|θ₂) ) ],   (18)

with an update of the parameters θᵢ according to (15). In practice, the delta function δ is implemented by a smooth approximation – cf. [12].
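The resulting alternation can be sketched as follows. This is a simplified numpy illustration with hypothetical helper names (not the authors' implementation), assuming scalar Gaussian region models, curvature by central differences, and the smooth delta of [12]; narrow-band acceleration and step-size control are omitted:

```python
import numpy as np

def curvature(phi, h=1.0):
    """div(grad(phi)/|grad(phi)|) by central differences."""
    py, px = np.gradient(phi, h)
    norm = np.sqrt(px**2 + py**2) + 1e-8
    nyy, _ = np.gradient(py / norm, h)
    _, nxx = np.gradient(px / norm, h)
    return nxx + nyy

def gaussian_params(f, phi):
    """Optimal region parameters in the sense of (15) for Gaussian models:
    empirical means/variances inside (phi >= 0) and outside (phi < 0)."""
    inside = phi >= 0
    return [(f[m].mean(), f[m].var() + 1e-8) for m in (inside, ~inside)]

def evolve(f, phi, nu=0.2, dt=0.5, iters=200, eps=1.0):
    """Alternate parameter updates with gradient descent steps on (18)."""
    for _ in range(iters):
        (m1, v1), (m2, v2) = gaussian_params(f, phi)
        # log p(f|theta1) - log p(f|theta2) for scalar Gaussians
        loglik = ((f - m2)**2 / (2*v2) - (f - m1)**2 / (2*v1)
                  + 0.5 * np.log(v2 / v1))
        dirac = eps / (np.pi * (phi**2 + eps**2))
        phi = phi + dt * dirac * (nu * curvature(phi) + loglik)
    return phi
```

Starting from a small circle inside a noisy two-region image, the zero level set expands toward the boundary of the brighter region and settles there.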

3.3. Multiphase Level Set Formulation

Several authors have proposed level set formulations which can handle

a larger number of phases [98, 96, 66, 8]. These methods use a separate

level set function for each region. This clearly increases the computa-

tional complexity. Moreover, numerical implementations are somewhat

involved since the formation of overlap and vacuum regions needs to be

suppressed. By interpreting these overlap regions as separate regions,

Chan and Vese derived an elegant formulation which requires only log₂(n) level set functions to model n regions. Each of the n regions is

characterized by the various level set functions being either positive or

negative. See [93] for details.

3.4. Scalar, Vector and Tensor-valued images

3.4.1. Scalar images

Let us consider a scalar image made up of two regions, the intensities

of which are drawn from a Gaussian distribution:

p(I |µᵢ, σᵢ²) = 1/√(2πσᵢ²) · exp( −(I−µᵢ)² / (2σᵢ²) ),   i ∈ {1,2}.   (19)

This distribution can be injected in the general bi-partitioning energy (16). Given a partition of the image plane according to a level set function φ, optimal estimates of the means µᵢ and variances σᵢ² can be computed analytically:

µ₁ = (1/a₁) ∫ H(φ) I(x) dx,        σ₁² = (1/a₁) ∫ H(φ) (I(x) − µ₁)² dx,

µ₂ = (1/a₂) ∫ (1−H(φ)) I(x) dx,    σ₂² = (1/a₂) ∫ (1−H(φ)) (I(x) − µ₂)² dx,

where a₁ = ∫ H(φ) dx and a₂ = ∫ (1−H(φ)) dx are the areas of the inside and outside regions. For fixed model parameters, the gradient descent


equation for the level set function φ – see (18) – reads

∂φ/∂t = δ(φ) [ ν div( ∇φ/|∇φ| ) + (I−µ₂)²/(2σ₂²) − (I−µ₁)²/(2σ₁²) + log(σ₂/σ₁) ].   (20)

More details on this derivation, when the parameters (µᵢ, σᵢ) are taken as functions of Ωᵢ, can be found in [76]. We end up with an algorithm that alternates the estimation of the empirical intensity means and variances inside each region with the level set evolution described by equation (20). Regarding the complexity, each iteration of the level set evolution is applied only inside a narrow band around the zero-crossing, because the Dirac function vanishes at other locations. More interestingly, the statistical parameters can also be updated with a similar complexity: the new updates are functions of their previous values and of the pixels where the sign of φ changes. Assuming the evolving interface visits each pixel only once, the total complexity is thus linear in the size of the image.

3.4.2. Vector-valued images

A direct extension to vector-valued images is to use multivariate Gaussian densities as region models. Region pdfs are then parameterized by a vector mean and a covariance matrix. Similarly to the scalar case, the optimal statistical parameters are their empirical estimates in the corresponding region. The two-phase segmentation of an image I of any dimension can thus be obtained through the following level set evolution (cf. equation (18)):

∂φ/∂t = δ(φ) [ ν div( ∇φ/|∇φ| ) + log ( p(I(x)|µ₁,Σ₁) / p(I(x)|µ₂,Σ₂) ) ],   (21)

with:

µᵢ = (1/|Ωᵢ|) ∫_{Ωᵢ} I(x) dx,    Σᵢ = (1/|Ωᵢ|) ∫_{Ωᵢ} (I(x) − µᵢ)(I(x) − µᵢ)ᵀ dx,   for i = 1,2.   (22)

Like in the scalar case, the estimation of the statistical parameters can be optimized to avoid a full computation over the whole image domain at each iteration. Here, it becomes a bit more technical, since cross-component products appear in the covariance matrices, but the final complexity is identical to the one obtained in the scalar case [72].
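For concreteness, the empirical estimates (22) and the corresponding per-pixel log-density can be written as follows (a hypothetical numpy sketch; the function names are ours):

```python
import numpy as np

def mv_gaussian_stats(f, mask):
    """Empirical mean and covariance (22) of d-dimensional features f
    (shape (H, W, d)) over the region given by a boolean mask."""
    X = f[mask]                        # (N, d) samples of the region
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]     # ML estimate, normalized by |Omega_i|
    return mu, Sigma

def mv_gaussian_loglik(f, mu, Sigma):
    """log p(I(x)|mu, Sigma) under a multivariate Gaussian, per pixel."""
    d = mu.shape[0]
    inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    Xc = f - mu
    maha = np.einsum('...i,ij,...j->...', Xc, inv, Xc)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))
```

The difference of the two region log-likelihood maps is exactly the data term driving the evolution (21).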


3.4.3. Tensor-valued images

In order to apply the above statistical level set framework to the seg-

mentation of tensor images, one needs to define appropriate distances

on the space of tensors. Several approaches have been proposed to define

distances from an information theoretic point of view by interpreting

the tensors as parameterizations of 0-mean multivariate normal laws.

The definition of a distance between tensors is then translated into that of a dissimilarity measure between probability distributions.

The symmetric KL divergence

Wang and Vemuri [94] applied the symmetrized Kullback-Leibler (SKL)

divergence – also called J-divergence – to define the region term of the

front evolution. For multivariate 0-mean normal laws with covariance

matrices J₁ and J₂, the distance derived from the SKL divergence is given by:

D_SKL(J₁,J₂) = (1/2) √( trace( J₁⁻¹J₂ + J₂⁻¹J₁ ) − 2n ),   (23)

where n is the dimension of the tensors. This measure has the advantage of being affine invariant, and closed-form expressions are available for the mean tensors, which is particularly interesting for estimating region statistics. Region confidence was also incorporated in [77]. These works present several promising segmentation experiments on 2D [94] and 3D [77] real diffusion tensor images.
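Under this interpretation of tensors as covariances of zero-mean Gaussians, the distance (23) is straightforward to compute; a sketch (our own, not code from [94]):

```python
import numpy as np

def skl_distance(J1, J2):
    """Distance (23) derived from the symmetrized KL (J-) divergence between
    two SPD tensors, seen as covariances of zero-mean Gaussians."""
    n = J1.shape[0]
    t = np.trace(np.linalg.inv(J1) @ J2 + np.linalg.inv(J2) @ J1)
    return 0.5 * np.sqrt(max(t - 2.0 * n, 0.0))
```

The affine invariance mentioned above is easy to verify numerically: replacing J₁, J₂ by A J₁ Aᵀ, A J₂ Aᵀ for an invertible A leaves the trace, and hence the distance, unchanged.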

The Rao Distance

Another distance has been proposed in [52, 51] with the same idea of considering tensors as covariance matrices of multivariate normal distributions. Following [84], a Riemannian metric is introduced, and the geodesic distance between two members of this family is given by

D_G(J₁,J₂) = √( (1/2) ∑ᵢ₌₁ⁿ log²(λᵢ) ),   (24)

where λᵢ denote the eigenvalues of the matrix J₁^(−1/2) J₂ J₁^(−1/2). The same metric was proposed in [68] from a different viewpoint. It verifies the basic properties of a distance (positivity, symmetry and triangle inequality) and it is invariant to inversions: D(J₁,J₂) = D(J₁⁻¹, J₂⁻¹). The above metrics permit to define statistics on sets of SPD matrices which can be used to define the region term of the segmentation. It was also shown in [51] that the symmetrized Kullback-Leibler divergence (23) is a Taylor approximation of the geodesic distance (24).
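A numpy sketch of (24) (our own illustration; eigenvalue computation via the eigendecomposition of J₁):

```python
import numpy as np

def rao_distance(J1, J2):
    """Geodesic (Rao) distance (24): sqrt(1/2 * sum_i log^2(lambda_i)),
    where lambda_i are the eigenvalues of J1^{-1/2} J2 J1^{-1/2}."""
    w, V = np.linalg.eigh(J1)                         # J1 = V diag(w) V^T
    J1_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    lam = np.linalg.eigvalsh(J1_inv_sqrt @ J2 @ J1_inv_sqrt)
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))
```

The symmetry and inversion invariance stated above follow because swapping or inverting the arguments maps each eigenvalue λᵢ to 1/λᵢ, which leaves log²(λᵢ) unchanged.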


Figure 1. Curve evolution for the segmentation of a gray-level image using Gaussian

intensity distributions to approximate region information.

In the following sections, we will exploit the statistical level set

framework introduced above to construct segmentation schemes for

color, texture, dynamic texture and motion. To this end, we will con-

sider different choices regarding the features f – namely intensity val-

ues, color values, spatial structure tensors, spatio-temporal image gra-

dients, or features modeling the local spatio-temporal dynamics – and

respective sets of model parameters θi, modeling color or texture dis-

tributions or parametric motion in the separated regions. Moreover, we

will consider different choices for the distributions pi of these model

parameters.

4. Intensity, Color and Texture

In the previous section, we considered Gaussian approximations for

scalar and vector-valued images. These models can be used to segment

gray, color and texture images [12, 75, 73]. In the following, these

models are applied to the segmentation of natural images. Curve evolutions are presented to illustrate the gradient descent driving the segmentation.

4.1. Gray & color images

In Figure 1, we present the curve evolution obtained with the gray-

value level set scheme of Section 3.4.1. The curve is initialized with

a set of small circles and it successfully evolves toward the expected

segmentation. Other initializations may be considered but using tiny

circles provides a fast convergence speed and helps to detect small

parts and holes. Note that changes of topology during the evolution

are naturally handled by the implicit formulation.

We previously argued that the region-based formulation exhibits fewer local minima than approaches which rely solely on gradient information

along the curve. To support this claim, we plotted in Figure 2 the


Panels, left to right: input image, 1D intensity profile, energy of cut.

Figure 2. Comparison of edge- and region-based segmentation methods in 1D. For

a 1D intensity profile of the coin image (taken along the line indicated in white),

we computed the energy associated with a split of the interval at different locations.

While the region-based energy exhibits a broad basin of attraction around a single

minimum located at the boundary of the coin (thick black line), the energy of

the edge-based approach is characterized by numerous local minima (red line). A

gradient descent on the latter energy would not lead to the desired segmentation.

Figure 3. Binary segmentation of a color image using multivariate Gaussian distri-

butions as region descriptor (initialization and final segmentation) and multiphase

color segmentation obtained with the algorithm developed in [8].

empirical energy for the segmentation of a 1D slice of an image. In

contrast to the edge-based energy, the region-based energy (thick black

line) shows a single minimum corresponding to the boundary of the

coin.
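The 1D experiment behind Figure 2 can be reproduced schematically on a synthetic step signal (our own sketch, not the authors' data): plugging the Gaussian ML parameters into the data term of (12) reduces the cost of each side of a cut at index k to, up to constants, (N/2) log σ̂²:

```python
import numpy as np

def region_split_energy(f, k):
    """Region-based data term of (12) for splitting the 1D signal f at index k:
    fitting a Gaussian to each side and inserting the ML estimates leaves,
    up to constants, (N/2) * log(variance) per segment."""
    e = 0.0
    for seg in (f[:k], f[k:]):
        e += 0.5 * len(seg) * np.log(seg.var() + 1e-12)
    return e

rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(50), np.ones(50)]) + 0.2 * rng.standard_normal(100)

energies = np.array([region_split_energy(signal, k) for k in range(2, 99)])
best_split = 2 + int(np.argmin(energies))   # broad basin around the true step
```

The resulting energy has a single broad basin around the true step location, whereas the edge-based counterpart −|∇I(k)|² evaluated on the noisy profile exhibits a spurious local minimum at every noise-induced gradient peak.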

The region-based approach can directly be extended to color images

by applying the vector-valued formulation of Section 3.4.2. The only

point to be careful about is the choice of color space for the multivariate

Gaussian model to make sense. The RGB space is definitely not the best

one since, as can be seen from the MacAdam ellipse, the perception

of color difference is nonlinear in this space. The CIE-lab space has

been designed to approximate this nonlinearity by trying to mimic the

logarithmic response of the eye. Figure 3 shows a two-phase and a

multiphase example of vector-valued segmentation obtained on natural

color images using this color space (the algorithm proposed in [8] was

used for the multiphase implementation).


Figure 4. Left: Zebra image and color representation of its structure tensor (the

components of the structure tensors are used as RGB components). Right: Intensity

and structure tensor of the Zebra image after coupled nonlinear diffusion.

4.2. Texture

In gray and color image segmentation, pixel values are assumed to be

spatially independent. This is not the case for textured images which

are characterized by local correlations of pixel values. In the following,

we will review a set of basic features which capture these local correlations. More sophisticated features are conceivable as well [53].

4.2.1. The nonlinear structure tensor as texture feature

While texture analysis can rely on texture samples to learn accurate

models [38, 26, 59, 83, 99], unsupervised image segmentation should

learn these parameters on-line. Since high-order texture models introduce too many unknown parameters to be estimated in an unsupervised approach, more compact features are usually favored. Bigün et al. [4] addressed this problem with the introduction of the structure tensor

(also called second order moment matrix) which yields three different

feature channels per scale. It has mainly been used to determine the

intrinsic dimensionality of images in [3, 34] by providing a continuous

measure to detect critical points like edges or corners. Yet, the structure tensor not only gives a scalar value reflecting the probability of an edge, but also encodes the texture orientation. All these properties

make this matrix a good descriptor for textures. The structure tensor

[34, 4, 70, 55, 36] is given by the matrix of partial derivatives smoothed

by a Gaussian kernel Kσ with standard deviation σ:

Jσ = Kσ ∗ (∇I ∇Iᵀ) = ( Kσ ∗ I_{x1}²       Kσ ∗ I_{x1}I_{x2}
                        Kσ ∗ I_{x1}I_{x2}   Kσ ∗ I_{x2}² ).   (25)

For color images, all channels can be taken into account by summing

the tensors of the individual channels [97].
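A sketch of (25) for a gray-value image (our own illustration, using scipy for the Gaussian smoothing; for color images the per-channel tensors would simply be summed, as noted above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(I, sigma=2.0):
    """Structure tensor (25): Gaussian-smoothed outer product of the image
    gradient. Returns the three distinct components (J11, J12, J22)."""
    Iy, Ix = np.gradient(I.astype(float))      # I_{x2}, I_{x1}
    J11 = gaussian_filter(Ix * Ix, sigma)
    J12 = gaussian_filter(Ix * Iy, sigma)
    J22 = gaussian_filter(Iy * Iy, sigma)
    return J11, J12, J22
```

On a purely vertical stripe pattern, all gradient energy sits in J11, while J12 and J22 vanish, illustrating how the tensor encodes texture orientation and not just edge strength.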

Despite its good properties for texture discrimination, the structure

tensor is invariant to intensity changes. In order to segment images

with and without texture, a feature vector including the square root of
