Graph Cut Segmentation with a Global Constraint: Recovering Region
Distribution via a Bound of the Bhattacharyya Measure
Ismail Ben Ayed (1,2), Hua-mei Chen (2), Kumaradevan Punithakumar (1), Ian Ross (2), and Shuo Li (1,2)
(1) GE Healthcare, London, ON, Canada
(2) University of Western Ontario, London, ON, Canada
Abstract
This study investigates an efficient algorithm for image
segmentation with a global constraint based on the Bhat-
tacharyya measure. The problem consists of finding a re-
gion consistent with an image distribution learned a priori.
We derive an original upper bound of the Bhattacharyya
measure by introducing an auxiliary labeling. From this up-
per bound, we reformulate the problem as an optimization
of an auxiliary function by graph cuts. Then, we demon-
strate that the proposed procedure converges and give a
statistical interpretation of the upper bound. The algorithm
requires very few iterations to converge, and finds nearly
global optima. Quantitative evaluations and comparisons
with state-of-the-art methods on the Microsoft GrabCut
segmentation database demonstrated that the proposed
algorithm brings improvements in regard to segmentation
accuracy, computational efficiency, and optimality. We further
demonstrate the flexibility of the algorithm in object track-
ing.
1. Introduction
This paper addresses image segmentation with a refer-
ence distribution. Based on the optimization of a global
measure of similarity between distributions, the problem
consists of finding a region in an image, so that the distribu-
tion of image data within the region most closely matches
a given model distribution. As several recent studies have
shown [1, 2, 3, 4, 5, 6, 7, 8], the use of global measures
outperforms standard techniques based on pixelwise infor-
mation in the contexts of image segmentation and object
tracking. Furthermore, segmentation with a reference
distribution can yield robust region-based measures for image
retrieval [1]. Possible measures include the Kullback–Leibler
divergence [6] and the Bhattacharyya measure [2, 3, 4, 7].
The latter, however, has shown superior performance over
other criteria [3, 7]. The theoretical properties of
the Bhattacharyya measure have also been studied in information
theory [14], and they demonstrate its wide potential for
applications.
Unfortunately, optimization of a global similarity mea-
sure, for instance the Bhattacharyya measure, with respect
to segmentation is NP-hard, and the problem has been com-
monly addressed with local, stepwise optimization proce-
dures. In this connection, segmentation with a reference
distribution has been generally stated as an active contour
optimization via partial differential equations [2, 3, 4, 5, 6].
A gradient flow equation of contour evolution is derived in
order to increase the similarity between the region within
the contour and a given model, thereby leading to a local
optimum at convergence. These methods lead to computationally
intensive algorithms (see footnote 1), which may significantly
limit their application. Along with an incremental contour
evolution, they require a large number of updates of
computationally onerous integrals, namely, the distributions of the
regions defined by the curve at each iteration and the
corresponding measures (cf. Table 3). Moreover, the robustness
of the ensuing algorithms inherently relies on a user ini-
tialization of the contour close to the target region and the
choice of an approximating numerical scheme of contour
evolution.
Combinatorial graph cut algorithms [10, 11, 12], which
view image segmentation as a label assignment following
the discrete optimization of a cost function, have recently
been of intense interest because they can guarantee global
optima and numerical robustness, in nearly real-time. Sev-
eral studies have shown that graph cut optimization can be
quite effective in various computer vision problems, for in-
stance, image segmentation [16, 17, 18], object tracking
[19], motion estimation [20], visual correspondence [21],
and restoration [11]. Most existing graph cut segmentation
methods optimize a sum over all pixels of pixel or
pixel-neighborhood dependent data and variables. Variables
which are global over the segmentation regions have been
generally avoided because they cannot be written in a form
amenable to graph cut optimization. A notable exception
[Footnote 1: Active contour segmentation with a reference distribution can be very
slow in practice; it may require up to several minutes on typical CPUs.]
is the Trust Region Graph Cut (TRGC) method of Rother
et al. [1], [9], devoted to co-segmentation of regions of
the same size in image pairs, where the problem of find-
ing an image region consistent with a reference histogram
arises. This work pioneered optimization of the L1 norm
of the difference between histograms with graph cuts. In-
spired by trust region methods from continuous optimiza-
tion [13], the method computes a sequence of parametric
linear approximations of the energy and the corresponding
optimal segmentations, so that the energy does not increase.
Optimization of the energy is carried out over the approx-
imation parameter rather than the segmentation. The au-
thors used a submodular-supermodular procedure [15] for
initialization of the sequence, which requires that the en-
ergy is supermodular, and demonstrated that this is the case
for the L1 norm. Moreover, they have shown that TRGC
can improve a wide spectrum of research: it outperformed
standard graph cut techniques based on pixelwise infor-
mation in the contexts of object tracking and image seg-
mentation, and yielded promising results in image retrieval.
Unfortunately, the unnormalized histogram depends on the
size of the learning region and, as such, is not a complete
representation of the class of target regions. For instance,
TRGC cannot be readily applied to tracking an object whose
size varies over an image sequence. In applications where
the size of the target region is different from the size of
the learning region, TRGC requires additional optimiza-
tion/priors with respect to region size [1]. Furthermore, in
information theory, it transpires that the L1 norm is not the
best measure of similarity between distributions [14].
Finally, it is worth mentioning the mean-shift tracking
algorithm of Comaniciu et al. [8], where the target object is
represented by an ellipsoidal shape. Given the position of
the target ellipse in a previous frame, this algorithm seeks
a new position in the neighborhood by optimizing a global
similarity measure. The latter is defined over the target po-
sition, and smoothed with a spatial kernel, so that gradient-
based optimization can be carried out in real-time. The cur-
rent study investigates a related, but different variant of the
problem. In our case, the problem consists of finding an
accurate segmentation of regions with arbitrary shape and
position (cf. the examples in Fig. 1). Apart from clear ap-
plication in object tracking, this problem is also very useful
in image segmentation, editing, and retrieval [1].
This study investigates an efficient algorithm for im-
age segmentation with a reference distribution based on the
Bhattacharyya measure. We derive an original upper bound
of the Bhattacharyya measure by introducing an auxiliary
labeling. From this upper bound, we reformulate image
segmentation with a reference distribution as an optimization
of an auxiliary function by graph cuts. Then, we formally
demonstrate that the proposed procedure converges and give
a statistical interpretation of the upper bound. The proposed
algorithm requires very few iterations to converge (fewer than
5), and finds nearly global optima. Quantitative
evaluations and comparisons with TRGC and active
contour optimization on the Microsoft GrabCut segmentation
database (50 images with ground truth segmentations)
demonstrated that the proposed algorithm brings improvements
in regard to segmentation accuracy, computational
efficiency, and optimality. We further demonstrate the
flexibility of the algorithm in object tracking.
2. Efficient segmentation via an upper bound
on the Bhattacharyya measure
Let $I_p = I(p) : \mathcal{P} \subset \mathbb{R}^2 \to \mathcal{Z} \subset \mathbb{R}^n$, $n \in \mathbb{N}$, be an image function from a positional array $\mathcal{P}$ to the space $\mathcal{Z}$ of a photometric variable such as intensity or color. The purpose of this study is to seek a region in $\mathcal{P}$, so that the kernel density estimate of image data within the region most closely matches a learned model distribution $M$. We state the problem as the minimization of a discrete cost function with respect to a binary variable (labeling), $L_p = L(p) : \mathcal{P} \to \{0,1\}$, which defines a variable partition of $\mathcal{P}$: $R_1^L = \{p \in \mathcal{P} \mid L_p = 1\}$, corresponding to the target region (foreground), and $R_0^L = \{p \in \mathcal{P} \mid L_p = 0\} = \mathcal{P} \setminus R_1^L$, corresponding to its complement in $\mathcal{P}$ (background). The optimal labeling is sought following the minimization of a global cost function based on the Bhattacharyya measure. Before introducing the cost function, we first consider the following definitions for any binary labeling $L_p = L(p) : \mathcal{P} \to \{0,1\}$:

• $P^L$ is the kernel density estimate (KDE) of the distribution of image data within region $R_1^L = \{p \in \mathcal{P} \mid L_p = 1\}$:

$$\forall z \in \mathcal{Z}, \quad P^L(z) = \frac{\sum_{p \in R_1^L} K_z(I_p)}{A(R_1^L)}, \tag{1}$$

where $A(R)$ denotes the number of pixels within region $R$: $A(R) = \sum_{p \in R} 1$. Possible choices of $K_z$ are the Dirac function, which yields the histogram, or the Gaussian kernel $\frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\|z - I_p\|^2}{2\sigma^2}\right)$, with $\sigma$ the width of the kernel.

• $B(f,g)$ is the Bhattacharyya coefficient (see footnote 2) measuring the amount of overlap between two distributions $f$ and $g$:

$$B(f,g) = \sum_{z \in \mathcal{Z}} \sqrt{f(z)\,g(z)} \tag{2}$$

[Footnote 2: Note that the values of $B$ are always in $[0,1]$, where 0 indicates that there is no overlap, and 1 indicates a perfect match.]

The algorithm consists of finding an optimal labeling $L^{opt}$ that minimizes the following cost function over all possible binary labelings:

$$L^{opt} = \arg\min_{L : \mathcal{P} \to \{0,1\}} F(L), \tag{3}$$

with

$$F(L) = \underbrace{\mathcal{B}(L)}_{\text{Distribution matching}} + \underbrace{\lambda S(L)}_{\text{Smoothness}}, \tag{4}$$

$$\mathcal{B}(L) = -B(P^L, M) = -\sum_{z \in \mathcal{Z}} \sqrt{P^L(z)\,M(z)}, \tag{5}$$

and $S(L)$ is a smoothness prior to minimize the length of the partition boundary [22]:

$$S(L) = \sum_{\{p,q\} \in \mathcal{N}} s_{p,q}\, \delta_{L_p \neq L_q}, \tag{6}$$

with

$$\delta_{x \neq y} = \begin{cases} 1 & \text{if } x \neq y \\ 0 & \text{if } x = y, \end{cases} \quad \text{and} \quad s_{p,q} = \frac{1}{\|p - q\|}. \tag{7}$$

$\mathcal{N}$ is some neighborhood system containing all unordered pairs $\{p,q\}$ of neighboring elements of $\mathcal{P}$. $\lambda$ is a positive constant that balances the relative contribution of region and boundary terms. $L^{opt}$ will yield an optimal region, $R_1^{L^{opt}} = \{p \in \mathcal{P} \mid L_p^{opt} = 1\}$, so that the boundary of $R_1^{L^{opt}}$ is smooth and the kernel density estimate of image data within $R_1^{L^{opt}}$ most closely matches $M$.

The global term $\mathcal{B}(L)$ in equation (5) is not directly amenable to graph cut optimization because it does not reference pixel or pixel-neighborhood penalties. It evaluates a global similarity measure between distributions and, therefore, its optimization is a challenging and NP-hard problem. To optimize $\mathcal{B}$ efficiently, we first propose an original upper bound of $\mathcal{B}$ by introducing an auxiliary labeling. From this upper bound, we reformulate the problem as an optimization of an auxiliary function by graph cut iterations. Then, we formally demonstrate that the proposed procedure converges and give a statistical interpretation of the upper bound. To introduce our formulation, let us first state the following proposition:
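To make the definitions in equations (1), (2), and (4)-(7) concrete, the following minimal Python sketch evaluates the cost F(L) on a toy image. It is only an illustration under stated assumptions, not the authors' implementation: the kernel is the Dirac function (so the KDE is a normalized histogram), the image is a flat row-major list of bin indices, the neighborhood system is the 4-connected grid (so every weight s_{p,q} equals 1), and the function names are hypothetical.

```python
import math

def histogram(values, labels, num_bins):
    """Eq. (1) with a Dirac kernel: normalized histogram over pixels labeled 1."""
    area = sum(labels)  # A(R_1^L), the number of foreground pixels
    if area == 0:
        raise ValueError("the foreground region must not be empty")
    counts = [0.0] * num_bins
    for v, lab in zip(values, labels):
        if lab == 1:
            counts[v] += 1.0
    return [c / area for c in counts]

def bhattacharyya(f, g):
    """Eq. (2): Bhattacharyya coefficient between two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(f, g))

def smoothness(labels, width):
    """Eq. (6)-(7) on a 4-connected grid (assumption): unit-distance pairs, s_pq = 1."""
    height = len(labels) // width
    total = 0.0
    for r in range(height):
        for c in range(width):
            p = r * width + c
            if c + 1 < width and labels[p] != labels[p + 1]:
                total += 1.0
            if r + 1 < height and labels[p] != labels[p + width]:
                total += 1.0
    return total

def cost(values, labels, model, width, lam):
    """Eq. (4)-(5): F(L) = -B(P^L, M) + lam * S(L)."""
    p_l = histogram(values, labels, len(model))
    return -bhattacharyya(p_l, model) + lam * smoothness(labels, width)

# Toy 2x4 image whose pixel values are bin indices in {0, 1, 2}.
image = [0, 0, 1, 2,
         0, 0, 1, 2]
model = [1.0, 0.0, 0.0]              # reference distribution M
left_half = [1, 1, 0, 0, 1, 1, 0, 0]  # foreground = the four pixels in bin 0
whole = [1] * 8                       # foreground = entire image
```

With `lam = 0.01`, the labeling `left_half` matches the model exactly and pays only a small boundary penalty, so its cost is lower than that of `whole`.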
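Proposition 1: Given a fixed (auxiliary) labeling $L^a$, for any labeling $L$ verifying $R_1^L \subset R_1^{L^a}$, i.e., the foreground region defined by $L$ is within the foreground region defined by $L^a$, and $\forall \alpha \in [0,1]$, we have the following upper bound of $\mathcal{B}(L)$:

$$\mathcal{B}(L) \le J(L, L^a, \alpha) = \sum_{p \in R_0^L} m_p(0) + (1-\alpha) \sum_{p \in R_1^L} m_p(1), \tag{8}$$

with $m_p(0)$ and $m_p(1)$ given for each $p$ in $\mathcal{P}$ by

$$m_p(0) = \frac{\delta_{L^a_p \neq 0}}{A(R_1^{L^a})} \left( \mathcal{B}(L^a) + \sum_{z \in \mathcal{Z}} K_z(I_p) \sqrt{\frac{M(z)}{P^{L^a}(z)}} \right), \qquad m_p(1) = \frac{\mathcal{B}(L^a)}{A(R_1^{L^a})}. \tag{9}$$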
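Proof of proposition 1: Because $R_1^L$ and $R_0^L$ are complementary, we can rewrite $R_1^{L^a}$ as follows:

$$R_1^{L^a} = (R_1^{L^a} \cap R_1^L) \cup (R_1^{L^a} \cap R_0^L) \tag{10}$$

For $L$ verifying $R_1^L \subset R_1^{L^a}$, we have $R_1^{L^a} \cap R_1^L = R_1^L$. Therefore, from equation (10), we can rewrite $R_1^L$ as follows:

$$R_1^L = R_1^{L^a} \setminus (R_1^{L^a} \cap R_0^L) \tag{11}$$

Using this equation, we rewrite the kernel density estimate in (1) as follows:

$$P^L(z) = \frac{\sum_{p \in R_1^{L^a}} K_z(I_p) - \sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{A(R_1^{L^a}) - A(R_1^{L^a} \cap R_0^L)} \tag{12}$$

Now because $A(R_1^{L^a} \cap R_0^L)$ is nonnegative (and the numerator in (12) is nonnegative), we have the following inequality:

$$P^L(z) \ge \frac{\sum_{p \in R_1^{L^a}} K_z(I_p) - \sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{A(R_1^{L^a})} = P^{L^a}(z) - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{A(R_1^{L^a})} \tag{13}$$

Using this lower bound in the Bhattacharyya measure, we obtain the following upper bound of $\mathcal{B}(L)$ ($z$ is omitted as argument of the distributions to simplify the equations):

$$\begin{aligned}
\mathcal{B}(L) &\le -\sum_{z \in \mathcal{Z}} \sqrt{\left( P^{L^a} - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{A(R_1^{L^a})} \right) M} \\
&= -\sum_{z \in \mathcal{Z}} \sqrt{P^{L^a} M} \sqrt{1 - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{P^{L^a} A(R_1^{L^a})}} \\
&= -\sum_{z \in \mathcal{Z}} \sqrt{P^{L^a} M} \sqrt{1 - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{\sum_{p \in R_1^{L^a}} K_z(I_p)}}
\end{aligned} \tag{14}$$

Now notice the following inequality for any $0 \le x \le 1$:

$$\sqrt{1 - x} \ge 1 - x. \tag{15}$$

Because $R_1^{L^a} \cap R_0^L \subset R_1^{L^a}$, we have

$$0 \le \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{\sum_{p \in R_1^{L^a}} K_z(I_p)} \le 1 \tag{16}$$

Thus, applying inequality (15) gives

$$\sqrt{1 - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{\sum_{p \in R_1^{L^a}} K_z(I_p)}} \ge 1 - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{\sum_{p \in R_1^{L^a}} K_z(I_p)} \tag{17}$$

Finally, combining this inequality with (14) gives the following upper bound of $\mathcal{B}(L)$:

$$\begin{aligned}
\mathcal{B}(L) &\le -\sum_{z \in \mathcal{Z}} \sqrt{P^{L^a} M} \left( 1 - \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{\sum_{p \in R_1^{L^a}} K_z(I_p)} \right) \\
&= \mathcal{B}(L^a) + \sum_{z \in \mathcal{Z}} \sqrt{P^{L^a} M}\, \frac{\sum_{p \in R_1^{L^a} \cap R_0^L} K_z(I_p)}{P^{L^a} A(R_1^{L^a})} \\
&= \mathcal{B}(L^a) + \sum_{p \in R_0^L} \frac{\delta_{L^a_p \neq 0}}{A(R_1^{L^a})} \sum_{z \in \mathcal{Z}} K_z(I_p) \sqrt{\frac{M(z)}{P^{L^a}(z)}}
\end{aligned} \tag{18}$$

Now notice the following inequality $\forall \alpha \in [0,1]$:

$$\begin{aligned}
\mathcal{B}(L^a) &= \mathcal{B}(L^a) \frac{A(R_1^{L^a} \cap R_0^L)}{A(R_1^{L^a})} + \mathcal{B}(L^a) \frac{A(R_1^L)}{A(R_1^{L^a})} \\
&\le \sum_{p \in R_0^L} \frac{\delta_{L^a_p \neq 0}\, \mathcal{B}(L^a)}{A(R_1^{L^a})} + (1 - \alpha) \sum_{p \in R_1^L} \frac{\mathcal{B}(L^a)}{A(R_1^{L^a})}
\end{aligned} \tag{19}$$

The inequality holds because $\mathcal{B}(L^a) \le 0$ and $0 \le 1 - \alpha \le 1$. Combining (18) and (19) proves proposition 1.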
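As a numerical sanity check of the bound in equation (8), the following Python sketch draws random labelings with the foreground of L inside the foreground of L^a and reports the smallest observed gap between J(L, L^a, α) and the negated Bhattacharyya coefficient of L. It is a hedged illustration only: the kernel is assumed to be the Dirac function, so the inner sum over z in equation (9) reduces to the single term at z = I_p, and all function names are hypothetical.

```python
import math
import random

def histogram(values, labels, num_bins):
    """Normalized histogram (Dirac-kernel KDE) over pixels labeled 1."""
    area = sum(labels)
    counts = [0.0] * num_bins
    for v, lab in zip(values, labels):
        if lab == 1:
            counts[v] += 1.0
    return [c / area for c in counts]

def neg_bhattacharyya(p, model):
    """The distribution-matching term of Eq. (5): minus the Bhattacharyya coefficient."""
    return -sum(math.sqrt(a * b) for a, b in zip(p, model))

def bound_j(values, l, la, model, alpha):
    """J(L, L^a, alpha) of Eq. (8), with m_p(0) and m_p(1) from Eq. (9)."""
    p_a = histogram(values, la, len(model))
    b_a = neg_bhattacharyya(p_a, model)
    area_a = sum(la)
    total = 0.0
    for v, lab, lab_a in zip(values, l, la):
        if lab == 1:
            total += (1.0 - alpha) * b_a / area_a                      # m_p(1)
        elif lab_a == 1:
            total += (b_a + math.sqrt(model[v] / p_a[v])) / area_a     # m_p(0)
        # if lab == 0 and lab_a == 0, the delta factor makes m_p(0) vanish
    return total

def smallest_gap(trials=200, num_pixels=60, num_bins=5, seed=0):
    """Smallest value of J - (distribution-matching term) over random trials."""
    rng = random.Random(seed)
    gap = float("inf")
    for _ in range(trials):
        values = [rng.randrange(num_bins) for _ in range(num_pixels)]
        raw = [rng.random() + 1e-3 for _ in range(num_bins)]
        total = sum(raw)
        model = [r / total for r in raw]
        # Pixel 0 is kept in both foregrounds so that neither region is empty.
        la = [1] + [rng.randint(0, 1) for _ in range(num_pixels - 1)]
        l = [1] + [a if rng.random() < 0.7 else 0 for a in la[1:]]
        alpha = rng.random()
        j = bound_j(values, l, la, model, alpha)
        b = neg_bhattacharyya(histogram(values, l, num_bins), model)
        gap = min(gap, j - b)
    return gap
```

A nonnegative value returned by `smallest_gap()` (up to rounding) is consistent with proposition 1; for L = L^a and α = 0 the bound is tight.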
Definition 1: $\mathcal{A}(L, \hat{L})$ is called an auxiliary function of cost function $F(L)$ if it satisfies the following conditions:

$$F(L) \le \mathcal{A}(L, \hat{L}) \tag{20}$$

$$\mathcal{A}(L, L) = F(L) \tag{21}$$

Auxiliary functions are commonly used in the Nonnegative Matrix Factorization (NMF) literature for optimization [23]. Rather than optimizing the cost function, one can iteratively optimize an auxiliary function of the cost function. At each iteration $t$, this amounts to optimizing over the first variable:

$$L^{(t+1)} = \arg\min_{L} \mathcal{A}(L, L^{(t)}) \tag{22}$$

Thus, by definition of auxiliary function and minimum, we obtain the following monotonically decreasing sequence of the cost function:

$$F(L^{(t)}) = \mathcal{A}(L^{(t)}, L^{(t)}) \ge \mathcal{A}(L^{(t+1)}, L^{(t)}) \ge F(L^{(t+1)}) \tag{23}$$
Proposition 2: For $\alpha = 0$, the following function is an auxiliary function of $F(L)$:

$$\mathcal{A}(L, L^a, \alpha) = J(L, L^a, \alpha) + \lambda S(L) \tag{24}$$

Proof of proposition 2: To prove proposition 2, it suffices to verify conditions (20) and (21) for $\mathcal{A}$ and $F$. Condition (20) follows directly from proposition 1. For condition (21), it suffices to see that when $L^a = L$, $\delta_{L^a_p \neq 0} = 0$ $\forall p \in R_0^L$, i.e., $m_p(0) = 0$ $\forall p \in R_0^L$. In this case, we have $\sum_{p \in R_0^L} m_p(0) = 0$. Therefore, we have for $\alpha = 0$

$$\begin{aligned}
\mathcal{A}(L, L, 0) = J(L, L, 0) + \lambda S(L) &= \sum_{p \in R_1^L} m_p(1) + \lambda S(L) \\
&= \sum_{p \in R_1^L} \frac{\mathcal{B}(L)}{A(R_1^L)} + \lambda S(L) = \mathcal{B}(L) + \lambda S(L) \\
&= F(L)
\end{aligned} \tag{25}$$

which verifies condition (21).
Proposition 2 instructs us to consider the following
procedure for minimizing functional F.
Minimization procedure:
begin
• Initialize the auxiliary labeling so that the initial
region corresponds to the whole image domain:
$L^a(p) = L^{(0)}(p) = 1 \ \forall p \in \mathcal{P}$
• Initialize α: $\alpha = \alpha_0$ with $0 < \alpha_0 < 1$
repeat
1. Optimize the auxiliary function over L:
$L^{(t)} = \arg\min_{L : R_1^L \subset R_1^{L^a}} \mathcal{A}(L, L^a, \alpha)$
2. Update $L^a$ by $L^a = L^{(t)}$
3. Decrease α: $\alpha = \alpha^{\rho}$ with $\rho > 1$
until convergence;
end
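The loop structure of the minimization procedure can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method as described: it assumes a Dirac kernel and sets λ = 0, so that step 1 decouples into an independent decision per pixel and no graph cut is needed. Under that assumption a pixel of the current foreground is dropped exactly when m_p(0) < (1 - α) m_p(1). With λ = 0 all pixels sharing a histogram bin receive the same decision, and for α > 0 a monotone decrease of F is not guaranteed, so the sketch only shows the mechanics of the updates. The function names and the toy data are hypothetical.

```python
import math

def histogram(values, labels, num_bins):
    """Normalized histogram (Dirac-kernel KDE) over pixels labeled 1."""
    area = sum(labels)
    counts = [0.0] * num_bins
    for v, lab in zip(values, labels):
        if lab == 1:
            counts[v] += 1.0
    return [c / area for c in counts]

def bhattacharyya(f, g):
    return sum(math.sqrt(a * b) for a, b in zip(f, g))

def bound_minimization(values, model, alpha0=0.85, rho=1.1, iterations=4):
    """Iterate steps 1-3 of the procedure with lambda = 0 (assumption)."""
    labels = [1] * len(values)       # L^(0): the whole image domain
    alpha = alpha0
    for _ in range(iterations):
        p_a = histogram(values, labels, len(model))
        b_a = -bhattacharyya(p_a, model)   # distribution-matching term of L^a
        area = sum(labels)
        new_labels = []
        for v, lab_a in zip(values, labels):
            if lab_a == 0:
                new_labels.append(0)       # hard constraint: stay inside R_1^{L^a}
                continue
            m0 = (b_a + math.sqrt(model[v] / p_a[v])) / area
            m1 = b_a / area
            # Step 1 without smoothness: pick the cheaper label for this pixel.
            new_labels.append(0 if m0 < (1.0 - alpha) * m1 else 1)
        labels = new_labels                # step 2: update L^a
        alpha = alpha ** rho               # step 3: decrease alpha
    final_b = bhattacharyya(histogram(values, labels, len(model)), model)
    return labels, final_b

# Toy data: ten object pixels in bins 0 and 1, ten background pixels in bins 2 and 3.
toy_values = [0] * 5 + [1] * 5 + [2] * 6 + [3] * 4
toy_model = [0.5, 0.5, 0.0, 0.0]
```

On the toy data, `bound_minimization(toy_values, toy_model)` removes the ten background pixels in the first iteration and leaves the labeling unchanged afterwards.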
Convergence proof: When α approaches zero,
$\mathcal{A}(L, L^a, \alpha)$ approaches an auxiliary function of cost
function F and, therefore, the above procedure leads to
a monotonically decreasing sequence of F. This comes
directly from (23). Since the cost function is lower bounded
(because the Bhattacharyya measure is upper bounded by
one), the algorithm converges.
Optimization in step 1 with a graph cut: Now notice that the auxiliary function $\mathcal{A}(L, L^a, \alpha)$ in step 1 of the optimization procedure is the sum of unary and pairwise (submodular) penalties. In combinatorial optimization, a global optimum of such a sum can be computed efficiently in low-order polynomial time with a graph cut by solving an equivalent max-flow problem [10]. Furthermore, the condition that the solution should verify $R_1^L \subset R_1^{L^a}$ can be imposed easily by adding a hard constraint [16]. We used the well-known max-flow algorithm of Boykov and Kolmogorov [10] for the optimization in step 1. Here we omit the details of the max-flow algorithm and hard constraints. These details are well-known in the literature, and can be found in [10], [16].

Method      Error
BMGC        0.24%
TRGC [1]    2.33%

Table 1. Evaluation on the GrabCut database (50 images with ground truth segmentations): average error for the proposed method (BMGC) and the Trust Region Graph Cut (TRGC) method in [1]. BMGC yielded a significant improvement in segmentation accuracy.
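To illustrate how a sum of unary and pairwise penalties with a hard constraint can be minimized exactly through max-flow, as described for step 1, here is a small self-contained Python sketch. It is not the Boykov-Kolmogorov algorithm used in the paper: for brevity it assumes a plain breadth-first augmenting-path (Edmonds-Karp style) max-flow, which is only practical for tiny problems. The function names, the `forced_background` parameter, and the large constant used for the hard constraint are illustrative assumptions.

```python
from collections import deque

def energy(labels, unary0, unary1, pairwise):
    """Sum of unary costs plus pairwise penalties on label disagreements."""
    e = sum(unary1[p] if lab == 1 else unary0[p] for p, lab in enumerate(labels))
    e += sum(w for (p, q), w in pairwise.items() if labels[p] != labels[q])
    return e

def min_cut_labels(unary0, unary1, pairwise, forced_background=()):
    """Binary labeling minimizing `energy`; pairwise weights must be nonnegative.

    Pixels listed in forced_background are constrained to label 0.
    """
    n = len(unary0)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]   # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)

    big = 1e9                               # stands in for an infinite cost
    for p in range(n):
        c0, c1 = unary0[p], unary1[p]
        if p in forced_background:
            c1 = c0 + big
        shift = min(c0, c1)                 # make both terminal capacities >= 0
        add_edge(s, p, c0 - shift)          # cut when p gets label 0 (sink side)
        add_edge(p, t, c1 - shift)          # cut when p gets label 1 (source side)
    for (p, q), w in pairwise.items():
        add_edge(p, q, w)
        add_edge(q, p, w)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        flow = float("inf")
        v = t
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Nodes still reachable from the source form the label-1 (foreground) side.
    return [1 if p in parent else 0 for p in range(n)]
```

In the setting of step 1, `unary0[p]` would hold m_p(0), `unary1[p]` would hold (1 - α) m_p(1), each pairwise weight would be λ s_{p,q}, and `forced_background` would list the pixels outside the current foreground of L^a.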
Interpretation of the upper-bound optimization and link to statistical hypothesis testing: For a clear interpretation of the upper-bound optimization, let us assume that $K_z$ is the Dirac function. In this case, for each pixel $p$ within the current foreground region, i.e., $p \in R_1^{L^a}$ (at the first iteration, this corresponds to all the pixels in the image), we have

$$H_p = m_p(0) - (1 - \alpha)\, m_p(1) = \frac{\sqrt{\dfrac{M(I_p)}{P^{L^a}(I_p)}} + \alpha\, \mathcal{B}(L^a)}{A(R_1^{L^a})} = \frac{\sqrt{\dfrac{M(I_p)}{P^{L^a}(I_p)}} - \alpha\, B(P^{L^a}, M)}{A(R_1^{L^a})} \tag{26}$$

When $H_p < 0$, the graph cut excludes pixel $p$ from the current foreground region so as to decrease the upper bound, $J(L, L^a, \alpha)$. This has a clear meaning, and amounts to statistical hypothesis testing by an image likelihood ratio test. It evaluates the hypotheses that the image at pixel $p$ is drawn from model $M$ or from the image distribution within the current foreground region. Because $A(R_1^{L^a})$ is positive, the sign of $H_p$ in (26) is determined by its numerator. If the likelihood ratio is lower than the following critical value

$$\frac{M(I_p)}{P^{L^a}(I_p)} < \left( \alpha\, B(P^{L^a}, M) \right)^2, \tag{27}$$

then $H_p < 0$, which results in excluding pixel $p$ from the foreground region. This makes sense because it results in decreasing the image distribution within the current foreground region at value $I_p$, which means a better match with the model at that value.
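The link between the sign of H_p and a likelihood ratio test can be checked with a short Python sketch. It assumes a Dirac kernel and uses the expression of H_p that follows from m_p(0) and m_p(1) in equation (9); since the area A of the current foreground is positive, it scales H_p without changing its sign, so the test reduces to comparing the ratio M(I_p) / P^{L^a}(I_p) with the squared product of α and the Bhattacharyya coefficient. The function names and the numbers in the example are hypothetical.

```python
import math

def excluded_by_bound(m_value, p_value, alpha, b_coeff, area):
    """True when H_p < 0, with H_p = (sqrt(M/P) - alpha * B) / A."""
    h_p = (math.sqrt(m_value / p_value) - alpha * b_coeff) / area
    return h_p < 0

def excluded_by_ratio(m_value, p_value, alpha, b_coeff):
    """Equivalent likelihood ratio test: M/P below the critical value (alpha * B)^2."""
    return m_value / p_value < (alpha * b_coeff) ** 2

# Hypothetical numbers: alpha = 0.85 and B = 0.7 give a critical value of about 0.354.
alpha, b_coeff, area = 0.85, 0.7, 1000
unlikely_pixel = (0.02, 0.20)   # (M(I_p), P(I_p)), ratio 0.1: excluded
likely_pixel = (0.30, 0.15)     # ratio 2.0: kept
```

Both tests agree on the two example pixels: the first is excluded from the foreground and the second is kept.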
3. Experiments
Quantitative evaluation and comparison with TRGC:
To verify the optimality and accuracy of the proposed
method, referred to as BMGC (Bhattacharyya Measure
Graph Cut), we carried out a quantitative evaluation on
the Microsoft GrabCut segmentation database [1] (50 im-
ages with ground truth segmentations into two regions: a
foreground and a background). Similar experiments on the
same data were presented in [1] to evaluate the Trust Region
Graph Cut (TRGC) method, which uses graph cut iterations
to optimize the L1 difference between histograms. Given
the model distribution of the target region (foreground)
learned from the ground truth, each image in the database is
segmented, and the average error, i.e., the percentage of
misclassified pixels in comparison to the ground truth, is
evaluated to measure the accuracy of the proposed method.
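The error measure described above can be written as a short Python function. This is a sketch under one stated assumption: the percentage is taken over all pixels of the image, which the text does not spell out. The function name is hypothetical.

```python
def misclassification_error(labels, ground_truth):
    """Percentage of pixels whose label differs from the ground truth.

    Assumption: the denominator is the total number of pixels in the image.
    """
    if len(labels) != len(ground_truth):
        raise ValueError("labelings must have the same number of pixels")
    wrong = sum(1 for a, b in zip(labels, ground_truth) if a != b)
    return 100.0 * wrong / len(labels)
```

For example, one wrong pixel out of four gives an error of 25%.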
Parameters for all the data (50 images): We fixed the
parameters as follows: λ = 10^{-5}, α_0 = 0.85, and ρ = 1.1.
The total number of iterations is fixed at 4. We used
a trivial initial auxiliary labeling: L^{(0)}(p) = 1 ∀p ∈ P,
i.e., the initial foreground region corresponds to the image
domain. Therefore, the proposed method does not require
an initialization. The photometric variable is color specified
in RGB coordinates. A 3-dimensional histogram based on
192 × 192 × 192 bins was used as a density estimate.
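A 3-dimensional color histogram of this kind can be sketched in Python as follows. The sketch rests on assumptions not stated in the text: 8-bit RGB channels, uniform quantization of each channel, and a sparse dictionary representation (chosen here because 192 × 192 × 192 is roughly seven million bins, most of which stay empty for a single image). The function names are hypothetical.

```python
def rgb_bin(r, g, b, bins_per_channel):
    """Flat index of the histogram bin containing an 8-bit RGB color (assumption)."""
    def quantize(channel):
        return min(channel * bins_per_channel // 256, bins_per_channel - 1)
    return (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)

def sparse_histogram(pixels, bins_per_channel):
    """Normalized histogram stored as {bin index: relative frequency}."""
    counts = {}
    for r, g, b in pixels:
        index = rgb_bin(r, g, b, bins_per_channel)
        counts[index] = counts.get(index, 0) + 1
    total = len(pixels)
    return {index: c / total for index, c in counts.items()}
```

With 192 bins per channel, black maps to bin 0 and white maps to the last bin, 192^3 - 1.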
Table 1 reports the results: BMGC yielded an average
error equal to 0.24%, whereas the authors in [1] report an
average error equal to 2.33% on the same data set. In
comparison to TRGC, the proposed method leads to a significant
improvement in segmentation accuracy. Furthermore,
unlike TRGC, the proposed method does not use a
submodular-supermodular procedure [15] for initialization.
Segmentation examples: A representative sample of
the tests we ran on the GrabCut segmentation database is
depicted in Figure 1. The image, its segmentation with
BMGC, and the corresponding ground truth are shown, re-
spectively, in the first, second, and third columns. The er-
ror, obtained optimal Bhattacharyya measure corresponding
to the optimal labeling B(PLopt,M), initial Bhattacharyya
measure B(PL(0),M), run time, and image size are given
for each example. The proposed method yielded very ac-
curate segmentations, although in some examples a signifi-
cant overlap (similarity) exists between the foreground and
background distributions. Furthermore, it requires very few
iterations to converge (4 iterations). The example in Figure
2 illustrates the fast convergence of the proposed algorithm.
Optimality: To measure the optimality of the algo-
rithm, we evaluated over the GrabCut database the statis-
tics (mean and std) of the optimal Bhattacharyya measures
obtained with BMGC (refer to table 2 for details). We ob-
tained B(PLopt,M) = 0.9984 ± 0.0016 (B expressed as
mean ± std). The obtained optimal measures are very close
to 1, which is the maximum possible value. This demonstrates
that BMGC yields nearly global optima, although it
requires only a few iterations.
Comparisons with active contour methods: Using the GrabCut database, we performed comparisons with gradient flow active contour optimization, commonly used to tackle global distribution measures [2, 3, 4, 5, 6, 7]. The results and computation time/load, reported in Table 3, demonstrate that BMGC leads to significant improvements in regard to computational efficiency, segmentation accuracy, and optimality. BMGC has an important computational advantage over active contour methods: it leads to a significant decrease in computational load because it does not require a large number of updates of computationally onerous kernel densities. With BMGC, the solution is reached after 4 Kernel Density Estimations (KDEs), whereas the average number of KDEs for active contour optimization is 1737. BMGC took an average run time of 3.51 secs to compute nearly global optima corresponding approximately to the maximum possible value of the Bhattacharyya measure, whereas active contour optimization took 631 secs, converged to a less optimal Bhattacharyya measure, and yielded a higher error (refer to Table 3 for details).

Measure                                        Value
Obtained measure B(P^{L^opt}, M) (mean ± std)  0.9984 ± 0.0016
Initial measure B(P^{L^(0)}, M) (mean ± std)   0.4820 ± 0.1186
Maximum measure                                1

Table 2. Evaluation of the optimality of the algorithm on the GrabCut database (50 images): statistics (mean and std) of the optimal Bhattacharyya measures obtained with BMGC (B(P^{L^opt}, M)) and the initial measure (B(P^{L^(0)}, M)). The optimal measures are very close to 1, which is the maximum possible value. This demonstrates that BMGC yields nearly global optima with few iterations.

Method                                        BMGC             Active contour optimization [2, 3, 4]
Number of KDEs per image (mean)               4                1737
Run time per image (mean)                     3.51 secs        631 secs
Error                                         0.24%            2.49%
Obtained Bhattacharyya measure (mean ± std)   0.9984 ± 0.0016  0.9791 ± 0.01

Table 3. Comparisons with active contour optimization [2, 3, 4] on the GrabCut database: number of Kernel Density Estimations per image (mean), run time per image (mean), error (percentage of misclassified pixels), and obtained Bhattacharyya measure (the higher the measure, the more optimal the solution). BMGC leads to significant improvements in regard to segmentation accuracy, optimality, and computational efficiency. It removes the need for a large number of updates of the kernel densities and the corresponding measures.
Object tracking examples: The example in Figure 3 depicts
the tracking of an object with an arbitrary shape in the
table tennis sequence. Given the model distribution learned
from a manual segmentation of the first frame, the target
object is recovered with BMGC in subsequent frames. In
frame 30, the target object undergoes significant variations
in shape/size in comparison to the first frame. The proposed
method implicitly handles these variations because no
assumptions were made as to the size, shape, or position of
the target region.
References
[1] C. Rother, V. Kolmogorov, T. Minka, and A. Blake,
Cosegmentation of Image Pairs by Histogram
Matching–Incorporating a Global Constraint
into MRFs, CVPR, 2006.
[2] I. Ben Ayed, S. Li, and I. Ross, A Statistical Over-
lap Prior for Variational Image Segmentation, Int. J. of
Computer Vision, 85(1): 115-132, 2009.
[3] T. Zhang and D. Freedman, Improving performance
of distribution tracking through background mismatch,
IEEE Trans. on Pattern Anal. and Machine Intell.,
27(2):282-287, 2005.
[4] D. Freedman and T. Zhang, Active contours for track-
ing distributions, IEEE Trans. on Image Processing,
13(4):518-526, 2004.
[5] T. F. Chan, S. Esedoglu, and K. Ni, Histogram Based
Segmentation Using Wasserstein Distances, SSVM
2007, pp. 697-708.
[6] G. Aubert, M. Barlaud, O. Faugeras, and S. Jehan-
Besson, Image Segmentation Using Active Contours:
Calculus of Variations or Shape Gradients?, SIAM Ap-
plied Mathematics, 63(6): 2128-2154, 2003.
[7] O. V. Michailovich, Y. Rathi, and A. Tannenbaum, Im-
age Segmentation Using Active Contours Driven by the
Bhattacharyya Gradient Flow, IEEE Trans. on Image
Processing, 16(11):2787-2801, 2007.
[8] D. Comaniciu, V. Ramesh, and P. Meer, Kernel-based
object tracking, IEEE Trans. on Pattern Anal. and Ma-
chine Intell., 25(5):564-577, 2003.
[9] V. Kolmogorov, Y. Boykov, and C. Rother, Applications
of parametric maxflow in computer vision, ICCV 2007.
[10] Y. Boykov and V. Kolmogorov, An experimental com-
parison of min-cut/max-flow algorithms for energy
minimization in vision, IEEE Trans. on Pattern Anal.
and Machine Intell., 26(9):1124-1137, 2004.
[11] Y. Boykov, O. Veksler, and R. Zabih, Fast
Approximate Energy Minimization via Graph Cuts,
IEEE Trans. on Pattern Anal. and Machine Intell.,
23(11):1222-1239, 2001.
[Figure 1: images not reproduced. Columns: image, segmentation with BMGC, ground truth, measures. Measures reported for the six examples:]

Example  Error   B(P^{L^opt}, M)  B(P^{L^(0)}, M)  Run time    Size
1        0.04%   0.9989           0.3779           2.59 secs   321 × 481
2        0.13%   0.9973           0.4741           3.84 secs   600 × 450
3        0.05%   0.9991           0.4287           2.91 secs   481 × 321
4        0.19%   0.9991           0.5885           2.80 secs   321 × 481
5        0.16%   0.9968           0.5543           2.81 secs   321 × 481
6        0.09%   0.9988           0.3704           3.72 secs   600 × 450

Figure 1. A sample of the segmentations obtained with the proposed method. First column: the image and the segmentation boundary obtained with BMGC, depicted with the red curve. Second column: foreground region obtained with BMGC. Third column: ground truth. The models are learned from the ground truth as in [1]. For all the images, we used a trivial initial labeling: L^{(0)}(p) = 1 ∀p ∈ P. λ = 10^{-5}, α_0 = 0.85, and ρ = 1.1. The total number of iterations is fixed at 4. The photometric variable is color specified in RGB coordinates. A 3-dimensional histogram based on 192 × 192 × 192 bins was used as a density estimate. BMGC yielded accurate results, although in some examples a significant overlap (similarity) exists between the foreground and background distributions.

[Figure 2: images not reproduced. Panels and measures: Initialization, B(P^{L^(0)}, M) = 0.5334; Iteration 1, B(P^{L^(1)}, M) = 0.85934; Iteration 2, B(P^{L^(2)}, M) = 0.9979; Iteration 3, B(P^{L^(3)}, M) = 0.99919.]

Figure 2. Illustration of the fast convergence of the algorithm: evolution of the segmentation and optimized Bhattacharyya measure within the first three iterations. In the third iteration, the algorithm yielded a Bhattacharyya measure very close to 1, which is the maximum possible value. α_0 = 0.85 and ρ = 1.1.

[Figure 3: images not reproduced. Panels: frame 1 (learning), frame 5, frame 8, frame 11, frame 13, frame 17, frame 20, frame 24, frame 27, frame 30.]

Figure 3. Tracking of an object with an arbitrary shape in the table tennis sequence. Given the model learned from a manual segmentation of frame 1, the object is recovered with BMGC in subsequent frames (green curves). We used a trivial initial labeling for each test frame: L^{(0)}(p) = 1 ∀p ∈ P. The photometric variable is color specified in RGB coordinates. A 3-dimensional kernel density estimate of the distribution was computed using 32 × 32 × 32 bins and a kernel width σ = 2/32. ρ = 3, α = 0.85, and λ = 2.5 × 10^{-5}. Number of iterations per frame: 4. Run time/frame = 1.51 secs. Size: 240 × 352 × 30 frames. In frame 30, the target object undergoes significant variations in shape/size in comparison to the first frame. The proposed method handles these variations because no assumptions were made as to the size, shape, or position of the object.
[12] V. Kolmogorov and R. Zabih, What Energy Functions
can be Minimized via Graph Cuts?, IEEE Trans. on Pattern
Anal. and Machine Intell., 26(2):147-159, 2004.
[13] D. P. Bertsekas, Nonlinear Programming, Athena Sci-
entific, 2nd edition, 1999.
[14] F. Aherne, N. Thacker, and P. Rockett, The Bhat-
tacharyya metric as an absolute similarity measure for
frequency coded data, Kybernetika, 32(4):1-7, 1997.
[15] M. Narasimhan and J. Bilmes, A supermodular-
submodular procedure with applications to discrimina-
tive structure learning, In UAI, July 2005.
[16] Y. Boykov, and G. Funka-Lea, Graph Cuts and Effi-
cient N-D Image Segmentation, Int. J. of Computer Vi-
sion, 70(2):109-131, 2006.
[17] C. Rother, V. Kolmogorov, and A. Blake, Grabcut-
interactive foreground extraction using iterated graph
cuts, SIGGRAPH, ACM Trans. on Graphics, 2004.
[18] X. Liu, O. Veksler, and J. Samarabandu, Graph Cut
with Ordering Constraints on Labels and its Applica-
tions, CVPR, 2008.
[19] J. Malcolm, Y. Rathi, and A. Tannenbaum, Multi-
Object Tracking Through Clutter Using Graph Cuts,
ICCV, 2007.
[20] V. Lempitsky, S. Roth, and C. Rother, FusionFlow:
Discrete-Continuous Optimization for Optical Flow Es-
timation, CVPR, 2008.
[21] J. Kim, V. Kolmogorov, and R. Zabih, Visual Corre-
spondence Using Energy Minimization and Mutual In-
formation, ICCV, 2003.
[22] Y. Boykov and V. Kolmogorov, Computing geodesics
and minimal surfaces via graph cuts, ICCV, 2003.
[23] D. D. Lee and H. S. Seung, Algorithms for nonnegative
matrix factorization, Advances in Neural Information
Processing Systems (NIPS) 2002, 13: 556-562.