A Joint Illumination and Shape Model for Visual Tracking
Amit Kale and Christopher Jaynes
Ctr. for Visualization and Virtual Environments and Department of Computer Science
University of Kentucky
{amit,jaynes}@cs.uky.edu
Abstract
Visual tracking involves generating an inference about the motion of an object from measured image locations in a video sequence. In this paper we present a unified framework that incorporates shape and illumination in the context of visual tracking. The contribution of the work is twofold. First, we introduce a multiplicative, low-dimensional model of illumination that is defined by a linear combination of a set of smoothly changing basis functions. Secondly, we show that a small number of centroids in this new space can be used to represent the illumination conditions existing in the scene. These centroids can be learned from ground truth and are shown to generalize well to other objects of the same class for the scene. Finally, we show how this illumination model can be combined with shape in a probabilistic sampling framework. Results of the joint shape-illumination model are demonstrated in the context of vehicle and face tracking in challenging conditions.
1. Introduction
Visual tracking involves generating an inference about the motion of an object from measured image locations in a video sequence. Unfortunately, this goal is confounded by sources of image appearance change that are only partly related to the position of the object in the scene. For example, unknown deformations, changes in pose of the object, or changes in illumination can cause a template to change appearance over time and lead to tracking failure.
Shape change for rigid objects can be captured by a low-dimensional shape space under a weak perspective assumption. Tracking can then be considered as the statistical inference of this low-dimensional shape vector. This interpretation forms the basis of several tracking algorithms, including the well-known Condensation algorithm [7] and its variants. A similarly concise model is required if we are to robustly estimate illumination changes in a statistical tracking framework while avoiding an undue increase in the dimensionality of the problem. This is the topic of this paper.

This work was funded by NSF CAREER Award IIS-0092874 and by the Department of Homeland Security.

Figure 1. Tracking a car across drastic illumination change. (a) A template constructed for the vehicle in sunlight will change appearance as it enters shadow and traditional shape tracking fails. (b) The histogram of the zeroth coefficient of our illumination model. This work shows how the modes of these distributions are sufficient to accurately track through both shape and illumination change.
Appearance change as a function of illumination is a widely studied area in computer vision [2, 11, 1]. These methods focus on accurate models of appearance under varying illumination and their utility for object recognition. However, they typically require an explicit 3-D model of the object, which limits their applicability to surveillance. A general yet low-dimensional parameterization of illumination has thus far been elusive in a tracking context.
In this work we focus on the problem of tracking objects through simultaneous illumination and shape change. Examples include monitoring vehicles that move in and out of shadow or tracking a face as it moves through different lighting conditions in an indoor environment. The approach is intended for use in traditional video surveillance and monitoring tasks where a large number of illumination samples of each object to be tracked are unavailable [6] and features that are considered to be invariant to illumination are known to be unreliable [2].
The contribution of the work is twofold. First, we introduce a multiplicative, low-dimensional model of illumination that is computed as a linear combination of a set of Legendre functions. Such a multiplicative model can be interpreted as an approximation of the illumination image as discussed in Weiss [12]. Although the model is not intended to be applied to recognition tasks under differing illumination, it is sufficient to capture appearance variability for improved tracking. The Legendre coefficients together with the shape vectors define a joint shape-illumination space. Our approach then is to estimate the vector in this joint space that best transforms the template to the current frame. This is in contrast to approaches that adapt the template over time by modifying a continuously varying density [3, 13]. Direct adaptation of the template requires careful selection of adaptation parameters to avoid problems of drift [10].
In an alternative formulation of the problem, Freedman and Turek [4] introduce an illumination-invariant approach to computing optic flow that can be used to localize object templates. The method was shown to be quite robust at tracking objects through shadows. However, it is computationally expensive and it is unclear how known system dynamics can be integrated within the approach. We do not seek illumination invariance but instead estimate the illumination changes using our model as part of the tracking process. However, the use of illumination-invariant optic flow as a low-level primitive could be combined with the work here to inform the shape-space sampling distributions and is the subject of future work.
When using this joint shape-illumination space for tracking, it is no longer obvious how the space should be sampled. For example, Figure 1a shows a vehicle that moves from bright sunlight to shadow. Because this transition can occur instantaneously between frames, the smoothness assumptions that are used to derive the sampling distribution for shape are often violated for the illumination component. Furthermore, the additional degrees-of-freedom that are required to model illumination can lead to decreased robustness at runtime or require an inordinate number of tracking samples in each frame. However, we discover the surprising result that a small number of centroids extracted from the underlying distributions of our illumination coefficients are often adequate to represent the influence of most of the illumination conditions existing in the scene. Figure 1b shows a distribution of the zeroth-order coefficient in our model for the car moving through the scene in Figure 1a. In Section 3.1 we discuss how important modes of these distributions are extracted and used to track through drastic illumination changes such as these.
2. A Multiplicative Model of Appearance Change due to Illumination
The image template throughout the tracking sequence can be expressed as:

$$U_t(x, y) = L_t(x, y)\, R(x, y) \qquad (1)$$

where $L_t(x, y)$ denotes the illumination image in frame $t$ and $R(x, y)$ denotes a fixed reflectance image [12]. Thus, if the reflectance image of the object is known, tracking becomes the problem of estimating the illumination image and a shape vector.

Of course, the reflectance image is typically unavailable and the illumination image can only be computed modulo the illumination contained in the image template, as shown in Equation 2:
$$U_t(x, y) = \tilde{L}_t(x, y)\, L_0(x, y)\, R(x, y) \qquad (2)$$

where $L_0$ is the initial illumination image and $\tilde{L}_t$ is the unknown illumination image for frame $t$.
Our proposed model of appearance change, then, is simply the product of the input image with a function $f_t(x, y)$ that approximates $\tilde{L}_t$ and is defined over the image domain, $P \times Q$. A naive way of compensating for appearance change would be to allow each $f(x, y),\ x = 1, \cdots, P,\ y = 1, \cdots, Q$ to vary independently. However, it is known that for a convex Lambertian object the change in appearance of neighboring pixels is not independent, and the excessive additional degrees-of-freedom can make the tracking problem intractable.
Instead we construct the illumination compensation image $f$ from a linear combination of a far lower-dimensional set of $n$ basis functions. In order to be useful, the basis functions must be both orthogonal in the 2D image domain and straightforward to compute. Furthermore, they must be capable of spanning most of the appearance changes in the template due to illumination. For the work here we utilize the Legendre polynomial basis, although any other polynomial basis will suffice. To give an idea of the type of variation the basis supports, Figure 2 shows the Legendre basis of order three.
Let $p_n(x)$ denote the $n$th Legendre basis function. Then, for a given set of coefficients $\Lambda = [\lambda_0, \cdots, \lambda_{2n}]^T$, the scaled intensity value at a pixel is computed as:

$$\hat{U}(x, y) = \Bigl( \tfrac{1}{2n+1} \bigl( \lambda_0 + \lambda_1 p_1(x) + \cdots + \lambda_n p_n(x) + \lambda_{n+1} p_1(y) + \cdots + \lambda_{2n} p_n(y) \bigr) + 1 \Bigr)\, U(x, y) \qquad (3)$$
Figure 2. First seven Legendre basis functions used to track illumination change in an image template.

For purposes of notation, we will denote the effect of $\Lambda$ on the image as

$$[\Lambda]U \equiv U \circ P\Lambda + U \qquad (4)$$

where

$$P = \begin{bmatrix} \frac{1}{2n+1}\,p_0 & \cdots & \frac{1}{2n+1}\,p_n(y_1) \\ \vdots & \ddots & \vdots \\ \frac{1}{2n+1}\,p_0 & \cdots & \frac{1}{2n+1}\,p_n(y_{PQ}) \end{bmatrix}. \qquad (5)$$
We define $\circ$ as an operator that scales the rows of $P$ with the corresponding element of $U$ written as a vector. Given an input template $T$ and an image $U$, the Legendre coefficients that minimize the error between $[\Lambda]U$ and $T$ can be computed by solving the least-squares problem,

$$(U \circ P)\,\Lambda = T - U. \qquad (6)$$

Each of the basis functions is scaled by a particular choice of $\Lambda_i$ and then linearly combined using Equation 4 to derive an illumination image.
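The basis construction of (5) and the least-squares fit of (6) can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the function names, the normalization of pixel coordinates to [-1, 1], and the use of `numpy.polynomial.legendre` are our own assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_design(h, w, n):
    """Design matrix P (Eq. 5): rows index pixels; columns hold the
    constant term and the first n Legendre polynomials in x and in y,
    each scaled by 1/(2n+1)."""
    x = np.linspace(-1.0, 1.0, w)
    y = np.linspace(-1.0, 1.0, h)
    X, Y = np.meshgrid(x, y)                       # (h, w) coordinate grids
    cols = [np.ones(h * w)]
    for k in range(1, n + 1):
        c = np.zeros(k + 1); c[k] = 1.0            # coefficients selecting p_k
        cols.append(L.legval(X, c).ravel())
    for k in range(1, n + 1):
        c = np.zeros(k + 1); c[k] = 1.0
        cols.append(L.legval(Y, c).ravel())
    return np.column_stack(cols) / (2 * n + 1)     # (h*w, 2n+1)

def fit_illumination(U, T, n=2):
    """Solve (U ∘ P) Λ = T − U (Eq. 6) for the Legendre coefficients."""
    P = legendre_design(*U.shape, n)
    A = U.ravel()[:, None] * P                     # rows of P scaled by U
    lam, *_ = np.linalg.lstsq(A, (T - U).ravel(), rcond=None)
    return lam

def apply_illumination(U, lam, n=2):
    """Relight U with [Λ]U = U ∘ PΛ + U (Eq. 4)."""
    P = legendre_design(*U.shape, n)
    f = P @ lam + 1.0                              # multiplicative field
    return U * f.reshape(U.shape)
```

Because the fit is row-scaled by the template exactly as in (6), a template relit by any field in the Legendre span is recovered up to numerical precision.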
Figure 3 demonstrates how this low-dimensional set of Legendre polynomials can accommodate illumination change. Figure 3a is an input template and Figure 3b is the same image template relit from a different direction. Using a least-squares fit for $\Lambda$, a new image that is more similar in appearance to the target image is generated (see Figure 3c).
3. A Joint Space of Illumination and Shape for Tracking

For the sake of generality, we assume an $N_S$-dimensional shape space and an $N_\lambda$-dimensional illumination space that together define a joint space through a map $L(W, T, \cdot)$. The map takes a joint shape-and-illumination vector $X^A \in \mathbb{R}^{N_S + N_\lambda}$:

$$X^A = \begin{bmatrix} X \\ \Lambda \end{bmatrix} \qquad (7)$$

to a deformed and relit template, $U \in \mathbb{R}^{N_T}$:

$$U = [\Lambda]\,\bigl[\,I(WX + T)\,\bigr]. \qquad (8)$$
Figure 3. An example of illumination compensation using a low-dimensional multiplicative model. (a) Input template. (b) Input image under new illumination. (c) Synthesized image that is the product of illumination basis functions with the input. For this example a third-order Legendre polynomial was used and the Legendre coefficients were computed using (6).
$W$ denotes an $N_T \times N_S$ shape matrix. The constant offset $T$ denotes the template against which shape variations are measured. No such offset is required for the illumination component. $I(\cdot)$ simply refers to the image intensities measured on the shape grid implied by the shape component of $X^A$.
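A sketch of the forward map in (7)-(8): the joint state splits into a shape part and an illumination part, the shape part deforms the template grid, and $I(\cdot)$ samples intensities on the deformed grid. The three-component shape vector (scale, tx, ty) and nearest-neighbour sampling below are illustrative assumptions; the paper's $W$ and $T$ define a general linear shape space.

```python
import numpy as np

def split_state(x_a, n_s):
    """Split the joint state vector (Eq. 7) into shape X and illumination Λ."""
    return x_a[:n_s], x_a[n_s:]

def warp_grid(grid, X):
    """Deform template grid points by a 3-D shape vector
    X = (scale, tx, ty); a stand-in for the paper's WX + T."""
    s, tx, ty = X
    return (1.0 + s) * grid + np.array([tx, ty])

def sample_intensities(image, pts):
    """I(·): nearest-neighbour image intensities on the deformed grid."""
    h, w = image.shape
    xs = np.clip(np.round(pts[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(pts[:, 1]).astype(int), 0, h - 1)
    return image[ys, xs]
```

Applying the illumination operator $[\Lambda]$ of (4) to the sampled intensity vector then yields the deformed and relit template of (8).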
The proposed joint shape-illumination space can be sampled sequentially to track objects through a range of shape and illumination changes. This is best accomplished in a robust way using a particle filter framework. Particle filters (PF) are widely studied in computer vision and different variants of the implementation exist [13, 9]. Two important components of a PF are a state evolution model $p(X^A_t \mid X^A_{t-1})$ and an observation model $p(Y_t \mid X^A_t)$. The PF tracker approximates the posterior density $p(X^A_t \mid Y_{1:t})$ with a set of weighted particles $\{((X^A)^j_t, w^j_t)\}$ with $\sum_{j=1}^{M} w^j_t = 1$. The likelihood $p(Y_t \mid X^A_t)$ of a particular hypothesis in the joint shape-appearance model is computed using the transformed image and the template: a likelihood measure on the joint shape-illumination hypothesis $X^A_i$ is computed from the sum of absolute differences (SAD) between $U$ and $T$.
The other component of PF tracking is the specification of $p(X^A_t \mid X^A_{t-1})$. Typically a Gauss-Markov model is assumed, whereby $X^A_{t+1} \sim \mathcal{N}(X^A_t, V)$. In the absence of any knowledge about the expected range of motion and illumination change, a brute-force approach is required and the variance on the normal distribution of each component in $X^A$ is set to a high value. This necessitates an unreasonable increase in the number of particles in order to maintain reliable tracking, and such an approach is more likely to suffer from local minima. With the additional dimensions that the new model implies, the problem can be even more formidable than traditional shape tracking, where recent work has studied how more informed sampling distributions for shape tracking can be derived [8]. In the following section we outline how meaningful sampling densities for illumination can be learned from a few examples and show that these densities are in fact degenerate. As a result, the new model can be represented by several centroids in the Legendre basis.
3.1. Learning Sampling Distributions for Illumination and Shape
We assume that we have a static camera acquiring images of a scene and that the illumination conditions, although variable within the scene, do not change significantly over time. Ground-truth video sequences consisting of a starting template $T$ and its location and shape in subsequent frames, $\{U_1, \cdots, U_N\}$, are used to compute shape vectors $\{X_1, \cdots, X_N\}$ corresponding to this motion. Furthermore, a set of Legendre coefficients $\{\Lambda_1, \cdots, \Lambda_N\}$ that best map $\{U_1, \cdots, U_N\}$ to $T$ are computed via standard least-squares fitting (6).
The shape sampling distribution $h(X)$ must model the incremental motion between frames. For smooth motions, shape distributions can be computed from the shape difference vectors $\{X_2 - X_1, \cdots, X_N - X_{N-1}\}$. Standard kernel-density methods can then be used to estimate a sampling distribution from these differences. Alternatively, a uniform density $U(a, b)$ corresponding to the maximal ranges of the state components can be used as a simple approximation of $h(X)$.
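The uniform approximation of $h(X)$ described above can be sketched directly from the ground-truth shape vectors; the helper names below are illustrative, not taken from the paper.

```python
import numpy as np

def shape_sampling_ranges(shape_vectors):
    """Per-component uniform approximation U(a, b) of h(X), built from
    the frame-to-frame shape difference vectors."""
    X = np.asarray(shape_vectors, dtype=float)
    dX = np.diff(X, axis=0)            # {X_2 - X_1, ..., X_N - X_{N-1}}
    return dX.min(axis=0), dX.max(axis=0)

def sample_shape_step(rng, lo, hi):
    """Draw one incremental shape step from the uniform approximation."""
    return rng.uniform(lo, hi)
```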
In the case of our new illumination model, sampling distributions in the Legendre space must be estimated. It is natural to consider whether a differential model similar to the one used for $X$ is suitable in this regard. Figure 4 illustrates the problem with such an approach for the illumination space. Although components of the shape space are more or less smoothly monotonic (Figure 4c), this is not the case for the illumination coefficients. For example, the first coefficient, $\lambda_0$, changes dramatically as the subject moves through differing illumination. The result is a trajectory that cannot be modeled by considering discrete differences (Figure 4d).
Figure 4. Difficulty of using a differential model for building a sampling distribution for illumination. (a) and (b) show images of a person walking in a hallway towards the camera; (c) shows the y-translation component of $X$ and (d) shows the $\lambda_0$ coefficient of $\Lambda$ as a function of time. As can be seen, even for smooth motions the illumination component displays discontinuities.
Figure 5. Our approach is motivated by the fact that certain dominant illumination conditions can be quantized into a few centroids in the illumination space. For example, in this scene some of the salient illumination conditions are: (a) the subject is diffusely lit from above, (b) the subject passes through shadow, (c) the subject is strongly lit from the side, and (d) the subject is in a darker region of the room near the camera.
One approach to this problem is to identify subregions of monotonicity and then build a mixture of distributions using discrete differences that are particular to each. However, one direct consequence of using these distributions is that the number of particles needed to span the corresponding regions in illumination space will be extremely large, adding a computational burden on top of the traditional shape-space sampling. Clearly a more efficient way of sampling the illumination space must be found if the resulting algorithm is to be useful.
Figure 6. A plot of the SAD error as a function of time. The red dashed line represents the situation with no illumination compensation; the blue dash-dotted line represents compensation with the least-squares fit for $\Lambda$; the green solid line represents compensation with the vector-quantized values of the least-squares fits; the black crossed line represents compensation with a random $\Lambda$.
Although the underlying distribution of $\Lambda$ is of course continuous, we can discard much of this information in favor of tracking robustness by seeking the most important illumination modes that are present in the distribution. This step is motivated by the observation that a scene is typically composed of a discrete set of illumination conditions. For example, the underlying illumination distribution for the scene shown in Figure 4 arises from certain salient illumination conditions in the scene, as shown in Figure 5.

In order to achieve an efficient sampling of the illumination space we perform a k-means clustering of $\{\Lambda_1, \cdots, \Lambda_N\}$ and use the $k$ centroids $c_1, \cdots, c_k$ as a representation of the illumination space.
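A minimal sketch of this quantization step, using plain Lloyd's iterations over the fitted coefficient vectors; the initialization scheme and iteration count are our own assumptions, and any standard k-means implementation would serve equally well.

```python
import numpy as np

def kmeans_centroids(lambdas, k, iters=50, seed=0):
    """Quantize least-squares Legendre coefficient vectors {Λ_1..Λ_N}
    into k illumination centroids via Lloyd's iterations."""
    X = np.asarray(lambdas, dtype=float)
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]     # init from the data
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)                        # nearest centroid
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(axis=0)                  # recenter
    return C
```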
To demonstrate that clustering in this way does not degrade our ability to track, we studied many face-tracking examples under different illumination conditions. The results support the claim that only a few modes are needed instead of the entire distribution. For example, Figure 6 shows the SAD score achieved for a typical face-tracking process using several different approaches. Both random selection of Legendre coefficients and no compensation lead to high error. More importantly, the plot shows nearly no difference between exact least-squares fits of a second-order Legendre polynomial and compensation that utilizes only six centroids discovered via the k-means clustering process.

Figure 7. The range of $\Lambda$ as a function of time shown for two individuals for a 2nd-order Legendre polynomial fit. The similarity in the range of values taken by different components of $\Lambda$ can be seen from the plot. As a consequence, the centroids describing the illumination conditions in a scene for different people are close.
This result is typical of most situations, and a rate-distortion study found that $k = 6$ is adequate to represent the variability of $\Lambda$ for our indoor surveillance scenario. Figure 7 shows the result of least-squares fits to the first four Legendre coefficients for two different subjects. Note that the range of variability is nearly the same for both subjects, justifying our use of the same centroids to represent several subjects from the same scene.
These results allow us to coarsely sample the illumination space with minimal impact on the tracking results while retaining the ability to generalize to previously unseen objects within the same class. This requires only a minor modification to the standard particle filter that incorporates the $k$ illumination clusters. Specifically, for every particle $j$ drawn from $h(X)$, we sample $i$ from $\{1, \cdots, k\}$ with probability $\frac{1}{k}$ and compute

$$U = [\Lambda_{c_i}]\,\bigl[\,I(WX^j_t + T)\,\bigr] \qquad (9)$$

before measuring the SAD distance. The new algorithm, then, combines traditional shape tracking with our multiplicative model of illumination compensation. Table 1 summarizes the joint shape-illumination tracking algorithm.
Given: an estimate of the shape sampling distributions $h(X)$ and $k$ cluster centers $c_1, \ldots, c_k$ in the illumination basis.
1. Initialize the sample set $X = \{X^j_0, 1/M\}$
2. For $t = 1, \cdots, T$
3.   For $j = 1, \cdots, M$
4.     Generate $X^j_t$ from $X^j_{t-1}$ using $h(X)$
5.     Compute transformed image regions in accordance with shape vector $X^j_t$
6.     Pick an $i$ from $\{1, \cdots, k\}$ with probability $\frac{1}{k}$
7.     Compute $U$ using (9)
8.     Compute likelihood $p(Y_t \mid X^j_t)$ by measuring the SAD distance between $U$ and $T$
9.   End
10. Importance resample $\{X^j_t\}$ based on $\{p(Y_t \mid X^j_t)\}$
11. End
Table 1. The particle filter using the new shape-illumination space.
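One step of the Table 1 loop can be sketched as follows. The paper specifies the SAD measure but not how it is turned into a normalized particle weight; the $e^{-\mathrm{SAD}}$ weighting below, and the `observe`, `relight`, and `h_sample` callbacks, are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def sad(u, t):
    """Sum of absolute differences between a candidate region and template."""
    return np.abs(u - t).sum()

def pf_track_step(particles, template, observe, h_sample, centroids,
                  relight, rng):
    """One time step of the Table-1 particle filter. `h_sample(X)` draws the
    next shape state from h(X), `observe(X)` extracts the image region for
    shape X, and `relight(u, c)` applies illumination centroid c (Eq. 9)."""
    M = len(particles)
    new_particles, weights = [], np.empty(M)
    for j, x_prev in enumerate(particles):
        X = h_sample(x_prev)                          # step 4
        u = observe(X)                                # step 5
        c = centroids[rng.integers(len(centroids))]   # step 6: uniform pick
        u = relight(u, c)                             # step 7 (Eq. 9)
        weights[j] = np.exp(-sad(u, template))        # step 8: SAD likelihood
        new_particles.append(X)
    weights /= weights.sum()
    idx = rng.choice(M, size=M, p=weights)            # step 10: resample
    return [new_particles[i] for i in idx]
```

In use, particles holding the shape state that matches the template receive the dominant weight and survive resampling, which is the behavior the joint tracker relies on at shadow boundaries.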
4. Experimental Results
We now d emonstrate the utility of the joint shape-
illumination model in two different scenarios. The results
discussed here are indicative of results the system achieved
for many such sequences. For example, in the car sequence
twenty cars were successfully tracked over a period of two
hours
1
In each case we follow the procedure described in
Section 3.1 to establish sampling distributions in the joint-
space over some set of training samples. Tracking was then
performed using 200 particles on new objects using the al-
gorithm in Table 1.
The car dataset was generated from a camera observing a road from above as cars approach an intersection and move in and out of shadow. Two sequences were used to acquire the sampling distributions. Training involved marking locations of the moving car in successive frames. Using these locations the corresponding shape vector was computed. We used a 3-D shape space that spans scaling and translations in X and Y. Using the maximal values of the shape difference vectors, a uniform distribution over the corresponding range was computed for each shape component. Using the least-squares method (6) we fit different orders of Legendre polynomials and computed the resulting SAD error. We found that a first-order Legendre polynomial was adequate to capture the illumination change in this case, where the object is more or less planar. The k-means clustering process yields two centers $\{c_1, c_2\}$ that were then used to represent the discretized illumination space.

¹The cars were arbitrarily picked in the sequence and the initial locations of the cars were hand-extracted and passed to the tracker.
Figure 8 shows tracking results for a car using the joint shape-illumination tracker. The white square corresponds to the MAP estimate for that frame. The new tracking algorithm is compared to a traditional particle filter that does not encompass illumination change (Figure 8, bottom row). The same shape sampling distributions were used by both algorithms.

The particle filter tracks the template well as long as the illumination conditions that existed when the template was captured remain unchanged. However, at the shadow boundary the traditional tracker fails. On the other hand, the new illumination model captures this appearance change and the joint shape-illumination likelihoods remain high for the correct estimate via the additional degrees-of-freedom afforded by the illumination model.
A second dataset contained several different subjects moving through different illumination conditions in an indoor environment. The illumination conditions in this case were significantly more complex than in the vehicle tracking dataset. Sunlight through windows and different light sources (i.e. fluorescent overhead lamps and incandescent desk lights) persist throughout the space, making the dataset very challenging. In fact, to test the algorithm a strong diffuser lamp was placed in a room to generate strong side lighting (see Figure 9). Ground truth was again generated from two different sequences. A second-order Legendre polynomial was chosen for the illumination component. Using rate-distortion studies as discussed in Section 3.1, we found that around six clusters were required to capture the variability in the scene. Here we discuss tracking results when six clusters were used. Using more centroids does not degrade the results; however, it requires additional particles.

Figure 9 shows two different subjects moving through various illumination conditions as they approach a surveillance camera. These sequences are typical for this setup and only three frames are shown in the interest of space.

Figure 10 shows the initial template for each subject and the illumination image generated by the illumination centroid associated with the MAP estimate. This illumination image was multiplied onto the grid indicated by the shape vector in the frames shown in Figure 9. As can be seen, these illumination images are able to compensate for the illumination changes in the sequence.
5. Conclusions and Future Work
In this paper we presented an approach to tracking across shape and illumination change. We introduced a low-dimensional multiplicative model of illumination change that is expressed as a linear combination of a Legendre basis. We demonstrated how this new model is capable of capturing appearance change in the tracked template. We showed how the Legendre coefficients can be combined with the shape vector to define a new shape-illumination space. We discovered that in this new illumination space, a small number of centroids suffice to capture the illumination changes in a particular scenario. We showed how to estimate these centroids and incorporate them into the particle filtering framework at run time without adding excessive computational burden. We demonstrated the utility of our approach in both vehicle and face tracking scenarios. One of the assumptions in our work is that the initial templates in the training and testing sequences are acquired under similar illumination conditions. We expect to incorporate the bilinear style-content factorization of Freeman and Tenenbaum [5] to overcome this drawback. Finally, more sophisticated studies involving the stability of the learned distributions over time and slow illumination changes are underway. Initial results indicate that the distributions can be quite stable but may need to be re-learned over some period of time. For example, distributions learned at dawn no longer apply at dusk.

Figure 8. Example of tracking a car through drastic illumination changes. The bottom row shows the result using a conventional particle filter while the top row shows the result using our algorithm.

Figure 9. Example of tracking faces in an indoor setting. The illumination conditions existing in this scenario are significantly more complex than those in the vehicle tracking situation.

Figure 10. Initial template and illumination images constructed from the Legendre basis that were used to model appearance change in the sequence shown in Figure 9.
References
[1] R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. IEEE Trans. PAMI, 25(2):218-233, 2003.
[2] P. Belhumeur and D. Kriegman. What is the set of images of an object under all possible illumination conditions? IJCV, 28(3):1-16, 1998.
[3] B. Han and L. Davis. On-line density-based appearance modeling for object tracking. Proceedings of ICCV, 2005.
[4] D. Freedman and M. Turek. Illumination-invariant tracking via graph cuts. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2:10-17, 2005.
[5] W. Freeman and J. Tenenbaum. Learning bilinear models for two-factor problems in vision. Proceedings of IEEE CVPR, 1997.
[6] G. Hager and P. Belhumeur. Efficient region tracking with parametric models of geometry and illumination. IEEE Trans. PAMI, 20(10):1025-1039, 1998.
[7] M. Isard and A. Blake. Condensation - conditional density propagation for visual tracking. IJCV, 21(1):695-709, 1998.
[8] A. Kale and C. Jaynes. Shape space sampling distributions and their impact on visual tracking. IEEE International Conference on Image Processing, 2005.
[9] L. Lu, X. Dai, and G. Hager. A particle filter without dynamics for robust 3D face tracking. Proc. of FPIV, 2004.
[10] I. Matthews, T. Ishikawa, and S. Baker. The template update problem. Proceedings of the British Machine Vision Conference, September 2003.
[11] R. Ramamoorthi. Analytic PCA construction for theoretical analysis of lighting variability in images of a Lambertian object. IEEE Trans. PAMI, 24(10):1-12, 2002.
[12] Y. Weiss. Deriving intrinsic images from image sequences. Proc. of ICCV, 2001.
[13] S. Zhou, R. Chellappa, and B. Moghaddam. Visual tracking and recognition using appearance-adaptive models in particle filters. IEEE Trans. on Image Processing, November 2004.
... Template adaptation approaches [14], [15], [16] suffer from problems of drift [17], e.g. if you adapt when the tracker has latched onto clutter, it will lead to tracking failure. In [18], it was assumed that a small number of centroids in the illumination space can be used to represent the illumination conditions existing in the scene. The six centroids method of [18] does not suffice given complex illumination patterns often encountered in reality. ...
... In [18], it was assumed that a small number of centroids in the illumination space can be used to represent the illumination conditions existing in the scene. The six centroids method of [18] does not suffice given complex illumination patterns often encountered in reality. Also, it is unclear how standard trackers like mean-shift tracker [19] can be adapted for illumination invariance. ...
... feature based approaches. Also, the template matching framework enables illumination to be parameterized using a Legendre basis, as suggested in past works [18], [26]. We use a very simple motion and illumination change model to demonstrate how to design PF-MT for our problem. ...
Conference Paper
Full-text available
In recent work, the authors introduced a multiplicative, low dimensional model of illumination that is computed as a linear combination of a set of simple-to-compute Legendre basis functions. The basis coefficients describing illumination change, are can be combined with the "shape" vector to define a joint "shape"-illumination space for tracking. The increased dimensionality of the state vector necessitates an increase in the number of particles required to maintain tracking accuracy. In this paper, we utilize the recently proposed PF-MT algorithm to estimate the illumination vector. This is motivated by the fact that, except in case of occlusions, multimodality of the state posterior is usually due to multimodality in the "shape" vector (e.g. there may be multiple objects in the scene that roughly match the template). In other words, given the "shape" vector at time t, the posterior of the illumination (probability distribution of illumination conditioned on the "shape" and illumination at previous time) is unimodal. In addition, it is also true that this posterior is usually quite narrow since illumination changes over time are slow. The choice of the illumination model permits the illumination coefficients to be solved in closed form as a solution of a regularized least squares problem. We demonstrate the use of our method for the problem of face tracking under variable lighting conditions existing in the scene
... In this work, we show that the above is also true for the illumination image sequence. 2) In many recent works [39], [40], [20], the illumination image is often represented using the first few lowest order Legendre polynomials. However, our experiments with the dictionary of Legendre polynomials (henceforth referred to as the Legendre dictionary) are the first to demonstrate that for many video sequences with significant illumination variations, the illumination image is approximately sparse in this dictionary and, in fact, its sparsity pattern includes many of the higher order Legendre polynomials, and may not include all the lower order ones. ...
... For tracking moving objects across spatially varying illumination changes, we use a template-based model taken from [40], [20] with a three-dimensional motion model that captures only x-y translation and scale. We use this because it is simple and suffices to explain our key ideas. ...
Article
Full-text available
We study the problem of tracking (causally estimating) a time sequence of sparse spatial signals with changing sparsity patterns, as well as other unknown states, from a sequence of nonlinear observations corrupted by (possibly) non-Gaussian noise. In many applications, particularly those in visual tracking, the unknown state can be split into a small dimensional part, e.g. global motion, and a spatial signal, e.g. illumination or shape deformation. The spatial signal is often well modeled as being sparse in some domain. For a long sequence, its sparsity pattern can change over time, although the changes are usually slow. To address the above problem, we propose a novel solution approach called Particle Filtered Modified-CS (PaFiMoCS). The key idea of PaFiMoCS is to importance sample for the small dimensional state vector, while replacing importance sampling by slow sparsity pattern change constrained posterior mode tracking for recovering the sparse spatial signal. We show that the problem of tracking moving objects across spatially varying illumination change is an example of the above problem and explain how to design PaFiMoCS for it. Experiments on both simulated data as well as on real videos with significant illumination changes demonstrate the superiority of the proposed algorithm as compared with existing particle filter based tracking algorithms.
... The set of Legendre polynomials provide an efficacious platform for such representation, since the basis functions are smoothly changing themselves. Legendre polynomials have found application in certain aspects of image processing such as designing illumination models for object tracking [17] and generating shape signatures for supervised segmentation [18]. Using Legendre polynomials to model intensity based appearance allows the estimating functions to vary spatially, yet the variation is constrained by the inherent smoothness of the polynomials. ...
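The smoothness noted in this snippet pairs with orthogonality, which is what makes the Legendre polynomials a convenient basis for low-dimensional illumination fields: each coefficient can be computed independently by projection. A quick numerical check of that property (our own illustration, not code from [17] or [18]):

```python
import numpy as np
from numpy.polynomial import legendre

# Sample P_0 .. P_4 on [-1, 1]; the coefficient vector np.eye(n+1)[-1]
# selects the single polynomial P_n for legval.
x = np.linspace(-1.0, 1.0, 20001)
P = np.stack([legendre.legval(x, np.eye(n + 1)[-1]) for n in range(5)])

# Gram matrix by Riemann sum: the off-diagonal entries vanish and the
# diagonal equals 2 / (2n + 1), the known normalization of P_n.
G = (P[:, None, :] * P[None, :, :]).sum(axis=-1) * (x[1] - x[0])
```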
... Rapid illumination changes pose another challenge for robust visual tracking. In general, for appearance-based approaches, when faced with extensive change, illumination modelling has to be added, such as in [10]. Figure 8 shows the advantage of using our motion-based occupancy-map method instead of an appearance-based technique on a public-domain sequence. The challenges come from the overall poor lighting condition, which creates low contrast between the tracked person and the surroundings, and from the abrupt lighting changes as the tracked person walks under different light sources placed along the corridor. ...
Article
We present a tracking algorithm based on motion analysis of regional affine invariant image features. The tracked object is represented with a probabilistic occupancy map. Using this map as support, regional features are detected and probabilistically matched across frames. The motion of pixels is then established based on the feature motion. The object occupancy map is in turn updated according to the pixel motion consistency. We describe experiments to measure the sensitivities of our approach to inaccuracy in initialization, and compare it with other approaches.
Article
Common image-related artifacts arising during image acquisition are noise caused by external interference and imbalance in illumination. Uneven-illumination correction incorporates a penalty term that redistributes and transfers intensity between pre-defined uniformly illuminated and non-uniformly illuminated sub-regions of the input image. Many methods exist in the literature to address illumination correction. In this study, an overview of illumination models, illumination estimation, processing of unevenly illuminated images, non-uniform illumination correction, background-correction techniques, and shadow correction is delivered. Various real-life applications in the fields of remote-sensing imaging, automatic medical diagnosis of different diseases, underground imaging, and document imaging depend on image quality and illumination conditions. Some of the related work done by different researchers on illumination correction is discussed in the sequel. Assessment of correction quality is difficult because of the non-availability of unilluminated images. Some objective assessment techniques are also discussed in this survey. This study aims at putting forward an all-inclusive discussion of non-uniform image processing by means of various existing correction models in a wide application domain, and their frequently encountered challenges.
Conference Paper
Tracking objects under changing illumination conditions is an important task in computer vision. A large number of methods for tracking objects are described in the literature; unfortunately, there are not enough robust methods that work for all applications. We have therefore proposed a tracker for changing lighting conditions with a model that combines sparse representation and intensity features of the video sequence. In addition, the model is instanced per object and depends on the illumination of the surroundings, and is thus effective in tracking objects under varying illumination. Experimental results show that the proposed tracker works well under significant illumination changes and outperforms many state-of-the-art tracking algorithms.
Conference Paper
In this work, we develop algorithms for tracking time sequences of sparse spatial signals with slowly changing sparsity patterns, and other unknown states, from a sequence of nonlinear observations corrupted by (possibly) non-Gaussian noise. A key example of the above problem occurs in tracking moving objects across spatially varying illumination changes, where motion is the small dimensional state while the illumination image is the sparse spatial signal satisfying the slow-sparsity-pattern-change property.
Conference Paper
Full-text available
Object tracking is a challenging problem in real-time computer vision due to variations of lighting condition, pose, scale, and view-point over time. However, it is exceptionally difficult to model appearance with respect to all of those variations in advance; instead, on-line update algorithms are employed to adapt to these changes. We present a new on-line appearance modeling technique which is based on sequential density approximation. This technique provides accurate and compact representations using Gaussian mixtures, in which the number of Gaussians is automatically determined. This procedure is performed in linear time at each time step, which we prove by amortized analysis. Features for each pixel and rectangular region are modeled together by the proposed sequential density approximation algorithm, and the target model is updated robustly in scale. We show the performance of our method by simulations and tracking in natural videos.
Article
Full-text available
The appearance of an object depends on both the viewpoint from which it is observed and the light sources by which it is illuminated. If the appearance of two objects is never identical for any pose or lighting conditions, then - in theory - the objects can always be distinguished or recognized. The question arises: What is the set of images of an object under all lighting conditions and pose? In this paper, we consider only the set of images of an object under variable illumination, including multiple, extended light sources, shadows, and color. We prove that the set of n-pixel monochrome images of a convex object with a Lambertian reflectance function, illuminated by an arbitrary number of point light sources at infinity, forms a convex polyhedral cone in R^n and that the dimension of this illumination cone equals the number of distinct surface normals. Furthermore, the illumination cone can be constructed from as few as three images. In addition, the set of n-pixel images of an object of any shape and with a more general reflectance function, seen under all possible illumination conditions, still forms a convex cone in R^n. These results immediately suggest certain approaches to object recognition. Throughout, we present results demonstrating the illumination cone representation.
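A toy numerical illustration of the cone property described in this abstract (our own construction, not code from the paper): rendering a fixed Lambertian "scene" as max(0, N·s) for unit surface normals N, any nonnegative combination of rendered images stays inside the cone they span, which nonnegative least squares recovers exactly.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
N = rng.normal(size=(50, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)   # unit surface normals

def render(s):
    """Lambertian shading under a distant point source in direction s."""
    return np.maximum(0.0, N @ s)

# Three "basis" images taken under three axis-aligned light directions.
A = np.column_stack([render(s) for s in np.eye(3)])

# A nonnegative combination of basis images lies in the illumination
# cone, so nonnegative least squares recovers it with zero residual.
img = A @ np.array([0.5, 1.2, 0.3])
coeffs, residual = nnls(A, img)
```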
Article
Full-text available
We present an approach that incorporates appearance-adaptive models in a particle filter to realize robust visual tracking and recognition algorithms. Tracking needs modeling interframe motion and appearance changes, whereas recognition needs modeling appearance changes between frames and gallery images. In conventional tracking algorithms, the appearance model is either fixed or rapidly changing, and the motion model is simply a random walk with fixed noise variance. Also, the number of particles is typically fixed. All these factors make the visual tracker unstable. To stabilize the tracker, we propose the following modifications: an observation model arising from an adaptive appearance model, an adaptive velocity motion model with adaptive noise variance, and an adaptive number of particles. The adaptive-velocity model is derived using a first-order linear predictor based on the appearance difference between the incoming observation and the previous particle configuration. Occlusion analysis is implemented using robust statistics. Experimental results on tracking visual objects in long outdoor and indoor video sequences demonstrate the effectiveness and robustness of our tracking algorithm. We then perform simultaneous tracking and recognition by embedding them in a particle filter. For recognition purposes, we model the appearance changes between frames and gallery images by constructing the intra- and extrapersonal spaces. Accurate recognition is achieved when confronted by pose and view variations.
Conference Paper
Full-text available
Object motions can be represented as a sequence of shape deformations and translations, which can be interpreted as a sequence of points in an N-dimensional shape space. These spaces range from simple 2D translations to more inclusive spaces such as the affine. Tracking is then the problem of inferring the most likely point in the space for the next frame given a current set of hypotheses. A robust method for achieving this is the particle filter, in which likely points within shape space are selected in a two-step process. First, image measurements assign likelihoods to proposed points. Likely points are then propagated forward using a dynamical model to derive a set of new points that are perturbed according to some sampling distribution. These distributions play an important role in tracking performance because dynamical models are seldom known and a Gauss-Markov model is often assumed for the model dynamics. This paper addresses the problems inherent in utilizing uninformed sampling distributions for visual tracking. We introduce a principled adaptive sampling approach that takes into account constraints on each component of the shape vector, and further propose a more appropriate sampling distribution defined in a linear subspace representing the predominant motion in the shape space. Results demonstrate improved tracking performance in challenging conditions where targets exhibit changing motion models.
Conference Paper
Full-text available
Illumination changes are a ubiquitous problem in computer vision. They present a challenge in many applications, including tracking: for example, an object may move in and out of a shadow. We present a new tracking algorithm which is insensitive to illumination changes, while at the same time using all of the available photometric information. The algorithm is based on computing an illumination-invariant optical flow field; the computation is made robust by using a graph cuts formulation. Experimentally, the new technique is shown to be quite reliable in both synthetic and real sequences, dealing with a variety of illumination changes that cause problems for density-based trackers.
Conference Paper
Full-text available
Particle filtering is a very popular technique for the sequential state estimation problem. However, its convergence greatly depends on the balance between the number of particles/hypotheses and the fitness of the dynamic model. In particular, in cases where the dynamics are complex or poorly modeled, thousands of particles are usually required for real applications. This paper presents a hybrid sampling solution that combines sampling in the image-feature space and in the state space via RANSAC and particle filtering, respectively. We show that the number of particles can be reduced to dozens for a full 3D tracking problem that contains considerable noise of different types. For unexpected motions a specific set of dynamics may not exist, but our algorithm avoids the need for one. A theoretical convergence proof [1, 3] for particle filtering that integrates RANSAC is difficult, but we address this problem by analyzing the likelihood distribution of particles from a real tracking example. The sampling efficiency (on the more likely areas) is much higher through the use of RANSAC. We also discuss tracking-quality measurement in the sense of entropy or statistical testing. The algorithm has been applied to the problem of 3D face-pose tracking with changing moderate or intense expressions. We demonstrate the validity of our approach with several video sequences acquired in an unstructured environment.
Article
The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses factored sampling, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.
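The select-predict-measure cycle of factored sampling described in this abstract can be sketched in a few lines. This is a generic bootstrap-style step on a scalar state with a Gaussian observation likelihood, our simplification rather than the paper's contour tracker:

```python
import numpy as np

def condensation_step(particles, weights, observe, dynamics_std, rng):
    """One factored-sampling cycle: resample by weight (select),
    apply a stochastic dynamical model (predict), then re-weight
    by the observation likelihood (measure)."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx] + rng.normal(0.0, dynamics_std,
                                            len(particles))
    weights = observe(particles)
    weights /= weights.sum()
    return particles, weights

rng = np.random.default_rng(0)
true_state = 2.0
particles = rng.normal(0.0, 5.0, 2000)           # diffuse prior
weights = np.full(2000, 1.0 / 2000)
likelihood = lambda x: np.exp(-0.5 * ((x - true_state) / 0.5) ** 2)
for _ in range(10):
    particles, weights = condensation_step(
        particles, weights, likelihood, 0.1, rng)
estimate = float(np.sum(weights * particles))    # posterior mean
```

Because the posterior is carried as a weighted sample set rather than a single Gaussian, the same machinery tolerates multimodal distributions, which is the advantage over Kalman filtering that the abstract emphasizes.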
Conference Paper
We prove that the set of all reflectance functions (the mapping from surface normals to intensities) produced by Lambertian objects under distant, isotropic lighting lies close to a 9D linear subspace. This implies that the images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately with a low-dimensional linear subspace, explaining prior empirical results. We also provide a simple analytic characterization of this linear space. We obtain these results by representing lighting using spherical harmonics and describing the effects of Lambertian materials as the analog of a convolution. These results allow us to construct algorithms for object recognition based on linear methods as well as algorithms that use convex optimization to enforce non-negative lighting functions
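The 9D claim can be checked numerically for the rotationally symmetric part: expanding the Lambertian kernel k(u) = max(0, u), u = cos(theta), in Legendre polynomials (the zonal slice of the spherical-harmonic expansion), the degrees l <= 2 that make up the 9D subspace carry over 99% of the energy. A quick sketch of that computation, our own rather than the paper's:

```python
import numpy as np
from numpy.polynomial import legendre

u = np.linspace(-1.0, 1.0, 200001)          # u = cos(theta)
k = np.maximum(0.0, u)                      # Lambertian kernel
du = u[1] - u[0]

energies = []
for l in range(6):
    P = legendre.legval(u, np.eye(l + 1)[-1])      # P_l sampled on u
    c = (2 * l + 1) / 2 * np.sum(k * P) * du       # Legendre coefficient
    energies.append(c ** 2 * 2 / (2 * l + 1))      # energy of that term

total = np.sum(k ** 2) * du                 # total energy of the kernel
frac = sum(energies[:3]) / total            # fraction captured by l <= 2
```

The computed fraction comes out near 0.992, matching the "lies close to a 9D linear subspace" statement.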
Conference Paper
Intrinsic images are a useful midlevel description of scenes proposed by H.G. Barrow and J.M. Tenenbaum (1978). An image is decomposed into two images: a reflectance image and an illumination image. Finding such a decomposition remains a difficult problem in computer vision. We focus on a slightly easier problem: given a sequence of T images where the reflectance is constant and the illumination changes, can we recover T illumination images and a single reflectance image? We show that this problem is still ill-posed and suggest approaching it as a maximum-likelihood estimation problem. Following recent work on the statistics of natural images, we use a prior that assumes that illumination images will give rise to sparse filter outputs. We show that this leads to a simple, novel algorithm for recovering reflectance images. We illustrate the algorithm's performance on real and synthetic image sequences.
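The estimation step that falls out of the sparse-filter-output prior is that the maximum-likelihood estimate of each reflectance filter output is the temporal median of the corresponding log-image filter outputs. A toy version with a horizontal-derivative filter and a smoothly drifting synthetic illumination (our own setup, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)
T, H, W = 15, 32, 32
log_R = rng.normal(size=(H, W))             # fixed log-reflectance

# Smoothly varying log-illumination per frame: a drifting x-gradient
# plus a drifting constant brightness offset.
x = np.linspace(-1.0, 1.0, W)
log_L = np.stack([0.2 * t * x[None, :] + 0.1 * t * np.ones((H, W))
                  for t in np.linspace(-1.0, 1.0, T)])
log_I = log_R[None] + log_L                 # observed log images

def dx(im):
    """Horizontal-derivative filter output."""
    return np.diff(im, axis=-1)

# Temporal median of the derivative images estimates the reflectance
# derivative: illumination gradients are mostly small/sparse, so they
# are rejected by the median.
est_dR = np.median(dx(log_I), axis=0)
```

Recovering the reflectance image itself then requires inverting the filter bank (a deconvolution step), which the toy above omits.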