Research Article
Object Tracking with Adaptive Multicue Incremental
Visual Tracker
Jiang-tao Wang,1 De-bao Chen,1 Jing-ai Zhang,1 Su-wen Li,1 and Xing-jun Wang2
1School of Physical and Electronic Information, Huaibei Normal University, Huaibei 235000, China
2Shandong Huisheng Group, Weifang 261201, China
Correspondence should be addressed to Jing-ai Zhang; ellazhangja@.com
Received February ; Revised August ; Accepted August ; Published September
Academic Editor: Constantine Kotropoulos
Copyright © Jiang-tao Wang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Generally, subspace learning based methods such as the Incremental Visual Tracker (IVT) have been shown to be quite effective for the visual tracking problem. However, the IVT may fail to follow the target when it undergoes drastic pose or illumination changes. In this work, we present a novel tracker that enhances the IVT algorithm by employing a multicue based adaptive appearance model. First, we carry out the integration of cues both in feature space and in geometric space. Second, the integration directly depends on the dynamically changing reliabilities of the visual cues. These two aspects of our method allow the tracker to easily adapt itself to changes in the context and accordingly improve the tracking accuracy by resolving ambiguities. Experimental results demonstrate that subspace-based tracking is strongly improved by exploiting multiple cues through the proposed algorithm.
1. Introduction
Due to its wide applications in video surveillance, intelligent user interfaces, human motion understanding, content-based video retrieval, and object-based video compression [–], visual tracking has become one of the essential and fundamental tasks in computer vision. During the past decades, numerous approaches have been proposed to improve its performance, and there is a fruitful literature on tracking algorithms that reports promising results under various scenarios. However, visual tracking remains a challenging problem when the nonstationary appearance of an object undergoes significant pose and illumination variations and occlusions, as well as shape deformation for nonrigid objects.
When we design an object tracking system, two essential issues usually need to be considered: which search algorithm should be applied to locate the target, and what type of cue should be used to represent the object. For the first issue, two well-known search algorithms have been widely studied in the last decade. These are often referred to as particle filtering and mean shift. The particle filter performs a random search guided by a stochastic motion model to obtain an estimate of the posterior distribution describing the object's configuration []. On the other hand, mean shift, a typical and popular variational algorithm, is a robust nonparametric method for climbing density gradients to find the peak of probability distributions [, ]. The search paradigms of the two methods differ in that one is stochastic and model-driven while the other is deterministic and data-driven.
Modeling the target appearance in videos is a problem of feature extraction and is known to be a more critical factor than the search strategy. Developing a robust appearance system that can model target appearance changes adaptively has been a matter of primary interest in recent visual tracking research. The Incremental Visual Tracker (IVT) [] has proved to be a successful tracking method by incorporating an adaptive appearance model. In particular, the IVT models the target appearance as a low-dimensional subspace based on probabilistic principal component analysis (PPCA), where the subspace is updated adaptively based on the image patches tracked in the previous frames.
In this model, the intensity differences between the target reference and the candidates are computed to measure the observation weight. The IVT alleviates the burden of constructing a target
model prior to tracking with a large amount of expensive offline data and tends to yield higher tracking accuracy. However, since only the intensity feature is employed to select the optimal candidate as the target, the IVT may run into trouble when the target moves into shadow or undergoes large pose changes (as shown in Figure ).

Figure: Two cases for the IVT tracker failure.
In this work, a multicue based incremental visual tracker (MIVT) is proposed to confront the aforementioned difficulties. In a sense, our work can be seen as an extension of []. Compared to the classical IVT method, the main contributions of our algorithm are as follows. First, with color (or gray) and edge properties, our representation model describes the target with more information. Second, an adaptive multicue integration framework is designed that considers both target and background changes: when one cue becomes insufficiently discriminative due to target or background changes, the other compensates. Third, the proposed multicue framework can be effectively incorporated into the particle filter tracking system, making the tracking process more robust.
The rest of the paper is organized as follows. Section 2 reviews the related multicue fusion works. Section 3 gives an overview of the IVT tracking algorithm. In Section 4, we first propose our multicue appearance modeling scheme and then describe the implementation of the presented MIVT tracking framework. In Section 5, a number of comparative experiments are reported. Section 6 concludes the paper.
2. Related Work
There is a rich literature on visual tracking, and a thorough discussion of this topic is beyond the scope of this paper. In this section, we review only the most relevant visual tracking works, focusing on algorithms that operate on multiple cues. To date, a number of studies have been published on the fusion of multiple cues. In general, there are two key issues that should be solved in a multicue based tracking algorithm: (1) what cues are used to represent the target's features, and (2) how the cues are integrated. Here, we focus on the second key problem.
The simplest case is to assume that the different cues are independent, so that all cues are used in parallel and treated as equivalent channels; this approach has been reported in [, ]. Based on this idea, in [], two features, intensity gradients and the color histogram, were fused directly with fixed equal weights. A limitation of this method is that it does not take each single cue's discriminative ability into account.
To avoid the limitation of the above methods, in [], Du and Piater proposed a Hidden Markov Model (HMM) based multicue fusion approach. In this approach the target was tracked in each cue by a particle filter, and the particle filters in different cues interacted via a message-passing scheme based on the HMM; four visual cues, including color, edges, motion, and contours, were selectively integrated in this work.
Jia et al. [] presented a dynamic multicue tracking scheme
by integrating color and local features. In this work, the cue weights were supervised and updated based on a Histograms of Oriented Gradients (HOG) detection response. Yang et al. [] introduced a new adaptive way to integrate multiple cues when tracking multiple humans, driven by human detections; they defined a dissimilarity function for each cue according to its discriminative power and applied a regression process to adapt the integration of multiple cues. In [], a consistent histogram-based framework was developed for the analysis of color, edge, and texture features.
In [], Erdem et al. carried out the integration of the multiple cues in both the prediction step and the measurement step, and they defined the overall likelihood function so that the measurements from each cue contributed to the overall likelihood according to its reliability. Yin et al. [] designed an algorithm that combined CamShift with a particle filter using multiple cues, and an adaptive integration method was adopted to combine color information with motion information. Democratic integration is an architecture that allows objects to be tracked through the fusion of multiple adaptive cues in a self-organized fashion. This method was given by Triesch and von der Malsburg in [] and was explored more deeply in [, ]. In this framework, each cue creates a two-dimensional cue report, or saliency map; cue fusion is carried out via a fused saliency map computed as a weighted sum of all the cue reports. Pérez et al. [] utilized a particle filter based visual tracker that fused three cues: color, motion, and sound. In their work, color served as the main visual cue and, according to the scenario under consideration, was fused with either sound localization cues or motion activity cues. A partitioned sampling technique was applied to combine the different cues; particle resampling was not implemented in the whole feature space but in each single feature space separately. This technique increased the efficiency of the particle filter. However, in their case, only two cues could be used simultaneously, which restricted the flexible selection of cues and the extension of the method. Wu and Huang
[] formulated the problem of integrating multiple cues for robust tracking as a probabilistic inference problem over a factorized graphical model. To analyze this complex graphical model, a variational method was taken to approximate the Bayesian inference. Interestingly, the analysis revealed a coinference phenomenon of multiple modalities, which illustrates the interactions among the different cues; that is, one cue can be inferred iteratively from the other cues. An efficient sequential Monte Carlo tracking algorithm was employed to integrate the multiple visual cues, in which the coinference of the different modalities was approximated.
Although subspace representation models have been successfully applied to handling small appearance variations and illumination changes, they usually still fail to handle rapid appearance, shape, and scale changes. To overcome this problem for the classical IVT tracker, in this paper we design a novel multicue based dynamic appearance model for the IVT tracking system, and this model can adapt to both target and background changes. We implement this model by fusing multiple cues in an adaptive observation model. In each frame, the tracking reliability is used to measure the weight of each cue, and an observation model is constructed from the subspace models and their corresponding weights. The appearance changes of the target are taken into account when we update the appearance models with the tracking results. Therefore, online appearance modeling and weight updating of each cue adapt our tracking approach to both target and background changes, thereby yielding good performance.
3. Review of the IVT
The IVT models the target appearance as a low-dimensional subspace based on probabilistic principal component analysis (PPCA) and uses particle-filter dynamics to track the target.
Let the state of the target object at time $t$ be represented as

$X_t = (x_t, y_t, \theta_t, s_t, \alpha_t, \phi_t),$  (1)

where $x_t$, $y_t$, $\theta_t$, $s_t$, $\alpha_t$, and $\phi_t$ denote the $x$, $y$ translation, rotation angle, scale, aspect ratio, and skew direction at time $t$. The state dynamic model between time $t$ and time $t-1$ can be treated as a Gaussian distribution around the state at $t-1$; then, we have

$p(X_t \mid X_{t-1}) = \mathcal{N}(X_t; X_{t-1}, \Psi),$  (2)

where $\Psi$ is a diagonal covariance matrix whose elements are the corresponding variances of the state parameters.
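As a concrete illustration of (2), the following Python sketch propagates a particle set with independent Gaussian noise on each affine parameter; the state layout and the variance values in PSI_DIAG are illustrative assumptions, not the settings used in the paper.

import numpy as np

# Assumed state layout per particle: [x, y, rotation, scale, aspect ratio, skew]
PSI_DIAG = np.array([5.0, 5.0, 0.02, 0.01, 0.002, 0.001])  # illustrative variances (diagonal of Psi)

def propagate_particles(particles, rng=None):
    """Draw X_t ~ N(X_{t-1}, Psi) independently for every particle.

    particles: (M, 6) array of affine states at time t-1.
    Returns an (M, 6) array of predicted states at time t.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(scale=np.sqrt(PSI_DIAG), size=particles.shape)
    return particles + noise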
Based on (1) and (2), the particle filter can be carried out to locate the target. In this stage, the particles are first drawn according to the dynamical model. Then, for each particle, the corresponding window is extracted from the current frame and its reconstruction error on the selected eigenbasis and its weight are calculated through (3) and (4), which give its likelihood under the observation model:
$e_t = (I_t - \mu) - U U^{T} (I_t - \mu),$  (3)

$w_t^{i} \propto \exp\left(-\frac{\|e_t\|^2}{2\sigma_w^2}\right), \quad i = 1, \ldots, M,$  (4)

where $I_t$ is the image patch predicted by $X_t$, generated from the subspace spanned by $U$ and centered at $\mu$. In (4), $\sigma_w^2$ is the variance of the reconstruction error and $\|\cdot\|$ denotes the L2-norm.
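A minimal Python sketch of the observation model in (3) and (4) is given below; the eigenbasis U, mean mu, and noise scale sigma_w are assumed to come from the incremental subspace model, and the warping of each particle's window into a normalized patch is omitted.

import numpy as np

def observation_weights(patches, U, mu, sigma_w=0.1):
    """Particle weights from subspace reconstruction errors (eqs. (3)-(4)).

    patches: (M, d) flattened candidate patches, one row per particle.
    U: (d, k) orthonormal eigenbasis, mu: (d,) subspace mean.
    """
    diff = patches - mu                           # center every candidate patch
    recon = diff @ U @ U.T                        # projection onto the learned subspace
    err = np.linalg.norm(diff - recon, axis=1)    # residual norm, eq. (3)
    w = np.exp(-err ** 2 / (2.0 * sigma_w ** 2))  # likelihood, eq. (4)
    return w / (w.sum() + 1e-12)                  # normalize over the particle set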
Finally, the image window corresponding to the most likely particle is stored as the real target window. When the desired number of new images has been accumulated, an incremental update (with a forgetting factor) of the eigenbasis, mean, and effective number of observations is performed. The key contribution of the IVT is an efficient incremental method for learning the eigenbases online as new observations arrive. This method extends the Sequential Karhunen-Loeve (SKL) algorithm into a new incremental PCA algorithm that correctly updates the eigenbasis as well as the mean, given one or more additional training data. A detailed description of this method can be found in [].
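For completeness, the sketch below outlines an SKL-style incremental update of the mean and eigenbasis; it is a simplified reading of the update in [], assumes a non-empty current basis, and handles the forgetting factor and mean correction only loosely, so it should not be taken as the authors' exact implementation.

import numpy as np

def incremental_subspace_update(U, S, mu, n_eff, B_new, forgetting=0.95, k=16):
    """Simplified SKL-style incremental PCA update (sketch, not the exact IVT code).

    U: (d, k0) current eigenbasis, S: (k0,) singular values, mu: (d,) mean,
    n_eff: effective number of observations, B_new: (d, m) new patches as columns.
    """
    m = B_new.shape[1]
    n_old = forgetting * n_eff
    mu_b = B_new.mean(axis=1)
    mu_new = (n_old * mu + m * mu_b) / (n_old + m)
    # Centered new data plus a correction column accounting for the mean shift.
    B_hat = np.hstack([
        B_new - mu_b[:, None],
        np.sqrt(n_old * m / (n_old + m)) * (mu_b - mu)[:, None],
    ])
    proj = U.T @ B_hat                    # coefficients inside the current subspace
    resid = B_hat - U @ proj              # component orthogonal to the current subspace
    Q, _ = np.linalg.qr(resid)
    # Small matrix whose SVD yields the updated basis (forgetting scales old energy).
    R = np.block([
        [forgetting * np.diag(S), proj],
        [np.zeros((Q.shape[1], S.shape[0])), Q.T @ resid],
    ])
    Ur, Sr, _ = np.linalg.svd(R, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Ur
    return U_new[:, :k], Sr[:k], mu_new, n_old + m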
Figure: Different features show different discriminative abilities. (a) The target window within a red bounding box in the reference image. (b) Current target window after the object has moved. (c) Image error between the reference image and candidate images around the current target window with the edge cue. (d) Image error between the reference image and candidate images around the current target window with the gray cue.
4. Multicue Fusion
For a robust multicue tracking system, each cue's significance should be consistent with its tracking reliability. This significance should also adapt itself in dynamic environments when the object undergoes significant pose and illumination variations, occlusions, and shape deformation, so that the most reliable cue at the current time is always favored for tracking the target. In this work, we aim to develop a multicue integration framework that is flexible enough to exploit any valid image feature, such as gray, texture, edge direction, and motion information; the framework does not restrict the target's feature types. However, it is impractical to apply all feature types simultaneously; for simplicity, only two types of features, gray and edge, are used in the rest of the paper.
Figure shows that different cues may have different discriminative abilities. In Figure (a), the image within the red rectangle is set as the target reference. After some time, the target moves to the current position shown in Figure (b). To evaluate the discriminative ability of the various features, we generate target candidates uniformly around the current target position with the same scale as the reference image. Then, the summed pixel error between each candidate and the reference is calculated in two feature spaces: edge and gray. As shown in Figures (c) and (d), different cues may yield different discriminative abilities. From the figure we can also see that two distances are involved: (1) the Euclidean distance between the position of a candidate and the position of the real target in the image plane, and (2) the distance between the reference model and the candidate model in feature space (the reconstruction error). When a single cue is used, a small feature distance means that the candidate approximates the real object more closely. However, this does not hold when multiple visual cues are adopted, since different cues may have different sensitivities to changes of the object appearance and the environment.
In this section, we introduce a method to evaluate the reliabilities of the cues based on the above analysis; this method can be effectively embedded in the particle filter tracking framework.
Initialization
Locate the target manually in the first frame, and use a single particle to indicate this location. Set the initial relative sharpness factors to $\rho_0^f = 1/N$ for the $N$ cues. Initialize the eigenbasis to be empty, and the mean to be the appearance of the target in the first frame.
for t = 1 to T
(1) Spread the target states from time $t-1$ to time $t$ using the state dynamic model.
(2) For each new state $X_t^i$ corresponding to particle $i$ at time $t$, find its corresponding weight $w_t^{(i,f)}$ in feature space $f$ based on its likelihood under the observation models.
(3) Based on each cue's relative sharpness factor $\rho_{t-1}^f$, $f = 1, \ldots, N$, combine the multiple cues by calculating the new weight of each particle as $w_t^i = \sum_{f=1}^{N} w_t^{(i,f)} \rho_{t-1}^f$.
(4) Store the image window corresponding to the most likely particle. When the desired number of new images has been accumulated, perform an incremental update (with a forgetting factor) of the eigenbasis, mean, and effective number of observations.
(5) Update the relative sharpness factor of each cue at time $t$ as $\rho_t^f$, based on the estimated target state and the particle distribution.
end for

Algorithm 1: Multicue based IVT algorithm.
In our approach, we treat each particle as a target candidate, and the image reconstruction error of this candidate serves as the particle's weight. Thereby, the particle distribution and its weights can be regarded as a 3D map (Figures (c) and (d)). For a point $(x, y, z)$ in this map, the $x$ and $y$ coordinates give the point's projected position on the image plane, and the $z$ coordinate describes the weight of the particle at that point. Particles with the same position on the image plane may have different weights in different feature spaces, because different cues may lead to different reconstruction errors. In other words, particles with the same distribution may produce different maps under different cues. For cues with enough discriminability between the target and the background, obvious height differences exist in the 3D map between the point at the target position and the points at other positions. These 3D maps are then analyzed to obtain a sharpness factor of the terrain, and this sharpness factor is used to evaluate the significance of the cue.
We denote the distance created by the reconstruction error as $d_m^n$, $m = 1, \ldots, M$ and $n = 1, \ldots, N$. Here, $M$ is the number of particles and $n$ is the index of the cue to which the distance belongs. This distance can be obtained from (3) and (4), where $d_m^n \propto w_m^n$. Then we calculate the Euclidean distance $r_m$, $m = 1, \ldots, M$, for every particle as

$r_m = \sqrt{(x_m - x_0)^2 + (y_m - y_0)^2},$  (5)

where $(x_m, y_m)$ and $(x_0, y_0)$ are the coordinates of the particle and the target in the image plane. The sharpness factor for particle $m$ under feature space $n$ can be defined as

$S_m^n = \dfrac{d_m^n}{r_m}.$  (6)

So the mean sharpness factor for the entire particle set under feature space $n$ is

$\bar{S}^n = \dfrac{1}{M} \sum_{m=1}^{M} S_m^n.$  (7)

Here, $\bar{S}^n$ reflects the tracking ability of the $n$th feature space, because a larger value of $\bar{S}^n$ indicates that the current reconstruction error map is steeper, so the target can be distinguished from the other candidates more clearly. Otherwise, the current reconstruction error map is flatter, and the target and the other candidates may be confused. To compare the discriminative ability among the various feature spaces, the relative sharpness factor among the different features is defined as

$\rho^n = \dfrac{\bar{S}^n}{\sum_{n=1}^{N} \bar{S}^n}.$  (8)

This RSF (relative sharpness factor) gives the significance of the $n$th cue.
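A small Python sketch of (5)–(8) follows; it takes the per-cue particle weights as the distances $d_m^n$ and the particle centres in the image plane, and returns one relative sharpness factor per cue. The epsilon guard against division by zero is an added assumption.

import numpy as np

def relative_sharpness_factors(cue_weights, positions, target_xy, eps=1e-6):
    """Relative sharpness factor (RSF) of each cue, following eqs. (5)-(8).

    cue_weights: (N, M) per-cue particle weights d_m^n (from eqs. (3)-(4)).
    positions: (M, 2) particle centres (x, y) in the image plane.
    target_xy: (2,) estimated target centre (x_0, y_0).
    Returns an (N,) vector of RSF values, one per cue, summing to one.
    """
    r = np.linalg.norm(positions - target_xy, axis=1)  # Euclidean distances, eq. (5)
    S = cue_weights / (r + eps)                        # per-particle sharpness, eq. (6)
    S_mean = S.mean(axis=1)                            # mean sharpness per cue, eq. (7)
    return S_mean / (S_mean.sum() + eps)               # relative sharpness factor, eq. (8)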
The general algorithm is given in Algorithm 1.
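To show how the pieces fit together, the following sketch runs one iteration of Algorithm 1 using the illustrative helpers propagate_particles, observation_weights, and relative_sharpness_factors defined above; the per-cue 'extract' callable, which warps the frame into a flattened patch for each particle, and the omission of resampling and of the incremental eigenbasis update are simplifying assumptions.

import numpy as np

def mivt_step(particles, rsf_prev, cue_models, frame, rng=None):
    """One simplified MIVT iteration: predict, weight per cue, fuse, estimate, update RSF.

    particles: (M, 6) states at time t-1; rsf_prev: (N,) RSFs from time t-1.
    cue_models: list of N dicts with keys 'U', 'mu', 'extract' (patch extractor).
    """
    particles = propagate_particles(particles, rng)               # step (1): dynamics
    per_cue = []
    for model in cue_models:                                      # step (2): per-cue weights
        patches = model['extract'](frame, particles)              # (M, d) patches in this cue
        per_cue.append(observation_weights(patches, model['U'], model['mu']))
    per_cue = np.vstack(per_cue)                                  # (N, M)
    fused = rsf_prev @ per_cue                                    # step (3): w_i = sum_f rho^f w_i^f
    best_state = particles[np.argmax(fused)]                      # step (4): most likely particle
    # Step (5): update the RSFs from the new particle distribution
    # (the incremental eigenbasis update of step (4) is omitted here).
    rsf_new = relative_sharpness_factors(per_cue, particles[:, :2], best_state[:2])
    return particles, fused, best_state, rsf_new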
5. Experimental Results and Analysis
We implemented the proposed approach in MATLAB based on the code of the classical IVT from http://www.cs.toronto.edu/~dross/ivt/. The proposed method is tested on several video sequences that include difficult tracking conditions such as complex backgrounds, occlusions, and appearance changes of nonrigid objects. In order to test the effectiveness of the proposed adaptive appearance model, we compare the tracking results of the presented method with other approaches. For the multicue method, the multicue appearance models with intensity and edge cues are used; for the single cue tracker, the single feature model with the intensity cue is applied. The number of particles used for our method is the same as for the other trackers; particles are adopted for all experiments except for the two long sequences, where it is . In all cases, the initial position of the target is selected manually.
The first test sequence is an infrared (IR) image sequence; it shows a tank moving on the ground from left to right. Some samples of the tracking results are shown in Figure .
Figure: Tracking results for seq. . The first row: results for IVT. The second row: results for our method.
Figure: Position error (in pixels) versus frame number for seq. , comparing our method with IVT.
Here, the first row gives the results of the classical IVT and the second row shows the results of our proposed method. The frame indices are , , , and from left to right. The target-to-background contrast is very low and the noise level is high for these IR frames. In Figure , the tracking errors of the two methods are given; we can see that our tracker is capable of tracking the object all the time with small error. The RSFs of the two cues are shown in Figure ; the edge weight is generally higher than the intensity weight, which also reflects the low target-to-background contrast.
The second test sequence shows a moving person and presents challenging appearance changes caused by the shadows of trees. Figure shows the tracking results using both methods, where the first row gives the results of the classical IVT and the second row shows the results of our proposed method. The person is small in the image and undergoes sharp appearance changes when he walks into the shadow. From Figure , we see that a large position error arose for
Figure: RSF of the intensity and edge cues versus frame number throughout seq. .
the classical IVT; in comparison, our method keeps a low error even when the person walks out of the shadow. In Figure , the RSFs of the edge and intensity cues are given.
The third test sequence is from http://www.cs.toronto.edu/~dross/ivt/; it shows a moving animal toy undergoing drastic view changes, as the toy frequently changes its view as well as its scale. The tracking results are illustrated in Figure , where the first two rows correspond to IVT and the last two rows correspond to our tracker, and eight representative frames (, , , , , , , and ) are shown. We can see that our proposed tracker performs well throughout the sequence. In contrast to our method, IVT fails when the target changes its pose drastically.
The fourth test sequence is an infrared (IR) image sequence from the VIVID benchmark dataset []. In this sequence, cars run through large shadows cast by the trees on the roadside; the target-to-background contrast is low and the noise level is high. Some samples of the final tracking results are demonstrated in Figure . Four representative
Figure: Tracking results for seq. . The first row: results for IVT. The second row: results for our method.
Figure: Position error (in pixels) versus frame number for seq. , comparing our method with IVT.
frames of the video sequence are shown, with indices , , , and ; they correspond to frames , , , and of the dataset. From Figure , we see that our tracker is capable of tracking the object all the time, even after the car runs out of the shadows. In comparison, the IVT tracker fails when the car runs out of the shadows and is unable to recover.
The fifth test sequence is obtained from the PETS benchmark dataset (http://www.cvg.reading.ac.uk/datasets/index.html). It shows a walking passenger in a subway station who undergoes large appearance changes. Some samples of the tracking results are shown in Figure . The frame indices are , , , and from left to right; their indices in the dataset are , , , and , respectively. From the results, we can see that the IVT cannot distinguish the actual person of interest from the background because of the large appearance changes. On the other hand, our framework provides good tracking results; it can overcome the effect of the appearance changes and track the target successfully.
Figure: RSF of the intensity and edge cues versus frame number throughout seq. .
Figure gives some representative tracking results for three sequences which have been tested in []. The first row shows results for the sequence “trellis”; the indices of the frames in the dataset are , , , and from left to right. The second row provides representative results for the sequence “car”, with frame indices , , , and . Tracking results for the sequence “davidin” are depicted in the last row, where frames , , , and of the dataset are given. As can be seen in the figure, our method performs well under challenging conditions such as variations of view, scale, and illumination. To make straightforward comparisons with other trackers, we quantitatively evaluate our tracking algorithm on the sequences “dudek” and “car”, which can be found in []. In Table 1, the center location RMS errors of three trackers are provided: the proposed method, the IVT tracker, and the multicue tracker described in [] (which we call CSR here).
Finally, we investigate the runtime of the IVT algorithm and the proposed method. As can be seen from Table 1, the IVT tracker can track the target with real-time processing speed.
Figure: Tracking results for seq. . The first two rows: results for IVT. The last two rows: results for our method.
Figure: Tracking results for seq. . The first row: results for IVT. The second row: results for our method.
Figure: Tracking results for seq. . The first row: results for IVT. The second row: results for our method.
Figure: Tracking results for some sequences which had been tested in []. The first row: results for the sequence “trellis.” The second row: results for the sequence “car.” The last row: results for the sequence “davidin.”
Table 1: Center location RMS errors (in pixels) and running speed (in frames per second) for three trackers.

Video sequence | IVT RMS error | IVT speed | CSR RMS error | CSR speed | Our method RMS error | Our method speed
dudek | — | 21.32 | — | — | 15.32 | —
car | — | 28.04 | — | — | 2.51 | —
In contrast, our method and CSR are slower than the IVT. With the same number of particles, the IVT algorithm using the single intensity cue runs at an average speed of fps; in comparison, our method using both intensity and edge cues runs at an average speed of fps. This means a loss in runtime performance as the number of cues increases. As illustrated by the above experimental results, the presented approach outperforms the IVT and the CSR algorithms in terms of tracking accuracy. This mainly stems from our adaptive cue integration scheme: at each frame, the target is determined by using the particles under all the cues while additionally considering their discriminative reliabilities, rather than by just using the particles under a single cue, which by itself may provide poor or inaccurate measurements. The advantage of our formulation is its adaptive nature, which lets us easily combine different target views, but generally at a loss of computational efficiency. It would be interesting to focus on developing more efficient solutions to this problem in future work.
6. Conclusion
In this work, we presented a novel tracker that enhances the IVT algorithm by employing a multicue based adaptive appearance model. First, we carry out the integration of cues both in feature space and in image geometric space. Second, considering both target and background changes, the integration of cues directly depends on the dynamically changing reliabilities of the visual cues. These two aspects of our method allow the tracker to easily adapt itself to changes in the context and accordingly improve the tracking accuracy by resolving ambiguities. In this way, our adaptive appearance model ensures that, when one cue becomes insufficiently discriminative due to target or background changes, the other compensates. Finally, the proposed multicue framework effectively utilizes the merits of the particle filter, so as to make the tracking robust at a modest computational cost. Experimental results demonstrate that subspace tracking is strongly improved by exploiting multiple cues through the proposed algorithm.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is jointly supported by the National Natural Science Foundation of China (nos. , , and ) and the Natural Science Foundation of Anhui Province (MF).
References
[] R. Wang and J. Popovic, “Real-time hand-tracking with a color glove,” ACM Transactions on Graphics, vol. , no. , article , .
[] V. Thomas and A. K. Ray, “Fuzzy particle filter for video surveillance,” IEEE Transactions on Fuzzy Systems, vol. , no. , pp. –, .
[] Z. Li, S. Qin, and L. Itti, “Visual attention guided bit allocation in video compression,” Image and Vision Computing, vol. , no. , pp. –, .
[] M. Isard and A. Blake, “CONDENSATION: conditional density propagation for visual tracking,” International Journal of Computer Vision, vol. , no. , pp. –, .
[] D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
[] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
[] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, “Incremental learning for robust visual tracking,” International Journal of Computer Vision, vol. , no. –, pp. –, .
[] H. Wang and D. Suter, “Efficient visual tracking by probabilistic fusion of multiple cues,” in Proceedings of the 18th International Conference on Pattern Recognition, pp. –, August .
[] P. Li and C. Francois, “Image cues fusion for object tracking based on particle filter,” in Articulated Motion and Deformable Objects, vol. of Lecture Notes in Computer Science, pp. –, .
[] S. Birchfield, “Elliptical head tracking using intensity gradients and color histograms,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. –, June .
[] W. Du and J. Piater, “A probabilistic approach to integrating multiple cues in visual tracking,” in Proceedings of the 10th European Conference on Computer Vision, vol. , pp. –, .
[] G. Jia, Y. Tian, Y. Wang, T. Huang, and M. Wang, “Dynamic multi-cue tracking with detection responses association,” in Proceedings of the 18th ACM International Conference on Multimedia (MM '10), pp. –, October .
[] M. Yang, F. Lv, W. Xu, and Y. Gong, “Detection driven adaptive multi-cue integration for multiple human tracking,” in Proceedings of the International Conference on Computer Vision, pp. –, .
[] P. Brasnett, L. Mihaylova, D. Bull, and N. Canagarajah, “Sequential Monte Carlo tracking by fusing multiple cues in video sequences,” Image and Vision Computing, vol. , no. , pp. –, .
[] E. Erdem, S. Dubuisson, and I. Bloch, “Visual tracking by fusing multiple cues with context-sensitive reliabilities,” Pattern Recognition, vol. , no. , pp. –, .
[] M. Yin, J. Zhang, H. Sun, and W. Gu, “Multi-cue-based CamShift guided particle filter tracking,” Expert Systems with Applications, vol. , no. , pp. –, .
[] J. Triesch and C. von der Malsburg, “Democratic integration: self-organized integration of adaptive cues,” Neural Computation, vol. , no. , pp. –, .
[] M. Spengler and B. Schiele, “Towards robust multi-cue integration for visual tracking,” Machine Vision and Applications, vol. , no. , pp. –, .
[] C. Shen, A. van den Hengel, and A. Dick, “Probabilistic multiple cue integration for particle filter based tracking,” in Proceedings of the 7th Digital Image Computing: Techniques and Applications, pp. –, .
[] P. Pérez, J. Vermaak, and A. Blake, “Data fusion for visual tracking with particles,” Proceedings of the IEEE, vol. , no. , pp. –, .
[] Y. Wu and T. S. Huang, “Robust visual tracking by integrating multiple cues based on co-inference learning,” International Journal of Computer Vision, vol. , no. , pp. –, .
[] VIVID database, http://vision.cse.psu.edu/data/vividEval/datasets/datasets.html.