Weihua Ou, Di Yuan, Donghao Li, Bin Liu, Daoxun Xia, and Wu Zeng, "Patch-based visual tracking with online representative sample selection," J. Electron. Imaging 26(3), 033006 (2017), doi: 10.1117/1.JEI.26.3.033006.
Downloaded From: http://electronicimaging.spiedigitallibrary.org/ on 05/13/2017 Terms of Use: http://spiedigitallibrary.org/ss/termsofuse.aspx
Patch-based visual tracking with online representative
sample selection
Weihua Ou,a,* Di Yuan,b Donghao Li,b Bin Liu,a Daoxun Xia,a and Wu Zengc
aGuizhou Normal University, School of Big Data and Computer Science, Guiyang, China
bHarbin Institute of Technology Shenzhen Graduate School, School of Computer Science, Shenzhen, China
cWuhan Polytechnic University, School of Electric and Electronic Engineering, Wuhan, China
Abstract. Occlusion is one of the most challenging problems in visual object tracking. Recently, many discriminative methods have been proposed to deal with this problem. For discriminative methods, it is difficult to select representative samples for updating the target template. In general, the holistic bounding boxes that contain the tracked results are selected as positive samples. However, when the object is occluded, this simple strategy easily introduces noise into the training data set and the target template, causing the tracker to drift seriously away from the target. To address this problem, we propose a robust patch-based visual tracker with online representative sample selection. Different from previous works, we divide the object and the candidates uniformly into several patches and propose a score function to calculate the score of each patch independently. The average score is then used to determine the optimal candidate. Finally, we utilize the non-negative least squares method to find the representative samples, which are used to update the target template. Experimental results on the object tracking benchmark 2013 and on 13 challenging sequences show that the proposed method is robust to occlusion and achieves promising results. © 2017 SPIE and IS&T [DOI: 10.1117/1.JEI.26.3.033006]
Keywords: discriminative method; patch-based visual tracking; representative sample selection; occlusion; robust tracking.
Paper 161023 received Dec. 22, 2016; accepted for publication Apr. 25, 2017; published online May 13, 2017.
1 Introduction
Visual object tracking is one of the most important topics in computer vision and has wide applications in video surveillance, vehicle navigation, and human-computer interaction. Despite the considerable progress of the past years, visual object tracking still suffers from many challenging problems, such as occlusion, illumination, pose variations, and background clutter. Among them, occlusion is one of the most important and often occurs in real applications.
Many methods have been proposed to handle occlusion in visual tracking. These methods can mainly be classified into discriminative methods1-8 and generative methods.9-17 The generative methods use low-dimensional subspaces to describe the appearance of the object. The discriminative methods formulate object tracking as a classification problem and distinguish the target object from the background. Although the discriminative approach is widely used in image segmentation,18-21 image denoising,22,23 face recognition,24-29 action recognition,30,31 and handwriting identification,32-34 it has recently become increasingly popular in visual tracking. For example, Kalal et al.3 proposed the classical tracking-learning-detection (TLD) framework, which explicitly decomposes the long-term tracking task into tracking, learning, and detection. For full occlusions, TLD can recover the target by detection when the target appears again. Using the circulant matrix and the Fourier transform, Henriques et al.1 presented a high-speed correlation filter tracker (KCF), which is robust to partial occlusion. Combining deep features with the traditional support vector machine, Hong et al.2 presented an outstanding tracker (CNN-SVM); the results show that deep features are more robust to occlusion than hand-crafted features. These trackers are representative methods for dealing with the occlusion problem in object tracking.
Patch-based tracking strategy is another effective way to
handle occlusion because the objects are usually occluded
partially and can be tracked by the other visible parts.
Based on this strategy, several methods have been proposed.
For instance, Adam et al.35 presented a fragment-based
tracking method in which each patch votes on the possible
position and scale of the object using the integral histogram.
By exploiting partial information and spatial information of
the target, an adaptive structural local sparse appearance
model is proposed in Ref. 36. To handle the part appearance
variations, in Ref. 37, the authors used a partial matching
method in multiple frames by simultaneously exploiting
the low-rank and sparse structure. With the adaptive corre-
lation filters, Liu et al.38 proposed a real-time part-based vis-
ual tracking method, which is robust to various appearance
variations. More references about the patch-based tracking
strategy can be found in Refs. 39-45. Motivated by these
patch-based methods, our approach also adopts a patch-
based strategy to handle occlusion. However, different from
existing methods, our approach tracks each local patch inde-
pendently. The final tracked result is determined by the
scores of all the patches.
The challenging problem in the discriminative method
is the selection of the positive and negative samples for
the classifier training and the target template update. In
most existing methods, the holistic bounding boxes
that contain the tracked results are selected as the positive
*Address all correspondence to: Weihua Ou, E-mail: ouweihuahust@gmail.com

Weihua Ou and Di Yuan contributed equally to this work.

1017-9909/2017/$25.00 © 2017 SPIE and IS&T
Journal of Electronic Imaging 26(3), 033006 (May-Jun 2017)
samples, and the neighborhood samples that are around
the tracked results are selected as the negative samples,
regardless of the occlusion. As shown in Fig. 1, choosing
the holistic bounding boxes as positive samples would
introduce noises into the target template set when the target
is occluded. Obviously, the target templates updated by
those positive samples cannot effectively represent the
object.
To address this problem, in this paper, we propose a
robust patch-based visual tracker with online representative
sample selection. Specifically, we divide the bounding box
of the candidates and the target into several local patches
uniformly and extract the associated histogram of oriented
gradient (HOG) features to represent them independently,
as shown in Fig. 2. Then, we calculate the score of each
local patch by the proposed score function and obtain the
average score of these patches to determine which candidate
is the best one, as shown in Step 2 of Fig. 2. Finally, we
propose a non-negative least square method to choose
some representative samples to update the target template,
as shown in Steps 3 and 4 of Fig. 2.
The contributions of this paper are summarized as follows:

• A score function, which is effective for measuring the similarity between the candidate and the target, is proposed.

• An online representative sample selection method is presented based on the proposed score function.
Fig. 1 The selection of positive and negative samples in most existing methods. A bounding box with a solid green line represents a positive sample, and the bounding boxes with dotted red lines represent negative samples. When the target is partly occluded, as in (b), the 37th frame (#0037), choosing the bounding box with the solid green line as a positive sample will obviously introduce noise into the training samples.
Fig. 2 The flowchart of the proposed method (PB-RSS). Step 1 presents that some candidates are gen-
erated by the particle filter in the tth frame. In step 2, these candidates are divided into four local patches,
and the score of each local patch is calculated by the proposed score function. Then, the average score is
adopted to determine the optimal candidate. Step 3 represents that some training samples are extracted
using the polar grid sampling around the tracking result. Step 4 represents that some representative
samples are selected from the training sample set by solving a non-negative least square problem.
Finally, these representative samples are used to update the target template and score function.
• The experimental results show that the proposed method is robust to partial occlusions and obtains promising performance.
The rest of this paper is organized as follows: we briefly
review the related works in Sec. 2 and describe the proposed
method in Sec. 3. Then, we show the experimental details
and results in Sec. 4. Finally, the conclusions are made in
Sec. 5.
2 Related Works
In this section, we first present the basic tracking model and
then review some related works. Before the review, we begin
with some notation: a lowercase letter x denotes a real number, a bold lowercase letter x a vector, and a bold uppercase letter X a matrix.
2.1 Sequential Inference Model

The object tracking problem can be cast as a sequential inference task.46 Given $s$ observed patches $I_t = \{I_t^1, I_t^2, \cdots, I_t^s\}$, object tracking can be formulated as estimating the state $b_t$ by maximizing the posterior distribution $p(b_t | I_t)$, where $b_t = [x, y, w, h] \in \mathbb{R}^4$ are the state variables in the $t$th frame. The optimal state can be estimated by the maximum a posteriori formulation

$$b_t^{\ast} = \arg\max_{b_t^i} \; p(b_t^i | I_t). \tag{1}$$

According to the Bayesian theorem, we have the following formulation:

$$p(b_t^i | I_t) \propto p(I_t^i | b_t^i) \int p(b_t^i | b_{t-1}^i)\, p(b_{t-1}^i | I_{t-1})\, db_{t-1}^i, \tag{2}$$

where $p(b_t^i | b_{t-1}^i)$ is the motion model and $p(I_t^i | b_t^i)$ is the observation model. As in traditional methods, we use the particle filter47-49 to model the object motion in this paper. The particle filter is a classical Bayesian sequential importance sampling technique adopted in many tracking methods. The observation model $p(I_t^i | b_t^i)$ gives the confidence that a given candidate is the target.

Most existing discriminative methods train a classifier to distinguish the target from the background. However, the performance of the classifier depends heavily on the selection of the training samples. In this paper, we directly construct a score function instead of a classifier to predict the optimal candidate:

$$p(I_t^i | b_t^i) \propto f(x_t^i), \tag{3}$$

where $x_t^i \in \mathbb{R}^d$ represents the features extracted from the patch $I_t^i$. The construction of the score function is introduced in Sec. 3.1.
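As a minimal sketch of this inference step (assuming a generic `score_fn` and a list of particles already drawn from the motion model; the names are illustrative, not from the authors' code):

```python
import numpy as np

def estimate_state(particles, score_fn):
    """MAP estimate in the spirit of Eqs. (1)-(3): the observation
    likelihood of each candidate state is taken proportional to its
    score f(x), and the highest-scoring particle is returned as the
    optimal state b_t.

    particles : list of (state, features) pairs from the motion model
    score_fn  : callable mapping features to a scalar score
    """
    scores = np.array([score_fn(x) for _, x in particles])
    best = int(np.argmax(scores))   # arg max over candidate states
    return particles[best][0]
```

For example, with two particles scored 0.1 and 0.9, the state of the second particle is returned.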
2.2 Sample Selection
Choosing the holistic bounding boxes as the positive training samples would introduce noise into the training samples because the target object might be partially occluded. To deal
with this problem, many methods have been proposed; they
can be categorized as follows: (1) robust loss function,50,51
(2) semisupervised learning,52,53 and (3) multiple instance
learning (MIL).54,55 The main idea of the first class of methods
is to design a robust loss function to measure the similarity
between the target template and the candidate samples. For
example, Leistner et al.50 proposed an online GradientBoost
with a robust loss function, which is less sensitive to noise.
Masnadi-Shirazi et al.51 used a robust tangent loss function
that encourages the margin between the negative samples
and the positive samples. The second class of methods uses
the supervised information to constrain the objective func-
tion. For example, in Ref. 52, the author proposed an online
semisupervised boosting method that uses the previously
tracked results as labeled samples during the tracking.
Saffari et al.53 proposed a multiclass semisupervised boost-
ing algorithm that uses a regularization constraint over the
unlabeled data. Different from the above methods, the
main idea of multiple instance learning is to utilize more
samples to identify the target object. For example, Babenko
et al.54 used the MIL instead of the conventional supervised
learning, which leads to the more robust tracker with fewer
parameters. In Ref. 55, the author proposed an online semi-
supervised learning algorithm that combines semisupervised
learning and multiple instance learning into a coherent
framework. Then, they introduced a loss function that simul-
taneously uses labeled and unlabeled samples. The results
show that the tracker was more adaptive compared with
the other online semisupervised methods.
Recently, the work most closely related to ours is Struck,56 which
presents a framework for adaptive visual object tracking
based on the structured output prediction. Rather than learn-
ing a classifier, a prediction function was proposed to
directly estimate the object transformation between the
frames. The prediction function is determined by a set of sup-
port vectors, which are selected from the training samples
and the corresponding coefficients. Struck avoids an artificial binarization step within a coherent framework by directly linking learning to tracking.
The experimental results show its effectiveness in dealing
with occlusion.
However, the sample selection algorithm57 for Struck is very complicated. Different from Struck,56 in this paper, we formulate the selection of representative samples as a non-negative least squares problem, which can be solved easily by standard non-negative least squares.
3 Proposed Approach
In this section, we give the proposed method that includes the
score function, the representative sample selection, and the
template updating strategy. We present the score function in
Sec. 3.1 and describe the online representative sample selec-
tion in Sec. 3.2. Then, we give the template updating strategy
in Sec. 3.3 and summarize the whole algorithm in Sec. 3.4.
3.1 Score Function

The score function measures the similarity between the candidate and the templates. Given a candidate sample $V$ in the $t$th frame, we divide it uniformly into $m$ local patches, that is, $V = \{V_1, V_2, \ldots, V_m\}$. For each local patch $V_i$, $i = 1, 2, \ldots, m$, we extract the corresponding HOG features $x_i \in \mathbb{R}^d$. Concatenating all these local feature vectors, we obtain a long feature vector $x = [x_1, x_2, \ldots, x_m]$ for the candidate sample $V$. For the associated $n$ target templates $T = \{T_1, T_2, \ldots, T_n\}$, we also divide each into $m$ patches by the same procedure. Therefore, we get the feature set of the templates, $H = \{H_1, H_2, \ldots, H_m\}$, where $H_i = [h_i^1, h_i^2, \ldots, h_i^n] \in \mathbb{R}^{d \times n}$ denotes the target template set of the $i$th patch and $h_i^j \in \mathbb{R}^d$ ($j = 1, 2, \ldots, n$) denotes the $j$th template of the $i$th patch.

Instead of learning a classifier, we utilize a score function to evaluate the similarity between the candidate and the template set. For a given candidate $x$, the score function is defined by the inner product between the candidate and the template:

$$f(x) = \frac{1}{m} \sum_{i=1}^{m} \langle x_i, h_i \rangle, \tag{4}$$

where $x_i$ and $h_i$ are normalized, i.e., $\|x_i\|_2 = 1$ and $\|h_i\|_2 = 1$. Obviously, the larger the value of the score function $f(x)$, the more similar the candidate and the target template are.

Let $A_i = [x_i^1, x_i^2, \ldots, x_i^k] \in \mathbb{R}^{d \times k}$ be the corresponding support sample set of the $i$th local patch, where $x_i^t$, $t = 1, 2, \ldots, k$, indicates the $t$th support sample of the $i$th local patch. Based on the subspace assumption, each template can be linearly represented by the support set, i.e.,

$$h_i = A_i \tilde{\omega}_i. \tag{5}$$

For $A_i$, we denote the positive sample set as $A_i^{+}$ and the negative sample set as $A_i^{-}$. Accordingly, the associated coefficient vector $\tilde{\omega}_i$ is also separated into positive coefficients $\tilde{\omega}_i^{+}$ and negative coefficients $\tilde{\omega}_i^{-}$.

In a real application, the feature dimension of $x$ is usually very high, so the simple inner product is not suitable for measuring the similarity between the target template and the candidate sample. Motivated by the kernel representer theorem,58 we map the features into a kernel space and then measure the similarity there. Substituting Eq. (5) into Eq. (4), we obtain the following score function in the kernel space:

$$f(x) = \frac{1}{m} \sum_{i=1}^{m} \sum_{t=1}^{k} \omega_i^t \langle x_i^t, x_i \rangle, \tag{6}$$

where $\tilde{\omega}_i = [\omega_i^1, \omega_i^2, \ldots, \omega_i^k]^T \in \mathbb{R}^k$, and $\omega_i^t$, $t = 1, 2, \ldots, k$, denotes the coefficient of the $t$th kernel function corresponding to the $i$th local patch.

As we know, the value of the score function will be larger when the candidate is similar to the positive support samples. Based on this fact, we can constrain the coefficient vector $\tilde{\omega}_i$ as follows:

• $\tilde{\omega}_i^{+}$ are all positive real numbers when the corresponding samples are positive;

• $\tilde{\omega}_i^{-}$ are all negative real numbers when the corresponding samples are negative.

Thus, the final formulation of the score function can be rewritten as follows:

$$f(x) = \frac{1}{m} \sum_{i=1}^{m} \sum_{t=1}^{k} \delta(x_i^t)\, |\omega_i^t|\, \langle x_i^t, x_i \rangle, \tag{7}$$

where all the support samples are normalized, i.e., $\|x_i^t\|_2 = 1$, and

$$\delta(x_i^t) = \begin{cases} +1, & x_i^t \text{ is a positive support sample} \\ -1, & x_i^t \text{ is a negative support sample} \end{cases}. \tag{8}$$
3.2 Patch-Based Online Representative Sample Selection

Given the target in a certain frame, the score of each local patch can be obtained using the proposed score function. If the score of a local patch is greater than the predefined threshold $\alpha$, the local patch is regarded as a positive sample, and some patches around it are extracted as negative samples by the polar grid, as shown in Fig. 3. If the score of the local patch is smaller than the predefined threshold $\alpha$, the local patch is discarded.

Fig. 3 The illustration of the patch-based sample selection. The candidate is divided uniformly into four local patches. For a candidate, we can get the score of each local patch, which is classified as a positive or a negative sample according to its score value and labeled by "+" or "-", respectively. When the local patch is labeled "+", we sample around it. When the local patch is labeled "-", the candidate is discarded (such as the 37th frame).

After that, we can obtain the training sample set for the $t$th frame, $T_i = [x_i^1, x_i^2, \ldots, x_i^r] \in \mathbb{R}^{d \times r}$. For each local patch $x_i$, we calculate its ground truth by the following formulation:

$$g(x_i) = \mathrm{overlap}[b, \mathrm{box}(x_i)], \tag{9}$$

where $b$ represents the area of the bounding box for the tracked target and $\mathrm{box}(x_i)$ indicates the area of the bounding box of sample $x_i$. The function $\mathrm{overlap}(\cdot, \cdot)$ computes the intersection area of the two bounding boxes, as shown in Fig. 4. For the training sample set, we can compute its ground truth $\tilde{y}_i = [y_i^1, y_i^2, \ldots, y_i^r]^T = [g(x_i^1), g(x_i^2), \ldots, g(x_i^r)]^T \in \mathbb{R}^r$. For the support sample set $A_i$, the corresponding ground truth $\tilde{s}_i = [s_i^1, s_i^2, \cdots, s_i^k]^T \in \mathbb{R}^k$ can also be obtained using Eq. (9). Combining the support sample set $A_i$ and the training sample set $T_i$, we get the candidate sample set $X = [A_i, T_i] \in \mathbb{R}^{d \times (k+r)}$ and the associated ground truth $\tilde{q}_i = [\tilde{s}_i; \tilde{y}_i] \in \mathbb{R}^{k+r}$.

Supposing the candidate samples in $X$ are normalized, the coefficients $\tilde{\omega}_i$ in the $t$th frame can be obtained by minimizing the following objective function:

$$\min_{\tilde{\omega}_i} \sum_{j=1}^{k+r} \left( \tilde{q}_i^j - \sum_{t=1}^{k+r} \delta(x_i^t)\, |\omega_i^t|\, \langle x_i^t, x_i^j \rangle \right)^2, \tag{10}$$

where $\tilde{q}_i^j \in \mathbb{R}$ denotes the $j$th ground truth of $\tilde{q}_i$. Using matrix notation, Eq. (10) can be reformulated as follows:

$$\min_{\tilde{z}_i \geq 0} \; \| \tilde{q}_i - (X^T X \odot E_i)\, \tilde{z}_i \|_2^2, \tag{11}$$

where $E_i$ is the label matrix defined by Eq. (8), $\odot$ denotes the elementwise product, and $\tilde{z}_i = [|\omega_i^1|, |\omega_i^2|, \ldots, |\omega_i^{k+r}|]^T$. Let $W = X^T X \odot E_i$; Eq. (11) can then be reformulated as follows:

$$\min_{\tilde{z}_i \geq 0} \; \| \tilde{q}_i - W \tilde{z}_i \|_2^2. \tag{12}$$

This optimization can be solved by the standard non-negative least squares method.59
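The minimization of Eq. (12) maps directly onto an off-the-shelf non-negative least squares solver. A minimal sketch using SciPy's Lawson-Hanson implementation (argument names are illustrative; the paper's own solver details are not specified beyond Ref. 59):

```python
import numpy as np
from scipy.optimize import nnls

def select_representative(X, E, q):
    """Solve Eq. (12): min_{z >= 0} || q - (X^T X * E) z ||_2^2.

    X : (d, k+r)      normalized candidate samples [A_i, T_i]
    E : (k+r, k+r)    label sign matrix E_i (elementwise factor)
    q : (k+r,)        overlap-based ground truths q_i

    Returns the non-negative coefficient magnitudes z; samples with
    nonzero entries are the representative samples.
    """
    W = (X.T @ X) * E          # Gram matrix with elementwise label signs
    z, _residual = nnls(W, q)  # standard non-negative least squares
    return z
```

For instance, with orthonormal columns in X and an all-ones E, W is the identity and the solver simply returns the non-negative targets q, so only samples with nonzero ground truth are kept as representatives.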
3.3 Patch-Based Template Updating

The template update is very important in the tracking process. If the template set $H_i$ is fixed, the tracker will fail because the appearance of the target changes dynamically. However, if the template set $H_i$ is updated too frequently, errors will accumulate and the tracker will drift away from the target.

In our method, we adopt a patch-based method to update the template set. Given a local patch, if its score is greater than the predefined threshold, we obtain the coefficient vector $\tilde{\omega}_i$ by solving Eq. (12). Then, the template $h_i$ is updated by Eq. (5). If the number of templates in $H_i$ is below the given threshold $\eta$, we append the new template $h_i$ to the template set $H_i$. Otherwise, we append the new template to $H_i$ and discard the oldest one.
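This first-in-first-out update policy can be sketched in a few lines (a generic illustration, assuming templates are stored in an ordered collection; not the authors' implementation):

```python
from collections import deque

def update_templates(H_i, h_new, eta):
    """FIFO template update of Sec. 3.3: append the new template; once
    the set holds eta templates, the oldest one is discarded.

    H_i   : iterable of existing templates (oldest first)
    h_new : new template constructed via Eq. (5)
    eta   : size threshold on the template set
    """
    q = deque(H_i, maxlen=eta)  # maxlen evicts the oldest automatically
    q.append(h_new)
    return list(q)
```

For example, appending template 4 to the full set [1, 2, 3] with eta = 3 yields [2, 3, 4], while appending to a set below the threshold keeps all templates.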
3.4 Algorithm Implementation
The whole algorithm is summarized in Algorithm 1,which
includes a searching stage and an updating stage. In the
searching stage, the optimal candidate is obtained using
the score function. In the updating stage, some representa-
tive samples are selected by non-negative least squares,
and the target template set is updated using the selected
samples.
Fig. 4 The calculation of ground truths. The green bounding box
denotes the tracked result, and the red bounding box represents
the training sample. The blue circle is the sampling region of the
training samples.
Algorithm 1 Patch-based visual tracking with online representative sample selection.

1. Inputs: Testing sequence $\psi = \{I_0, I_1, \ldots, I_F\}$ and initial state $b_0$; $F$ is the number of frames.
2. Outputs: The predicted optimal states $\{b_1, b_2, \ldots, b_F\}$.
3. Predefine the template set $H$ and the threshold $\alpha$.
4. for $t = 1$ to $F$ do
5.   Searching stage:
6.   Generate $M$ candidate samples by exploiting the particle filter.
7.   Divide each candidate uniformly into $m$ patches.
8.   for $i = 1$ to $M$ do
9.     Calculate the score of each local patch by Eq. (7).
10.    Get $\max[f(x)]$ as the optimal candidate.
11.  end for
12.  Updating stage:
13.  for $i = 1$ to $m$ do
14.    if $f(x_i) > \alpha$ then
15.      Regard this local patch as a positive sample and extract training samples around it as negative samples by the polar grid.
16.      Get the corresponding coefficient vector $\tilde{\omega}$ by solving Eq. (12).
17.      Construct the target template $h_i$ by Eq. (5) and append $h_i$ to $H_i$.
18.    end if
19.  end for
20. end for
Fig. 5 Comparison of eight trackers on OTB 2013 in the precision and success plots with one-pass evaluation (OPE). Each line with a different style and marker represents a tracker.
Fig. 6 Comparison of different trackers for the success plots on different attributes of OTB 2013. Each line with a different style and marker represents a tracker.
4 Experiments
In this section, we introduce the experimental details in
Sec. 4.1 and present the experimental results in Sec. 4.2.
4.1 Implementation Details
4.1.1 Parameters setting
All the methods are implemented in MATLAB 2014 on a PC with an Intel 3.7-GHz dual-core CPU and 8 GB RAM. The image patches are resized to 32 × 32 pixels, and the target is divided uniformly into four local patches. For the HOG features, the cell size and the number of orientation bins are set to 4 and 9, respectively. In the updating stage, the radius of the polar grid is set to 5, and the number of angular divisions is set to 16. In the searching stage, the number of candidates from the particle filter is set to 750, and the size of the template set is set to 120. The threshold $\alpha$ is set to 0.28, and $\eta$ is set to 200 empirically.
4.1.2 Datasets
The experiments are conducted on the object tracking benchmark (OTB) 2013,60 which contains 50 image sequences. These
sequences have 11 attributes (illumination variation, scale
variation, occlusion, deformation, motion blur, fast motion, in-
plane rotation, out-of-plane rotation, out-of-view, background
clutters, and low resolution). To test the robustness of the
proposed method for the occlusion, we select 13 sequences
(CarScale, David3, Dudek, FaceOcc1, FaceOcc2, Freeman4,
Tiger1, Tiger2, Jogging-1, Jogging-2, SUV, Walking2, and
Women) from the OTB 2013, which include different kinds
of occlusions. For example, FaceOcc1 and FaceOcc2 contain
partial occlusions, while Tiger1 and Tiger2 include severe
occlusions. Furthermore, Jogging-1 and Jogging-2 contain
nearly full occlusions.
4.1.3 Compared trackers
The following representative methods are selected for comparison with the proposed method: KCF,1 DSST,61 CCT,5 TGPR,62
Fig. 7 Comparison of different trackers with the CLE on four challenging sequences: (a) David3, (b) Freeman4, (c) Woman, (d) Tiger2. Each panel plots the center location error against the frame number for TGPR, Struck, KCF, DSST, CCT, TLD, RR, SVM, and our tracker. Each marker with a different style represents a tracker.
Table 1 Comparison of the average frame rate of our tracker with
the other eight trackers on OTB 2013.
Struck KCF DSST TGPR CCT TLD RR SVM Ours
FPS 20 172 24 3 51 28 4.5 7.8 1.5
Table 2 The percentage of successful frames whose CLE is below the threshold 20 pixels. The best result is in italic, the second result is in bold,
and the average values are shown at the end of the table.
Struck KCF DSST TGPR CCT TLD RR SVM Ours
CarScale 0.69 0.81 0.76 0.72 0.79 0.64 0.70 0.63 0.84
David3 0.57 1.00 0.60 1.00 1.00 0.35 0.97 1.00 1.00
Dudek 0.86 0.88 0.82 0.87 0.90 0.64 0.85 0.89 0.92
FaceOcc1 0.73 0.73 0.91 0.79 0.76 0.17 0.30 0.24 0.84
FaceOcc2 1.00 0.97 1.00 0.99 0.94 0.82 0.39 0.96 0.89
Freeman4 0.41 0.53 0.96 0.90 1.00 0.37 0.45 0.16 0.97
Jogging-1 0.97 0.23 0.23 0.22 0.98 0.97 0.98 0.24 0.98
Jogging-2 0.18 0.16 0.19 0.99 0.19 0.95 0.97 0.16 0.99
SUV 0.18 0.98 0.98 0.53 0.98 0.94 0.96 0.52 0.97
Tiger1 0.61 0.85 0.57 0.25 0.95 0.24 0.60 0.33 0.90
Tiger2 0.43 0.36 0.30 0.86 0.86 0.35 0.47 0.42 0.88
Walking2 0.71 0.43 1.00 1.00 1.00 0.56 0.97 0.91 0.99
Woman 1.00 0.94 0.94 0.94 0.20 0.40 0.34 0.87 0.94
Average 0.64 0.68 0.71 0.77 0.81 0.57 0.69 0.56 0.93
Table 3 The percentage of successful frames whose overlap ratio is beyond the threshold 0.5. The best result is in italic, the second result is in
bold, and the average values are shown at the end of the table.
Struck KCF DSST TGPR CCT TLD RR SVM Ours
CarScale 0.45 0.44 0.85 0.48 1.00 0.69 0.46 0.63 0.74
David3 0.57 0.99 0.53 1.00 1.00 0.32 0.92 0.81 1.00
Dudek 0.97 0.98 0.99 0.88 1.00 0.67 0.87 0.96 0.99
FaceOcc1 1.00 1.00 1.00 0.98 1.00 0.56 0.78 0.95 1.00
FaceOcc2 1.00 1.00 1.00 1.00 0.99 0.79 0.36 0.57 0.95
Freeman4 0.24 0.18 0.44 0.74 0.63 0.22 0.10 0.12 0.85
Jogging-1 0.90 0.22 0.22 0.22 0.97 0.96 0.96 0.23 0.97
Jogging-2 0.16 0.16 0.18 0.99 0.19 0.95 0.96 0.14 0.97
SUV 0.16 0.99 0.98 0.54 0.98 0.94 0.92 0.53 0.98
Tiger1 0.62 0.86 0.59 0.26 0.96 0.23 0.69 0.23 0.93
Tiger2 0.43 0.36 0.30 0.89 0.87 0.20 0.23 0.28 0.89
Walking2 0.41 0.38 1.00 0.74 1.00 0.21 0.97 0.41 0.98
Woman 0.94 0.94 0.93 0.94 0.20 0.33 0.29 0.19 0.86
Average 0.60 0.65 0.69 0.74 0.83 0.54 0.65 0.47 0.93
TLD,3 RR,63 SVM,63 and Struck.56 Among them, KCF,1 DSST,61 and CCT5 are filtering-based trackers, which can handle partial occlusions efficiently. TGPR,62 TLD,3 RR,63 and SVM63 are discriminative methods, which learn a classifier to distinguish the target from the background. Struck56 learns a prediction function instead of a classifier.
4.2 Experimental Results
4.2.1 Quantitative evaluation
Two measurements are adopted to evaluate the performance
for the trackers. One is the center location error (CLE), which
is defined as the average Euclidean distance between the
center location of the tracked target and the ground truth.
Fig. 8 The comparison between the proposed method and four different trackers (RR, TLD, DSST, and TGPR) on five frames of two sequences (FaceOcc1 and Tiger2). The red bounding box denotes the tracked result.
The other criterion is the PASCAL VOC overlap ratio, which is defined as S = |r_t ∩ r_a| / |r_t ∪ r_a|, where r_t and r_a represent the bounding boxes of the tracked target and the ground truth, respectively, ∩ and ∪ represent the intersection and union of the two regions, respectively, and |·| denotes the number of pixels in a region. Obviously, the higher the overlap ratio, the better the performance the tracker obtains.
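For axis-aligned boxes, the overlap ratio above reduces to simple rectangle arithmetic; the sketch below assumes (x, y, w, h) boxes and is not the benchmark's reference implementation:

```python
def overlap_ratio(rt, ra):
    """PASCAL VOC overlap S = |rt ∩ ra| / |rt ∪ ra| for (x, y, w, h) boxes."""
    x1, y1, w1, h1 = rt
    x2, y2, w2, h2 = ra
    # Intersection rectangle (width/height clamp to zero when boxes are disjoint)
    iw = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
    ih = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
    inter = iw * ih
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes sharing a 5x10 strip: S = 50 / 150 ≈ 0.333
print(overlap_ratio((0, 0, 10, 10), (5, 0, 10, 10)))
```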
For the overall performance, we use the one-pass evaluation (OPE) of precision and success rates as the evaluation criterion, as shown in Fig. 5. It is evident that our tracker achieves promising performance.
Figure 6 presents the performance of all the trackers under different attributes on the OTB 2013 dataset. From Fig. 6(a), we can see that our tracker is robust to
Fig. 9 The comparison between the proposed method and four different trackers (Struck, SVM, KCF, and CCT) on five frames of two sequences (SUV and Jogging-2). The red bounding box denotes the tracked result.
occlusion compared with the other trackers. Meanwhile, our tracker obtains promising performance under the other variations as well, such as deformation, motion blur, and scale variation.
Figure 7 shows the CLE of the different trackers on four challenging sequences. From this, we can see that our tracker has a lower CLE on all the sequences. On Freeman4, our method achieves remarkable performance.
Table 1 shows the average frames per second of several trackers in the same environment. Our method processes about 1.5 frames per second and is slower than the other methods, mainly because the four local patches need to be processed independently in our method.
For all 13 challenging sequences, Table 2 shows the percentage of successful frames whose CLE is below the threshold of 20 pixels, and Table 3 shows the percentage of successful frames whose overlap ratio is beyond the threshold of 0.5. From these tables, we can see that our method accurately tracks the targets and outperforms the compared trackers on most sequences. On the sequences that contain partial occlusions, our tracker outperforms most of the compared trackers; on the sequences with severe occlusions, it outperforms all of them.
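The two success criteria used in Tables 2 and 3 (CLE below 20 pixels, overlap ratio above 0.5) amount to per-frame thresholding; a minimal sketch with the threshold values from the text (the function name and sample scores are ours):

```python
def success_rate(scores, threshold, higher_is_better):
    """Fraction of frames whose per-frame score passes the threshold."""
    if higher_is_better:
        hits = sum(1 for s in scores if s > threshold)
    else:
        hits = sum(1 for s in scores if s < threshold)
    return hits / len(scores)

cle_per_frame = [5.0, 12.0, 25.0, 8.0]   # Table 2 criterion: CLE < 20 pixels
iou_per_frame = [0.8, 0.55, 0.3, 0.9]    # Table 3 criterion: overlap > 0.5
print(success_rate(cle_per_frame, 20.0, higher_is_better=False))  # 0.75
print(success_rate(iou_per_frame, 0.5, higher_is_better=True))    # 0.75
```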
4.2.2 Qualitative evaluation
To qualitatively evaluate the performance, we conduct two experiments. First, we conduct an experiment on the sequences FaceOcc1 and Tiger2, which include partial occlusions. Four different trackers (RR, TLD, TGPR, and DSST) are selected for comparison. The results on five different frames are shown in Fig. 8. For the 466th and 740th frames of FaceOcc1, it can be seen that RR and TLD drift away from the target seriously, whereas our method tracks accurately. This is because our method can exploit the other visible local patches to find the optimal candidate when the face is partially occluded by the book. Meanwhile, our method also outperforms TGPR and DSST on the sequence Tiger2.
For the second experiment, we compare our tracker with four different trackers (Struck, SVM, KCF, and CCT) on two sequences with severe and even full occlusions (SUV and Jogging-2). The results on five frames are shown in Fig. 9. For the 524th and 561st frames of SUV, it is clear that Struck and SVM fail to track the target when the car is severely occluded by the tree, whereas our method tracks the object correctly. For the 56th and 63rd frames of Jogging-2, our method can track the target when the lady is fully occluded by the telegraph pole for a short time, while KCF and CCT cannot. This is because our method has a robust updating strategy for the target template set: patches are discarded when they are occluded.
5 Conclusion
In this paper, we propose a patch-based visual tracker with online representative sample selection to deal with the occlusion problem. As a discriminative method, it uses a score function to predict the optimal candidate and utilizes non-negative least squares to select the representative samples, with which the templates are updated efficiently. Experimental results and comparisons with other trackers on the benchmarks show that our method achieves promising results and can effectively handle both partial and severe occlusions.
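As an illustration of the non-negative least-squares step summarized above, the sketch below solves min ||Ax - b||^2 subject to x >= 0 with a didactic projected-gradient loop rather than the Lawson-Hanson active-set algorithm the paper cites; the sample matrix and step size are invented for the example:

```python
def nnls_projected_gradient(A, b, lr=0.01, iters=5000):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent.

    In a representative-sample-selection setting, columns of A play the role
    of candidate samples; coefficients that stay near zero mark samples that
    contribute little, while the larger coefficients mark representative ones.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # gradient g = 2 A^T r
        g = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step, then project back onto the non-negative orthant
        x = [max(0.0, x[j] - lr * g[j]) for j in range(n)]
    return x

# Target b is built from the first column only, so x concentrates on it
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, 0.0, 2.0]
print(nnls_projected_gradient(A, b))  # ≈ [2.0, 0.0]
```

In practice a library routine such as `scipy.optimize.nnls` would replace this loop; the projected-gradient version is shown only to make the constraint handling explicit.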
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61402122, 61672183, and 61272252), the Science and Technology Planning Project of Guangdong Province (Grant No. 2016B090918047), the Natural Science Foundation of Guangdong Province (Grant No. 2015A030313544), the Shenzhen Research Council (Grant Nos. JCYJ20160406161948211, JCYJ20160226201453085, and JSGG20150331152017052), the 2014 PhD Recruitment Program of Guizhou Normal University, the Outstanding Innovation Talents of Science and Technology Award Scheme of the Education Department of Guizhou Province (Qian jiao KY word[2015]487), the Natural Science Foundation of Guizhou (LH[2015]7784), the China Scholarship Council (No. 201508525007), and the Fund of Guizhou Educational Department (KY[2016]027). The authors declare no conflicts of interest.
References
1. J. F. Henriques et al., "High-speed tracking with kernelized correlation filters," IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583-596 (2015).
2. S. Hong et al., "Online tracking by learning discriminative saliency map with convolutional neural network," in Int. Conf. on Machine Learning (ICML), pp. 597-606 (2015).
3. Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409-1422 (2012).
4. X. Li et al., "A multi-view model for visual tracking via correlation filters," Knowl.-Based Syst. 113, 88-99 (2016).
5. G. Zhu et al., "Collaborative correlation tracking," in British Machine Vision Conf. (BMVC), pp. 1-12 (2015).
6. X. Li, X. You, and C. L. P. Chen, "A novel joint tracker based on occlusion detection," Knowl.-Based Syst. 71, 409-418 (2014).
7. Z. He et al., "Connected component model for multi-object tracking," IEEE Trans. Image Process. 25(8), 3698-3711 (2016).
8. X. Sun et al., "Non-rigid object contour tracking via a novel supervised level set model," IEEE Trans. Image Process. 24(11), 3386-3399 (2015).
9. Z. Chen et al., "Dynamically modulated mask sparse tracking," IEEE Trans. Cybern. PP, 1-13 (2016).
10. J. Santner et al., "PROST: parallel robust online simple tracking," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 723-730 (2010).
11. M. Zhai, M. J. Roshtkhari, and G. Mori, "Deep learning of appearance models for online object tracking," arXiv preprint arXiv:1607.02568 (2016).
12. X. Lan, A. J. Ma, and P. C. Yuen, "Multi-cue visual tracking using robust feature-level fusion based on joint sparse representation," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1194-1201 (2014).
13. X. Lan et al., "Joint sparse representation and robust feature-level fusion for multi-cue visual tracking," IEEE Trans. Image Process. 24(12), 5826-5841 (2015).
14. X. Lan, S. Zhang, and P. C. Yuen, "Robust joint discriminative feature learning for visual tracking," in Proc. Int. Joint Conf. on Artificial Intelligence, pp. 3403-3410 (2016).
15. R. Liu et al., "Robust visual tracking using dynamic feature weighting based on multiple dictionary learning," in European Signal Processing Conf. (EUSIPCO), pp. 2166-2170 (2016).
16. S. Zhang et al., "Robust visual tracking using structurally random projection and weighted least squares," IEEE Trans. Circuits Syst. Video Technol. 25(11), 1749-1760 (2015).
17. X. Lan, P. C. Yuen, and R. Chellappa, "Robust MIL-based feature template learning for object tracking," in Thirty-First AAAI Conf. on Artificial Intelligence (AAAI), pp. 4118-4125 (2017).
18. H. Lu et al., "Wound intensity correction and segmentation with convolutional neural networks," Concurrency Comput. Pract. Exper. 29 (2016).
19. L. Chen, C. L. P. Chen, and M. Lu, "A multiple-kernel fuzzy c-means algorithm for image segmentation," IEEE Trans. Syst. Man Cybern. 41(5), 1263-1274 (2011).
20. J. Qian et al., "Accurate tilt sensing with linear model," IEEE Sens. J. 11(10), 2301-2309 (2011).
21. R. Liu, Y. Tang, and B. Fang, "Topological coding and its application in the refinement of SIFT," IEEE Trans. Cybern. 44(11), 2155-2166 (2014).
22. L. Liu et al., "Weighted joint sparse representation for removing mixed noise in image," IEEE Trans. Cybern. 47, 600-611 (2016).
23. Q. Ge et al., "Structure-based low-rank model with graph nuclear norm regularization for noise removal," IEEE Trans. Image Process. PP (2016).
24. W. Ou et al., "Robust face recognition via occlusion dictionary learning," Pattern Recognit. 47(4), 1559-1572 (2014).
25. W. Ou et al., "Multi-view non-negative matrix factorization by patch alignment framework with view consistency," Neurocomputing 204, 116-124 (2016).
26. X.-Y. Jing et al., "Multi-spectral low-rank structured dictionary learning for face recognition," Pattern Recognit. 59, 14-25 (2016).
27. W.-S. Chen et al., "Supervised kernel nonnegative matrix factorization for face recognition," Neurocomputing 205, 165-181 (2016).
28. W.-S. Chen et al., "Semi-supervised discriminant analysis method for face recognition," Int. J. Wavelets Multiresolution Inf. Process. 13, 1550049 (2015).
29. B. Chen et al., "Color image analysis by quaternion-type moments," J. Math. Imaging Vision 51(1), 124-144 (2015).
30. W. K. Wong et al., "Joint tensor feature analysis for visual object recognition," IEEE Trans. Cybern. 45(11), 2425-2436 (2015).
31. Z. Lai et al., "Human gait recognition via sparse discriminant projection learning," IEEE Trans. Circuits Syst. Video Technol. 24(10), 1651-1662 (2014).
32. Z. He, X. You, and Y. Y. Tang, "Writer identification of Chinese handwriting documents using hidden Markov tree model," Pattern Recognit. 41(4), 1295-1307 (2008).
33. Z. He et al., "Writer identification using fractal dimension of wavelet subbands in Gabor domain," Integr. Comput. Aided Eng. 17, 157-165 (2010).
34. Z. He and A. C. Chung, "3-D B-spline wavelet-based local standard deviation (BWLSD): its application to edge detection and vascular segmentation in magnetic resonance angiography," Int. J. Comput. Vision 87(3), 235-265 (2010).
35. A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 798-805 (2006).
36. X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1822-1829 (2012).
37. T. Zhang et al., "Partial occlusion handling for visual tracking via robust part matching," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1258-1265 (2014).
38. T. Liu, G. Wang, and Q. Yang, "Real-time part-based visual tracking via adaptive correlation filters," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4902-4912 (2015).
39. L. Cehovin, M. Kristan, and A. Leonardis, "Robust visual tracking using an adaptive coupled-layer visual model," IEEE Trans. Pattern Anal. Mach. Intell. 35(4), 941-953 (2013).
40. Z. He et al., "Robust object tracking via key patch sparse representation," IEEE Trans. Cybern. 47, 354-364 (2016).
41. J. Kwon and K. M. Lee, "Highly nonrigid object tracking via patch-based dynamic appearance modeling," IEEE Trans. Pattern Anal. Mach. Intell. 35(10), 2427-2441 (2013).
42. Y. Li, J. Zhu, and S. C. Hoi, "Reliable patch trackers: robust visual tracking by exploiting reliable patches," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 353-361 (2015).
43. M. Sun and S. Savarese, "Articulated part-based model for joint object detection and pose estimation," in IEEE Int. Conf. on Computer Vision (ICCV), pp. 723-730 (2011).
44. W. Xiang and Y. Zhou, "Part-based tracking with appearance learning and structural constraints," in Int. Conf. on Neural Information Processing, pp. 594-601 (2014).
45. S. Battiato et al., "An integrated system for vehicle tracking and classification," Expert Syst. Appl. 42(21), 7263-7275 (2015).
46. D. A. Ross et al., "Incremental learning for robust visual tracking," Int. J. Comput. Vision 77(1), 125-141 (2008).
47. B. Ristic, S. Arulampalam, and N. J. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, Boston (2004).
48. X. Ma et al., "Visual tracking via exemplar regression model," Knowl.-Based Syst. 106, 26-37 (2016).
49. Q. Liu et al., "Visual object tracking with online sample selection via lasso regularization," in Signal, Image and Video Processing, Springer, London (2017).
50. C. Leistner et al., "On robustness of on-line boosting - a competitive study," in IEEE Int. Conf. on Computer Vision Workshops (ICCV Workshops), pp. 1362-1369 (2009).
51. H. Masnadi-Shirazi, V. Mahadevan, and N. Vasconcelos, "On the design of robust classifiers for computer vision," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 779-786 (2010).
52. H. Grabner, C. Leistner, and H. Bischof, "Semi-supervised on-line boosting for robust tracking," in European Conf. on Computer Vision (ECCV), pp. 234-247 (2008).
53. A. Saffari et al., "Robust multi-view boosting with priors," in European Conf. on Computer Vision (ECCV), pp. 776-789 (2010).
54. B. Babenko, M.-H. Yang, and S. Belongie, "Visual tracking with online multiple instance learning," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 983-990 (2009).
55. B. Zeisl et al., "On-line semi-supervised multiple-instance boosting," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1879-1886 (2010).
56. S. Hare, A. Saffari, and P. H. Torr, "Struck: structured output tracking with kernels," in IEEE Int. Conf. on Computer Vision (ICCV), pp. 263-270 (2011).
57. A. Bordes et al., "Solving multiclass support vector machines with LaRank," in Int. Conf. on Machine Learning (ICML), pp. 89-96 (2007).
58. B. Schölkopf, R. Herbrich, and A. J. Smola, "A generalized representer theorem," in Int. Conf. on Computational Learning Theory, pp. 416-426 (2001).
59. C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Vol. 161, SIAM, Philadelphia (1974).
60. Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: a benchmark," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2411-2418 (2013).
61. M. Danelljan et al., "Accurate scale estimation for robust visual tracking," in British Machine Vision Conf. (BMVC) (2014).
62. J. Gao et al., "Transfer learning based visual tracking with Gaussian processes regression," in European Conf. on Computer Vision (ECCV), pp. 188-203 (2014).
63. N. Wang et al., "Understanding and diagnosing visual tracking systems," in IEEE Int. Conf. on Computer Vision (ICCV), pp. 3101-3109 (2015).
Weihua Ou received his MS degree in mathematics from Southeast
University, Nanjing, China, in 2006 and his PhD in information and
communication engineering from Huazhong University of Science
and Technology, China, in 2014. Currently, he is an associate profes-
sor at the School of Big Data and Computer Science at Guizhou
Normal University, Guiyang, China. His current research interests
include sparse representation, multiview learning, and image
processing and computer vision.
Di Yuan graduated from Harbin University of Commerce, Harbin, China, in 2015. He is pursuing his master's degree in science with the Research Institute of Biocomputing, School of Science, Harbin Institute of Technology Shenzhen Graduate School, China. His current research interests include object tracking and kernel methods.

Donghao Li is pursuing his master's degree in science with the Research Institute of Biocomputing, School of Science, Harbin Institute of Technology Shenzhen Graduate School, China. His current research interests include machine learning and computer vision.
Biographies of the other authors are not available.
... Danelljan et al. [5] exploit the color attributes of the target object and learn an adaptive correlation filter. The literature [21] proposes a patch-based visual tracker that divides the object and the candidate area into several small blocks evenly and uses the average score of the overall small blocks to determine the optimal candidate, which greatly improves under the occlusion circumstances. The literature [22] proposes an online representative sample selection method to construct an effective observation module that can handle occasional large appearance changes or severe occlusion. ...
Article
Full-text available
Long-term visual tracking undergoes more challenges and is closer to realistic applications than short-term tracking. However, the performances of most existing methods have been limited in the long-term tracking tasks. In this work, we present a reliable yet simple long-term tracking method, which extends the state-of-the-art learning adaptive discriminative correlation filters (LADCF) tracking algorithm with a re-detection component based on the support vector machine (SVM) model. The LADCF tracking algorithm localizes the target in each frame, and the re-detector is able to efficiently re-detect the target in the whole image when the tracking fails. We further introduce a robust confidence degree evaluation criterion that combines the maximum response criterion and the average peak-to-correlation energy (APCE) to judge the confidence level of the predicted target. When the confidence degree is generally high, the SVM is updated accordingly. If the confidence drops sharply, the SVM re-detects the target. We perform extensive experiments on the OTB-2015 and UAV123 datasets. The experimental results demonstrate the effectiveness of our algorithm in long-term tracking.
... The literature [21] proposes a patch-based visual tracker that divides the object and the candidate area into several small blocks evenly, and uses the average score of overall small blocks to determine the optimal candidate, which greatly improves under the occlusion circumstances. The literature [22] proposes an online representative sample selection method to construct an effective observation module that can handle occasional large appearance changes or severe occlusion. ...
Preprint
Full-text available
The long-term visual tracking undergoes more challenges and is closer to realistic applications than short-term tracking. However, the performances of most existing methods have been limited in the long-term tracking tasks. In this work, we present a reliable yet simple long-term tracking method, which extends the state-of-the-art Learning Adaptive Discriminative Correlation Filters (LADCF) tracking algorithm with a re-detection component based on the SVM model. The LADCF tracking algorithm localizes the target in each frame and the re-detector is able to efficiently re-detect the target in the whole image when the tracking fails. We further introduce a robust confidence degree evaluation criterion that combines the maximum response criterion and the average peak-to correlation energy (APCE) to judge the confidence level of the predicted target. When the confidence degree is generally high, the SVM is updated accordingly. If the confidence drops sharply, the SVM re-detects the target. We perform extensive experiments on the OTB-2015 and UAV123 datasets. The experimental results demonstrate the effectiveness of our algorithm in long-term tracking.
... PF algorithms have been studied in visual object tracking for many years and their variations are still widely used nowadays as it is neither limited to linear systems nor requires the noise to be Gaussian [10][11][12]. The traditional PF algorithm implements a recursive Bayesian framework by using the nonparametric Monte Carlo sampling method, which can effectively track target objects in most scenes. ...
Article
Full-text available
A robust tracking method is proposed for complex visual sequences. Different from time-consuming offline training in current deep tracking, we design a simple two-layer online learning network which fuses local convolution features and global handcrafted features together to give the robust representation for visual tracking. The target state estimation is modeled by an adaptive Gaussian mixture. The motion information is used to direct the distribution of the candidate samples effectively. And meanwhile, an adaptive scale selection is addressed to avoid bringing extra background information. A corresponding object template model updating procedure is developed to account for possible occlusion and minor change. Our tracking method has a light structure and performs favorably against several state-of-the-art methods in tracking challenging scenarios on the recent tracking benchmark data set.
... As for matching/registration methods, there are also many existing researches. In particular, Di Yuan et al. [10] proposed a patch-based visual method for with object selection/segmentation. P Bourgeat et al. [19], [20] proposed a segmentation algorithm which is suitable for semiconductor wafer images generated by optical inspection tools. ...
Article
Full-text available
In integrated circuit manufacturing industry, in order to meet the high demand of electronic products, wafers are designed to be smaller and smaller, which makes automatic wafer defect detection a great challenge. The existing wafer defect detection methods are mainly based on the precise segmentation of one single wafer, which relies on high-cost and complicated hardware instruments. The segmentation performance obtained is unstable because there are too many limitations brought by hardware implementations such as the camera location, the light source location, and the product location. To address this problem, in this paper, we propose a method for wafer defect detection. This novel method includes two phases, namely wafer segmentation and defect detection. In wafer segmentation phase, the target wafer image is segmented based on the affine iterative closest algorithm with spatial feature points guided (AICP-FP). In wafer defect detection phase, with the inherent characteristics of wafers, a simple and effective algorithm based on machine vision is proposed. The simulations demonstrate that, with these two phases, the higher accuracy and higher speed of wafer defect detection can be achieved at the same time. For real industrial system, this novel method can satisfy the real-time detection requirements of automatic production line.
... Some visual object tracking methods applied representational based methods with pre-computed fixed appearance models [5]; however, the visual appearance of the tracked target object may change along the time and for this reason they may interrupt tracking the target object after a period of time when the tracking conditions change (e.g., the scene illumination changes, occlusions). Some authors proposed to use the data generated during the tracking process to accommodate possible target appearance changes, such as in online learning [6], incremental learning for visual tracking (ivt) [7], patch based approach with online representation of samples [8], and in online feature learning techniques based on dictionaries [1]. Often, online visual tracking methods tend to miss the target object in complex scenarios, such as when the head pose changes while tracking faces, or in cluttered backgrounds and/or in object occlusions [9]. ...
Article
Full-text available
In this work, we propose an adaptive face tracking scheme that compensates for possible face tracking errors during its operation. The proposed scheme is equipped with a tracking divergence estimate, which allows to detect early and minimize the face tracking errors, so the tracked face is not missed indefinitely. When the estimated face tracking error increases, a resyncing mechanism based on Constrained Local Models (CLM) is activated to reduce the tracking errors by re-estimating the tracked facial features' locations (e.g., facial landmarks). To improve the Constrained Local Model (CLM) feature search mechanism, a Weighted-CLM (W-CLM) is proposed and used in resyncing. The performance of the proposed face tracking method is evaluated in the challenging context of driver monitoring using yawning detection and talking video datasets. Furthermore, an improvement in a yawning detection scheme is proposed. Experiments suggest that our proposed face tracking scheme can obtain a better performance than comparable state-of-the-art face tracking methods and can be successfully applied in yawning detection.
... Hence, it is in extraordinary need of a programmed indicator to relieve the genuine negative effects brought about by the fake news [6]. There are many methodology such as correlation filter based tracking algorithms [7], non-negative least square algorithm [8], Online Representative Sample Selection method [9], regularization framework [10], multiple feature fused model [11] have been introduced. ...
Article
Full-text available
These days online networking is generally utilized as the wellspring of data as a result of its ease, simple to get to nature. In any case, expending news from online life is a twofold edged sword as a result of the widespread of fake news, i.e., news with purposefully false data. Fake news is a major issue since it affects people just as society substantial. In the internet based life, the data is spread quick and subsequently discovery component ought to almost certainly foresee news quick enough to stop the dispersal of fake news. Consequently, identifying fake news via web-based networking media is a critical and furthermore an in fact testing issue. In this paper, Ensemble Voting Classifier based, an intelligent detection system is proposed to deal with news classification both real and fake tasks. Here, eleven mostly well-known machine-learning algorithms like Naïve Bayes, K-NN, SVM, Random Forest, Artificial Neural Network, Logistic Regression, Gradient Boosting, Ada Boosting, etc. are used for detection. After cross-validation, we used the best three machine-learning algorithms in Ensemble Voting Classifier. The experimental outcomes affirm that the proposed framework can accomplish to about 94.5% outcomes as far as accuracy. The other parameters like ROC score, precision, recall and F1 are also outstanding. The proposed recognition framework can effectively find the most important highlights of the news. These can also be implemented in other classification techniques to detect fake profiles, fake message, etc.
... They did not use any classi¯er for classifying the results. In the recent years, a patch-based visual tracking with online representative sample selection approach 16 has been proposed. To tackle the problem of intensity inhomogeneity in medical images, a model to segment and correct bias¯eld moderately and simultaneously for MR images 17 has been proposed. ...
Article
The chromosomes are the carriers of the geometric information, any alteration in the structure or number of these chromosomes is termed as genetic defect. These alterations cause malfunctioning in the proteins and are cause of the various underlying medical conditions that are hard to cure or detect by normal clinical procedures. In order to detect the underlying causes of these defects, the cells of the humans need to be imaged during the mitosis phase of cell division. During this phase, the chromosomes are the longest and can be easily studied and the alterations in the structure and count of the chromosomes can be analyzed easily. The chromosomes are non-rigid objects, due to which they appear in varied orientations, which makes them hard to be analyzed for the detection of structural defects. In order to detect the genetic abnormalities due to structural defects, the chromosomes need to be in straight orientation. Therefore, in this work, we propose to classify the segmented chromosomes from the metaspread images into straight, bent, touching overlapping or noise, so that the bent, touching, overlapping chromosomes can be preprocessed and straightened and the noisy objects be discarded. The classification has been done using a set of 17 different geometric features. We have proposed a Multilayer Perceptron-based classification approach to classify the chromosomes extracted from metaspread images into five distinct categories considering their orientation. The results of the classification have been analyzed using the segmented objects of the Advance Digital Imaging Research (ADIR) dataset. The proposed technique is capable of classifying the segmented chromosomes with 94.28% accuracy. The performance of the proposed technique has been compared with seven other state-of-the-art classifiers and superior results have been achieved by the proposed method.
... Therefore, to study the effective numerical solution of this kind of integral equation has become a research direction that mathematicians, natural science workers, and engineering technicians strive to open up. In recent years, the numerical solution of Fredholm integral equation has been greatly developed [10][11][12][13][14]. ...
Article
Full-text available
In the field of engineering technology, many problems can be transformed into the first kind Fredholm integral equation, which has a prominent feature called “ill-posedness”. This property makes it difficult to find the analytical solution of first kind Fredholm integral equation. Therefore, how to find the numerical solution of first kind Fredholm integral equation has been a common concern of domestic and overseas scholars in recent years. In this article, various numerical solution methods of first kind Fredholm integral equation are introduced in detail. First, the existence and convergence of the solution of the integral equation are given. Second, the current mainstream numerical methods, such as regularization method, wavelet analysis method and multilevel iteration method are introduced in detail. Finally, we presented a concise overview of the numerical method of first kind Fredholm integral equation.
Article
Full-text available
Considering the problems of motion blur, partial occlusion and fast motion in target tracking, a target tracking method based on adaptive structured sparse representation with attention is proposed. Under the framework of particle filtering, the performance of high-quality templates is enhanced through an attention mechanism. Structure sparseness is used to build candidate target sets and sparse models between candidate samples and local patches of target templates. Combined with the sparse residual method, reconstruction error is reduced. After optimally solving the model, the particle with the highest similarity is selected as the prediction target. The most appropriate scale is selected according to the multiscale factor method. Experiments show that the proposed algorithm has a strong performance when dealing with motion blur, fast motion, partial occlusion.
Article
Full-text available
In recent years, discriminative methods have been popular in visual tracking. The main idea of a discriminative method is to learn a classifier that distinguishes the target from the background; the key step is updating the classifier. Usually, the tracked results are chosen as the positive samples to update the classifier, so the update fails when the tracked results are inaccurate, after which the tracker drifts away from the target. Additionally, without an appropriate sample selection strategy, a large number of training samples hinders online updating of the classifier. To address the drift problem, we propose a score function that predicts the optimal candidate directly instead of learning a classifier. Furthermore, to handle the large number of training samples, we design a sparsity-constrained sample selection strategy that chooses representative support samples from the large training set at the updating stage. To evaluate the effectiveness and robustness of the proposed method, we run experiments on the object tracking benchmark and 12 challenging sequences. The results demonstrate that our approach achieves promising performance.
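Representative-sample selection of this kind (as in the headline paper, which uses non-negative least squares to pick representative samples for template updating) can be sketched with a simple projected-gradient NNLS solver: columns of `A` are candidate samples, `b` is the current target appearance, and the nonzero weights identify the representatives. The solver, dimensions, and toy data below are illustrative assumptions.

```python
import numpy as np

def nnls_pg(A, b, iters=2000, tol=1e-10):
    """Non-negative least squares: min ||Ax - b||^2 s.t. x >= 0,
    solved by projected gradient descent (a simple stand-in for
    specialized NNLS solvers)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x_new = np.maximum(x - grad / L, 0.0)  # gradient step, then project onto x >= 0
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy selection problem: only two candidate samples actually explain b,
# so only two weights should come back nonzero.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
w_true = np.array([0.0, 2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
b = A @ w_true
w = nnls_pg(A, b)
print(np.round(w, 3))
```

The non-negativity constraint itself encourages sparse weights, which is why NNLS-style formulations are a natural fit for choosing a few representative support samples.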
Article
Full-text available
Robustness and efficiency are the two main goals of existing trackers. Most robust trackers combine features or models and therefore incur a high computational cost. To achieve robust and efficient tracking, we propose a multi-view correlation tracker. On one hand, robustness is enhanced by the multi-view model, which fuses several features and selects the more discriminative ones for tracking. On the other hand, the correlation filter framework provides fast training and efficient target locating. The multiple features are fused at the model level of the correlation filter, which is both effective and efficient. In addition, we introduce a simple but effective scale-variation detection mechanism that strengthens the stability of tracking under scale variation. We evaluate our tracker on the online tracking benchmark (OTB) and two visual object tracking benchmarks (VOT2014, VOT2015); these three datasets contain more than 100 video sequences in total. On all three datasets, the proposed approach achieves promising performance.
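The correlation filter framework mentioned above can be sketched in its simplest single-sample (MOSSE-style) form: learn a filter in the Fourier domain against a desired Gaussian response, then locate the target at the peak of the filter response on a new frame. The blob target, response width, and regularizer are illustrative assumptions, not the article's multi-view model.

```python
import numpy as np

def train_filter(f, g, lam=1e-2):
    """Single-sample correlation filter in the Fourier domain:
    H* = (G . conj(F)) / (F . conj(F) + lam)."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H_conj, f):
    """Correlate a frame with the learned filter and return the spatial response."""
    return np.real(np.fft.ifft2(H_conj * np.fft.fft2(f)))

# Toy example: the target is a bright blob at (row=30, col=20); the desired
# response g is a narrower Gaussian peaked at the same location.
n = 64
y, x = np.mgrid[:n, :n]
frame = np.exp(-((x - 20) ** 2 + (y - 30) ** 2) / 10.0)
g = np.exp(-((x - 20) ** 2 + (y - 30) ** 2) / 4.0)

H_conj = train_filter(frame, g)
resp = detect(H_conj, frame)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)   # (row, col) of the maximum response
```

The appeal noted in the abstract, fast training and locating, comes from the fact that both steps are elementwise operations between FFTs rather than spatial-domain convolutions.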
Article
Full-text available
This paper introduces a novel deep-learning-based approach for vision-based single-target tracking. We address this problem with a network architecture that takes the input video frames and directly computes the tracking score for any candidate target location by estimating the probability distributions of the positive and negative examples. This is achieved by combining a deep convolutional neural network with a Bayesian loss layer in a unified framework. To deal with the limited number of positive training examples, the network is pre-trained offline for a generic image feature representation and then fine-tuned in multiple steps. An online fine-tuning step is carried out at every frame to learn the appearance of the target. We adopt a two-stage iterative algorithm to adaptively update the network parameters and maintain a probability density for target/non-target regions. The tracker has been tested on the standard tracking benchmark, and the results indicate that the proposed solution achieves state-of-the-art tracking results.
Article
Because of appearance variations, an online tracker must collect training samples of the tracked target to update the tracking model. However, this often leads to tracking drift because of potentially corrupted samples: 1) contaminated/outlier samples resulting from large variations (e.g., occlusion, illumination), and 2) misaligned samples caused by tracking inaccuracy. Therefore, to reduce tracking drift while maintaining the adaptability of a visual tracker, a key problem is how to alleviate these two issues via an effective model learning (updating) strategy. To address them, this paper proposes a novel and optimal model learning (updating) scheme that aims to simultaneously eliminate the negative effects of both issues in a unified robust feature template learning framework. In particular, the proposed framework is capable of: 1) adaptively learning uncontaminated feature templates by separating out contaminated samples, and 2) resolving label ambiguities caused by misaligned samples via a probabilistic multiple instance learning (MIL) model. Experiments on challenging video sequences show that the proposed tracker performs favourably against several state-of-the-art trackers.
Conference Paper
Using multiple features in appearance modeling has been shown to be effective for visual tracking. In this paper, we dynamically measure the importance of different features and propose a robust tracker with weighted features, which improves the dictionaries in both a reconstructive and a discriminative way. We extract multiple features of the target and obtain multiple sparse representations, which play an essential role in the classification issue. After learning an independent dictionary for each feature, we dynamically assign a weight to each feature, with which we select the best candidate by a weighted joint decision measure. Experiments show that our method outperforms several recently proposed trackers.
Article
Nonlocal image representation methods, including group-based sparse coding and BM3D, have shown great performance in low-level vision tasks. The nonlocal prior is extracted from groups of patches with similar intensities. Grouping patches by intensity similarity, however, introduces disturbance and inaccuracy into the estimation of the true image. To address this problem, we propose a structure-based low-rank model with graph nuclear norm regularization. We exploit the local manifold structure inside a patch and group patches by a distance metric on that manifold structure. With the manifold structure information, a graph nuclear norm regularization is established and incorporated into a low-rank approximation model. We then prove that the graph-based regularization is equivalent to a weighted nuclear norm and that the proposed model can be solved by a weighted singular-value thresholding algorithm. Extensive experiments on additive white Gaussian noise removal and mixed noise removal demonstrate that the proposed method outperforms several state-of-the-art algorithms.
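The weighted singular-value thresholding step named in this abstract admits a compact sketch: take the SVD of the patch-group matrix and soft-threshold each singular value by its weight, which is the closed-form proximal step for a weighted nuclear norm with non-descending weights. The matrix sizes, noise level, and weight values below are illustrative assumptions, not the article's settings.

```python
import numpy as np

def weighted_svt(Y, weights):
    """Weighted singular-value thresholding: shrink singular value
    sigma_i by weight w_i via soft-thresholding."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# Toy patch group: rank-3 signal plus additive white Gaussian noise.
rng = np.random.default_rng(1)
L = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
Y = L + 0.1 * rng.standard_normal((40, 30))

# Non-descending weights: small weights barely touch the large (signal)
# singular values, a larger weight wipes out the noise tail. Knowing the
# signal rank is 3 is part of this toy setup.
w = np.concatenate([np.full(3, 0.1), np.full(27, 2.0)])
X = weighted_svt(Y, w)
print(np.linalg.matrix_rank(X))
```

Uniform weights recover the standard nuclear-norm SVT; letting the weights grow along the spectrum is what allows the model to preserve structure while suppressing noise.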