Signal, Image and Video Processing manuscript No.
(will be inserted by the editor)

Visual Object Tracking with Online Sample Selection via Lasso Regularization

Qiao Liu · Xiao Ma · Weihua Ou (✉) · Quan Zhou

Received: date / Accepted: date
Abstract In the past few years, discriminative methods have become popular in visual tracking. The main idea of a discriminative method is to learn a classifier that distinguishes the target from the background, and the key step is the update of that classifier. Usually, the tracked results are chosen as the positive samples for the update, so the update fails when the tracked results are inaccurate, and the tracker subsequently drifts away from the target. Additionally, without an appropriate sample selection strategy, a large number of training samples hinders the online updating of the classifier. To address the drift problem, we propose a score function that predicts the optimal candidate directly instead of learning a classifier. Furthermore, to handle the large number of training samples, we design a sparsity-constrained sample selection strategy that chooses a few representative support samples from the large training set at the updating stage. To evaluate the effectiveness and robustness of the proposed method, we conduct experiments on the OTB (Object Tracking Benchmark) and 12 challenging sequences. The experimental results demonstrate that our approach achieves promising performance.
Weihua Ou is the corresponding author.

Qiao Liu · Weihua Ou (✉)
School of Big Data and Computer Science, Guizhou Normal University, Guiyang, China
E-mail: liuqiao.hit@gmail.com, ouweihuahust@gmail.com

Xiao Ma
School of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
E-mail: turingki@yeah.net

Quan Zhou
Key Lab of Ministry of Education for Broadband Communication and Sensor Network Technology, Nanjing University of Posts and Telecommunications, Nanjing, China
E-mail: quan.zhou@njupt.edu.cn
Keywords Discriminative method · Object drift · Score function · Sample selection · Sparse constraint
1 Introduction

Visual object tracking is an important computer vision problem in real applications such as surveillance, human-computer interaction, and vehicle navigation. Many approaches have been proposed in the past years, which can be classified into generative methods [14,17,19,20] and discriminative methods [1,5,7,12,10,27,3,24]. Generative methods focus on modeling the appearance of the object, which may vary from frame to frame. Discriminative methods cast object tracking as a classification problem that distinguishes the tracked target from the background.

Discriminative methods have become more popular mainly because they do not need to construct a complex appearance model, and several representative discriminative methods have received much attention in recent years. For instance, Kalal et al. [12] proposed a tracking framework (TLD) that decomposes the long-term tracking task into tracking, learning and detection. Zhu and Wang et al. [27] presented a collaborative correlation tracker (CCT) to deal with scale variation and the drift problem. Gao and Ling et al. [4] proposed a transfer-learning-based visual tracker that alleviates drift using Gaussian processes regression (TGPR). Danelljan et al. [3] proposed a novel approach (DSST) that learns discriminative correlation filters based on a scale pyramid representation in the tracking-by-detection framework. Babenko and Yang et al. [1] proposed multiple instance learning (MIL) instead of traditional supervised learning to handle the drift problem. Henriques et al. [10] presented a high-speed kernelized correlation filter (KCF) based on a circulant matrix. All of these methods achieve satisfactory performance on the OTB [26] and have received much attention from researchers in recent years.
Fig. 1 Illustration of the proposed method. Searching stage: use the score function to calculate every candidate's score and choose the maximum one as the optimal candidate. Updating stage: extract a set of training samples by polar grid sampling around the currently tracked target; append these samples to the old representative sample set to form a candidate support sample set; exploit ℓ1-regularized least squares to obtain the representative samples and update the target template. The green and red bounding boxes denote the positive and negative samples, respectively.
Generally, a discriminative method trains a classifier to identify the object, which heavily depends on the selection of the positive and negative training samples. Most existing discriminative methods regard the currently tracked target as the positive sample and select samples from the neighborhood around it as the negative samples to update the classifier. The classifier is therefore updated with a sub-optimal positive sample whenever the currently tracked result is not accurate, after which the tracker drifts over time. Additionally, a large number of training samples hinders the classifier from being updated online in real time. Therefore, it is necessary to design a sample selection strategy for the classifier update.
Different from most existing discriminative methods, in this paper we propose a score function that predicts the optimal candidate directly instead of learning a classifier. As shown in step 1 of Figure 1, we use the similarity between each candidate generated by the particle filter and the target template set as our score function, and exploit the inner product to measure this similarity. Because our approach does not update a classifier with a sub-optimal positive sample, it can avoid the drift problem. To address the problem of a large number of training samples, we propose an online sample selection strategy based on ℓ1-regularized least squares, as shown in step 2 of Figure 1. We construct a training sample set and calculate its ground truth. Then, we minimize the error between the score function and the ground truth to choose a few representative support samples for the update of the target template set.
The main contributions of this paper are summarized as follows:

– A simple score function is proposed to predict the optimal candidate directly instead of learning a classifier, which addresses the drift problem.
– A sparsity-constrained sample selection method is proposed, through which representative support samples are chosen to construct the templates.
The rest of this paper is organized as follows. We briefly review the related works in Section 2 and describe the proposed approach in Section 3. Then, we show the experimental details and results in Section 4 and conclude this work in Section 5.
2 Related Works

In this section, we first review the particle filter framework for tracking, because our approach is based on this framework. Then, we briefly introduce the sparse representation model for tracking, because a sparse constraint is applied in our approach to solve the tracking problem. Before the review, we fix some notation: we denote a real number by a lowercase letter $x$, a vector by a bold lowercase letter $\mathbf{x}$, and a matrix by a bold uppercase letter $\mathbf{X}$.
2.1 Particle Filter Framework

The particle filter [18] is a Bayesian sequential importance sampling technique. It provides a general framework for estimating and propagating the posterior probability density function of state variables. In recent years, a large number of popular trackers [1,17,2,11,6,8] based on this framework have been proposed. Our approach also uses the particle filter as its motion model (see the candidates from the particle filter in Figure 1).
Let $I_{1:t-1} = \{I_1, I_2, \cdots, I_{t-1}\}$ be the observed patches from the first frame to the $(t-1)$-th frame, and let $\mathbf{b}_t = [x, y, w, h] \in \mathbb{R}^4$ be the state variables in the $t$-th frame, where $(x, y)$ are the coordinates of the center point of the bounding box and $w, h$ are its width and height, respectively. The state variables $\mathbf{b}_t$ follow the predicting distribution

$$p(\mathbf{b}_t \mid I_{1:t-1}) = \int p(\mathbf{b}_t \mid \mathbf{b}_{t-1}) \, p(\mathbf{b}_{t-1} \mid I_{1:t-1}) \, d\mathbf{b}_{t-1}. \quad (1)$$

Given the observed patch $I_t$ in frame $t$, the state variables $\mathbf{b}_t$ are updated by

$$p(\mathbf{b}_t \mid I_{1:t}) = \frac{p(I_t \mid \mathbf{b}_t) \, p(\mathbf{b}_t \mid I_{1:t-1})}{p(I_t \mid I_{1:t-1})}, \quad (2)$$

where $p(I_t \mid \mathbf{b}_t)$ denotes the observation model.

The observation model $p(I_t \mid \mathbf{b}_t)$ represents the similarity between a target candidate and the target template. For an observed patch $I_t$, we use $\mathbf{x}_t$ to represent the features extracted from $I_t$, and we introduce a score function of $\mathbf{x}_t$ to approximate $p(I_t \mid \mathbf{b}_t)$:

$$p(I_t \mid \mathbf{b}_t) \propto F(\mathbf{x}_t). \quad (3)$$

This function is defined as a simple inner product between the candidate and the target template (see Section 3.1). The optimal candidate state is the one with the largest score value.
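To make the searching stage concrete, the following is a minimal sketch of one particle-filter step under the formulation above. It is not the authors' implementation: the Gaussian random-walk transition, the noise scales, and all names (`particle_filter_step`, `extract_features`, `score_fn`) are illustrative assumptions.

```python
import numpy as np

def particle_filter_step(prev_state, extract_features, score_fn,
                         n_particles=750, noise_std=(4.0, 4.0, 0.5, 0.5)):
    """One prediction/update step of the particle-filter motion model.

    prev_state: state b_{t-1} = [x, y, w, h] of the target in frame t-1.
    extract_features: maps a candidate state to its normalized feature vector.
    score_fn: the score function F(x) standing in for the observation model.
    """
    # Prediction (eq. 1): draw candidates b_t ~ p(b_t | b_{t-1}), modeled
    # here as a Gaussian random walk around the previous state.
    candidates = prev_state + np.random.randn(n_particles, 4) * np.asarray(noise_std)

    # Update (eqs. 2-3): score each candidate with F(x) and keep the best,
    # since the optimal state is the candidate with the largest score.
    scores = np.array([score_fn(extract_features(b)) for b in candidates])
    return candidates[np.argmax(scores)]
```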
2.2 Sparse Representation based Tracking

Sparse representation has been applied to visual tracking [17,11,2,16,25,21,15,23,8] to find the target with the minimum reconstruction error from the target template subspace. These methods fall into two categories: holistic sparse representation [17,2,16,25,21] and local sparse representation [11,15,23,8]. In the first category, Mei et al. [17] cast the tracking problem as finding a sparse approximation in a template subspace. They adopt a holistic representation of the object as the appearance model and then track the object by solving an ℓ1 minimization problem (ℓ1 tracker). To address the computational bottleneck of the ℓ1 tracker, Bao and Wu et al. [2] proposed a new ℓ1-norm related minimization model based on the accelerated proximal gradient approach (ℓ1-APG), which can run in real time. Liu and Yang et al. [16] also proposed a two-stage sparse optimization algorithm to reduce the computational expense. This category of methods can handle partial occlusion and slight deformation effectively.
In contrast to holistic sparse representation, local sparse representation encodes each local patch of a target sparsely with an over-complete dictionary and then aggregates the corresponding sparse codes. For instance, Jia and Lu et al. [11] proposed a structural local sparse appearance model that exploits both partial information and spatial information of the target based on a novel alignment-pooling method. Liu and Huang et al. [15] also presented a robust tracking algorithm using a local sparse appearance model, which uses a static sparse dictionary and a dynamically updated online basis distribution to model the target appearance. Wang and Chen et al. [23] proposed an online algorithm based on local sparse representation, where the local image patches of a target are represented by their sparse codes with an over-complete dictionary and a classifier is learned to discriminate the target from the background. Because local sparse representation can exploit the structural information of the object, it deals better with occlusion and deformation; however, it is more complicated and has a higher computational cost. In this paper, we impose a sparse constraint on the score function and apply holistic sparse representation to solve the tracking problem.
3 Proposed Approach

In this section, we give the details of the proposed approach, which primarily includes four parts. Specifically, we present the score function in Section 3.1 and propose the online sample selection in Section 3.2. Then, we give the template updating strategy in Section 3.3. Finally, the whole algorithm is summarized in Section 3.4.
3.1 Score Function

The aim of the score function is to predict which candidate is the optimal one. Given a target template set $\mathbf{H} = \{\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_n\} \in \mathbb{R}^{d \times n}$ and a target candidate $\mathbf{x} \in \mathbb{R}^d$ in the $t$-th frame, where $\mathbf{x}$ is the HOG feature vector extracted from the candidate, we use a simple inner product to measure the similarity between the candidate and a target template as one part of the score function:

$$f_i(\mathbf{x}) = \langle \mathbf{x}, \mathbf{h}_i \rangle, \quad (4)$$

where $\mathbf{x}$ and $\mathbf{h}_i$ are normalized, i.e., $\|\mathbf{x}\|_2 = 1$ and $\|\mathbf{h}_i\|_2 = 1$. For a candidate $\mathbf{x}$, the larger the value of the score function $f_i(\mathbf{x})$, the higher the similarity between the candidate and the target template. However, we use the target template set $\mathbf{H}$, which is made up of several templates rather than a single target template. Therefore, the average similarity score between the candidate and each target template in $\mathbf{H}$ is adopted as the final score function $F(\mathbf{x})$:

$$F(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} f_i(\mathbf{x}). \quad (5)$$
For a given target template $\mathbf{h}_i$ in the $t$-th frame, let $\mathbf{A}_i = [\mathbf{x}_i^1, \mathbf{x}_i^2, \cdots, \mathbf{x}_i^m] \in \mathbb{R}^{d \times m}$ be the corresponding training sample set, which consists of the support sample set $\mathbf{A}_{t-1}$ and the updating sample set $\mathbf{X}_t$, where $\mathbf{x}_i^j$ ($j = 1, 2, \cdots, m$) indicates the $j$-th training sample of $\mathbf{A}_i$, $\mathbf{A}_{t-1}$ is the old support sample set in the $(t-1)$-th frame, and $\mathbf{X}_t$ is the updating sample set drawn around the currently tracked target in the $t$-th frame. Each template can be linearly represented by the training sample set, i.e.,

$$\mathbf{h}_i = \mathbf{A}_i \boldsymbol{\omega}_i, \quad (6)$$

where $\boldsymbol{\omega}_i$ is the coefficient vector of $\mathbf{A}_i$.

In a real application, the appearance of the target is very similar across adjacent frames. Therefore, the target template can be sparsely represented by a few positive and negative support samples from these adjacent frames, as shown in Figure 2. Based on this fact, only a few representative support samples are needed to represent a target template. We therefore impose a sparsity constraint on the coefficient vector $\boldsymbol{\omega}_i$ and reformulate equation (6) as

$$\mathbf{h}_i \approx \mathbf{A}_i \boldsymbol{\omega}_i \quad \text{s.t.} \quad \|\boldsymbol{\omega}_i\|_0 \le \alpha, \quad (7)$$

where $\alpha$ is a threshold value.
Fig. 2 Illustration of the observation: a template $\mathbf{h}$ is represented by a sparse linear combination of the training sample set. The bounding box with a green line denotes the positive sample in the training sample set and the others are negatives.
Then, substituting equation (7) into equation (4), we obtain the following score function:

$$f_i(\mathbf{x}) = \langle \mathbf{x}, \mathbf{A}_i \boldsymbol{\omega}_i \rangle = \sum_{j=1}^{m} \omega_i^j \langle \mathbf{x}, \mathbf{x}_i^j \rangle \quad \text{s.t.} \quad \|\boldsymbol{\omega}_i\|_0 \le \alpha, \quad (8)$$

where $\boldsymbol{\omega}_i = [\omega_i^1, \omega_i^2, \cdots, \omega_i^m]^T \in \mathbb{R}^m$ and $\omega_i^j$ ($j = 1, 2, \cdots, m$) denotes the coefficient of the $j$-th inner product.
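As a concrete illustration of equations (4) and (5), a minimal sketch of the score function follows. Since the candidate and the templates are ℓ2-normalized, the inner product reduces to a cosine similarity; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def score(x, H):
    """F(x) of eq. (5): average inner product between the candidate feature
    x (shape (d,)) and the template set H (shape (d, n)), eq. (4) per column."""
    x = x / np.linalg.norm(x)              # enforce ||x||_2 = 1
    H = H / np.linalg.norm(H, axis=0)      # enforce ||h_i||_2 = 1 per template
    return float(np.mean(H.T @ x))         # (1/n) * sum_i <x, h_i>
```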
3.2 Online Sample Selection via ℓ1-Regularized Least Squares

Given a target in the $t$-th frame, we exploit polar grid sampling to obtain an updating sample set $\mathbf{X}_t = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_k] \in \mathbb{R}^{d \times k}$, where $\mathbf{x}_r \in \mathbb{R}^d$ ($r = 1, 2, \cdots, k$) denotes the $r$-th training sample of $\mathbf{X}_t$. For each training sample $\mathbf{x}_r$, we define a function to calculate its ground truth:

$$g(\mathbf{x}_r) = \frac{\mathrm{overlap}(b, \mathrm{box}(\mathbf{x}_r))}{b}, \quad (9)$$

where $g(\mathbf{x}_r)$ is normalized to $[0, 1]$, $b$ represents the bounding box area of the currently tracked target, $\mathrm{box}(\mathbf{x}_r)$ indicates the bounding box area of the training sample $\mathbf{x}_r$, and $\mathrm{overlap}(\cdot, \cdot)$ calculates the overlap area of the two bounding boxes. Therefore, for the training sample set $\mathbf{X}_t$, we obtain its ground truth $\mathbf{y} = [y_1, y_2, \cdots, y_k]^T = [g(\mathbf{x}_1), g(\mathbf{x}_2), \cdots, g(\mathbf{x}_k)]^T \in \mathbb{R}^k$ using equation (9).
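A small sketch of the labeling rule in equation (9) is shown below, assuming boxes are given as (x, y, w, h) tuples; this representation and the function names are assumptions for illustration.

```python
def ground_truth(target_box, sample_box):
    """g(x_r) of eq. (9): the overlap area between the tracked target's box
    and a training sample's box, normalized by the target's own area."""
    tx, ty, tw, th = target_box
    sx, sy, sw, sh = sample_box
    iw = max(0.0, min(tx + tw, sx + sw) - max(tx, sx))  # intersection width
    ih = max(0.0, min(ty + th, sy + sh) - max(ty, sy))  # intersection height
    return (iw * ih) / (tw * th)                        # normalized to [0, 1]
```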
For the old support sample set $\mathbf{A}_{t-1} = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_u] \in \mathbb{R}^{d \times u}$ in the $(t-1)$-th frame, we can also get its corresponding ground truth $\mathbf{s} = [s_1, s_2, \cdots, s_u]^T \in \mathbb{R}^u$ using equation (9). Combining the old support sample set $\mathbf{A}_{t-1}$ and the updating sample set $\mathbf{X}_t$, we get the training sample set $\mathbf{A}_i = [\mathbf{A}_{t-1}, \mathbf{X}_t] \in \mathbb{R}^{d \times (k+u)}$ and the associated ground truth $\mathbf{q} = [\mathbf{y}, \mathbf{s}] \in \mathbb{R}^{k+u}$. We call the training sample set $\mathbf{A}_i \in \mathbb{R}^{d \times m}$ the candidate support sample set, where $m = k + u$.
Suppose that each candidate support sample in $\mathbf{A}_i$ is normalized. We get the coefficient vector $\boldsymbol{\omega}_i$ in the $t$-th frame by minimizing the following objective function:

$$\min_{\boldsymbol{\omega}_i} \; \sum_{\ell=1}^{m} \left( q_\ell - \sum_{j=1}^{m} \omega_i^j \langle \mathbf{x}^\ell, \mathbf{x}_i^j \rangle \right)^2 + \lambda \|\boldsymbol{\omega}_i\|_1, \quad (10)$$

where $\mathbf{x}^\ell$ denotes the $\ell$-th candidate support sample in $\mathbf{A}_i$, $q_\ell \in \mathbb{R}$ denotes the $\ell$-th ground truth in $\mathbf{q}$, and $\lambda$ is a regularization parameter. Using matrix notation, equation (10) can be reformulated as

$$\min_{\boldsymbol{\omega}_i} \; \left\| \mathbf{q} - \mathbf{A}_i^T \mathbf{A}_i \boldsymbol{\omega}_i \right\|_2^2 + \lambda \|\boldsymbol{\omega}_i\|_1. \quad (11)$$

Let $\mathbf{D} = \mathbf{A}_i^T \mathbf{A}_i$; then equation (11) simplifies to

$$\min_{\boldsymbol{\omega}_i} \; \|\mathbf{q} - \mathbf{D} \boldsymbol{\omega}_i\|_2^2 + \lambda \|\boldsymbol{\omega}_i\|_1. \quad (12)$$

Equation (12) can be solved by ℓ1-regularized least squares [13]. Then, we choose the samples whose coefficients are greater than a predefined threshold $\sigma$ as the new support samples. The new support sample set $\mathbf{A}_t$ in the $t$-th frame consists of all these support samples.
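The paper solves problem (12) with the ℓ1-regularized least squares solver of [13]. As an equivalent, minimal sketch, the iterative shrinkage-thresholding algorithm (ISTA) below solves the same objective in plain NumPy; the step size, iteration count, and function names are assumptions, not the authors' settings.

```python
import numpy as np

def select_support_samples(A, q, lam=0.3, sigma=0.01, n_iter=500):
    """Solve min_w ||q - D w||_2^2 + lam * ||w||_1 with D = A^T A (eq. (12))
    by ISTA, then keep the samples whose coefficients exceed sigma."""
    D = A.T @ A                                      # Gram matrix of the sample set
    w = np.zeros(D.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # 1/L, L = Lipschitz const. of grad
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ w - q)               # gradient of the squared loss
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    keep = np.abs(w) > sigma                         # representative support samples
    return A[:, keep], w
```

Note that because $\mathbf{D} = \mathbf{A}_i^T \mathbf{A}_i$ is just the Gram matrix of the candidate support samples, only inner products between samples are needed, matching the inner-product form of the score function in equation (8).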
3.3 Template Updating

Template updating is a very important step in the updating stage of visual tracking. If the template set $\mathbf{H}$ is fixed, the tracker will fail because the target's appearance changes dynamically. However, if the template set $\mathbf{H}$ is updated too frequently, errors accumulate and the tracker drifts away from the target.

In our method, we adopt a discriminative strategy to update the template set. For a given target, if its score is greater than the predefined threshold $\theta$, we obtain the coefficient vector $\boldsymbol{\omega}_i$ and the new support sample set $\mathbf{A}_t$ by solving problem (10). Based on the assumption that a target template can be represented by a linear combination of a few representative support samples, the template $\mathbf{h}_i$ is updated by equation (6). If the number of templates in the template set $\mathbf{H}$ is below the given threshold $\eta$, we simply put this new template $\mathbf{h}_i$ into $\mathbf{H}$. Otherwise, the new template is appended to $\mathbf{H}$ and the oldest template is discarded.
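A minimal sketch of this discriminative update, assuming the template set is kept as a Python list of feature vectors; the FIFO bookkeeping and names are illustrative assumptions.

```python
import numpy as np

def update_templates(H, A, w, score, theta=0.32, eta=160):
    """Update the template set H as in Section 3.3: update only when the
    target's score exceeds theta; build h = A w (eq. (6)); cap |H| at eta."""
    if score <= theta:                 # unreliable result: skip the update
        return H
    h = A @ w                          # linear combination of support samples
    h = h / np.linalg.norm(h)          # keep templates unit-normalized
    H.append(h)                        # append the new template ...
    if len(H) > eta:
        H.pop(0)                       # ... and discard the oldest when full
    return H
```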
3.4 Algorithm

The proposed method is described in Algorithm 1, and the implementation details are given in Section 4.1. The overall algorithm includes a searching stage and an updating stage. In the searching stage, the optimal candidate is obtained using the score function. In the updating stage, some representative samples are chosen by ℓ1-regularized least squares and the target template set is updated using the selected samples.
Algorithm 1 Visual Tracking with Online Sample Selection via ℓ1 Regularization
1: Inputs: Testing sequence ψ = {I_0, I_1, ..., I_F} and initial state b_0.
2: Outputs: The predicted optimal states {b_1, b_2, ..., b_F}.
3: Predefine the template set H and the threshold θ.
4: for t = 1 to F do
5:   Searching stage:
6:   Generate M candidate samples by exploiting the particle filter.
7:   for i = 1 to M do
8:     Calculate the score of every candidate by the score function F(x).
9:   end for
10:  Take the candidate with Max(F(x)) as the optimal candidate.
11:  Updating stage:
12:  if Max(F(x)) > θ then
13:    Save the currently tracked target as the positive sample and draw the negative samples by polar grid sampling.
14:    Get the corresponding coefficient vector ω_i by solving equation (10).
15:    Construct the new target template h_i by equation (6) and append h_i to H.
16:  end if
17: end for
4 Experiments

In this section, we first introduce the experimental implementation details in Section 4.1, including the parameter settings, datasets, comparison trackers and evaluation criteria. Then, we give the experimental results and analyses in Section 4.2.
4.1 Experiment Details

Parameter setting: All the methods are carried out in MATLAB 2014a on a PC with an Intel 3.7 GHz dual-core CPU and 8 GB RAM. The image patches are resized to 32 × 32 pixels during the tracking process. For the HOG feature, the cell size and the number of orientation bins are set to 4 and 9, respectively. In the updating stage, the radius of the polar grid and the number of angular divisions are set to 5 and 16, respectively. The other parameters mentioned in the paper are listed as follows:

Parameter   M     k    λ     σ      θ     η
Value       750   81   0.3   0.01   0.32  160

M denotes the number of candidates generated by the particle filter and is set to 750. k is the number of training samples drawn by polar grid sampling and is set to 81. λ is a regularization parameter that controls the sparsity of the coefficient vector ω_i and is set to 0.3. σ denotes the threshold on the coefficient vector and is set to 0.01; samples whose coefficients are greater than σ are chosen as representative samples. θ is the score threshold for updating the support sample set and the target template, and is set to 0.32. η is the maximum number of templates in the target template set and is set to 160.
Datasets: Our experiments are carried out on the OTB [26], which contains 50 image sequences. These image sequences cover 11 attributes (illumination variation, scale variation, occlusion, deformation, motion blur, fast motion, in-plane rotation, out-of-plane rotation, out-of-view, background clutters and low resolution), which represent the challenging aspects of visual tracking. We also choose 12 challenging sequences from these 50 image sequences to qualitatively evaluate our approach: Dudek, jogging-1, jogging-2, Suv, FleetFace, Freeman3, Freeman4, Lemming, Sylvester, Tiger2, woman and Walking2.
Comparison trackers: In order to examine the performance of the proposed approach, 8 state-of-the-art trackers with superior performance on the OTB are chosen for comparison: CCT [27], DSST [3], TGPR [4], KCF [10], Struck [7], SVM [22], RR [22] and TLD [12].
Evaluation criteria: Two criteria are used to evaluate the performance of our approach.
Fig. 3 Comparison with eight state-of-the-art trackers on 50 image sequences: precision and success plots using One Pass Evaluation (OPE). (a) Precision plots of OPE: Ours [0.794], CCT [0.770], KCF [0.738], DSST [0.734], TGPR [0.732], RR [0.672], Struck [0.665], SVM [0.646], TLD [0.548]. (b) Success plots of OPE: Ours [0.605], CCT [0.579], DSST [0.557], TGPR [0.522], KCF [0.513], Struck [0.476], RR [0.455], SVM [0.452], TLD [0.405].
One widely used criterion is the center location error (CLE), the average Euclidean distance between the center locations of the tracked targets and the manually labeled ground truths. We use the precision [9] to measure the overall tracking performance, defined as the percentage of frames whose estimated location is within a given threshold distance of the ground truth. Usually, this threshold distance is set to 20 pixels.
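A brief sketch of this precision measure follows; the array layout of the centers is an assumption.

```python
import numpy as np

def precision(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose center location error (Euclidean distance
    between predicted and ground-truth centers) is within the threshold."""
    d = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return float(np.mean(d <= threshold))
```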
The other evaluation criterion is the Pascal VOC overlap ratio (VOR) [26], defined as $S = |r_t \cap r_a| / |r_t \cup r_a|$, where $r_t$ and $r_a$ represent the bounding boxes of the tracked target and the ground truth respectively, $\cap$ and $\cup$ represent the intersection and union of two regions respectively, and $|\cdot|$ denotes the number of pixels in a region. In order to measure the overall performance on a given image sequence, we count the number of successful frames, whose VOR is larger than the given threshold 0.5. The success plot shows the ratio of successful frames as the threshold varies from 0 to 1. We use the area under the curve (AUC) of each success plot to rank the comparison trackers.
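Likewise, a sketch of the success rate and its AUC ranking score; the 21 evenly spaced thresholds are an assumption in line with common OTB practice.

```python
import numpy as np

def success_auc(overlaps, thresholds=np.linspace(0.0, 1.0, 21)):
    """Success plot and AUC: for each threshold, the fraction of frames whose
    VOR exceeds it; the mean over thresholds is used to rank the trackers."""
    overlaps = np.asarray(overlaps)
    success = np.array([float(np.mean(overlaps > t)) for t in thresholds])
    return success, float(np.mean(success))
```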
4.2 Experiment Results and Analyses

Two groups of experiments are carried out to quantitatively and qualitatively evaluate the proposed approach. The first group is performed on the OTB [26], which contains 50 image sequences; we use it to quantitatively evaluate the overall performance of our tracker against the other 8 state-of-the-art trackers. The second group is carried out on 12 challenging sequences, mainly to qualitatively evaluate our tracker.

Quantitative evaluation: The overall performance of our tracker and the other 8 compared trackers is shown in Figure 3. We use one pass evaluation (OPE) for the overall performance, with precision and success rate as the evaluation criteria, as shown in Figure 3(a) and Figure 3(b), respectively. The overall performance of our tracker clearly outperforms the other 8 state-of-the-art trackers. Furthermore, we divide the 50 image sequences into different groups according to the attributes of the image sequences (see Datasets in Section 4.1) and again use precision and success rate to evaluate the trackers on the different attributes. Due to space limitations, the precision and success plots on 10 different attributes are provided in the supplemental material. These results also demonstrate that our tracker is clearly more accurate and robust.
To better illustrate the effectiveness of the proposed method, we report the precision and success rates on the 12 challenging sequences in more detail, as shown in Table 1. From Table 1 we can see clearly that our approach performs better on most challenging sequences. For instance, our tracker achieves a precision score of 0.99 on jogging-2, which contains a short-term full occlusion, while CCT [27], DSST [3] and KCF [10] only obtain 0.19, 0.19 and 0.16, respectively. Lemming is a challenging sequence with occlusion, deformation and other challenges, and our tracker also achieves the highest score of 0.93, while Struck [7], KCF [10] and DSST [3] obtain 0.50, 0.49 and 0.43, respectively. The average precision score of our tracker improves upon that of the second best tracker, CCT [27], by 30%. Our tracker also achieves the best success rate, and the average success rate of the proposed tracker likewise improves upon that of the second best tracker, CCT [27], by 30%.
Qualitative evaluation: The second group of experiments is carried out on 12 challenging sequences to evaluate the proposed approach more intuitively. Due to space constraints, we only give the frame-by-frame center location error (CLE) on 6 challenging sequences, as shown in Figure 4(a)-Figure 4(f).
Table 1 The percentage of successful frames whose center location error (CLE) is within the threshold of 20 pixels, and the percentage of successful frames whose overlap ratio (VOR) passes the threshold 0.5 (shown as CLE / VOR). The best result is highlighted in red and the second best in blue; the average values are given in the last row.

            TGPR         Struck       KCF          DSST         TLD          RR           CCT          SVM          Ours
Dudek       0.87 / 0.88  0.86 / 0.97  0.88 / 0.98  0.82 / 0.99  0.64 / 0.67  0.85 / 0.87  0.90 / 1.00  0.89 / 0.96  0.91 / 0.97
jogging-1   0.22 / 0.22  0.97 / 0.90  0.23 / 0.22  0.23 / 0.22  0.97 / 0.96  0.98 / 0.96  0.98 / 0.97  0.24 / 0.23  0.98 / 0.97
jogging-2   0.99 / 0.99  0.18 / 0.16  0.16 / 0.16  0.19 / 0.18  0.95 / 0.95  0.97 / 0.96  0.19 / 0.19  0.16 / 0.14  0.99 / 0.98
Suv         0.53 / 0.54  0.18 / 0.16  0.98 / 0.98  0.98 / 0.98  0.94 / 0.94  0.96 / 0.92  0.98 / 0.98  0.52 / 0.53  0.98 / 0.98
Fleetface   0.50 / 0.59  0.57 / 0.83  0.46 / 0.67  0.62 / 0.70  0.48 / 0.44  0.57 / 0.69  0.61 / 0.67  0.62 / 0.66  0.64 / 0.77
Freeman3    0.18 / 0.08  0.67 / 0.33  0.91 / 0.27  0.91 / 0.33  0.83 / 0.65  0.39 / 0.15  0.91 / 0.32  0.92 / 0.43  0.97 / 0.95
Freeman4    0.90 / 0.74  0.41 / 0.24  0.53 / 0.18  0.96 / 0.44  0.37 / 0.22  0.45 / 0.10  1.00 / 0.63  0.16 / 0.12  0.94 / 0.88
Lemming     0.45 / 0.37  0.50 / 0.48  0.49 / 0.43  0.43 / 0.27  0.80 / 0.63  0.60 / 0.57  0.70 / 0.70  0.82 / 0.77  0.93 / 0.93
Sylvester   0.96 / 0.95  0.99 / 0.93  0.84 / 0.82  0.84 / 0.74  0.91 / 0.85  0.88 / 0.32  0.85 / 0.80  0.93 / 0.66  0.97 / 0.92
Tiger2      0.86 / 0.89  0.43 / 0.43  0.36 / 0.36  0.30 / 0.30  0.35 / 0.20  0.47 / 0.23  0.86 / 0.87  0.42 / 0.28  0.93 / 0.93
woman       0.94 / 0.94  1.00 / 0.94  0.94 / 0.94  0.94 / 0.93  0.40 / 0.33  0.34 / 0.29  0.20 / 0.20  0.97 / 0.19  0.97 / 0.94
Walking2    1.00 / 0.74  0.71 / 0.41  0.43 / 0.38  1.00 / 1.00  0.56 / 0.21  0.97 / 0.97  1.00 / 1.00  0.91 / 0.41  0.97 / 0.97
Average     0.70 / 0.66  0.62 / 0.57  0.60 / 0.53  0.68 / 0.59  0.69 / 0.59  0.70 / 0.59  0.76 / 0.69  0.62 / 0.45  0.93 / 0.93
Fig. 4 Comparison with eight different trackers in frame-by-frame center location error (CLE) on 6 challenging sequences: (a) Dudek, (b) jogging-1, (c) jogging-2, (d) Fleetface, (e) Freeman3, (f) Freeman4. Each panel plots CLE against frame number for TGPR, Struck, KCF, DSST, TLD, RR, CCT, SVM and Ours.
More results are provided in the supplemental material. From Figure 4, we can see clearly that our tracker has the lowest center location error on most frames of the most challenging sequences. Specifically, on jogging-1 (Figure 4(b)) and jogging-2 (Figure 4(c)), the CLE of our tracker is lower than those of CCT [27], DSST [3] and TGPR [4] when the full occlusion happens. When the appearance changes quickly, the CLE of our tracker is also lower than that of most other trackers, as shown on Fleetface (Figure 4(d)), Freeman3 (Figure 4(e)) and Freeman4 (Figure 4(f)). The other CLE results also demonstrate that our tracker outperforms the other eight state-of-the-art trackers. To further illustrate the promising performance of the proposed approach, the tracked results of the trackers in some representative frames are listed in the supplemental material. These tracked results also indicate that our tracker is effective and robust.
5 Conclusion

In this paper, we proposed a score function that predicts the optimal candidate directly instead of learning a classifier; exploiting this score function avoids the drift problem. Moreover, to solve the problem of a large number of training samples, we imposed a sparse constraint on the score function and used ℓ1-regularized least squares to choose a few representative support samples. To evaluate the effectiveness and robustness of the proposed approach, we carried out two groups of experiments, on the OTB [26] and on 12 challenging sequences. Both quantitative and qualitative evaluations validate our approach, and the experimental results demonstrate that the proposed approach achieves promising performance.
Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61402122), the 2014 Ph.D. Recruitment Program of Guizhou Normal University, and the Outstanding Innovation Talents of Science and Technology Award Scheme of the Education Department of Guizhou Province (Qianjiao KY word [2015]487).
References

1. Boris Babenko, Ming-Hsuan Yang, and Serge Belongie. Visual tracking with online multiple instance learning. In IEEE Conference on Computer Vision and Pattern Recognition, pages 983–990, 2009.
2. Chenglong Bao, Yi Wu, Haibin Ling, and Hui Ji. Real time robust l1 tracker using accelerated proximal gradient approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1830–1837, 2012.
3. Martin Danelljan, Gustav Häger, Fahad Khan, and Michael Felsberg. Accurate scale estimation for robust visual tracking. In British Machine Vision Conference, pages 1–11, 2014.
4. Jin Gao, Haibin Ling, Weiming Hu, and Junliang Xing. Transfer learning based visual tracking with gaussian processes regression. In European Conference on Computer Vision (ECCV), pages 188–203, 2014.
5. Helmut Grabner and Horst Bischof. On-line boosting and vision. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 260–267, 2006.
6. Zhenjun Han, Jianbin Jiao, Baochang Zhang, Qixiang Ye, and Jianzhuang Liu. Visual object tracking via sample-based adaptive sparse representation (AdaSR). Pattern Recognition, 44(9):2170–2183, 2011.
7. Sam Hare, Amir Saffari, and Philip H. S. Torr. Struck: Structured output tracking with kernels. In IEEE International Conference on Computer Vision (ICCV), pages 263–270, 2011.
8. Zhenyu He, Shuangyan Yi, Yiu-Ming Cheung, Xinge You, and Yuan Yan Tang. Robust object tracking via key patch sparse representation. IEEE Transactions on Cybernetics, PP:1–11, 2016.
9. João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. Exploiting the circulant structure of tracking-by-detection with kernels. In European Conference on Computer Vision, pages 702–715, 2012.
10. João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.
11. Xu Jia, Huchuan Lu, and Ming-Hsuan Yang. Visual tracking via adaptive structural local sparse appearance model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1822–1829, 2012.
12. Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1409–1422, 2012.
13. Kwangmoo Koh, Seungjean Kim, and Stephen Boyd. l1_ls: A Matlab solver for large-scale l1-regularized least squares problems. Stanford University, pages 1–6, 2007.
14. Junseok Kwon and Kyoung Mu Lee. Visual tracking decomposition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1269–1276, 2010.
15. Baiyang Liu, Junzhou Huang, Lin Yang, and Casimir Kulikowski. Robust tracking using local sparse appearance model and k-selection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1313–1320, 2011.
16. Baiyang Liu, Lin Yang, Junzhou Huang, Peter Meer, Leiguang Gong, and Casimir Kulikowski. Robust and fast collaborative tracking with two stage sparse optimization. In European Conference on Computer Vision, pages 624–637. Springer, 2010.
17. Xue Mei and Haibin Ling. Robust visual tracking using l1 minimization. In IEEE 12th International Conference on Computer Vision, pages 1436–1443, 2009.
18. Branko Ristic, Sanjeev Arulampalam, and Neil Gordon. Beyond the Kalman Filter: Particle Filters for Tracking Applications, volume 685. Artech House, Boston, 2004.
19. David A. Ross, Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang. Incremental learning for robust visual tracking. International Journal of Computer Vision, 77(1-3):125–141, 2008.
20. Jakob Santner, Christian Leistner, Amir Saffari, Thomas Pock, and Horst Bischof. PROST: Parallel robust online simple tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 723–730, 2010.
21. Dongjing Shan and Chao Zhang. Visual tracking using IPCA and sparse representation. Signal, Image and Video Processing, 9(4):913–921, 2015.
22. Naiyan Wang, Jianping Shi, Dit-Yan Yeung, and Jiaya Jia. Understanding and diagnosing visual tracking systems. In IEEE International Conference on Computer Vision, pages 3101–3109, 2015.
23. Qing Wang, Feng Chen, Wenli Xu, and Ming-Hsuan Yang. Online discriminative object tracking with local sparse representation. In IEEE Workshop on Applications of Computer Vision (WACV), pages 425–432, 2012.
24. Shu Wang, Huchuan Lu, Fan Yang, and Ming-Hsuan Yang. Superpixel tracking. In IEEE International Conference on Computer Vision, pages 1323–1330, 2011.
25. Xiangyang Wang, Ying Wang, Wanggen Wan, and Jenq-Neng Hwang. Object tracking with sparse representation and annealed particle filter. Signal, Image and Video Processing, 8(6):1059–1068, 2014.
26. Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Online object tracking: A benchmark. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2411–2418, 2013.
27. Guibo Zhu, Jinqiao Wang, Yi Wu, and Hanqing Lu. Collaborative correlation tracking. In British Machine Vision Conference, pages 184.1–184.12, 2015.