RESEARCH — Open Access
Long-term target tracking combined with re-detection
Juanjuan Wang¹, Haoran Yang¹, Ning Xu¹, Chengqin Wu¹, Zengshun Zhao¹,²,³*, Jixiang Zhang¹* and Dapeng Oliver Wu³
* Correspondence: zhaozs@sdust.edu.cn; zjxhii@163.com
¹ College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, P.R. China
Full list of author information is available at the end of the article
Abstract
Long-term visual tracking undergoes more challenges and is closer to realistic applications than short-term tracking. However, the performance of most existing methods is limited in long-term tracking tasks. In this work, we present a reliable yet simple long-term tracking method, which extends the state-of-the-art learning adaptive discriminative correlation filters (LADCF) tracking algorithm with a re-detection component based on the support vector machine (SVM) model. The LADCF tracking algorithm localizes the target in each frame, and the re-detector is able to efficiently re-detect the target in the whole image when the tracking fails. We further introduce a robust confidence degree evaluation criterion that combines the maximum response criterion and the average peak-to-correlation energy (APCE) to judge the confidence level of the predicted target. When the confidence degree is generally high, the SVM is updated accordingly. If the confidence drops sharply, the SVM re-detects the target. We perform extensive experiments on the OTB-2015 and UAV123 datasets. The experimental results demonstrate the effectiveness of our algorithm in long-term tracking.
Keywords: Learning adaptive discriminative correlation filters, Long-term tracking, Re-detection
1 Introduction
While visual object tracking, as a hot research topic in computer vision, has been widely applied in various fields, many challenges remain unresolved, especially target disappearance, partial occlusion, and background clutter, and designing a general and powerful tracking algorithm is a tough task.
A typical scenario of visual tracking is to track an unknown object in subsequent image frames given the initial state of a target in the first frame of the video. In the past few decades, visual object tracking technology has made significant progress [1–10]. These methods are very effective for short-term tracking tasks, in which the tracked object is almost always in the field of view. However, in realistic applications, the requirement is not only to track correctly, but also to track for a longer period of time [11]. During such periods, the tracking output is wrong in the absence of the target objects, and the training samples will be incorrectly annotated,
Wang et al. EURASIP Journal on Advances in Signal Processing (2021) 2021:2
https://doi.org/10.1186/s13634-020-00713-3
which leads to a risk of model drift. Therefore, it is important for long-term trackers to determine whether the target is absent and to have the capability of re-detection.
A long-term tracking task also requires the tracker, like a short-term one, to maintain high accuracy under the challenges of disappearance and occlusion, and especially to stably capture the target object in a long video [12]. Therefore, long-term tracking presents more challenges in two aspects. The first issue is how to determine the confidence degree of the tracking results. In [13], the maximum response value of the target is used to determine the confidence of the tracking result: when the maximum peak value of the response map is lower than a threshold, the result is determined to be unreliable. However, the response map may fluctuate drastically when the object is occluded or disappears, so using only the maximum response value to judge confidence is unreliable. The average peak-to-correlation energy (APCE) criterion in [14] indicates the degree of fluctuation of the response map. If the target is undergoing fast motion, the value of APCE will be low even if the tracking is correct; however, in [14] the APCE criterion is only used to decide when to update the tracker. Secondly, how to relocate out-of-view targets remains unresolved. The tracking-learning-detection (TLD) [15] algorithm exploits an ensemble of weak classifiers for global re-detection of out-of-view targets, but it can fail to classify the target object due to the huge number of scanning windows. The long-term correlation tracking (LCT) [13] algorithm proposes a random fern re-detection model to detect the out-of-view target. In [16], a spatial-temporal filter is learned in a lower-dimensional discriminative manifold to alleviate the influence of boundary effects, but the method still cannot solve the target disappearance problem.
This paper proposes a tracking algorithm combining a learning adaptive discriminative correlation filter tracker and a re-detector. The proposed method aims to perform robust re-detection and relocate the target when target tracking fails. Our main contributions can be summarized as follows:
i) We propose a stable long-term tracking strategy to track targets that may disappear or deform heavily in long-term tracking. With the confidence strategy adopted, the learning adaptive discriminative correlation filters (LADCF) tracker localizes the target online, and the support vector machine (SVM) is updated when the confidence degree is generally high. In contrast, if the response map fluctuates heavily, the SVM is used as a re-detector to relocate the target.
ii) We not only utilize the maximum response but also adopt the APCE criterion in the re-detection component. The fusion of the two criteria can accurately determine the state of the tracker and improve the accuracy of the tracking system.
iii) We evaluate the proposed tracking algorithm on the OTB-2015 [17] and UAV123 [18] datasets; the experimental results show that the proposed algorithm achieves more stable and accurate tracking in the case of occlusion, background clutter, etc. during long-term tracking.
The structure of the rest of the paper is as follows: Section 2 overviews the related work. Section 3 presents the proposed method. Section 4 reports the experimental results and experimental analysis. Section 5 concludes the paper.
2 Related works
2.1 Correlation filter
Correlation filters have shown outstanding results for target tracking [17, 19]. These methods exploit the circular correlation of the filter in the frequency domain to locate the target object. Bolme et al. [4] propose the pioneering MOSSE tracker, using only gray image features to train the filter. The circulant structure of tracking-by-detection with kernels (CSK) tracker [20] employs illumination intensity features and applies DCFs in a kernel space. The kernelized correlation filter (KCF) [6] further improves CSK by using multi-channel histogram of oriented gradients (HOG) features. Danelljan et al. [5] exploit the color attributes of the target object and learn an adaptive correlation filter. The literature [21] proposes a patch-based visual tracker that evenly divides the object and the candidate area into several small blocks and uses the average score over all the small blocks to determine the optimal candidate, which greatly improves performance under occlusion. The literature [22] proposes an online representative sample selection method to construct an effective observation module that can handle occasional large appearance changes or severe occlusion.
The estimation of the target scale is another important aspect of an outstanding tracker: it not only yields better performance, but also provides computational efficiency. The discriminative scale space tracking (DSST) tracker [23] performs translation estimation and scale estimation separately, using a scale pyramid to respond to scale changes. Li and Zhu [24] present an effective scale adaptive scheme, which defines a scale pool and resizes the samples of each scale to the same size as the initial sample by bilinear interpolation.
The formulation of DCFs exploits circular correlation, which makes learning efficient by applying the fast Fourier transform (FFT). However, it induces circular boundary effects, which have a drastic negative impact on tracking performance. Danelljan et al. [25] suggest reducing these boundary effects by introducing a spatial regularization component; nevertheless, regularization makes model optimization more costly. Galoogahi et al. [26] propose pre-multiplying by a fixed masking matrix containing the target regions to address this deficiency of DCFs, and then apply the alternating direction method of multipliers (ADMM) [27] to solve the constrained optimization problem in real time. The context-aware correlation filter tracking (CACF) [28] algorithm selects the background reference around the target by considering the global information and adds a background penalty to the closed-form solution of the filter. The discriminative correlation filter with channel and spatial reliability (CSRDCF) [29] method distinguishes the foreground from the background by segmenting the colors in the search area. The learning adaptive discriminative correlation filters (LADCF) [16] approach adds adaptive spatial feature selection and temporal consistency constraints to alleviate the spatial boundary effects and temporal filter degradation problems that exist in the DCF method.
2.2 Long-term tracking
Kalal et al. [15] propose the tracking-learning-detection (TLD) algorithm, which decomposes the tracking task into tracking, learning, and detection. Tracking and detection facilitate each other: the short-term tracker provides training examples for the detector, while the detectors are implemented as a cascade to reduce computational complexity. Enlightened by the TLD framework, Ma et al. [13] propose a long-term correlation filter tracker using KCF as the baseline algorithm and a random fern classifier as the detector. The fully correlational long-term tracker (FCLT) [30] trains several correlation filters on different time scales as a detector and exploits the correlation response to link the short-term tracker and the long-term detector.
3 Methods
In this section, we describe our tracker. In Section 3.1, we introduce the main tracking framework of our algorithm, which is shown in Fig. 1. In Section 3.2, we introduce the tracker based on LADCF correlation filtering. In Section 3.3, we introduce the composite evaluation criteria of the confidence degree and the SVM-based re-detector.
Fig. 1 The framework of the algorithm in this paper
3.1 The main framework of the algorithm
The proposed algorithm combines a DCF tracker and a re-detector for long-term tracking. First, the baseline correlation filter tracker is adopted to estimate the translation in the tracking stage. Second, the maximum response value and the APCE criterion are utilized to judge the confidence level of the target. Finally, when the confidence value is higher than the threshold, the baseline tracker tracks the target alone. When the confidence level drops sharply, this indicates tracking failure: we do not update the model and instead exploit the SVM model to re-detect the target object in the current frame. The structure of the algorithm in this paper is shown in Fig. 1. The tracking framework is summarized as follows:
(1) Position and scale detection: We utilize DSST to achieve target position and scale prediction. The tth frame is \(I_t\), and the filter model is \(\theta_{\mathrm{model}}\). When a new frame \(I_t\) appears, we extract multiple scale search windows \(\{I^{\mathrm{patch}}_t(s)\}\) from it, \(s = 1, 2, \ldots, S\), with \(S\) denoting the number of scales. For each scale \(s\), the search window patch is centered around the target center position \(p_{t-1}\) with a size of \(a^N n \times a^N n\) pixels, where \(a\) is the scale factor and \(N = \lfloor \frac{2s - S - 1}{2} \rfloor\). The basic search window size is \(n \times n\), which is determined by the target size \(w \times h\) and the padding parameter \(\varrho\) as \(n = (1 + \varrho)\sqrt{wh}\). Bilinear interpolation is then applied to resize each patch to \(n \times n\). Next, we extract multi-channel features for each scale search window as \(\chi(s) \in \mathbb{R}^{D^2 \times L}\). Given the filter template, the response score can be efficiently calculated in the frequency domain as [16]:

\[ \hat{f}(s) = \hat{x}(s) \odot \hat{\theta}_{\mathrm{model}} \tag{1} \]

After applying the IDFT at each scale, the maximum value of the \(D^2 \times S\) response scores \(f\) gives the target position and scale.
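As a concrete illustration of Eq. (1), the following numpy sketch computes a single-channel response map in the Fourier domain; this is a simplification under stated assumptions (one feature channel, and the conjugate written out explicitly to implement correlation rather than convolution), not the authors' MATLAB implementation, which sums over all channels.

```python
import numpy as np

def response_map(patch, filt):
    """Frequency-domain response of Eq. (1): element-wise product of the
    patch and filter spectra, then inverse FFT back to the spatial domain.
    The conjugate on the filter spectrum makes this a correlation; some
    formulations absorb it into the filter definition."""
    F_patch = np.fft.fft2(patch)
    F_filt = np.fft.fft2(filt)
    return np.real(np.fft.ifft2(F_patch * np.conj(F_filt)))

# The predicted translation is the location of the response peak:
# row, col = np.unravel_index(np.argmax(resp), resp.shape)
```

Repeating this over the \(S\) scaled patches and taking the global maximum yields both the translation and the scale, as described above.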
(2) Updating: We adopt the same updating strategy as the traditional DCF method:

\[ \theta_{\mathrm{model}} = (1 - \alpha)\,\theta_{\mathrm{model}} + \alpha\,\theta \tag{2} \]

where \(\alpha\) is the updating rate. More specifically, since \(\theta_{\mathrm{model}}\) is not available in the learning stage for the first frame, we use a pre-defined mask in which only the target region is activated to optimize \(\theta\), as in BACF. We then initialize \(\theta_{\mathrm{model}} = \theta\) after the learning stage of the first frame.
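Eq. (2) is a plain exponential moving average over filter coefficients; a minimal sketch (the learning-rate value is an illustrative assumption, not a parameter reported by the paper):

```python
import numpy as np

def update_model(theta_model, theta, alpha=0.05):
    """Linear interpolation update of Eq. (2); alpha is the updating rate.
    Setting alpha = 0 freezes the model, which is exactly what the proposed
    method does when confidence drops sharply."""
    return (1.0 - alpha) * theta_model + alpha * theta
```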
3.2 Correlation filter tracker
In this paper, we take LADCF [16] as the baseline algorithm of our tracking approach. The LADCF algorithm proposes a new DCF-based tracking method, which utilizes adaptive spatial feature selection and temporal consistency constraints to reduce the impact of spatial boundary effects and temporal filter degradation. The feature selection process selects several specific elements of the filter to retain distinguishable and
descriptive information, forming a low-dimensional and compact feature representation. Considering an \(n \times n\) image patch \(x \in \mathbb{R}^{n^2}\) as the base sample for the DCF design, the circulant matrix for this sample is generated by collecting its full cyclic shifts, \(X^{\mathrm{T}} = [x_1, x_2, \ldots, x_{n^2}]^{\mathrm{T}} \in \mathbb{R}^{n^2 \times n^2}\), with the corresponding Gaussian-shaped regression labels \(y = [y_1, y_2, \ldots, y_{n^2}]\). The spatial feature selection embedded in the learning stage can be expressed as:

\[ \arg\min_{\theta, \phi} \; \|\theta_\phi \circledast x - y\|_2^2 + \lambda_1 \|\phi\|_0 \quad \mathrm{s.t.}\;\; \theta_\phi = \theta \odot \phi = \mathrm{diag}(\phi)\,\theta \tag{3} \]
where \(\theta\) denotes the target model in the form of a DCF, \(\circledast\) denotes the circular convolution operator, and \(\odot\) denotes element-wise multiplication. The indicator vector \(\phi\) can potentially be expressed through \(\theta_\phi\), as \(\|\phi\|_0 = \|\theta_\phi\|_0\), and \(\mathrm{diag}(\phi)\) is the diagonal matrix generated from the indicator vector of selected features \(\phi\). The \(\ell_0\)-norm is non-convex, and the \(\ell_1\)-norm is widely used to approximate sparsity [24], so temporal consistency is introduced into the \(\ell_1\)-relaxed spatial feature selection model [16]:

\[ \arg\min_{\theta} \; \|\theta \circledast x - y\|_2^2 + \lambda_1 \|\theta\|_1 + \lambda_2 \|\theta - \theta_{\mathrm{model}}\|_1 \tag{4} \]
where \(\lambda_1\) and \(\lambda_2\) are tuning parameters with \(\lambda_1 \ll \lambda_2\), and \(\theta_{\mathrm{model}}\) denotes the model parameters estimated from the previous frame.
An \(\ell_2\)-norm relaxation is adopted to further simplify the expression:

\[ \arg\min_{\theta} \; \|\theta \circledast x - y\|_2^2 + \lambda_1 \|\theta\|_1 + \lambda_2 \|\theta - \theta_{\mathrm{model}}\|_2^2 \tag{5} \]

where the lasso regularization controlled by \(\lambda_1\) selects the spatial features. In the above formula, the filter template model is used to increase smoothness between consecutive frames and thus promote temporal consistency. In this way, the temporal consistency of spatial feature selection is preserved, extracting and retaining the diversity of the static and dynamic appearance.
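The \(\ell_1\) term in Eq. (5) is what induces the sparse feature selection; in proximal-style solvers it is handled by element-wise soft-thresholding. The following one-liner is a generic sketch of that operator, not code from the LADCF implementation:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrinks every element of v toward
    zero by tau and zeroes out elements smaller than tau in magnitude, which
    is how the lasso term deactivates uninformative filter coefficients."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```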
Since the multi-channel features share the same spatial layout [16], the multi-channel input is represented as \(X = \{x_1, x_2, \ldots, x_L\}\) and the corresponding filter as \(\theta = \{\theta_1, \theta_2, \ldots, \theta_L\}\). The objective can be extended to multi-channel features with structured sparsity [16]:

\[ \arg\min_{\theta} \; \Big\| \sum_{i=1}^{L} \theta_i \circledast x_i - y \Big\|_2^2 + \lambda_1 \Big\| \sqrt{\textstyle\sum_{i=1}^{L} \theta_i \odot \theta_i} \Big\|_1 + \lambda_2 \sum_{i=1}^{L} \|\theta_i - \theta_{\mathrm{model},i}\|_2^2 \tag{6} \]

where \(\theta_i^j\) is the jth element of the ith channel filter vector \(\theta_i \in \mathbb{R}^{D^2}\), and \(\odot\) denotes the element-wise multiplication operator. The structured spatial feature selection term calculates the \(\ell_2\)-norm at each spatial location and then applies the \(\ell_1\)-norm across locations to achieve joint sparsity.
Subsequently, utilizing ADMM [27] to optimize the above formula, we introduce relaxation variables to construct an objective based on convex optimization [31]. We can then obtain the global optimal solution of the model through ADMM by forming the augmented Lagrangian [16]:

\[ \mathcal{L} = \Big\| \sum_{i=1}^{L} \theta_i \circledast x_i - y \Big\|_2^2 + \lambda_1 \sum_{j=1}^{D^2} \sqrt{\sum_{i=1}^{L} \big(\theta_i^j\big)^2} + \lambda_2 \sum_{i=1}^{L} \|\theta_i - \theta_{\mathrm{model},i}\|_2^2 + \frac{\mu}{2} \sum_{i=1}^{L} \Big\| \theta_i - \theta_i' + \frac{\eta_i}{\mu} \Big\|_2^2 \tag{7} \]

where \(H = \{\eta_1, \eta_2, \ldots, \eta_L\}\) are the Lagrange multipliers and \(\mu > 0\) is the corresponding penalty parameter controlling the convergence rate [16, 32]. As \(\mathcal{L}\) is convex, ADMM is exploited to iteratively optimize the following sub-problems with guaranteed convergence:

\[ \begin{cases} \theta = \arg\min_{\theta} \; \mathcal{L}\big(\theta, \theta', H, \mu\big) \\[2pt] \theta' = \arg\min_{\theta'} \; \mathcal{L}\big(\theta, \theta', H, \mu\big) \\[2pt] H = \arg\min_{H} \; \mathcal{L}\big(\theta, \theta', H, \mu\big) \end{cases} \tag{8} \]
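To make the alternation pattern of Eq. (8) concrete, here is a hedged sketch of the same ADMM structure applied to the much simpler problem \(\min_\theta \|A\theta - y\|^2 + \lambda\|\theta\|_1\) (not the full multi-channel LADCF objective); `theta_p` plays the role of the relaxation variable \(\theta'\) and `eta` the role of the multipliers \(H\):

```python
import numpy as np

def admm_lasso(A, y, lam=0.1, mu=1.0, iters=200):
    """Generic ADMM alternation mirroring the three sub-problems of Eq. (8):
    a quadratic theta-update, a soft-thresholding theta'-update for the l1
    term, and a multiplier update that enforces theta ~= theta'."""
    n = A.shape[1]
    theta = np.zeros(n); theta_p = np.zeros(n); eta = np.zeros(n)
    AtA, Aty = A.T @ A, A.T @ y
    ridge = np.linalg.inv(2 * AtA + mu * np.eye(n))  # closed-form theta solve
    for _ in range(iters):
        # theta-update: quadratic sub-problem with a closed-form solution
        theta = ridge @ (2 * Aty + mu * theta_p - eta)
        # theta'-update: l1 sub-problem, element-wise soft-thresholding
        v = theta + eta / mu
        theta_p = np.sign(v) * np.maximum(np.abs(v) - lam / mu, 0.0)
        # multiplier update drives theta and theta' to consensus
        eta = eta + mu * (theta - theta_p)
    return theta_p
```

In LADCF itself, the quadratic sub-problem is solved efficiently in the Fourier domain; the scaffold above only illustrates the iteration structure.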
3.3 Re-detector
3.3.1 Confidence criterion
Most existing trackers do not consider whether the detection is accurate or not. In fact, once the target is detected incorrectly in the current frame, severely occluded, or completely missing, tracking may fail in subsequent frames.
We introduce a measure to determine the confidence degree of the target objects, which is the first step in the re-detection model. The peak value and the fluctuation of the response map reveal the confidence of the tracking results: the ideal response map should have only one peak while all the other regions are smooth; otherwise, the response map fluctuates intensely. If we continue to use such uncertain samples to track the target in subsequent frames, the tracking model will be
destroyed. Thus, we fuse two confidence evaluation criteria. The first one is the maximum response value \(F_{\max}\) of the current frame. The second one is the APCE measure, which is defined as:

\[ \mathrm{APCE} = \frac{\big| F_{\max} - F_{\min} \big|^2}{\mathrm{mean}\Big( \sum_{w,h} \big( F_{w,h} - F_{\min} \big)^2 \Big)} \tag{9} \]

where \(F_{\max}\) and \(F_{\min}\) are the maximum and minimum responses of the current frame, respectively, and \(F_{w,h}\) is the element in the wth row and hth column of the response matrix.
If the target is moving slowly and is easily distinguishable, the APCE value is generally high. However, if the target is undergoing fast motion with significant deformations, the value of APCE will be low even if the tracking is correct.
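A minimal numpy sketch of Eq. (9), together with an illustrative fused check; the `is_confident` helper and its threshold values are assumptions for illustration only, not the paper's exact decision rule (the paper's thresholds appear in Section 4.1):

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map, Eq. (9):
    large when the map has one sharp peak, small when it fluctuates."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

def is_confident(response, last_apce, f_max_thresh=0.13, apce_ratio=0.5):
    # Illustrative fusion of the two criteria: require both a high enough
    # peak and an APCE that has not collapsed relative to its recent value.
    return response.max() > f_max_thresh and apce(response) > apce_ratio * last_apce
```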
3.3.2 Target re-detection
In this section, we describe the re-detection mechanism used in the case of tracking failure. In the re-detection module, when the confidence level is lower than the threshold, the SVM [33] is used for re-detection. Consider a sample set \((x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots\), with \(x_i \in \mathbb{R}^d\), including positive and negative samples, where \(d\) is the dimension of the sample and \(y_i \in \{+1, -1\}\) is the sample label. The SVM separates the positive and negative samples to obtain the best classification hyperplane. The classification plane is defined as [33]:

\[ \omega^{\mathrm{T}} x + b = 0 \tag{10} \]

where \(\omega\) represents the weight vector and \(b\) denotes the bias term. In the linearly separable case, for a given dataset \(T\) and classification hyperplane, the following conditions are used for classification judgment:

\[ \begin{cases} \omega^{\mathrm{T}} x_i + b \le -1, & y_i = -1 \\ \omega^{\mathrm{T}} x_i + b \ge +1, & y_i = +1 \end{cases} \tag{11} \]

Combining the two inequalities, we can abbreviate them as:

\[ y_i \big( \omega^{\mathrm{T}} x_i + b \big) \ge 1 \tag{12} \]
The distance from each support vector to the hyperplane can be written as:

\[ d = \frac{\big| \omega^{\mathrm{T}} x + b \big|}{\|\omega\|} \tag{13} \]

The problem of solving for the maximum-margin hyperplane of the SVM model can be expressed as the following constrained optimization problem:

\[ \min \; \frac{1}{2} \|\omega\|^2 \quad \mathrm{s.t.}\;\; y_i \big( \omega^{\mathrm{T}} x_i + b \big) \ge 1 \tag{14} \]
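Eq. (13) is a one-line computation; the sketch below makes it explicit (generic geometry, not code from the paper):

```python
import numpy as np

def margin_distance(w, b, x):
    """Distance from sample x to the hyperplane w^T x + b = 0, Eq. (13).
    Maximizing the minimum of this quantity over the training set is
    equivalent to the constrained problem of Eq. (14)."""
    return abs(w @ x + b) / np.linalg.norm(w)
```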
Next, we introduce the Lagrangian function to solve the above problem [33]:

\[ L(\omega, b, c) = \frac{1}{2} \|\omega\|^2 - \sum_{i=1}^{l} c_i \, y_i \big( \omega \cdot x_i + b \big) + \sum_{i=1}^{l} c_i \tag{15} \]

where \(c_i > 0\) is the Lagrange multiplier. The solution of the optimization problem satisfies the condition that the partial derivatives of \(L(\omega, b, c)\) with respect to \(\omega\) and \(b\) are 0. The corresponding decision function is expressed as:

\[ f(x) = \mathrm{sign}\big( \omega \cdot x + b \big) = \mathrm{sign}\Big( \sum_{j=1}^{l} c_j^* \, y_j \big( x_j \cdot x \big) + b^* \Big) \tag{16} \]

Then, new sample points are fed into the decision function to obtain the sample classification.
In the linearly inseparable case, we use a kernel function to map the samples into a high-dimensional space. In this work, we use the Gaussian kernel function:

\[ k\big( x_i, x_j \big) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}} \tag{17} \]

When a frame is re-detected, an exhaustive search is performed on the current frame using a sliding window, and HOG features are extracted for each image patch as the \(x\) vector in the above formula. Then \(f(x)\) is calculated by formula (16), and we take the sample area with the largest \(f(x)\). When the response value is greater than the threshold, it is used as the new location of the tracking target.
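A hedged sketch of the sliding-window scan described above: `score_fn` stands in for the SVM decision function of Eq. (16) applied to HOG features; here it receives the raw window, which is a simplification (the paper extracts HOG features per patch), and the stride value is an illustrative assumption.

```python
import numpy as np

def redetect(frame, score_fn, win, stride=8):
    """Exhaustive sliding-window search over the whole frame: score every
    window with the classifier and return the best-scoring location,
    which the tracker accepts only if the score exceeds its threshold."""
    H, W = frame.shape
    wh, ww = win
    best_score, best_pos = -np.inf, None
    for r in range(0, H - wh + 1, stride):
        for c in range(0, W - ww + 1, stride):
            s = score_fn(frame[r:r + wh, c:c + ww])
            if s > best_score:
                best_score, best_pos = s, (r, c)
    return best_pos, best_score
```

In practice this scan is the expensive step of re-detection, which is why it is triggered only when confidence collapses rather than on every frame.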
The training process of the SVM is as follows [33]. Using the confidence level, we determine the quality of each sample: samples with high confidence are used as positive samples, and samples with low confidence are used as negative samples. HOG features are extracted from the positive and negative samples to obtain the feature vectors, represented as \((x_i, y_i)\), \(i = 1, 2, \ldots, n\), where \(n\) denotes the number of training samples, \(x_i\) represents the HOG feature vector, and \(y_i\) represents the attribute of the extracted sample: \(y_i = 1\) if the training sample is positive, and \(y_i = -1\) if it is negative. For the binary classification problem of our samples, the loss function is defined as formula (18):

\[ \mathrm{Loss}(x, y, \omega) = \max\big( 0, 1 - y (x \cdot \omega) \big) \tag{18} \]

When the value of the loss is greater than zero, the parameters of the SVM are updated as follows:

\[ \omega = \sum_{j=1}^{l} c_j^* \, y_j \, x_j \tag{19} \]

\[ b^* = y_i - \sum_{j=1}^{l} y_j \, c_j^* \big( x_j \cdot x_i \big) \tag{20} \]

where \(c_j^*\) is the Lagrangian coefficient, \(x\) is the feature vector extracted from the sample, and \(y\) is the corresponding label.
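The hinge loss of Eq. (18) is zero exactly when a sample is correctly classified with margin at least 1, and positive otherwise, which is the condition that triggers the updates of Eqs. (19) and (20). A minimal sketch (generic SVM machinery, not the paper's training code):

```python
import numpy as np

def hinge_loss(x, y, w):
    """Hinge loss of Eq. (18) for a single sample: zero if the sample lies
    outside the margin on the correct side, positive otherwise."""
    return max(0.0, 1.0 - y * (x @ w))
```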
4 Experimental results and discussion
In this section, we evaluate the proposed algorithm on the OTB-2015 and UAV123 benchmarks [17, 18] in comparison with other detection-based tracking algorithms and classical correlation filtering tracking algorithms. Section 4.1 introduces the experimental platform and parameter settings. Section 4.2 introduces the experimental datasets and the evaluation criteria. Section 4.3 describes the quantitative evaluation of the results, and Section 4.4 the qualitative evaluation.
4.1 Experimental setups
The experimental software environment is MATLAB R2016a, and the hardware environment is an Intel Core i5-4200M processor with 4 GB memory running the Windows 8 operating system. The regularization parameters \(\lambda_1\) and \(\lambda_2\) are set to 1 and 15, respectively; the initial penalty parameter is \(\mu = 1\); the maximum penalty parameter is \(\mu_{\max} = 20\); the maximum number of iterations is \(K = 2\); the padding parameter is \(\varrho = 4\); the scale factor is \(a = 1.01\); the threshold for re-detection is set to \(tr = 0.13\); and the update threshold is set to \(tu = 0.20\).
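For reference, the hyperparameters above can be collected into a single configuration; the dictionary keys are illustrative names, not identifiers from the authors' MATLAB code.

```python
# Hyperparameters reported in Section 4.1 (key names are our own).
params = {
    "lambda1": 1,          # spatial feature selection (lasso) weight
    "lambda2": 15,         # temporal consistency weight
    "mu": 1,               # initial ADMM penalty
    "mu_max": 20,          # maximum ADMM penalty
    "iterations": 2,       # maximum ADMM iterations K
    "padding": 4,          # search-window padding parameter
    "scale_factor": 1.01,  # scale-pyramid step a
    "redetect_thresh": 0.13,
    "update_thresh": 0.20,
}
```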
4.2 Experimental datasets and evaluation criteria
The OTB-2015 dataset has a total of 100 video sequences covering 11 challenges, namely illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), and low resolution (LR). The UAV123 dataset consists of 143 challenging sequences, including 123 short-term sequences and 20 long-term sequences. The evaluation adopts the distance precision and overlap precision of one-pass evaluation (OPE) as the criteria. The overlap precision is defined as the percentage of frames whose overlap ratio exceeds 0.5. The distance precision is the percentage of frames whose center location error is within 20 pixels.
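The two OPE criteria reduce to simple per-frame computations; a minimal sketch (generic benchmark arithmetic, assuming boxes given as (x, y, w, h)):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def ope_metrics(pred, gt):
    """Overlap precision (fraction of frames with IoU > 0.5) and distance
    precision (fraction with center error <= 20 px) over a sequence."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred, gt)])
    centers_p = np.array([[p[0] + p[2] / 2, p[1] + p[3] / 2] for p in pred])
    centers_g = np.array([[g[0] + g[2] / 2, g[1] + g[3] / 2] for g in gt])
    errs = np.linalg.norm(centers_p - centers_g, axis=1)
    return (overlaps > 0.5).mean(), (errs <= 20).mean()
```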
4.3 Quantitative evaluation
In this paper, we compare our algorithm with 6 state-of-the-art trackers on the OTB-2015 dataset, including 2 tracking-by-detection algorithms, LCT [13] and large margin object tracking with circulant feature maps (LMCF) [14], and 4 mainstream correlation filtering tracking algorithms, CSK [20], KCF [6], DSST [23], and background-aware correlation filters (BACF) [26]. Figure 2 shows the OPE success rate and precision plots of these algorithms. It can be seen from Fig. 2 that the proposed algorithm is significantly improved compared with the other algorithms. The precision and success rate of our method are 81.4% and 59.9%, respectively. Through experiments, we found that short-term target trackers learn wrong information when the target is occluded or disappears; the template is thus polluted by the wrong information and unable to track the target correctly in subsequent frames. Compared with the BACF algorithm, our method improves the precision and success rate by 14.8% and 7.8%, respectively. The LCT exploits the random fern algorithm to re-detect targets, which is slow to operate; compared with the tracking-by-detection LCT algorithm, the proposed algorithm improves the precision and success rate by 8% and 9.3%, respectively. Compared with the LMCF algorithm with multi-peak detection, our method increases the precision and success rate by 11.2% and 11.1%, respectively.
To further verify the superiority of our method, we analyze the tracking performance through an attribute-based comparison in Table 1, which shows the area under the curve (AUC) scores of the success plots for 11 different attributes.
As shown in Table 1, the proposed algorithm achieves the best performance on all 11 attributes. In the case of OCC, our algorithm's score is 10.1% higher than that of the LMCF algorithm (tracking-by-detection style) and 12% higher than that of the BACF algorithm (short-term correlation filtering style). For FM sequences, our algorithm is 4.6% higher than the second-ranked BACF algorithm and 5.1% higher than the LCT algorithm using random fern re-detection. In the above conditions, the target model may be contaminated, which makes target tracking difficult; our model solves this problem by accurate re-detection via the SVM. In the case of OPR, LCT achieves a score of 48.5%, and our tracker provides a gain of 8.7%. This is because the baseline algorithm applied in this paper alleviates the influence of boundary effects to a certain extent and can achieve higher accuracy when target rotation occurs. In the case of OV, the score of our algorithm is 50.7%, which is 3.9% higher than the BACF algorithm. The reason is that our template stops updating when the target goes out of view, and the SVM is used to re-detect the target; when the target reappears in the field of view, our model is not contaminated and can continue tracking the target correctly.
Fig. 2 Precision and success rate plots of the proposed method and state-of-the-art methods over OTB-2015 benchmark sequences using one-pass evaluation (OPE)
Furthermore, we present the OPE success rate and precision plots on UAV123 in Fig. 3. As shown in Fig. 3, our method beats the other algorithms on the UAV123 dataset. Specifically, our method achieves scores of 65.2% and 46.1% on the precision and success plots, which are better than LCT by 13.1% and 13.4%, respectively. At the same time, the proposed method is 16.1% and 10.5% higher than BACF, because the proposed re-detection approach provides a novel solution for re-detecting low-confidence targets to improve tracking accuracy.
Table 1 The AUC scores of success plots on OTB-2015 sequences with different attributes
        CSK     KCF     DSST    BACF    LCT     LMCF    Ours
IV      0.357   0.470   0.533   0.523   0.509   0.524   0.610
SV      0.299   0.385   0.454   0.505   0.420   0.456   0.563
OCC     0.313   0.429   0.452   0.456   0.462   0.475   0.576
DEF     0.309   0.404   0.414   0.465   0.457   0.446   0.513
MB      0.287   0.431   0.439   0.505   0.498   0.471   0.548
IPR     0.354   0.441   0.482   0.475   0.511   0.453   0.565
FM      0.297   0.434   0.422   0.489   0.484   0.447   0.535
OPR     0.332   0.440   0.450   0.483   0.485   0.469   0.572
OV      0.230   0.371   0.350   0.468   0.423   0.440   0.507
BC      0.382   0.481   0.503   0.539   0.501   0.502   0.597
LR      0.248   0.290   0.381   0.502   0.281   0.399   0.526
Fig. 3 Precision and success rate plots of the proposed method and state-of-the-art methods over UAV123 benchmark sequences using one-pass evaluation (OPE)
4.4 Qualitative evaluation
We selected 7 representative benchmark sequences from OTB-2015 to demonstrate the effectiveness of our algorithm. The visual evaluation results are shown in Fig. 4. As can be seen from Fig. 4, in the Jogging sequence, the target is occluded at the 70th
frame and reappears in the field of view at the 84th frame. Due to the re-detection mechanism, our tracker can track the target correctly, whereas the short-term correlation filter tracking algorithms learn erroneous information during the occlusion, which leads to tracking errors in subsequent frames. In the Soccer and Matrix sequences, due to background clutter, algorithms such as LCT and BACF lose the target; in contrast, the proposed algorithm successfully handles such situations. In the Car4 sequence, with its scale change problem, the scale-based DSST algorithm and the proposed algorithm both show better performance. In the Shaking sequence, the proposed algorithm loses the target in the 17th frame due to issues such as lighting changes and a similar background. However, owing to the re-detection mechanism, the proposed algorithm relocates the target at the 18th frame and keeps tracking correctly. In the Bolt sequence, our algorithm follows the target very closely even under rapid motion. In the Dog sequence, when the target is deformed, our algorithm can accurately track the target, while the BACF and LMCF algorithms show a certain offset. It can be seen from the above description that our algorithm achieves higher accuracy on these 7 sequences.
Fig. 4 The tracking results of each algorithm on 7 video sequences (from top to bottom: Jogging, Soccer, Matrix, Car4, Shaking, Bolt, and Dog)
Furthermore, we compare our method with the baseline tracker on 7 representative benchmark sequences of OTB-2015 in Fig. 5. The first three rows are short-term sequences, none of which exceeds 1000 frames, and the last four rows are long-term sequences, all of which exceed 1000 frames.

Fig. 5 The performance comparison of the two algorithms on 7 video sequences (from top to bottom: Soccer, Ironman, Bird1, Sylvester, Lemming, Rubik, Liquor)
As shown in Fig. 5, in the experiments on the short-term sequences, the LADCF tracker drifts when the target undergoes heavy occlusion (Soccer) and cannot re-detect the target after a tracking failure. Moreover, the LADCF tracker fails to handle background clutter and deformation (Ironman, Bird1), since a tracking component alone, without the re-detection mechanism, is less effective at discriminating the target from a cluttered background. In contrast, our method tracks the object correctly on these challenging sequences because the trained detector effectively re-detects the target.

In the Sylvester and Lemming sequences, the LADCF algorithm tracks incorrectly due to the rotation encountered in these sequences, while our method is more robust to these conditions. In the Liquor sequence, the LADCF tracking algorithm behaves similarly to ours before the target is occluded, but once the target is occluded, LADCF fails to locate it. In the Rubik sequence, the target undergoes deformation and color variation at the 854th frame, so the LADCF tracker fails to track correctly, whereas our method tracks successfully thanks to re-detection. In our method, if tracking fails, we perform the re-detection procedure and re-initialize the tracker so that the target can be recovered. Thus, our method can track the target correctly throughout.

Overall, our method performs well in estimating the positions of the target objects, which can be attributed to three reasons. First, the combined confidence criterion correctly identifies the target even in very low-confidence cases. Second, the re-detection component effectively re-detects the target after a tracking failure. Third, the baseline tracker achieves adaptive discriminative learning on a low-dimensional manifold, which improves the tracking performance.
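The combined confidence criterion and the update/re-detection switch described above can be outlined as follows. This is an illustrative sketch, not the authors' implementation: `apce` follows the standard LMCF definition, while `is_confident`, the historical-average comparison, and the threshold fractions `alpha` and `beta` are placeholder choices of ours:

```python
import numpy as np

def apce(response):
    # Average peak-to-correlation energy (APCE) of a 2-D response map,
    # following the LMCF definition: (Fmax - Fmin)^2 / mean((F - Fmin)^2).
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / (np.mean((response - f_min) ** 2) + 1e-12)

def is_confident(response, hist_max, hist_apce, alpha=0.6, beta=0.5):
    # Confident only when BOTH the peak response and the APCE exceed a
    # fraction of their historical averages; otherwise tracking is deemed
    # unreliable, the template is frozen, and the re-detector is run.
    return bool(response.max() >= alpha * np.mean(hist_max) and
                apce(response) >= beta * np.mean(hist_apce))

# Sketch of the gating inside the tracking loop:
#   if is_confident(resp, hist_max, hist_apce):
#       update the correlation filter template and the SVM samples
#   else:
#       freeze the template and re-detect the target with the SVM
```

A sharp simultaneous drop of both quantities is what signals occlusion or loss and triggers the SVM re-detector, while high values allow safe template updates.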
5 Conclusions
This paper proposes a long-term target tracking algorithm whose two main components are a state-of-the-art LADCF short-term tracker, which estimates the target translation, and a re-detector, which re-detects the target in the case of tracking failure. In addition, the algorithm introduces a robust confidence criterion to evaluate the confidence of the predicted target. When the confidence value falls below a specified threshold, the SVM model is used to re-detect the target and the template is not updated. The algorithm is well suited to long-term tracking because it can detect the target accurately in real time and update the template only when the result is highly reliable. Extensive experimental results show that the proposed algorithm achieves better performance than the other compared tracking algorithms.
Abbreviations
LADCF: Learning adaptive discriminative correlation filters; APCE: Average peak-to-correlation energy; SVM: Support
vector machine; OPE: One-pass evaluation
Acknowledgements
Thanks to the anonymous reviewers and editors for their hard work.
Authors' contributions
ZZ and DOW proposed the original idea of the full text. JZ and JW designed the experiment. JW and NX performed
the experiment. JW and HY wrote the manuscript under the guidance of ZZ. CW, JZ, and JW revised the manuscript.
All authors read and approved this submission.
Funding
This work was supported in part by the China Postdoctoral Science Special Foundation Funded Project (2015T80717) and the Natural Science Foundation of Shandong Province (ZR2020MF086).
Availability of data and materials
The datasets used during the current study are the OTB2015 dataset [17] and the UAV123 dataset [18], which are
available online or from the corresponding author on reasonable request.
Competing interests
The authors declare that they have no competing interests.
Author details
1 College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, P.R. China. 2 School of Control Science & Engineering, Shandong University, Jinan 250061, China. 3 Department of Electrical & Computer Engineering, University of Florida, Gainesville, FL 32611, USA.
Received: 29 July 2020 Accepted: 11 December 2020
References
1. D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
2. M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
3. H. Yang, J. Wang, Y. Miao, Y. Yang, Z. Zhao, Z. Wang, Q. Sun, D.O. Wu, Combining spatio-temporal context and Kalman filtering for visual tracking. Mathematics 7(11), 1–13 (2019)
4. D.S. Bolme, J.R. Beveridge, B.A. Draper, Y.M. Lui, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Visual object tracking using adaptive correlation filters (IEEE, San Francisco, 2010), pp. 2544–2550
5. M. Danelljan, F. Shahbaz Khan, M. Felsberg, J. Van de Weijer, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Adaptive color attributes for real-time visual tracking (2014), pp. 1090–1097
6. J.F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
7. H. Nam, B. Han, in 2016 IEEE Conference on Computer Vision and Pattern Recognition. Learning multi-domain convolutional neural networks for visual tracking (2016), pp. 4293–4302
8. L. Bertinetto, J. Valmadre, J.F. Henriques, et al., in European Conference on Computer Vision Workshop. Fully-convolutional Siamese networks for object tracking, vol 9914 (2016), pp. 850–865
9. E. Gundogdu, A.A. Alatan, Good features to correlate for visual tracking. IEEE Trans. Image Process. 27(5), 2526–2540 (2018)
10. M. Asadi, C.S. Regazzoni, Tracking using continuous shape model learning in the presence of occlusion. EURASIP J. Adv. Signal Process. 2008, 250780 (2008)
11. T. Li, S. Zhao, Q. Meng, et al., A stable long-term object tracking method with re-detection strategy. Pattern Recognit. Lett. 127, 119–127 (2018)
12. B. Yan, H. Zhao, D. Wang, H. Lu, X. Yang, in IEEE/CVF International Conference on Computer Vision. Skimming-perusal tracking: a framework for real-time and robust long-term tracking (2019), pp. 2385–2393
13. C. Ma, X. Yang, C. Zhang, M.H. Yang, in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Long-term correlation tracking (2015), pp. 5388–5396
14. M. Wang, Y. Liu, Z. Huang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Large margin object tracking with circulant feature maps (2017), pp. 4800–4808
15. Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012)
16. T. Xu et al., Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Trans. Image Process. 28(11), 5596–5609 (2019)
17. Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
18. M. Mueller, N. Smith, B. Ghanem, in European Conference on Computer Vision. A benchmark and simulator for UAV tracking (Springer, Amsterdam, 2016), pp. 445–461
19. Y. Wu, J. Lim, M.H. Yang, in Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Online object tracking: a benchmark (2013), pp. 2411–2418
20. J.F. Henriques, R. Caseiro, P. Martins, J. Batista, in European Conference on Computer Vision. Exploiting the circulant structure of tracking-by-detection with kernels (2012), pp. 702–715
21. W. Ou, D. Yuan, D. Li, et al., Patch-based visual tracking with online representative sample selection. J. Electron. Imaging 26(3), 033006 (2017)
22. W. Ou, D. Yuan, Q. Liu, et al., Object tracking based on online representative sample selection via non-negative least square. Multimed. Tools Appl. 77(9), 10569–10587 (2018)
23. M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, in Proceedings of the British Machine Vision Conference. Accurate scale estimation for robust visual tracking (BMVA Press, Nottingham, 2014), pp. 1–5
24. Y. Li, J. Zhu, in European Conference on Computer Vision Workshop. A scale adaptive kernel correlation filter tracker with feature integration (Springer, Zurich, 2014), pp. 254–265
25. M. Danelljan, G. Hager, F. Shahbaz Khan, M. Felsberg, in Proceedings of the IEEE International Conference on Computer Vision. Learning spatially regularized correlation filters for visual tracking (2015), pp. 4310–4318
26. H. Kiani Galoogahi, A. Fagg, S. Lucey, in Proceedings of the IEEE International Conference on Computer Vision. Learning background-aware correlation filters for visual tracking (2017), pp. 1135–1143
27. S. Boyd, N. Parikh, E. Chu, et al., Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
28. M. Mueller, N. Smith, B. Ghanem, in IEEE Conference on Computer Vision & Pattern Recognition. Context-aware correlation filter tracking (2017), pp. 1387–1395
29. A. Lukezic, T. Vojir, L.C. Zajc, J. Matas, M. Kristan, in IEEE Conference on Computer Vision and Pattern Recognition. Discriminative correlation filter with channel and spatial reliability (2017), pp. 4847–4856
30. A. Lukežič, L. Čehovin Zajc, T. Vojíř, J. Matas, M. Kristan, in Asian Conference on Computer Vision. FCLT: a fully-correlational long-term tracker (2017)
31. R. Jenatton, J. Mairal, et al., Structured sparsity through convex optimization. Stat. Sci. 27(4), 450–468 (2012)
32. D.P. Bertsekas, Constrained optimization and Lagrange multiplier methods (Academic, Pittsburgh, 1982)
33. T. Joachims, in Advances in kernel methods: support vector learning, chapter 11, ed. by B. Scholkopf, C. Burges, A. Smola. Making large-scale SVM learning practical (MIT Press, Cambridge, 1999), pp. 169–184
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
In this work, we proposed a long-term tracking strategy to deal with the occlusion, out-of-plane rotation, and the confusing non-target object. Our tracking system is composed of two parts, the CA-CF tracker, an efficient correlation method for short-term tracking, and the SVM-based re-detector, which prevents the CA tracker from degradation. When the tracker works with confidence, the CA-CF module ensures an accurate tracking result and the SVM updates accordingly. When the response maps fluctuate heavily, the SVM switches to work as a re-detector and the tracker will be initialized. We also introduced to adopt both the maximum response criterion and the APCE criterion to judge the performance of the tracker in time. By evaluating our algorithm on the OTB benchmark datasets, we proposed to analyze the result affected by the parameters of our CA-CF-SVM strategy. The experimental results show that our method has a significant improvement than the state-of-the-art methods for the long-term tracking both in accuracy and robustness.