arXiv:1603.04026v1 [cs.CV] 13 Mar 2016
Huamin Ren1, Hong Pan2, Søren Ingvor Olsen3, Thomas B. Moeslund4
1,4 Department of Architecture, Design and Media Technology, Aalborg University, Denmark
2,3 Department of Computer Science, University of Copenhagen, Denmark
Sparse representation has been applied successfully in abnormal event detection, in which the baseline is to learn a dictionary accompanied by sparse codes. While much emphasis has been put on discriminative dictionary construction, there are no comparative studies of sparse codes regarding abnormality detection. We comprehensively study two types of sparse coding solutions, greedy algorithms and convex L1-norm solutions, and their impact on abnormality detection performance. We also propose a framework that combines sparse codes with different detection methods. Our comparative experiments are carried out from various angles to better understand the applicability of sparse codes, including computation time, reconstruction error, sparsity, detection accuracy, and their performance when combined with various detection methods. Experiments show that combining OMP codes with maximum coordinate detection could achieve state-of-the-art performance on the UCSD dataset [14].
Index Terms: Sparse representation, sparse codes, abnormal event detection
Sparse representation has gained a great deal of attention since it was applied effectively in many image analysis applications, such as image denoising [1] and action recognition [2]. Sparse representation finds the most compact representation of a signal as a linear combination of atoms in an overcomplete dictionary. As pointed out in [3], research has focused on three aspects of sparse representation: pursuit methods for solving the optimization problem, such as matching pursuit [4], orthogonal matching pursuit [5], and basis pursuit [6]; dictionary design, such as the K-SVD [7] method and the BSD algorithm [8]; and the application of sparse representation to different tasks, such as abnormal event detection. Abnormal event detection is a core task in video surveillance, which can assist people in various situations, such as monitoring patients/children, observing people and vehicles in a busy environment, or preventing theft and robbery. The aim is to learn normal patterns or behaviors from training videos and to detect any abnormal or suspicious behaviors in test videos.
Research on sparse representation can generally be divided into dictionary learning [9] and sparse coding [4] [5] [6]. Dictionary learning aims to obtain the atoms (or basis vectors) of a dictionary. Such atoms can be either predefined, e.g., undecimated wavelets, steerable wavelets, contourlets, curvelets, and other wavelet variants, or learned from the data itself. Sparse coding, on the other hand, attempts to find sparse codes (or coefficients) given a dictionary, i.e., to find a solution to the underdetermined system of equations $y = Dx$, either by greedy algorithms or by convex algorithms. Through sparse coding, an input feature can be approximately represented as a weighted linear combination of a small number of (unknown) basis vectors.
When sparse representation is applied to abnormal event detection, much emphasis is put on dictionary learning. A common procedure is as follows. First, visual features are extracted in either the spatial or the temporal domain. A dictionary $D$ is then learned from these visual features; it consists of basis vectors capturing high-level patterns in the input features, as in [9, 8]. A sparse representation of a feature is a linear combination of a few elements, or atoms, from the dictionary. Mathematically, it can be expressed as $y = Dx$, where $y \in \mathbb{R}^p$ is a feature of interest, $D \in \mathbb{R}^{p \times m}$ is a dictionary, and $x \in \mathbb{R}^m$ is the sparse representation of $y$ in $D$. Typically $m > p$, which results in an overcomplete or redundant dictionary. During detection, each test feature is classified as normal or anomalous based on its reconstruction error.
However, most approaches use only an approximate reconstruction error to save computation, for example, a least-squares error. This means that the sparse codes are not actually taken into consideration during detection. In fact, the impact of sparse codes generated by different approaches is still unclear. Therefore, we offer a comprehensive study of sparse codes in terms of their performance on abnormal event detection. Among the many sparse coding algorithms, we focus on two major types: greedy algorithms and L1-norm minimization algorithms.
Greedy algorithms rely on an iterative approximation of the coefficients and support of the signal, either by iteratively identifying the support until a convergence criterion is met, or by obtaining an improved estimate of the sparse signal at each iteration that attempts to account for the mismatch with the measured data. Compared to L1-norm minimization methods, greedy algorithms are much faster, and thus are more applicable to very large problems.
Meanwhile, L1-norm minimization has become a popular tool for sparse coding, benefiting both from efficient algorithms and from a well-developed theory of generalization properties and variable selection consistency [10]. We list two common L1-norm minimization formulations in Eq. 1 and Eq. 2. Since both problems are convex, efficient and accurate numerical solvers exist.
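$$\hat{x} = \arg\min_{x} \frac{1}{2}\|Dx - y\|_2^2 + \lambda \|x\|_1 \qquad (1)$$

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \|Dx - y\|_2 \le \epsilon \qquad (2)$$

Here $\lambda > 0$ trades off reconstruction fidelity against sparsity, and $\epsilon$ bounds the allowed reconstruction error.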
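The paper does not say which solver was used for the Lasso problem in Eq. 1. As a minimal sketch, assuming NumPy and the iterative shrinkage-thresholding algorithm (ISTA), a generic proximal-gradient method swapped in here for illustration, Eq. 1 can be minimized as follows. The names `ista`, `soft_threshold`, `objective`, `lam`, and `n_iter` are hypothetical and not taken from the paper.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, y, lam, n_iter=500):
    """Approximately minimise 0.5 * ||D x - y||_2^2 + lam * ||x||_1 (Eq. 1).

    Illustrative ISTA sketch: the step size 1/L uses L = ||D||_2^2, the
    Lipschitz constant of the gradient of the quadratic data-fit term.
    """
    L = np.linalg.norm(D, 2) ** 2          # largest singular value, squared
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)           # gradient of 0.5 * ||D x - y||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x


def objective(D, y, x, lam):
    """Value of the Eq. 1 objective at x."""
    r = D @ x - y
    return 0.5 * float(r @ r) + lam * float(np.abs(x).sum())
```

With an identity dictionary the minimizer of Eq. 1 is simply `soft_threshold(y, lam)`, which gives a quick sanity check; with a fixed step of 1/L the objective never increases from one iteration to the next.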
Our main contributions are: 1) we offer a comprehensive study of sparse codes in terms of their reconstruction error, sparsity, computation time, and detection performance on anomaly datasets; 2) we propose a framework for abnormality detection that combines sparse representation with various detection methods; and 3) we provide insights into the impact of sparse codes and of the detection methods combined with them.
The remainder of this paper is organized as follows. We give a brief review of greedy algorithms and L1-norm solutions in Sec. 2 and propose our framework for abnormal event detection in Sec. 3, which combines sparse codes with various detection methods. We show our comparative results in Sec. 4 and conclude the paper with discussions and future work in Sec. 5.
There are various ways of generating sparse codes through optimization. We introduce two categories of solutions: greedy algorithms and L1-norm approximation solutions.
2.1. Greedy Algorithms
We review two broad categories of greedy methods to reconstruct $y$, called ‘greedy pursuits’ and ‘thresholding’ algorithms. Greedy pursuits are methods that iteratively build up an estimate of $x$. They consist of three basic steps. First, $x$ is initialized to a zero vector. Second, these methods estimate the set of non-zero components of $x$ by iteratively adding new components that are deemed to be non-zero. Third, the values of all non-zero components are optimized. In contrast, thresholding algorithms alternate between element selection and element pruning steps.
There is a large and growing family of greedy pursuit methods. The general framework of greedy pursuit techniques is 1) to select an element and 2) to update the coefficients. Matching Pursuit (MP) [4] provides a general method for approximate decomposition, given in Eq. 3, which addresses the sparsity issue directly. The algorithm selects one column of $D$ at a time, and only the coefficient associated with the selected column is updated at each iteration. More concretely, it starts from an initial approximation $x^{(0)} = 0$ and residual $R^{(0)} = y$, and then builds up a sequence of sparse approximations stepwise. At stage $k$, it identifies the dictionary atom that best correlates with the residual and adds to the current approximation a scalar multiple of that atom. After $m$ steps, one obtains a sparse code as in Eq. 3 with residual $R = R^{(m)}$.
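The MP iteration described above can be sketched in a few lines. This is a minimal NumPy illustration, not the implementation used in the experiments; it assumes the dictionary columns have unit L2 norm, and the function name `matching_pursuit` and the `tol` early-stopping parameter are hypothetical.

```python
import numpy as np


def matching_pursuit(D, y, n_iter, tol=1e-10):
    """Matching Pursuit sketch; assumes the columns of D have unit L2 norm.

    Returns the code x and the final residual r, with y == D @ x + r.
    """
    x = np.zeros(D.shape[1])
    r = np.asarray(y, dtype=float).copy()   # R^(0) = y
    for _ in range(n_iter):
        if np.linalg.norm(r) < tol:
            break
        c = D.T @ r                         # correlation of each atom with R^(k)
        j = int(np.argmax(np.abs(c)))       # best-correlated atom (may be reselected)
        x[j] += c[j]                        # only this coefficient is updated
        r -= c[j] * D[:, j]                 # R^(k+1) = R^(k) - <R^(k), d_j> d_j
    return x, r
```

Because only one coefficient changes per step, the same atom can be picked again later, which is exactly the behavior that OMP avoids.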
Orthogonal Matching Pursuit (OMP) [5] updates $x$ at each iteration by projecting $y$ orthogonally onto the columns of $D$ associated with the current support atoms. Unlike MP, OMP never reselects an atom, and the residual at any iteration is always orthogonal to all currently selected atoms. Another difference is that OMP re-optimizes the coefficients of all selected atoms at iteration $k$, whereas MP only updates the coefficient of the most recently selected atom. To speed up pursuit algorithms further, multiple atoms can be selected at a time; algorithms of this kind, such as Stagewise Orthogonal Matching Pursuit (StOMP) [11], keep computational costs low enough for large-scale problems. At the atom selection step, these algorithms choose all elements that meet a threshold criterion, and they have demonstrated both theoretical and empirical effectiveness on large systems.
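The difference between OMP (one atom per iteration, all coefficients re-fitted) and StOMP (all atoms above a threshold per stage) can be made concrete with a short sketch. It assumes NumPy and unit-norm dictionary columns; the StOMP threshold `t * ||r||_2 / sqrt(p)` and the defaults `n_stages=10`, `t=2.5` are illustrative choices in the spirit of [11], not settings reported in this paper, and all function names are hypothetical.

```python
import numpy as np


def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal Matching Pursuit sketch (unit-norm columns assumed)."""
    y = np.asarray(y, dtype=float)
    x = np.zeros(D.shape[1])
    support, r = [], y.copy()
    for _ in range(n_nonzero):
        if np.linalg.norm(r) < tol:
            break
        j = int(np.argmax(np.abs(D.T @ r)))
        if j in support:                    # only possible through round-off; stop
            break
        support.append(j)
        # Re-fit ALL selected coefficients: orthogonal projection of y onto span(D_S).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef        # residual is orthogonal to D_S
        x[:] = 0.0
        x[support] = coef
    return x, r


def stomp(D, y, n_stages=10, t=2.5, tol=1e-10):
    """Stagewise OMP sketch: each stage adds every atom whose |correlation|
    with the residual exceeds t * ||r||_2 / sqrt(p)."""
    y = np.asarray(y, dtype=float)
    p, m = D.shape
    support = np.zeros(m, dtype=bool)
    x, r = np.zeros(m), y.copy()
    for _ in range(n_stages):
        sigma = np.linalg.norm(r) / np.sqrt(p)   # formal noise level of the residual
        if sigma < tol:
            break
        new = np.abs(D.T @ r) > t * sigma
        if not np.any(new & ~support):
            break                                # nothing new passes the threshold
        support |= new
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        r = y - D @ x
    return x, r
```

Selecting several atoms per stage means StOMP needs far fewer least-squares solves than OMP when the support is large, which is the source of its speed on large systems.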
Greedy algorithms are easy to implement and use, and can be extremely fast. However, compared to L1-norm methods, they offer weaker recovery guarantees, i.e., guarantees on how well each sample can be reconstructed from the dictionary and its sparse code.
2.2. L1-norm Approximation
L1-norm approximation replaces the L0 constraint with a relaxed L1 norm. For example, in the Basis Pursuit (BP) method [6], an almost everywhere differentiable and often convex cost function is applied, while in the Focal Underdetermined System Solver (FOCUSS) algorithm [12], a more general model is optimized.
Donoho and Elad [1] show that, for dictionaries $D$ satisfying certain conditions, the generally NP-hard L0-norm problem is equivalent to its convex relaxation, the L1-norm problem (see Eq. 1). The convex L1 problem can be solved with linear programming methods; representative work includes Basis Pursuit (BP). Instead of seeking sparse representations directly, BP seeks representations that minimize the L1 norm of the coefficients. Furthermore, BP can compute sparse solutions in situations where greedy algorithms fail. The Lasso algorithm [13] is closely related to BP and is, in fact, known as Basis Pursuit De-Noising (BPDN) in some areas. Rather than minimizing the L1 norm as BP does, the Lasso places a restriction on its value.
The FOCUSS algorithm has two integral parts: a low-resolution initial estimate of the real signal, and an iteration process that refines the initial estimate into the final localized energy solution. The iterations are based on weighted norm minimization of the dependent variable, with the weights acting as a function of the preceding iterative solutions. The algorithm is presented as a general estimation tool usable across different applications. In general, L1-norm methods offer better performance in many cases, but they are also more demanding with respect to computation.
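The two parts of FOCUSS described above (a minimum-norm initial estimate, then re-weighted minimum-norm refinements) can be sketched as follows. This is the basic FOCUSS iteration with weights $W_k = \mathrm{diag}(|x_k|^{1 - p_{\text{norm}}/2})$, not the improved learning variant of [12]; the function name `focuss`, the exponent `p_norm`, and the pruning threshold `prune` are hypothetical parameters chosen for illustration.

```python
import numpy as np


def focuss(D, y, p_norm=0.8, n_iter=50, prune=1e-12):
    """Basic FOCUSS sketch (re-weighted minimum-norm iterations).

    p_norm is the exponent of the l_p diversity measure (values <= 1 favour
    sparse solutions); it is unrelated to the feature dimension p in the text.
    """
    y = np.asarray(y, dtype=float)
    x = np.linalg.pinv(D) @ y               # low-resolution initial estimate (min L2 norm)
    for _ in range(n_iter):
        w = np.abs(x) ** (1.0 - p_norm / 2.0)   # weights from the previous iterate
        q = np.linalg.pinv(D * w) @ y       # D * w scales column j of D by w[j]
        x = w * q                           # weighted minimum-norm solution
        x[np.abs(x) < prune] = 0.0          # pruned atoms get zero weight from now on
    return x
```

For example, with `D = [[1, 0, 1], [0, 1, 1]]` and `y = [1, 1]`, the minimum-norm start spreads energy over all three atoms, and the re-weighting concentrates it on the third atom, which alone explains `y`.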
[Fig. 1 diagram blocks: Training Images, Feature Extraction, Dictionary Learning Algorithm; Test Image, Feature Extraction, Sparse Codes Representation, Detection Methods, Abnormal Features, Abnormal Frames]
Fig. 1. Our framework of combining sparse codes with different detection methods.
In detecting abnormal behaviors based on sparse codes, two issues must be addressed: 1) how to generate the sparse codes, i.e., the solution $x$; and 2) how to determine whether a test code is normal or anomalous. For the first issue, any of the sparse coding methods discussed in Sec. 2 can be adopted; for the second, we consider various detection methods. Our proposed abnormal event detection framework is shown in Fig. 1.
After a test feature is represented by a sparse code, the detection method determines whether it is normal or abnormal. There are two commonly used detection methods: the reconstruction error (RE) and the approximate reconstruction error (ARE). In terms of sparse codes, a high response of dictionary atoms, or non-zero coefficients concentrated on a few atoms, may indicate normality. Unfortunately, these code properties and their connection with normality or abnormality have not yet been explored. Therefore, we also introduce the maximum coordinate (MC) and the non-zero concentration (NC) as two new detection methods.
Reconstruction Error (RE): Most existing approaches treat dictionary learning and detection as two separate processes, i.e., a dictionary is typically learned from the training data, and then different measurements are adopted to determine whether a test sample is an anomaly. More sophisticated approaches unify these two processes into a mixed reconstructive and discriminative formulation. Nevertheless, a basic measurement that is widely used in both cases is the reconstruction error. The reconstruction error of a test sample $y$ with respect to the dictionary $D$ is $\|y - D\alpha\|_2^2$, where $\alpha$ is the sparse code of $y$.
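A minimal NumPy sketch of RE-based detection, assuming test features are stored as the columns of `Y` and their sparse codes as the columns of `A`. The function names and the `threshold` parameter are hypothetical; in an ROC evaluation the threshold would be swept rather than fixed.

```python
import numpy as np


def reconstruction_errors(D, Y, A):
    """RE score ||y - D alpha||_2^2 for every column y of Y and its code alpha in A."""
    R = Y - D @ A
    return np.sum(R ** 2, axis=0)


def detect_by_re(D, Y, A, threshold):
    """Flag features whose RE exceeds `threshold` (True = abnormal)."""
    errors = reconstruction_errors(D, Y, A)
    return errors > threshold, errors
```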
Approximate Reconstruction Error (ARE): To speed up detection, the reconstruction error is sometimes not computed from sparse codes obtained through an optimization solution; instead, it is approximated by least squares [9]. The reconstruction error is then calculated as $\|y - D(D^\top D)^{-1} D^\top y\|_2$.
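The ARE formula is an orthogonal-projection residual, so the matrix $I - D(D^\top D)^{-1}D^\top$ can be precomputed once and each test feature then costs a single matrix-vector product, which is where the speed-up comes from. The sketch below assumes the basis passed in has full column rank (fewer atoms than feature dimensions), as with the small bases used in [9]; for an overcomplete dictionary with $m > p$, $D^\top D$ is singular and the formula does not apply directly. Function names are hypothetical.

```python
import numpy as np


def residual_projector(D_sub):
    """Precompute P = I - D_sub (D_sub^T D_sub)^{-1} D_sub^T once per basis.

    D_sub must have full column rank (fewer atoms than feature dimensions).
    """
    p = D_sub.shape[0]
    # Solve the normal equations instead of forming the inverse explicitly.
    return np.eye(p) - D_sub @ np.linalg.solve(D_sub.T @ D_sub, D_sub.T)


def approx_reconstruction_error(P, y):
    """ARE score ||P y||_2 = ||y - D_sub (D_sub^T D_sub)^{-1} D_sub^T y||_2."""
    return float(np.linalg.norm(P @ y))
```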
Maximum Coordinate (MC): Given a test sample $y$, its sparse code is denoted as $\alpha$. Ideally, all non-zero entries in the estimate $\alpha$ would be associated with dictionary columns from a normal pattern (note that only normal data are used during training). We could then detect $y$ as a normal feature if a single dominant (largest) entry is found in $\alpha$; otherwise, it would be detected as an anomaly.
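The paper does not give a formula for MC. One plausible reading, sketched below under that assumption, scores each code by its largest absolute coefficient and flags codes whose maximum response falls below a hypothetical `threshold`; for an ROC sweep one would use the negated score so that higher means more abnormal.

```python
import numpy as np


def max_coordinate_scores(A):
    """MC score per code (column of A): the largest absolute coefficient."""
    return np.max(np.abs(A), axis=0)


def detect_by_mc(A, threshold):
    """Flag codes whose maximum coordinate falls below `threshold` (True = abnormal)."""
    scores = max_coordinate_scores(A)
    return scores < threshold, scores
```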
Non-zero Concentration (NC): Inspired by [8], we argue that the distribution of non-zeros is more important for detection than the location of individual non-zero elements. Thus, we propose a detection measurement called non-zero concentration. Based on the behavior-specific dictionaries proposed in [8], a normal code should have a non-zero concentration property, i.e., its non-zeros are concentrated in the dictionary that has the smallest reconstruction error. Anomalies can be detected if no concentration is found in any of the existing dictionaries.
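A minimal sketch of one way to operationalize NC, assuming the full dictionary is the concatenation of several sub-dictionaries whose column indices are given in a hypothetical `groups` list. The rule "the lowest-error sub-dictionary must hold at least `min_share` of the non-zeros" and the values of `min_share` and `tol` are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np


def nonzero_concentration(D, y, alpha, groups, min_share=0.8, tol=1e-8):
    """Hypothetical NC rule; returns (is_normal, best_group, share).

    groups: list of column-index arrays, one per sub-dictionary. The code is
    called normal when the sub-dictionary with the smallest reconstruction
    error holds at least `min_share` of the code's non-zeros.
    """
    nz = np.abs(alpha) > tol
    total = int(nz.sum())
    if total == 0:
        return False, None, 0.0             # empty code: no concentration at all
    # Reconstruction error using only the coefficients of each sub-dictionary.
    errors = [float(np.sum((y - D[:, g] @ alpha[g]) ** 2)) for g in groups]
    best = int(np.argmin(errors))
    share = float(nz[groups[best]].sum()) / total
    return share >= min_share, best, share
```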
We provide a comprehensive study of the abnormality detection performance of sparse codes. Our experiments are carried out on the UCSD [14] Ped1 dataset, because it is a popular abnormal event detection dataset for which many detection results have been reported. We start by evaluating the performance of various sparse codes, in particular comparing sparse codes generated by two types of algorithms: greedy algorithms and L1-norm approximation algorithms. The following aspects are highlighted: computation time, reconstruction error, the ratio of sparsity in the codes, and performance on abnormal event detection. Next, we use the OMP algorithm to generate sparse codes, combine the codes with different detection methods, and conclude by comparing their detection performance with that of state-of-the-art algorithms.
4.1. Dataset and Settings
The UCSD Ped1 dataset [14] is a frequently used public dataset for detecting abnormal behaviors. It includes clips of groups of people walking towards and away from the camera, with some perspective distortion. There are 34 training videos and 36 testing videos with a resolution of 238 × 158. Training videos contain only normal behaviors. Testing videos contain abnormal behaviors, where there are either non-pedestrian entities in the walkways or anomalous pedestrian motion patterns.
We use spatio-temporal cubes in which 3D gradient features are computed, mimicking the setting in [15]. Each frame is divided into patches of size 23 × 15. Five consecutive frames are used to form 3D patches, and gradient features are extracted in each patch (see [15] for details). This yields 500-dimensional visual features, which we reduce to 100 dimensions using PCA.
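The 500-to-100 dimensional reduction can be sketched with a plain SVD-based PCA. This is a minimal NumPy illustration; the 3D gradient features of [15] are not reproduced, so random data would stand in for them, and `fit_pca` and `pca_transform` are hypothetical names.

```python
import numpy as np


def fit_pca(X, n_components=100):
    """Fit PCA on X (n_samples x n_features) via SVD of the centred data."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]          # principal axes as rows


def pca_transform(X, mean, components):
    """Project the rows of X onto the retained principal axes."""
    return (X - mean) @ components.T
```

The mean and components are fitted on training features only and then reused to project test features, so both live in the same 100-dimensional space as the dictionary.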
4.2. Comparison of Sparse Codes
We evaluate sparse codes from four perspectives: computation time, reconstruction error, the ratio of sparsity in codes, and the codes' performance on abnormal event detection based on their reconstruction error.
We randomly select 1% of the training features (238,000 features in total), use the K-SVD algorithm [7] to construct a dictionary of 1000 atoms, and generate sparse codes by applying various algorithms. Many algorithms are available; we select representative greedy algorithms (OMP, MP, StOMP) and compare them with representative L1-norm solutions (BP and the Lasso algorithm). The reconstruction error is calculated as $Re = \|y - Dx\|_2^2$. We also calculate the mean ratio of sparsity in the codes, i.e., the average percentage of non-zeros among the code dimensions (1000). We report these results, as well as computation time, in Tab. 1. Greedy algorithms need far less time to compute: OMP is the fastest, followed by StOMP, and OMP is approximately 180 times faster than the Lasso algorithm. Both OMP and StOMP achieve sparse solutions (1.9% and 10% non-zeros, respectively), whereas BP obtains an extremely dense solution with exact recovery.
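The two code statistics reported in Tab. 1 can be computed as sketched below, assuming features and codes are stored column-wise in NumPy arrays. The tolerance `tol` used to decide what counts as a non-zero is an assumption (solvers differ in how exactly they produce zeros), and `code_statistics` is a hypothetical name; timing could be added around the solver call with `time.perf_counter()`.

```python
import numpy as np


def code_statistics(D, Y, X, tol=1e-8):
    """Mean Re = ||y - D x||_2^2 and mean percentage of non-zeros in the codes.

    Y holds features as columns (p x n); X holds their codes as columns (m x n).
    Entries with |x_i| <= tol are counted as zeros (tol is an assumption).
    """
    residuals = Y - D @ X
    mean_re = float(np.mean(np.sum(residuals ** 2, axis=0)))
    # Every code has m entries, so averaging over all m*n entries equals the
    # mean of the per-code non-zero percentages.
    nonzero_pct = 100.0 * float(np.mean(np.abs(X) > tol))
    return mean_re, nonzero_pct
```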
To measure the accuracy of abnormality detection, we calculate the reconstruction error of each feature and register features with large reconstruction errors as anomalies. A frame with an abnormal feature is considered a positive frame. To compare performance, we adopt two popular evaluation criteria in abnormality detection, frame-level and pixel-level evaluation, as defined in [14], and we follow their setting precisely. In the frame-level evaluation, a frame is considered abnormal if it contains at least one anomalous feature. In contrast, in the pixel-level evaluation, a frame is marked as a correctly detected abnormality only if at least 40% of its truly abnormal pixels are detected. Ground truth is available for both frame-level and pixel-level annotation; we calculate the true positive and false positive rates to draw ROC curves and report the Area Under the Curve (AUC). Following [14], we also report the value at the operating point where the false positive rate equals the miss rate; this is called the equal error rate (EER) in the frame-level evaluation and the equal detected rate (EDR) in the pixel-level evaluation. See Tab. 2 for details. In the frame-level evaluation, the MP algorithm achieves the best results with a moderate computation time. The StOMP algorithm is relatively fast, and its AUC is satisfactory.
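The frame-level protocol above (a frame is positive as soon as one feature is abnormal, then an ROC sweep, AUC, and the equal-error point) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: frame scores are the maximum feature score per frame, AUC uses the trapezoidal rule, and the EER is read at the nearest ROC point without interpolation, which may differ slightly from the exact procedure of [14]. All function names are hypothetical.

```python
import numpy as np


def frame_scores(feature_scores, frame_ids, n_frames):
    """Frame score = max feature score, so a frame is positive as soon as any
    of its features exceeds the threshold. Frames without features get -inf."""
    out = np.full(n_frames, -np.inf)
    np.maximum.at(out, np.asarray(frame_ids), np.asarray(feature_scores, dtype=float))
    return out


def roc_curve(scores, labels):
    """(FPR, TPR) for every threshold; labels: 1 = abnormal, 0 = normal."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos, n_neg = labels.sum(), (labels == 0).sum()
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([np.sum((scores >= t) & (labels == 1)) / n_pos for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (labels == 0)) / n_neg for t in thresholds])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def equal_error_rate(fpr, tpr):
    """Rate at the ROC point where FPR is closest to the miss rate 1 - TPR."""
    i = int(np.argmin(np.abs(fpr - (1.0 - tpr))))
    return float((fpr[i] + 1.0 - tpr[i]) / 2.0)
```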
It is worth noting that the pixel-level AUC is generally lower than the frame-level AUC, because the pixel-level evaluation is stricter and takes location into consideration. In the frame-level evaluation, there can be coincidental detections: a normal feature in an abnormal frame may be erroneously detected as an anomaly, and this erroneous detection can still result in a correct detection of that frame. In contrast, in the pixel-level evaluation, a frame is marked as a correctly detected abnormality only if a sufficient number of anomalous features have been found. Compared to the MP algorithm, the StOMP algorithm achieves a competitive detection result in the pixel-level evaluation while being three times faster. The BP algorithm also performs well on pixel-level detection; however, its high computation cost hampers its application to real detection problems.
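The coincidental detection described above can be illustrated by applying both rules to the same boolean masks. This is a minimal sketch assuming per-frame boolean arrays for detections and ground truth; the pixel-level rule applies to truly abnormal frames (a detection in a normal frame is counted as a false positive), and the function names are hypothetical.

```python
import numpy as np


def frame_level_hit(detected):
    """Frame-level rule: a frame is flagged if any location is flagged."""
    return bool(np.any(detected))


def pixel_level_hit(detected, ground_truth, min_coverage=0.4):
    """Pixel-level rule for a truly abnormal frame: it counts as correctly
    detected only if >= 40% of its ground-truth abnormal pixels are flagged."""
    n_true = int(np.sum(ground_truth))
    if n_true == 0:
        return False                        # not an abnormal frame
    covered = int(np.sum(detected & ground_truth))
    return covered / n_true >= min_coverage
```

A single false alarm far from the true anomaly satisfies `frame_level_hit` but fails `pixel_level_hit`, which is why the pixel-level AUC is lower.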
In summary, greedy algorithms compute quickly, but their reconstruction errors are larger than those of L1-norm solutions. Convex relaxations, such as BP and the Lasso algorithm, have better theoretical guarantees and recovery ability, but they are more time-consuming. Surprisingly, greedy algorithms, especially the StOMP algorithm, seem to perform better on pixel-level detection, which suggests that they can localize anomalous features more accurately.
4.3. Comparison of Combining Sparse Codes with Detection Methods
We choose the OMP algorithm to generate sparse codes for computational reasons, combine the codes with four types of detection methods (RE, ARE, MC, NC), and compare their detection performance. We draw frame-level ROC curves corresponding to the detection methods. Furthermore, we compare these combinations with state-of-the-art methods for abnormality detection.
As displayed in Fig. 2, abnormality detection by computing the actual reconstruction error outperforms the approximate reconstruction error in the frame-level evaluation, which further supports the idea that computing the actual sparse coefficients is necessary. Among all of these approaches, OMP+RE achieves the best AUC score in the frame-level evaluation (0.6603), followed by MC (0.6340), NC (0.5697), and ARE (0.5013). To gain further insight into detection accuracy, we also use the stricter pixel-level evaluation. We find that OMP+MC achieves the best result, with an AUC of 0.5433. This is likely because a high response in the code means there is a strong connection between the test feature and some atoms in the dictionary, which happens when the feature has a pattern similar to the one those atoms convey. Therefore, a high response also implies that the test feature is normal. However, we also notice that NC detection, which also considers the distribution of non-zeros in sparse codes, performs relatively poorly. This may be due to the type of dictionary adopted, or to the principle by which OMP codes are generated, which is based on the reconstruction error of the chosen atoms rather than on the concentration of atoms.

Table 1. Comparison of greedy algorithms and L1-norm solutions on sparse code generation.
Method   Computation time   Reconstruction error   Non-zeros (mean %)
MP       166.00             0                      31.8%
OMP      1.83               0.4236                 1.9%
StOMP    15.79              0                      10%
BP       114.20             0                      100%
Lasso    333.49             0.0005                 9.9%

Table 2. Comparative results on UCSD Ped1: frame-level evaluation results (AUC and EER) and pixel-level evaluation results (AUC and EDR) are reported.
Method   AUC (frame-level)   EER      AUC (pixel-level)   EDR      Computation time
MP       0.6956              0.3547   0.3898              0.5716   13342
OMP      0.5003              0.5052   0.2849              0.6637   527
StOMP    0.5415              0.465    0.3494              0.6190   4668
BP       0.5454              0.4764   0.3057              0.6479   38949
Lasso    0.5305              0.5173   0.3132              0.6383   56400

[Fig. 2 plot: ROC curves, true positive rate vs. false positive rate]
Fig. 2. Combining sparse codes generated by the OMP algorithm with various detection methods: non-zero concentration (NC), approximate reconstruction error (ARE), reconstruction error (RE), and maximum coordinate (MC).
Finally, we compare combining OMP codes and various detection methods with state-of-the-art abnormality detection algorithms. The frame-level ROC comparison on UCSD Ped1 is shown in Fig. 3, and quantitative evaluations are shown in Tab. 3. Compared with the state-of-the-art algorithms, combining OMP codes with detection methods yields the highest pixel-level AUC, which verifies the effectiveness of sparse codes generated by greedy algorithms; furthermore, maximum coordinate detection achieves the best pixel-level AUC of all methods, which implies that a high response (large code value) can contribute to detection.
[Fig. 3 plot: UCSD Ped1 frame-level ROC, true positive rate vs. false positive rate]
Fig. 3. Comparison with state-of-the-art abnormality detection approaches.
Table 3. Comparative values of AUC, EER and EDR on the UCSD Ped1 dataset.
Method          AUC (frame-level)   EER      AUC (pixel-level)   EDR
SF-MPPCA [14]   0.5900              0.3200   0.2130              0.3200
MDT [14]        0.8180              0.2500   0.4610              0.2500
Lu13 [9]        0.5842              0.4413   0.3622              0.5826
OMP+RE          0.6603              0.3823   0.5386              0.5113
OMP+ARE         0.5013              0.5081   0.5317              0.5113
OMP+NC          0.5697              0.5055   0.5397              0.5113
OMP+MC          0.6339              0.4016   0.5433              0.5113
In this paper, we give a comprehensive study of sparse codes with respect to their performance in abnormal event detection. We compare two categories of sparse codes: codes generated by greedy algorithms and those generated by L1-norm solutions. Various aspects are covered: computational cost, recovery ability, sparsity, and detection performance. Furthermore, we explore the sparse codes themselves and compare different methods for determining whether a test code is an anomaly. Experimental results show that greedy algorithms can obtain good detection results with less computation. Among the three best detection results, two are obtained by greedy algorithms. Considering the computational requirements, which prevent some L1-norm algorithms from being applied in real surveillance applications, greedy algorithms are promising. When OMP codes are combined with various detection measurements, the maximum coordinate measurement outperforms the other methods, which implies that a high response in the code can improve the detection result.
We have also found that, given the large amount of video data, only OMP codes are acceptable, mainly for computational reasons. Despite the great progress made in the optimization field, the applicability of various optimization solutions remains unclear. Therefore, one line of future work could focus on more practical sparse coding algorithms. Another line of work may address discriminative feature selection to reduce the computation of sparse code generation.
[1] D. L. Donoho and M. Elad, “Optimally Sparse Representation in General (non-orthogonal) Dictionaries Via L1 Minimization,” Department of Statistics, Stanford University, 2002.
[2] Z. Jiang, Z. Lin, and L. S. Davis, “Label consistent K-SVD: Learning a discriminative dictionary for recognition,” PAMI, vol. 35, no. 11, pp. 2651–2664, 2013.
[3] K. Huang and S. Aviyente, “Sparse representation for signal classification,” in Adv. NIPS, 2006.
[4] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec.
[5] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Signals, Systems and Computers, Nov. 1993, pp. 40–44.
[6] S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
[7] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: Design of dictionaries for sparse representation,” in SPARS’05, 2005, pp. 9–12.
[8] H. Ren, W. Liu, S. Escalera, S. Olsen, and T. B. Moeslund, “Unsupervised behavior-specific dictionary learning for abnormal event detection,” in BMVC, 2015.
[9] C. Lu, J. Shi, and J. Jia, “Abnormal event detection at 150 fps in MATLAB,” in ICCV, 2013, pp. 2720–2727.
[10] T. Zhang, “Some sharp performance bounds for least squares regression with L1 regularization,” Ann. Statist., vol. 37, pp. 2109–2144, 2009.
[11] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094–1121, 2012.
[12] J. F. Murray and K. Kreutz-Delgado, “An improved FOCUSS-based learning algorithm for solving sparse linear inverse problems,” in Signals, Systems and Computers, Nov. 2001, vol. 1, pp. 347–351.
[13] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society (Series B), vol. 58, pp. 267–288, 1996.
[14] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, “Anomaly detection in crowded scenes,” in CVPR, 2010, pp. 1975–1981.
[15] L. Kratz and K. Nishino, “Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models,” in CVPR, 2009, pp. 1446–1453.
In this paper, we present novel objectives for one-class learning, which we collectively refer to as Generalized One-class Discriminative Subspaces (GODS). Our key idea is to learn a pair of complementary classifiers to flexibly bound the one-class data distribution, where the data belongs to the positive half-space of one of the classifiers in the complementary pair and to the negative half-space of the other. To avoid redundancy while allowing non-linearity in the classifier decision surfaces, we design each classifier as an orthonormal frame and learn these frames via jointly optimizing for two objectives, namely: i) to minimize the distance between the two frames, and ii) to maximize the margin between the frames and the data. The learned frames will thus characterize a piecewise linear decision surface allowing for efficient inference, while our objectives seek to bound the data within a minimal volume that maximizes the decision margin, thereby robustly capturing the data distribution. We explore several variants of our formulation under different constraints on the constituent classifiers, including kernelized feature maps. We provide experiments on several applications in computer vision, including anomaly detection in video sequences, human poses, and activities, as well as on five UCI datasets, demonstrating state-of-the-art results.
Recent video anomaly detection methods focus on reconstructing or predicting frames. Under this umbrella, the long-standing inter-class data-imbalance problem resorts to the imbalance between foreground and stationary background objects in video anomaly detection and this has been less investigated by existing solutions. Naively optimizing the reconstructing loss yields a biased optimization towards background reconstruction rather than the objects of interest in the foreground. To solve this, we proposed a simple yet effective solution, termed attention-driven loss to alleviate the foreground-background imbalance problem in anomaly detection. Specifically, we compute a single mask map that summarizes the frame evolution of moving foreground regions and suppresses the background in the training video clips. After that, we construct an attention map through the combination of the mask map and background to give different weights to the foreground and background region respectively. The proposed attention-driven loss is independent of backbone networks and can be easily augmented in most existing anomaly detection models. Augmented with attention-driven loss, the model is able to achieve AUC 86.0% on Avenue, 83.9% on Ped1, 96% on Ped2 datasets. Extensive experimental results and ablation studies further validate the effectiveness of our model.
This paper presents an anomaly detection method that is based on a sparse coding inspired Deep Neural Networks (DNN). Specifically, we propose a Temporally-coherent Sparse Coding (TSC), where a temporally-coherent term is used to preserve the similarity between two neighboring frames. The optimization of sparse coefficients in TSC is equivalent to a special stacked Recurrent Neural Networks (sRNN) architecture. Further, to reduce the computational cost in alternatively updating the dictionary and sparse coefficients in TSC optimization and to alleviate hyperparameters selection in TSC, we stack one more layer on top of the TSC-inspired sRNN to reconstruct the inputs, and arrive at an sRNN-AE. We further improve sRNN-AE in the following aspects: i) we propose to learn a data-dependent similarity measurement between neighboring frames in sRNN-AE; ii) we reduce the depth of the sRNN in sRNN-AE; iii) we conduct temporal pooling over the appearance features of several consecutive frames for motion characterization. We also build a large-scale anomaly detection dataset for performance evaluation. Extensive experiments on both a toy dataset under controlled settings and real datasets demonstrate the effectiveness of our sRNN-AE method for anomaly detection.
Sparse-coding-based anomaly detection has shown promising performance; its keys are feature learning, sparse representation, and dictionary learning. In this work, we propose a new neural network for anomaly detection (termed AnomalyNet) that carries out feature learning, sparse representation, and dictionary learning with deep networks in three joint neural processing blocks. Specifically, to learn better features, we design a motion fusion block accompanied by a feature transfer block, which together eliminate noisy background, capture motion, and alleviate data deficiency. Furthermore, to address some disadvantages (e.g., non-adaptive updating) of existing sparse coding optimizers and to embrace the merits of neural networks (e.g., parallel computing), we design a novel recurrent neural network that learns the sparse representation and dictionary by proposing an adaptive iterative hard-thresholding algorithm (adaptive ISTA) and reformulating the adaptive ISTA as a new long short-term memory (LSTM). To the best of our knowledge, this could be one of the first works to bridge the ℓ1-solver and LSTM, and it may provide novel insight into understanding LSTM and model-based optimization (also called differentiable programming), as well as sparse-coding-based anomaly detection. Extensive experiments show the state-of-the-art performance of our method in the abnormal event detection task.
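Why an ℓ1 solver can be read as a recurrent network is visible in plain ISTA: every iteration applies the same weights W_e = Dᵀ/L and S = I − DᵀD/L followed by soft-thresholding, so the loop is a recurrence with shared parameters. This is a sketch of standard, non-adaptive ISTA (the observation behind LISTA), not AnomalyNet's adaptive ISTA or its LSTM; the function names are illustrative.

```python
import numpy as np

def soft_threshold(v: np.ndarray, theta: float) -> np.ndarray:
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista_as_recurrence(x, D, lam=0.1, n_steps=100):
    """Solve 0.5*||x - D z||^2 + lam*||z||_1 with ISTA, written as a
    recurrence z_{k+1} = soft(W_e x + S z_k, theta) with shared weights."""
    L = np.linalg.norm(D, 2) ** 2           # step size is 1/L
    W_e = D.T / L                           # "input-to-hidden" weights
    S = np.eye(D.shape[1]) - D.T @ D / L    # "hidden-to-hidden" weights
    theta = lam / L                         # shared threshold
    z = np.zeros(D.shape[1])                # initial hidden state
    for _ in range(n_steps):                # one iteration = one time step
        z = soft_threshold(W_e @ x + S @ z, theta)
    return z
```

Learning W_e, S and theta from data instead of fixing them from D is what turns this loop into a trainable network.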
Conference Paper
Abnormal event detection has been an important issue in video surveillance applications. Due to the huge amount of surveillance data, only a small proportion can be loaded during training. As a result, there is a high chance that the normal patterns in the training data are incomplete, which makes the task very challenging. Sparse representation, as one solution, has shown its effectiveness. The basic principle is to find a collection (a dictionary) of atoms such that each training sample can be represented by only a few atoms. However, the relationship between atoms within the dictionary is commonly neglected, which brings a high risk of false alarms: atoms from infrequent normal patterns are difficult to distinguish from real anomalies. In this paper, we propose behavior-specific dictionaries (BSD) learned without supervision, in which the atoms of each dictionary represent one type of normal behavior in the training video. Moreover, ‘missed atoms’ that potentially come from infrequent normal features are used to refine these behavior dictionaries. To further reduce false alarms, the detection of abnormal features depends not only on the reconstruction error from the learned dictionaries but also on the distribution of non-zero coefficients. Experimental results on the Anomaly Stairs dataset and the UCSD Anomaly dataset show the effectiveness of our algorithm. Remarkably, our BSD algorithm significantly improves AUC, by 10% on the stricter pixel-level evaluation, compared to the best result reported so far.
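A hypothetical sketch of the two detection cues named above: the reconstruction error, and how the non-zero coefficients are distributed over the behavior-specific dictionaries. The concatenated-dictionary layout, the `atom_group` labels, and the decision rule in `is_abnormal` are assumptions for illustration; the code `z` may come from any sparse solver.

```python
import numpy as np

def bsd_scores(x, D, z, atom_group):
    """Return (reconstruction error, number of behavior dictionaries used).

    D: behavior-specific dictionaries concatenated column-wise, shape (n, m)
    z: sparse code of x over D, shape (m,)
    atom_group: behavior index of each column of D, shape (m,)
    """
    err = float(np.linalg.norm(x - D @ z) ** 2)
    active = np.flatnonzero(np.abs(z) > 1e-12)      # atoms with non-zero weight
    n_behaviors = len(np.unique(atom_group[active]))
    return err, n_behaviors

def is_abnormal(err, n_behaviors, err_thresh, max_behaviors=1):
    # Hypothetical rule: flag a feature if it is poorly reconstructed OR its
    # code is spread over more behavior dictionaries than a normal one would be.
    return err > err_thresh or n_behaviors > max_behaviors
```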
In recent years there has been a growing interest in the study of sparse representation for signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. In this paper we propose a novel algorithm, the K-SVD algorithm, which generalizes the K-Means clustering process to adapt dictionaries in order to achieve sparse signal representations. We analyze this algorithm and demonstrate its results on both synthetic tests and applications on real data.
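K-SVD alternates a sparse-coding stage with a dictionary-update stage. Below is a minimal NumPy sketch of the update stage only, assuming the codes `Z` were already computed by a pursuit algorithm such as OMP. Each atom is refit by a rank-1 SVD of the residual restricted to the signals that use it, which is the step that generalizes the K-Means centroid update.

```python
import numpy as np

def ksvd_dictionary_update(X, D, Z):
    """One K-SVD sweep over the atoms of D with the support of Z held fixed.

    X: signals (n, N); D: dictionary with unit-norm columns (n, K);
    Z: sparse codes (K, N). Returns updated copies of D and Z.
    """
    D, Z = D.copy(), Z.copy()
    for k in range(D.shape[1]):
        omega = np.flatnonzero(Z[k])        # signals that use atom k
        if omega.size == 0:
            continue                        # unused atom: leave unchanged
        # Residual of those signals with atom k's contribution added back.
        E_k = X[:, omega] - D @ Z[:, omega] + np.outer(D[:, k], Z[k, omega])
        U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                   # best rank-1 fit: unit-norm atom
        Z[k, omega] = s[0] * Vt[0]          # and its coefficients
    return D, Z
```

Because only the non-zero entries of each coefficient row change, sparsity is preserved, and each rank-1 refit can only lower (or keep) the total reconstruction error.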
Conference Paper
In this paper, the application of sparse representation (factorization) of signals over an overcomplete basis (dictionary) to signal classification is discussed. Searching for the sparse representation of a signal over an overcomplete dictionary is achieved by optimizing an objective function that includes two terms: one that measures the signal reconstruction error and another that measures the sparsity. This objective function works well in applications where signals need to be reconstructed, like coding and denoising. On the other hand, discriminative methods, such as linear discriminant analysis (LDA), are better suited for classification tasks. However, discriminative methods are usually sensitive to corruption in signals because they lack properties crucial for signal reconstruction. We present a theoretical framework for signal classification with sparse representation. The approach combines the discrimination power of discriminative methods with the reconstruction property and the sparsity of the sparse representation, which enables it to deal with signal corruptions: noise, missing data, and outliers. The proposed approach is therefore capable of robust classification with a sparse representation of signals. The theoretical results are demonstrated on signal classification tasks, showing that the proposed approach outperforms the standard discriminative methods and the standard sparse representation in the case of corrupted signals.
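For reference, the two-term objective described above is commonly written as follows, with signal $\mathbf{x}$, overcomplete dictionary $\mathbf{D}$, sparse code $\mathbf{z}$ and trade-off weight $\lambda$. The $\ell_1$ norm as the sparsity measure is the usual choice and an assumption here, since the abstract does not name one; the discriminative term the approach adds for classification is not shown.

```latex
\hat{\mathbf{z}} = \arg\min_{\mathbf{z}} \;
  \underbrace{\lVert \mathbf{x} - \mathbf{D}\mathbf{z} \rVert_2^2}_{\text{reconstruction error}}
  \; + \; \lambda \, \underbrace{\lVert \mathbf{z} \rVert_1}_{\text{sparsity}}
```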
The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and a conjugate-gradient solver.
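Basis Pursuit solves $\min \lVert\boldsymbol{\alpha}\rVert_1$ subject to $\boldsymbol{\Phi}\boldsymbol{\alpha} = \mathbf{s}$ for a signal $\mathbf{s}$ of length $n$ and a dictionary $\boldsymbol{\Phi}$ with $m$ atoms. The standard linear-program reformulation splits $\boldsymbol{\alpha} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \ge \mathbf{0}$ (notation ours), giving an $n \times 2m$ constraint matrix. Under that reading, the quoted size $8192 \times 212{,}992$ would correspond to $m = 106{,}496 = 13 \times 8192$ atoms.

```latex
\min_{\mathbf{u},\,\mathbf{v}} \; \mathbf{1}^{\top}\mathbf{u} + \mathbf{1}^{\top}\mathbf{v}
\quad \text{s.t.} \quad
\begin{bmatrix} \boldsymbol{\Phi} & -\boldsymbol{\Phi} \end{bmatrix}
\begin{bmatrix} \mathbf{u} \\ \mathbf{v} \end{bmatrix} = \mathbf{s},
\qquad \mathbf{u} \ge \mathbf{0},\; \mathbf{v} \ge \mathbf{0},
\qquad \boldsymbol{\alpha} = \mathbf{u} - \mathbf{v}.
```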
Conference Paper
Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on the inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one that involves only a few inexpensive small-scale least-squares optimization steps. Our method reaches high detection rates on benchmark datasets at an average speed of 140-150 frames per second on an ordinary desktop PC using MATLAB.
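The small-scale least-squares test can be sketched as follows, assuming each learned combination is a small basis matrix S_i. The least-squares residual of a feature x on S_i equals ||R_i x||² with R_i = S_i (S_iᵀ S_i)⁻¹ S_iᵀ − I, which can be precomputed once per combination. The function names, the threshold and the early exit are hypothetical.

```python
import numpy as np

def precompute_projectors(bases):
    """For each combination basis S_i of shape (n, s), precompute
    R_i = S_i (S_i^T S_i)^{-1} S_i^T - I."""
    n = bases[0].shape[0]
    return [S @ np.linalg.solve(S.T @ S, S.T) - np.eye(n) for S in bases]

def is_normal(x, projectors, thresh):
    # A feature is normal if at least one combination reconstructs it with
    # squared least-squares residual below the threshold; stop at the first hit.
    for R in projectors:
        if np.sum((R @ x) ** 2) < thresh:
            return True
    return False
```

At test time each check costs one small matrix-vector product, which is consistent with the low per-frame cost the abstract emphasizes.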
A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. An incremental dictionary learning algorithm is presented for situations with limited memory resources. It yields dictionaries such that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.
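LC-KSVD can reuse a standard K-SVD solver because its three error terms fold into a single reconstruction error by stacking matrices. A minimal sketch, assuming the objective ||Y − DX||² + α||Q − AX||² + β||H − WX||², with Q the target discriminative sparse codes, A a linear transform, H the label matrix and W the linear classifier; the column normalization of the stacked dictionary that K-SVD expects is omitted.

```python
import numpy as np

def stack_lcksvd(Y, Q, H, D, A, W, alpha, beta):
    """Stack data and model matrices so that
        ||Y - D X||^2 + alpha*||Q - A X||^2 + beta*||H - W X||^2
    equals the single reconstruction error ||Y_new - D_new X||^2."""
    Y_new = np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * H])
    D_new = np.vstack([D, np.sqrt(alpha) * A, np.sqrt(beta) * W])
    return Y_new, D_new
```

Running K-SVD on (Y_new, D_new) therefore updates the dictionary, the transform and the classifier jointly.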
In this paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Conference Paper
We describe a recursive algorithm to compute representations of functions with respect to nonorthogonal and possibly overcomplete dictionaries of elementary building blocks, e.g., affine (wavelet) frames. We propose a modification to the matching pursuit algorithm of Mallat and Zhang (1992) that maintains full backward orthogonality of the residual (error) at every step and thereby leads to improved convergence. We refer to this modified algorithm as orthogonal matching pursuit (OMP). It is shown that all additional computation required for the OMP algorithm may be performed recursively.
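A minimal NumPy sketch of OMP. For clarity it re-solves a least-squares problem over the selected atoms at every step instead of using the recursive update the abstract describes; the result is the same orthogonal projection, only computed less efficiently. The function name and the early-exit check are illustrative choices.

```python
import numpy as np

def omp(x, D, n_nonzero):
    """Orthogonal matching pursuit over dictionary D (n, m).

    Each step picks the atom most correlated with the residual, then re-fits
    ALL selected atoms by least squares, so the residual stays orthogonal to
    every chosen atom (the "backward orthogonality" that plain MP lacks).
    Returns the sparse code z (m,) and the final residual (n,).
    """
    residual = np.asarray(x, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break                     # residual already orthogonal to all atoms
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z, residual
```

Because the selected atoms are re-fit jointly, an atom is never chosen twice, which is one reason OMP typically converges faster than plain matching pursuit.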