GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training
Samet Akcay¹, Amir Atapour-Abarghouei¹, and Toby P. Breckon¹,²
¹Department of Computer Science, ²Department of Engineering, Durham University, UK
{samet.akcay, amir.atapour-abarghouei, toby.breckon}@durham.ac.uk
Abstract. Anomaly detection is a classical problem in computer vision, namely distinguishing the normal from the abnormal when datasets are highly biased towards one class (normal) due to the insufficient sample size of the other class (abnormal). While this can be addressed as a supervised learning problem, a significantly more challenging problem is that of detecting the unknown/unseen anomaly case, which takes us instead into the space of a one-class, semi-supervised learning paradigm. We introduce such a novel anomaly detection model by using a conditional generative adversarial network that jointly learns the generation of high-dimensional image space and the inference of latent space. Employing encoder-decoder-encoder sub-networks in the generator network enables the model to map the input image to a lower-dimensional vector, which is then used to reconstruct the generated output image. The additional encoder network maps this generated image to its latent representation. Minimizing the distance between these images and the latent vectors during training aids in learning the data distribution for the normal samples. As a result, a larger distance metric from this learned data distribution at inference time is indicative of an outlier from that distribution, i.e., an anomaly. Experimentation over several benchmark datasets, from varying domains, shows the model's efficacy and superiority over previous state-of-the-art approaches.
Keywords: Anomaly Detection · Semi-Supervised Learning · Generative Adversarial Networks · X-ray Security Imagery.
1 Introduction
Despite yielding encouraging performance over various computer vision tasks, supervised approaches heavily depend on large, labeled datasets. In many real-world problems, however, samples from the more unusual classes of interest are too few to be modeled effectively. Instead, the task of anomaly detection is to identify such cases by training only on samples considered to be normal and then identifying the unusual, insufficiently available samples (abnormal) that differ from the learned sample distribution of normality. A tangible example, considered here within our evaluation, is X-ray screening for aviation or border security, where anomalous items posing a security threat are not commonly encountered, exemplary data of such items can be difficult to obtain in any quantity, and the nature of any anomaly posing a potential threat may evolve due to a range of external factors. Within this challenging context, however, human security operators remain competent and adaptable anomaly detectors against new and emerging anomalous threat signatures.
[Fig. 1 panels: (a) Normal Data (X-ray Scans); (b) Normal + Abnormal Data (X-ray Scans).]
Fig. 1. Overview of our anomaly detection approach within the context of an X-ray security screening problem. Our model is trained on normal samples (a) and tested on normal and abnormal samples (b). Anomalies are detected when the output of the model is greater than a certain threshold, A(x) > φ.
As illustrated in Figure 1, the anomaly detection task is formally defined as follows: given a dataset $D$ containing a large number of normal samples $X$ for training and relatively few abnormal examples $\hat{X}$ for testing, a model $f$ is optimized over its parameters $\theta$. $f$ learns the data distribution $p_X$ of the normal samples during training and identifies abnormal samples as outliers during testing by outputting an anomaly score $A(x)$, where $x$ is a given test example. A larger $A(x)$ indicates possible abnormalities within the test image, since $f$ learns to minimize the output score during training. $A(x)$ is general in that it can detect unseen anomalies as being non-conforming to $p_X$.
There is a large volume of studies proposing anomaly detection models within various application domains [2–4, 23, 39]. In addition, a considerable amount of work has taxonomized the approaches within the literature [9, 19, 28, 29, 33]. In parallel to the recent advances in this field, Generative Adversarial Networks (GANs) have emerged as a leading methodology across both unsupervised and semi-supervised problems. Goodfellow et al. [16] first proposed this approach by co-training a pair of networks (a generator and a discriminator). The former models high-dimensional data from a latent vector so as to resemble the source data, while the latter distinguishes between the modeled (i.e., approximated) and original data samples. Several approaches followed this work to improve the training and inference stages [8, 17]. As reviewed in [23], adversarial training has also been adopted by recent work within anomaly detection.
Schlegl et al. [39] hypothesize that the latent vector of a GAN represents the true distribution of the data, and remap images to the latent space by optimizing the latent vector of a pre-trained GAN. The limitation is the enormous computational complexity of remapping to this latent vector space. In a follow-up study, Zenati et al. [40] train a BiGAN model [14], which learns the mapping from image space to latent space jointly with the generator, and report statistically and computationally superior results, albeit on the simplistic MNIST benchmark dataset [25].
Motivated by [6, 39, 40], here we propose a generic anomaly detection architecture comprising an adversarial training framework. In a similar vein to [39], we use single color images as the input to our approach, drawn only from an example set of normal (non-anomalous) training examples. In contrast, however, our approach does not require two-stage training and is efficient for both model training and later inference (run-time testing). As with [40], we also learn image and latent vector spaces jointly. Our key novelty comes from the fact that we employ an adversarial autoencoder within an encoder-decoder-encoder pipeline, capturing the training data distribution within both image and latent vector space. An adversarial training architecture such as this, trained on normal examples only, produces superior performance over challenging benchmark problems. The main contributions of this paper are as follows:
- semi-supervised anomaly detection: a novel adversarial autoencoder within an encoder-decoder-encoder pipeline, capturing the training data distribution within both image and latent vector space, yielding superior results to contemporary GAN-based and traditional autoencoder-based approaches.
- efficacy: an efficient and novel approach to anomaly detection that yields both statistically and computationally better performance.
- reproducibility: a simple and effective algorithm such that the results can be reproduced via the publicly available code¹.
2 Related Work
Anomaly detection has long been a question of great interest in a wide range of domains including, but not limited to, biomedical [39], financial [3] and security applications such as video surveillance [23], network systems [4] and fraud detection [2]. In addition, a considerable amount of work has been published to taxonomize the approaches in the literature [9, 19, 28, 29, 33]. The narrower scope of this review focuses primarily on reconstruction-based anomaly detection techniques.
The vast majority of reconstruction-based approaches have been employed to investigate anomalies in video sequences. Sabokrou et al. [37] investigate the use of Gaussian classifiers on top of autoencoders (global) and nearest neighbor similarity (local) feature descriptors to model non-overlapping video patches. A study by Medel and Savakis [30] employs convolutional long short-term memory networks for anomaly detection. Trained on normal samples only, the model predicts the future frame of a possible normal example, which is used to distinguish abnormality during inference. In another study on the same task, Hasan et al. [18] consider a two-stage approach, using local features and a fully connected autoencoder first, followed by a fully convolutional autoencoder for end-to-end feature extraction and classification. Experiments yield competitive results on anomaly detection benchmarks. To determine the effects of adversarial training on anomaly detection in videos, Dimokranitou [13] uses adversarial autoencoders, producing comparable performance on benchmarks.
¹ The code is available at https://github.com/samet-akcay/ganomaly
More recent attention in the literature has focused on adversarial training. The seminal work of Ravanbakhsh et al. [35] utilizes image-to-image translation [21] to examine the abnormality detection problem in crowded scenes and achieves state-of-the-art results on the benchmarks. The approach is to train two conditional GANs: the first generator produces optical flow from frames, while the second generates frames from optical flow.
The generalisability of the approach mentioned above is problematic, since in many cases datasets do not have temporal features. One of the most influential accounts of anomaly detection using adversarial training comes from Schlegl et al. [39]. The authors hypothesize that the latent vector of the GAN represents the distribution of the data. However, mapping to the vector space of the GAN is not straightforward. To achieve this, the authors first train a generator and discriminator using only normal images. In the next stage, they freeze the weights of the pre-trained generator and discriminator and remap to the latent vector by optimizing the GAN with respect to the $z$ vector. During inference, the model pinpoints an anomaly by outputting a high anomaly score, reporting significant improvement over the previous work. The main limitation of this work is its computational complexity, since the model employs a two-stage approach and remapping the latent vector is extremely expensive. In a follow-up study, Zenati et al. [40] investigate the use of BiGAN [14] in an anomaly detection task, examining joint training to map from image space to latent space simultaneously, and vice versa. Compared to [39], this yields superior results on the MNIST [25] dataset.
Overall, prior work strongly supports the hypothesis that autoencoders and GANs show promise in anomaly detection problems [23, 39, 40]. Motivated by the idea of GAN with inference studied in [39] and [40], we introduce a conditional adversarial network such that the generator comprises encoder-decoder-encoder sub-networks, learning representations in both image and latent vector space jointly, and achieving state-of-the-art performance both statistically and computationally.
3 Our Approach: GANomaly
To explain our approach in detail, it is essential to briefly introduce the background of GANs.
Generative Adversarial Networks (GAN) are an unsupervised machine learning approach initially introduced by Goodfellow et al. [16]. The original primary goal of the work is to generate realistic images. The idea is that two networks (a generator and a discriminator) compete with each other during training, such that the former tries to generate an image, while the latter decides whether the generated image is real or fake. The generator is a decoder-like network that learns the distribution of input data from a latent space. The primary objective here is to model high-dimensional data that captures the original real data distribution. The discriminator network usually has a classical classification architecture, reading an input image and determining its validity (i.e., real vs. fake).
[Fig. 2 legend: Input/Output, Conv, LeakyReLU, BatchNorm, ConvTranspose, ReLU, Tanh, Softmax; discriminator output: Real / Fake.]
Fig. 2. Pipeline of the proposed approach for anomaly detection.
GANs have been intensively investigated recently due to their future potential [12]. To address training instability issues, several empirical methodologies have been proposed [7, 38]. One well-known study that has received attention in the literature is the Deep Convolutional GAN (DCGAN) of Radford and Chintala [34], who introduce a fully convolutional generative network by removing fully connected layers and using convolutional layers and batch normalization [20] throughout the network. The training performance of GANs is improved further via the use of the Wasserstein loss [8, 17].
Adversarial Auto-Encoders (AAE) consist of two sub-networks, namely an encoder and a decoder. This structure maps the input to latent space and remaps it back to input data space, known as reconstruction. Training autoencoders in an adversarial setting enables not only better reconstruction but also control over the latent space [12, 27, 31].
GAN with Inference models are also used within discrimination tasks by exploiting latent space variables [10]. For instance, the research by [11] suggests that networks are capable of generating a similar latent representation for related high-dimensional image data. Lipton and Tripathi [26] also investigate the idea of inverse mapping by introducing a gradient-based approach, mapping images back to the latent space. This has also been explored in [15] with a specific focus on joint training of generator and inference networks. The former network maps from latent space to high-dimensional image space, while the latter maps from image to latent space. Another study by Donahue et al. [14] suggests that, with the additional use of an encoder network mapping from image space to latent space, a vanilla GAN is capable of learning the inverse mapping.
3.1 Proposed Approach
Problem Definition. Our objective is to train an unsupervised network that detects anomalies using a dataset that is highly biased towards a particular class, i.e., comprising only normal, non-anomalous occurrences for training. The formal definition of this problem is as follows:
We are given a large training dataset $D$ comprising only $M$ normal images, $D = \{X_1, \ldots, X_M\}$, and a smaller testing dataset $\hat{D}$ of $N$ normal and abnormal images, $\hat{D} = \{(\hat{X}_1, y_1), \ldots, (\hat{X}_N, y_N)\}$, where $y_i \in \{0, 1\}$ denotes the image label. In the practical setting, the training set is significantly larger than the test set, such that $M \gg N$.
Given the dataset, our goal is first to model $D$ to learn its manifold, and then to detect the abnormal samples in $\hat{D}$ as outliers during the inference stage. The model $f$ learns the normal data distribution and minimizes the output anomaly score $A(x)$. For a given test image $\hat{x}$, a high anomaly score $A(\hat{x})$ indicates possible anomalies within the image. The evaluation criterion is to threshold the score with $\phi$, where $A(\hat{x}) > \phi$ indicates an anomaly.
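The decision rule above amounts to one comparison per test image. The minimal Python sketch below applies it to a list of precomputed scores. The function name `flag_anomalies` and the example scores are hypothetical, and the text leaves open how φ should be chosen.

```python
from typing import List, Sequence


def flag_anomalies(scores: Sequence[float], phi: float) -> List[bool]:
    """Apply the rule A(x_hat) > phi to each score (strict inequality)."""
    return [score > phi for score in scores]


# Hypothetical anomaly scores for four test images.
print(flag_anomalies([0.05, 0.91, 0.40, 0.77], phi=0.5))  # [False, True, False, True]
```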
GANomaly Pipeline. Figure 2 illustrates the overview of our approach, which contains two encoders, a decoder and a discriminator network, employed within three sub-networks.
The first sub-network is a bow-tie autoencoder network behaving as the generator part of the model. The generator learns the input data representation and reconstructs the input image via the use of an encoder and a decoder network, respectively. Formally, the sub-network works as follows. The generator $G$ first reads an input image $x$, where $x \in \mathbb{R}^{w \times h \times c}$, and forward-passes it to its encoder network $G_E$. With the use of convolutional layers followed by batch-norm and leaky ReLU activation, respectively, $G_E$ downscales $x$ by compressing it to a vector $z$, where $z \in \mathbb{R}^{d}$. $z$ is also known as the bottleneck features of $G$ and is hypothesized to have the smallest dimension containing the best representation of $x$. The decoder part $G_D$ of the generator network $G$ adopts the architecture of a DCGAN generator [34], using convolutional transpose layers, ReLU activation and batch-norm, together with a tanh layer at the end. This approach upscales the vector $z$ to reconstruct the image $x$ as $\hat{x}$. Based on these, the generator network $G$ generates the image $\hat{x}$ via $\hat{x} = G_D(z)$, where $z = G_E(x)$.
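The bow-tie shape can be made concrete by tracing spatial sizes through $G_E$ and $G_D$. The Python sketch below uses only the standard output-size formulas for strided convolutions and transposed convolutions. The layer settings (4 × 4 kernels, stride 2, padding 1, plus a final 4 × 4 layer that collapses a 4 × 4 map to 1 × 1, in the DCGAN style [34]) are an assumption, not details given in the text. Under these assumptions, the 1 × 1 output with $d$ channels plays the role of $z \in \mathbb{R}^d$.

```python
from typing import List


def conv_out(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Output side length of a square Conv2d layer (no dilation)."""
    return (n + 2 * p - k) // s + 1


def conv_transpose_out(n: int, k: int = 4, s: int = 2, p: int = 1) -> int:
    """Output side length of a square ConvTranspose2d layer (no dilation or output padding)."""
    return (n - 1) * s - 2 * p + k


def _check(image_size: int) -> None:
    if image_size < 4 or image_size & (image_size - 1):
        raise ValueError("this sketch expects a power-of-two image size >= 4")


def encoder_sizes(image_size: int) -> List[int]:
    """Trace G_E (or E): stride-2 convs halve the side down to 4, then a 4x4 valid conv gives 1x1."""
    _check(image_size)
    sizes = [image_size]
    while sizes[-1] > 4:
        sizes.append(conv_out(sizes[-1]))
    sizes.append(conv_out(sizes[-1], k=4, s=1, p=0))  # 1x1 map with d channels, i.e. z
    return sizes


def decoder_sizes(image_size: int) -> List[int]:
    """Trace G_D: a 4x4 transposed conv lifts z (1x1) to 4x4, then stride-2 layers double the side."""
    _check(image_size)
    sizes = [1, conv_transpose_out(1, k=4, s=1, p=0)]
    while sizes[-1] < image_size:
        sizes.append(conv_transpose_out(sizes[-1]))
    return sizes


print(encoder_sizes(32), decoder_sizes(32))  # [32, 16, 8, 4, 1] [1, 4, 8, 16, 32]
print(encoder_sizes(64))                     # [64, 32, 16, 8, 4, 1]
```

Because $E$ shares the architecture of $G_E$, `encoder_sizes` would also describe how $\hat{x}$ is compressed to $\hat{z}$ under the same assumptions.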
The second sub-network is the encoder network $E$ that compresses the image $\hat{x}$ reconstructed by the network $G$. With a different parametrization, it has the same architectural details as $G_E$. $E$ downscales $\hat{x}$ to find its feature representation $\hat{z} = E(\hat{x})$. The dimension of the vector $\hat{z}$ is the same as that of $z$ for consistent comparison. This sub-network is one of the unique parts of the proposed approach. Unlike prior autoencoder-based approaches, in which the minimization of the latent vectors is achieved via the bottleneck features, this sub-network $E$ explicitly learns, through its own parametrization, to minimize the distance between $z$ and $\hat{z}$. At test time, moreover, anomaly detection is performed using this distance.
The third sub-network is the discriminator network $D$, whose objective is to classify the input $x$ and the output $\hat{x}$ as real or fake, respectively. This sub-network is the standard discriminator network introduced in DCGAN [34].
Having defined our overall multi-network architecture, as depicted in Figure 2, we now move on to discuss how we formulate our objective for learning.
3.2 Model Training
We hypothesize that when an abnormal image is forward-passed into the network $G$, $G_D$ is not able to reconstruct the abnormalities even though $G_E$ manages to map the input $X$ to the latent vector $z$. This is because the network is modeled only on normal samples during training, and its parametrization is not suitable for generating abnormal samples. An output $\hat{X}$ that has missed abnormalities can lead to the encoder network $E$ mapping $\hat{X}$ to a vector $\hat{z}$ that has also missed the abnormal feature representation, causing dissimilarity between $z$ and $\hat{z}$. When there is such dissimilarity within latent vector space for an input image $X$, the model classifies $X$ as an anomalous image. To validate this hypothesis, we formulate our objective function by combining three loss functions, each of which optimizes individual sub-networks.
Adversarial Loss. Following recent anomaly detection approaches [39, 40], we also use feature matching loss for adversarial learning. Proposed by Salimans et al. [38], feature matching is shown to reduce the instability of GAN training. Unlike the vanilla GAN, where $G$ is updated based on the output of $D$ (real/fake), here we update $G$ based on the internal representation of $D$. Formally, let $f$ be a function that outputs an intermediate layer of the discriminator $D$ for a given input $x$ drawn from the input data distribution $p_X$; feature matching computes the $L_2$ distance between the feature representations of the original and the generated images, respectively. Hence, our adversarial loss $\mathcal{L}_{adv}$ is defined as:
$$\mathcal{L}_{adv} = \mathbb{E}_{x \sim p_X} \left\| f(x) - \mathbb{E}_{x \sim p_X} f(G(x)) \right\|_2 \qquad (1)$$
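As a sketch of how Eq. (1) can be estimated on a mini-batch, the Python below works on feature vectors that are assumed to have already been extracted from an intermediate layer of $D$, and replaces each expectation with a batch mean. The function names are hypothetical. It follows Eq. (1) as written, keeping $f(x)$ per sample; the original feature-matching objective of Salimans et al. [38] instead compares the two expected feature vectors with a squared norm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors (a batch estimate of an expectation)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def adversarial_loss(f_real: Sequence[Vector], f_fake: Sequence[Vector]) -> float:
    """Batch estimate of Eq. (1): mean_i || f(x_i) - mean_j f(G(x_j)) ||_2.

    f_real[i] stands for f(x_i) and f_fake[j] for f(G(x_j)), i.e. activations
    of an intermediate discriminator layer, flattened to plain vectors.
    """
    fake_mean = mean_vector(f_fake)
    return sum(math.dist(fr, fake_mean) for fr in f_real) / len(f_real)


# Toy 2-D features: both real vectors lie at distance 1 from the fake mean [2, 0].
print(adversarial_loss([[1.0, 0.0], [3.0, 0.0]], [[2.0, 0.0], [2.0, 0.0]]))  # 1.0
```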
Contextual Loss. The adversarial loss $\mathcal{L}_{adv}$ is adequate to fool the discriminator $D$ with generated samples. However, with only an adversarial loss, the generator is not optimized towards learning contextual information about the input data. It has been shown that penalizing the generator by measuring the distance between the input and the generated images remedies this issue [21]. Isola et al. [21] show that the use of $L_1$ yields less blurry results than $L_2$. Hence, we also penalize $G$ by measuring the $L_1$ distance between the original $x$ and the generated images ($\hat{x} = G(x)$) using a contextual loss $\mathcal{L}_{con}$ defined as:
$$\mathcal{L}_{con} = \mathbb{E}_{x \sim p_X} \left\| x - G(x) \right\|_1 \qquad (2)$$
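A corresponding mini-batch sketch of Eq. (2), assuming images are flattened into plain lists of pixel values; the helper names are hypothetical. It sums absolute differences per image, as $\|\cdot\|_1$ does. Dividing by the number of pixels instead would only rescale the loss, which would in turn change the effective meaning of the contextual weight in the overall objective.

```python
from typing import Sequence

Image = Sequence[float]  # a flattened image, e.g. w * h * c pixel values


def l1_distance(a: Image, b: Image) -> float:
    """||a - b||_1 for two flattened images of equal size."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b))


def contextual_loss(images: Sequence[Image], reconstructions: Sequence[Image]) -> float:
    """Batch estimate of Eq. (2): mean over the batch of ||x - G(x)||_1."""
    distances = [l1_distance(x, x_hat) for x, x_hat in zip(images, reconstructions)]
    return sum(distances) / len(distances)


# Two toy 2x2 grayscale images; only the first one is imperfectly reconstructed.
x = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
g_x = [[0.0, 0.5, 1.0, 0.5], [1.0, 1.0, 1.0, 1.0]]
print(contextual_loss(x, g_x))  # 0.5
```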
Encoder Loss. The two losses introduced above can enforce the generator to produce images that are not only realistic but also contextually sound. Moreover, we employ an additional encoder loss $\mathcal{L}_{enc}$ to minimize the distance between the bottleneck features of the input ($z = G_E(x)$) and the encoded features of the generated image ($\hat{z} = E(G(x))$). $\mathcal{L}_{enc}$ is formally defined as:
$$\mathcal{L}_{enc} = \mathbb{E}_{x \sim p_X} \left\| G_E(x) - E(G(x)) \right\|_2 \qquad (3)$$
In so doing, the generator learns how to encode features of the generated image for normal samples. For anomalous inputs, however, it will fail to minimize the distance between the input and the generated images in the feature space, since both $G$ and $E$ are optimized towards normal samples only.
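The encoder loss differs from the contextual loss only in where the distance is measured (latent rather than image space) and in using the $L_2$ norm. Below is a minimal sketch under the same batch-mean assumption, with $z$ and $\hat{z}$ given as precomputed latent vectors; the function name is hypothetical.

```python
import math
from typing import Sequence


def encoder_loss(z_batch: Sequence[Sequence[float]],
                 z_hat_batch: Sequence[Sequence[float]]) -> float:
    """Batch estimate of Eq. (3): mean over the batch of ||G_E(x) - E(G(x))||_2.

    z_batch[i] stands for z_i = G_E(x_i); z_hat_batch[i] for z_hat_i = E(G(x_i)).
    """
    distances = [math.dist(z, z_hat) for z, z_hat in zip(z_batch, z_hat_batch)]
    return sum(distances) / len(distances)


# Toy d = 2 latents: the first pair is 5 apart, the second pair coincides.
print(encoder_loss([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]]))  # 2.5
```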
Overall, our objective function for the generator becomes the following:
$$\mathcal{L} = w_{adv}\,\mathcal{L}_{adv} + w_{con}\,\mathcal{L}_{con} + w_{enc}\,\mathcal{L}_{enc} \qquad (4)$$
where $w_{adv}$, $w_{con}$ and $w_{enc}$ are the weighting parameters adjusting the impact of individual losses on the overall objective function.
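Eq. (4) is a plain weighted sum. In the sketch below, the default weights are taken from the values reported in Section 4 (w_bce = 1, w_rec = 50, w_enc = 1), under the assumption, not stated explicitly in the text, that w_bce and w_rec there correspond to $w_{adv}$ and $w_{con}$ here.

```python
def generator_loss(l_adv: float, l_con: float, l_enc: float,
                   w_adv: float = 1.0, w_con: float = 50.0, w_enc: float = 1.0) -> float:
    """Eq. (4): L = w_adv * L_adv + w_con * L_con + w_enc * L_enc.

    Defaults assume w_bce -> w_adv and w_rec -> w_con from the reported settings.
    """
    return w_adv * l_adv + w_con * l_con + w_enc * l_enc


# With w_con = 50 the contextual term dominates: 0.25 + 25.0 + 0.125.
print(generator_loss(l_adv=0.25, l_con=0.5, l_enc=0.125))  # 25.375
```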
[Fig. 3 diagram; panel C (GANomaly) shows x → G_E(x) → z → G_D(z) → x' → E(x') → z', with a discriminator D(x, x') outputting real/fake.]
Fig. 3. Comparison of the three models. A) AnoGAN [39], B) Efficient-GAN-Anomaly [40], C) Our Approach: GANomaly.
3.3 Model Testing
During the test stage, the model uses $\mathcal{L}_{enc}$ given in Eq. 3 for scoring the abnormality of a given image. Hence, for a test sample $\hat{x}$, our anomaly score $A(\hat{x})$, or $s_{\hat{x}}$, is defined as:
$$A(\hat{x}) = \left\| G_E(\hat{x}) - E(G(\hat{x})) \right\|_1 \qquad (5)$$
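Scoring a single test image with Eq. (5) then reduces to an $L_1$ distance between two latent vectors. In this sketch, `z` and `z_hat` are assumed to be the already computed outputs $G_E(\hat{x})$ and $E(G(\hat{x}))$; the function name is hypothetical. Note that Eq. (5) uses the $L_1$ norm, whereas the training-time encoder loss in Eq. (3) uses $L_2$.

```python
from typing import Sequence


def anomaly_score(z: Sequence[float], z_hat: Sequence[float]) -> float:
    """Eq. (5): A(x_hat) = ||G_E(x_hat) - E(G(x_hat))||_1 for one test image."""
    if len(z) != len(z_hat):
        raise ValueError("z and z_hat must share the latent dimension d")
    return sum(abs(a - b) for a, b in zip(z, z_hat))


print(anomaly_score([0.5, -1.0, 2.0], [0.5, 1.0, 1.0]))  # 3.0
```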
To evaluate the overall anomaly detection performance, we compute the anomaly score for each individual test sample $\hat{x}$ within the test set $\hat{D}$, which in turn yields a set of anomaly scores $S = \{ s_i : s_i = A(\hat{x}_i),\ \hat{x}_i \in \hat{D} \}$. We then apply feature scaling to bring the anomaly scores within the range $[0, 1]$:
$$s'_i = \frac{s_i - \min(S)}{\max(S) - \min(S)} \qquad (6)$$
The use of Eq. 6 ultimately yields an anomaly score vector $S'$ for the final evaluation of the test set $\hat{D}$.
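Eq. (6) is ordinary min-max scaling over the whole test set, and the sketch below mirrors it. Returning zeros when all scores are equal (where Eq. (6) would divide by zero) is our own assumption. Because the mapping is monotonic, it leaves the AUC unchanged, but a fixed threshold φ applied to the scaled scores depends on the minimum and maximum of the particular test set.

```python
from typing import List, Sequence


def min_max_scale(scores: Sequence[float]) -> List[float]:
    """Eq. (6): s'_i = (s_i - min(S)) / (max(S) - min(S))."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate case, undefined in Eq. (6); mapped to 0 by assumption
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```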
4 Experimental Setup
To evaluate our anomaly detection framework, we use three types of datasets, ranging from the simplistic benchmark of MNIST [25] and the reference benchmark of CIFAR [24] to the operational context of anomaly detection within X-ray security screening [5].
MNIST. To replicate the results presented in [40], we first experiment on MNIST data [25] by treating one class as the anomaly, while the rest of the classes are considered as the normal class. In total, we have ten sets of data, each of which considers an individual digit as the anomaly.
CIFAR10. Within our use of the CIFAR dataset, we again treat one class
as abnormal and the rest as normal. We then detect the outlier anomalies as
instances drawn from the former class by training the model on the latter labels.
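The one-class-out protocol used for MNIST and CIFAR10 can be sketched as a relabelling step: for each experiment, one original class becomes abnormal and all others normal. The function name is hypothetical, and the convention that label 1 marks an anomaly is an assumption.

```python
from typing import List, Sequence


def one_class_labels(class_ids: Sequence[int], anomaly_class: int) -> List[int]:
    """Label samples of `anomaly_class` as abnormal (1) and every other class as normal (0)."""
    return [1 if c == anomaly_class else 0 for c in class_ids]


# One toy experiment with digit 2 designated as the anomaly.
print(one_class_labels([3, 2, 7, 2, 0], anomaly_class=2))  # [0, 1, 0, 1, 0]

# Ten MNIST experiments: each digit in turn is designated as the anomaly.
experiments = {k: one_class_labels(range(10), anomaly_class=k) for k in range(10)}
```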
University Baggage Anomaly Dataset (UBA). This sliding-window, patch-based dataset comprises 230,275 image patches. Normal samples are extracted via an overlapping sliding window from a full X-ray image, constructed using single conventional X-ray imagery with associated false color materials mapping from dual-energy [36]. The abnormal class (122,803 patches) consists of three sub-classes, knife (63,496), gun (45,855) and gun component (13,452), which contain manually cropped threat objects together with sliding window patches whose intersection over union with the ground truth is greater than 0.3.
Full Firearm vs. Operational Benign (FFOB). In addition to these datasets, we also use the UK government evaluation dataset [1], comprising both expertly concealed firearm (threat) items and operational benign (non-threat) imagery from commercial X-ray security screening operations (baggage/parcels). Denoted as FFOB, this dataset comprises 4,680 full-weapon firearm images as the abnormal class and 67,672 operational benign images as the normal class.
The procedure for the train/test split of the above datasets is as follows: we split the normal samples such that 80% and 20% of the samples are considered as part of the train and test sets, respectively. We then resize MNIST to 32 × 32, and UBA and FFOB to 64 × 64.
[Fig. 4 plots: AUC against (a) the MNIST digit designated as anomalous class (0 to 9) and (b) the CIFAR10 class designated as anomalous class (plane, car, bird, cat, deer, dog, frog, horse, ship, truck); methods: GANomaly, EGBAD [40], AnoGAN [39], VAE [6].]
Fig. 4. Results for MNIST (a) and CIFAR (b) datasets. Variations due to the use of 3 different random seeds are depicted via error bars. All but GANomaly results in (a) were obtained from [40].
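A sketch of this split follows. It assumes that every abnormal sample is placed in the test set, which the normal-only training implies but the text does not state explicitly. The fixed random seed, the shuffling and the function name are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_one_class(normal: Sequence[T], abnormal: Sequence[T],
                    train_frac: float = 0.8, seed: int = 0
                    ) -> Tuple[List[T], List[Tuple[T, int]]]:
    """Return (train, test): train holds normal samples only; test holds (sample, label) pairs."""
    indices = list(range(len(normal)))
    random.Random(seed).shuffle(indices)        # reproducible shuffle of the normal set
    n_train = round(train_frac * len(normal))   # 80% of normal samples by default
    train = [normal[i] for i in indices[:n_train]]
    test = [(normal[i], 0) for i in indices[n_train:]]   # remaining 20% normal, label 0
    test += [(sample, 1) for sample in abnormal]         # assumed: all abnormal, label 1
    return train, test


train, test = split_one_class(normal=list(range(10)), abnormal=["a", "b"])
print(len(train), len(test), sum(label for _, label in test))  # 8 4 2
```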
Following Schlegl et al. [39] (AnoGAN) and Zenati et al. [40] (EGBAD), our adversarial training is also based on the standard DCGAN approach [34] for a consistent comparison. As such, we aim to show the superiority of our multi-network architecture without relying on any tricks to improve GAN training. In addition, we also compare our method against the traditional variational autoencoder architecture [6] (VAE) to show the advantage of our multi-network architecture. We implement our approach in PyTorch [32] (v0.4.0 with Python 3.6.5), optimizing the networks using Adam [22] with an initial learning rate lr = 2e-3 and momentum terms β1 = 0.5, β2 = 0.999. Our model is optimized based on the weighted loss L (defined in Equation 4) using the weight values w_bce = 1, w_rec = 50 and w_enc = 1, which were empirically chosen to yield optimum results (Figure 5 (b)). We train the model for 15, 25 and 25 epochs for the MNIST, UBA and FFOB datasets, respectively. Experimentation is performed using a dual-core Intel Xeon E5-2630 v4 processor and an NVIDIA GTX Titan X GPU.
5 Results
We report results based on the area under the curve (AUC) of the Receiver Operating Characteristic (ROC), which plots the true positive rate (TPR) as a function of the false positive rate (FPR), each point being the TPR-FPR pair obtained at a different threshold.
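For reference, the ROC AUC can be computed without explicitly sweeping thresholds, via its equivalence to the probability that a randomly chosen abnormal sample scores higher than a randomly chosen normal one (ties counted as one half). The quadratic-time Python sketch below assumes label 1 marks an anomaly; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (the Mann-Whitney U identity).

    labels: 1 = abnormal, 0 = normal; scores: higher means more anomalous.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```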
Figure 4 (a) presents the results obtained on MNIST data using 3 different random seeds, where we observe the clear superiority of our approach over previous contemporary models [6, 39, 40]. For each digit chosen as anomalous, our model achieves higher AUC than EGBAD [40], AnoGAN [39] and the variational autoencoder pipeline VAE [6]. Given its poor performance on this relatively unchallenging dataset, we do not include VAE in the rest of the experiments.
Figure 4 (b) shows the performance of the models trained on the CIFAR10 dataset. We see that our model achieves the best AUC performance for every class chosen as anomalous. The relatively lower quantitative results on this dataset are due to the fact that, for a selected abnormal category, there exists a normal class that is similar to the abnormal one (plane vs. bird, cat vs. dog, horse vs. deer and car vs. truck).
Method         UBA gun   UBA gun-parts   UBA knife   UBA overall   FFOB full-weapon
AnoGAN [39]    0.598     0.511           0.599       0.569         0.703
EGBAD [40]     0.614     0.591           0.587       0.597         0.712
GANomaly       0.747     0.662           0.520       0.643         0.882
Table 1. AUC results for UBA and FFOB datasets.
For the UBA and FFOB datasets shown in Table 1, our model again outperforms the other approaches, with the exception of the knife class, for which the performance of the models is comparable. The relatively lower performance on this class is attributed to its shape simplicity, which causes overfitting and hence high false positives. For the overall performance, however, our approach surpasses the other models, yielding AUC of 0.643 and 0.882 on the UBA and FFOB datasets, respectively.
Figure 5 depicts how the choice of hyper-parameters ultimately affects the overall performance of the model. In Figure 5 (a), we see that the optimal performance is achieved when the size of the latent vector z is 100 for the MNIST dataset with digit 2 as the abnormal class. Figure 5 (b) demonstrates the impact of tuning the loss function in Equation 4 on the overall performance. The model achieves the highest AUC when w_bce = 1, w_rec = 50 and w_enc = 1. We empirically observe the same tuning pattern for the rest of the datasets.
Figure 6 provides the histogram of the anomaly scores during the inference stage (a) and a t-SNE visualization of the features extracted from the last convolutional layer of the discriminator network (b). Both figures demonstrate a clear separation within the latent vector z and feature f(·) spaces.
Table 2 illustrates the runtime performance of the GAN-based models. Compared to the rest of the approaches, AnoGAN [39] is computationally rather expensive, since optimization of the latent vector is needed for each example. For EGBAD [40], we report runtime performance similar to that of the original paper. Our approach, on the other hand, achieves the best runtime performance. Runtime performance on the UBA and FFOB datasets is comparable to that on MNIST, even though their image and network sizes are double those of MNIST.
[Fig. 5 plots: AUC against the digit designated as anomalous class (0 to 9) and against the weight range (1 to 90) for the loss weights.]
Fig. 5. (a) Overall performance of the model based on varying size of the latent vector z. (b) Impact of weighting the losses on the overall performance. The model is trained on the MNIST dataset with an abnormal digit 2.
[Fig. 6 plots: (a) histogram of scores from 0.0 to 1.0 for normal and abnormal samples; (b) t-SNE embedding of normal and abnormal samples (Perplexity: 40, LR: 140, Iter: 1000).]
Fig. 6. (a) Histogram of the scores for both normal and abnormal test samples. (b) t-SNE visualization of the features extracted from the last conv. layer f(·) of the discriminator.
A set of examples in Figure 7 depicts real and fake images that are, respectively, the input and output of our model. We expect the model to fail when generating anomalous samples. As can be seen in Figure 7(a), this is not the case for the class of 2 in the MNIST data. This stems from the fact that the MNIST dataset is relatively unchallenging, and the model learns sufficient information to be able to generate samples not seen during training. Another conclusion that could be drawn is that distance in the latent vector space provides adequate details for detecting anomalies even though the model cannot distinguish abnormalities in the image space. In contrast to the MNIST experiments, this is not the case for the other datasets: Figures 7 (b-c) illustrate that the model is unable to produce abnormal objects.
Overall, these results suggest that our approach yields both statistically and computationally superior results compared with leading state-of-the-art approaches [39, 40].
Model          MNIST   CIFAR   UBA    FFOB
AnoGAN [39]    7120    7120    7110   7223
EGBAD [40]     8.92    8.71    8.88   8.87
GANomaly       2.79    2.21    2.66   2.53
Table 2. Computational performance of the approaches (runtime in milliseconds).
6 Conclusion
We introduce a novel encoder-decoder-encoder architectural model for general anomaly detection, enabled by an adversarial training framework. Experimentation across dataset benchmarks of varying complexity, and within the operational anomaly detection context of X-ray security screening, shows that the proposed method outperforms both contemporary state-of-the-art GAN-based and traditional autoencoder-based anomaly detection approaches, with the ability to generalize across anomaly detection tasks. Future work will consider employing recent GAN optimizations [7, 17, 38], which are known to improve adversarial training in general.
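The encoder-decoder-encoder data flow named above can be traced with a small stand-in. This is a minimal, untrained sketch that uses fully connected tanh layers with random weights in place of the convolutional sub-networks. It scores each sample by the mean absolute difference between the latent code of the input and the latent code of its reconstruction. That score is proportional to an L1 latent distance and is assumed here rather than quoted from the method. The layer sizes and the names `g_enc`, `g_dec` and `enc` are hypothetical, so the printed scores only illustrate shapes and the scoring step.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, LATENT_DIM = 32 * 32, 100  # hypothetical sizes; Fig. 5(a) varies the latent size


def dense(in_dim, out_dim):
    """Random linear layer followed by tanh, standing in for a trained sub-network."""
    w = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return lambda h: np.tanh(h @ w)


g_enc = dense(IMG_DIM, LATENT_DIM)  # first encoder: image -> latent z
g_dec = dense(LATENT_DIM, IMG_DIM)  # decoder: latent z -> reconstructed image
enc = dense(IMG_DIM, LATENT_DIM)    # second encoder (separate weights): reconstruction -> z_hat


def forward(x):
    """Run x through encoder-decoder-encoder; return (z, x_hat, z_hat)."""
    z = g_enc(x)
    x_hat = g_dec(z)
    z_hat = enc(x_hat)
    return z, x_hat, z_hat


def anomaly_score(x):
    """Per-sample mean absolute difference between z and z_hat."""
    z, _, z_hat = forward(x)
    return np.abs(z - z_hat).mean(axis=1)


images = rng.normal(size=(4, IMG_DIM))  # 4 hypothetical flattened images
print(anomaly_score(images))
```

In a trained model the weights would come from adversarial training on normal data only. A larger latent discrepancy would then flag a likely anomaly.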
References
1. OSCT Borders X-ray Image Library, UK Home Office Centre for Applied Science
and Technology (CAST). Publication Number: 146/16 (2016)
2. Abdallah, A., Maarof, M.A., Zainal, A.: Fraud detection system: A survey. Journal of Network and Computer Applications 68, 90–113 (jun 2016). https://doi.org/10.1016/J.JNCA.2016.04.007, https://www.sciencedirect.com/science/article/pii/S1084804516300571
3. Ahmed, M., Mahmood, A.N., Islam, M.R.: A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems 55, 278–288 (feb 2016). https://doi.org/10.1016/J.FUTURE.2015.01.001, https://www.sciencedirect.com/science/article/pii/S0167739X15000023
4. Ahmed, M., Naser Mahmood, A., Hu, J.: A survey of network anomaly detection techniques. Journal of Network and Computer Applications 60, 19–31 (jan 2016). https://doi.org/10.1016/J.JNCA.2015.11.016, https://www.sciencedirect.com/science/article/pii/S1084804515002891
5. Akcay, S., Kundegorski, M.E., Willcocks, C.G., Breckon, T.P.: Using deep convolutional neural network architectures for object classification and detection within x-ray baggage security imagery. IEEE Transactions on Information Forensics and Security 13(9), 2203–2215 (Sept 2018). https://doi.org/10.1109/TIFS.2018.2812196
[Fig. 7 image grid: Real and Fake columns for the MNIST, CIFAR, FFOB and UBAgun datasets]
Fig. 7. Exemplar real and generated samples containing normal and abnormal objects in each dataset. The model fails to generate the abnormal samples it was not trained on.
6. An, J., Cho, S.: Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2, 1–18 (2015)
7. Arjovsky, M., Bottou, L.: Towards Principled Methods for Training Generative Adversarial Networks. In: 2017 ICLR (April 2017), http://arxiv.org/abs/1701.04862
8. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning. pp. 214–223. Sydney, Australia (06–11 Aug 2017), http://proceedings.mlr.press/v70/arjovsky17a.html
9. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection. ACM Computing
Surveys 41(3), 1–58 (jul 2009). https://doi.org/10.1145/1541880.1541882
10. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In: Advances in Neural Information Processing Systems. pp. 2172–2180 (2016)
11. Creswell, A., Bharath, A.A.: Inverting the generator of a generative adversarial
network (ii). arXiv preprint arXiv:1802.05701 (2018)
12. Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., Bharath, A.A.: Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35(1), 53–65 (2018)
13. Dimokranitou, A.: Adversarial Autoencoders for Anomalous Event Detection in
Images. Ph.D. thesis, Purdue University (2017)
14. Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial Feature Learning. In: International Conference on Learning Representations (ICLR). Toulon, France (apr 2017), http://arxiv.org/abs/1605.09782
15. Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M.,
Courville, A.: Adversarially learned inference. In: ICLR (2017)
16. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural
information processing systems. pp. 2672–2680 (2014)
17. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems. pp. 5767–5777 (2017)
18. Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S.: Learning
temporal regularity in video sequences. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. pp. 733–742 (2016)
19. Hodge, V., Austin, J.: A Survey of Outlier Detection Methodologies. Artificial Intelligence Review 22(2), 85–126 (oct 2004). https://doi.org/10.1023/B:AIRE.0000045502.10941.a9, http://link.springer.com/10.1023/B:AIRE.0000045502.10941.a9
20. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training
by reducing internal covariate shift. In: Proceedings of the 32nd International
Conference on Machine Learning. pp. 448–456. Lille, France (07–09 Jul 2015),
http://proceedings.mlr.press/v37/ioffe15.html
21. Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5967–5976 (July 2017). https://doi.org/10.1109/CVPR.2017.632
22. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
23. Kiran, B.R., Thomas, D.M., Parakkal, R.: An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos. Journal of Imaging 4(2), 36 (2018)
24. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images.
Tech. rep., Citeseer (2009)
25. LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010), http://yann.lecun.com/exdb/mnist/
26. Lipton, Z.C., Tripathi, S.: Precise recovery of latent vectors from generative adversarial networks. In: ICLR Workshop (2017)
27. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. In: ICLR (2016)
28. Markou, M., Singh, S.: Novelty detection: a review—part 1: statistical approaches. Signal Processing 83(12), 2481–2497 (dec 2003). https://doi.org/10.1016/J.SIGPRO.2003.07.018, https://www.sciencedirect.com/science/article/pii/S0165168403002020
29. Markou, M., Singh, S.: Novelty detection: a review—part 2: neural network based approaches. Signal Processing 83(12), 2499–2521 (dec 2003). https://doi.org/10.1016/J.SIGPRO.2003.07.019, https://www.sciencedirect.com/science/article/pii/S0165168403002032
30. Medel, J.R., Savakis, A.: Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks. CoRR abs/1612.0 (dec 2016)
31. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint
arXiv:1411.1784 (2014)
32. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z.,
Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
33. Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty
detection. Signal Processing 99, 215–249 (2014)
34. Radford, A., Metz, L., Chintala, S.: Unsupervised Representation Learning with
Deep Convolutional Generative Adversarial Networks. In: ICLR (2016)
35. Ravanbakhsh, M., Sangineto, E., Nabi, M., Sebe, N.: Training Adversarial Discriminators for Cross-channel Abnormal Event Detection in Crowds. CoRR abs/1706.0 (jun 2017), http://arxiv.org/abs/1706.07680
36. Rogers, T.W., Jaccard, N., Morton, E.J., Griffin, L.D.: Automated x-ray image
analysis for cargo security: critical review and future promise. Journal of X-ray
science and technology (Preprint), 1–24 (2016)
37. Sabokrou, M., Fathy, M., Hoseini, M., Klette, R.: Real-time anomaly detection and localization in crowded scenes. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) pp. 56–62 (2015). https://doi.org/10.1109/CVPRW.2015.7301284, http://ieeexplore.ieee.org/document/7301284/
38. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems. pp. 2234–2242 (2016)
39. Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10265 LNCS, 146–147 (2017). https://doi.org/10.1007/978-3-319-59050-9_12
40. Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R.: Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222 (2018)