A Survey on Assessing the Generalization Envelope of Deep Neural Networks:
Predictive Uncertainty, Out-of-distribution and Adversarial Samples
Julia Lust and Alexandru P. Condurache
Robert Bosch GmbH, Automated Driving Research
University of Lübeck, Institute of Signal Processing
JuliaRebecca.Lust, AlexandruPaul.Condurache@de.bosch.com
Abstract
Deep Neural Networks (DNNs) achieve state-of-the-art
performance on numerous applications. However, it is dif-
ficult to tell beforehand if a DNN receiving an input will
deliver the correct output since their decision criteria are
usually nontransparent. A DNN delivers the correct output
if the input is within the area enclosed by its generalization
envelope. In this case, the information contained in the in-
put sample is processed reasonably by the network. It is
of large practical importance to assess at inference time if
a DNN generalizes correctly. Currently, the approaches to
achieve this goal are investigated in different problem set-
ups rather independently from one another, leading to three
main research and literature fields: predictive uncertainty,
out-of-distribution detection and adversarial example de-
tection. This survey connects the three fields within the
larger framework of investigating the generalization per-
formance of machine learning methods and in particular
DNNs. We underline the common ground, point at the most
promising approaches and give a structured overview of the
methods that provide at inference time means to establish if
the current input is within the generalization envelope of a
DNN.
1. Introduction
Generalization is the ability of a classifier to correctly
predict the class of previously unseen data points. In machine
learning the classifier’s decisions are derived from a train-
ing set. How well the derived decision criteria generalize is
tested by using an additional test set containing unseen data,
which is sampled from the same distribution as the training
set. A good performance on the training set in combination
with a poor performance on the test set is usually related to
a lack of generalization ability. After the development of a
classifier is finished, a poor performance on a data sample
at inference time can often be traced back to a distribution
shift between the development and inference data. For im-
proved classification performance it is important to detect
such situations in which the input has left the generalization
area of the classifier.
Statistical analysis of the generalization potential of a clas-
sifier for the purpose of optimizing its parameters during
training has led to well-known results such as the Proba-
bly Approximately Correct (PAC) Theory [76]. Generaliza-
tion was investigated as a combination of properly sampled
training data and simpler decision methods.
Recently, Deep Neural Networks (DNNs) have become the most successful machine learning methods for various tasks such as computer vision, speech recognition and object detection [16], [43], [24]. DNNs make use of non-linearities in com-
bination with simple matrix multiplications and an efficient
data driven training procedure. This allows for complex
deep structures and hence grasping the connection between
the input and the output is usually not trivial. The deeper
and hence more complex the neural network is, the less
transparent and comprehensible is its correlation-based be-
havior. Usually humans, using their mostly causality-based
intuition, are not able to tell why a DNN produced a certain
output, which would be a major step in understanding their
generalization behavior.
A possible solution to this dilemma is to decide for each in-
put at inference time, if the DNN is likely to predict the cor-
rect output. Consequently research currently focuses on de-
veloping dedicated, separated generalization-detector meth-
ods that detect if the current input is within the correspond-
ing generalization area of DNNs. In general this is done
by analyzing the relationship between the input sample and
the training set, eventually complemented by an analysis of the information flow within the network. These approaches
largely replaced methods where a confidence in the result is
computed from the DNN themselves (e.g., with the help of a
softmax layer), as these tend to be overconfident [27], [62].
A generalization detector is illustrated in Figure 1. A DNN
receives an input on which it performs its classification task.
The generalization detector now has to decide, based on the
Figure 1. A detector deciding at inference time for a given data
sample if it is within the generalization area of a DNN.
input and the behavior of the DNN, if this input is within
the generalization area. The intention of this paper is to give
an overview and compare methods that decide at inference
time for a given data sample if it is within the generaliza-
tion area of a DNN. Even though this setup is valid for any
application of DNNs, we concentrate here on image classification, as it is currently one of the best-investigated areas in that field.
The current literature in the field of generalization detec-
tion methods at inference time can be split into three main
fields: predictive uncertainty, out-of-distribution detection
and adversarial example detection. Predictive uncertainty
mainly concentrates on set-ups involving at inference time
data sampled from the same distribution as the training data.
The goal is to assign a high uncertainty to the samples
that lead to a misclassification. Out-of-distribution detec-
tion deals with data that is different to the training data in
a principled manner. The goal here is to detect such data
at inference time. The adversarial example detection field
concentrates on detecting samples that are carefully gener-
ated in order to fool the DNN.
Until now, those research areas have usually been treated separately, rather ignoring their common root cause of deficient generalization performance. Methods belonging to each group are only evaluated on their corresponding setup. We
strive here for a complete overview of all different sectors
and their corresponding methods on the grounds of their
common generalization-related root cause.
In Section 2 we introduce relevant concepts and the mathe-
matical definitions necessary to explain the detection meth-
ods surveyed in Section 4. The related work is described in
Section 3. We list different properties and discuss similari-
ties and differences across the surveyed fields in Section 5.
Finally we draw our conclusion in Section 6.
2. Preliminaries
In this section, we first give an overview on generaliza-
tion and we describe the setup of our survey focusing on
concepts and mathematical notations.
2.1. Generalization Overview
Consider a classifier that is trained on a training set sam-
pled from an unknown distribution. We do not know a-
priori if the training set represents a proper sample or not
- where a proper sample is one that allows us to perfectly
reconstruct the unknown distribution. We wish that the clas-
sifier that we devise using the potentially improper training
sample is able to generalize, which means that its perfor-
mance at inference time is good on any sample of the distri-
bution.
One possibility to obtain good generalization performance
is to use a stable classifier. A stable classifier has limited
variability with respect to the training set. Thus similar clas-
sifiers are found by a convergent training procedure even
when the training data is (very) different. Not being able to
closely ’follow’ the training set, this classifier will hardly
overfit and at the same time, the chances that the perfor-
mance observed on the training set is representative for the
performance on all data are higher. From a certain point
of view, such a classifier has a good generalization perfor-
mance in the sense that there is a good chance that the error
on all data can be estimated using only the training set, even
if this error is large and thus its classification performance
on any sample of the distribution is low. We propose to call this property of being able to estimate the true performance using only the training set stability, to keep it separate from generalization as introduced earlier.
The usual way to obtain a stable classifier is to limit the
hypothesis space that it covers. The parameters of the clas-
sifier are then found such as to minimize the empirical error
computed on the training set. Clearly, this approximation is
better the more (properly sampled) data is available. Thus,
in this case generalization is understood mainly in relation
to properties of the function implementing the classifier and
also in relation to the empirical error. In practice we need to
control the modeling capacity of the function space to which
the classifier belongs in relation to the cardinality of the
training sample (assuming it is a proper sample), while min-
imizing the training error. Should we succeed and achieve
minimal (ideally zero) training error, we can assume that we
have found the correct solution to our classification prob-
lem, as the classifier does not use its modeling capacity to
learn by heart and generalizes well in the sense that it has
good classification performance on unseen data. This ap-
proach with its bias-variance trade-off flavor, inspired the
support vector machines [79], [78] whose linear decision
surface is constructed to ensure high generalization ability
[13].
In the DNN context, the modeling capacity is infinite [31],
but still we would want it to generalize, i.e., perform well on
previously unobserved input data [5]. During training the
generalization ability is approximated mainly by the empir-
ical error.

Figure 2. A feature space at inference time with a possible generalization area (green) for a classifier (orange) trained on a two-class problem. For further details see Section 2.2.

Particular design choices concerning the architecture of the DNN solution and the training procedure are made such as to further improve the generalization. These ensure good generalization despite infinite
model capacity and limited training data. The validity of
these design choices is typically investigated with the help
of validation data sampled from the unknown distribution
independently of the development data set, which consists
of the train and the test set. The DNNs are able to generalize
as they learn a feature representation optimally suited to the
classification problem they need to solve, assuming that the
classification problem has a solution and the training data
represents a proper sample.
2.2. Concepts
Consider a classification problem in which, given separa-
ble and unambiguous realizations of several different con-
cepts, we are supposed to group realizations of the same
concept into a corresponding class. We assume that the realizations of the concepts are observed by means of a system that produces images, which are available to us, so that we have to solve an image classification problem. For such a prob-
lem the corresponding problem space consists of all images
that depict valid realizations of concepts belonging to a pre-
defined set-up such that each image can be meaningfully
assigned to one of the classes. Such a set-up comes with
an underlying distribution statistically describing the occurrence of the images.
The problem space usually consists of an infinite number of
images. However, in order to develop a classifier a finite set
of images is necessary. Therefore, a problem space sample
is drawn from the problem space. This problem space sam-
ple is used as the training data set for a classifier. We will fo-
cus on DNNs as classifiers. The classification DNN learns
a feature space and a classification boundary. In the feature space, the images can be separated into their classes more easily than in the vector space of the input image data. The
aim is generalizing from the training set to any sample from
the problem space distribution. The generalization error is
estimated using a test set which usually is another subset
sampled from the problem space, independent of the train-
ing set.
The generalization envelope of a DNN encloses the gen-
eralization area, i.e., the region of the problem space for
which the network decides reasonably correctly on both previously seen and unseen input samples at inference time. A
misclassified input sample at inference-time is called a gen-
eralization error.
There are two main sources that can lead to a generaliza-
tion error. One source is the model itself. In this case the
architecture or the training procedure or both do not allow
to learn a feature space and a decision boundary such that
the data can be separated. A large enough DNN has infi-
nite model capacity. Therefore the architecture and training
procedure need to be chosen wisely such as to successfully
generalize from the training sample and shatter the entire
problem space. The second source of error stems from the
training set, when this does not cover the problem space
properly, in the sense that successful generalization is im-
possible. There may be several issues here: the problem-
space sample may be sparse, it may cover only a region of
the problem space, it may not reflect the distribution and/or
extent of each concept correctly, etc.
Consequently, the uncertainty of a prediction can be sep-
arated into epistemic, which refers to uncertainty in the
model predictions, and aleatoric, which captures uncer-
tainty pertaining to the available problem-space sample. In
practice it is often not possible to decide which kind of un-
certainty caused a misclassification. The uncertainty information cannot be disentangled and therefore most uncertainty predictors are constructed to measure the combined uncer-
tainty. Hence, in the following the term uncertainty includes
epistemic as well as aleatoric uncertainty. As usual in the
literature, we also use the term confidence as an antonym for uncertainty: a high confidence implies a low uncertainty and vice versa. If the DNN's output for an input x has a high uncertainty, we assign a higher probability to the event that x lies outside the generalization area. Conversely, a low uncertainty does not necessarily imply a smaller probability that the corresponding input lies outside the generalization area.
To illustrate our set-up, Figure 2 depicts an example of a
two dimensional feature space for a classification problem.
The blue and red dots represent inference time data of two
classes. A DNN is trained on a problem space sample and
finds the orange decision boundary which splits the whole
input space into two areas in which the data would be classified, such that the area to the left corresponds to the red class and the area to the right to the blue class.
The area in which the network is able to predict correctly
is the generalization area visualized in green. Conversely,
a correct decision given the combination of training data,
classifier architecture and training process can not be guar-
anteed within the white area. As depicted in Figure 2, there are typically three main types of data samples that are misclassified at inference time: predictive uncertainty samples close to the separation surface; out-of-distribution samples located outside the problem space or far in the tail of the problem space distribution as approximated from the problem space sample; and adversarial samples, which are samples from within or outside the problem space purposefully selected or constructed such that a misclassification occurs and which can therefore be located in all regions outside the generalization area.
As visualized in Figure 2 predictive uncertainty samples are
observed in regions close to a decision boundary. Such gen-
eralization errors are caused by a slightly misplaced decision boundary or by noise in the data that leads to a small misplacement of the sample in the feature space.
Out-of-distribution samples are often referred to as anoma-
lies. They are different in a general way from the training
samples. Either they are sampled from a part of the problem
space not covered by the distribution estimated after train-
ing or they do not even belong to the problem space. Some
possible out-of-distribution samples are marked at the right
hand side of Figure 2. The purple color marks the fact that
they are not in the problem space.
The third class are adversarial examples. They are con-
structed in order to fool the DNN by performing small
changes on an image such that the sample is shifted to the
wrong side of the decision boundary in the feature space
which leads to a misclassification. Depending on their con-
struction method adversarial examples can be found among
out-of-distribution and predictive uncertainty samples, but
also in low probability pockets [59], as exemplarily visualized in the center of the red class area. Here, training data is sparse and therefore the region was not assigned correctly, i.e., to the blue class.¹
The literature on the detection of samples outside the gener-
alization area may be divided into the described three main
fields: predictive uncertainty, out-of-distribution detection
and adversarial example detection. Each field relates to a
part of the question how to determine if an input has left the
generalization area of a DNN. Further relations and spe-
cific technical definitions in each field are introduced and
described respectively in Section 4.1, 4.2 and 4.3.
Table 1. Nomenclature

$x$: input data, in our case an image $x \in \mathbb{R}^n$
$l$: class label for an image, $l \in \{1, \ldots, m\}$, where $m$ is the number of possible classes
$F(\cdot)$: DNN for classification, $F: \mathbb{R}^n \to [0,1]^m$, $x \mapsto F(x)$ s.t. $\sum_{i=1}^{m} F(x)_i = 1$
$y$: predicted class, $y \in \{1, \ldots, m\}$, $y = \operatorname{argmax}_{i \in \{1, \ldots, m\}} F(x)_i$
$L(\cdot,\cdot)$: loss function, $L: [0,1]^m \times [0,1]^m \to \mathbb{R}$, $(F(x), l) \mapsto L(F(x), l)$; compares the output $F(x)$ of $x$ to the label $l$ to train the DNN
$\theta$: parameters of a DNN
$D_F(\cdot)$: detector, $D_F: \mathbb{R}^n \to \mathbb{R}$, $x \mapsto D_F(x)$; predicts if $x$ is within ($D_F(x) < T$) or outside ($D_F(x) \geq T$) the generalization area of $F$; $T$ is a predefined threshold
2.3. Mathematical notation
For quick reference we have gathered in Table 1 the most
important mathematical terms that we use. In the following
we describe these in more detail.
In the classification setup a DNN is a function $F(\cdot)$ that computes for an input $x \in \mathbb{R}^n$ an output $F(x) \in \mathbb{R}^m$. The output $F(x)$ contains scores, one for each possible class. The maximal score defines the predicted class label $y$:

$$y = \operatorname{argmax}_{i \in \{1, \ldots, m\}} F(x)_i .$$

Often, the last layer contains a softmax function that normalizes the final class scores:

$$F(x) \in [0,1]^m \quad \text{s.t.} \quad \sum_{i=1}^{m} F(x)_i = 1 .$$

Each DNN consists of several layers $f^{(j)}$, $j \in \{1, \ldots, k\}$:

$$F(x) = f^{(k)}\big(\ldots f^{(2)}(f^{(1)}(x))\big) .$$
Usually each layer is a function on the output of the pre-
vious layer. Depending on the architecture and application
area of the DNN, there are different kinds of layers, e.g.
fully connected, convolutional or pooling layers. Often, the
last layer contains a softmax activation. The parameters
of these layers are the network’s weights Θ. The weights
are optimized using a training set that contains input-output
pairs $(x, l)$. The output labels $l \in \{1, \ldots, m\}$ hold the correct class for the inputs $x$. During training, the weights $\Theta$ are iteratively updated such that the loss $L(F(x), l)$, which compares the network output to the true label for samples from the training set, is reduced.

¹Samples of incorrectly assigned low probability pockets do not have to be adversarial.
The task for our work is to find a detector method $D_F(\cdot)$ that states at inference time for any possible input $x$ if it is within the generalization area of the DNN $F(\cdot)$. Based on the input $x$, the way the DNN processes $x$, and the output $F(x)$, the detector $D_F(\cdot)$ returns a value $D_F(x) \in \mathbb{R}$, which in combination with a threshold $T \in \mathbb{R}$ defines if $x$ is expected to be within the generalization area or not:

$$D_F(x) < T \;\Rightarrow\; x \text{ within generalization area}$$
$$D_F(x) \geq T \;\Rightarrow\; x \text{ not within generalization area} .$$
Sometimes no hard decision is needed, but rather a probability of how likely the classification decision of the network is correct. This probability information can be obtained by defining a monotonously increasing function

$$M: \mathbb{R} \to [0,1],$$

that maps the output $D_F(x)$ onto the interval $[0,1]$. The closer $M(D_F(x))$ is to one, the more likely the input is misclassified. The closer the value is to zero, the more likely the classification result of the DNN is correct.
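As an illustration, the following minimal sketch shows how such a detector output could be combined with a threshold $T$ and a monotonously increasing map $M$; the choice of a sigmoid for $M$ and all function names are our assumptions, not prescribed by the survey.

```python
import numpy as np

def sigmoid(z: float) -> float:
    """Monotonously increasing map from R onto (0, 1), used here as M."""
    return 1.0 / (1.0 + np.exp(-z))

def detect(score: float, threshold: float) -> dict:
    """Turn a raw detector output D_F(x) into a hard decision and a
    probability-like value M(D_F(x)); `score` and `threshold` play the
    roles of D_F(x) and T in the notation above."""
    return {
        "outside_generalization_area": score >= threshold,             # hard decision
        "misclassification_probability": sigmoid(score - threshold),   # soft score in (0, 1)
    }

# Hypothetical usage: a detector returned the score 1.3, the threshold T is 0.5
print(detect(1.3, 0.5))
```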
3. Related Work
To the best of our knowledge, there have been no other
contributions focusing on the generalization capabilities of
Machine Learning and in particular Deep Learning meth-
ods as the root cause for research in the fields of uncer-
tainty estimation, out-of-distribution detection and adver-
sarial examples.
Recently the most important methods in the area of uncer-
tainty estimation have been benchmarked [35], [64].
There are numerous surveys reviewing literature on out-of-
distribution detection, which is also referred to as anomaly
detection, especially in a context different from deep learn-
ing. Some review anomaly detection in general [12], [68].
Others specialize in various setups such as data mining techniques [3] or in application areas such as the medical domain or video-related anomaly detection [41], [50]. In
general those surveys that touch upon deep learning do not
focus on anomalous behavior in combination with a prede-
fined classifier [11], [1] and limit themselves to analyzing
the training data. Detailed information on anomaly detec-
tion based on machine learning procedures is collected in
several books [19], [57], [2].
Surveys on adversarial examples usually review literature
on that topic in general and only include a section on adver-
sarial example detection [85], [4], [10].
Bulusu et al. [8] focus on out-of-distribution and adversar-
ial example detection methods applied on top of pre-trained
DNNs. Different from our survey, however, they do not use generalization to provide a unifying view on both fields and they leave out predictive uncertainty.
4. Detecting Samples Outside the Generaliza-
tion Area
In the following we introduce each literature field and discuss their detection methods, thus setting the stage for a discussion focusing on the core principles they share (see Section 5.2). As shown in Figure 3 the core principles are:
metric, inconsistency, generative and ensemble based meth-
ods. The various literature contributions within each cate-
gory are arranged in ascending chronological order of their
appearance.
4.1. Predictive Uncertainty
For classification tasks the output values F(x) of the DNN are usually understood as the corresponding probabilities of the sample x belonging to class y. Against expectations, it has been shown that this confidence score is misleading, as DNNs tend to be overconfident [27], [62].
Initial research on predictive uncertainty in the DNN-based
classification context concentrated on adapting the output
such that its values better approximate the actual probabil-
ity distribution p(y|x).
The methods from the predictive uncertainty field are
among the first attempts to investigate generalization at in-
ference time, being older than out-of-distribution detection
and adversarial example detection. Most procedures were
originally constructed for regression problems, and were
computationally cheap. Typically a collection of real-world regression datasets, first introduced by [36], was used as evaluation set.
Later, Ovadia et al. [64] performed some image classifica-
tion experiments on the most prevalent, scalable and practi-
cally applicable predictive-uncertainty deep-learning meth-
ods. To determine how well the predicted vector F(x) fits the probability distribution p(y|x) over the classes y, different evaluation scores are used, such as the Negative Log-Likelihood, the Brier Score and
The Negative Log-Likelihood for $N$ sample pairs $(x_i, l_i)$ is defined as

$$\mathrm{NLL} = -\sum_{i \in \{1,\ldots,N\}} \log\left(F(x_i)_{l_i}\right).$$

It is also referred to as the cross entropy loss [25]. It is
a proper score, which means that the value NLL is only
minimized if the predicted probability distribution equals
the groundtruth distribution.
Another proper score is the Brier Score [7], which is for $N$ sample pairs $(x_i, l_i)$, $i \in \{1, \ldots, N\}$, given as

$$\mathrm{BS} = \frac{1}{N} \sum_{i \in \{1,\ldots,N\}} \sum_{j \in \{1,\ldots,m\}} \left(F(x_i)_j - \mathbb{1}_{\{l_i\}}(j)\right)^2 .$$

It can be explained as a decomposition between calibration and refinement; for further explanation we refer to [14].

Figure 3. Taxonomy of methods detecting samples outside the generalization area at inference time.
The Expected Calibration Error [60] is popular and intuitive. The evaluation data is split into $S$ disjoint buckets $B_s$, $s \in \{1, \ldots, S\}$, and the average gap between within-bucket accuracy and within-bucket predicted probability is measured:

$$\mathrm{ECE} = \sum_{s \in \{1,\ldots,S\}} \frac{|B_s|}{N} \left|\mathrm{acc}(B_s) - \mathrm{conf}(B_s)\right| .$$

The accuracy and the confidence of a bucket $B_s$ with labeled samples $(x_i, l_i)$ and the corresponding predicted classes $y_i$, $i \in B_s$, are given as

$$\mathrm{acc}(B_s) = \frac{1}{|B_s|} \sum_{i \in B_s} \mathbb{1}_{\{l_i\}}(y_i), \qquad \mathrm{conf}(B_s) = \frac{1}{|B_s|} \sum_{i \in B_s} F(x_i)_{y_i} .$$

The ECE is not a proper score, since, e.g., returning the same value for all $F(x)_i$, $i \in \{1, \ldots, m\}$, yields perfectly calibrated but non-usable predictions [64].
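For concreteness, the following sketch computes the three calibration scores with NumPy on a small toy example; the array shapes and the equally spaced bucketing of the predicted-class confidence are our assumptions, not taken from [64].

```python
import numpy as np

def nll(probs: np.ndarray, labels: np.ndarray) -> float:
    """Negative log-likelihood: -sum_i log F(x_i)_{l_i}."""
    return float(-np.sum(np.log(probs[np.arange(len(labels)), labels])))

def brier_score(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared difference between predicted probabilities and one-hot labels."""
    one_hot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - one_hot) ** 2, axis=1)))

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_buckets: int = 10) -> float:
    """Average gap between per-bucket accuracy and per-bucket confidence."""
    conf = probs.max(axis=1)          # confidence of the predicted class
    pred = probs.argmax(axis=1)       # predicted class
    edges = np.linspace(0.0, 1.0, n_buckets + 1)
    ece, n = 0.0, len(labels)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc_b = (pred[mask] == labels[mask]).mean()
            conf_b = conf[mask].mean()
            ece += (mask.sum() / n) * abs(acc_b - conf_b)
    return float(ece)

# Toy example with three samples and two classes
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
labels = np.array([0, 1, 0])
print(nll(probs, labels), brier_score(probs, labels), expected_calibration_error(probs, labels))
```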
Next we explain the predictive uncertainty methods and
group them according to their core principle into metric
based and ensemble based approaches.
4.1.1 Metric Methods
Metric-based methods investigate if for the current input
data sample the classifier behaves similarly to the samples
inside the generalization area. The training set is often used
as the set of samples inside the generalization area. How
similar the behaviors for two samples are is established by
measuring the differences among them with the help of
a function and comparing its output against a threshold.
Typically the function used is a composition of some
transformation and a metric. If a significant dissimilarity
from the current sample to the samples from the inside of
the generalization area is observed, the sample is expected
to be outside the generalization envelope.
The only metric method in the predictive uncertainty literature field was proposed by Guo et al. [30]. Their simple method, called temperature scaling, "softens" the output F(x) of the DNN. This is achieved by dividing the class scores by a temperature T > 1 before the softmax layer. The exact value of T is statistically derived from samples inside the generalization area. This procedure leaves the class prediction unchanged, since the parameter T does not change the maximum, but the output values are softened, such that the output is less overconfident and hence more reliable. If the derived output value for the predicted class is much smaller than the output values observed for samples inside the generalization area, the sample is expected to be outside the generalization area.
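A minimal sketch of temperature scaling is given below; the logits and the value of T are hypothetical, and in practice T would be fitted on held-out samples inside the generalization area as described above.

```python
import numpy as np

def temperature_scaled_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Soften the class scores by dividing the logits by T > 1 before the softmax.
    The argmax (predicted class) is unchanged; only the confidence is reduced."""
    z = logits / temperature
    z = z - z.max()                    # numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([4.0, 1.0, 0.5])     # hypothetical pre-softmax scores of a DNN
print(temperature_scaled_softmax(logits, temperature=1.0))   # overconfident
print(temperature_scaled_softmax(logits, temperature=2.5))   # softened output
```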
Figure 4. Visualization of the method from Blundell et al. [6].
4.1.2 Ensemble Methods
Ensemble methods usually apply the current input to several
networks and analyse their outputs. The outputs are either
combined to one output using an average procedure or the
difference of the different outputs is computed. The more
similar the components of the averaged output vector are, or
the more the outputs of the individual networks differ, the
more likely is the event that the investigated input sample is
outside the generalization area.
One well known Bayesian Neural Network method is
based on variational inference, in which an approximating
distribution is constructed for each weight [28], [6], [51],
[52], [81]. Hence, the weights of the network are no longer
fixed values but are represented by a probability distribu-
tion, compare Figure 4. One of the most popular Bayesian
Neural Network ideas is that of Blundell et al. [6], who were the first to introduce an algorithm for training Bayesian Neural Networks using backpropagation. At inference
time several concrete weight values are sampled from
their corresponding distribution. The different samples are
used to build an ensemble of networks. The input is run
through each network and the outputs are combined using
a weighted average.
Gal et al. introduced a method that is called Monte Carlo
dropout or dropout at inference time [22]. Different from
the standard dropout procedure [75], in which dropout
is used to prevent overfitting by randomly deleting con-
nections, in Monte Carlo Dropout this masking is also
applied at test time. Hence, the prediction is no longer
deterministic. Depending on the links randomly chosen to
be kept, the network's output is different. For one input the
network is run several times using different masks. The
output results are averaged.
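The following sketch illustrates Monte Carlo dropout at inference time in PyTorch; the toy model and the number of stochastic forward passes are assumptions for illustration, not taken from [22].

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # hypothetical classifier with a dropout layer
    nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 10)
)

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Keep dropout active at inference time and average the softmax outputs
    over several stochastic forward passes; the spread across passes can be
    used as an uncertainty signal."""
    model.train()                            # keeps the dropout masks active
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )                                    # shape: (n_samples, batch, classes)
    return probs.mean(dim=0), probs.std(dim=0)

mean_probs, std_probs = mc_dropout_predict(model, torch.randn(4, 32))
print(mean_probs.shape, std_probs.max().item())
```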
Lakshminarayanan et al. also use an ensemble based
method [42]. They randomly initialize several networks
and train them on the same dataset. Additionally they use
adversarial training, which means that they incorporate ad-
versarial images in the training dataset to keep the models
more robust against adversarial attacks. At inference time
each DNN is run for the current input image and the output
results are averaged.
Riquelme et al. proposed two methods [70]. One relies on
variational inference [6], the other on Monte-Carlo dropout
[22]. In order to enforce a lower computational overhead
they incorporated the corresponding procedure only in the
last layer of the network. Hence, the computation until the
last layer stays the same for each input. Only in the last
layer different weight settings need to be sampled and the
corresponding output computed.
4.2. Out-of-Distribution Detection
Out-of-distribution detection methods search for input
samples that stem from another distribution than the sam-
ples used to train the DNN. In the evaluation procedure
additional datasets are used, which either contain inputs of
different classes than the training classes or inputs with ad-
ditional random noise. Some set-ups also include out-of-
distribution images of classes the network is actually trained
for but the images occur in a different representation. An
example would be to feed images from the SVHN dataset
[61], which contains images of house numbers, into a DNN that is trained to classify images from the handwritten digit dataset MNIST [44]. Such cases are referred to as novelty detection in the literature and are often treated as a subproblem of out-of-distribution detection.
Typically classification data sets used for out-of-distribution
detection are CIFAR-10 and CIFAR-100 [39], SVHN [61],
ImageNet [15] and LSUN [83]. A DNN is trained on one
of the datasets, the other datasets are then used as out-of-
distribution data.
In the out-of-distribution detection set-up there are two
classes, the inlier class covering the generalization area and
the outlier class. A generalization detector returns for a
sample x and the corresponding behavior of the DNN a score. The higher the score, the more likely the sample x is an outlier; the lower, the more likely x is an inlier. To get a hard decision for x to be an inlier or an outlier, a decision threshold T has to be determined.
The ratio between the False Positive Rate (FPR) and the True Positive Rate (TPR) is directly linked to the value of T. However, depending on the problem and the corresponding requirements, different ratios between the two rates can be desired and thus a threshold-independent evaluation metric is needed. For this purpose, the Area Under the Receiver Operating Characteristic (AUROC/AUCROC) is typically used. This is the area under the plot of the true positive rate over the false positive rate. A 100% AUROC value corresponds to a perfect detector, while a 50% AUROC corresponds to a detector deciding at random.
Figure 5. Visualization of the method from DeVries et al., image
adapted from [17].
Should the positive and negative classes have different base
rates, the AUROC score can be misleading. In such a case
it can make sense to use the Area under the precision recall
curve (AUPR). It is defined as the area under the plot of the
precision over the recall. Here, in contrast to the AUROC, it makes a difference whether inliers or outliers are assigned to be negative or positive [33]. Lastly, sometimes the false positive rate at, e.g., 95% true positive rate is used. The threshold T is chosen such that the true positive rate is at 95% and the corresponding value of the false positive rate is used for comparison.
Usually, the test data is adapted to have a ratio of one be-
tween the numbers of samples in the positive and negative
class and the AUROC score is used for evaluation.
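As an illustration, the sketch below computes AUROC and the false positive rate at 95% true positive rate for hypothetical detector scores using scikit-learn; the scores and labels are made up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical detector scores: higher means more likely outlier; label 1 marks an outlier
scores = np.array([0.1, 0.3, 0.2, 0.8, 0.7, 0.9, 0.4, 0.6])
labels = np.array([0,   0,   0,   1,   1,   1,   0,   1  ])

auroc = roc_auc_score(labels, scores)

# False positive rate at 95% true positive rate
fpr, tpr, _ = roc_curve(labels, scores)
fpr_at_95_tpr = fpr[np.searchsorted(tpr, 0.95)]   # first threshold reaching 95% TPR

print(f"AUROC: {auroc:.3f}, FPR@95TPR: {fpr_at_95_tpr:.3f}")
```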
The corresponding methods from the literature field out-of-
distribution detection can be grouped into metric, inconsis-
tency, generative and ensemble approaches. In the follow-
ing, the methods are listed according to their group.
4.2.1 Metric Methods
As already mentioned, metric methods determine if the be-
havior of the current input is similar to the behavior of sam-
ples inside the generalization area of the DNN under inves-
tigation. Again, the training samples are used as samples
inside the generalization area and similarity is typically es-
tablished using a transformation and a metric. In this case
however, the transformation may be applied on the input
data, the output of one or several intermediate layers or the
gradient computed on the loss function of the current input-
output combination. A large distance computed by the met-
ric indicates a sample outside the generalization area.
DeVries et al. proposed a method in which a usual classification DNN is adapted such that it outputs a detection value c in addition to the softmax scores [17], compare Figure 5. To train the network they use a two-fold loss function $\tilde{L}$. On the one hand it interpolates the softmax output y with the true class one-hot label l, and hence lowers the classification loss if c is low, and on the other hand it penalizes a small confidence score c:

$$\tilde{L}((y, c), l) = L\big(c \cdot y + (1 - c) \cdot l,\, l\big) - \log(c) .$$

Via this training process, the detecting transformation and the metric based on the layerwise output of the DNN are directly incorporated into the network structure, which is different from most other metric methods. At inference time a small value of c indicates that the sample is likely to be outside the generalization area.
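A rough sketch of such a confidence-branch loss is shown below; the interpolation form, the weighting factor `lam` and all tensor shapes are our assumptions in the spirit of [17] and may differ from the exact formulation in the paper.

```python
import torch
import torch.nn.functional as F

def confidence_branch_loss(probs, confidence, one_hot_label, lam=0.5):
    """Sketch of a confidence-branch loss: the prediction is interpolated with
    the true label according to the confidence c, and a small c is penalized
    by -log(c). `lam` is a hypothetical weighting factor."""
    interpolated = confidence * probs + (1.0 - confidence) * one_hot_label
    task_loss = -(one_hot_label * torch.log(interpolated + 1e-12)).sum(dim=1).mean()
    confidence_penalty = -torch.log(confidence + 1e-12).mean()
    return task_loss + lam * confidence_penalty

probs = torch.softmax(torch.randn(4, 10), dim=1)       # hypothetical softmax outputs
confidence = torch.sigmoid(torch.randn(4, 1))          # hypothetical confidence head output c
one_hot = F.one_hot(torch.randint(0, 10, (4,)), 10).float()
print(confidence_branch_loss(probs, confidence, one_hot).item())
```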
Oberdiek et al. introduced a method that is based on
the layer-wise gradient of the weights regarding the loss
function of the predicted class [63]. From that they generate
features such as the layerwise norm, minimum and maxi-
mum values of the gradient. Those are fed together with
the entropy of the estimated class distribution to a logistic
regression approach that determines the score deciding if
the input is expected to be outside the generalization area.
Jiang et al. proposed a method that first defines for each
class a high density set consisting of images from the
training set of that corresponding class [38]. At inference time a trust score for the current input-output combination is computed by taking the ratio between the distance from the test sample to the closest high density set of a class different from the predicted class and the distance to the high density set of the predicted class. The higher the trust score, the more likely the predicted class is correct.
Another detector that is completely based on the activation
spaces of the DNN’s layers was introduced by Lee et al.
[45]. They compute the Mahalanobis distance to the closest
class-conditional Gaussian distribution. The layerwise
distances are then combined by a logistic regression
network stating if the input is likely to be outside the
generalization area. It is one of the few works so far that evaluates the detector on both adversarial example and out-of-distribution set-ups.
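The following sketch illustrates the core computation of such a Mahalanobis-based score on the activations of a single layer, with class-wise means and a shared covariance; the synthetic features and the regularization constant are assumptions, and the layer-wise combination by logistic regression is omitted.

```python
import numpy as np

def fit_class_gaussians(features: np.ndarray, labels: np.ndarray):
    """Estimate per-class means and a shared (tied) covariance on the
    activations of one layer, as needed for a Mahalanobis-style detector."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])  # regularized
    return means, np.linalg.inv(cov)

def mahalanobis_score(feature: np.ndarray, means: dict, precision: np.ndarray) -> float:
    """Distance to the closest class-conditional Gaussian; a large value
    suggests the input is outside the generalization area."""
    dists = [(feature - mu) @ precision @ (feature - mu) for mu in means.values()]
    return float(min(dists))

# Hypothetical training-set activations of one layer (200 samples, 16 dims, 3 classes)
rng = np.random.default_rng(0)
feats, labs = rng.normal(size=(200, 16)), rng.integers(0, 3, 200)
means, precision = fit_class_gaussians(feats, labs)
print(mahalanobis_score(rng.normal(size=16), means, precision))
```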
Hendrycks et al. adapt the training procedure of the DNN [34]. They carefully construct a dataset containing samples outside the generalization area that is different from the one used for testing. During training they use the original training set with the original labels and, additionally, samples from the constructed dataset, for which they use the uniform distribution over the m classes as label. An input leading to an output with a higher entropy than samples from the training set within the generalization area is expected to be outside the generalization area.
A similar method was proposed by Hein et al. [32]. They also use an additionally constructed dataset with samples outside the generalization area for the training. During train-
ing the loss for such samples is defined by the maximal out-
put value over all classes. During testing a comparison of
the maximal output value to that of common samples inside
the generalization area is used to detect samples outside the
generalization area.
Figure 6. Visualization of the Method from Hendrycks et al. [33].
4.2.2 Inconsistency Methods
The core idea of prediction-inconsistency methods is to
observe the reaction of the DNN to small changes in
the input image, when the input image lies inside and
respectively outside the generalization area. At inference
time the current input sample and a slightly transformed
image are processed by the DNN. The corresponding
outputs are compared. The higher the difference in the
outputs the more the investigated sample is expected to be
outside the generalization area.
Liang et al. proposed a method in which the input image is
perturbed by shifting it away from the original class [48].
This is done similarly to the adversarial example method
FGSM [27]: gradient descent is used to increase the loss
regarding the predicted class and hence a slightly modified
image is generated. A sample is expected to be inside the
generalization area, if the output value of the modified im-
age for the original predicted class is high.
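A minimal sketch of this input-perturbation idea is given below, following the description above rather than the exact procedure of [48]; the toy model, the perturbation size and the use of the score difference as detection signal are assumptions.

```python
import torch
import torch.nn as nn

def perturbation_consistency_score(model: nn.Module, x: torch.Tensor, eps: float = 0.01) -> float:
    """Perturb the input along the sign of the loss gradient of the predicted
    class and measure how much the predicted-class softmax score changes.
    A large change suggests a sample outside the generalization area."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred = probs.argmax(dim=1)
    loss = nn.functional.cross_entropy(logits, pred)
    loss.backward()
    x_perturbed = (x + eps * x.grad.sign()).detach()    # shift away from the predicted class
    with torch.no_grad():
        probs_pert = torch.softmax(model(x_perturbed), dim=1)
    return (probs[0, pred] - probs_pert[0, pred]).abs().item()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # hypothetical classifier
print(perturbation_consistency_score(model, torch.randn(1, 3, 32, 32)))
```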
4.2.3 Generative Methods
Generative methods make use of an image pre-processing
procedure. The idea is somewhat similar to the prediction-
inconsistency approaches, but the pre-processing procedure
is more sophisticated since it is generated in order to shift
the image in the direction of the training distribution. The
shift is realized using a generative network or similar which
is trained on the training set. At inference time both the cur-
rent input image and the corresponding output of the gener-
ative network (i.e., the result of applying the pre-processing
procedure on the input image) are processed by the DNN.
The more the outputs differ, the more the input sample is
assumed to be outside the generalization area.
Hendrycks et al. proposed a method that requires changes
in the original DNN [33]. As shown in Figure 6 in purple,
they attach a decoder on top of the penultimate layer of the original DNN that reconstructs the input image. The difference between the reconstruction and the input image
is then fed together with the output of the penultimate and
last layer to a so called abnormality module shown in red
which is trained to output the generalization score for the
prediction.
Ren et al. introduced a method that is based on a likelihood ratio statistic [69]. They train two generative models, called PixelCNNs [71], which additionally return how likely an input is inside the generalization area. One model is trained on the original training dataset and the other one on slightly perturbed images from the training dataset. They suppose that both models are able to capture the background statistics, but that the model trained on the original training dataset is more sensitive regarding the actual content. Hence, if the likelihood ratio determined between the output of the model trained on the original images and the one trained on the perturbed images is low, the input image is expected to be outside the generalization area.
Serrà et al. show that images with high complexity tend to produce the lowest likelihoods under generative models [73]. This
leads to wrong predictions if the sample outside the gener-
alization area has a lower complexity than the actual data
inside the generalization area. They balance this effect in a
method, that takes both the predicted log-likelihood of gen-
erative models and the quantitative estimates of complexity
of an image into account to decide whether this image is
outside the generalization area. They use several different
generative models in order to evaluate their method.
4.2.4 Ensemble Methods
As described in Section 4.1.2, ensemble methods use
several slightly different networks, the more the outputs of
those networks differ at inference time, the more the image
is expected to be outside the generalization area.
Vyas et al. split the training data into k partitions, such that each class belongs to exactly one partition [80]. Then k independent but structurally identical networks are trained; each uses one of the partitions as data outside the generalization area and the rest as in-distribution data. In addition to the usual loss term, they use a term that
pushes the entropy over the softmax output of the network
below a threshold for samples outside the generalization
area and above the threshold for in-distribution samples.
During testing the softmax outputs of the classifiers are
averaged and the maximum value and the entropy are
combined and used as detection score.
Figure 7. An original "clean" image that is correctly predicted as panda, the added perturbation computed by the fast gradient sign method and the resulting adversarial example which is incorrectly classified as gibbon, image adapted from [27].

Another ensemble based method was proposed by Yu et al. [84]; they train two networks, both on the same in- and out-of-distribution examples. For in-distribution images they use a usual loss function, but for out-of-
distribution examples they use a loss that is thought to maxi-
mize the difference between the softmax outputs of the two
networks. During testing they expect the input to be out-
side the generalization area if the L1-Norm of the difference
of the outputs of the two networks is above a predefined
threshold.
4.3. Adversarial Example Detection
Adversarial examples are artificially generated data sam-
ples that are carefully constructed to fool a network into
deciding falsely. Usually, an adversarial example $x_{adv}$ is based on an original data sample $x$. The difference $\delta$ between those samples is constructed to be as small as possible under the constraint that the predicted class $c(F(x))$ changes:

$$x_{adv} = \operatorname{argmin}_{x_{adv}} \|x - x_{adv}\| \quad \text{s.t.} \quad c(F(x)) \neq c(F(x_{adv})) .$$
The difference between the two samples is often not visible
to humans, as shown in Figure 7. This adversarial example,
for which the network predicts a gibbon instead of a panda
is constructed using the fast gradient sign method (FGSM)
by Goodfellow et al. [27], which is one of the first and
simplest adversarial attack methods. Instead of computing the gradient of the loss function $L(F(x), l)$ with respect to the weights $\theta$ of the model, the gradient with respect to the image $x$ is computed. Then the adversarial image $x_{adv}$ is generated by moving the original image $x$ in a simple one-step procedure in the direction of the steepest ascent of the loss $L(F(x), l)$, without exceeding a maximal perturbation of $\epsilon$ per pixel (an $L_\infty$ constraint) with respect to the original image:

$$x_{adv} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x L(\theta, x, y)\big) .$$
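For illustration, a minimal FGSM sketch in PyTorch is given below; the toy model, the value of $\epsilon$ and the clamping to a valid pixel range are assumptions.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, label: torch.Tensor, eps: float = 0.03) -> torch.Tensor:
    """One-step fast gradient sign method: move the input in the direction of
    the sign of the input gradient of the loss, with step size eps."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()      # keep the image in a valid pixel range

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # hypothetical classifier
x = torch.rand(1, 3, 32, 32)
x_adv = fgsm_attack(model, x, label=torch.tensor([3]))
print((x_adv - x).abs().max().item())          # perturbation bounded by eps per pixel
```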
The main adversarial example methods used for evaluating
adversarial example detectors are FGSM [27], BIM [40],
JSMA [67] and CW [9]. Typical datasets used for evalua-
tion are MNIST [44], CIFAR [39] and SVHN [61]. In order
to prove scalability to large datasets, some works additionally evaluate their methods on the ImageNet dataset [15].
In the adversarial sample set-up usually the DNN is trained
on the training data. At inference time the DNN receives
both adversarial and clean images from the test set. The
task of the generalization detector is to decide whether the
pre-trained DNN has as input an adversarial example from
outside the generalization envelope and hence predicts the
wrong class, or if the current input data is clean and the net-
work predicts correctly.
The evaluation procedure for adversarial example detection
is similar to the evaluation of out-of-distribution detection
explained in Section 4.2, with the adversarial examples be-
ing considered outliers.
The methods for adversarial example detection can be
grouped into metric, inconsistency and generative. In the
following the methods are explained and listed according to
their core principle.
4.3.1 Metric Methods
As described in more detail in Section 4.2.1, the metric methods use the information whether the current input sample behaves similarly to input samples inside the generalization area that are taken from the training set. The decision is based on a transformation and a metric applied to the input data, to the output of one or several layers or to the gradient computed on the loss function of the current input-output combination. A high difference in the behavior indicates an adversarial sample lying outside the generalization area.
Grosse et al. introduced an approach that applies the maximum mean discrepancy to the input data [29], from which a distance to samples inside the generalization area is computed. This was one of the first works on the detection of adversarial examples. The main difference to the following works in this category is that the procedure only depends on the images; the information is independent of the behavior of the DNN. It was later shown that this method does not work well for some adversarial attacks; hence, subsequent methods in the metric based category took more information than just the image space itself into account.
Metzen et al. proposed one of the first detectors that uses the
activation space of the DNN as decisive criterion [59]. On
top of each activation layer output they trained small sub-
networks that receive the activations of the original network
for each example as input. The subnetworks were trained
on data in- and outside the generalization area. A visualiza-
tion of this method can be seen in Figure 8. Based on that
information the subnetworks classify if the current example
is outside the generalization area. Metzen et al. found that
the subnetwork on one of the middle layers leads to the best detection results.

Figure 8. Subnetworks for the detection task trained on the output of the activation space. Original image from Metzen et al. [59].
Li and Li developed a detector that uses the outputs of the convolutional layers of the DNN [46]. For each filter output it collects statistical information from a normalized principal component analysis, minimal and maximal values and several percentile values. Based on those statistics, a distance to samples inside the generalization area from the training set is computed and a decision is made if the current input behaves similarly to previously investigated image samples.
A two-fold detector was introduced by Feinman et al.
[20]. The main procedure is based on the Kernel Density
calculated on the activation space of each layer which is
then compared to samples of the training set. The second
procedure uses Bayesian Uncertainty determined from sev-
eral runs using dropout in the DNN. If one of the detectors
determines deviations from a normal behavior, the current
example is detected as being outside the generalization area.
A procedure based on local intrinsic dimensionality was developed by Ma et al. [55]. They characterize
subspaces outside the generalization area using local
intrinsic dimensionality on the layer outputs of the DNN
which is a weighted distance metric computed on the
k-nearest-neighbors of 100 randomly chosen examples
from the training set. The results of the different layers are
then combined by a logistic regression network.
A method also based on a k-nearest-neighbor approach was proposed by Papernot et al. [66]. In each layer's activation space, the k nearest neighbors of the input image among the training samples are determined. The method was originally designed to recover the actual class of an adversarial image and stands here as an example for several such methods, but it can easily be turned into a detector: if the classes of the neighbors mismatch too often with the predicted class of the current input image, the image is likely to be outside the generalization area.
Figure 9. An illustration of the gradient based detector method of Lust and Condurache [53].

Zheng et al. proposed a method that fits, for each class and on each activation space of the fully connected layers, a Gaussian Mixture Model [86]. The Gaussian Mixture Model is trained by an expectation-maximization algorithm [56]. For each class a threshold is chosen to reject inputs whose Gaussian Mixture Model values are below that threshold.
Pang et al. introduced a method based on the entropy of the normalized non-maximal elements (non-ME) of the network output $F(x)$ [65]:

$$\text{non-ME}(x) = -\sum_{i \neq y} F(x)_i \cdot \log\left(F(x)_i\right) .$$
The training of the DNN is adapted with a loss function that has an additional non-ME term, to keep the non-ME value for samples outside the generalization area low. During testing, an image x with a low non-ME value is expected to be outside the generalization area.
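A small sketch of the non-ME computation is given below; following the prose above, the non-maximal elements are normalized before the entropy is taken, which is our reading of the definition.

```python
import numpy as np

def non_maximal_entropy(probs: np.ndarray) -> float:
    """Entropy of the normalized non-maximal elements of a softmax output."""
    y = probs.argmax()
    rest = np.delete(probs, y)
    rest = rest / rest.sum()               # normalize the non-maximal elements
    return float(-np.sum(rest * np.log(rest + 1e-12)))

print(non_maximal_entropy(np.array([0.85, 0.05, 0.05, 0.05])))   # high non-ME: uniform rest
print(non_maximal_entropy(np.array([0.55, 0.43, 0.01, 0.01])))   # low non-ME: peaked rest
```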
As mentioned in Section 4.2, the method proposed by Lee et al. [45] is evaluated on the detection of adversarial examples as well as on out-of-distribution data. It is one of the few works so far that evaluate the detector on both set-ups.
Ma et al. introduced an approach that is on the one hand
based on a one-class support vector machine on the activa-
tion layers, and on the other hand they apply on top of each
layer an additional fully connected softmax layer which
directly returns a softmax score class vector [54]. These
vectors are compared to each other. The more they differ,
the more probable it is that the current example is misclassified.
Furthermore, the output of the support vector machine is
taken into account. The combined information decides
whether the current input is expected to be outside the
generalization area.
A similar procedure to Papernot et al.’s method [66] was
proposed by Dubey et al. [18]. It uses only a few feature representations constructed from the activations of different intermediate layers for the input image. Those feature representations are compared to pre-saved representations of labeled images from a large database. Nearest neighbor search is used, followed by a weighted combination of the predictions of the nearest neighbors, leading to a final class prediction. The approach is originally intended to recover the original class, but it can be adapted to a detector: the image is supposed to be outside the generalization area if the class predictions mismatch.

Figure 10. Visualization of the prediction inconsistency based approach of Xu et al. [82].
Recently Lust and Condurache introduced a detector based
on the gradient of the weights regarding the loss function
that compares the predicted class to the softmax output of
the network [53]. They use the layerwise norm of the gra-
dient as features in the logistic regression detector. Further-
more, they add a smoothing step in order to remove perturb-
ing noise and hence, in case of a sample outside the gener-
alization area, increase the contradictions of the weights to
the predicted class, compare Figure 9.
4.3.2 Inconsistency Methods
As described in Section 4.2.2, prediction-inconsistency methods use the difference in the DNN output when processing an input sample and a slightly modified version of it to distinguish between samples that lie inside and outside the generalization area. A large difference indicates a sample outside the generalization area.
One of the first methods in this category was proposed by
Liang et al. [47]. They aim to remove the noise, and hence the perturbation, from the current input image by the use of adaptive scalar quantization and a spatial smoothing filter. The current input image and the generated clean image are then classified by the DNN. If the class prediction for the two is
different, the image is supposed to be outside the general-
ization area.
The most successful among the prediction-inconsistency based approaches is that of Xu et al., called feature squeezing [82]. The original image, an additional im-
age with reduced color depth generated from the original
image and a smoothed image are run through the DNN. The
outputs of the generated images are compared to the output
of the original image. If one of those differences exceeds
a threshold, the image is detected as an image outside the
generalization area.
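A rough sketch of the feature-squeezing idea is given below; the particular squeezers (4-bit color depth reduction and a small median filter), the distance measure and the toy classifier are assumptions for illustration and do not reproduce the exact configuration of [82].

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_color_depth(image: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize an image in [0, 1] to 2**bits gray levels per channel."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

def feature_squeezing_score(predict, image: np.ndarray) -> float:
    """Compare the model output on the original image with the outputs on
    squeezed versions; `predict` is assumed to map an image to a softmax vector."""
    p_orig = predict(image)
    p_depth = predict(reduce_color_depth(image))
    p_smooth = predict(median_filter(image, size=(2, 2, 1)))   # local spatial smoothing
    return max(np.abs(p_orig - p_depth).sum(), np.abs(p_orig - p_smooth).sum())

# Hypothetical stand-in classifier: the mean intensity decides between two classes
def toy_predict(img):
    p = float(img.mean())
    return np.array([p, 1.0 - p])

score = feature_squeezing_score(toy_predict, np.random.rand(32, 32, 3))
print(score)   # compared against a threshold to flag samples outside the generalization area
```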
Figure 11. The detector method of Liao et al. using a high-level
representation guided denoiser [49].
4.3.3 Generative Methods
As described in Section 4.2.3 generative based methods
shift the input image in the direction of the training
distribution. A high difference in the outputs of the current
input image and the corresponding shifted image indicates
that the input image is outside the generalization area.
Meng et al. introduced MagNet, a two-pronged detection
method [58]. They use an auto-encoder that is trained to
reconstruct examples of the original dataset. Each input of
the DNN is additionally processed by the autoencoder and
a reconstructed image is obtained, which is also run through the DNN. If the current sample and its reconstruction differ
a lot or the difference between the DNN’s output of the
current image and the DNN’s output of the reconstructed
image is too high, the sample is supposed to be outside the
generalization area.
Song et al. proposed a method that is based on PixelCNN
[77], [71], a generative model with tractable likelihood
[74]. They train this PixelCNN on the original training
data. During testing each image is run through a procedure
in which each pixel of the image is modified such that
the log-likelihood of the PixelCNN is maximized. This
procedure is thought to remove the perturbation. If the class
prediction of the ”cleaned” image is different to the class
prediction of the original image, the image is supposed to
be outside the generalization area.
Defense-GAN is a method proposed by Samangouei et al. [72]. They train a Generative Adversarial Network (GAN) [26] on the original dataset. Next they find an input z to the generator part G of the GAN via an iterative procedure, such that the output image G(z) is as similar as possible to the current input image x of the DNN. The image G(z) is then also fed into the DNN. Again, a difference in the
predicted class leads the method to predict the input to be
outside the generalization area.
Liao et al. proposed a detector method similar to MagNet. They use a high-level representation guided denoiser [49], which is based on a U-Net structure instead of the typical encoder-decoder structure, to generate the reconstructed image. It is trained on clean and adversarial images such that
adversarial images have the same DNN top-level outputs
as the corresponding non adversarial samples. The image
is classified as being outside the generalization area if the
class output of the denoised and the original image differs.
The method is visualized in Figure 11.
5. Discussion
In the following we discuss and summarize advantages
and disadvantages of the introduced methods. Important
comparison criteria are listed in Table 2. In the first subsection, we clarify the criteria used; in the second, we discuss the findings.
5.1. Comparison Criteria
Table 2 captures the methods described in Section 4
ordered by their literature field and core principle.
The third column, no link to inference engine, defines whether a method can be directly applied to any DNN. If yes, the method belongs to the class of post-hoc procedures, which means that a pre-trained DNN can be taken and no further adaptation is needed; this is marked by a check mark (✓). If, e.g., a special loss function or additional layers need to be incorporated in the classification DNN, and hence the training process needs to be adapted, the method is marked with a cross (✗).
The column labeled no outlier modeling states whether training samples from the outlier class, e.g. adversarial or out-of-distribution samples, are necessary to train the detector method. Here, we distinguish between four categories. ✓ means the training of the method is completely outlier independent. If the method is marked by (✓), outliers are necessary for the training of the method, but the method is (additionally) evaluated on a set-up in which the training outliers were sampled from a different distribution than the test outliers and was able to generalize well. (✗) means that the method is (additionally) evaluated on a set-up in which the training outliers were sampled from a different distribution than the test outliers but was not able to generalize well. Methods that are evaluated only on the outlier set-ups they were specifically trained for are marked by ✗.
For the additional parameter comparison we group the methods into three subgroups. The first group (○) consists of detector methods that need only few additional parameters in comparison to the classification DNN. If the number of additional parameters is in the range of the number of parameters of the DNN, the method belongs to the second group (◐). Lastly, if the number of parameters needed for the detection method is more than twice the number of parameters of the DNN, it belongs to the third group (●).
For most methods the computational overhead is not
given within the corresponding paper and we did not
do any experiments on that. Due to the lack of precise
information we were not able to include an extra column
for the computational efficiency in the table. However,
some rough estimates can be made. The most time-consuming methods are most likely the ensemble-based approaches, since all the slightly different networks, often two or more, have to process the same input. For the other categories we expect the computational overhead to be in the range of the parameter overhead, since usually one parameter is related to one computation step. Thus, the only methods needing significantly fewer parameters and less computational overhead than the original network itself are found within the metric-based approaches category. This information is relevant for applications in hardware-restricted areas such as autonomous driving [23], [21].
The last column named publication date gives the month
and the year the method was introduced. For most papers
the performance of the introduced method is better than
the performances of earlier introduced methods. Some works, however, concentrate on, e.g., lowering the number of parameters while maintaining a similar performance [53].
5.2. Findings
In Section 4 we introduced methods to detect at infer-
ence time if an input is within the generalization area of a
DNN. Currently, those methods can be found in three main
literature fields: predictive uncertainty, out-of-distribution
detection and adversarial example detection. Each litera-
ture field concentrates on a different reason for a sample to
not be within the generalization envelope.
Methods in the predictive uncertainty field try to determine the probability that in-distribution samples are misclassified. For this purpose they typically focus on samples
that are close to a decision boundary and assign them a high
uncertainty. Approaches based on this idea, however, are
not capable of assigning a high uncertainty to samples that
are not close to any decision boundary but still not within
the generalization area.
Samples far away from the decision boundary and out-
side the generalization area are considered in the literature
field out-of-distribution detection. A sample is out-of-
distribution if it is different from the training samples in a
principled manner.
The third literature field is adversarial example detection.
Adversarial examples are constructed or selected such as
to fool the DNN on purpose. For the construction of an
adversarial image an in-distribution input image is slightly
changed so that the relevant features for the DNN’s decision
Table 2. Compact comparison of detection methods. For further details on the comparison criteria see Section 5.1.

Method reference | Core principle | No link to inference engine | No outlier modeling | Additional parameters | Publication date

Literature Field: Predictive uncertainty methods
Guo et al. [30] | Metric | ✓ | - | ○ | 08.2017
Blundell et al. [6] | Ensemble | ✗ | - | ◐ | 06.2015
Gal et al. [22] | Ensemble | ✗ | - | ○ | 06.2016
Lakshminarayanan et al. [42] | Ensemble | ✗ | - | ● | 12.2017
Riquelme et al. [70] | Ensemble | ✗ | - | ○ | 05.2018

Literature Field: Out-of-distribution detection
DeVries et al. [17] | Metric | ✗ | (✓) | ○ | 02.2018
Oberdiek et al. [63] | Metric | ✓ | (✓) | ○ | 09.2018
Jiang et al. [38] | Metric | ✓ | ✗ | ● | 12.2018
Lee et al. [45] | Metric | ✓ | (✓) | ● | 12.2018
Hendrycks et al. [34] | Metric | ✗ | (✓) | ○ | 04.2019
Hein et al. [32] | Metric | ✗ | (✓) | ○ | 06.2019
Liang et al. [48] | Inconsistency | ✓ | ✗ | ○ | 04.2018
Hendrycks et al. [33] | Generative | ✗ | (✓) | ◐ | 10.2016
Ren et al. [69] | Generative | ✗ | (✓) | ◐ | 12.2019
Serrà et al. [73] | Generative | ✓ | ✓ | ◐ | 04.2020
Vyas et al. [80] | Ensemble | ✗ | ✓ | ● | 09.2018
Yu et al. [84] | Ensemble | ✗ | ✗ | ◐ | 10.2019

Literature Field: Adversarial example detection
Grosse et al. [29] | Metric | ✓ | ✓ | ◐ | 02.2017
Metzen et al. [59] | Metric | ✓ | (✗) | ● | 04.2017
Li and Li [46] | Metric | ✓ | ✗ | ◐ | 10.2017
Feinman et al. [20] | Metric | ✓ | ✗ | ● | 11.2017
Ma et al. [55] | Metric | ✓ | ✗ | ● | 03.2018
Papernot et al. [66] | Metric | ✓ | ✗ | ● | 03.2018
Zheng et al. [86] | Metric | ✓ | ✓ | ◐ | 05.2018
Pang et al. [65] | Metric | ✗ | ✗ | ◐ | 12.2018
Lee et al. [45] | Metric | ✓ | (✓) | ● | 12.2018
Ma et al. [54] | Metric | ✓ | ✓ | ● | 02.2019
Dubey et al. [18] | Metric | ✓ | ✓ | ● | 03.2019
Lust and Condurache [53] | Metric | ✓ | ✗ | ○ | 04.2020
Liang et al. [47] | Inconsistency | ✓ | (✗) | ○ | 04.2018
Xu et al. [82] | Inconsistency | ✓ | (✗) | ○ | 02.2018
Meng et al. [58] | Generative | ✓ | ✓ | ◐ | 05.2017
Song et al. [74] | Generative | ✓ | ✓ | ◐ | 10.2017
Samangouei et al. [72] | Generative | ✓ | ✗ | ◐ | 05.2018
Liao et al. [49] | Generative | ✓ | ✗ | ◐ | 06.2018
Figure 12. The structural bases of the methods of the four core principles: ensemble, generative, metric and inconsistency. At inference time they decide if the input sample x is inside (yes) or outside (no) the generalization area and hence if the predicted class based on the output F(x) is expected to be correct.
point to the wrong class. The change in the image is hardly perceptible by humans, or at least of a nature that does not impair a correct decision. Adversarial examples are often neither close to a decision boundary in the feature space nor far away from in-distribution examples. This sets the field apart from predictive uncertainty and out-of-distribution detection.
These fields are currently considered separately, even
though the same core principles are used to handle the un-
derlying generalization deficiencies. These core principles
lead to specific approaches that we have identified and gathered into four groups: ensemble methods, inconsistency methods, generative methods and metric methods. The structural base
of each core principle is shown in Figure 12.
Ensemble methods use several DNNs, each trained slightly differently. At inference time all of them are applied to the input. The more their outputs differ, the higher the chance that the decision for that input is wrong. This concept works well for samples that are close to a decision boundary, since the decision boundary varies slightly across the different networks. However, out-of-distribution and certain types of adversarial samples are usually found far away from the decision boundary in the feature space. Even for those adversarial examples that are in the problem space or close to the decision boundary, ensemble methods are less suited, as DNNs trained on the same data are vulnerable to the same attacks; adversarial examples exhibit a closer relationship to the training dataset than to the training procedure itself [37]. Thus, ensemble-based methods are mainly found in the predictive uncertainty category and none in the field of adversarial example detection. The two methods in the out-of-distribution category differ slightly from those of the predictive uncertainty field, since they use for training an additional data set containing samples from outside the expected generalization area (i.e., the problem space). During training the DNNs are forced to produce strongly different outputs when receiving inputs from outside the expected generalization area.
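As an illustration of this core principle, the following sketch scores disagreement as the predictive entropy of the averaged softmax outputs of the ensemble members, in the spirit of deep ensembles [42]; the member models and the decision threshold are assumptions, and other disagreement measures such as mutual information are used in the literature as well.

import torch
import torch.nn.functional as F

def ensemble_disagreement(models, x):
    """Entropy of the averaged softmax output across ensemble members."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=1) for m in models], dim=0)
    mean_p = probs.mean(dim=0)                                   # averaged prediction
    entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=1)
    # high entropy -> input likely outside the generalization area
    return entropy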
Inconsistency methods use the idea that for misclassified inputs the classification output is more sensitive to small changes in the input sample. A large difference between the outputs of an input image and of a slightly transformed version indicates that the input is outside the generalization area. These assumptions fit the adversarial example detection setup in particular, and less so the out-of-distribution and predictive uncertainty setups. The inconsistency methods in the adversarial example field tend to fail for adversarial examples on more complex datasets, since the relatively simple noise-reduction procedures currently used are not able to remove the adversarial noise without also distorting important parts of the input [82], [54]. The only inconsistency-based method for out-of-distribution images adds additional noise to the input such as to shift the sample away from the predicted class. This procedure shows some success but is outperformed by newer out-of-distribution detection methods using core principles other than inconsistency.
In the case of generative methods the images are shifted
during a pre-processing procedure towards the generaliza-
tion envelope, often assuming that this is found close to the
training data. A difference between the outputs of the DNN
computed from the shifted and the original input is then
used as an indicator for an input outside the generalization
area. In contrast to inconsistency methods, here the generator model focuses on the setup that we know and for which we have evidence in the form of correctly classified samples. Intuitively, in-distribution samples are left untouched, while for the others the generator model is supposed to eliminate the features that place them outside the generalization area, which in turn is expected to lead to a shift in the classification output. The underlying assumptions of generative methods are rather well suited for adversarial and out-of-distribution samples, but do not fit the predictive uncertainty setup as well.
The most promising methods are metric methods, which can be found in all three categories. They also include the only method that is evaluated on an adversarial as well as on an out-of-distribution set-up [45]. They typically use some metric to compare the DNN's outputs or gradients at several layers to the corresponding values obtained beforehand on the training samples. Unfortunately, they usually need an additional set with samples outside the generalization area for their training process in order to calibrate the threshold used for deciding when a sample lies outside the generalization area. However, as shown in some works [32], [45], it is possible to find methods that are able to train on one outlier class and to generalize well to a different one. Methods using this core principle often take a closer look at how the input is processed in the DNN. They thus take into consideration the possibility that the reasons for the lack of generalization might be hidden deep in the information flow within the DNN but blurred in the actual output. Furthermore, the metric methods include the most efficient approaches, both in terms of the number of parameters and of the computation time, as described in Section 5.1.
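As an illustration of the metric core principle, the following sketch fits class-conditional Gaussians to penultimate-layer features of the training data and scores a test input by its Mahalanobis distance to the closest class mean, in the spirit of [45]; the feature extractor features, the shared covariance and the use of a single layer are simplifying assumptions.

import torch

def fit_gaussians(features, x_train, y_train, num_classes):
    """Fit class means and a shared precision matrix on training features."""
    with torch.no_grad():
        f = features(x_train)                                    # (N, D)
    means = torch.stack([f[y_train == c].mean(dim=0) for c in range(num_classes)])
    centered = f - means[y_train]
    cov = centered.t() @ centered / f.size(0)                    # shared covariance
    precision = torch.linalg.inv(cov + 1e-6 * torch.eye(f.size(1), device=f.device))
    return means, precision

def mahalanobis_score(features, means, precision, x):
    """Squared Mahalanobis distance of the test features to the closest class mean."""
    with torch.no_grad():
        f = features(x)                                          # (B, D)
    diff = f.unsqueeze(1) - means.unsqueeze(0)                   # (B, K, D)
    d = torch.einsum("bkd,de,bke->bk", diff, precision, diff)
    # large distance -> input likely outside the generalization area
    return d.min(dim=1).values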
There are many promising methods for each of the differ-
ent literature fields, but due to the lack of evaluation variety
for each method it is not possible to tell which method is
best when applied to the combined task of detecting sam-
ples outside the generalization area. Most detection meth-
ods just focus on one of the literature fields. Only metric
methods have been shown to achieve good results on more
than one detection task (out-of-distribution and adversarial
example detection).
6. Conclusion
There is a general interest in improving DNN performance by analyzing the generalization behavior. However, despite good performance in numerous problem setups, grasping the generalization behavior of DNNs remains an open research field. Its significance rises even more with the advent of safety-critical DNN applications such as autonomous driving. In such cases, understanding
the generalization behavior represents a cornerstone of a
coherent safety argumentation, which in turn is paramount
for the wide public acceptance of such solutions.
This paper presented a comprehensive survey covering the methods designed to detect at inference time whether an input is within the generalization area of a DNN, with a focus on the task of image classification.
There can be different reasons for a sample not to be within the generalization area. In general, we either deal with naturally occurring samples that lie outside the generalization area or, in the case of adversarial examples, with input samples that are selected on purpose or even engineered to lie outside the generalization area. However, the underlying cause of the classification error, irrespective of the setup, is that the DNN does not generalize correctly. From this perspective, all setups are equivalent and hence this paper reviews all corresponding contributions and sets them into the common, broader generalization context.
The generalization issues often stem from the fact that the feature projection learned during training contains bad modes that impair separability. Thus, significant and decision-influencing differences between inputs do not correspond to large enough distances in the feature space. So we either have no separability in the feature space, or we have it but it is ignored during training.
An analysis of the generalization behavior of a DNN can
be used both during training and at inference time. During
training the results of the analysis are used to build the
optimal pair of feature extractor and decision surface given
the available data. This setup has been studied extensively
and still remains a focus of research.
At inference time, the common practice for investigating the generalization behavior was to define confidence measures on the basis of layer-wise neuron activations computed using the weights established during training. This, however, often leads to misleading results, as DNNs tend to be overconfident in their decision as measured by the relative significance allocated to the winning class. Predictive uncertainty methods improve upon this approach but usually focus mainly on the region of the input/feature space close to the decision surface. Methods of the out-of-distribution literature field analyse the probability of the current input sample conditioned on the training data and may thus also look at regions far away from the decision surface. As adversarial examples may occur everywhere, methods to detect them come closest to a general approach for investigating the generalization area. However, they assume intent and accordingly tend to ignore some blatant generalization issues that do not involve samples which are close in the input (image) space and far away in the feature space, or the other way around.
A major approach to investigating, for a certain input, whether the classification output can be trusted, irrespective of the literature field, is to analyse whether the input is within the range of the training data. For metric methods in particular, and to a lesser extent for inconsistency methods, this approach is complemented by one targeting the way information is processed within the DNN.
Given the fact that until now the generalization questions have been treated mostly in separate, rather application-related setups, it is difficult to tell beyond doubt which method or class of methods holds the best promise for deciding if a sample is within the generalization area of
a DNN. In general there are many promising approaches.
For example, metric methods are currently in the focus of
research in all literature fields when judging by the number
of publications. In particular the adversarial-example setup
constitutes a very active research field that we believe has
much to profit from becoming more aware of the other
related fields.
References
[1] Aderemi O Adewumi and Andronicus A Akinyelu. A sur-
vey of machine-learning and nature-inspired based credit
card fraud detection techniques. International Journal of
System Assurance Engineering and Management, 8(2):937–
953, 2017.
[2] Charu C Aggarwal. Outlier analysis. In Data mining, pages
237–263. Springer, 2015.
[3] Shikha Agrawal and Jitendra Agrawal. Survey on anomaly
detection using data mining techniques. Procedia Computer
Science, 60:708–713, 2015.
[4] Naveed Akhtar and Ajmal Mian. Threat of adversarial at-
tacks on deep learning in computer vision: A survey. IEEE
Access, 6:14410–14430, 2018.
[5] Christopher M Bishop. Pattern recognition and machine
learning. springer, 2006.
[6] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu,
and Daan Wierstra. Weight uncertainty in neural networks.
arXiv preprint arXiv:1505.05424, 2015.
[7] Glenn W Brier. Verification of forecasts expressed in terms
of probability. Monthly weather review, 78(1):1–3, 1950.
[8] Saikiran Bulusu, Bhavya Kailkhura, Bo Li, Pramod K Varsh-
ney, and Dawn Song. Anomalous instance detection in deep
learning: A survey. arXiv preprint arXiv:2003.06979, 2020.
[9] N. Carlini and D. Wagner. Towards evaluating the robustness
of neural networks. In SP, pages 39–57, 2017.
[10] Anirban Chakraborty, Manaar Alam, Vishal Dey, Anu-
pam Chattopadhyay, and Debdeep Mukhopadhyay. Ad-
versarial attacks and defences: A survey. arXiv preprint
arXiv:1810.00069, 2018.
[11] Raghavendra Chalapathy and Sanjay Chawla. Deep learn-
ing for anomaly detection: A survey. arXiv preprint
arXiv:1901.03407, 2019.
[12] Varun Chandola, Arindam Banerjee, and Vipin Kumar.
Anomaly detection: A survey. ACM computing surveys
(CSUR), 41(3):1–58, 2009.
[13] Corinna Cortes and Vladimir Vapnik. Support-vector net-
works. Machine learning, 20(3):273–297, 1995.
[14] Morris H DeGroot and Stephen E Fienberg. The comparison
and evaluation of forecasters. Journal of the Royal Statistical
Society: Series D (The Statistician), 32(1-2):12–22, 1983.
[15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei.
ImageNet: A Large-Scale Hierarchical Image Database. In
CVPR09, 2009.
[16] Li Deng and Dong Yu. Deep learning: methods and appli-
cations. Foundations and trends in signal processing, 7(3–
4):197–387, 2014.
[17] Terrance DeVries and Graham W Taylor. Learning confi-
dence for out-of-distribution detection in neural networks.
arXiv preprint arXiv:1802.04865, 2018.
[18] Abhimanyu Dubey, Laurens van der Maaten, Zeki Yalniz,
Yixuan Li, and Dhruv Mahajan. Defense against adversar-
ial images using web-scale nearest-neighbor search. CVPR,
2019.
[19] Ted Dunning and Ellen Friedman. Practical machine learn-
ing: a new look at anomaly detection. "O'Reilly Media, Inc.", 2014.
[20] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner.
Detecting adversarial samples from artifacts. arXiv preprint
arXiv:1703.00410, 2017.
[21] Di Feng, Lars Rosenbaum, and Klaus Dietmayer. Towards
safe autonomous driving: Capture uncertainty in the deep
neural network for lidar 3d vehicle detection. In 2018 21st
International Conference on Intelligent Transportation Sys-
tems (ITSC), pages 3266–3273. IEEE, 2018.
[22] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian
approximation: Representing model uncertainty in deep
learning. In international conference on machine learning,
pages 1050–1059, 2016.
[23] L. Gauerhof, P. Munk, and S. Burton. Structuring validation
targets of a machine learning function applied to automated
driving. In SAFECOMP, pages 45–58. Springer, 2018.
[24] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE inter-
national conference on computer vision, pages 1440–1448,
2015.
[25] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep
learning. MIT press, 2016.
[26] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing
Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. Generative adversarial nets. In Z. Ghahra-
mani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q.
Weinberger, editors, Advances in Neural Information Pro-
cessing Systems 27, pages 2672–2680, 2014.
[27] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and
harnessing adversarial examples. ICLR, 2015.
[28] Alex Graves. Practical variational inference for neural net-
works. In Advances in neural information processing sys-
tems, pages 2348–2356, 2011.
[29] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot,
Michael Backes, and Patrick McDaniel. On the (statis-
tical) detection of adversarial examples. arXiv preprint
arXiv:1702.06280, 2017.
[30] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger.
On calibration of modern neural networks. In Proceedings
of the 34th International Conference on Machine Learning-
Volume 70, pages 1321–1330. JMLR. org, 2017.
[31] Simon Haykin. Neural networks and Learning Machines: A
Comprehensive Foundation. Number 3. Prentice-Hall, Inc.,
2008.
[32] Matthias Hein, Maksym Andriushchenko, and Julian Bitter-
wolf. Why relu networks yield high-confidence predictions
far away from the training data and how to mitigate the prob-
lem. In The IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR), June 2019.
[33] Dan Hendrycks and Kevin Gimpel. A baseline for detect-
ing misclassified and out-of-distribution examples in neural
networks. arXiv preprint arXiv:1610.02136, 2016.
[34] Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich.
Deep anomaly detection with outlier exposure. ICLR, 2019.
[35] Maximilian Henne, Adrian Schwaiger, Karsten Roscher, and
Gereon Weiss. Benchmarking uncertainty estimation meth-
ods for deep learning with safety-related metrics. In Pro-
ceedings of the Workshop on Artificial Intelligence Safety,
co-located with 34th AAAI Conference on Artificial Intelli-
gence, SafeAI@AAAI 2020, New York City, NY, USA, Febru-
ary 7, 2020, volume 2560 of CEUR Workshop Proceedings,
pages 83–90. CEUR-WS.org, 2020.
[36] José Miguel Hernández-Lobato and Ryan Adams. Probabilistic backpropagation for scalable learning of bayesian neural networks. In International Conference on Machine Learning, pages 1861–1869, 2015.
[37] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan
Engstrom, Brandon Tran, and Aleksander Madry. Adversar-
ial examples are not bugs, they are features. arXiv preprint
arXiv:1905.02175, 2019.
[38] Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta.
To trust or not to trust a classifier. In Advances in neural
information processing systems, pages 5541–5552, 2018.
[39] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of
features from tiny images. Technical report, Citeseer, 2009.
[40] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial ex-
amples in the physical world. ICLR, 2017.
[41] Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh,
Ikkyun Kim, and Kuinam J Kim. A survey of deep learning-
based network anomaly detection. Cluster Computing, pages
1–13, 2017.
[42] Balaji Lakshminarayanan, Alexander Pritzel, and Charles
Blundell. Simple and scalable predictive uncertainty esti-
mation using deep ensembles. In Advances in neural infor-
mation processing systems, pages 6402–6413, 2017.
[43] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep
learning. nature, 521(7553):436–444, 2015.
[44] Y. LeCun, B. E Boser, J. S. Denker, D. Henderson, R. E.
Howard, W. E. Hubbard, and L. D. Jackel. Handwritten
digit recognition with a back-propagation network. In NIPS,
pages 396–404, 1990.
[45] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A
simple unified framework for detecting out-of-distribution
samples and adversarial attacks. In Advances in Neural In-
formation Processing Systems, pages 7167–7177, 2018.
[46] Xin Li and Fuxin Li. Adversarial examples detection in deep
networks with convolutional filter statistics. In Proceedings
of the IEEE International Conference on Computer Vision,
pages 5764–5772, 2017.
[47] Bin Liang, Hongcheng Li, Miaoqiang Su, Xirong Li, Wen-
chang Shi, and Xiaofeng Wang. Detecting adversarial image
examples in deep neural networks with adaptive noise reduc-
tion. IEEE Transactions on Dependable and Secure Comput-
ing, 2018.
[48] Shiyu Liang, Yixuan Li, and R Srikant. Enhancing the re-
liability of out-of-distribution image detection in neural net-
works. arXiv preprint arXiv:1706.02690, 2017.
[49] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu. De-
fense against adversarial attacks using high-level representa-
tion guided denoiser. In CVPR, pages 1778–1787, 2018.
[50] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Ar-
naud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen
Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Gin-
neken, and Clara I Sánchez. A survey on deep learning in
medical image analysis. Medical image analysis, 42:60–88,
2017.
[51] Christos Louizos and Max Welling. Structured and effi-
cient variational deep learning with matrix gaussian poste-
riors. In International Conference on Machine Learning,
pages 1708–1716, 2016.
[52] Christos Louizos and Max Welling. Multiplicative normal-
izing flows for variational bayesian neural networks. In Pro-
ceedings of the 34th International Conference on Machine
Learning-Volume 70, pages 2218–2227. JMLR. org, 2017.
[53] Julia Lust and Alexandru Paul Condurache. GraN: An efficient gradient-norm based detector for adversarial and mis-
classified examples. In European Symposium on Artificial
Neural Networks, Computational Intelligence and Machine
Learning. Bruges (Belgium), 2020.
[54] S. Ma, Y. Liu, G. Tao, W. Lee, and X. Zhang. NIC: detecting
adversarial samples with neural network invariant checking.
In NDSS, 2019.
[55] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. N. R. Wijewick-
rema, G. Schoenebeck, D. Song, M. E. Houle, and J. Bailey.
Characterizing adversarial subspaces using local intrinsic di-
mensionality. In ICLR, 2018.
[56] Geoffrey J McLachlan and Thriyambakam Krishnan. The
EM algorithm and extensions, volume 382. John Wiley &
Sons, 2007.
[57] Kishan G Mehrotra, Chilukuri K Mohan, and HuaMing
Huang. Anomaly detection principles and algorithms.
Springer, 2017.
[58] D. Meng and H. Chen. Magnet: a two-pronged defense
against adversarial examples. In CCS, pages 135–147, 2017.
[59] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and
Bastian Bischoff. On detecting adversarial perturbations. In
5th International Conference on Learning Representations,
ICLR 2017, Toulon, France, April 24-26, 2017, Conference
Track Proceedings, 2017.
[60] Mahdi Pakdaman Naeini, Gregory Cooper, and Milos
Hauskrecht. Obtaining well calibrated probabilities using
bayesian binning. In Twenty-Ninth AAAI Conference on Ar-
tificial Intelligence, 2015.
[61] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y.
Ng. Reading digits in natural images with unsupervised fea-
ture learning. 2011.
[62] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural
networks are easily fooled: High confidence predictions for
unrecognizable images. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition, pages
427–436, 2015.
[63] Philipp Oberdiek, Matthias Rottmann, and Hanno
Gottschalk. Classification uncertainty of deep neural
networks based on gradient information. In Luca Pancioni,
Friedhelm Schwenker, and Edmondo Trentin, editors,
Artificial Neural Networks in Pattern Recognition, 2018.
[64] Yaniv Ovadia, Jasper Snoek, Emily Fertig, Balaji Lakshmi-
narayanan, Sebastian Nowozin, D Sculley, Joshua Dillon, Jie
Ren, and Zachary Nado. Can you trust your model’s un-
certainty? evaluating predictive uncertainty under dataset
shift. In Advances in Neural Information Processing Sys-
tems, pages 13969–13980, 2019.
[65] Tianyu Pang, Chao Du, Yinpeng Dong, and Jun Zhu. To-
wards robust detection of adversarial examples. In Ad-
vances in Neural Information Processing Systems, pages
4579–4589, 2018.
[66] Nicolas Papernot and Patrick D. McDaniel. Deep k-nearest
neighbors: Towards confident, interpretable and robust deep
learning. ArXiv, 2018.
[67] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson,
Z. Berkay Celik, and A. Swami. The limitations of deep
learning in adversarial settings. In EuroSP, pages 372–387,
2016.
[68] Marco AF Pimentel, David A Clifton, Lei Clifton, and Li-
onel Tarassenko. A review of novelty detection. Signal Pro-
cessing, 99:215–249, 2014.
[69] Jie Ren, Peter J Liu, Emily Fertig, Jasper Snoek, Ryan
Poplin, Mark Depristo, Joshua Dillon, and Balaji Lakshmi-
narayanan. Likelihood ratios for out-of-distribution detec-
tion. In Advances in Neural Information Processing Systems,
pages 14680–14691, 2019.
[70] Carlos Riquelme, George Tucker, and Jasper Snoek. Deep
bayesian bandits showdown: An empirical comparison of
bayesian deep networks for thompson sampling. arXiv
preprint arXiv:1802.09127, 2018.
[71] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P
Kingma. Pixelcnn++: Improving the pixelcnn with dis-
cretized logistic mixture likelihood and other modifications.
ICLR 2017, 2017.
[72] Pouya Samangouei, Maya Kabkab, and Rama Chel-
lappa. Defense-gan: Protecting classifiers against adver-
sarial attacks using generative models. arXiv preprint
arXiv:1805.06605, 2018.
[73] Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, and Jordi Luque. Input complexity and
out-of-distribution detection with likelihood-based genera-
tive models. In International Conference on Learning Rep-
resentations, 2020.
[74] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Er-
mon, and Nate Kushman. Pixeldefend: Leveraging genera-
tive models to understand and defend against adversarial ex-
amples. ICLR, 2018.
[75] Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber.
Training very deep networks. In Advances in neural infor-
mation processing systems, pages 2377–2385, 2015.
[76] Leslie G Valiant. A theory of the learnable. Communications
of the ACM, 27(11):1134–1142, 1984.
[77] Aäron van den Oord, Nal Kalchbrenner, and Koray
Kavukcuoglu. Pixel recurrent neural networks. CoRR,
abs/1601.06759, 2016.
[78] Vladimir Vapnik. The nature of statistical learning theory.
Springer science & business media, 2013.
[79] Vladimir N Vapnik. An overview of statistical learning the-
ory. IEEE transactions on neural networks, 10(5):988–999,
1999.
[80] Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar
Das, Bharat Kaul, and Theodore L. Willke. Out-of-
distribution detection using an ensemble of self supervised
leave-out classifiers. In Vittorio Ferrari, Martial Hebert, Cris-
tian Sminchisescu, and Yair Weiss, editors, Computer Vision
- ECCV 2018 - 15th European Conference, Munich, Ger-
many, September 8-14, 2018, Proceedings, Part VIII.
[81] Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, and
Roger Grosse. Flipout: Efficient pseudo-independent
weight perturbations on mini-batches. arXiv preprint
arXiv:1803.04386, 2018.
[82] W. Xu, D. Evans, and Y. Qi. Feature squeezing: Detect-
ing adversarial examples in deep neural networks. In NDSS,
2018.
[83] Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianx-
iong Xiao. Lsun: Construction of a large-scale image dataset
using deep learning with humans in the loop. arXiv preprint
arXiv:1506.03365, 2015.
[84] Qing Yu and Kiyoharu Aizawa. Unsupervised out-of-
distribution detection by maximum classifier discrepancy. In
Proceedings of the IEEE International Conference on Com-
puter Vision, pages 9518–9526, 2019.
[85] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Ad-
versarial examples: Attacks and defenses for deep learning.
IEEE transactions on neural networks and learning systems,
30(9):2805–2824, 2019.
[86] Zhihao Zheng and Pengyu Hong. Robust detection of ad-
versarial attacks by modeling the intrinsic properties of deep
neural networks. In Advances in Neural Information Pro-
cessing Systems, pages 7913–7922, 2018.