International Journal of Machine Learning and Cybernetics (2020) 11:935–944
Robustness to adversarial examples can be improved with overfitting
OscarDeniz1 · AnibalPedraza1· NoeliaVallez1· JesusSalido1· GloriaBueno1
Received: 2 September 2019 / Accepted: 30 January 2020 / Published online: 26 February 2020
© The Author(s) 2020
Deep learning (henceforth DL) has become the most powerful machine learning methodology. Under specific circumstances recognition rates even surpass those obtained by humans. Despite this, several works have shown that deep learning produces outputs that are very far from human responses when confronted with the same task. This is the case of the so-called "adversarial examples" (henceforth AE). The fact that such implausible misclassifications exist points to a fundamental difference between machine and human learning. This paper focuses on the possible causes of this intriguing phenomenon. We first argue that the error in adversarial examples is caused by high bias, i.e. by regularization that has local negative effects. This idea is supported by our experiments, in which robustness to adversarial examples is measured with respect to the level of fitting to the training samples: higher fitting was associated with higher robustness to adversarial examples. This ties the phenomenon to the trade-off that exists in machine learning between fitting and generalization.
Keywords Adversarial examples · Deep learning · Bioinspired learning
1 Introduction
While advances in deep learning [25] have been unprecedented, many researchers know that the capabilities of this methodology are at times overestimated [38]. Some works have been published where the performance reported with DL surpasses that obtained by humans on the same task (see [14] and [34]). Despite this, some studies have also shown that DL networks exhibit a strange behavior which is very different from human responses when confronted with the same task [23, 32]. Perhaps the best example to describe it is the case of the so-called "adversarial examples" [32], see Fig. 1. Adversarial examples are apparently identical to the original examples except for a very small change in the pixels of the image. Despite being perceived by humans as completely equal to the originals, DL techniques fail miserably at classifying them.
Thus, while apparently having superhuman capabilities, DL also seems to have weaknesses that are not coherent with human performance. Not only that, from the structure of DL (essentially an interconnected network of neurons with numerical weights), it is unclear what gives rise to that behavior. The problem is not limited to maliciously selected noise, since some transformations involving rescaling, translation, and rotation produce the same results [2]. Likewise, physical changes in the objects (graffiti or stickers) have been shown to produce the same effect [9]. While not strictly adversarial examples, in video processing it often happens that the object of interest is recognized in one frame but not in the next one, even if there is no noticeable difference between the frames. Such implausible 'stability' issues are exemplified in real-life cases, like that of Uber's self-driving vehicle in which a pedestrian was killed in Arizona (USA). The preliminary report released by the NTSB¹ states: "As the vehicle and pedestrian paths converged, the self-driving system software classified the pedestrian as an unknown object, as a vehicle, and then as a bicycle with varying expectations of future travel path…". Other real-life examples have been shown in the contexts of optical flow-based action recognition [15], vision for robots [20] and even in other domains such as machine learning-based malware detection [19]. If the safety of a DL system depends on the classifier never making obvious mistakes, then the system must be considered intrinsically unsafe.
* Oscar Deniz
1 VISILAB, ETSI Industriales, Universidad de Castilla-La Mancha, Avda. Camilo Jose Cela SN, 13071 Ciudad Real, Spain
1 Preliminary Report HWY18MH010, National Transportation Safety Board, available at: https://www.ntsb.gov/investigations/AccidentReports/Reports/HWY18MH010-prelim.pdf.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
On the other hand, while the prevailing trend in the scientific community currently consists in proposing variant architectures for DL, it has been demonstrated that for a given dataset the same adversarial examples persist even after training with different architectures [24].
The goal of this paper is not to introduce a novel method,
but to advance our knowledge about the phenomenon, its
root causes and implications. The contributions of this paper
are as follows. While other underlying reasons have been
proposed in the literature for the existence of adversarial
examples, in this paper we contend that the phenomenon of
adversarial examples is tied to the inescapable trade-off that
exists in machine learning between fitting and generaliza-
tion. This idea is supported by experiments carried out in
which the robustness to adversarial examples is measured
with respect to the degree of fitting to the training samples.
This paper is an extended version of the conference paper [8]. The contributions in this paper are: a new, corrected methodology; the removal of the concept of 'cognitively adversarial examples' introduced in [8]; a new set of experiments with a deep network (the experiments in [8] were carried out with K-NN); and a more detailed analysis of the results.
2 Previous work
The two major lines of research around adversarial examples
have been: (1) generating AEs and (2) defending against
AEs. This paper will not cover either, and the reader is
referred to recent surveys [1, 6, 40]. In parallel to those two lines, however, a significant body of work has been carried out to delve into the root causes of AEs and their implications.
In early work, the high nonlinearity of deep neural net-
works was suspected as a possible reason explaining the
existence of adversarial examples [32]. On the other hand,
later in [13] it is argued that high-dimensional linearities
cause the adversarial pockets in the classification space. This
suggests that generalization (as implied by the less complex
linear discrimination boundaries) has a detrimental effect
that produces AEs. In the same line, in [10] it is stated:
“Unlike the initial belief that adversarial examples are
caused by the high non-linearity of neural networks, our
results suggest instead that this phenomenon is due to the
low flexibility of classifiers”.
In [32] the authors had suggested a preliminary explana-
tion for the phenomenon, arguing that low-probability adver-
sarial “pockets” are densely distributed in input space. In
later work [33] the authors probed the space of adversarial
images using noise of varying intensity and distribution.
They showed that adversarial images appear in large regions
in the pixel space instead.
In [27] the existing literature on the topic is reviewed, showing that up to 8 different explanations have been given for AEs. The prevailing trend, however, seems to focus on the linear/non-linear character of the classifier and, in general, on its overfitting problems. Under two interpretations (the boundary tilting hypothesis [35] and in [11]) the authors argue that the phenomenon of AEs is essentially due to overfitting and can be alleviated through regularisation or smoothing of the classification boundary.
Recent work has linked low test error with low robustness
to AEs. In [31] it is shown that a better performance in test
accuracy in general reduces robustness to AEs. In [12] the
authors perform experiments on a synthetic dataset and state
that low (test) error classification and AEs are intrinsically
linked. They argue that this does not imply that defending
against adversarial examples is impossible, only that success
in doing so would require improved model generalization.
Thus, they argue that the only way to defend against AEs is
to massively reduce that error. However, we note that this
would be in apparent contradiction with the main finding in
that paper (that AEs appear with low classification error).
Thus, while generalization may help reduce error in general,
without additional considerations it would not necessarily
remove AEs.
Fig. 1 Adversarial example. "Person" is the so-called target

In [26] the authors point out that an implicit assumption underlying most of the related work is that the same training dataset that enables good standard accuracy also suffices to train a robust model. The authors argue that the assumption may be invalid and suggest that, for high-dimensional problems, adversarial robustness can require a significantly larger number of samples. Similar conclusions are drawn
in [30], where it is stated that adversarial vulnerability
increases with input dimension. Again, all of this would
point to overfitting as the primary cause.
On the other hand, despite the several methods that have
been proposed to increase robustness to AEs, the phenom-
enon appears to be difficult or impossible to avoid [3, 10,
12, 28, 29, 36].
In summary, despite significant research on the topic, the
cause of the phenomenon remains elusive. It is not clear
whether the phenomenon is due to overfitting or, on the con-
trary, to underfitting. Some researchers have also tied the
phenomenon to the (limited) amount of training samples that
are available or a large input dimension (or the relationship
between these two).
3 Datasets andmethods
Our reasoning is based on two simple facts:
1. Adversarial examples can be generated from training samples (just as they can be generated from test samples)
2. The adversarial example can be arbitrarily close to the original sample
During training, we always try to minimize both bias (by
reducing training set error) and variance (by applying some
form of regularization). It is well known that reducing vari-
ance increases bias and vice versa.
Thus, if we consider facts 1 and 2 in the limit of distance
towards 0 (with respect to the training examples), the situa-
tion is equivalent to a model that has been trained with high
bias (high training error). In other words, this would equate
to a model in which the source of error in the adversarial
examples is due to high bias. This situation is depicted in Fig. 2, where regularization near the known training sample causes the adversarial example.
If the error can be attributed to bias, then reducing it
should reduce that error. In other words, this means that
reducing model bias (and therefore increasing model vari-
ance) should reduce error in adversarial examples. That is
exactly the hypothesis that we address below in the experi-
ments. The bias and variance errors are in general controlled
by the classifier's trade-off between fitting and generalization. Our aim is to test whether such a change in the fitting-generalization trade-off point is reflected in the robustness to AEs.
In the experiments below we used the MNIST [18], CIFAR-10 [16] and ImageNet [17] datasets, arguably the three most common datasets used in research on the nature of adversarial examples (these three datasets were used in 46 of the 48 papers reviewed in [27]). MNIST is a dataset of handwritten grayscale 28×28 images representing the digits 0–9. Typically, 60,000 images are used for training and 10,000 for testing. The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes², with 6000 images per class (50,000 training images and 10,000 test images). The CIFAR-10 dataset is in general considered more challenging than MNIST. On the other hand, compared to the MNIST and CIFAR-10 datasets, ImageNet is much more challenging in terms of images and classes (1000 classes), and it has been shown in previous work that ImageNet images are easier to attack but harder to defend than images from MNIST and CIFAR-10.
To validate our hypothesis and show that accuracy in the
AE set is linked to the fitting capability, we need a classifier
working under various points of the fitting-generalization
regime. In a first set of experiments, we used a K-Nearest Neighbor classifier, with K values equal to and greater than 1, to control the point in the trade-off between fitting and generalization. The K-NN classifier is a natural choice here. It is
widely known that large values of K are used to achieve bet-
ter generalization, while lower values (down to K = 1) may
produce overfitting. Given the dimensionalities involved,
an efficient KD-tree-based implementation was used for the
K-NN classifier.
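The effect of K on the fitting-generalization trade-off can be illustrated with a minimal K-NN sketch (plain NumPy on synthetic 2-D data; this is an illustration only, not the KD-tree implementation used in the experiments): with K = 1 every training sample is its own nearest neighbour, so training accuracy is always 100%, while larger K smooths the decision and may trade training accuracy for generalization.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Classify each query point by majority vote among its k nearest
    training samples (Euclidean distance), as in a plain K-NN classifier."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.array(preds)

rng = np.random.default_rng(0)
# two noisy 2-D classes: the label mostly follows the sign of the first feature
X = rng.normal(0, 1, size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# K = 1 memorises the training set: training accuracy is always 100%
acc_k1 = (knn_predict(X, y, X, k=1) == y).mean()
# a larger K smooths the decision boundary, so training accuracy can drop
acc_k15 = (knn_predict(X, y, X, k=15) == y).mean()
print(acc_k1, acc_k15)
```

This is exactly the knob exploited in the experiments: sliding K between 1 and Z moves the classifier between the overfitting and generalization regimes.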
In the second set of experiments we used a LeNet-5 CNN for classification. The architecture of this network is shown in Fig. 3. Note that we did not use any pre-trained models in any of the experiments with deep networks.
Attack methods are iterative optimization algorithms based on trained deep networks. They essentially optimize a norm or distance to the original sample and a change in label from that of the original sample to that of the target class. In our experiments, the AEs were obtained using four methods: the Fast Gradient Sign Method (a so-called white-box targeted attack, introduced in [13]), DeepFool [22] targeting all classes, Carlini-Wagner [5] and the recent HopSkipJump method [7]. For FGSM we fixed the attack step size (input variation) to a constant value. For DeepFool the maximum number of iterations was set at 100. For all methods we used the aforementioned LeNet-5 network architecture to generate the adversarial examples (also in the first set of experiments with the K-NN classifier). Figs. 4 and 5 show some examples of the AEs generated.

Fig. 2 Adversarial example caused by regularization in the vicinity of a training sample

2 Airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.
4 Experimental results
Experiments were performed with two different classifiers:
(1) K-NN classifier and (2) Convolutional Neural Network.
Our objective in the experiments is to bring those two clas-
sifiers to overfitting and show the accuracy trends in three
sets of samples: (a) the test set, (b) an adversarial set and
(c) a so-called fail subset.
In the following we describe the results obtained in each case.
4.1 K-NN classifier
For the K-NN classifier, Fig. 6 shows how the three aforementioned subsets are obtained. To get to the adversarial subset we first obtained the test samples that were correctly classified by the K-NN for K = Z (for a fixed odd Z). The attack method (FGSM, DeepFool, Carlini-Wagner and HopSkipJump) was then used to generate a set of AEs from those. Note that at this point AE generation was done with the CNN network (since all attack methods are based on trained CNN networks and backpropagation), and therefore this set has to be filtered to discard samples that were correctly classified by the Z-NN. Thus the Z-NN accuracy on this final AE set is 0%.

Fig. 3 Network architecture used in the experiments (for MNIST)

Fig. 4 Sample AEs generated with FGSM for the MNIST dataset

Fig. 5 Sample AEs generated with DeepFool for the CIFAR-10 dataset. Best viewed in color

Then we measured the K-NN classifier accuracy on this AE set, for values of K smaller than Z, down to K = 1.
Again, our hypothesis is that accuracy in this AE set should
increase as K gets smaller. As can be seen in Fig. 7, the classifier is, for both datasets, overfitting as K gets smaller.
In order to discard the possibility of this being a general
trend with lower values of K, we also obtained the accuracy
for the whole test set and for the subset of test samples in
which the classifier gave a wrong decision for K = Z. We
call the latter the fail subset, see Fig. 6. Note that, by definition,
the fail subset and the adversarial set both give 0% accuracy
for K = Z. The fail subset is a sort of worst-case set in which
accuracy is also expected to grow as K gets smaller (since it
starts with an accuracy of 0% for K = Z).
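The filtering protocol described above can be sketched as follows (a NumPy toy with synthetic data, where random noise stands in for a real attack; `build_adversarial_set` is an illustrative helper, not code from the paper):

```python
import numpy as np

def knn_predict(X_tr, y_tr, X_q, k):
    """Plain K-NN majority vote (Euclidean distance)."""
    out = []
    for q in X_q:
        idx = np.argsort(np.linalg.norm(X_tr - q, axis=1))[:k]
        out.append(np.bincount(y_tr[idx]).argmax())
    return np.array(out)

def build_adversarial_set(X_tr, y_tr, X_te, y_te, X_pert, Z):
    """The paper's filtering protocol: start from test samples the Z-NN
    classifies correctly, take their perturbed versions (X_pert, one per
    test sample, produced by some attack) and keep only those the Z-NN
    now misclassifies -- so Z-NN accuracy on the result is 0% by design."""
    ok = knn_predict(X_tr, y_tr, X_te, Z) == y_te          # correctly classified
    fooled = knn_predict(X_tr, y_tr, X_pert, Z) != y_te    # attack succeeded
    keep = ok & fooled
    return X_pert[keep], y_te[keep]

rng = np.random.default_rng(1)
X_tr = rng.normal(size=(100, 2)); y_tr = (X_tr[:, 0] > 0).astype(int)
X_te = rng.normal(size=(50, 2));  y_te = (X_te[:, 0] > 0).astype(int)
X_pert = X_te + rng.normal(scale=0.8, size=X_te.shape)  # stand-in "attack"

X_adv, y_adv = build_adversarial_set(X_tr, y_tr, X_te, y_te, X_pert, Z=9)
if len(X_adv):
    # by construction the Z-NN gets 0% accuracy on the adversarial set
    print((knn_predict(X_tr, y_tr, X_adv, 9) == y_adv).mean())
```

The experiment then simply re-evaluates `knn_predict` on `X_adv` for smaller values of K and tracks how the accuracy recovers.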
We repeated the experiment a number of times, each run
performing a stratified shuffling of the dataset between the
training and test sets (always leaving 60,000 samples for
training and 10,000 samples for test for MNIST, and 50,000
samples for training with 10,000 samples for test in CIFAR-
10). The results are shown in Fig. 8.
The results show that accuracy in the adversarial set has
the highest increase rate as K gets smaller. The accuracy
values obtained in the whole test set are always very stable (and very close to 100% in the case of MNIST), which makes it difficult to establish a trend in that case. For the other two sets, in order to check if there was a statistically significant difference of trends we applied hypothesis testing in the following way. Let ACC(K) be the accuracy obtained using a given K. We calculated the slope between accuracies for successive values of K as (ACC(K) − ACC(K + 2)) / ACC(K + 2), for K = 1, 3, …, Z − 2. Then we used the slopes as the random variables to perform a paired Welch's t test³. In Fig. 9 we show the p values of the test.

Fig. 6 How the three sets of samples used in the experiments with K-NN are obtained

Fig. 7 K-NN train and test accuracies for MNIST (left) and CIFAR10 (right) datasets

The values in Fig. 9 show that the trends in the two sets are statistically different. Note also that for K = Z − 2 the value is not meaningful, since ACC(Z) is 0 in both cases and so the slope is actually infinite.
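Under the assumption that the slope is the relative accuracy change described above, the test can be sketched as follows (the accuracy curves are hypothetical; the Welch statistic and degrees of freedom are computed by hand, and a p-value would then come from the t distribution with those degrees of freedom):

```python
import numpy as np

def rel_slopes(acc):
    """Relative slope between accuracies at successive K values:
    (ACC(K) - ACC(K+2)) / ACC(K+2), with acc ordered K = 1, 3, ..., Z-2."""
    a = np.asarray(acc, dtype=float)
    return (a[:-1] - a[1:]) / a[1:]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# hypothetical accuracy curves as K shrinks toward 1 (ordered K = 1 first)
adv_acc  = [0.60, 0.45, 0.30, 0.10]   # adversarial set: steep relative growth
fail_acc = [0.20, 0.17, 0.14, 0.10]   # fail subset: milder relative growth

t, df = welch_t(rel_slopes(adv_acc), rel_slopes(fail_acc))
print(t > 0)   # positive t: adversarial-set slopes are larger on average
```

With these toy numbers the adversarial-set slopes dominate, mirroring the trend reported for the real data.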
4.2 CNN classifier
In the experiments carried out with the CNN classifier, the number of epochs is the parameter that controls the degree of overfitting (with more epochs increasing overfitting). Thus, instead of using K as in the previous experiment we will use E, the number of epochs, in this case varying from 1 to 35. The adversarial examples in this case are obtained from test samples that are correctly classified (by the CNN classifier) for E = M, where M is the number of epochs for which the test accuracy was the highest. Thus, for E = M the accuracy in this set of adversarial examples is zero. Likewise, we also consider the subset of test samples that are not correctly classified when
E = M. This is what we have been calling the fail subset (the accuracy for this subset for E = M is also zero). This part of the experimental workflow is shown in Fig. 10.

Fig. 8 Accuracy values for the datasets used. Left: Accuracy values for the MNIST dataset. Right: Accuracy values for the CIFAR-10 dataset. Best viewed in color

Fig. 9 p values obtained. Left: p values (represented in logarithmic scale) obtained by the paired Welch's t test between results in the Adversarial set and those in the misclassified test samples, for MNIST. The dashed horizontal lines represent the 95% and 99% confidence thresholds. Right: idem for CIFAR-10

3 https://en.wikipedia.org/wiki/Welch%27s_t-test.
This set of experiments was only carried out with the CIFAR-10 dataset, since MNIST with a CNN provided test accuracies near 100%, which did not allow us to obtain meaningful results. To obtain the results for each value of E we used model checkpointing. We repeated the experiment a number of times, each run performing a stratified shuffling of the dataset between the training and test sets. The Adam optimizer was used with a batch size of 32 and a learning rate of 0.003.
The results are shown in Fig.11.
The results show that the accuracy in the training set is always improving. The accuracy in the test set initially increases to a maximum value (reached at E = M epochs) and then slowly decreases. This region shows overfitting, which is the regime of interest in our case. As for the other sets, from the figure it is difficult to compare trends. To measure them, for each run of the experiment we obtained the point (i.e. number of epochs) that provided the maximum test set accuracy; again, let this be M. Then we calculated the delta accuracy as ACC(E = 35) − ACC(E = M). The boxplot of the deltas thus obtained is shown in Fig. 12.
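A minimal sketch of this delta computation (the per-epoch accuracies below are hypothetical; in the real experiments they come from the checkpointed models of each run):

```python
import numpy as np

def delta_accuracy(test_acc_by_epoch, acc_by_epoch):
    """Compute the paper's delta for one run: pick M, the epoch with the
    best test accuracy (the checkpoint one would normally deploy), and
    report ACC(E=last) - ACC(E=M) on a given sample set."""
    M = int(np.argmax(test_acc_by_epoch))   # best-test-accuracy epoch
    return acc_by_epoch[-1] - acc_by_epoch[M]

# hypothetical per-epoch accuracies (6 epochs stand in for 1..35)
test_acc = [0.50, 0.62, 0.70, 0.68, 0.66, 0.65]   # peaks at M (index 2)
adv_acc  = [0.00, 0.01, 0.00, 0.05, 0.12, 0.20]   # adversarial set, 0 at E=M

print(delta_accuracy(test_acc, adv_acc))  # -> 0.2
```

A positive delta on the adversarial set means that training past the best-generalization checkpoint, i.e. inducing overfitting, recovered some of the adversarial examples.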
Welch's t test between the test set and the adv FGSM set and between the test set and the adv DeepFool set gave p values below the significance threshold in both cases, so the difference in trends is statistically significant.
Note also that the adv FGSM and adv DeepFool sets gave
trends that were very different from each other. This should
not come as a surprise, since the methods are different and
they produce different sets of AEs. To further analyze this, we obtained the L2 norm between each original test sample and the corresponding AE generated by either method, see Fig. 13. The lower norms of DeepFool's AEs are coherent with the lower trend observed in Fig. 12 for adv DeepFool vs adv FGSM. Again, the induced overfitting improves results for samples that lie closer to the originals.
Note that in Figs.11 and8 show that the the perfor-
mance change trends are very similar in the fail subset
and the adversarial subset. However, it is the magnitude
of the increase what is statistically different. The increase
in the adversarial subset is statistically higher than in the
fail subset. On the other hand, note again that the two sub-
sets represent qualitatively different data. The adversarial
Fig. 10 How the three sets of
samples used in the experiments
with CNN are obtained. Note
that the two CNN boxes are the
same model
Fig. 11 Results with the CNN for the CIFAR-10 dataset Fig. 12 Boxplots of the deltas for the sets considered
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
942 International Journal of Machine Learning and Cybernetics (2020) 11:935–944
1 3
subset contains samples that have been failed and we know
that there is a close sample which has been correctly clas-
sified. The fail subset contains samples that have been
failed and they do not have a close sample which has been
correctly classified (therefore not qualifying as AEs). In
summary, one subset represents AEs while the other only
represent non-AE fails, and the accuracy results show
trends of statistically different magnitudes, which we find
it supports our hypothesis.
We also conducted experiments with the ImageNet dataset. The results, see Fig. 14, show the same general trend as with the other datasets. Figure 15 shows the boxplot of deltas.
5 Discussion
Our hypothesis that AEs are intrinsic to the bias-variance
dilemma has been supported by experiments in which a
classifier moving towards the variance extremum showed
increased robustness to the AEs. This increase was, with
statistical significance, higher than in the test set, mean-
ing that the increased robustness to AEs was not asso-
ciated with a higher accuracy in general. Overall this is
essentially the expected behavior for the well-known bias-variance dilemma: good generalization and robustness to AEs are not achieved simultaneously.
We note that our work is coherent with some defen-
sive methods such as feature squeezing [37], whereby the
input is transformed to make similar samples coalesce
into a single point in the feature space. One example of
such transformation is bit depth reduction. In this respect,
simple binarization on the inputs has been shown to add
robustness against AEs. In fact, other squeezers have been
proposed, such as image denoising [21] and learnable bit
depth reduction [4]. In our context, such transformations
are in fact helping the standard classifier decide for sam-
ples that lie near the training samples.
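A bit-depth-reduction squeezer of this kind can be sketched in a few lines (toy pixel values; `reduce_bit_depth` is an illustrative name, with bits = 1 giving the binarization mentioned above):

```python
import numpy as np

def reduce_bit_depth(img, bits):
    """Feature-squeezing transform: quantise pixel values in [0, 1] to
    2**bits levels, so an adversarial sample and its nearby original
    'coalesce' into the same squeezed input."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

original = np.array([0.50, 0.20, 0.80])
adversarial = original + np.array([0.02, -0.03, 0.01])  # small AE perturbation

# with 3-bit depth both inputs quantise to identical values
print(np.array_equal(reduce_bit_depth(original, 3),
                     reduce_bit_depth(adversarial, 3)))  # -> True
```

The quantisation makes the classifier's output constant in a neighbourhood of each training sample, which is precisely the local effect our analysis associates with robustness.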
In the light of the results, we postulate that the existence of AEs does not reflect a problem of either overfitting or lack of expressive power, as suggested by previous work. Rather, AEs exist in practice because our models lack both aspects simultaneously. Rather than being an
impossibility statement, this actually calls for methods that
have more flexibility to reflect both aspects. While practi-
cally all machine learning methods already incorporate
some form of trade-off between generalization and fitting,
we hypothesize that such trade-offs may be fundamentally
different from any such trade-off used by human perceptual
learning (since the latter presumably allows for both good generalization and robustness to AEs simultaneously).

Fig. 13 L2 norms between original test sample and corresponding AE generated by the two attack methods

Fig. 14 Results with the CNN for the ImageNet dataset. In this case we used an InceptionV3 architecture trained from scratch with a learning rate of 0.05, 10 randomly selected classes, 13,000 training images and 500 test images. Averages of 3 runs

Fig. 15 Boxplots of the deltas for the sets considered
Based on our reasoning at the beginning of Sect. 3, it can
be argued that the “overfitting” argument will not hold for
deeper networks, as for deeper networks the training loss can
be made close to 0, see for example [39]. However, we have
to emphasize that in this paper we are not claiming that AEs
exist just because of a high training error. Our reasoning is
that the presence of AEs is akin to a situation of high train-
ing error. In this respect, note that in fact this same reasoning
can be applied to test samples, meaning that the presence of
AEs is also akin to a situation of high test error. The only
logical conclusion is the one already put forward, i.e. that
AEs exist because our algorithms suffer from high training
error and/or high test error. In other words, the only way to
remove AEs is to have an algorithm with both low training
error and low test error. For this to happen, the algorithm
must be such that it has both overfitting (in the sense of
good fitting to training samples) and good generalization *at
the same time*. This contrasts with extant machine learning
which implicitly assumes an unavoidable trade-off between
fitting and generalization. Our work points to the need for
methods with enough expressibility to accommodate both
aspects simultaneously. In machine learning the focus on
generalization aims at answering the question ‘how can we
generalize to unseen samples?’. In the light of our results the
question would be more a ‘how can we generalize equally
well while keeping good fitting at the same time?’.
Our analysis suggests that the existence of AEs is a
manifestation of the implicit trade-off between fitting and
generalization. While the emphasis in machine learning is
typically focused on improving generalization, here we argue
that the generalization-fitting trade-off is also important. Ide-
ally, while the classifier must have generalization power, it
should be also flexible enough to accommodate the good
effects of overfitting.
6 Conclusions
Despite the biological plausibility of deep neural networks,
adversarial examples are an incontrovertible demonstration
that a certain fundamental difference between human and
machine learning exists. In this paper we have considered
the possible causes of this intriguing phenomenon. While
many methods have been proposed to make classifiers more
robust to AEs, apparently the phenomenon essentially persists and cannot be definitively avoided.
Our results support the notion that the phenomenon is
rooted in the inescapable trade-off that exists in machine
learning between fitting and generalization. This is sup-
ported by experiments carried out in which the robustness
to adversarial examples is measured with respect to the
degree of fitting to the training samples, showing an
inverse relation between generalization and robustness to
adversarial examples. As far as the authors know, this is the first time that such a reason has been proposed as the underlying cause of AEs. This hypothesis should in any case receive additional support through future work.
While the bias-variance dilemma is posited as the root
cause, that should not be considered an impossibility state-
ment. Rather, this would actually call for methods that
have more flexibility to reflect both aspects. Current trade-
offs between bias and variance or equivalently between
fitting and generalization would seem to be themselves
biased towards generalization.
Acknowledgements This work was partially funded by projects TIN2017-82113-C2-2-R by the Spanish Ministry of Economy and Business, SBPLY/17/180501/000543 by the Autonomous Government of Castilla-La Mancha and the ERDF, and by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 732204 (BONSEYES) and the Swiss State Secretariat for Education, Research and Innovation (SERI) under Contract number 16.0159. AP was supported by postgraduate Grant FPU17/04758 from the Spanish Ministry of Science, Innovation, and Universities.
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References

1. Akhtar N, Mian AS (2018) Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access
2. Athalye A, Engstrom L, Ilyas A, Kwok K (2017) Synthesizing robust adversarial examples. CoRR. arXiv:1707.07397
3. Bortolussi L, Sanguinetti G (2018) Intrinsic geometric vulnerability of high-dimensional artificial intelligence. CoRR. arXiv
4. Buckman J, Roy A, Raffel C, Goodfellow I (2018) Thermometer encoding: one hot way to resist adversarial examples. https://openreview.net/pdf?id=S18Su--CW
5. Carlini N, Wagner D (2017) Towards evaluating the robustness of neural networks. In: 2017 IEEE symposium on security and privacy (SP), pp 39–57. https://doi.org/10.1109/SP.2017.49
6. Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2018) Adversarial attacks and defences: a survey. CoRR arXiv:1810.00069
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
944 International Journal of Machine Learning and Cybernetics (2020) 11:935–944
1 3
7. Chen, J., Jordan, M.I., Wainwright, M.J., (2019) HopSkipJump-
Attack: a query-efficient decision-based adversarial attack.
arXiv preprint arXiv :1904.02144
8. Deniz O, Vallez N, Bueno G (2019) Adversarial examples are
a manifestation of the fitting-generalization trade-off. In: Int.
work-conference on artificial neural networks (IWANN)
9. Evtimov I, Eykholt K, Fernandes E, Kohno T, Li B, Prakash A,
Rahmati A, Song D (2017) Robust physical-world attacks on
machine learning models. CoRR. arXiv :1707.08945
10. Fawzi A, Fawzi O, Frossard P (2015) Fundamental limits on
adversarial robustness. Proceedings of ICML, workshop on
deep learning. http://infos cienc e.epfl.ch/recor d/21492 3
11. Fawzi A, Moosavi-Dezfooli S, Frossard P (2016) Robustness
of classifiers: from adversarial to random noise. CoRR. arXiv
12. Gilmer J, Metz L, Faghri F, Schoenholz SS, Raghu M, Watten-
berg M, Goodfellow IJ (2018) Adversarial spheres. CoRR. arXiv
13. Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and har-
nessing adversarial examples. arXiv preprint arXiv :1412.6572
14. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers:
Surpassing human-level performance on imagenet classification.
CoRR. arXiv :1502.01852 . http://dblp.uni-trier .de/db/journ als/
corr/corr1 502.html#HeZR0 15
15. Inkawhich N, Inkawhich M, Chen Y, Li H (2018) Adversarial
attacks for optical flow-based action recognition classifiers.
CoRR. arXiv :1811.11875
16. Krizhevsky A, Nair V, Hinton G CIFAR-10 (Canadian Institute for
Advanced Research). http://www.cs.toron to.edu/~kriz/cifar .html
17. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classifica-
tion with deep convolutional neural networks. In: Proceedings of
the 25th international conference on neural information process-
ing systems—volume 1, NIPS’12, pp 1097–1105. Curran Associ-
ates Inc., USA. http://dl.acm.org/citat ion.cfm?id=29991 34.29992
18. LeCun Y, Cortes C (2010) MNIST handwritten digit database.
http://yann.lecun .com/exdb/mnist /
19. Liu X, Zhang J, Lin Y, Li H (2019) Atmpa: Attacking machine
learning-based malware visualization detection methods via
adversarial examples. In: Proceedings of the international sym-
posium on quality of service, IWQoS ’19, pp. 38:1–38:10. ACM,
New York, NY, USA. https ://doi.org/10.1145/33262 85.33290 73
20. Melis M, Demontis A, Biggio B, Brown G, Fumera G, Roli F
(2017) Is deep learning safe for robot vision? adversarial examples
against the icub humanoid. CoRR. arXiv :1708.06939
21. Meng D, Chen H (2017) Magnet: a two-pronged defense against
adversarial examples. CoRR. arXiv :1705.09064
22. Moosavi-Dezfooli S, Fawzi A, Frossard P (2015) Deepfool: a
simple and accurate method to fool deep neural networks. CoRR.
arXiv :1511.04599
23. Nguyen AM, Yosinski J, Clune J (2015) Deep neural networks
are easily fooled: high confidence predictions for unrecognizable
images. In: CVPR, pp 427–436. IEEE Computer Society. http://
dblp.uni-trier .de/db/conf/cvpr/cvpr2 015.html#Nguye nYC15
24. Papernot N, McDaniel P, Goodfellow I (2016) Transferability in
machine learning: from phenomena to black-box attacks using
adversarial samples. arXiv preprint arXiv :1605.07277
25. Schmidhuber J (2015) Deep learning in neural networks: an over-
view. Neural Networks 61:85–117. https ://doi.org/10.1016/j.neune
t.2014.09.003. http://www.scien cedir ect.com/scien ce/artic le/pii/
S0893 60801 40021 35
26. Schmidt L, Santurkar S, Tsipras D, Talwar K, Madry A (2018)
Adversarially robust generalization requires more data. CoRR.
arXiv :1804.11285
27. Serban AC, Poll E (2018) Adversarial examples: a complete char-
acterisation of the phenomenon. CoRR. arXiv :1810.01185
28. Shafahi A, Huang WR, Studer C, Feizi S, Goldstein T (2018) Are
adversarial examples inevitable? CoRR. arXiv :1809.02104
29. Shamir A, Safran I, Ronen E, Dunkelman O (2019) A simple
explanation for the existence of adversarial examples with small
hamming distance. CoRR. arXiv :1901.10861
30. Simon-Gabriel CJ, Ollivier Y, Schölkopf B, Bottou L, Lopez-Paz
D (2018) Adversarial vulnerability of neural networks increases
with input dimension. CoRR. arXiv :1802.01421
31. Su D, Zhang H, Chen H, Yi J, Chen P, Gao Y (2018) Is robustness
the cost of accuracy?—a comprehensive study on the robustness
of 18 deep image classification models. CoRR. arXiv :1808.01688
32. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfel-
low IJ, Fergus R (2013) Intriguing properties of neural networks.
CoRR. arXiv :1312.6199. http://dblp.uni-trier .de/db/journ als/corr/
corr1 312.html#Szege dyZSB EGF13
33. Tabacof P, Valle E (2016) Exploring the space of adversarial
images. In: 2016 international joint conference on neural networks
(IJCNN), pp 426–433
34. Taigman Y, Yang M, Ranzato M, Wolf L (2014) Deepface: clos-
ing the gap to human-level performance in face verification. In:
Conference on computer vision and pattern recognition (CVPR)
35. Tanay T, Griffin LD (2016) A boundary tilting persepective on the
phenomenon of adversarial examples. CoRR. arXiv :1608.07690
36. Tsipras D, Santurkar S, Engstrom L, Turner A, Madry A (2019)
Robustness may be at odds with accuracy. In: International con-
ference on learning representations. https ://openr eview .net/forum
?id=SyxAb 30cY7
37. Xu W, Evans D, Qi Y (2017) Feature squeezing: Detecting
adversarial examples in deep neural networks. CoRR. arXiv
38. Yuille AL, Liu C (2018) Deep nets: What have they ever done for
vision? CoRR. arXiv :1805.04025
39. Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2017) Under-
standing deep learning requires rethinking generalization. arXiv
40. Zhang J, Li C (2019) Adversarial examples: opportunities and
challenges. IEEE Trans Neural Netw Learn Syst. https ://doi.
org/10.1109/TNNLS .2019.29335 24
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.