On the Robustness to Adversarial Examples of
Neural ODE Image Classifiers
Fabio Carrara
ISTI CNR, Pisa, Italy
fabio.carrara@isti.cnr.it
Roberto Caldelli
CNIT, Florence, Italy
roberto.caldelli@unifi.it
Fabrizio Falchi
ISTI CNR, Pisa, Italy
fabrizio.falchi@isti.cnr.it
Giuseppe Amato
ISTI CNR, Pisa, Italy
giuseppe.amato@isti.cnr.it
Abstract—The vulnerability of deep neural networks to adver-
sarial attacks currently represents one of the most challenging
open problems in the deep learning field. The NeurIPS 2018
work that obtained the best paper award proposed a new
paradigm for defining deep neural networks with continuous
internal activations. In this kind of networks, dubbed Neural
ODE Networks, a continuous hidden state can be defined via
parametric ordinary differential equations, and its dynamics can
be adjusted to build representations for a given task, such as
image classification. In this paper, we analyze the robustness
of image classifiers implemented as ODE Nets to adversarial
attacks and compare it to standard deep models. We show that
Neural ODE are natively more robust to adversarial attacks with
respect to state-of-the-art residual networks, and some of their
intrinsic properties, such as adaptive computation cost, open
new directions to further increase the robustness of deep-learned
models. Moreover, thanks to the continuity of the hidden state,
we are able to follow the perturbation injected by manipulated
inputs and pinpoint the part of the internal dynamics that is
most responsible for the misclassification.
I. INTRODUCTION
In a broad set of perceptual tasks, state-of-the-art deep-
learned models are usually comprised of a set of discrete trans-
formations, known as layers, whose parameters are optimized
in an end-to-end fashion via gradient-based methods. Thanks
to the hierarchical architecture, the modelling stage is divided
among a series of transformations each extracting features
from its input and providing higher-level abstractions down-
stream. When trained with enough data, their ability to model
complex mappings is remarkable and produced new state-of-
the-art results in multiple fields, such as automatic guided vehi-
cles, speech recognition, text classification, anomaly detection,
decision making and even human-like face/people generation
in images and videos.
However, the success of deep-learnt models is accompanied
by a major downside, that is the vulnerability to adversarial
examples — maliciously manipulated inputs that appear legitimate
but fool the model into producing a wrong output [1], [2], [3]. Due
to the complexity of data and models (often required to learn
complex input-output mappings), adversarial examples can
usually be found efficiently for most neural-network-based models
[4], [5], [6], and despite studies in this field [7],
[8], a universal approach to produce adversarial-robust deep
models is still missing.
A complementary way pursued by the research community
to solve the adversarial problem is the investigation of the
effects of adversarial examples on trained models [9], [10].
The characterization of these effects often gives enough insight
to detect and distinguish adversarial examples from authentic
inputs and thus led to several proposals in adversarial detec-
tion [11], [12], [13], [14], [15].
In this paper, our goal is to get insights into a novel recently
proposed deep-learning architecture based on ordinary differ-
ential equations (ODEs) — the Neural ODE Network [16] —
when subjected to adversarial attacks. Differently from standard
neural networks, ODE-defined networks are not comprised
of a set of discrete transformations, but instead define their
processing pipeline as a continuous transformation of the
input towards the output whose derivative is parameterized
and trainable with standard gradient-descent methods. The
continuous transformation is computed by an adaptive ODE
solver which gives the model additional useful properties,
such as reversibility, O(1)-memory cost, sample-wise adaptive
computation, and a trade-off between computational speed and
accuracy tunable at inference time. This novel formulation
(and all the benefits that derive from it) makes ODE nets an
interesting and promising new tool in the deep learning field:
thanks to their generality, they can potentially replace current
models (e.g. residual networks in image processing) and
enable new models in novel applications such as continuous-
time modelling.
However, Neural ODE networks are still differentiable mod-
els, and thus, the same adversarial attacks used in standard
neural networks are implementable and applicable. This allows
us to apply the same efficient adversarial sample crafting
algorithms used for standard neural networks and study the
robustness of ODE nets to adversarial attacks in the context
of image classification.
The main contributions of the present work are the following:
(i) we analyzed the behavior of ODE Nets against adversarial
examples (which, to the best of our knowledge, had not been
tested before) in the context of image classification; (ii) we
compared three architectures — a standard residual network and
two ODE-based models — in terms of robustness to adversarial
attacks on the MNIST and CIFAR-10 datasets; (iii) we studied
how the robustness to adversarial attacks varies with specific
properties of ODE networks, such as the precision-cost trade-off
offered by adaptive ODE solvers via a tunable tolerance
parameter; our findings showed that lowering the solver
precision leads to a substantial decrease in the attack success
rate while only marginally degrading the classification
performance; (iv) thanks to the peculiarity of ODE Nets (i.e.
their continuity), we observed and measured the effect of
attacks on the evolution of the internal state of the network
as the adversarial perturbation propagates through the
continuous input-output mapping.
After this introductory section, the paper is organized as
follows: Section II presents some related work, Section III
briefly describes the working of ODE-Nets, while Section IV
describes the whole operation context in terms of tested
networks and attacks. Section V provides details on the
experimental set-up¹, Section VI discusses achieved results,
and Section VII draws main conclusions proposing, at the
same time, some possible future directions.
II. RELATED WORK
Adversarial examples represent one of the major challenges
to the applicability and technological transfer of deep learning-based
techniques. Thus, after the seminal work of Szegedy et
al. [2] exploring adversarial examples for convolutional neural
networks, the research community focused on this problem
publishing several analyses [4], [9], [17] and organizing public
challenges [18] about this phenomenon.
Several works provided efficient crafting algorithms for
adversarial examples including the Fast Gradient Sign Method
(FGSM) [4], Projected Gradient Descent (PGD) [19], and the
one proposed by Carlini and Wagner [6]. Kurakin et al. [7] and
Athalye et al. [20] showed that adversarial attacks to computer
vision systems are possible and effective also in the physical
world using respectively 2D or 3D objects with malicious
textures.
Defensive actions against adversarial attacks have also
been proposed. In this regard, the literature can be roughly
divided into robustness improvement and adversarial detection
techniques. The former aims at changing the model to
be more robust and includes techniques such as adversarial
training [21] and model distillation [8]. The latter aims at
detecting adversarial examples and thus discarding malicious
predictions. Detection methods include statistical tests on the
inputs [10], [11], adversarial perturbation removal [22], and
auxiliary detection models [12], [13], [23].
Due to their peculiarities, ODE nets may differ from standard
deep neural networks in their interaction with adversarial
examples. To the best of our knowledge, we are the first
to analyze the robustness of ODE nets to adversarial attacks.
¹ Code and resources to reproduce the experiments presented here are
available at https://github.com/fabiocarrara/neural-ode-features
III. NEURAL ODE NETWORKS: BACKGROUND
In this section, we provide the reader with a brief description
of ODE Nets and their properties; for a full detailed descrip-
tion, see [16].
We refer to a parametric model including an ODE block — a
computation block defined by a parametric ordinary differential
equation (ODE) whose solution provides the output — as an
ODE Net. Let h0 be the block's input, coinciding with the
initial state at time t0 of the following initial-value ODE

\frac{dh(t)}{dt} = f(h(t), t, \theta), \qquad h(t_0) = h_0 . \qquad (1)
The function f, parameterized by θ, defines the continuous
dynamics of the state h(t). In the context we are interested
in (image classification), f is often implemented as a small
convolutional neural network. The output of the block is the
value h(t1) of the state at a time t1 > t0, which can be computed
by integrating the ODE

h(t_1) = h(t_0) + \int_{t_0}^{t_1} \frac{dh(t)}{dt}\, dt = h(t_0) + \int_{t_0}^{t_1} f(h(t), t, \theta)\, dt . \qquad (2)
The above integral can be computed with standard ODE
solvers, such as Runge-Kutta or Multi-step methods [24].
Thus, the computation performed by the ODE block can be
formalized as a call to a generic ODE solver

h(t_1) = \mathrm{ODESolver}(f, h(t_0), t_0, t_1, \theta) . \qquad (3)

During the training phase, the gradients of the output h(t1)
with respect to the input h(t0) and the parameter θ can
be obtained using the adjoint sensitivity method [25]. This
consists of solving an additional ODE backward in time
and thus invoking the ODE solver again in the backward
pass. Once the gradient is obtained, standard gradient-based
optimization can be applied.
ODE Nets benefit from several properties derived from their
formulation or inherited from ODE solvers, including a)
reversibility, as the evolution can be computed forward
or backward in time, b) O(1) memory cost, as intermediate
states do not need to be stored when solving ODEs, c)
adaptive computation, as adaptive ODE solvers can adjust
the integration step size for each input, and d) an accuracy-efficiency
trade-off tunable at inference time, which can be controlled by
adjusting the tolerance of adaptive ODE solvers.
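To make the formulation concrete, the following is a minimal sketch of an ODE block implementing Equation (3) in PyTorch, assuming the torchdiffeq package and its odeint interface; the class name ODEBlock and the tol attribute are illustrative choices, not notation taken from this paper.

```python
# Minimal sketch of an ODE block implementing Eq. (3) with torchdiffeq's odeint.
import torch
import torch.nn as nn
from torchdiffeq import odeint


class ODEBlock(nn.Module):
    def __init__(self, f, tol=1e-3):
        super().__init__()
        self.f = f        # dynamics f(h(t), t, theta); must accept (t, h) as odeint expects
        self.tol = tol    # solver tolerance tau, tunable also at inference time

    def forward(self, h0):
        # integrate dh/dt = f from t0 = 0 to t1 = 1 and return h(t1)
        t = torch.tensor([0.0, 1.0], device=h0.device)
        h = odeint(self.f, h0, t, rtol=self.tol, atol=self.tol, method='dopri5')
        return h[-1]
```

Note that the tolerance is stored as a plain attribute so that it can be changed after training, which is the mechanism exploited later when studying the accuracy-robustness trade-off.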
IV. ROBUSTNESS OF ODE NETS
We are interested in implementing image classifiers with
ODE Nets and comparing their robustness to adversarial attacks
with that of standard neural networks. We tested a total of
three architectures: a standard residual network model (RES)
used as a baseline, a mixed model (MIX) comprised of some
residual layers and an ODE block, and an ODE-dominated
model (ODE-Only Net, OON) whose computation is mostly
comprised of a single ODE block. We train all models on
public datasets for image classification, perform adversarial
attacks on respective test sets, and measure the attack success
rate.
Fig. 1: Architectures of the tested models: Residual Network (RES),
Mixed Residual-ODE Network (MIX), and ODE-Only Network (OON).
Convolutional layers are written in the format kernel width × kernel
height [/ stride], n. filters; padding is always set to 1. For MNIST,
K = 64, and for CIFAR-10, K = 256.
A. Tested Architectures
In this section, we provide details about the tested architec-
tures (also depicted in Figure 1).
a) Residual Network (RES): this is the convolutional
residual neural network image classifier defined and used as
a baseline by Chen et al. [16] in their comparisons with
ODE Nets. It is comprised of a 64-filter 3x3 convolutional
layer and 8 residual blocks. Each residual block follows the
standard formulation defined in [26], with the only difference
that Group Normalization [27] is used instead of batch nor-
malization. The sequence of layers comprising a residual block
is GN-ReLU-Conv-GN-ReLU-Conv-GN where GN stands for
a Group Normalization with 32 groups, and Conv is a 3x3
convolutional layer. The first two blocks downsample their
input by a factor of 2 using a stride of 2 (also employed
in 1x1 convolutions in the shortcut connections), while the
subsequent blocks maintain the input dimensionality. The first
block employs 64-filter convolutions, while subsequent blocks
employ K-filter convolutions, where K varies with the specific
dataset. The final classification step is implemented with a
global average-pooling operation followed by a single fully-
connected layer with softmax activation.
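As a reference, a possible PyTorch rendering of the residual block just described (GN-ReLU-Conv-GN-ReLU-Conv-GN with 32-group normalization) is sketched below; the module name ResBlock and the exact shortcut handling are our assumptions and not taken verbatim from the original implementation.

```python
# Sketch of the GN-ReLU-Conv-GN-ReLU-Conv-GN residual block described above;
# channel counts must be divisible by 32 (as they are for K = 64 or 256).
import torch.nn as nn


class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.GroupNorm(32, in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.GroupNorm(32, out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(32, out_ch),
        )
        # 1x1 convolution on the shortcut when downsampling or changing width
        self.shortcut = (nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride)
                         if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        return self.shortcut(x) + self.body(x)
```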
b) Mixed ResNet-ODE Network (MIX): this is the ODE
Net architecture defined by Chen et al. [16]. The model
is comprised of the same first convolutional layer and the
first two residual blocks of the above-mentioned residual
network, plus an ODE block. The ODE function f defining the
dynamics of the internal state is implemented by a residual
block defined as above, with the difference that the current
evolution time t is also considered as an input and concatenated
as a constant feature map to the inputs of the convolutions
in the block. The input and the output of the ODE block are
arbitrarily mapped to times t = 0 and t = 1 respectively, i.e.
h(0) is the input coming from the residual blocks, and h(1)
is the output of the ODE block. The output of the ODE block is
average-pooled and followed by a fully-connected layer with
softmax activation, as in the residual net.
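A sketch of how such a time-conditioned dynamics function f could look in PyTorch is given below; the helper ConcatConv2d and the layer ordering mirror the residual block above but the naming is ours, and the (t, h) argument order is the one expected by solver interfaces such as torchdiffeq.

```python
# Sketch of the ODE dynamics f(h(t), t, theta): a residual-style block whose
# convolutions also see the current time t as an extra constant channel.
import torch
import torch.nn as nn


class ConcatConv2d(nn.Module):
    """3x3 convolution whose input is augmented with a constant channel equal to t."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 1, out_ch, kernel_size=3, padding=1)

    def forward(self, t, x):
        tt = torch.ones_like(x[:, :1]) * t          # (N, 1, H, W) map filled with t
        return self.conv(torch.cat([tt, x], dim=1))


class ODEFunc(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gn1 = nn.GroupNorm(32, ch)
        self.gn2 = nn.GroupNorm(32, ch)
        self.gn3 = nn.GroupNorm(32, ch)
        self.conv1 = ConcatConv2d(ch, ch)
        self.conv2 = ConcatConv2d(ch, ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, t, h):                        # (t, h) order expected by odeint
        dh = self.conv1(t, self.relu(self.gn1(h)))
        dh = self.conv2(t, self.relu(self.gn2(dh)))
        return self.gn3(dh)
```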
c) ODE-Only Network (OON): in the previously described
mixed architecture, it is not clear how the feature
extraction process is distributed among standard and ODE
blocks. To ensure that most of the image processing and
feature extraction happens in the ODE block, we also
define and test an architecture mostly comprised of a
single ODE block. In this model, the input is fed to a minimal
pre-processing stage comprised of a single K-filter 4x4 convolutional
layer with no activation function that linearly maps
the image into an adequate state space. The output of this step
is then fed to a single ODE block, which is responsible for the
whole feature extraction chain. Finally, the same classification
stage as in the above-mentioned architectures follows.
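Putting the pieces together, the ODE-Only Network could be assembled roughly as follows, reusing the ODEBlock and ODEFunc sketches introduced above; the input channel count, K, and the 10-class head follow the paper, while the helper name and the use of nn.Sequential are our assumptions.

```python
# Sketch of the ODE-Only Network: linear 4x4 pre-processing convolution,
# a single ODE block, and the average-pool + fully-connected classifier.
# Reuses the ODEBlock and ODEFunc sketches defined earlier.
import torch.nn as nn


def build_oon(in_ch=3, K=64, num_classes=10, tol=1e-3):
    return nn.Sequential(
        nn.Conv2d(in_ch, K, kernel_size=4, stride=2, padding=1),  # no activation: linear map
        ODEBlock(ODEFunc(K), tol=tol),                            # whole feature extraction
        nn.AdaptiveAvgPool2d(1),                                  # global average pooling
        nn.Flatten(),
        nn.Linear(K, num_classes),                                # softmax applied in the loss
    )
```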
B. Measuring the Robustness to Attacks
In this section, we describe the methodology used to per-
form attacks and measure the robustness of the trained models.
Among the available adversarial attacks, we consider the
untargeted Projected Gradient Descent (PGD) algorithm (also
known as Basic Iterative Method or iterative-FGSM) [7], [19],
a strong and widely-used gradient-based multi-step attack.
Starting from an original sample, the PGD algorithm itera-
tively searches for an adversarial sample by taking small steps
in the direction of the gradient of the loss function with respect
to the input. Let x be a sample in the input space, L(x) the
classifier loss function, ∇x L(x) its gradient with respect to
the input, d(x1, x2) a distance function in the sample space, η
the step size, and ε the maximum magnitude of the adversarial
perturbation. Formally, PGD performs

x_{adv}^{0} = x, \qquad x_{adv}^{i} = \mathrm{Proj}_{\varepsilon}\left( x_{adv}^{i-1} + \eta \, \mathrm{Norm}\left( \nabla_{x} L(x_{adv}^{i-1}) \right) \right) . \qquad (4)

Proj_ε(·) is a function that projects a sample x̃ having a distance
d(x̃, x) > ε onto the Lp ball centered on the original sample x
with radius ε (common choices for d are the L2 or L∞ norms).
The Norm(·) function ensures the gradient has unitary norm:
it is implemented as sign(·) when d = L∞ is used, and as L2
normalization (Norm(x) = x/‖x‖2) when d = L2 is used.
We quantify the overall robustness to adversarial examples
of a classifier by measuring the success rate of the attack under
different configurations of the attack parameters. We consider
an attack successful if the PGD algorithm is able to find an
adversarial sample leading to a misclassification within the
available budget (i.e. without exceeding the maximum number
of iterations or the maximum perturbation norm ε). For the sake
of simplicity, we fixed some parameters dataset-wise (η and
the maximum number of PGD iterations) while exploring
configurations of the other parameters (i.e. ε and d).
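For reference, a minimal single-sample L∞ PGD loop following Equation (4) is sketched below; in the experiments we rely on the Foolbox implementation instead, and the clamping to [0, 1] assumes image inputs normalized to that range.

```python
# Minimal single-sample L-infinity PGD sketch following Eq. (4);
# the experiments use the Foolbox implementation instead.
import torch
import torch.nn.functional as F


def pgd_linf(model, x, y, eps=0.1, eta=0.05, steps=10):
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + eta * grad.sign()              # Norm(.) = sign(.) for d = L-inf
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # Proj_eps onto the L-inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep a valid image in [0, 1]
            if model(x_adv).argmax(dim=1).item() != y.item():
                return x_adv                               # early stop on misclassification
    return x_adv
```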
V. EXPERIMENTAL SETUP
The following experimental set-up has been considered to
evaluate the robustness to adversarial examples of the proposed
three kinds of neural networks.
TABLE I: Performance (classification error % on MNIST
and CIFAR-10 test sets) and complexity (number of trainable
parameters) of the tested architectures: ResNet (RES), Mixed
(MIX), and ODE-Only Net (OON). For architectures with
ODE blocks, we show how the classification error varies with
the ODE solver tolerance τ.

                  MNIST                      CIFAR-10
  τ         RES     MIX     OON        RES     MIX     OON
  10^-4     0.4%    0.5%    0.5%       7.3%    7.8%    9.1%
  10^-3     -       0.5%    0.5%       -       7.8%    9.2%
  10^-2     -       0.5%    0.6%       -       7.9%    9.3%
  10^-1     -       0.5%    0.8%       -       7.9%    10.6%
  10^0      -       0.5%    1.2%       -       7.9%    11.3%
  10^1      -       0.5%    1.5%       -       7.8%    11.5%
  params    0.60M   0.22M   0.08M      7.92M   2.02M   1.20M
A. Datasets
We trained the models under analysis on two standard
low-resolution image classification datasets: MNIST [28] and
CIFAR-10 [29]. MNIST consists of 70,000 28x28 grayscale
images of hand-written digits subdivided into train (60,000) and
test (10,000) sets; it is the de facto standard baseline for novel
machine learning algorithms and is the only dataset used in
most research concerning ODE nets.
In addition to MNIST, we extended our analysis to
CIFAR-10 — a 10-class image classification dataset comprised
of 60,000 32x32 RGB images of common objects subdivided
into train (50,000) and test (10,000) sets.
B. Training Details
The number of filters K of the convolutional layers in the
internal blocks is set to 64 for MNIST and 256 for CIFAR-10.
For all models, we adopted the following hyperparameters and
training procedure: dropout with 0.5 drop probability applied
before the fully-connected classifier, the SGD optimizer with
momentum of 0.9, weight decay of 10^-4, batch size of 128,
and learning rate of 0.1 reduced by a factor of 10 every time
the error plateaus. For models containing an ODE block, we
adopted the Dormand–Prince variant of the fifth-order Runge–
Kutta ODE solver [30], in which the step size is adaptive and
can be controlled by a tolerance parameter τ (set to 10^-3 in our
experiments during the training phase). The specific value of τ
indicates the maximum absolute and relative error (estimated
using the difference between the fourth-order and the fifth-
order solutions) accepted when performing an integration step;
if the step error is higher than τ, the integration step is rejected
and the step size decreased.
All the models obtained a classification performance com-
parable with current state of the art on those datasets. In
Table I, we report the classification error (as percentage of
misclassification) and the model complexity (as the number
of trainable parameters in convolutions and fully-connected
layers) for each model and dataset. For models with ODE
blocks, Table I also shows how the classification performance
changes with the tolerance of the ODE solver. A higher
value of τ corresponds to a lower precision of the integration
carried out by the ODE solver. We observed that this loss in
precision has only marginal effects on the overall accuracy
of the classifier (the worst degradation happens for the ODE-
Only Network, which exhibits a 1-2% drop in accuracy) and also
slightly decreases the computational cost of the forward pass.
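A sketch of the kind of test-time tolerance sweep behind these numbers is given below; it assumes the ODEBlock sketch introduced earlier, whose tol attribute can be overridden after training, and the function name is illustrative.

```python
# Sketch of a test-time tolerance sweep: the same trained model is evaluated
# with increasingly loose solver tolerances (assumes the ODEBlock sketch above).
import torch


@torch.no_grad()
def error_at_tolerances(model, ode_block, loader,
                        tolerances=(1e-4, 1e-3, 1e-2, 1e-1, 1e0)):
    model.eval()
    errors = {}
    for tau in tolerances:
        ode_block.tol = tau                    # loosen the adaptive solver at test time
        wrong, total = 0, 0
        for x, y in loader:
            wrong += (model(x).argmax(dim=1) != y).sum().item()
            total += y.numel()
        errors[tau] = wrong / total
    return errors
```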
C. Adversarial Attack Details
We employed the Foolbox toolkit [31] to perform adversar-
ial attacks on PyTorch models. We attacked each model with
iterative PGD using the test sets as source of original samples
to perturb. We discarded from the analysis the images that are
naturally misclassified by the models. For MNIST, we fixed
the step size η = 0.05 and performed attacks with maximum
perturbation ε = 0.05, 0.1, and 0.3; concerning the measure of
the perturbation magnitude, we experimented with d = L2 and
d = L∞. The same considerations apply to CIFAR-10, with the
difference that we lowered the maximum perturbation allowed
(and thus the step size) in order to capture the differences in
robustness among the models. Thus, for CIFAR-10 we set
η = 0.01 and ε = 0.01, 0.03, and 0.05 (attacks with higher
values of ε always yield an adversarial example in all models).
In all experiments, we set the maximum number of PGD
iterations to 10; if an adversarial example is found before
the last iteration, we stop the attack as soon as the
misclassification happens.
VI. RESULTS
Tables II and III report the attack success rate on the MNIST
and CIFAR-10 datasets, respectively, for all the models and
configurations tested.
In our experiments, ODE-based models (MIX and OON)
consistently have a lower or equal attack success rate with
respect to standard residual networks in the same attack
context. This is particularly true for the ODE-only model when
attacked with smaller perturbations: the attack success rate is
roughly halved with respect to RES or MIX.
In both OON and MIX models, lowering the constraints
on integration precision (i.e. increasing the solver tolerance)
results in a decreasing probability of attack success. Increasing
the solver tolerance τ to 1 decreases the attack success
rate by roughly 40% in the best cases while increasing the
classification error at most by roughly 2% in the worst case.
For ease of comparison, we report in Table IV the classification
error and attack success rate of the best configurations — the
ones obtaining the best trade-off between the two quantities
— for different values of τ. This phenomenon is probably
due to the fact that allowing greater approximations in the
feature extraction and classification process results in blurring
the decision boundaries and thus attenuating the effects of
malicious perturbations during the ODE integration.
To explore this hypothesis, we analyzed the continuous
internal states of the ODE-Only Net when classifying a pristine
or a perturbed image. Specifically, let h(t) be the trajectory of
the continuous internal state caused by a pristine sample and
h_adv(t) the one caused by the same but adversarially perturbed sample.
TABLE II: Attack success rate on the MNIST test set of iterative
PGD (step size η = 0.05) applied to the tested architectures:
ResNet (RES), Mixed (MIX), and ODE-Only Net (OON). ε
indicates the maximum perturbation allowed and d the distance
used to measure the perturbation magnitude.

           ε = .05, d = L2      ε = .1, d = L2       ε = .3, d = L2
  τ        RES   MIX   OON      RES   MIX   OON      RES   MIX   OON
  10^-4    .52   .51   .27      .99   .98   .87      1.    1.    1.
  10^-3    -     .51   .19      -     .98   .75      -     1.    .99
  10^-2    -     .50   .09      -     .98   .57      -     1.    .94
  10^-1    -     .44   .08      -     .95   .54      -     1.    .92
  10^0     -     .35   .06      -     .91   .46      -     1.    .96

           ε = .05, d = L∞      ε = .1, d = L∞       ε = .3, d = L∞
  τ        RES   MIX   OON      RES   MIX   OON      RES   MIX   OON
  10^-4    .04   .04   .02      .32   .31   .12      1.    1.    .96
  10^-3    -     .04   .02      -     .31   .08      -     1.    .87
  10^-2    -     .03   .01      -     .30   .04      -     .99   .66
  10^-1    -     .03   .01      -     .24   .05      -     .98   .64
  10^0     -     .02   .02      -     .17   .04      -     .95   .65
TABLE III: Attack success rate on the CIFAR-10 test set of
iterative PGD (step size η = 0.01) applied to the tested
architectures: ResNet (RES), Mixed (MIX), and ODE-Only Net
(OON). ε indicates the maximum perturbation allowed and
d the distance used to measure the perturbation magnitude.

           ε = .01, d = L2      ε = .03, d = L2      ε = .05, d = L2
  τ        RES   MIX   OON      RES   MIX   OON      RES   MIX   OON
  10^-4    .97   .96   .96      1.    1.    1.       1.    1.    1.
  10^-3    -     .96   .95      -     1.    1.       -     1.    1.
  10^-2    -     .95   .86      -     1.    1.       -     1.    1.
  10^-1    -     .95   .61      -     1.    .98      -     1.    1.
  10^0     -     .87   .52      -     1.    .96      -     1.    1.

           ε = .01, d = L∞      ε = .03, d = L∞      ε = .05, d = L∞
  τ        RES   MIX   OON      RES   MIX   OON      RES   MIX   OON
  10^-4    .81   .79   .73      1.    1.    1.       1.    1.    1.
  10^-3    -     .79   .68      -     1.    1.       -     1.    1.
  10^-2    -     .77   .50      -     1.    .98      -     1.    1.
  10^-1    -     .76   .35      -     1.    .89      -     1.    .99
  10^0     -     .62   .28      -     1.    .81      -     1.    .97
In Figure 2, we report the difference ∆h(t) (see
Equation (5)) between those two trajectories, measured as the
L2 distance at each point in time

\Delta h(t) = \left\| h(t) - h_{adv}(t) \right\|_{2} \qquad (5)

and averaged over all the samples in the test set. Each line
is computed using adversarial examples crafted with the same
tolerance τ used when measuring ∆h(t). We can confirm that
increasing values of τ cause the adversarial perturbation to
be attenuated during the evolution of the internal state, even if
the adversarial attack is performed with that higher tolerance setting.
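The discrepancy of Equation (5) can be sampled along the trajectory by integrating the ODE block on a grid of intermediate times for both the pristine and the perturbed input, as in the hedged sketch below (single sample; the torchdiffeq odeint interface and the ODEFunc sketch above are assumed, and the function name is ours).

```python
# Sketch of sampling the trajectory discrepancy of Eq. (5) for one sample:
# integrate the ODE dynamics on a grid of times for pristine and adversarial
# inputs and take the point-wise L2 distance between the two trajectories.
import torch
from torchdiffeq import odeint


@torch.no_grad()
def trajectory_gap(ode_func, h0, h0_adv, n_points=20, tol=1e-3):
    t = torch.linspace(0.0, 1.0, n_points)
    h = odeint(ode_func, h0, t, rtol=tol, atol=tol)          # pristine trajectory h(t)
    h_adv = odeint(ode_func, h0_adv, t, rtol=tol, atol=tol)  # perturbed trajectory h_adv(t)
    return (h - h_adv).flatten(1).norm(dim=1)                # Delta h(t_k) at each sampled t_k
```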
VII. CONCLUSIONS
In this paper, we presented an analysis of the robustness
to adversarial examples of ODE-Nets — a recently intro-
duced neural network architecture with continuous hidden
TABLE IV: Best configurations in terms of the trade-off
between classification error and attack success rate.

                       Tolerance of ODE Solver (τ)
  Model       10^-4    10^-3    10^-2    10^-1    10^0

  Classification Error (MNIST)
  RES         .004     -        -        -        -
  MIX         .005     .005     .005     .005     .005
  OON         .005     .005     .006     .008     .012

  Attack Success Rate (MNIST, ε = .1, d = L2)
  RES         .99      -        -        -        -
  MIX         .98      .98      .98      .95      .91
  OON         .87      .75      .57      .54      .46

  Classification Error (CIFAR-10)
  RES         .073     -        -        -        -
  MIX         .078     .078     .079     .079     .079
  OON         .091     .092     .093     .106     .113

  Attack Success Rate (CIFAR-10, ε = .01, d = L∞)
  RES         .81      -        -        -        -
  MIX         .79      .79      .77      .76      .62
  OON         .73      .68      .50      .35      .28
states defined by ordinary differential equations. We compared
three architectures (a residual, an ODE-based, and a mixed
architecture) using the MNIST and CIFAR-10 datasets and
observed that ODE-Nets provide superior robustness against
PGD — a strong multi-step adversarial attack. Furthermore, by
investigating the evolution of the internal states of ODE-nets in
response to pristine and corresponding adversarial examples,
we observed that the error tolerance parameter of the ODE
solver does not substantially affect the classification performance
on pristine samples, while it significantly improves
the robustness to adversarial examples: the higher the tolerance,
the higher the resilience to adversarial inputs.
As future work, we plan to extend our analysis to higher-
resolution and/or larger-scale datasets and to additional state-
of-the-art adversarial attacks, such as the Carlini and Wagner attack [6].
In addition, the presented findings open new interesting re-
search directions in the field: as an example and possible future
work, a novel detection method for adversarial examples could
rely on the analysis of the outputs (or internal states) of ODE
nets evaluated with multiple values of the tolerance parameter.
ACKNOWLEDGMENTS
This work was partially supported by the ADA project,
funded by Regione Toscana (CUP CIPE D55F17000290009),
by the AI4EU EC-H2020 project (Contract n. 825619) and by
the SMARTACCS project, funded by Fondazione CR Firenze
(Contract n. 2018.0896). The authors gratefully acknowledge
the support of NVIDIA Corporation with the donation of the
Titan Xp and Tesla K40 GPUs used for this research.
REFERENCES
[1] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov,
G. Giacinto, and F. Roli, “Evasion attacks against machine learning
at test time,” in Joint European Conference on Machine Learning and
Knowledge Discovery in Databases. Springer, 2013, pp. 387–402.
Fig. 2: Discrepancy of internal representations of the OON model caused by successful adversarial attacks. The y-axis shows
the mean L2 distance ∆h(t) between pristine and attacked image representations along the continuous trajectory (time/depth
on the x-axis) for different values of the tolerance τ of the adaptive ODE solver. Panels: (a) MNIST, ε = .05, d = L2;
(b) CIFAR-10, ε = .01, d = L2.
[2] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J.
Goodfellow, and R. Fergus, “Intriguing properties of neural networks,”
in ICLR, 2014. [Online]. Available: http://arxiv.org/abs/1312.6199
[3] M. Barni, M. C. Stamm, and B. Tondi, “Adversarial multimedia foren-
sics: Overview and challenges ahead,” in 2018 26th European Signal
Processing Conference (EUSIPCO), Sep. 2018, pp. 962–966.
[4] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and
harnessing adversarial examples,” in ICLR, 2015. [Online]. Available:
http://arxiv.org/abs/1412.6572
[5] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple
and accurate method to fool deep neural networks,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2016,
pp. 2574–2582.
[6] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural
networks,” in 2017 IEEE Symposium on Security and Privacy (SP).
IEEE, 2017, pp. 39–57.
[7] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples
in the physical world,” in ICLR Workshops, 2017. [Online]. Available:
https://openreview.net/forum?id=HJGU3Rodl
[8] N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation
as a defense to adversarial perturbations against deep neural networks,”
in IEEE Symposium on Security and Privacy, SP 2016, 2016, pp.
582–597. [Online]. Available: https://doi.org/10.1109/SP.2016.41
[9] P. Tabacof and E. Valle, “Exploring the space of adversarial images,”
in 2016 International Joint Conference on Neural Networks (IJCNN).
IEEE, 2016, pp. 426–433.
[10] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. D. McDaniel,
“On the (statistical) detection of adversarial examples,” CoRR, vol.
abs/1702.06280, 2017. [Online]. Available: http://arxiv.org/abs/1702.06280
[11] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, “Detecting
adversarial samples from artifacts,” CoRR, vol. abs/1703.00410, 2017.
[Online]. Available: http://arxiv.org/abs/1703.00410
[12] Z. Gong, W. Wang, and W. Ku, “Adversarial and clean data are
not twins,” CoRR, vol. abs/1704.04960, 2017. [Online]. Available:
http://arxiv.org/abs/1704.04960
[13] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, “On
detecting adversarial perturbations,” in ICLR, 2017. [Online]. Available:
https://openreview.net/forum?id=SJzCSf9xg
[14] F. Carrara, F. Falchi, R. Caldelli, G. Amato, and R. Becarelli, “Adver-
sarial image detection in deep neural networks,” Multimedia Tools and
Applications, vol. 78, no. 3, pp. 2815–2835, 2019.
[15] R. Caldelli, R. Becarelli, F. Carrara, F. Falchi, and G. Amato, “Exploiting
CNN layer activations to improve adversarial image classification,” in
2019 IEEE International Conference on Image Processing (ICIP), Sep.
2019, pp. 2289–2293.
[16] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural
ordinary differential equations,” in Advances in Neural Information
Processing Systems, 2018, pp. 6572–6583.
[17] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and
A. Swami, “The limitations of deep learning in adversarial settings,”
in Security and Privacy (EuroS&P), 2016 IEEE European Symposium
on. IEEE, 2016, pp. 372–387.
[18] A. Kurakin, I. J. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang,
T. Pang, J. Zhu, X. Hu, C. Xie, J. Wang, Z. Zhang, Z. Ren, A. L.
Yuille, S. Huang, Y. Zhao, Y. Zhao, Z. Han, J. Long, Y. Berdibekov,
T. Akiba, S. Tokui, and M. Abe, “Adversarial attacks and defences
competition,” CoRR, vol. abs/1804.00097, 2018. [Online]. Available:
http://arxiv.org/abs/1804.00097
[19] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards
deep learning models resistant to adversarial attacks,” in ICLR, 2018.
[Online]. Available: https://openreview.net/forum?id=rJzIBfZAb
[20] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing robust
adversarial examples,” in ICML, 2018, pp. 284–293.
[21] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial machine
learning at scale,” in ICLR, 2017. [Online]. Available: https:
//openreview.net/forum?id=BJm4T4Kgx
[22] X. Li and F. Li, “Adversarial examples detection in deep networks with
convolutional filter statistics,” in ICCV, 2017, pp. 5775–5783.
[23] F. Carrara, R. Becarelli, R. Caldelli, F. Falchi, and G. Amato, “Adver-
sarial examples detection in features distance spaces,” in Proceedings of
the European Conference on Computer Vision (ECCV), 2018.
[24] E. Hairer, S. P. Nørsett, and G. Wanner, Solving Ordinary Differential
Equations I: Nonstiff Problems. Springer-Verlag, 1991.
[25] L. S. Pontryagin, Mathematical theory of optimal processes. Routledge,
2018.
[26] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual
networks,” in European conference on computer vision. Springer, 2016,
pp. 630–645.
[27] Y. Wu and K. He, “Group normalization,” in Proceedings of the
European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
[28] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner et al., “Gradient-based
learning applied to document recognition,” Proceedings of the IEEE,
vol. 86, no. 11, pp. 2278–2324, 1998.
[29] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from
tiny images,” Citeseer, Tech. Rep., 2009.
[30] J. R. Dormand and P. J. Prince, “A family of embedded runge-kutta
formulae,” Journal of computational and applied mathematics, vol. 6,
no. 1, pp. 19–26, 1980.
[31] J. Rauber, W. Brendel, and M. Bethge, “Foolbox v0.8.0: A
python toolbox to benchmark the robustness of machine learning
models,” CoRR, vol. abs/1707.04131, 2017. [Online]. Available:
http://arxiv.org/abs/1707.04131