A Survey of Adversarial Machine Learning in Cyber Warfare


Defence Science Journal, Vol. 68, No. 4, July 2018, pp. 356-366, DOI : 10.14429/dsj.68.12731
Vasisht Duddu
Indraprastha Institute of Information Technology, Delhi - 110 020, India
The changing nature of warfare has seen a paradigm shift from the conventional to asymmetric, contactless warfare
such as information and cyber warfare. Excessive dependence on information and communication technologies, cloud
infrastructures, big data analytics, data-mining and automation in decision making poses grave threats to business
and economy in adversarial environments. Adversarial machine learning is a fast growing area of research which
studies the design of Machine Learning algorithms that are robust in adversarial environments. This paper presents
a comprehensive survey of this emerging area and the various techniques of adversary modelling. We explore the
threat models for Machine Learning systems and describe the various techniques to attack and defend them. We
present privacy issues in these models and describe a cyber-warfare test-bed to test the effectiveness of the various
attack-defence strategies and conclude with some open problems in this area of research.
Keywords: Adversarial machine learning; Adversary modelling; Cyber attacks; Security; Privacy
Machine learning (ML) and artificial intelligence are
ubiquitous and have been extensively used to automate tasks
and decision making processes. There has been tremendous
growth in, and dependence on, ML applications in national
critical infrastructures and critical areas such as medicine and
healthcare, computer security, spam and malware detection,
autonomous driving vehicles, unmanned autonomous systems
and homeland security. The critical nature of such systems
and their applications demands a high level of defence against
cyber attacks. While data scientists successfully automate
tasks and use data mining techniques to uncover hidden, as yet
undiscovered knowledge from the vast unstructured data
collected from disparate sources, there are serious concerns about
the security issues and vulnerabilities present in data mining
and ML systems. Many ML algorithms run on such networks of
data and knowledge sources, which span distributed databases of
a critical nature hosted in public clouds, private clouds and
government owned cyber infrastructures, to extract useful
information and knowledge. They are highly vulnerable in the
cyber ecosystem and become the weakest link in the entire
chain, which can compromise the security of the entire system.
Medical and health-care domains that use ML, for instance, need
to ensure privacy and data leakage prevention. Recommender
systems, stock market prediction, and sentiment analysis use
ML algorithms for assessing market trends from the data, and
any malicious change in the data or the underlying algorithms
affects the data distributions and end results. This field of ML
is an important area of research owing to the growing concerns
about security, privacy and over-reliance of users on automated
decision making. The security of ML models needs to be evaluated
against adversaries, and defences are to be set up to ensure
robust designs against adversarial attacks, as shown in Fig. 1.
In this study, we explore the emerging area of adversarial
machine learning (AML), which is the design of machine
learning algorithms that are robust to various attacks under the
constraints of the knowledge and capabilities of the adversaries.
The study of AML helps in two ways: first, we can plan
strategies and courses of action to model and counter
adversarial behaviour; second, we can understand and model
the adversary in order to strengthen our defences against its
actions. These are used for red teaming in a cyberwarfare test-bed.
Vulnerabilities exist in machine learning models
implemented to generate information from data. An important
source of vulnerability lies in the faulty assumptions made
while designing and training the ML model.
Data scientists design ML models to be robust and accurate,
and they implicitly assume that the models preserve privacy;
however, this assumption does not hold and can lead to serious
breaches of privacy.
Researchers have modelled ML systems on linearly
separable data and use a linear function as the decision function
to reduce the computational complexity. This assumption increases
the overall mis-classifications, as an adversary can create
adversarial examples to further degrade the performance of the
model.
In some cases, data is collected in an unsupervised
manner and in adversarial settings, such as collecting data from
honeypot servers. This allows attackers to carefully craft
adversarial examples to be collected as data, which may degrade
the model since the adversary has direct access to the training
data.
Different data instances are considered to be independent
and identically distributed. Some authors, for convenience and
ease of computation, assume that the features are independent
of each other. However, an adversary can try to obtain the
correlation between different data points and features to
introduce instances from a different data distribution to degrade
the model’s performance.
Received : 25 November 2017, Revised : 19 March 2018
Accepted : 09 April 2018, Online published : 25 June 2018
One of the major vulnerabilities in ML models is that the
models perform well on testing and training data as they are
usually drawn from the same underlying distribution. If the
data from some other distribution is used as an input, the model
will behave dierently. This is the basic vulnerability that is
exploited by attackers to craft adversarial examples to evade
the model or degrade its performance.
Due to the critical nature of the applications of ML, it is
important to model the adversary and his strategies to attack the
decision making algorithms, to represent a realistic adversary
in a cyber warfare scenario.
The concept of AML was formally introduced by
Huang1, et al., who proposed a taxonomy of adversarial attacks
and modelled the adversary using the triple: capability,
knowledge and goals.
3.1 Adversarial Capabilities
Adversarial capabilities refer to the possible impact or
influence that an adversary can have by attacking the ML
model. Attacks of the adversary based on the capabilities can
be classified according to the following three dimensions:
Classification based on the influence of the adversary considers
the attempt to change the dataset or the algorithms of the
target during the course of the attack. Such attacks can be
further classified according to the influence as causative or
exploratory.
Causative: Causative attacks alter the training process
through influence over the training data. This requires the
adversary to modify or influence both training and testing
data.
Exploratory: Exploratory attacks do not alter the training
process but use other techniques, such as probing, to discover
information about training data. The adversary cannot modify
or manipulate the training data and can only craft new instances
based on the underlying data distribution.
The specicity of the attacks determines whether the
attacks modify or eect the model as a whole based on multiple
attack vectors or by using a specic attack vector to attack the
model. Attacks can be classied according to specicity as:
Targeted: In a targeted attack, the focus is on a single or
small set of target points.
Indiscriminate: An indiscriminate adversary has a more
exible goal like mis-classifying a very general class of
Four possible cases emerge based on the impact or effect
the adversary has on the ML model6 (Fig. 2):
Confidence Reduction: The adversary tries to manipulate the
training data so that the prediction confidence of the ML
model reduces. This can be done when the adversary has
little or no information about the model and can corrupt
the decision process of the critical ML system.
Mis-classification: The goal of the adversary is to make the
ML model mis-classify an input in any way
possible. This includes modifying the input to make it fall
on the wrong side of the decision boundary. The attack
is indiscriminate and the adversary just tries to maximise the
total number of mis-classifications to reduce the overall
accuracy and confidence of the model.
Targeted Mis-classification: The adversary generates a
carefully crafted adversarial example from random noise
using various algorithms and the model mis-classifies the
noise as a legitimate sample. The perturbation is carefully
selected unlike the previous case.
Source/Target Mis-classification: An input of a particular
type is modified by carefully adding perturbation so that it is
classified as a specific target class, which can subvert
the logic of the entire ML system. Consider an example
of an ML model checking for malware, where one malware
instance is modified by adding perturbation to be classified
as benign. This usually takes place during test time and
affects only the testing data.
Figure 1. Major components of adversarial machine learning environment.
3.2 Adversarial Knowledge
Knowledge of the underlying ML model plays a crucial
role in determining the success of the attacks by providing the
adversary an opportunity to make informed decisions as shown
in Fig. 3. The knowledge of the ML system can be classified
into:
Data acquisition
Feature selection
Algorithm and parameters
Training and output
The adversary may have either complete or perfect
knowledge of the ML system or only a partial knowledge of
the system. Adversary attacks can be classified into black box
attacks and white box attacks based on the knowledge about
the model an adversary has.
Complete/perfect knowledge: An adversary is said to have
perfect knowledge if he has access to the knowledge of data
acquisition, data, feature selection, ML algorithms and tuned
parameters of the model. The attacker may or may not have
access to the training data which can be easily acquired by
using other knowledge. This is usually the case when the ML
model is open source and everyone has access to it.
Limited Knowledge: In this case, the adversary only knows
a part of the model. He does not have access to the training
data and may have very limited information about the model
architecture, parameters, and has access to only a small subset
of the total knowledge available.
For the adversary to evolve from black box to white box,
he iteratively goes through a process of learning using inference
mechanisms to gain more knowledge of the model.
3.3 Adversarial Goals
Based on the goals and intent of the adversary for attacking
the ML model we can classify them into the following:
Integrity violation: The adversary performs malicious
activity without compromising the normal system operation,
but the output of the model is of the attacker's choosing. Poisoning
attacks are an example of integrity violation.
Availability violation: The adversary compromises the
system functionality with an intent to cause a denial of service
to users during the operations. One way is to maximise the
mis-classification or affect the output of the model significantly to
degrade the performance of the model and make it crash.
Privacy Violation: The adversary tries to gain information
about sensitive user data from the ML model and also extract
key information about the model architecture. Model inversion,
member inference, reverse engineering and side channel attacks on
ML models are examples of such attacks.
Figure 3. Adversary’s knowledge.
Figure 2. Impacts of adversarial capabilities.
In this section, we explore the various attacks on ML
models. An adversary implements attacks by generating
perturbed data instances called adversarial examples. A data
instance may be carefully modified where the perturbations
are calculated using algorithms to cause the ML classifier to
mis-classify with high confidence. The goal is to construct
an adversarial example $x'$ that is very close to the input $x$
and satisfies $F(x') = T$, with $T$ being the target class and $F$
being the decision function. A simple indiscriminate approach
is gradient ascent during training of the ML model.
The fast gradient sign method (FGSM) is one of the
ways to generate adversarial examples that was proposed by
Goodfellow2, et al. Let $\theta$ be the parameters of the model,
$x$ denote the input to the model, $y$ denote the targets associated
with $x$ (for a supervised learning paradigm) and $J(\theta, x, y)$
denote the cost function to train the neural network as shown
in Fig. 4. The cost function can be linearised around the
current value of $\theta$, to obtain an optimal max-norm constrained
perturbation of
$$\eta = \epsilon \, \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)$$
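As a minimal sketch of the FGSM perturbation $\eta = \epsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y))$, the example below applies it to a logistic-regression model instead of a neural network, because the input gradient of the cross-entropy loss then has a closed form. The weights, input and value of epsilon are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy cost J(theta, x, y) with respect to x
    for logistic regression with theta = (w, b): dJ/dx = (p - y) * w."""
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w

def fgsm(w, b, x, y, eps):
    """Return the adversarial input x + eta, eta = eps * sign(grad_x J)."""
    eta = eps * np.sign(loss_gradient_wrt_input(w, b, x, y))
    return x + eta

# Hypothetical toy model and input (assumptions, not from the paper).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, -0.2, 0.4])
y = 1.0  # true label of x

x_adv = fgsm(w, b, x, y, eps=0.5)
p_clean = sigmoid(np.dot(w, x) + b)    # model confidence on the clean input
p_adv = sigmoid(np.dot(w, x_adv) + b)  # model confidence on the perturbed input
```

Every coordinate of the input moves by exactly epsilon, which is what the max-norm constraint allows, and in this toy setting that is enough to push the prediction across the 0.5 decision threshold.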
Kurakin3, et al. showed that real world systems like
cameras and sensors were vulnerable to adversarial examples
by introducing the basic iterative method to generate
adversarial images by modifying the FGSM. As an extension,
they introduced the label leaking effect, which occurs when the
accuracy on adversarial images becomes higher than the
accuracy on clean images4. Dense adversary generation
algorithm5 generates a large family of adversarial examples to
exploit semantic segmentation and object detection.
The Jacobian based saliency map approach to search for
adversarial examples by modifying a small number of input
pixels in an image was proposed by Papernot6, et al. They
compute the Jacobian of a model to identify the sensitivity
of the model or decision boundary. They use an adversarial saliency
map that contains information about the likelihood of
misclassification for a given input feature.
The Carlini-Wagner adversarial example7 is capable of evading
all present defences including defensive distillation72. Given an
input $x$, we would want to find $x'$ and minimise $D(x, x')$
such that $F(x') = T$ is valid, where $D$ is the distance function.
The minimisation problem was reformulated by adding a loss
function $g(x')$ that measures the closeness of $F(x')$ to $T$.
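An evasion attack is a test time attack that does not require
accessing and manipulating the training data. The goal is to find
a sample $x'$ such that $g(x)$ is minimised while the distance
from the target malicious sample $x_0$ is at most $d_{max}$81:
$$x' = \arg\min_{x} g(x) \quad \text{s.t.} \quad d(x, x_0) \le d_{max}$$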
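The constrained minimisation above can be approximated by projected gradient descent: step downhill on $g$, then project back onto the ball of radius $d_{max}$ around $x_0$. The sketch below assumes a linear discriminant $g(x) = w \cdot x + b$ (positive meaning malicious) and the Euclidean distance for $d$; both are assumptions for illustration, as are all numeric values.

```python
import numpy as np

def evade(g_grad, x0, d_max, step=0.1, n_steps=100):
    """Projected gradient descent for x' = argmin g(x) s.t. ||x - x0|| <= d_max."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x - step * g_grad(x)          # descend on the discriminant g
        delta = x - x0
        norm = np.linalg.norm(delta)
        if norm > d_max:                  # project back into the feasible ball
            x = x0 + delta * (d_max / norm)
    return x

# Hypothetical linear detector: g(x) > 0 is flagged as malicious.
w = np.array([1.0, 2.0])
b = -1.0

def g(x):
    return float(np.dot(w, x) + b)

def g_grad(x):
    return w  # the gradient of a linear function is constant

x0 = np.array([2.0, 1.0])                 # original malicious sample, g(x0) = 3
x_adv = evade(g_grad, x0, d_max=2.0)
```

With these values the modified sample stays within distance 2 of the original yet lands on the benign side of the detector.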
Laskov9, et al. study the effectiveness of evading the PDFrate
ML system using adversarial examples by manipulating the
header fields in PDF format. An improvement was proposed10
using an oracle which uses a function threshold to classify
them as benign or malicious. A secure learning model against
evasion attacks on PDF malware detection was proposed by
Khorshidpour11, et al. An attack on text classifiers trained using
DNNs was proposed12 using three attack strategies, namely,
insertion, modification and removal of text computed using
the FGSM algorithm.
4.2 Poisoning Attacks
Poisoning attacks force an anomaly detection algorithm
to accept an attack point that lies outside of the normal set of
data instances. The attacker adds such adversarial examples to
the training data so that the ML model’s decision boundary can
be manipulated. Poisoning is a train time attack and requires
access to training data.
Kloft13, et al. introduced poisoning attacks and analysed
online centroid anomaly detection and adversarial noise for
poisoning. In face recognition, it is possible to poison face
templates with limited attacker knowledge14. The boiling frog
attack1, a type of poisoning attack, poisons the model over
several weeks by adding small amounts of chaff. The detector
is gradually acclimated to chaff and fails to identify the large
amount of poisoning done incrementally. An iterative attack
that selects the inputs which result in the highest degradation
in classification accuracy was explored using healthcare data.
4.3 Equation Solving Attack
The equation solving attack16 is applicable to cloud
providers who provide ML as a service via APIs and to models
such as multi-layer perceptron, binary logistic regression and
multi-class logistic regression, which can be represented as
equations in known and unknown variables. The goal is to use
the data to find the unknown variables, which are usually the
parameters of the trained model. These attacks are expected
to reveal information about the model and its architecture to
the attacker.
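For binary logistic regression the idea can be sketched concretely: each confidence score $p$ returned by the API gives one linear equation $\log(p / (1 - p)) = w \cdot x + b$, so $d + 1$ queries suffice to solve for the $d$ weights and the bias. The `query_api` function below is a hypothetical stand-in for a prediction API, and its hidden parameters are made up for the example.

```python
import numpy as np

def query_api(x, _w=np.array([1.5, -2.0, 0.5]), _b=0.25):
    """Hypothetical prediction API: returns only a confidence score.
    The attacker does not see _w or _b."""
    return 1.0 / (1.0 + np.exp(-(np.dot(_w, x) + _b)))

def extract_logistic_regression(query, n_features, seed=0):
    """Recover (w, b) from n_features + 1 confidence queries."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_features + 1, n_features))    # attacker-chosen inputs
    p = np.array([query(x) for x in X])
    logits = np.log(p / (1.0 - p))                        # invert the sigmoid
    A = np.hstack([X, np.ones((n_features + 1, 1))])      # unknowns: w and b
    theta = np.linalg.solve(A, logits)
    return theta[:-1], theta[-1]

w_hat, b_hat = extract_logistic_regression(query_api, n_features=3)
```

The recovery is exact up to floating-point error because the API exposes full-precision confidence scores; an API that returns only labels or rounded scores would need more queries and a different approach.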
4.4 Path Finding Attack
Path-nding attacks16 are used to traverse binary trees,
multi-n-ary trees, and regression trees. In these attacks, the
value of each input feature is varied till the conditions at each
node are satised, while the input traverses the tree. The tree is
traversed until a leaf is reached or an internal node with a split
over a missing feature is found. The value of the leaf node is
the output which reveals the path followed.
4.5 Model Inversion Attack
Fredrikson17, et al. propose an algorithm that computes the
optimal input feature vector close to the target feature vector
using a weighted probability estimate that indicates the correct
value. The least-biased maximum a posteriori (MAP) estimate
for the input feature vector further minimises the adversary’s
incorrect predictions. This is used to create an overall model
which is very close to the target model.
Figure 4. Generating adversarial example using fast gradient
sign method2.
4.1 Evasion Attacks
Evasion attacks evade the ML model by passing an
adversarial example so that the model misclassifies. It is a test
time attack.
4.6 Black Box Attacks using Transferability
In black box attacks, the adversary has no access to the
data and the model. The attacker can only access the oracle
that returns an output for the input chosen by the attacker. ML
model on cloud is an example of black box scenario where
the adversary has no access to internals of the model and the
training data. The service provider provides a training API using
which the user can send data to the cloud to train the model
and a prediction API to query the model and obtain predictions
as output. In such a scenario, the adversary needs to alleviate
lack of knowledge of the model and lack of knowledge of the
training data.
The lack of knowledge of model can be alleviated using
the property of transferability which states that samples
crafted to mislead model A are likely to mislead model B.
The transferability property of adversarial examples exists
because they span a contiguous subspace of large dimensionality,
and the subspaces of different models intersect, which enables
transferability18. Transferability can
be achieved in two ways19:
Cross-training Data Transferability: There are two
different instances of data: data A and data B. The attacker
trains a local model, which is the same as the target model,
on data A, while data B is used to train the target
model. Adversarial examples are tested on the local model
and then used to attack the target model as shown in Fig. 5.
Cross Technique Transferability: In this case, the attacker
has access to the same data that was used to train the target
model. However, he does not have access to the model
internals and the local model is different from the target model.
The attacker tries various model combinations to get the most
optimal pair to generate the adversarial examples as shown in
Fig. 6.
The lack of knowledge of data can be alleviated by using
synthetic data generation. The adversary sends synthetic data to
the oracle and gets a confidence score for the prediction, based
on which the attacker decides the validity of the synthetic data.
This data and the corresponding labels given as output by the
oracle are used to create a substitute for the local data.
Papernot19, et al. use reservoir sampling to improve the
previous training procedure for the substitute model. They
developed a generalised algorithm for black box attacks using
transferability that exploits adversarial sample transferability on
broad classes of ML algorithms. This was demonstrated using
a deep neural network (DNN) trained on Google and Amazon
cloud services20 as shown in Fig. 7. An ensemble based
approach to generate transferable adversarial examples was
proposed to attack black box models on the cloud21. Hayes22,
et al. introduce a direct attack against black-box neural
networks (NNs) that uses another neural network to learn to
craft adversarial examples and, unlike previous work, does not
use transferability of adversarial examples.
Figure 5. Cross training data transferability (Same model,
different data).
Figure 6. Cross technique transferability (Same data, different
model).
Figure 7. Cloud based black box model.
4.7 Member Inference Attack
In member inference attacks, the attacker finds out whether a query
passed to the prediction API is part of the training set and, if
so, leaks the training data information23. The authors implement
shadow models which predict whether the input is part of the
training data or not.
Various attacks on machine learning paradigms, namely,
supervised, unsupervised and reinforcement learning are
discussed here.
5.1 Supervised Learning
In supervised learning, the data passed to the ML model
has labels associated with each input instance. This helps in
supervising the model to classify or predict values for new data
instances. If the target label is a continuous range of values, it
is referred to as a regression problem and if the target label is a
discrete value, it is referred to as a classification problem. Attack
models on classification models, regression models, support
vector machines (SVM) and NNs are described as follows.
Classication Models: Biggio24, et al. present techniques to
hide the classier information from the adversary by introducing
randomness in the decision function. Further, Biggio25-26, et al.
argue for improving the robustness of classiers by an over/
under emphasis of input features of the data. The adversarial
classier reverse engineering (ACRE) learning problem27 was
introduced to learn sucient information about a classier so
as to construct adversarial attacks by reverse engineering linear
classiers with either continuous or Boolean features.
Regression Models: Regression problems in which an
adversary can exercise some control over the data generation
process were first studied by Großhans28, et al. They model the
problem as a Bayesian game and characterise conditions under
which a unique Bayesian equilibrium point exists.
Attacks on Support Vector Machines: SVMs have been
shown to be vulnerable to label flip attacks where the data
labels are flipped in training data. There are two different
strategies for contaminating the training set through label
flipping: random and adversarial label flips8.
Random Label Flips: The attacker randomly selects a
number of samples from the training data and flips their labels.
Adversarial Label Flips: The adversary aims to find the
combination of label flips which maximises the classification
error on the untainted testing data. Different combinations
of label flips are iterated over, the classification error
corresponding to each combination is measured, and the combination
which gives the maximum classification error is retained and used
to attack the SVM.
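A rough sketch of adversarial label flipping follows. Two simplifications are assumed here and are not taken from the cited work: a nearest-centroid classifier stands in for the SVM so that the example needs only NumPy, and a greedy search (one flip at a time) replaces the iteration over all combinations of flips. The data are synthetic.

```python
import numpy as np

def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per class."""
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def error_rate(centroids, X, y):
    """Fraction of samples assigned to the wrong class."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(d.argmin(axis=1) != y))

def greedy_label_flips(X_tr, y_tr, X_te, y_te, budget):
    """Flip up to `budget` training labels, each time choosing the single
    flip that most increases the error on the untainted test data."""
    y = y_tr.copy()
    best = error_rate(fit_centroids(X_tr, y), X_te, y_te)
    for _ in range(budget):
        best_i = None
        for i in range(len(y)):
            y[i] = 1 - y[i]                           # try flipping label i
            if (y == 0).any() and (y == 1).any():     # keep both classes non-empty
                e = error_rate(fit_centroids(X_tr, y), X_te, y_te)
                if e > best:
                    best, best_i = e, i
            y[i] = 1 - y[i]                           # undo the trial flip
        if best_i is None:                            # no flip increases the error
            break
        y[best_i] = 1 - y[best_i]
    return y, best

# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
y_tr = np.array([0] * 20 + [1] * 20)
X_te = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y_te = np.array([0] * 50 + [1] * 50)

err_clean = error_rate(fit_centroids(X_tr, y_tr), X_te, y_te)
y_poisoned, err_poisoned = greedy_label_flips(X_tr, y_tr, X_te, y_te, budget=5)
```

A random label flip attack would replace the inner search with a random choice of indices; the greedy variant never does worse than the clean baseline because it only accepts flips that raise the test error.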
A family of poisoning attacks using gradient ascent based
on the SVM's optimal solution has been shown to significantly
increase the error29. A model for the analysis of label noise
in support vector learning and a modification of the SVM
formulation that compensates for the noise by correcting the
kernel matrix was suggested by Biggio8. A novel technique
where an optimisation function was used to find label flips that
maximise the classification error using Tikhonov regularisation
was proposed by Xiao30. A heuristic approach was used as an
extension to improve the performance31. Burkard32, et al.
examine the targeted attack on an SVM that learns from a
continuous data stream.
Attacks on Neural Networks: Szegedy33, et al. were the
first to identify the misclassification of NNs due to perturbed
images. A maliciously trained neural network or backdoor neural
network that has good performance on the user’s training and
validation samples, but performs poorly on specific
attacker-chosen inputs was introduced34. The Deep Fool algorithm35
efficiently computes perturbations that fool deep networks
by minimising the distance between the adversarial example
and the target example corresponding to a target class.
Munoz36, et al. extend the poisoning attacks to multi-class
problems and propose a poisoning algorithm based on
back-gradient optimisation to compute the gradient of interest
through automatic differentiation to drastically reduce the attack
complexity. Adversarial attacks were shown to be effective
against Convolutional NN37 and categorical and sequential
Recurrent NNs using computational graph unfolding38.
5.2 Unsupervised Learning
In unsupervised learning, data does not have any labels
associated with it and only contains the input features. These
are used to cluster or group the data together based on similar
input features or learn a new representation of data. Attacks on
unsupervised ML models can be categorised into Generative
Models, Autoencoders and Clustering algorithms.
Generative Models: A generative model learns the
underlying probability distributions of training data to
give an estimate of a function fitting the distribution, which
enables the model to generate new samples. Generative adversarial
networks (GAN)39 are a type of generative model that generates
new samples by using two networks to play a game against each
other. A discriminator network estimates the probability that
the data is real or fake while the generative network transforms
input to randomly generated samples as output and is trained
to fool the discriminator network. MalGAN40 generates
adversarial malware examples, which are able to bypass
black-box ML based detection models using a substitute detector.
A generative network is trained to minimise the generated
adversarial examples’ malicious probabilities predicted by
the substitute detector, making the retraining based defensive
method against adversarial examples ineffective. APE-GAN41
defends against the adversarial examples by eliminating the
adversarial perturbation using a trained network and then feeding
the processed example to classification networks.
Autoencoders: An autoencoder is a neural network variant
used for unsupervised learning where the number of neurons is
the same in the input and output layer. It reduces the image and
represents it using fewer features (latent representation),
thereby creating a sparse representation of input data for image
compression, removing noise from images and creating new images.
Three classes of attacks on the variational autoencoder (VAE)
and VAE-GAN architectures were presented by Kos42, et al.
The first attack attaches a classifier to the trained encoder of the
target generative model which is used to indirectly manipulate
the latent representation. The second attack uses the VAE loss
function to generate a target reconstruction image from the
adversarial example. The third attack is based on optimising
the differences in source and target latent representations.
A method to distort the input image to mislead the
autoencoder in reconstructing a completely different target
image was given by Tabacof43, et al. They design an attack
on the internal latent representations to make the adversarial
input produce an internal representation similar to the target’s
representation. Makhzani44, et al. propose the adversarial
autoencoder (AAE), which is a probabilistic autoencoder
that uses generative adversarial networks (GAN) to perform
variational inference by matching the aggregated posterior of
the hidden code vector of the autoencoder with an arbitrary
prior distribution.
Clustering: Clustering is organising a set of data points
into groups of similar features called clusters. A clustering
algorithm can be formalised as a function $f$ with $C = f(D)$, where
$D = \{x_1, x_2, \ldots, x_n\}$ is the data, $i \in \{1, \ldots, n\}$
indexes the data points and $C$ is the clustering output. Clustering
is extensively used to infer and
understand data without labels and is vulnerable to two main
categories of attacks:
Poisoning: The adversary aims to maximise the distance
between the clustering $C$ obtained from data $D$ and the clustering $C'$
obtained from contaminated data $D \cup A'$, where $A'$ is a set of
adversarial samples, i.e., $C' = f(D \cup A')$.
Obfuscation or Bridging: The goal is to hide attack
samples in clusters without affecting the output. Bridges
are formed between clusters, which results in the clusters
being combined. The attacker’s goal is to minimise the distance
between $C_{target}$ and $C'$.
These models are vulnerable mainly because the inter-cluster
distance depends solely on the distance between the closest
points in the clusters which, when minimised, allows attackers
to form a bridge and combine the clusters45. The single link and
complete link hierarchical clustering are vulnerable to bridging
and poisoning attacks47-48.
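The bridging weakness of single-link clustering can be shown with a small sketch. It assumes one-dimensional data and a single-linkage hierarchy cut at a fixed distance threshold, which is equivalent to grouping points that are connected by a chain of neighbours no further apart than the threshold. The data and the threshold are hypothetical.

```python
def single_link_clusters(points, threshold):
    """Number of single-linkage clusters when the hierarchy is cut at
    `threshold`: two points share a cluster if a chain of points, each
    at most `threshold` from the next, connects them (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) <= threshold:
                parent[find(i)] = find(j)   # merge the two clusters
    return len({find(i) for i in range(len(points))})

clean = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0]      # two well separated groups
bridge = [2.0, 3.0, 4.0]                    # attacker's bridging samples

n_clean = single_link_clusters(clean, threshold=1.0)
n_attacked = single_link_clusters(clean + bridge, threshold=1.0)
```

Three injected points are enough to chain the two groups together, since the merge decision depends only on the closest pair of points between clusters.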
5.3 Reinforcement Learning
In the reinforcement learning paradigm, an agent is
placed in a situation without knowledge of any goals or other
information about the environment. For every action made by
the agent, it receives a feedback from the environment in the
form of a reward. The agent tries to maximise the reward by
optimising its actions over time and the agent learns to achieve
its goals. In an adversarial setting, there are multiple agents and
an agent wins a game when it is given a positive reinforcement
and its opponent is given negative reinforcement. Maximising
reward corresponds directly to winning games and over time
the agent learns to act so that it wins the game.
Uther49, et al. introduce algorithms to handle the multi-
agent, adversarial, and continuous-valued aspects of the domain
by extending prioritised sweeping that allows generalisation
of learnt knowledge over neighbouring states in the domain
and to allow the handling of continuous state spaces.
Behzadan50, et al. establish that reinforcement learning
techniques based on Deep Q-Networks (DQNs) are vulnerable
to adversarial input perturbations and verify this using the
transferability of adversarial examples across different DQN
models. They present attacks that enable policy manipulation and
induction in the learning process of DQNs. Huang51, et al. show
that adversarial attacks are also effective when targeting neural
network policies in reinforcement learning using transferability
across policies to attack the Reinforcement Learning model.
A method for reducing the number of adversarial examples
that need to be injected for a successful attack based on the
value function was proposed by Kos52. It was observed that
retraining on random noise and FGSM perturbations improves
the resilience against adversarial examples.
Lin53, et al. introduce two tactics, strategically timed attack
and the enchanting attack, to attack reinforcement learning
agents using adversarial examples. In the strategically-timed
attack, the adversary aims at minimising the agent’s reward
by attacking the agent at a small subset of time steps. In the
enchanting attack, the adversary aims at luring the agent to
a designated target state by combining a generative model to
predict the future states and a planning algorithm to generate a
preferred sequence of actions for luring the agent.
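The timing rule behind the strategically-timed attack can be sketched as follows. This is a minimal illustration, assuming the agent exposes a probability distribution over actions at each time step. The threshold `beta` and the function names are hypothetical, and the preference-gap criterion is one simple way of selecting a small subset of time steps, not a reproduction of the authors' implementation.

```python
def preference_gap(action_probs):
    """Gap between the most and least preferred action probabilities."""
    return max(action_probs) - min(action_probs)


def choose_attack_steps(policy_outputs, beta):
    """Return the time steps at which the preference gap exceeds beta.

    policy_outputs: list of per-step action probability lists.
    beta: hypothetical threshold; a larger value means fewer attacked steps.
    """
    return [t for t, probs in enumerate(policy_outputs)
            if preference_gap(probs) > beta]


# Toy trajectory: the agent is nearly indifferent at steps 0 and 2,
# but strongly prefers one action at steps 1 and 3.
trajectory = [
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.05, 0.85, 0.10],
]
attack_steps = choose_attack_steps(trajectory, beta=0.5)
```

Only the steps where the agent strongly prefers one action are selected, which keeps the number of perturbed observations small.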
Privacy preserving techniques enable the use of ML on data
without revealing the underlying content of users’ data. We study
various privacy preserving models that have been proposed to
ensure the protection of sensitive data. One of the main reasons
for leakage of information through ML models is overfitting,
which is why generalisation becomes very important.
Privacy preserving ML has followed three major directions:
Randomisation algorithms
Secure multi-party computation
Homomorphic encryption (HE)
In CryptoNets54-55, the authors perform neural network
computations on data encrypted using HE and use
approximations to evaluate the Sigmoid, ReLU and max
pooling. The computation is slow due to the noise generated
by HE, and the security parameters of HE should be chosen
carefully based on the noise. Rouhani56, et al. propose a method
to perform DL computation using garbled circuits (GC) and
adopt pre-processing techniques to reduce the GC runtime by
mapping the NN to a lower dimension. Ohrimenko57, et al.
propose a solution for secure multiparty ML using trusted
Intel SGX processors and oblivious protocols between
client and server where the inputs and outputs are blinded.
Mohassel58, et al. use a two-server model and split the
data into two parts, one for each server. The authors developed new
privacy preserving protocols for linear regression, logistic
regression and NNs, and used garbled circuits for privacy and
arithmetic with pre-computed triplets.
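The idea of splitting data between two servers and multiplying with pre-computed triplets can be sketched with additive secret sharing. This is a simplified illustration, not the protocol of the cited system: the modulus, the function names and the use of a trusted dealer to generate the triplet are all assumptions made for brevity.

```python
import random

MODULUS = 2 ** 61 - 1  # hypothetical modulus; all shares live modulo this value


def share(value, rng):
    """Split value into two additive shares that sum to it modulo MODULUS."""
    s0 = rng.randrange(MODULUS)
    s1 = (value - s0) % MODULUS
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MODULUS


def make_triplet(rng):
    """Pre-compute shares of random a, b and c = a * b (offline phase).

    A trusted dealer creates the triplet here, which is an assumption.
    """
    a = rng.randrange(MODULUS)
    b = rng.randrange(MODULUS)
    c = (a * b) % MODULUS
    return share(a, rng), share(b, rng), share(c, rng)


def multiply_shared(x_sh, y_sh, triplet):
    """Multiply two secret-shared values using one pre-computed triplet."""
    a_sh, b_sh, c_sh = triplet
    # Each server masks its shares locally. Opening e = x - a and f = y - b
    # reveals nothing about x or y because a and b are uniformly random.
    e = reconstruct((x_sh[0] - a_sh[0]) % MODULUS, (x_sh[1] - a_sh[1]) % MODULUS)
    f = reconstruct((y_sh[0] - b_sh[0]) % MODULUS, (y_sh[1] - b_sh[1]) % MODULUS)
    z = []
    for i in (0, 1):
        zi = (c_sh[i] + e * b_sh[i] + f * a_sh[i]) % MODULUS
        if i == 0:
            zi = (zi + e * f) % MODULUS  # the public term e*f is added once
        z.append(zi)
    return tuple(z)


rng = random.Random(0)
x_shares = share(6, rng)
y_shares = share(7, rng)
product = reconstruct(*multiply_shared(x_shares, y_shares, make_triplet(rng)))
```

The result is correct because x*y = (e + a)(f + b) = e*f + e*b + f*a + c, and every term on the right is either public or available in shared form.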
Dierential privacy has been explored to ensure privacy
guarantees for ML models for non-convex objective functions
using dierentially private stochastic gradient descent59,61.
Shokri60, et al. designed a model where participants use
parameter sharing, allowing participants to benet from other
participants’ models without explicit sharing of training inputs.
After each round of training, participants asynchronously share
with each other the gradients they computed for some of the
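The core step of differentially private stochastic gradient descent, as commonly described, clips each per-example gradient and adds Gaussian noise before averaging. The sketch below illustrates only that step. The names `max_norm` and `noise_multiplier` are hypothetical, and privacy accounting (tracking the overall privacy budget) is deliberately left out.

```python
import math
import random


def clip(gradient, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def private_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Average clipped per-example gradients after adding Gaussian noise."""
    dim = len(per_example_grads[0])
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.6, 0.8]]  # toy per-example gradients
noiseless = private_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = private_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds the influence of any single training example, and the noise scale is chosen relative to that bound.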
Many approaches to building defences against adversarial
attacks have been proposed over the past few years. We present
the different defences and discuss their shortcomings.
Gradient masking is based on the idea that if the model is
non-differentiable or the model’s gradient is zero at data points,
then gradient based attacks are ineffective. Two major types of
gradient masking are gradient hiding and gradient smoothing.
Gradient hiding uses models that are non-differentiable and
highly non-linear, which prevents the adversary from finding
the derivative. Gradient smoothing reduces the effectiveness
of white-box attacks by smoothing out the model’s gradient,
leading to numerical instabilities in attacks such as the FGSM.
However, in both white-box and black-box settings, models
are still vulnerable even after using gradient masking62.
Papernot63-64, et al. designed a defence based on the distillation
technique, in which the authors leverage the softmax layer of the
neural network.
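The role of the softmax layer in distillation can be illustrated with a temperature-scaled softmax, in which the inputs to the softmax are divided by a temperature T. The sketch below uses the standard form of this function. The example logits are hypothetical and are not taken from the cited work.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature T."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.5]  # hypothetical pre-softmax outputs
sharp = softmax_with_temperature(logits, temperature=1.0)
smooth = softmax_with_temperature(logits, temperature=20.0)
```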
A low value of the temperature parameter T will result in
high confidence but discrete probabilities, while a high value
of T will reduce the confidence of the prediction but smooth out the
probability distribution, which makes crafting adversarial
examples hard. Carlini65, et al. argued that the output of the
softmax function does not change even if the input is
changed beyond certain values, which was not considered in
defensive distillation. They suggested dividing the inputs
to the softmax by T before passing them to the function. To
make it more robust, Papernot66, et al. extended
defensive distillation to address the numerical instabilities in
the previous model and attacks such as black-box attacks using
transferability. Instead of
using the probabilities from the first model, they measured the
uncertainties in classifying the output using dropout inference.
Szegedy33, et al. increase the model’s robustness by
injecting adversarial examples into the training data, referred
to as adversarial training. This was extended to ensemble
adversarial training, which additionally augments training data
with perturbed inputs transferred from a number of fixed
pre-trained models67. Adversarial training in the text domain was
explored by applying perturbations to the word embeddings in a
recurrent neural network68.
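Adversarial training can be sketched on a small logistic regression model, using FGSM to craft the injected examples. This is a minimal illustration under stated assumptions: the model, the learning rate `lr`, the perturbation size `epsilon` and the function names are all hypothetical, and real systems apply the same idea to deep networks with automatic differentiation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """Perturb x by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    grad_x = [(p - y) * w for w in weights]  # gradient of cross-entropy w.r.t. x
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]


def adversarial_training_step(weights, bias, batch, epsilon, lr):
    """One SGD pass over a batch augmented with FGSM examples.

    The adversarial examples are crafted against the weights held at the
    start of the step.
    """
    augmented = list(batch) + [(fgsm(weights, bias, x, y, epsilon), y)
                               for x, y in batch]
    for x, y in augmented:
        err = predict(weights, bias, x) - y
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err
    return weights, bias


w, b = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1
x_adv = fgsm(w, b, x, y, epsilon=0.1)
new_w, new_b = adversarial_training_step(w, b, [(x, y)], epsilon=0.1, lr=0.5)
```

In this toy case the perturbed input lowers the model's confidence in the true class, and one training step on the augmented batch raises it again.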
Xu69, et al. detect adversarial examples using feature squeezing:
reducing the colour depth of each pixel in an image, and spatial
smoothing to reduce the difference among individual pixels. They compare
the model’s output with and without feature squeezing
and differentiate between adversarial and benign inputs based on the
output. A safety net architecture was proposed by Lu70, et al.
that consists of the original classifier and an adversary detector
which looks at the internal state of the later layers in the
original classifier to detect adversarial examples. Similar work
was explored by Metzen71, et al. Reject on negative impact (RONI)
defence72 is a technique that measures the empirical effect
of each training instance and eliminates from training those
points that have a substantial negative impact on classification
accuracy.
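The detection rule used by feature squeezing can be sketched as follows: squeeze the input, run the model on both versions, and flag the input when the two outputs differ by more than a threshold. The toy model, the threshold and the helper names are hypothetical, and only colour depth reduction is shown, with spatial smoothing omitted.

```python
def reduce_colour_depth(pixels, bits):
    """Quantise pixel intensities in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def is_adversarial(model, pixels, bits, threshold):
    """Flag the input when squeezing changes the model output too much."""
    original = model(pixels)
    squeezed = model(reduce_colour_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_model(pixels):
    """Hypothetical two-class model whose score depends on the first pixel."""
    score = pixels[0]
    return [score, 1.0 - score]


clean_flag = is_adversarial(toy_model, [1.0, 0.0], bits=1, threshold=0.5)
perturbed_flag = is_adversarial(toy_model, [0.55, 0.0], bits=1, threshold=0.5)
```

A clean input is left unchanged by squeezing and passes, while the perturbed input produces a large change in the output and is flagged.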
Data transformations like dimensionality reduction
using principal component analysis and data anti-whitening
to enhance the resilience of ML models were explored by
Bhagoji73, et al. However, adversarial examples can be made
robust to data transformations like rescaling, translation, and
rotation: Athalye74, et al. present an approach that produces images
that remain adversarial examples even after such transformations.
The security of linear classier itself can be improved by
using evenly weighted feature weights as this would require
the attacker to manipulate more features to evade detection75.
Feature selection methods are also compromised under
attack76. An adversary-aware feature selection model that can
improve classier security against evasion attacks was proposed
by selecting a feature subset that maximises the generalisation
capability of the classier77. It includes forward selection and
backward elimination wrapping algorithms, which iteratively
add or delete a feature from the current candidate set. Feature
squeezing techniques successfully detect the recent Carlini-
Wagner adversarial examples69.
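The effect of evenly distributed feature weights on evasion can be illustrated with a small calculation. The sketch assumes a linear classifier in which changing one feature reduces the score by at most the absolute value of its weight. The weight vectors and the margin are hypothetical.

```python
def min_features_to_evade(weights, score_margin):
    """Fewest features an attacker must change to cancel score_margin.

    Assumes each changed feature can reduce the score by at most |w_i|,
    so the attacker greedily targets the largest weights first.
    """
    remaining = score_margin
    ranked = sorted((abs(w) for w in weights), reverse=True)
    for count, w in enumerate(ranked, start=1):
        remaining -= w
        if remaining <= 0:
            return count
    return None  # the margin cannot be cancelled


# Two hypothetical weight vectors with the same total weight of 8.0.
concentrated = [4.5] + [0.5] * 7
even = [1.0] * 8

needed_concentrated = min_features_to_evade(concentrated, score_margin=4.0)
needed_even = min_features_to_evade(even, score_margin=4.0)
```

With the concentrated weights a single feature change cancels the margin, whereas the evenly weighted classifier forces the attacker to change four features.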
The various defences described in this section are specific
to models using a particular learning algorithm. As a result,
a defence mechanism that is applicable to one model may not be
applicable to another model. There is therefore no silver
bullet to defend all ML systems against adversarial attacks.
Cyberwargames are designed to examine how organisations
and critical response teams respond to realistic or simulated cyber
crises and highly skilled adversaries. The wargaming process
comprises the identification, defence, response, and recovery
phases of a cyberattack in depth. Cyberwargames that use game
theory to model the attackers and defenders are designed by
setting up a cyber test-bed to exercise cyberattack scenarios on
a network environment78-80. In the game theoretic framework,
two approaches have been used: a probabilistic framework and
a Bayesian belief framework, where the attacker and defender
try to anticipate the opponent’s strategy with complete and
incomplete information with learning. In this paper, we
describe the various components of cyber attacks in adversarial
machine learning environments, namely: vulnerabilities of
ML models in cyber warfare settings, adversary modelling,
attack modelling, defence modelling and data privacy in ML
models. In this comprehensive survey, we integrate the various
adversarial machine learning techniques in the cyber warfare
setting to analyse the dynamic attack and defence strategies to
improve the security of the simulated system.
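As a simple illustration of how game theory can model an attacker and a defender, the sketch below computes the mixed strategies for a two-action zero-sum game. It is not the model used in the cited test-bed: the payoff matrix is hypothetical, and the closed-form solution applies only to 2x2 games without a pure-strategy saddle point.

```python
from fractions import Fraction


def solve_2x2_zero_sum(payoff):
    """Mixed-strategy solution of a 2x2 zero-sum game.

    payoff[i][j] is the attacker's gain when the attacker plays row i and
    the defender plays column j. Returns (p, q, value), where p is the
    probability of attacker row 0 and q the probability of defender column 0.
    """
    (a, b), (c, d) = payoff
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("game has a pure-strategy saddle point")
    denom = Fraction(a - b - c + d)
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical payoffs: rows are two attack options, columns two defences.
attacker_payoff = [[3, -1], [-2, 2]]
p, q, value = solve_2x2_zero_sum(attacker_payoff)
```

At these probabilities each player is indifferent between their two actions, so neither can gain by deviating unilaterally.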
1. Huang, L.; Joseph, A.D.; Nelson, B.; Rubinstein, Benjamin
I.P. & Tygar, J.D. Adversarial machine learning. Chicago,
Illinois, USA. AISec’ 11, October 21, 2011, pp. 43-58.
2. Goodfellow, I.J.; Shlens, J. & Szegedy, C. Explaining
and harnessing adversarial examples. In the International
Conference on Learning Representations (ICLR), San
Diego, CA, USA, 2015.
3. Kurakin, A.; Goodfellow, I.J. & Bengio, S. Adversarial
examples in the physical world. In the International
Conference on Learning Representations (ICLR), Toulon,
France, 2017.
4. Kurakin, A.; Goodfellow, I.J. & Bengio, S. Adversarial
machine learning at scale. In the International Conference
on Learning Representations (ICLR), Toulon, France,
5. Xie, C.; Wang, J.; Zhang, Z.; Zhou, Y.; Xie, L. & Yuille,
A. Adversarial examples for semantic segmentation and
object detection. arXiv:1703.08603v3 [cs.CV].
6. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.;
Celik, Z.B. & Swami, A. The limitations of deep learning
in adversarial settings. In IEEE European Symposium on
Security and Privacy (EuroS&P), 2016, pp. 372-387.
doi: 10.1109/EuroSP.2016.36
7. Carlini, N. & Wagner, D. Towards evaluating the
robustness of neural networks. In IEEE Symposium on
Security and Privacy, San Jose, CA, USA, 2017, pp. 39-
doi: 10.1109/SP.2017.49
8. Biggio, B.; Nelson, B. & Laskov, P. Support vector
machines under adversarial label noise. In JMLR:
Workshop and Conference Proceeding, Taoyuan, Taiwan,
2011, pp. 97-112.
9. Srndic, N. & Laskov, P. Practical evasion of a learning-
based classifier: A case study. In IEEE Symposium on
Security and Privacy, San Jose, CA, USA, 2014, pp. 197-
doi: 10.1109/SP.2014.20.
10. Xu, W.; Qi, Y. & Evans, D. Automatically evading classifiers:
A case study on PDF malware classifiers. In Network and
Distributed System Security Symposium 2016 (NDSS),
San Diego, February 2016.
11. Khorshidpour, Z.; Hashemi, S. & Hamzeh, A. Learning
a secure classifier against evasion attack. In IEEE 16th
International Conference on Data Mining Workshop,
Barcelona, Spain, 2016, pp. 295-302.
doi: 10.1109/ICDMW.2016.0049
12. Liang, B.; Li, H.; Su, H.; Bian, M.; Li, M. & Shi, X. Deep
text classification can be fooled. arxiv:1704.08006 [cs.
13. Kloft, M. & Laskov, P. Online anomaly detection under
adversarial impact. In International Conference on
Articial Intelligence and Statistics (AISTATS), Sardinia,
Italy, 2010.
14. Biggio, B.; Didaci, L.; Fumera, G. & Roli, F. Poisoning
attacks to compromise face templates. In International
Conference on Biometrics (ICB), Madrid, Spain, 2013,
pp. 1-7.
doi: 10.1109/ICB.2013.6613006.
15. Mozaari-Kermani, M.; Sur-Kolay, S.; Raghunathan, A.
& Jha, N.K.; Systematic poisoning attacks on and defenses
for machine learning in healthcare. J. Biom. Health Infor.,
2015, 19(6), 1893-1905.
doi: 10.1109/JBHI.2014.2344095
16. Tramer, F.; Zhang, F.; Juels, A.; Reiter, M.K & Ristenpart,
T. Stealing machine learning models via prediction APIs.
In Proceedings of 25th Usenix Security Symposium,
Austin, Texas, 2016.
17. Fredrikson, M.; Jha, S. & Ristenpart, T. Model inversion
attacks that exploit confidence information and basic
countermeasures. In Proceedings of the 22nd ACM
SIGSAC Conference on Computer and Communications
Security (CCS’15), Colorado, USA, 2015, pp. 1322-
doi: 10.1145/2810103.2813677.
18. Tramèr, F.; Papernot, N.; Goodfellow, I.J.; Boneh, D.
& McDaniel, P. The space of transferable adversarial
examples. arXiv:1704.03453v2[stat.ML].
19. Papernot, N.; McDaniel, P. & Goodfellow, I.J.
Transferability in machine learning: From phenomena
to black-box attacks using adversarial samples. arXiv:
1605.07277v1 [cs.CR].
20. Papernot, N., McDaniel, P., Goodfellow, I.J., Jha, S.,
Berkay Celik, Z. & Swami, A. Practical black-box attacks
against machine learning. In Proceedings of the 2017 ACM
on Asia Conference on Computer and Communications
Security (ASIA CCS’17), Abu Dhabi, 2017, pp.506-519.
doi: 10.1145/3052973.3053009.
21. Liu, Y.; Chen, X.; Liu, C. & Song, D. Delving into
transferable adversarial examples and black-box attacks.
arXiv:1611.02770v3 [cs.LG].
22. Hayes, J. & Danezis, G. Machine learning as an adversarial
service: Learning black-box adversarial examples. arXiv:
1708.05207v1 [cs.CR].
23. Shokri, R.; Stronati, M.; Song, C. & Shmatikov, V.
Membership inference attacks against machine learning
models. In IEEE Symposium on Security and Privacy
(S&P) -- Oakland, 2017, pp. 3-18.
doi: 10.1109/SP.2017.41
24. Biggio, B.; Fumera, G. & Roli, F. Adversarial pattern
classification using multiple classifiers and randomisation.
In Proceedings of the 2008 Joint IAPR International
Workshop on Structural, Syntactic, and Statistical Pattern
Recognition (SSPR’08), Florida, USA, 2008, pp. 500-
25. Biggio, B.; Fumera, G. & Roli, F. Multiple classifier
systems under attack. In Proceedings of the 9th
International Conference on Multiple Classifier Systems,
Cairo, Egypt, 2010, pp. 74-83.
doi: 10.1007/978-3-642-12127-2_8.
26. Biggio, B.; Fumera, G. & Roli, F. Security evaluation of
pattern classifiers under attack. In IEEE Transactions on
Knowledge and Data Engineering, 2014, 26(4), pp. 984-
27. Lowd, D. & Meek, C. Adversarial learning. KDD, Illinois,
USA, 2005, pp. 641-647.
28. Großhans, M.; Sawade, C.; Bruckner, M. & Scheffer,
T. Bayesian games for adversarial regression problems.
In Proceedings of the 30th International Conference on
Machine Learning, Atlanta, GA, USA, 2013, pp. 55-63.
29. Biggio, B.; Nelson, B. & Laskov, P. Poisoning attacks
against support vector machines. ICML, Edinburgh,
Scotland, 2012, pp. 1467-1474.
30. Xiao, H.; Xiao, H. & Eckert, C. Adversarial label flips attack
on support vector machines. In ECAI’12 Proceedings of
the 20th European Conference on Artificial Intelligence,
Montpellier, France, 2012, pp. 870-875.
doi: 10.3233/978-1-61499-098-7-870
31. Xiao, H.; Biggio, B.; Nelson, B.; Xiao, H.; Eckert, C. &
Roli, F. Support vector machines under adversarial label
contamination. In Neurocomputing, 2014, 160(C), 53-
doi: 10.1016/j.neucom.2014.08.081
32. Burkard, C. & Lagesse, B. Analysis of causative attacks
against SVMs learning from data streams. IWSPA,
Scottsdale, Arizona, 2017, pp. 31-36.
doi: 10.1145/3041008.3041012
33. Szegedy, C.; Erhan, D.; Ilya Sutskever, W.Z.; Goodfellow,
I.J.; Bruna, J. & Fergus, R. Intriguing properties of neural
networks. arXiv: 1312.6199v4 [cs.CV].
34. Gu, T.; Dolan-Gavitt, B. & Garg, S. BadNets: Identifying
vulnerabilities in the machine learning model supply
chain. arXiv:1708.06733v1 [cs.CR].
35. Dezfooli, S. M.; Fawzi, A. & Frossard, P. DeepFool: A
simple and accurate method to fool deep neural networks.
In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2016, pp. 2574-2582.
doi: 10.1109/CVPR.2016.282.
36. Munoz-Gonzalez, L.; Biggio, B.; Demontis, A.; Paudice,
A.; Wongrassamee, V.; Lupu, E. C. & Roli, F. Towards
poisoning of deep learning algorithms with back-gradient
optimization. In Proceedings of the 10th ACM Workshop
on Articial Intelligence and Security, 2017, pp. 27-38.
37. Narodytska, N. & Kasiviswanathan, S. Simple black-box
adversarial attacks on deep neural networks. In IEEE
Conference on Computer Vision and Pattern Recognition
Workshop, Hawaii, USA, 2017, pp. 1310-1318.
doi: 10.1109/CVPRW.2017.172
38. Papernot, N.; McDaniel, P.; Swami, A. & Harang, R.
Crafting adversarial input sequences for recurrent neural
networks. In Military Communications Conference,
MILCOM, LA, USA, 2016.
39. Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.;
Warde-Farley, D.; Ozair, S.; Courville, A. & Bengio, Y.
Generative adversarial nets. In NIPS’14 Proceedings of
the 27th International Conference on Neural Information
Processing Systems, Montreal, Canada, 2014, pp. 2672-
40. Hu, W. & Tan, Y. Generating adversarial malware examples
for black-box attacks based on GAN. arXiv:1702.05983v1
41. Shen, S.; Jin, G. & Gao, K. APEGAN: Adversarial
perturbation elimination with GAN. arXiv: 1707.05474v3
42. Kos, J.; Fischer, I. & Song, D. Adversarial examples for
generative models. arXiv:1702.06832v1 [stat.ML].
43. Tabacof, P.; Tavares, J. & Valle, E. Adversarial images for
variational autoencoders. arXiv:1612.00155v1 [cs.NE].
44. Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.J. &
Frey, B. Adversarial autoencoders. arXiv:1511.05644v2
45. Biggio, B.; Pillai, I.; Rota Bulo, S.; Ariu, D.; Pelillo, M. &
Roli, F. Is data clustering in adversarial settings secure?.
AISec, Berlin, Germany, 2013, pp. 87-98.
doi: 10.1145/2517312.2517321
46. Dutrisac, J.D. & Skillicorn, D.B. Hiding clusters in
adversarial settings. In IEEE International Conference on
Intelligence and Security Informatics(ISI), 2008, pp. 185-
doi: 10.1109/ISI.2008.4565051
47. Biggio, B. Poisoning complete-linkage hierarchical
clustering. In Joint IAPR Int’l Workshop on Structural,
Syntactic, and Statistical Pattern Recognition (LNCS),
Joensuu, Finland, 2014, 8621, pp. 42-52.
doi: 10.1007/978-3-662-44415-3_5
48. Biggio, B. Poisoning behavioral malware clustering. In
Proceedings of the 2014 ACM Workshop on Articial
Intelligence and Security, colocated with CCS ‘14,
Scottsdale, Arizona, USA, 2014, pp. 27-36.
doi: 10.1145/2666652.2666666
49. Uther, W. & Veloso, M. Adversarial reinforcement
learning. 1997.
50. Behzadan, V. & Munir, A. Vulnerability of deep
reinforcement learning to policy induction attacks. In
International Conference on Machine Learning and Data
Mining in Pattern Recognition, 2017.
51. Huang, S.; Papernot, N.; Goodfellow, I.J; Duan, Y. &
Abbeel, P. Adversarial attacks on neural network policies.
arXiv: 1702.02284v1 [cs.LG].
52. Kos, J. & Song, D. Delving into adversarial attacks on
deep policies. Workshop track - ICLR, 2017.
53. Chen Lin, Y.; Wei Hong, Z.; Hong Liao, Y.; Shih, M.;
Liu, M. & Sun, M. Tactics of adversarial attack on deep
reinforcement learning agents. In Proceedings of the 26th
International Joint Conference on Artificial Intelligence,
IJCAI’17, Melbourne, Australia, 2017.
54. Xie, P. CryptoNets: Neural networks over encrypted data.
arXiv:1412.6181 [cs.LG].
55. Dowlin, N. CryptoNets: Applying neural networks to
encrypted data with high throughput and accuracy. In
Proceedings of the 33rd International Conference on
Machine Learning, New York, NY, USA, 2016, pp. 201-
56. Rouhani, B.D.; Sadegh Riazi, M. & Koushanfar, F.
DeepSecure: Scalable provably-secure deep learning.
arXiv:1705.08963 [cs.CR].
57. Ohrimenko, O.; Schuster, F.; Fournet, C.; Mehta, A;
Nowozin, S.; Vaswani, K. & Costa, M. Oblivious multi-
party machine learning on trusted processors. In 25th
USENIX Security Symposium, Austin, TX, USA, 2016,
pp. 619-636.
58. Mohassel, P. & Zhang, Y. SecureML: A system for scalable
privacy-preserving machine learning. In IEEE Security
and Privacy Symposium, San Jose, CA, USA, 2017, pp.
doi: 10.1109/SP.2017.12
59. Papernot, N.; Abadi, M.; Erlingsson, U.; Goodfellow, I.J.
& Talwar, K. Semi-supervised knowledge transfer for
deep learning from private training data. In International
Conference on Learning Representations (ICLR), Toulon,
France, 2017.
60. Shokri, R. & Shmatikov, V. Privacy-preserving deep
learning. CCS’15, Colorado, USA, 2015, pp. 1310-1321.
doi: 10.1145/2810103.2813687
61. Abadi, M.; McMahan, H.B.; Chu, A.; Mironov, I.; Zhang,
L.; Goodfellow, I.J. & Talwar, K. Deep learning with
dierential privacy. CCS’16, Vienna, Austria, 2016, pp.
doi: 10.1145/2976749.2978318
62. Papernot, N.; McDaniel, P.; Sinha, A. & Wellman, M.
Towards the science of security and privacy in machine
learning. arxiv:1611.03814.
63. Hinton, G.; Vinyals, O. & Dean, J. Distilling the
knowledge in a neural network. arXiv:1503.02531v1
64. Papernot, N.; McDaniel, P.; Wu, X.; Jha, S. & Swami, A.
Distillation as a defense to adversarial perturbations against
deep neural networks. In the 37th IEEE Symposium on
Security & Privacy, San Jose, CA, USA, 2016, pp. 582-
doi: 10.1109/SP.2016.41
65. Carlini, N. & Wagner, D. Defensive distillation is not
robust to adversarial examples. arXiv, 2016.
66. Papernot, N. & McDaniel, P. Extending defensive
distillation. arxiv: 1705.05264v1 [cs:LG].
67. Tramer, F.; Kurakin, A.; Papernot, N.; Boneh, D.
& McDaniel, P. Ensemble adversarial training.
arXiv:1705.07204v2 [stat.ML].
68. Miyato, T.; Dai, A. M. & Goodfellow, I.J. Adversarial
training methods for semi-supervised text classification.
In International Conference on Learning Representations
(ICLR), Toulon, France, 2017.
69. Xu, W.; Evans, D. & Qi, Y. Feature squeezing:
detecting adversarial examples in deep neural networks.
arXiv:1704.01155v1 [cs.CV].
70. Lu, J.; Issaranon, T. & Forsyth, D. SafetyNet:
Detecting and rejecting adversarial examples robustly.
arXiv:1704.00103v2 [cs.CV].
71. Metzen, J.K.; Genewein, T.; Fischer, V. & Bischoff, B.
On detecting adversarial perturbations. ICLR, Toulon,
France, 2017.
72. Barreno, M.; Nelson, B.; Joseph, A.D. & Tygar, J.D. The
security of machine learning. Machine Learning J., 2010,
81(2), 121-148.
doi: 10.1007/s10994-010-5188-5
73. Bhagoji, A.N; Cullina, D.; Sitawarin, B. & Mittal, P.
Enhancing robustness of machine learning systems via
data transformations. arxiv:1704.02654v3 [cs:CR].
74. Athalye, A. & Sutskever, I. Synthesizing robust adversarial
examples. arXiv:1707.07397v1 [cs.CV].
75. Demontis, A.; Melis, M.; Biggio, B.; Maiorca, D.; Arp,
D.; Rieck, K.; Corona, I.; Giacinto, G. & Roli, F. Yes,
machine learning can be more secure! A case study on
android malware detection. In IEEE Transactions on
Dependable and Secure Computing, 2017, Early Access,
pp 1-1.
doi: 10.1109/TDSC.2017.2700270
76. Xiao, H.; Biggio, B.; Brown, G.; Fumera, G.; Eckert, C.
& Roli, F. Is Feature selection secure against training data
poisoning? In the Proceedings of the 32nd International
Conference on Machine Learning, Lille, France, 2015,
37, pp. 1689-1698.
77. Zhang, F.; Chan, P.; Biggio, B.; Yeung, D.S. & Roli, F.
Adversarial feature selection against evasion attacks.
IEEE Trans. Cybernetics, 2016, 46(3), 766-777.
doi: 10.1109/TCYB.2015.2415032
78. Ravishankar, M.; Vijay Rao, D. & Kumar, C.R.S. Game
theory based defence mechanisms of cyber warfare. In 1st
Conference on Latest Advances in Machine Learning and
Data Science LAMDA, NIT Goa, 2017.
79. Ravishankar, M.; Vijay Rao, D. & Kumar, C.R.S. A game
theoretic approach to modeling jamming attacks in delay
tolerant networks. Def. Sci. J., 2017, 67(3), 282-290.
doi: 10.14429/dsj.67.10051
80. Ravishankar, M.; Vijay Rao, D. & Kumar, C.R.S. A game
theoretic software test-bed for cyber security of critical
infrastructure. Def. Sci. J., 2018, 68(1), 54-63.
doi: 10.14429/dsj.68.11402
81. Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.;
Šrndić, N.; Laskov, P.; Giacinto, G & Roli, F.
Evasion attacks against machine learning at test
time. In Lecture Notes in Computer Science,
2013, 8190, pp. 387-402.
doi: 10.1007/978-3-642-40994-3_25
The author would like to thank Dr A.K. Sinha, Scientist
G, DRDO-Defence Terrain Research Laboratory, Delhi and
Dr D. Vijay Rao, Scientist G, DRDO-Institute for Systems
Studies and Analyses, Delhi for the fruitful discussions,
encouragement and guidance; and the anonymous reviewers
for their suggestions and critical reviews that have greatly
improved the quality of the paper.
Mr Vasisht Duddu is pursuing BTech (Electronics and
Communications Engineering) from Indraprastha Institute of
Information Technology (IIIT), Delhi and is currently working
as a researcher at System Security Lab, School of Computing,
National University of Singapore(NUS), Singapore. His primary
areas of research are security, privacy, anonymity and applied
... Fourthly, their performance is considerably affected by the curse of dimensionality usually associated with real-world data. Lastly, the use of data out of the underlying data distribution is used for ML model training and testing in which attackers can craft adversarial examples and cause performance degradation [3,4]. ...
... Researchers have reported the potential vulnerability of ML models to adversarial attacks [1,3,4]. This has promoted researcher efforts related to Adversarial Machine Learning (AML) challenges. ...
... In addition, they can also be grouped based on their architecture into (i) Traditional shallow learning and (ii) Deep learning techniques [1,2]. One should note that the direct application of traditional ML techniques proved to be inefficient in handling data in an IoT environment [3,4]. This can be inferred from several reasons related to ML algorithms in terms of complexity, scalability, real-time processing, ...
Full-text available
Internet of Things (IoT) technologies serve as a backbone of cutting-edge intelligent systems. Machine Learning (ML) paradigms have been adopted within IoT environments to exploit their capabilities to mine complex patterns. Despite the reported promising results, ML-based solutions exhibit several security vulnerabilities and threats. Specifically, Adversarial Machine Learning (AML) attacks can drastically impact the performance of ML models. It also represents a promising research field that typically promotes novel techniques to generate and/or defend against Adversarial Examples (AE) attacks. In this work, a comprehensive survey on AML attack and defense techniques is conducted for the years 2018–2022. The article investigates the employment of AML techniques to enhance intrusion detection performance within the IoT context. Additionally, it depicts relevant challenges that researchers aim to overcome to implement proper IoT-based security solutions. Thus, this survey aims to contribute to the literature by investigating the application of AML concepts within the IoT context. An extensive review of the current research trends of AML within IoT networks is presented. A conclusion is reached where several findings are reported including a shortage of defense mechanisms investigations, a lack of tailored IoT-based solutions, and the applicability of the existing mechanisms in both attack and defense scenarios.
... This can include adversarial attempts against deep RL with limited knowledge (black box) to those with complete knowledge (white box) with a variety of different methods, such as data poisoning. 91 Thus, here we can see that deep RL will not 'solve' conf lict but may appear to offer new perspectives where agents act through approximation and optimisation, introducing greater risks to the 'control' of conf lict. Such attacks, where incommensurability is a condition of its performance, then add yet another imperceptible trace of unknowability in their ethico-political standing. ...
Full-text available
... This allows organizations to better understand and prepare for potential cyber threats and improve their overall cybersecurity posture. However, traditional ML-based applications have limitations as they are typically trained on historical data and have limited generalizability [7][8][9]. The rapid progress of artificial intelligence (AI) presents the possibility of AI-assisted or self-governing AI red teaming, where AI can use its superior decision-making ability, learned through AI training, to create new attack methods against complex cybersystems that human red team experts may not have considered yet [10]. ...
Full-text available
Cybersecurity is a growing concern in today’s interconnected world. Traditional cybersecurity approaches, such as signature-based detection and rule-based firewalls, are often limited in their ability to effectively respond to evolving and sophisticated cyber threats. Reinforcement learning (RL) has shown great potential in solving complex decision-making problems in various domains, including cybersecurity. However, there are significant challenges to overcome, such as the lack of sufficient training data and the difficulty of modeling complex and dynamic attack scenarios hindering researchers’ ability to address real-world challenges and advance the state of the art in RL cyber applications. In this work, we applied a deep RL (DRL) framework in adversarial cyber-attack simulation to enhance cybersecurity. Our framework uses an agent-based model to continuously learn from and adapt to the dynamic and uncertain environment of network security. The agent decides on the optimal attack actions to take based on the state of the network and the rewards it receives for its decisions. Our experiments on synthetic network security show that the DRL approach outperforms existing methods in terms of learning optimal attack actions. Our framework represents a promising step towards the development of more effective and dynamic cybersecurity solutions.
Full-text available
As the adoption of machine learning models increases, ensuring robust models against adversarial attacks is increasingly important. With unsupervised machine learning gaining more attention, ensuring it is robust against attacks is vital. This paper conducts a systematic literature review on the robustness of unsupervised learning, collecting 86 papers. Our results show that most research focuses on privacy attacks, which have effective defenses; however, many attacks lack effective and general defensive measures. Based on the results, we formulate a model on the properties of an attack on unsupervised learning, contributing to future research by providing a model to use.
Medical Artificial Intelligence (MedAI) harnesses the power of medical research through AI algorithms and vast data to address healthcare challenges. The security, integrity, and credibility of MedAI tools are paramount because human lives are at stake. Predatory research, in a culture of ‘publish or perish’, is exploiting the ‘pay for publish’ model to infiltrate he research literature repositories. Though, it is challenging to measure the actual predatory research induced data pollution and patient harm, our work shows that the breached integrity of MedAI inputs is a serious threat to trust the MedAI output. We review a wide range of research literature discussing the threats of data pollution in the research literature, feasible attacks impacting MedAI solutions, research literature-based tools, and influence on healthcare. Our contribution lies in presenting a comprehensive literature review, addressing the gap of predatory research vulnerabilities affecting MedAI solutions, and helping to develop robust MedAI solutions in the future.
Among the issues the information system security community has to fix, the security of both data and algorithms is a concern. The security of algorithms is dependent on the reliability of the input data. This reliability is questioned, especially when the data is generated by humans (or bots operated by humans), such as in online social networks. Event detection algorithms are an example of technology using this type of data, but the question of the security is not systematically considered in this literature. We propose in this paper a first contribution to a threat model to overcome this problem. This threat model is composed of a description of the subject we are modelling, assumptions made, potential threats and defence strategies. This threat model includes an attack classification and defensive strategies which can be useful for anyone who wants to create a resilient event detection algorithm using online social networks.KeywordsThreat modelAdversarial LearningOnline Social NetworkEvent DetectionSecurity
E-health is a modern technology produced with the evolution and amalgamation of modern technologies such as the Internet of things (IoT) and machine learning (ML). The exploitation of efficient and suitable ML techniques to obtain appropriate data can enhance the mechanism of detection and ultimately prevent diseases. However, the datasets available in repositories for computerized medical analysis are inappropriate, incomplete, and prone to alteration and attacks. In this work, we consider attacks such as poison and evasion and analyze their effect on the decision-making processes in e-health. The results illustrate that the performance of the original model is high in almost all cases compared to the accuracy attained by the combined poisoned model. Interestingly, although the performance of the original model is higher, the difference is not that significant. For example, the artificial neural network achieves an accuracy of 75.39% on the original set. On the poisoned set, the artificial neural network achieves an accuracy of 74.5%. This means that the overall difference is just 1%. A similar trend can be found with the other classifiers except for the SVM and the logistic regression, where the difference is comparatively high. As such, our research proves that the protection of data in the training and testing phase is comparatively more important than the selection and application of the best ML technique.
National critical infrastructures are vital to the functioning of modern societies and economies. The dependence on these infrastructures is so great that their incapacitation or destruction has a debilitating and cascading effect on national security. Critical infrastructure sectors, ranging from financial services to power and transportation to communications and health care, all depend on massive information communication technology networks. Cyberspace is composed of numerous interconnected computers, servers and databases that hold critical data and allow critical infrastructures to function. Securing critical data in cyberspace against growing and evolving cyber threats is an important focus area for most countries across the world. A novel approach is proposed to assess the vulnerabilities of one's own networks against adversarial attackers, where the adversary's perception of strengths and vulnerabilities is modelled using game theoretic techniques. The proposed game theoretic framework models the uncertainties of information with the players (attackers and defenders) in terms of their information sets, and their behaviour is modelled and assessed using a probability and belief function framework. The attack-defence scenarios are exercised on a virtual cyber warfare test-bed to assess and evaluate the vulnerability of cyber systems. Optimal strategies for attack and defence are computed for the players and validated using simulation experiments on the cyber war-games test-bed, the results of which are used for security analyses.
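As a much-simplified illustration of computing optimal attack and defence strategies, the sketch below solves a 2x2 zero-sum game in closed form. It is not the belief-function framework the abstract describes: the function name `solve_2x2_zero_sum`, the payoff matrix, and the zero-sum assumption are all hypothetical choices made for this example.

```python
def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum game for the row player (the maximiser).

    payoff[i][j] is the attacker's gain when the attacker plays row i and
    the defender plays column j (hypothetical numbers, for illustration).
    Returns (p, q, value): p is the probability of row 0, q the probability
    of column 0, and value is the expected payoff at equilibrium.
    """
    (a, b), (c, d) = payoff
    maximin = max(min(a, b), min(c, d))   # best guaranteed gain for the attacker
    minimax = min(max(a, c), max(b, d))   # smallest worst-case loss for the defender
    if maximin == minimax:
        # Saddle point: both players have an optimal pure strategy.
        row = 0 if min(a, b) >= min(c, d) else 1
        col = 0 if max(a, c) <= max(b, d) else 1
        return float(row == 0), float(col == 0), float(maximin)
    # No saddle point: each player mixes so the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


if __name__ == "__main__":
    # Rows: attacker targets server A or B; columns: defender hardens A or B.
    print(solve_2x2_zero_sum([[2, -1], [-1, 1]]))
```

In the mixed case each player randomises so that the opponent gains nothing by switching strategies; richer models with information sets or beliefs would need a general solver rather than this closed form.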
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
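The data side of such a backdoor can be sketched as follows: a small trigger patch is stamped onto a fraction of the training images, which are then relabelled to an attacker-chosen class. This is a minimal illustration and not the authors' code; the helper names `stamp_trigger` and `poison_dataset`, the bottom-right patch location, and the 10% poisoning rate are assumptions.

```python
import random


def stamp_trigger(image, value=1.0, size=2):
    """Return a copy of a 2-D image (list of rows) with a size x size patch
    in the bottom-right corner set to `value` (the hypothetical trigger)."""
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, fraction=0.1, seed=0):
    """Stamp the trigger on a random fraction of images and relabel them.

    Returns (poisoned_images, poisoned_labels, poisoned_indices).
    The inputs are left unmodified.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * fraction)
    chosen = set(rng.sample(range(len(images)), n_poison))
    out_images, out_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            out_images.append(stamp_trigger(img))
            out_labels.append(target_label)   # attacker-chosen class
        else:
            out_images.append([row[:] for row in img])
            out_labels.append(lab)            # clean sample kept as-is
    return out_images, out_labels, chosen


if __name__ == "__main__":
    clean = [[[0.0] * 4 for _ in range(4)] for _ in range(10)]
    imgs, labs, idx = poison_dataset(clean, [0] * 10, target_label=7, fraction=0.2)
    print(sorted(idx), labs)
```

A model trained on the returned set would see the trigger only alongside the target label, which is what lets the backdoor stay hidden on clean validation data.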
Deep learning classifiers are known to be inherently vulnerable to manipulation by intentionally perturbed inputs, named adversarial examples. In this work, we establish that reinforcement learning techniques based on Deep Q-Networks (DQNs) are also vulnerable to adversarial input perturbations, and verify the transferability of adversarial examples across different DQN models. Furthermore, we present a novel class of attacks based on this vulnerability that enable policy manipulation and induction in the learning process of DQNs. We propose an attack mechanism that exploits the transferability of adversarial examples to implement policy induction attacks on DQNs, and demonstrate its efficacy and impact through experimental study of a game-learning scenario.
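The idea of perturbing an observation so that a Q-learning agent picks an attacker-chosen action can be sketched with a toy linear Q-function. This is an assumption-laden illustration, not the attack from the paper: a real DQN is a deep network, and the names `q_values`, `greedy_action`, and `targeted_perturbation` are hypothetical. The step below moves each state feature by `epsilon` in the sign of the gradient of Q(target) minus Q(current), in the spirit of a fast-gradient-sign perturbation.

```python
def q_values(weights, state):
    """Linear Q-function: one weight row per action, Q(s, a) = w_a . s."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def greedy_action(weights, state):
    """Index of the action with the highest Q-value."""
    q = q_values(weights, state)
    return max(range(len(q)), key=q.__getitem__)


def sign(x):
    return (x > 0) - (x < 0)


def targeted_perturbation(weights, state, target, epsilon):
    """Shift each feature by +/- epsilon to favour `target` over the
    currently greedy action. For a linear Q-function the gradient of
    Q(s, target) - Q(s, current) with respect to s is w_target - w_current."""
    current = greedy_action(weights, state)
    grad = [wt - wc for wt, wc in zip(weights[target], weights[current])]
    return [x + epsilon * sign(g) for x, g in zip(state, grad)]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]   # two actions, two state features
    state = [0.6, 0.5]
    adv = targeted_perturbation(weights, state, target=1, epsilon=0.1)
    print(greedy_action(weights, state), greedy_action(weights, adv))
```

Each feature moves by at most `epsilon`, so the perturbed observation stays close to the original while the greedy action changes.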
A number of online services nowadays rely upon machine learning to extract valuable information from data collected in the wild. This exposes learning algorithms to the threat of data poisoning, i.e., a coordinated attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process. To date, these attacks have been devised only against a limited class of binary learning algorithms, due to the inherent complexity of the gradient-based procedure used to optimize the poisoning points (a.k.a. adversarial training examples). In this work, we first extend the definition of poisoning attacks to multiclass problems. We then propose a novel poisoning algorithm based on the idea of back-gradient optimization, i.e., to compute the gradient of interest through automatic differentiation, while also reversing the learning procedure to drastically reduce the attack complexity. Compared to current poisoning strategies, our approach is able to target a wider class of learning algorithms, trained with gradient-based procedures, including neural networks and deep learning architectures. We empirically evaluate its effectiveness on several application examples, including spam filtering, malware detection, and handwritten digit recognition. We finally show that, similarly to adversarial test examples, adversarial training examples can also be transferred across different learning algorithms.
Neural networks are known to be vulnerable to adversarial examples, inputs that have been intentionally perturbed to remain visually similar to the source input, but cause a misclassification. Until now, black-box attacks against neural networks have relied on transferability of adversarial examples. White-box attacks are used to generate adversarial examples on a substitute model and then transferred to the black-box target model. In this paper, we introduce a direct attack against black-box neural networks, that uses another attacker neural network to learn to craft adversarial examples. We show that our attack is capable of crafting adversarial examples that are indistinguishable from the source input and are misclassified with overwhelming probability - reducing accuracy of the black-box neural network from 99.4% to 0.77% on the MNIST dataset, and from 91.4% to 6.8% on the CIFAR-10 dataset. Our attack can adapt and reduce the effectiveness of proposed defenses against adversarial examples, requires very little training data, and produces adversarial examples that can transfer to different machine learning models such as Random Forest, SVM, and K-Nearest Neighbor. To demonstrate the practicality of our attack, we launch a live attack against a target black-box model hosted online by Amazon: the crafted adversarial examples reduce its accuracy from 91.8% to 61.3%. Additionally, we show attacks proposed in the literature have unique, identifiable distributions. We use this information to train a classifier that is robust against such attacks.
We introduce two tactics, namely the strategically-timed attack and the enchanting attack, to attack reinforcement learning agents trained by deep reinforcement learning algorithms using adversarial examples. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the proposed tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces the reward as much as the uniform attack (i.e., attacking at every time step) does while attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Example videos are available at