Privacy and Security Issues in Deep
Learning: A Survey
XIMENG LIU1,2, (MEMBER, IEEE), LEHUI XIE1,2 , YAOPENG WANG1,2, JIAN ZOU1,2 , JINBO
XIONG3, (MEMBER, IEEE), ZUOBIN YING4, (MEMBER, IEEE) AND ATHANASIOS V.
VASILAKOS1,5,6, (SENIOR MEMBER, IEEE)
1College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
2Fujian Provincial Key Laboratory of Information Security of Network Systems, Fuzhou University, Fuzhou 350108, China
3Fujian Provincial Key Laboratory of Network Security and Cryptology, College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117,
China
4School of Electrical & Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
5School of Electrical and Data Engineering, University Technology Sydney, Australia
6Department of Computer Science, Electrical and Space Engineering, Lulea University of Technology, 97187 Lulea, Sweden
Corresponding author: Jian Zou (zoujian@fzu.edu.cn).
This work was funded by the National Natural Science Foundation of China under Grant Nos. U1804263 and 61702105.
ABSTRACT Deep Learning (DL) algorithms based on artificial neural networks have achieved remarkable success and are being extensively applied in a variety of application domains, ranging from image classification, automatic driving, and natural language processing to medical diagnosis, credit risk assessment, and intrusion detection. However, privacy and security issues of DL have been revealed: the DL model can be stolen or reverse engineered, sensitive training data can be inferred, and even a recognizable face image of the victim can be recovered. Besides, recent works have found that the DL model is vulnerable to adversarial examples perturbed by imperceptible noise, which can lead the DL model to make wrong predictions with high confidence. In this paper, we first briefly introduce the four types of attacks and the privacy-preserving techniques in DL. We then review and summarize the attack and defense methods associated with DL privacy and security proposed in recent years. To demonstrate that these security threats really exist in the real world, we also review adversarial attacks under physical conditions. Finally, we discuss current challenges and open problems regarding privacy and security issues in DL.
INDEX TERMS Deep Learning; DL privacy; DL security; model extraction attack; model inversion attack;
adversarial attack; poisoning attack; adversarial defense; privacy-preserving
I. INTRODUCTION
The Internet of Things (IoT) is a network of physical devices embedded with sensors, software, and connectivity that can communicate over the network with other interconnected devices. With large numbers of IoT devices, a colossal amount of data is generated for use. Fueled by vast quantities of data, algorithmic breakthroughs, and the availability of computational resources, Deep Learning (DL), part of the broader family of Machine Learning (ML), has been extensively applied in various fields such as image classification [1], speech recognition [2], [3], facial recognition [4], [5], medical diagnosis [6], credit risk assessment [7], and Artificial Intelligence (AI) games [8], [9]. While connected sensors, found in everything from surveillance cameras to industrial plants to fitness trackers, collect troves of sensitive data that drive interest in DL, a significant portion of this data raises potential privacy and security questions [10]. On the one hand, companies such as Google, Amazon, and Facebook take advantage of the massive amounts of data collected from their users and the vast computational power of GPU farms to deploy DL on a large scale. If such data is a collection of users' private data, including online behaviors, private images, interests, geographical positions, and more, companies will have access to sensitive information that could potentially be mishandled. Further, recent research has shown that the adversary can duplicate the parameters/hyperparameters of a model deployed in the cloud to provide Machine Learning as a Service (MLaaS). The intellectual property of the DL model (e.g., parameters, architecture) and the sensitive training datasets are referred to as DL privacy in this paper. On the other hand, due to defects of the DL model itself, the adversary can craft a sample that misleads the DL model or lead the learner to train a bad model. For example, an autonomous driving system may recognize the stop sign captured from its
camera sensor as a Speed Limit 45 sign [11], and the adversary can also manipulate the training data by tampering with sensors to significantly decrease overall performance, cause targeted misclassification, or insert backdoors [12]–[14]. These unique security challenges therefore threaten the availability of DL models, especially in security- and safety-critical applications such as autonomous driving, face recognition, and intrusion detection. These security threats are referred to as DL security in this paper. All in all, the application of DL faces many privacy and security challenges.
Currently, the privacy issues of DL have been revealed, and
various attacks have been proposed. The attacks that invade
the privacy of the model fall into two categories: model
extraction attacks and model inversion attacks. In model extraction attacks, the adversary aims to duplicate the parameters/hyperparameters of a model that is deployed to provide cloud-based ML services [15], [16], which compromises the confidentiality of the ML algorithm and the intellectual property of the service provider. In model inversion attacks, the adversary aims to infer sensitive information by utilizing available information. Shokri et al. [17] first proposed a membership inference attack against ML models, which can infer whether a sensitive record was used as part of the training data when the ML model is overfitted. Later, Long et al. [18] showed that even a model that generalizes well on its training data can also be attacked. Besides, there are a large number of model inversion attacks deployed in various scenarios, whose target models, learning approaches, and assumptions all differ. A summary of model inversion attacks is provided in Table 1. The security threats of DL can be
categorized into two types: adversarial attacks and poisoning
attacks. In adversarial attacks, Szegedy et al. [19] first pointed out that Deep Neural Networks (DNNs) are vulnerable to adversarial attacks in the form of perturbations that are invisible to the human visual system when added to the original image. Such attacks can make a neural network classifier output wrong predictions with high confidence, as shown in Figure 1. Since the concept of adversarial examples was proposed, a large number of adversarial attacks have been discovered, which can be further categorized into white-box attacks and black-box attacks, as shown in Table 6. Adversarial attacks have evolved from early white-box attacks to black-box attacks. In the white-box setting, the adversary has total knowledge of the target model, such as the model architecture, parameters, and training data. In contrast, in the black-box setting, the adversary has no knowledge of the model, such as its architecture, parameters, or training data, and crafts an adversarial example by sending a series of queries, which is more practical in real scenarios. In poisoning attacks, the adversary aims to pollute the training data by injecting malicious samples or modifying data such that the learner trains a bad classifier, which would misclassify malicious samples or activities crafted by the adversary at the testing stage. For example, Muñoz-González et al. [20] crafted poisoning samples that look like real data points to reduce the accuracy of the classifier.
FIGURE 1. A demonstration of an adversarial sample [21]. The panda image
is recognized as a gibbon with high confidence by the classifier after adding
adversarial perturbations.
To solve privacy and security issues in DL, a variety of ap-
proaches have been proposed, as shown in Figure 2. There are
currently four mainstream technologies for preserving DL privacy, namely differential privacy, homomorphic encryption, secure multi-party computation, and the trusted execution environment. Differential privacy aims to prevent the adversary from inferring whether a particular instance was used to train the target model. Homomorphic encryption and secure multi-party computation schemes focus on preserving the privacy of the training and testing data. The trusted execution environment uses hardware to create a secure and isolated environment that protects training code and sensitive data. However, these methods significantly increase the computational overhead and require customization for specific neural network models. At present,
there is still no universal approach to address DL privacy
issues. With respect to DL security, a large number of defenses have been proposed against adversarial attacks, which can be grouped into three categories: input pre-processing, malware detection, and improving the robustness of the model, as shown in Table 7. Pre-processing aims to reduce the influence of adversarial perturbations on the model by performing operations such as image transformation, randomization, and denoising, which usually do not require modification and retraining of the model. The second category aims to detect adversarial examples by introducing a detection mechanism between the input and the first layer of the model, including stateful detection, image transformation detection, and adaptive denoising detection. The third category aims to improve the robustness of the model by introducing regularization, adversarial training, or feature denoising, which requires modification and retraining of the model. Although a variety of defensive mechanisms have been proposed, to the best of our knowledge, there is still no defense method that can completely defend against adversarial examples. At present, adversarial training is considered to be the most effective method to defend against adversarial examples. For poisoning attacks, there are two typical defense methods. The first is the outlier detection mechanism, which removes outliers outside the applicable set. The second is to improve the robustness of the neural network to resist the pollution of poisoning samples.
A. MOTIVATION
Privacy and security issues in DL have been becoming a hot
topic in recent years. In this paper, we present a compre-
hensive survey on the privacy and security issues of DL. To
date, a few review and survey papers associated with privacy and security in DL have been published. Akhtar et al.
[22] reviewed the adversarial example attacks and defenses
on DL in the field of computer vision. Tanuwidjaja et al.
[23] and Boulemtafes et al. [24] studied several privacy-
preserving techniques on DL. Yuan et al. [25] presented
a review on adversarial examples for DL, in which they
summarize the adversarial example generation methods and
discuss the corresponding defense methods. The above re-
view works all focus on the adversarial attacks or crypto-
graphic primitives-based privacy-preserving techniques. Liu
et al. [26] analyzed security threats and defenses on ML and
provided a more comprehensive literature review from a data-driven view. Papernot et al. [27] systematically studied the security and privacy of ML, but they do not cover many widely used DL models.
B. MAIN CONTRIBUTIONS
The differences between our paper and existing reviews are summarized as follows:
1) Instead of focusing on one phase and a few types of
attacks and defenses, all aspects of privacy and security
in DL are systematically reviewed in this paper. All
types of attacks and defenses are reviewed with respect
to the life cycle of DL (training phase, testing phase),
as shown in Figure 2.
2) According to the life cycle of the DL model, the
adversary types and goals of the DL model are intro-
duced, and four types of attacks regarding the privacy
and security of the DL are reviewed, including model
extraction attacks, model inversion attacks, adversarial
attacks, and poisoning attacks.
3) This paper not only reviews the privacy-preserving technology based on cryptography in DL but also studies the privacy and intellectual property protection technologies based on the trusted execution environment and digital watermarking.
4) This paper systematically reviews representative privacy and security defenses for DL in recent years. This paper also compares the advantages and disadvantages of these defenses and analyzes their effectiveness.
5) The current challenges and open problems regarding privacy and security issues in DL are discussed, including the defects of current privacy-preserving techniques, attacks in the real world, and effective, low-overhead defense methods.
The rest of this paper is organized as follows. In Section
II, we first discuss four types of attacks and introduce the
cryptography technologies for preserving privacy in DL. In
Section III, we detail recent attacks and privacy-preserving techniques in DL. In Section IV, we review representative attack and defense methods. In Section V, we discuss future challenges and research directions for security and privacy in DL. In Section VI, we conclude the paper.
II. PRELIMINARIES
In this section, we begin with an overview of attacks and defenses in DL. Then, we discuss the adversary's capabilities and four types of attacks based on the DL lifecycle.
Besides, we also briefly introduce the cryptographic tools
used to preserve the privacy of data in DL.
A. OVERVIEW OF ATTACKS AND DEFENSES
The DL life cycle can be roughly divided into two phases (the training phase and the testing/inference phase). Figure 2 shows the existing threats and defense strategies regarding privacy and security issues according to the DL life cycle. The privacy threats of DL can be divided into two categories: model extraction attacks and model inversion attacks. There are four mainstream defenses against them, namely, differential privacy, homomorphic encryption, secure multi-party computation, and the trusted execution environment. Differential privacy defends against model inversion attacks by injecting noise during the training phase. Homomorphic encryption, secure multi-party computation, and the trusted execution environment can be used to protect DL privacy during the training and testing phases. The threats to DL security fall into two categories: adversarial attacks and poisoning attacks. Adversarial defenses are being developed along three main directions: pre-processing, malware detection, and improving model robustness. Pre-processing and malware detection attempt to reduce the effect of adversarial perturbations on the classification or to detect adversarial examples during the testing/inference phase. Improving the model's robustness aims to essentially enhance the DL model to resist adversarial examples. Poisoning defenses attempt to filter out poisoning samples during the training phase.
B. ADVERSARIAL CAPABILITIES
1) Training Stage
Data Injection. The adversary has no knowledge of the
target model and training data. The adversary only in-
jects poisoning samples into the training data to change
the distribution of the training data such that the learner
trains a bad model.
Data Modification. The adversary has no knowledge
of the target model, but can directly access the training
data. The adversary attempts to pollute the training data
by modifying the training data before it is used for
training the target model.
Logic Corruption. The adversary has complete knowledge of the target model and can modify the learning algorithm. This type of attack is tricky and hard to defend against.
FIGURE 2. An overview of attacks and defenses in DL. The top of the figure describes the existing privacy and security threats in DL. The middle part shows the lifecycle of DL, which involves two major phases (training phase, testing/inference phase). The bottom of the figure shows the defense methods at different stages of the DL lifecycle, where homomorphic encryption, secure multi-party computation, and the trusted execution environment can be used to preserve DL privacy at both the training and testing phases.
2) Testing Stage
White-box. In white-box settings, the adversary has complete knowledge of the target model, including the model architecture, model parameters, and training data. The adversary identifies the model's vulnerabilities by utilizing the available information and then launches an attack against the target model, such as an adversarial attack (see Section II-C3), a model extraction attack (see Section II-C1), or a model inversion attack (see Section II-C2). Moreover, in adversarial attacks, the adversary also has complete knowledge about the defense mechanisms against adversarial attacks.
Black-box. In black-box settings, the adversary does not know the target model, including the model architecture, model parameters, and training data. The adversary can only identify the model's vulnerabilities by utilizing knowledge about the outputs returned by the target model, and then launches an attack against the target model by sending a series of queries and observing the corresponding outputs. Such attacks include model extraction attacks, model inversion attacks, and adversarial attacks.
Gray-box. In gray-box settings, the adversary has com-
plete knowledge of the target model, including model
architecture, model parameters, and training data. How-
ever, unlike the white-box setting, the adversary does
not know the defense mechanism against the adversarial
attack. The gray-box setting usually is used to evaluate
the defense against the adversarial attack.
C. ADVERSARIAL TYPES AND GOALS
1) Model Extraction Attack
In model extraction attacks, the adversary aims to steal the parameters of the target model with black-box access to it, which compromises the confidentiality of the ML algorithm and the intellectual property of the learner.
2) Model Inversion Attack
In model inversion attacks, the adversary aims to utilize
model predictions to expose the privacy of sensitive records
that were used as part of the training set. For example, Shokri
et al. [17] proposed a membership inference attack that can
infer whether a given record was a part of the training data.
3) Adversarial Attack
In adversarial attacks, the adversary aims to craft an adversarial example by utilizing knowledge about the target model, which leads the target model to make wrong predictions with high confidence. An adversarial sample is a modified image generated by adding imperceptible noise, which can cause the classifier to make wrong predictions with high confidence. Moreover, an adversarial sample crafted on one model is likely to be effective against other models, which is
known as transferability. According to the goal of the adver-
sary, the adversarial attack falls into two categories:
Non-targeted Attack. The adversary crafts an adversarial example x_adv to cause the target model f to misclassify the input with high confidence, but does not require the prediction to be a specified class; that is, f(x_adv) ≠ y_true, where f(x_adv) can be any class except the correct class y_true.
Targeted Attack. The adversary crafts an adversarial example x_adv to cause the target model f to misclassify the input with high confidence into a particular class t specified by the adversary; that is, f(x_adv) = t, where t is not the correct class y_true. A minimal one-step sketch of both variants is given below.
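The following is a minimal sketch of a one-step gradient-sign attack in the spirit of [21], covering both variants above. It assumes a differentiable PyTorch classifier `model` and an input batch `x` with pixel values in [0, 1]; these names and the attack budget `eps` are placeholders for illustration, not code from any reviewed paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps, y_target=None):
    """One-step gradient-sign attack with an L-infinity budget eps.
    If y_target is None, run a non-targeted attack (increase the loss on the
    true label); otherwise run a targeted attack (decrease the loss on the
    adversary-chosen label y_target so that f(x_adv) = t)."""
    x_adv = x.clone().detach().requires_grad_(True)
    label = y_true if y_target is None else y_target
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    sign = x_adv.grad.sign()
    step = eps * sign if y_target is None else -eps * sign  # ascend vs. descend
    return (x_adv + step).clamp(0.0, 1.0).detach()          # keep a valid image
```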
4) Poisoning Attack
In poisoning attacks, the adversary aims to poison the training data such that the learner trains a bad classifier, which
would misclassify malicious samples or activities crafted by the adversary at the testing stage. The adversary could inject malicious samples, modify data labels, or corrupt the training data; a minimal label-flipping sketch is given after the list below. Depending on the adversary's goals, poisoning attacks fall into three categories:
Accuracy Drop Attack. The adversary aims to disrupt
the training process by injecting malicious samples to
reduce the performance of the target model at the testing
stage.
Target Misclassification Attack. The adversary aims to
enforce test samples to be misclassified at the testing
stage.
Backdoor Attack. The adversary aims to install a back-
door with a specific mark so that the target model has a
target output for that particular input.
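As a concrete illustration of the data-modification capability described above, the sketch below randomly flips a fraction of training labels, which typically causes an accuracy drop at test time. It is a generic toy example rather than the poisoning strategy of any specific reviewed paper; the function name and parameters are illustrative.

```python
import numpy as np

def flip_labels(y, flip_fraction, num_classes, rng=None):
    """Data-modification poisoning: randomly flip a fraction of training labels
    to a different class so that the learner trains a degraded classifier."""
    rng = np.random.default_rng(0) if rng is None else rng
    y_poisoned = y.copy()
    n_flip = int(flip_fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # Shift each chosen label by a random non-zero offset modulo num_classes.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % num_classes
    return y_poisoned
```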
D. CRYPTOGRAPHIC TOOLS
1) Differential Privacy
The concept of Differential Privacy (DP) was proposed by Dwork et al. [28]; it aims to guarantee that an algorithm can learn statistical information about the population without disclosing information about individuals.
A randomized mechanism M is considered to provide ε-DP if, for all datasets D and D' that differ in only one record and for all subsets S of the output range of M,

Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D') ∈ S], (1)

where ε is the privacy budget parameter that decides the privacy level. The randomized mechanism M is usually defined as

M(D) = f(D) + n, (2)

where M takes a dataset as input and adds noise n to the original query response f(D). The noise is usually sampled from a Gaussian or Laplace distribution. That is, even if an adversary knows the whole dataset D except for a single record, he/she cannot infer much about the unknown record from the output M(D). Because the definition of ε-DP is strict, (ε, δ)-DP was introduced, which loosens the bound of the error by the amount δ. The definition of (ε, δ)-DP is as follows:

Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D') ∈ S] + δ, (3)

where δ is a small real number that also controls the privacy budget, like ε. The sensitivity Δf of the function f characterizes how much changing any one record in the dataset can change the output of the function:

Δf = max ||f(D) − f(D')||_1, (4)

where || · ||_1 denotes the l1-norm and the maximum is taken over all neighboring datasets D and D'.
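The Laplace mechanism is a standard way to realize the randomized mechanism M of Eq. (2): it adds noise drawn from Lap(Δf/ε) to the query response, which satisfies ε-DP. The sketch below is a minimal illustration of this mechanism; the function name is our own.

```python
import numpy as np

def laplace_mechanism(f_value, sensitivity, epsilon, rng=None):
    """Release f(D) + Lap(sensitivity / epsilon) noise, which satisfies
    epsilon-DP for a query f with the given l1-sensitivity (Eq. (4))."""
    rng = np.random.default_rng() if rng is None else rng
    return f_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query has sensitivity 1, since adding or removing one
# record changes the count by at most 1.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
```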
2) Homomorphic Encryption
Homomorphic Encryption (HE) is a form of encryption that allows computation on ciphertexts, generating an encrypted result which, when decrypted, matches the result of the operations as if they had been performed on the plaintext. The encryption function Enc satisfies the following equation:

Enc(a) ∗ Enc(b) = Enc(a ∗ b), (5)

where Enc : x → y is an HE scheme that maps a plaintext x to a ciphertext y, and ∗ is a mathematical operation that can be performed on both plaintexts and ciphertexts.
HE exists in partial and full forms. Partially homomorphic systems only support certain operations on encrypted data, typically addition [29]–[32] or multiplication [33], [34]. Since neural networks need to perform a large number of additions and multiplications, partially homomorphic encryption systems that support additive operations (e.g., Paillier [29]) are better suited to the complex computations of DL than those that only support multiplication (e.g., RSA [33], ElGamal [34]). The concept of a Fully Homomorphic Encryption (FHE) system, which allows all operations on encrypted data, was first proposed in 1978. It was not until 2009 that Gentry et al. [35] constructed the first feasible FHE scheme using lattice-based cryptography, which supports both addition and multiplication operations on ciphertexts. Since then, several FHE schemes have been proposed [36]–[41]. Although FHE can, in theory, perform all operations on encrypted data, the schemes proposed so far still have many limitations when applied to DL, such as supporting only integer data and high computational complexity (e.g., it requires 29.5 s to run a secure integer multiplication on a standard PC [42]). Therefore, existing (fully) HE schemes still require a lot of custom work for each DL model to fit it into the HE environment and to improve computational efficiency.
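To make the additively homomorphic property of Eq. (5) concrete, the sketch below implements a toy Paillier [29] cryptosystem with deliberately tiny, insecure parameters: multiplying two ciphertexts decrypts to the sum of the plaintexts. This is only an illustration of the principle; real deployments use much larger primes and a vetted library.

```python
import math
import random

# Toy Paillier cryptosystem (insecure key size, illustration only).
p, q = 61, 53                                   # real keys use much larger primes
n, n2 = p * q, (p * q) ** 2
g = n + 1                                       # standard simplification for g
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)                            # modular inverse of lambda mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    l = (pow(c, lam, n2) - 1) // n              # L(x) = (x - 1) / n
    return (l * mu) % n

a, b = 17, 25
c = (encrypt(a) * encrypt(b)) % n2              # multiply the ciphertexts ...
assert decrypt(c) == (a + b) % n                # ... which decrypts to the plaintext sum
```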
3) Secure Multi-party Computation
The problem of secure computation was first proposed by Yao [43] in 1982 and is also known as the millionaires' problem: two rich men, Alice and Bob, meet on the street and want to determine who is richer without exposing their wealth.
Later, the millionaires' problem was extended by Goldreich et al. [44] into a very active research field in modern cryptography, namely Secure Multi-party Computation (SMC), whose purpose is to address the problem of joint computation that preserves each participant's data privacy among a group of mutually untrusted participants.
Formally, in SMC, for a given objective function f, a group of participants p1, p2, p3, ..., pn each hold their own private data d1, d2, d3, ..., dn, respectively. All participants want to jointly compute f(d1, d2, d3, ..., dn) on those private data. At the end of the computation, no participant learns anything about the other participants' private data.
There are two kinds of secure multi-party computation:
secure Two-Party Computation (2PC) and secure Multi-Party
Computation (MPC), which differ considerably in their protocols. The Garbled Circuit (GC) [43] and Oblivious Transfer (OT) [45] protocols are used in 2PC. GC transforms a function into a secure boolean circuit whose input and output are encrypted data, which can be decrypted with its decoding table. OT is used to transfer information obliviously. The protocol most commonly used in MPC is secret sharing [46], which divides each participant's input into several shares and distributes them among the participants, who then jointly compute a given function (see the sketch below).
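The sketch below illustrates additive secret sharing over a public modulus, where three parties jointly compute the sum of their private inputs without revealing them. It is a minimal toy example of the secret-sharing idea, not the protocol of any specific framework reviewed later; the modulus and function names are our own choices.

```python
import random

PRIME = 2_147_483_647  # a public modulus; all shares live in Z_PRIME

def share(secret, num_parties):
    """Split a secret into additive shares that sum to the secret mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(num_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Three parties jointly compute the sum of their private inputs:
inputs = [10, 20, 33]
all_shares = [share(x, 3) for x in inputs]
# Party i locally adds the i-th share of every input ...
partial_sums = [sum(s[i] for s in all_shares) % PRIME for i in range(3)]
# ... and only the combined result (not the individual inputs) is revealed.
assert reconstruct(partial_sums) == sum(inputs)
```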
By its nature, SMC can be used to jointly train a global DL model without revealing the data privacy of any participant, which helps to break down information silos.
III. PRIVACY
In this section, we review and summarize representative existing privacy threats (model extraction attacks, model inversion attacks) in DL and privacy-preserving technologies, including DP, HE, SMC, and the trusted execution environment.
A. ATTACKS
The attacks that invade privacy fall into two categories: model extraction attacks and model inversion attacks; an overview of both is shown in Figure 3. The main difference between the two is that the former focuses on the private information of the model (e.g., model parameters, model architecture), whereas the latter focuses on sensitive records of the training data.
1) Model Extraction Attack
Tramer et al. [15] introduced a model extraction attack that aims to duplicate the parameters of ML models deployed to provide cloud-based ML services. The general idea is to build model equations from the outputs obtained by sending a large number of queries. However, it cannot be extended to scenarios where the attacker does not have access to the probabilities returned for each class, and it is only effective for specific ML models, such as decision trees, logistic regression, and neural networks. To avoid overfitting on training data, a regularization term is usually used in ML algorithms, where hyperparameters are introduced to balance the regularization term and the loss function. Wang et al. [16] proposed hyperparameter stealing attacks to steal the hyperparameters of the target model. Because the goal of learning in an ML algorithm is to reach a minimum of the objective function, where the gradient of the objective function is close to 0, the adversary can establish several linear equations by executing a large collection of queries. Finally, the hyperparameters can be estimated by linear least squares. They empirically demonstrated that their attack could accurately steal hyperparameters with less than 10^-4 error in regression algorithms.
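A minimal sketch of this least-squares idea is given below for ridge regression, whose objective is ||Xw - y||^2 + λ||w||^2; setting the gradient to zero at the learned parameters w yields one linear equation per parameter in the unknown λ. The helper name and the scikit-learn usage are illustrative assumptions, not code from [16].

```python
import numpy as np
from sklearn.linear_model import Ridge

def steal_ridge_lambda(X, y, w):
    """Estimate the regularization hyperparameter lambda of a ridge model from
    its training data (X, y) and learned parameters w, using the fact that the
    gradient of the objective vanishes at the minimum:
        X^T (X w - y) + lambda * w = 0."""
    a = X.T @ (X @ w - y)             # gradient of the loss term
    b = w                             # gradient of the regularizer ||w||^2 (up to a factor 2)
    return float(-(b @ a) / (b @ b))  # linear least-squares solution for lambda

# Usage sketch: fit a ridge model with a "secret" alpha, then recover it.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=200)
model = Ridge(alpha=3.0, fit_intercept=False).fit(X, y)
print(steal_ridge_lambda(X, y, model.coef_))   # close to 3.0
```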
As a countermeasure to possible intellectual property theft, watermarking for DNNs has been developed, which embeds watermarks into the DL model. Wang et al. [47] showed that the watermarking scheme proposed by Uchida et al. [48] increases the standard deviation of the distribution of the weights as the embedded watermark length increases. Based on this observation, they proposed an algorithm to detect the presence of a watermark and then remove it using the available knowledge. However, the removability and overwritability of the watermark are often considered when embedding it into DL models. Therefore, assuming that the watermark might not be removable, Hitaj et al. [49] designed two novel evasion attacks that allow an adversary to run an MLaaS with stolen ML models and still go undetected by the legitimate owners of those ML models.
2) Model Inversion Attack
Shokri et al. [17] proposed a Membership Inference Attack (MIA) against ML models. The adversary trains an attack model to distinguish the target model's behavior on samples that were part of its training data from its behavior on samples that were not. That is, the attack model is a classification model. To construct such an attack model under the black-box setting, the authors invented a new technique called shadow training, which builds multiple shadow models to simulate the target model. Because the target model's data distribution is unknown, they utilize the input/output pairs of the shadow models to train the attack model. The experiments demonstrated that an adversary could successfully perform an MIA with black-box access to the target model. However, Long et al. [18] pointed out that the MIA proposed by Shokri et al. [17] works effectively only when the model is highly overfitted on its training data: the prediction (probability) of an overfitted model for a query on a training sample is significantly different from the prediction (probability) for other queries. In contrast, a well-generalized model behaves similarly on training data and test data. Therefore, not all data is vulnerable to membership inference attacks under well-generalized models. To solve this challenge, Long et al. [18] presented a Generalized Membership Inference Attack (GMIA) against well-generalized models. The general idea behind this attack is to first identify target records that are vulnerable to membership inference and then perform the inference attack on them.
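The sketch below illustrates the shadow-training idea at a high level: several shadow models are trained on data the adversary controls, and their prediction vectors, labeled as member or non-member, become the training set for the attack model. The use of random forests and the function names are simplifying assumptions for illustration, not the experimental setup of [17].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_dataset(shadow_x, shadow_y, n_shadow=5, seed=0):
    """Train shadow models on disjoint splits of adversary-controlled data and
    record (prediction vector, membership) pairs for the attack model.
    Assumes every split contains all classes so probability vectors align."""
    rng = np.random.default_rng(seed)
    features, membership = [], []
    for split in np.array_split(rng.permutation(len(shadow_x)), n_shadow):
        half = len(split) // 2
        in_idx, out_idx = split[:half], split[half:]
        shadow = RandomForestClassifier(n_estimators=50, random_state=seed)
        shadow.fit(shadow_x[in_idx], shadow_y[in_idx])
        for idx, label in [(in_idx, 1), (out_idx, 0)]:   # 1 = member, 0 = non-member
            features.append(shadow.predict_proba(shadow_x[idx]))
            membership.append(np.full(len(idx), label))
    return np.vstack(features), np.concatenate(membership)

# The attack model is then fit on these pairs and, at attack time, applied to the
# target model's prediction vector for a candidate record, e.g.:
# attack_model = RandomForestClassifier().fit(*build_attack_dataset(D_shadow, y_shadow))
```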
The MIA proposed by Shokri et al. [17] depends on many assumptions, such as using multiple shadow models, having knowledge of the target model, and having a data distribution that is the same as that of the target model's training data. Salem et al. [50] relaxed these assumptions and proposed three adversaries that are broadly applicable at a low cost. The first adversary utilizes only one shadow model, instead of multiple shadow models, to mimic the target model. The second adversary does not have direct access to the training data of the target model or its architecture. The third adversary has a minimal set of assumptions: it is not necessary to build any shadow model or to know the structure of the target model, which is more practical in real scenarios.
Hitaj et al. [51] proposed an MIA to perform white-box
attacks in the context of collaborative DL models. They
constructed a generator and used it to form a Generative
Adversarial Network (GAN) [52]. After training, GAN can
FIGURE 3. An overview of the model extraction attack and model inversion attack.
generate synthetic data similar to the training data of the target model. However, the limitation of this approach is that all training data belonging to the same category are required to be visually similar, so records within the same class cannot be distinguished from one another. Similarly, an MIA against collaborative learning models was proposed by Melis et al. [53]. Collaborative learning is a learning technique in which two or more participants jointly train a shared model by training locally and periodically exchanging model updates, where each participant has its own training dataset. However, these updates can leak unintended information about the training data of the participants. In some non-numeric data scenarios, such as natural language processing, an embedding layer is first used to transform the input into a low-dimensional vector representation, where the update of a given row of the embedding matrix is only related to whether the corresponding word appears in the batch. Therefore, this characteristic directly reveals whether the word appears in the training batches and can be used to design a property inference attack.
Hayes et al. [54] presented the first MIA on generative models, which utilizes a GAN [52] to infer whether a data item was part of the training data by learning statistical differences in distributions. Hayes et al. [54] observed that the discriminator places a higher confidence value on samples that appeared in the training data when the target model is highly overfitted. Based on this observation, they proposed white-box and black-box MIAs. In the white-box setting, the adversary can directly use the discriminator to infer whether a sample was used to train the model based on the confidence of its output. In the black-box setting, the adversary does not know the target model's parameters and can therefore only train a GAN locally using queries to the target model. The experiments demonstrated that white-box attacks are 100% successful at determining whether a data record was used to train the target model, and the black-box ones succeed with 80% accuracy.
Nasr et al. [55] noticed that a black-box attack might not be effective against well-generalized DNNs, while the parameters of the model are visible to curious adversaries in some scenarios. Therefore, Nasr et al. [55] proposed an MIA against DL models under the white-box setting. Because the distribution of the model's gradients on samples in the training data is likely to be distinguishable from that on samples not in the training data, this difference can be exploited to perform an MIA even when the DL model is well-generalized. They successfully launched MIAs against well-generalized federated learning models in many scenarios, such as training and fine-tuning and updating models, which showed that even a model that generalizes well on its training data can be attacked.
The model inversion attacks in DL discussed above are deployed in various scenarios; their target models, model learning approaches, and assumptions all differ. To intuitively understand the differences between these attacks, we compare these algorithms and provide a summary of the MIAs in Table 1.
B. DEFENSES
To date, a variety of methods for protecting privacy in DL have been proposed, which can roughly fall into four categories: DP, HE, SMC, and the trusted execution environment. In this section, we review and summarize representative privacy-preserving methods, as shown in Tables 2, 3, 4, and 5. For a comprehensive study, we also briefly describe other privacy-preserving methods in DL.
1) Differential Privacy
Depending on where the noise is added, DP approaches
can be classified into three categories: gradient perturbation,
objective perturbation, and label perturbation, as shown in
Figure 4.
1) Gradient Perturbation. Gradient perturbation is done by injecting noise into the gradients of the parameters during the training stage.
2) Objective Perturbation. Objective perturbation is done by injecting noise into the objective function and solving for an exact solution of the new problem.
3) Label Perturbation. Label perturbation is done by injecting noise into the labels during the knowledge transfer process of the teacher-student network.
a: Gradient Perturbation
Abadi et al. [56] proposed a Differentially Private Stochastic
Gradient Descent (DPSGD) algorithm that can train DNNs
with non-convex objectives. The main idea is to inject noise
into the gradient at each step of the stochastic gradient de-
scent process. Besides, Abadi et al. [56] developed a stronger
TABLE 1. A summary of membership inference attacks. ✓ means the adversary needs the information, while an empty cell indicates the information is not necessary. The attack strength (higher for more stars) is based on the impression of the reviewed paper.

Method | White-box/Black-box | Target Model | Target Model Learning Approach | Shadow Model | Dataset Distribution | Model Architecture | Performance
[17] | Black-box | Classifier | Independent learning | ✓ | ✓ | ✓ | ★
[18] | Black-box | Neural Network | Independent learning | ✓ | ✓ | | ★★★
[50] | Black-box | Convolutional Neural Network | Independent learning | ✓ | ✓ | | ★★★
[50] | Black-box | Convolutional Neural Network | Independent learning | ✓ | | | ★★★
[50] | Black-box | Convolutional Neural Network | Independent learning | | | | ★★
[51] | White-box | Generative Adversarial Networks | Collaborative learning | | | | ★★★
[53] | White-box | Classifier | Collaborative learning | | | | ★★
[54] | White-box | Generative Adversarial Networks | Independent learning | | | | ★★★
[54] | Black-box | Generative Adversarial Networks | Independent learning | | | | ★
[54] | Black-box | Generative Adversarial Networks | Independent learning | | ✓ | | ★
[55] | White-box | Classifier | Federated learning | | | | ★★★
[55] | White-box | Classifier | Federated learning | | | | ★★
[55] | White-box | Classifier | Independent learning | | | | ★★
FIGURE 4. An overview of DP framework.
accounting method called moment accountant to obtain the
tail bound. The moment accountant utilizes the moments
bound combined with Markov inequality to track the cumula-
tive privacy loss, which provides tighter privacy loss analysis
than composition theorems [57]. Xie et al. [58] presented a Differentially Private Generative Adversarial Network (DPGAN) that guarantees (ε, δ)-DP by perturbing the gradients of the discriminator during the training procedure. According to the post-processing theorem [59], the output of the differentially private discriminator does not invade privacy, which means that the generator is also differentially private. Acs et al. [60] designed a novel Differentially Private Generative Model (DPGM) that relies on a mixture of k generative neural networks, such as restricted Boltzmann machines [61] and variational autoencoders [62]. The dataset is clustered using differentially private kernel k-means [63], and each cluster is assigned to one of the k generative neural networks. Finally, DP-SGD [56] is used to train the k generative neural networks, which can then generate synthetic high-dimensional data while providing provable privacy.
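The core update of DPSGD [56] can be sketched as follows: clip each per-example gradient to a fixed l2 norm, sum the clipped gradients, add Gaussian noise, and average. The sketch below uses plain numpy and omits the moments accountant; the function and parameter names are ours.

```python
import numpy as np

def dpsgd_update(params, per_example_grads, lr, clip_norm, noise_multiplier, rng=None):
    """One DP-SGD step: clip each per-example gradient to l2-norm <= clip_norm,
    sum them, add Gaussian noise with std noise_multiplier * clip_norm,
    average over the batch, and take a gradient-descent step."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm) for g in per_example_grads]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean
```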
b: Objective Perturbation
Phan et al. [64] introduced a Deep Private Autoencoder (DPA) that enforces ε-DP by perturbing the objective functions of the traditional deep autoencoder. Phan et al. [65] proposed the Private Convolutional Deep Belief Network (PCDBN), which enforces ε-DP by perturbing the polynomial forms that approximate the non-linear objective function, obtained using the Chebyshev expansion, in a convolutional deep belief network. Phan et al. [66] proposed the Adaptive Laplace Mechanism (AdLM), a novel mechanism to guarantee DP in DNNs. This approach adaptively adds noise to the input features according to their contribution to the model output and can easily be extended to various DNN architectures. Unlike gradient perturbation, which accumulates privacy loss as training progresses, the privacy loss due to objective perturbation is determined at the time the objective function is built and is independent of the number of training epochs.
c: Label Perturbation
Papernot et al. [67] demonstrated a privacy-preserving approach, Private Aggregation of Teacher Ensembles (PATE), which transfers the knowledge of an ensemble of "teacher" models to a "student" model. An ensemble of teacher models is trained on disjoint subsets of the sensitive data, and then a student model is trained on public data labeled using the ensemble of teacher models. Because the student model cannot directly access the sensitive data and differentially private noise is injected into the aggregation process, the privacy of the sensitive data is protected. Besides, the moment accountant [56] is introduced to track the cumulative privacy budget during the learning process. However, the performance of PATE was only evaluated on simple classification tasks (e.g., MNIST [68]). Later, Papernot et al. [69] proposed a new aggregation mechanism that successfully extends PATE to large-scale learning tasks. It is also empirically shown that the improved PATE guarantees tighter DP and has higher utility than the original PATE. Furthermore, PATE was applied to construct a differentially private GAN framework [70]. Because the discriminator is differentially private, so is the generator trained with it [59]. The disadvantage of this method is that it requires the additional training of teacher models to teach the student model.
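The noisy aggregation at the heart of PATE can be sketched as follows: for each public query, count the teachers' votes per class, add Laplace noise to each count, and return the noisy argmax as the student's label. The parameter gamma controlling the Lap(1/gamma) noise scale and the function name below are our own; this is an illustrative sketch, not the released PATE code.

```python
import numpy as np

def pate_noisy_vote(teacher_preds, num_classes, gamma, rng=None):
    """Aggregate the teachers' predicted labels for one query: count votes per
    class, add Laplace(1/gamma) noise to each count, and return the noisy argmax.
    The student model is trained only on labels produced this way."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    votes += rng.laplace(0.0, 1.0 / gamma, size=num_classes)
    return int(np.argmax(votes))

# Example: ten teachers label one public sample; the student only sees the
# noisy majority label.
label = pate_noisy_vote(np.array([3, 3, 3, 1, 3, 3, 0, 3, 3, 1]), num_classes=10, gamma=0.1)
```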
TABLE 2. A summary of DP techniques.

Method | Scheme | Advantage | Shortcoming | Accuracy (%)
DPSGD [56] | Gradient perturbation | Tracks the cumulative privacy loss and automates analysis of the privacy loss | Does not apply to complex DL models | 97.00 (MNIST)
DPGAN [58] | Gradient perturbation | Solves the privacy issues in GANs | Does not apply to complex datasets | -
DPGM [60] | Gradient perturbation | Solves the privacy issue in autoencoders | Restricted to specific autoencoders | -
DPA [64] | Objective perturbation | Does not depend on the number of training epochs in consuming the privacy budget | Applies particularly to autoencoders and their objective function | -
PCDBN [65] | Objective perturbation | Does not depend on the number of training epochs in consuming the privacy budget | Affects the accuracy of the model for complex learning tasks | 91.71 (MNIST)
AdLM [66] | Objective perturbation | Does not depend on the number of training epochs in consuming the privacy budget | Affects the accuracy of the model for complex learning tasks | 93.66 (MNIST)
PATE [67] | Label perturbation | Applies to any DL model | Needs to train an additional teacher model and does not apply to large-scale learning tasks | 98.10 (MNIST)
Improved PATE [69] | Label perturbation | Extends PATE [67] to large-scale learning tasks | Needs to train an additional teacher model | 98.50 (MNIST)
2) Homomorphic Encryption
In DL, HE is mainly used to protect testing inputs and results and to train neural network models on encrypted data. The main adverse effects of applying HE are reduced efficiency, the high computational cost of operating on ciphertexts, and the sharp increase in the amount of data after encryption.
Xie et al. [71] theoretically demonstrated that the activation function of a neural network can be approximated by a polynomial; therefore, inference over encrypted data can be practical. Subsequently, Gilad-Bachrach et al. [72] presented CryptoNets, a method that can perform inference on encrypted data, which utilizes the leveled HE scheme proposed by Bos et al. [73] to perform privacy-preserving inference on a pre-trained Convolutional Neural Network (CNN) model. However, the derivative of the square-function approximation of the activation function is unbounded, which leads to strange phenomena when training the network on encrypted data, especially for deeper neural networks; it is therefore not suitable for deeper neural networks. Chabanne et al. [74] improved CryptoNets by combining the polynomial approximation with batch normalization [75]. Hesamifard et al. [76] presented CryptoDL, a new approximation method for activation functions commonly used in CNNs (such as ReLU, Sigmoid, and Tanh). This method uses low-degree polynomials, which significantly improves computational efficiency. Compared with CryptoNets [72], CryptoDL not only has lower communication overhead but is also independent of the dataset. However, their method is based on batch operations, so single-instance prediction takes the same time as batch prediction.
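The general idea of these approximation methods can be illustrated with a simple least-squares polynomial fit: since HE schemes only evaluate additions and multiplications, a non-polynomial activation such as ReLU is replaced by a low-degree polynomial fitted over a bounded input interval. The sketch below is a generic numpy illustration, not the specific construction of CryptoNets [72] or CryptoDL [76].

```python
import numpy as np

# Fit a degree-2 polynomial to ReLU on [-5, 5]; under HE, the network evaluates
# this polynomial instead of the exact (non-polynomial) ReLU.
xs = np.linspace(-5.0, 5.0, 1001)
relu = np.maximum(xs, 0.0)
coeffs = np.polyfit(xs, relu, deg=2)   # highest-degree coefficient first
poly_relu = np.poly1d(coeffs)

print(coeffs)          # the linear coefficient comes out near 0.5, since ReLU(x) = (x + |x|)/2
print(poly_relu(2.0))  # a rough approximation of ReLU(2.0) = 2.0
```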
Because the cost of HE schemes applied to neural networks increases significantly as the depth of the network increases, Bourse et al. [77] and Sanyal et al. [78] attempted to improve the efficiency of HE used in neural networks. Bourse et al. [77] proposed FHE-DiNN, a linear-complexity framework for FHE evaluation of neural networks. FHE-DiNN leverages the bootstrapping technique [79] to reach strictly linear complexity in the depth of the neural network. Sanyal et al. [78] noted that the FHE scheme proposed by Chillotti et al. [79] only supports operations on binary data, which can be utilized to compute all of the operations of a binary neural network (BNN) [80]. Therefore, Sanyal et al. [78] designed Tricks to Accelerate (encrypted) Prediction As a Service (TAPAS), which speeds up binary operations in BNNs. The experiments showed that TAPAS drastically shortens the time of the evaluation step.
3) Secure Multi-party Computation
There are two main application scenarios for SMC in DL. In the first scenario, a data holder does not want to expose all of the training data to a single server to train or query the DL model; he/she instead distributes the training/testing data among multiple servers, which train or query the DL model together, so that no server learns the training/testing data held by the others. In the second scenario, multiple data holders want to jointly train a shared DL model on their aggregate training data while keeping their training data private.
Shokri et al. [82] implemented a practical system for collaborative DL, which enables multiple participants to jointly learn neural network models based on their inputs without sharing those inputs and without sacrificing the accuracy of the models. The participants train their local models and, after each round of local training, asynchronously and selectively share the gradients they computed for some of the parameters. However, Aono et al. [81] pointed out that even a small portion of the gradients stored on cloud servers can be utilized to extract local data. Hence, Aono et al. [81] used an additively HE scheme to protect privacy against an honest-but-curious parameter server, but it increases the communication overhead between the learning participants and the cloud server.
Mohassel et al. [83] proposed SecureML, a novel and efficient protocol for privacy-preserving ML. They implemented the first privacy-preserving system for training neural networks in multiparty computation settings using OT, secret sharing, and Yao's GC protocol. However, the disadvantage of this scheme is that only a simple neural network (without any convolutional layers) can be implemented. Liu et al.
TABLE 3. A summary of HE techniques.

Method | Scheme | Training/Inference | Advantage | Shortcoming | Communication Cost (Mbytes) | Runtime (s) | Accuracy (%)
[81] | Additively HE | Training | Allows multiple parties to jointly train a global model without disclosing participants' private data | Protecting privacy against an honest-but-curious parameter server introduces communication cost | 1 | 8100 | 97.00 (MNIST)
Improved CryptoNets [74] | Additively HE | Training and inference | Extends CryptoNets [72] to deeper neural networks and non-linear layers | Training on encrypted data is feasible only if the data is small or the network is shallow | - | 0.29 | 96.92 (MNIST)
CryptoDL [76] | Leveled HE | Inference | Approximates the activation with low-degree polynomials | The same efficiency for single and batch instance prediction | 336.7 | 320 | 99.52 (MNIST)
TAPAS [78] | FHE | Inference | Accelerates encrypted data computation | Does not support non-binary quantized neural networks | - | 147 | 98.60 (MNIST)
FHE-DiNN [77] | FHE | Inference | Improves efficiency at the cost of increased storage | Does not support non-binary quantized neural networks | - | 1.64 | 96.35 (MNIST)
[84] proposed a framework, MiniONN, that transforms an
existing neural network to an oblivious neural network and
supports privacy-preserving predictions with reasonable effi-
ciency. They used GC to approximate all nonlinear activation
functions. Their experiments demonstrated that MiniONN outperforms SecureML [83] and CryptoNets [72] in terms of latency and message size. Rouhani et al. [85] presented DeepSecure, a scalable privacy-preserving framework for DL, which jointly computes DL inference over private data between distributed clients and cloud servers. They used Yao's GC [43] and the OT protocol to preserve the privacy of the client data during the data transfer process and provided a security proof of DeepSecure in the honest-but-curious adversary model.
The major bottleneck of GC is the huge communication
cost. Juvekar et al. [86] and Wagh et al. [87] attempted
to improve SecureML. Juvekar et al. [86] pointed out that
HE is suitable for calculating linear functions, while GC is
suitable for non-linear functions such as ReLU, MaxPool.
Therefore, Juvekar et al. [86] made optimal use of HE and GC to achieve large computational and communication gains, that is, using simple linear-size circuits to compute non-linear functions and using HE to compute linear functions. Wagh et al. [87] developed SecureNN, novel three-party and four-party secure computation protocols, which are more communication-efficient for non-linear activation functions.
The overall execution time of this method is 3.68X faster
than Gazelle proposed by Juvekar et al. [86] in the inference
phase.
4) Trusted Execution Environment
The Trusted Execution Environment (TEE) is an independent
operating environment of the main processor that provides
a secure execution environment for authorized and trusted
applications. In addition, the TEE ensures the integrity and
confidentiality of the data and codes loaded inside, and
provides access control to the resources of the trusted appli-
cation. As a result, there are some solutions that utilize TEE
to tackle privacy and security issues in DL [88].
Ohrimenko et al. [89] proposed a privacy-preserving system for multi-party ML on an untrusted platform, which is based on Software Guard Extensions (SGX) [90], a set of x86 instructions used to create protected memory regions called enclaves, whose contents cannot be accessed by any process outside the enclave itself, including processes at higher privilege levels. Each participant uploads its training model and encrypted training data and then verifies whether its training code is executed in the enclaves. After remote attestation succeeds, the participant uploads its encryption key so that its training data can be decrypted to train a shared model. Hunt et al. [91] developed a system, Chiron, with which data holders can collaboratively train an ML model on MLaaS while keeping their training data private. The training code is executed in Ryoan [92], a distributed sandbox that leverages hardware enclaves (e.g., SGX), without revealing the training code. Chiron improves the efficiency of training by launching multiple enclaves, each running on a part of the training data. However, Chiron only allows the data owner to query a trained model via a simple interface once the training is finished. Therefore, Chiron is not suitable for data owners who want to provide MLaaS.
Hynes et al. [93] developed Myelin, a privacy-preserving
framework for DL, which supports privacy preservation during both the training and testing phases. Myelin utilizes the TVM compiler [94] to compile a given model into a TVM-
generated library that only includes the numerical operations
needed for this model. After the compilation is complete,
data owners can deploy an SGX [90] enclave and load the
compiled library into the enclave to train the model without
revealing their training data. Gu et al. [95] presented Deepen-
clave, a privacy-preserving system for DL inference using
SGX [90]. The general idea of Deepenclave is to partition a
given DNN model into FrontNet and BackNet. The FrontNet
is located in a trusted environment, the BackNet is located
in an untrusted environment and therefore protected by SGX
[90]. The user’s encrypted input is fed into the FrontNet,
which is stored in a trusted environment. The intermediate
output of the FrontNet inside the enclave is computed. Once
computing is finished, the intermediate output is delivered
to BackNet to compute the final output. Zeiler et al. [96]
indicated that the shadow layers of DNN respond to low-level
information of the input, such as edges, corners, whereas
deep layers represent more abstract information associated
with the final output. Since the first few layers of the DNN
TABLE 4. A summary of SMC techniques.

| Method | Scheme | Training/Inference | Advantage | Shortcoming | Communication cost (MByte) | Runtime (s) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| SecureML [83] | Secret sharing and GC | Training and inference | Allows the data owner to outsource training to two servers | Does not support convolutional neural networks | - | 4.88 | 93.40 (MNIST) |
| MiniONN [84] | Secret sharing and GC | Inference | Transforms the neural network into an oblivious neural network without any modification to the pretrained model | Higher computation and communication cost than GAZELLE [86] and SecureNN [87] in the inference phase | 47.60 | 1.04 | 97.60 (MNIST) |
| DeepSecure [85] | HE and GC | Training and inference | High throughput when performing batch prediction | Low efficiency when executing single-instance prediction | 791 | 9.67 | 98.95 (MNIST) |
| GAZELLE [86] | HE and GC | Inference | Faster prediction than DeepSecure [85] and MiniONN [84] | Does not apply to large input sizes | 8 | 0.2 | - |
| SecureNN [87] | Secret sharing | Training and inference | Allows multiple parties to jointly train or test a model without disclosing private data | Only supports three-party or four-party settings | 7.93 | 0.1 | 99.15 (MNIST) |
The TEE uses hardware and software protection to isolate sensitive computation from the untrusted software stack, addressing the integrity and privacy of outsourced ML computation. However, these isolation guarantees come at a considerable performance cost. Therefore, Tramer et al. [97] explored a pragmatic solution to improve the execution efficiency of DNNs in TEEs, based on an efficient outsourcing scheme for matrix multiplication. They proposed a framework called Slalom, which uses Freivalds's algorithm [98] to verify the correctness of linear operators. It encrypts the input with a pre-computed blinding factor to protect privacy, ensuring that the DNN execution can be safely outsourced from the TEE to a co-located, untrusted but faster device such as a GPU. Hanzlik et al. [99] proposed MLCapsule, a system that can safely execute ML algorithms on offline deployed clients. Because the data is stored locally and the protocol is transparent, the user's data security is guaranteed. The ML model is protected by the TEE: MLCapsule evaluates the model inside the enclave, and the encrypted information sent by the service provider can only be decrypted by the enclave. To realize this, the authors proposed to encapsulate standard ML layers in MLCapsule layers and execute them in the TEE. These MLCapsule layers decrypt the network weights sent by the service provider and, by merging layer by layer, achieve encapsulation of large-scale networks. MLCapsule can also integrate advanced defense mechanisms against attacks on ML models, and its computational overhead is modest.
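As an illustration of the verification idea that Slalom [97] builds on, the following minimal sketch shows Freivalds's algorithm [98] checking an outsourced matrix product with a few random matrix-vector multiplications; this is our own simplified example, not Slalom's implementation.

```python
import numpy as np

def freivalds_check(A, B, C, trials=10):
    """Probabilistically verify that C == A @ B.

    Each trial costs three matrix-vector products instead of a full
    matrix multiplication; a wrong C passes with probability <= 2**-trials.
    """
    n = B.shape[1]
    for _ in range(trials):
        r = np.random.randint(0, 2, size=(n, 1))  # random 0/1 vector
        if not np.array_equal(A @ (B @ r), C @ r):
            return False
    return True

A = np.random.randint(0, 10, size=(64, 32))
B = np.random.randint(0, 10, size=(32, 16))

C_honest = A @ B
C_forged = C_honest.copy()
C_forged[0, 0] += 1  # an untrusted accelerator returning a wrong result

print(freivalds_check(A, B, C_honest))  # True
print(freivalds_check(A, B, C_forged))  # False with overwhelming probability
```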
5) Miscellaneous Defense
Tramer et al. [15] and Lee et al. [100] suggested that the efficiency of model extraction attacks can be decreased by omitting the confidence values or adding carefully crafted noise to the predicted probabilities. However, Juuti et al. [101] showed that model extraction remains effective even when prediction probabilities are omitted. Juuti et al. [101] proposed PRADA, a detection mechanism for model extraction attacks, which detects an ongoing attack by analyzing the distribution of a user's queries. Zhang et al. [102] presented a privacy-preserving approach that solves a regularized empirical risk minimization problem in distributed ML while providing α(t)-DP for the final trained output. Hamm et al. [103] proposed a method to train a global differentially private classifier by transferring the knowledge of local classifiers; because the global classifier cannot directly access the sensitive training data, ε-DP is provided. Long et al. [104] proposed a measuring mechanism, Differential Training Privacy (DTP), to estimate whether there is a risk of privacy leakage when a classifier is released. Although several approaches have been proposed to preserve privacy, the correctness, implementation, and privacy guarantees of these methods are not well understood. Carlini et al. [105] explored the effectiveness of DP schemes against attacks. Rahman et al. [106] and Jayaraman et al. [107] attempted to analyze the privacy cost of DP implementations; the difference is that Rahman et al. [106] evaluate not only DP but also its relaxed variants. Jia et al. [108] proposed MemGuard, which provides formal utility-loss guarantees against MIA under the black-box setting by randomly injecting noise into the confidence scores that the target classifier predicts for query data samples.
Motivated by digital watermarking, researchers have embedded watermarks into DNNs to protect the intellectual property of deep neural networks. After watermarks are embedded into DNN models, if the models are stolen, ownership can be verified by extracting the watermarks from those models. Uchida et al. [48] first proposed a framework to embed watermarks into the parameters of DNN models via a parameter regularizer during the training phase. Its disadvantage is that extracting the watermark requires access to all parameters of the model, which is impractical in real scenarios. Later, Zhang et al. [109] proposed a verification framework that can quickly verify the ownership of remotely deployed MLaaS models given only black-box access to the model. There are also several black-box, zero-bit watermarking methods (see references [110]–[112] for details).
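As a rough sketch of the white-box watermarking idea of Uchida et al. [48], the example below embeds a bit string into a weight vector through a secret projection matrix and a binary cross-entropy regularizer; the shapes, the simulated "embedding" step, and all names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Watermark: k bits embedded into an n-dimensional weight vector w
# (in the white-box setting, w could be the mean of a conv layer's filters).
n, k = 256, 32
bits = rng.integers(0, 2, size=k)          # the owner's secret message
X = rng.standard_normal((k, n))            # secret projection matrix (the key)
w = rng.standard_normal(n) * 0.01          # stand-in for trained weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def watermark_regularizer(w):
    """Binary cross-entropy between sigmoid(X @ w) and the target bits.

    During training, this term would be added to the task loss so that the
    weights drift toward encoding the bits with little accuracy cost.
    """
    p = sigmoid(X @ w)
    return -np.mean(bits * np.log(p + 1e-12) + (1 - bits) * np.log(1 - p + 1e-12))

def extract_bits(w):
    """Ownership check: threshold the projections at zero."""
    return (X @ w > 0).astype(int)

print("regularizer before embedding:", watermark_regularizer(w))
# A real implementation minimizes task_loss + lambda * watermark_regularizer by
# gradient descent; here the embedding is only simulated by projecting the bits
# back into weight space, so the extracted bit error rate should be near zero.
w_marked = w + 0.1 * X.T @ (2 * bits - 1)
print("bit error rate:", np.mean(extract_bits(w_marked) != bits))
```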
IV. SECURITY
In this section, we review and summarize the representative
adversarial attacks and poisoning attacks in recent literature,
as well as defense strategies against adversarial attacks and
poisoning attacks.
TABLE 5. A summary of trusted execution environment techniques. The performance rating (more ★ indicates higher runtime) is based on the impression of the reviewed paper.

| Method | Scheme | Training/Inference | Advantage | Shortcoming | Runtime | Accuracy (%) |
|---|---|---|---|---|---|---|
| [89] | SGX | Training | Allows multiple parties to jointly train a shared model without disclosing private data | Preventing the trained model from leaking outside the SGX enclave introduces a large overhead | ★★ | 98.7 (MNIST) |
| Chiron [91] | Ryoan | Training and inference | Supports launching multiple enclaves to increase performance | Does not allow the data owner to serve as a service provider | ★★★ | 84.63 (CIFAR-10) |
| Myelin [93] | SGX and TVM compiler | Training and inference | Supports multithreading inside the enclave | Needs to compile a given model and retrain it | ★★ | 84.40 (CIFAR-10) |
| Deepenclave [95] | SGX | Inference | Not constrained by SGX's limited memory | The final output of the model is plaintext | ★★★ | - |
| Slalom [97] | SGX and GPU | Inference | Combines trusted hardware and untrusted hardware (GPU) | Only applies to network models built with TensorFlow | ★ | 70.6 (ImageNet) |
| MLCapsule [99] | SGX | Inference | Supports offline deployment | Does not protect the model architecture | ★★★ | - |
FIGURE 5. An overview of adversarial attacks.
A. ADVERSARIAL ATTACKS
An overview of adversarial attacks is shown in Figure 5. Since the concept of adversarial examples was proposed by Szegedy et al. [19], researchers have proposed numerous adversarial attack algorithms. Due to the high activity in this research direction, more attacks are likely to emerge in the future. Therefore, in this section, we discuss representative white-box and black-box attacks. We also provide a summary of adversarial attacks in Table 6.
1) White-box
a: L-BFGS
Szegedy et al. [19] first demonstrated that neural networks are vulnerable to adversarial examples crafted by adding a small perturbation to a benign input. The perturbation is imperceptible to the human visual system and can lead the model to predict wrongly with high confidence. The adversarial examples are generated by solving the following problem:

\min_{\delta}\ \|\delta\|_p \quad \text{s.t.}\quad f(x+\delta)=t,\; x+\delta\in[0,1]^m \qquad (6)

As this is a hard problem, the authors turned it into a more tractable optimization objective using the box-constrained L-BFGS algorithm [113]:

\min_{\delta}\ c\cdot\|\delta\| + J(x+\delta, t) \quad \text{s.t.}\quad x+\delta\in[0,1]^m \qquad (7)

where x represents the original image; J is the loss function of the model (e.g., cross-entropy); c is a hyperparameter found by line search so that the algorithm produces the adversarial example with the smallest distance; t is a target label different from the correct label y; and δ denotes the perturbation.
b: Fast Gradient Sign Method
Szegedy et al. [19] deemed that the existence of adversarial examples is caused by the nonlinearity and overfitting of neural network models. However, Goodfellow et al. [21] demonstrated that even a simple linear model is vulnerable to adversarial examples and proposed the Fast Gradient Sign Method (FGSM), an untargeted attack algorithm. Formally, FGSM is defined as follows:

\eta = \epsilon\cdot\mathrm{sign}\big(\nabla_x J(x, y_{true})\big) \qquad (8)

where \nabla_x J(x, y_{true}) is the gradient of the loss J(x, y_{true}) with respect to the input, \mathrm{sign}(\cdot) takes the sign of the gradient, and ε controls the magnitude of the perturbation. The adversarial perturbation η is therefore a single step in the gradient direction of the loss J(x, y_{true}).
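A minimal PyTorch sketch of Eq. (8) is given below; the model, input shapes, and the value of ε are placeholders, and pixel values are assumed to lie in [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps=8 / 255):
    """Craft an untargeted FGSM example: x_adv = x + eps * sign(grad_x J)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0, 1).detach()

if __name__ == "__main__":
    # Illustrative toy model and random data (32x32 RGB inputs, 10 classes).
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Flatten(), torch.nn.Linear(8 * 32 * 32, 10))
    x = torch.rand(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    x_adv = fgsm(model, x, y)
    print((x_adv - x).abs().max())  # perturbation bounded by eps
```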
c: Basic Iterative Method/Projected Gradient Descent
Kurakin et al. [114] extended the FGSM attack [21] to multiple small-step iterations and proposed the Basic Iterative Method (BIM). In each iteration, the pixel values are clipped to ensure that they remain in the ε-neighbourhood of the original image:

x^{adv}_{0} = x, \qquad x^{adv}_{N+1} = \mathrm{Clip}_{x,\epsilon}\big\{x^{adv}_{N} + \alpha\cdot\mathrm{sign}\big(\nabla_x J(x^{adv}_{N}, y_{true})\big)\big\} \qquad (9)

Later, Madry et al. [115] extended the iterative attack proposed in [114] by iteratively applying Projected Gradient Descent (PGD) to search for a perturbation within the ℓp-norm ball around an input. In [115], the authors proposed robust adversarial training, which is based on optimizing a saddle-point (min-max) formulation and uses PGD as a reliable first-order adversary.
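Under the same assumptions as the FGSM sketch above, the ℓ∞ PGD attack can be written as repeated small gradient-sign steps followed by projection back into the ε-ball around the input; this is an illustrative sketch rather than the reference implementation of [115].

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y_true, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterative l-infinity attack: take a step, then project onto the eps-ball."""
    # random start inside the eps-ball, as in Madry et al.'s formulation
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        # projection: clip to the eps-neighbourhood of x, then to valid pixels
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```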
d: Jacobian based Saliency Map Attack
While most attacks focus on the ℓ2 or ℓ∞ norm, Papernot et al. [116] proposed the Jacobian-based Saliency Map Attack (JSMA), which uses the ℓ0 norm to restrict the perturbation to a few pixels of the image rather than the whole image. In this attack, Papernot et al. used the Jacobian matrix to compute the forward derivative of the DNN:

\nabla F(x) = \frac{\partial F(x)}{\partial x} = \left[\frac{\partial F_j(x)}{\partial x_i}\right]_{i\in 1..M_{in},\ j\in 1..M_{out}} \qquad (10)
where M_in is the dimension of the input and M_out is the dimension of the output. The corresponding adversarial saliency map S is then computed from the forward derivative, and the input features x[i] with the largest values S(x, y_target)[i] in the saliency map are selected as perturbation points. The algorithm sequentially selects the most influential pixels according to the saliency map and modifies them until the maximum number of pixels allowed to change is reached or the fooling succeeds.
e: C&W Attack
To demonstrate that defensive distillation [117] does not significantly increase the robustness of neural networks, Carlini and Wagner [118] proposed an optimization-based adversarial attack (C&W attack), which makes the perturbation imperceptible by restricting its ℓ0, ℓ2, or ℓ∞ norm. Its core formulation is as follows:

\min_{\delta}\ \|\delta\|_p + c\cdot f(x+\delta) \qquad (11)

where δ denotes the adversarial perturbation, i.e., the difference between the original image and the adversarial example; the smaller the perturbation, the less likely it is to be detected. In their implementation, a modified binary search is used to choose the constant c. Here f(·) denotes the objective function; the authors provide seven candidate functions, and one that works well in their experiments is:

f(x') = \max\big(\max\{Z(x')_i : i\neq t\} - Z(x')_t,\ -\kappa\big) \qquad (12)

where Z(·) denotes the logits (the output of the model before the softmax), and κ is a constant that controls the confidence of the misclassification. This attack is now used as a benchmark for many adversarial defense methods.
f: Deepfool Attack
Moosavi-Dezfooli et al. [119] proposed DeepFool, an iterative adversarial attack based on linearizing the classifier, which generates a minimal adversarial perturbation sufficient to change the classification label. In the binary classification case, the original image x is iteratively pushed in the direction perpendicular to the decision boundary of the classifier f(x). At each step, the perturbations are accumulated to form the final perturbation applied to the image. However, most neural networks are highly non-linear, so the problem is extended from the two-class case to the multi-class case. The multi-class problem can be regarded as a collection of binary classification problems: the algorithm finds the minimum distance between the original sample and the boundary of the convex region in which it lies, and approaches the classification boundary over multiple iterations until the attack succeeds.
g: Universal Adversarial Perturbations
Unlike previous attack methods such as FGSM [21] and DeepFool [119], which craft a separate perturbation for each image, Moosavi-Dezfooli et al. [120] proposed the Universal Adversarial Perturbation (UAP) attack, which finds a single universal perturbation that can be applied to all samples in the training data to fool the network. The algorithm seeks a universal perturbation δ satisfying the following constraint:

\mathbb{P}_{x\sim\mu}\big(\hat{k}(x+\delta)\neq\hat{k}(x)\big) \geq 1-\eta \quad \text{s.t.}\quad \|\delta\|_p \leq \xi \qquad (13)

where μ denotes a distribution of images in R^d and k̂ is a classification function that outputs an estimated label k̂(x) for each image x ∈ R^d. The hyperparameter ξ limits the magnitude of the universal perturbation δ, and η quantifies the desired fooling rate over images x ∼ μ.
The algorithm gradually builds the universal perturbation through an iterative approach. In each iteration, it uses the DeepFool attack [121] to sequentially push all images in the distribution μ toward their respective decision boundaries, and projects the updated perturbation onto the ℓp ball of radius ξ. The experiments show that a perturbation whose magnitude is only about 4% of the image intensity range can achieve a fooling rate of around 80%.
h: Obfuscated Gradient Attack
Many defenses rely on obfuscated gradients [122]–[124], which make it difficult for an attacker to obtain a useful gradient and thereby defend against iterative optimization-based attacks. This phenomenon is considered to provide a false sense of security and leads to improper evaluation of adversarial defenses. Athalye et al. [125] found that defenses relying on this phenomenon can be circumvented. The authors identify three types of obfuscated gradients: 1) shattered gradients, which are non-existent or incorrect gradients, caused either intentionally by non-differentiable operations or unintentionally by numerical instability; 2) stochastic gradients, which depend on test-time randomness; and 3) vanishing/exploding gradients in very deep computations, which lead to unusable gradients. Correspondingly, Athalye et al. introduced three techniques to overcome the obfuscated gradients caused by these phenomena: 1) using the Backward Pass Differentiable Approximation (BPDA) to address shattered gradients; 2) applying Expectation Over Transformation (EOT) [126] to compute gradients of randomized defenses; and 3) resolving vanishing/exploding gradients by re-parameterization and optimizing in a space where the gradients do not explode or vanish.
2) Black-box
a: One Pixel Attack
Su et al. [127] proposed the one-pixel attack based on the differential evolution algorithm [128]–[130], an extreme adversarial attack in which modifying only one pixel can cause the network to misclassify. The algorithm iteratively modifies a single pixel to generate a child image, compares it with the parent image, and retains the child image with the best attack effect according to the selection criterion. The attack can also be carried out by modifying a few pixels, such as
modifying 1, 3, or 5 pixels, and the success rates are 73.8%,
82.0%, and 87.3%, respectively.
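The search procedure can be sketched with SciPy's differential evolution as below; `predict_proba` is a hypothetical black-box that returns class probabilities for an H×W×3 uint8 image, and the population size, iteration budget, and dummy classifier are illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(image, true_label, predict_proba, maxiter=30, popsize=10):
    """Search (row, col, r, g, b) that minimizes the true-class probability."""
    h, w, _ = image.shape
    bounds = [(0, h - 1), (0, w - 1), (0, 255), (0, 255), (0, 255)]

    def apply(z):
        row, col, r, g, b = [int(round(v)) for v in z]
        perturbed = image.copy()
        perturbed[row, col] = (r, g, b)
        return perturbed

    def objective(z):
        # a lower true-class probability means a stronger (untargeted) attack
        return predict_proba(apply(z))[true_label]

    result = differential_evolution(objective, bounds, maxiter=maxiter,
                                    popsize=popsize, tol=1e-5, seed=0)
    return apply(result.x), result.fun

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    dummy = lambda im: np.full(10, 0.1)  # stand-in black-box classifier
    adv, p = one_pixel_attack(img, true_label=3, predict_proba=dummy)
    print("remaining true-class probability:", p)
```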
b: EOT Attack
Previous works have shown that under real-world image transformations [131], [132], such as changes of angle and viewpoint, adversarial examples generated by standard techniques (e.g., FGSM [21]) fail to maintain their adversarial properties [133], [134]. To address this issue, Athalye et al. [126] proposed the EOT attack algorithm. The core formulation is as follows:

\arg\max_{x'}\ \mathbb{E}_{t\sim T}\big[\log P(y_t\mid t(x')) - \lambda\,\|\mathrm{LAB}(t(x')) - \mathrm{LAB}(t(x))\|_2\big] \qquad (14)

where x' denotes the adversarial example; x denotes the original image; LAB [135] denotes a color space in which the ℓ2 distance approximates perceived distance; and T denotes a distribution over image transformations.
The basic idea of the algorithm is to choose a transformation distribution T that models perceptual distortions, such as random rotation, translation, or noise. EOT can not only simulate simple transformations but can also handle operations such as three-dimensional rendering of textures.
c: Zeroth Order Optimization
Inspired by the C&W algorithm [118], Chen et al. [136] proposed the Zeroth Order Optimization (ZOO) method, which performs a black-box attack against the target DNN by sending a large number of queries and observing the returned confidence values. ZOO uses zeroth-order optimization to approximate the network gradient, while employing dimensionality reduction, hierarchical attack, and importance sampling techniques to improve attack efficiency. The optimization scheme of ZOO is consistent with the C&W algorithm, but since ZOO is a black-box attack it cannot access the model gradient. It therefore uses the symmetric difference quotient [137] to compute approximate gradient and Hessian values. Given these estimates, the perturbation is generated by stochastic coordinate descent, with ADAM [138] used to improve convergence.
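The gradient estimation step of ZOO can be sketched as follows, where `f` stands for the attack objective evaluated through queries to the black-box model; the coordinate-wise loop shown here is the simplest, most query-hungry variant.

```python
import numpy as np

def zoo_gradient_estimate(f, x, h=1e-4, num_coords=None, rng=None):
    """Estimate df/dx coordinate-wise with the symmetric difference quotient:
    g_i ~= (f(x + h*e_i) - f(x - h*e_i)) / (2h).
    Only a random subset of coordinates is estimated per call, as in
    stochastic coordinate descent."""
    rng = rng or np.random.default_rng()
    x = x.ravel()
    idx = rng.choice(x.size, size=num_coords or x.size, replace=False)
    grad = np.zeros_like(x)
    for i in idx:
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Sanity check on a known function: f(x) = ||x||^2 has gradient 2x.
x0 = np.array([0.5, -1.0, 2.0])
print(zoo_gradient_estimate(lambda v: float(np.sum(v ** 2)), x0))
```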
d: Autoencoder-based Zeroth Order Optimization Method
Tu et al. [139] proposed a generic query-efficient black-box framework, the Autoencoder-based Zeroth Order Optimization Method (AutoZOOM), which can efficiently generate adversarial examples in the black-box setting. AutoZOOM leverages an adaptive random gradient estimation strategy to balance the number of queries and the amount of perturbation, and simultaneously trains an autoencoder offline on unlabeled data, thereby speeding up the generation of adversarial examples. Compared with standard ZOO [136], AutoZOOM can reduce the number of queries while maintaining attack effectiveness and the visual quality of the adversarial examples.
e: Boundary Attack
Brendel et al. [140] pointed out that most methods used to generate adversarial perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities (score-based attacks). However, the model information required by these attacks is typically unavailable in real scenarios, so neither type of method is applicable there. Therefore, Brendel et al. [140] proposed the boundary attack, which depends only on the predicted class label. The algorithm starts from a sample that is already adversarial and then walks randomly along the decision boundary, 1) staying in the adversarial region while 2) reducing the distance to the target image. In this way, adversarial examples are generated iteratively.
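A highly simplified sketch of this decision-based random walk is shown below; `is_adversarial` is a hypothetical oracle that only reveals whether the model's label differs from the original one, and the fixed step sizes replace the adaptive schedule of the real attack [140].

```python
import numpy as np

def boundary_attack(x_orig, x_start, is_adversarial, steps=1000,
                    delta=0.1, epsilon=0.05, rng=None):
    """Walk along the decision boundary toward x_orig, staying adversarial."""
    rng = rng or np.random.default_rng(0)
    x_adv = x_start.copy()  # x_start must already be adversarial
    for _ in range(steps):
        # 1) random (roughly orthogonal) perturbation around the current point
        candidate = x_adv + delta * rng.standard_normal(x_adv.shape)
        # 2) small step toward the original image to shrink the distance
        candidate = candidate + epsilon * (x_orig - candidate)
        candidate = np.clip(candidate, 0.0, 1.0)
        if is_adversarial(candidate):  # accept only if still adversarial
            x_adv = candidate
    return x_adv
```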
f: Biased Boundary Attack
The boundary attack [140] uses an unbiased sampling method that draws perturbation candidates from a multi-dimensional normal distribution. Although this method is flexible, it is not efficient at crafting adversarial examples against robust models. Brunner et al. [141] recast the boundary attack as a biased sampling framework to improve attack efficiency. The three biases are as follows:
1) Low-Frequency Perturbations. Since the perturbations generated by typical attack methods are high-frequency, most defense methods also target high-frequency perturbations. Based on this observation, they used low-frequency Perlin noise [142] to bypass the detection mechanism.
2) Regional Masking. They used a regional mask to update only the areas where the difference between the sample and the original image is significant, leaving the highly similar parts unchanged, thereby effectively reducing the search space.
3) Gradients from Surrogate Models. Adversarial examples are transferable; that is, the gradient of a surrogate model is also helpful for attacking the target model. Therefore, they used the gradient of the surrogate model to guide the update direction of the boundary attack, which improves attack efficiency.
The above improvements increase the efficiency of the attack to a certain extent. However, the gradient of the surrogate model relies on transferability between models. Later, Chen et al. [143] further improved the boundary attack by using Monte Carlo estimation to determine the direction of the gradient, which does not rely on transferability.
B. ADVERSARIAL EXAMPLES IN REAL WORLD
The adversarial attacks mentioned above are mostly evaluated in experimental settings. However, a number of adversarial examples have been applied in real-life applications such as road sign recognition, object detection, and face recognition. In this section, we present some real-world adversarial examples.
TABLE 6. An overview of adversarial attacks. The perturbation norm ℓp is used to make adversarial examples imperceptible. The attack strength (more ★ indicates a stronger attack) is based on the impression of the reviewed literature.

| Threat Model | Method | Targeted/Non-targeted | Perturbation Norm | Type | Strength |
|---|---|---|---|---|---|
| White-box | L-BFGS [19] | Targeted | ℓ0 | Optimization-based attack | ★★ |
| White-box | FGSM [21] | Non-targeted | ℓ∞ | Gradient-based attack | ★ |
| White-box | BIM [114] | Non-targeted | ℓ∞ | Gradient-based attack | ★★★ |
| White-box | PGD [115] | Non-targeted | ℓ∞ | Gradient-based attack | ★★★ |
| White-box | JSMA [144] | Targeted | ℓ0 | Gradient-based attack | ★ |
| White-box | C&W [118] | Targeted | ℓ0, ℓ2, ℓ∞ | Optimization-based attack | ★★★ |
| White-box | Deepfool [119] | Non-targeted | ℓ2, ℓ∞ | Gradient-based attack | ★★ |
| White-box | UAP [120] | Non-targeted | ℓ2, ℓ∞ | Optimization-based attack | ★★★ |
| White-box | Obfuscated Gradient Attack [125] | Targeted | ℓ∞ | Optimization-based attack | ★★★★ |
| Black-box | One Pixel Attack [127] | Non-targeted | ℓ0 | Optimization-based attack | ★ |
| Black-box | EOT [126] | Targeted | ℓ2 | Transfer-based attack | ★★★ |
| Black-box | Boundary Attack [140] | Targeted, Non-targeted | ℓ2, ℓ∞ | Decision-based attack | ★★ |
| Black-box | Biased Boundary Attack [141] | Targeted, Non-targeted | ℓ2, ℓ∞ | Decision-based attack | ★★★ |
| Black-box | ZOO [136] | Targeted, Non-targeted | ℓ2 | Confidence-based attack | ★★★ |
| Black-box | AutoZOOM [139] | Targeted, Non-targeted | ℓ2 | Confidence-based attack | ★★★ |
1) Road Sign
Based on previous attack algorithms [118], [145], Evtimov et al. [11] proposed a general attack algorithm (robust physical perturbations) for generating visual adversarial perturbations that remain robust under different physical conditions (e.g., distance, angle, distortion). The robust physical perturbations successfully deceive a road sign recognition system in a real driving environment. To confirm that robust physical perturbations generalize, they affixed graffiti generated with robust physical perturbations to a microwave oven and successfully misled the Inception-v3 classifier [146] into recognizing the microwave oven as a mobile phone. Lu et al. [147] conducted experiments with physical adversarial examples of road signs against detectors and showed that detectors such as YOLO [148] and Faster-RCNN [149] are not currently fooled by the attack proposed by Evtimov et al. [11]. However, Eykholt et al. [150] claimed to be able to generate a small sticker that spoofs the YOLO detector [40] and also spoofs Faster-RCNN [149]. Chen et al. [151] further used the EOT technique [126], [152] to make the attack more robust and successfully misled the Faster-RCNN detector [149].
2) Object Detection
Thys et al. [153] proposed a dynamic person-detection attack against the YOLO (You Only Look Once) [148] model. As shown in Figure 6, they successfully bypassed the detector by optimizing an adversarial patch and placing it in the center of the human body. They divided
the optimization objective into three parts, namely L_nps, L_tv, and L_obj: L_nps indicates whether the colors of the current patch can be reproduced in real life; L_tv reflects the smoothness of the image; and L_obj is the maximum object detection confidence in the image. During the optimization process, the neural network parameters are kept constant and only the adversarial patch is changed. After each update, the patch is rotated, scaled, and subjected to other basic transformations before being applied to the dataset images again, which improves the robustness of the adversarial patch so that it can successfully mislead the detection model.
FIGURE 6. An example of object detection [153].
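The composite patch objective described above can be sketched in PyTorch as follows; `objectness_scores` stands for the detector's person confidences on patched images, the printability term is a simple nearest-color placeholder, and the loss weights are illustrative rather than the values used in [153].

```python
import torch

def total_variation(patch):
    """L_tv: penalize non-smooth patches so they survive printing and capture."""
    dh = (patch[1:, :, :] - patch[:-1, :, :]).abs().mean()
    dw = (patch[:, 1:, :] - patch[:, :-1, :]).abs().mean()
    return dh + dw

def patch_loss(patch, objectness_scores, printable_colors, w_tv=2.5, w_nps=0.01):
    """L = L_obj + w_tv * L_tv + w_nps * L_nps (weights are illustrative)."""
    l_obj = objectness_scores.max()  # suppress the highest person confidence
    l_tv = total_variation(patch)
    # L_nps placeholder: distance of every patch pixel to the nearest printable color
    dists = torch.cdist(patch.reshape(-1, 3).unsqueeze(0),
                        printable_colors.unsqueeze(0)).squeeze(0)
    l_nps = dists.min(dim=1).values.mean()
    return l_obj + w_tv * l_tv + w_nps * l_nps

# Toy usage: in the real attack, the scores come from running the detector on
# images with the (transformed) patch applied, so that L_obj is differentiable
# with respect to the patch; here random scores only illustrate the interface.
patch = torch.rand(50, 50, 3, requires_grad=True)   # H x W x 3 patch
colors = torch.rand(30, 3)                           # hypothetical printable colors
scores = torch.rand(100)                             # hypothetical detector outputs
loss = patch_loss(patch, scores, colors)
loss.backward()                                      # gradient w.r.t. the patch only
```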
3) Cyber Security
Papernot et al. [154] proposed a realistic adversarial attack on cyberspace by training a surrogate model on a synthetic dataset, using it to generate adversarial examples, and launching attacks against the remotely hosted neural networks of MetaMind, Amazon, and Google. The results showed misclassification rates of 84.24%, 96.19%, and 88.94%, respectively. Similarly, Liu et al. [145] also exploit the transferability of adversarial examples to carry out attacks; the basic idea is to generate an adversarial example that simultaneously misleads multiple models and then use it for transfer attacks. This approach enabled a black-box attack on the large-scale ImageNet dataset [8] and successfully attacked Clarifai, a commercial company providing state-of-the-art image classification
services at the time.
In contrast to transfer attacks, Li et al. [155] improved the single-pixel and boundary attacks, respectively. Building on the single-pixel attack [156], they improve efficiency by gradually increasing the number of modified pixels and incorporating the idea of semantic segmentation; similarly, semantic segmentation and a greedy strategy are introduced into the boundary attack [140] to improve its efficiency. Li et al. [155] also conducted black-box attacks on computer vision services (e.g., image classification, object recognition, and illegal image detection) provided by five major cloud service providers, Amazon, Microsoft, Google, Baidu, and Alibaba, with a success rate of almost 100%.
4) Face Recognition
Sharif et al. [157] developed a systematic method for attacking face recognition systems by simply adding a pair of eyeglass frames that causes the system to misrecognize the wearer. Zhou et al. [158] studied an interesting real-world adversarial attack and found that infrared light can also be used to interfere with face recognition systems. An attacker can install an LED on the brim of a hat and use it to shine on the face, producing a purple glow that is invisible to the human eye but is captured by the camera sensor, allowing the attacker to evade detection by the facial recognition system.
C. ADVERSARIAL DEFENSES
Several defenses against adversarial attacks have been proposed in recent studies, which fall into three main directions: pre-processing, improving the model robustness, and malware detection, as shown in Figure 7.
FIGURE 7. An overview of the adversarial defense.
1) Pre-processing. Pre-processing attempts to reduce the impact of adversarial perturbations by performing some operation (e.g., denoising, randomization, reconstruction, scaling) on the input image. This defense mechanism is deployed between the input and the first layer of the model, usually requires no modification to the model, and can be directly applied to pre-trained models.
2) Improving the Robustness of the Model. This direction aims to enhance the model's ability to resist adversarial examples by modifying the model architecture or training algorithm, or by adding regularization. Retraining and adversarial training usually entail significant computational overhead.
3) Malware Detection. Malware detection is deployed between the input and the first layer of the model to determine whether the input is an adversarial example. If the input is malicious, corresponding measures are taken immediately to block the attack.
In this section, we describe these defense strategies along the three directions mentioned above.
1) Pre-processing
a: Randomization
Wang et al. [159] proposed a novel adversarial defense approach that prevents adversaries from constructing impactful adversarial examples by randomly nullifying features within samples, which makes the DNN model non-deterministic and significantly reduces the effectiveness of adversarial perturbations. Prakash et al. [160] proposed a method termed pixel deflection to defend against adversarial examples, which consists of two parts: redistributing pixel values and wavelet-based denoising. Pixel deflection first exploits CNNs' resistance to natural noise by randomly replacing some pixels with pixels randomly selected from a small neighborhood. Then, thresholding in the wavelet domain is applied to soften the corruption caused by the redistributed pixels and some of the adversarial changes. The experiments demonstrated that combining these techniques can effectively reduce the impact of adversarial perturbations on classifiers. Similarly, Ho et al. [161] proposed Pixel Redrawn (PR) as a defense against adversarial examples, which redraws every pixel of the training images to a different value. First, a prediction model is trained to generate a prediction image, and the range of pixel values is divided into intervals. The original image is fed into the prediction model to obtain a prediction image, and the interval of each pixel of the prediction image is determined. A random value within that interval is then used to replace the corresponding pixel of the original image, and the modified image is fed into the classifier. Experimental results on several benchmark datasets [68], [162], [163] showed that the PR method not only relieves overfitting but also boosts the robustness of the neural network.
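A minimal NumPy sketch of the pixel-deflection step is shown below (the wavelet-domain denoising stage of [160] is omitted); the number of deflections and the window size are illustrative.

```python
import numpy as np

def pixel_deflection(image, num_deflections=200, window=10, rng=None):
    """Randomly replace pixels with the values of other pixels from a small neighborhood."""
    rng = rng or np.random.default_rng(0)
    out = image.copy()
    h, w = image.shape[:2]
    for _ in range(num_deflections):
        x, y = rng.integers(0, h), rng.integers(0, w)
        dx = rng.integers(-window, window + 1)
        dy = rng.integers(-window, window + 1)
        sx = int(np.clip(x + dx, 0, h - 1))
        sy = int(np.clip(y + dy, 0, w - 1))
        out[x, y] = image[sx, sy]  # deflect pixel (x, y) to a neighbor's value
    return out

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(np.mean(pixel_deflection(img) != img))  # only a small fraction of pixels change
```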
b: Image Transformation
Dziugaite et al. [164] first demonstrated that JPEG compression can reduce the classification errors caused by adversarial examples generated with the FGSM algorithm [21]. However, the defensive effect of JPEG compression decreases as the magnitude of the perturbation increases. Das et al. [165] further showed that an important property of JPEG compression is that it removes high-frequency signal components inside the square blocks of the image, which is equivalent to selectively blurring the image and can eliminate adversarial perturbations. Therefore, Das et al. [165] proposed a JPEG compression preprocessing module that can be quickly added to a trained network model to protect it from multiple types of adversarial attacks. However, Guo et al. [131] found that total variance minimization [166] and image quilting [167] are stronger defenses than deterministic denoising procedures (e.g., JPEG compression [164], bit depth reduction [168], non-local means [169]). Based on simple image transformations, Raff et al. [170] proposed to combine a series of simple defenses (e.g., bit depth reduction [168], JPEG compression [164], wavelet
denoising [171], mean filtering [172], non-local means [169]) into a strong defense mechanism against adversarial examples, taking obfuscated gradients into account [125]. The basic intuition is to stochastically select several transforms from a large pool of random transforms and apply them in random order before the image is fed into the network. The method is also robust on large-scale datasets [173].
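The flavor of these transformation defenses can be illustrated with two of the simplest operations, bit-depth reduction and JPEG re-encoding, applied in random order as in barrage-style defenses; the bit depth and JPEG quality below are illustrative choices.

```python
import io
import random

import numpy as np
from PIL import Image

def bit_depth_reduce(img, bits=4):
    """Quantize each channel to 2**bits levels to wash out small perturbations."""
    levels = 2 ** bits
    return (np.floor(img / 256.0 * levels) * (256 // levels)).astype(np.uint8)

def jpeg_compress(img, quality=75):
    """Round-trip through JPEG to discard high-frequency components."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    return np.array(Image.open(buf))

def random_transform_pipeline(img, seed=None):
    """Apply the available transforms in a random order (a tiny 'barrage')."""
    transforms = [bit_depth_reduce, jpeg_compress]
    random.Random(seed).shuffle(transforms)
    for t in transforms:
        img = t(img)
    return img

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
defended = random_transform_pipeline(img, seed=0)
print(defended.shape, defended.dtype)
```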
c: Denoising Network
The traditional denoising autoencoder [174] is a popular denoising model, but it cannot completely remove adversarial perturbations. To address this problem, Liao et al. [175] designed a High-level representation Guided Denoiser (HGD) as a defense against adversarial examples, using U-net [176] as the denoising network. The U-net [176] model differs from traditional autoencoders in two ways. First, the denoising network does not use a pixel-level reconstruction loss; instead, it uses the difference between the top-level outputs of the target model induced by the original and adversarial examples as the loss function. Second, the network learns the adversarial perturbation rather than reconstructing the entire image. However, Athalye et al. [177] pointed out that the HGD method cannot effectively prevent white-box attacks.
d: GAN-based Defense
Samangouei et al. [124] proposed a defensive framework based on GANs [52]. The main idea is to train a generative adversarial network on the original dataset and to use the generator's expressive power to reconstruct a clean image that is similar to the original image but does not contain adversarial perturbations. An overview of the defense framework is shown in Figure 8. An adversarial example is reconstructed by the generative network, yielding a reconstructed image similar to the original, which is then fed into the target model for classification. The introduction of random seeds makes the entire pipeline difficult to attack. However, this method cannot effectively prevent white-box attacks on the CIFAR-10 dataset [163], and Athalye et al. [125] used the BPDA technique to attack this defense on the MNIST dataset [68], although the success rate was only 48%.
FIGURE 8. An overview of the Defense-GAN framework [124].
Likewise, based on bidirectional generative adversarial networks [178], [179], Bao et al. [180] proposed a novel defense approach, the Featured Bidirectional Generative Adversarial Network (FBGAN), which learns the latent semantic features of an image that remain unchanged after the image is perturbed. After the bidirectional mapping, the adversarial data can be reconstructed into denoised data by extracting semantic features, which can then be fed into any pre-trained classifier. The experiments showed that FBGAN is effective for any pre-trained classifier under white-box and gray-box attacks.
e: Image Super-Resolution
Mustafa et al. [181] hypothesized that a well-trained image super-resolution model is enough to project off-the-manifold adversarial examples back onto the natural image manifold. Therefore, Mustafa et al. [181] proposed a defense mechanism based on deep image restoration networks to defend against a wide range of recently proposed adversarial attacks. First, the adversarial perturbation is suppressed by wavelet-domain filtering [182]. Second, an image super-resolution model [183] is used to enhance the visual quality of the image. Their method can easily complement existing defense mechanisms without retraining the model while improving classification accuracy. The disadvantage is that it depends on the expressive power of the super-resolution model.
2) Improving the model robustness
a: Adversarial Training
Goodfellow et al. [21] first proposed adversarial training to enhance the robustness of the model, and Kurakin et al. [114] used batch normalization [75] to successfully extend it to the Inception-v3 model [146] and the ImageNet dataset [1]. However, its disadvantage is that it only defends against single-step attacks [21] and cannot defend against iterative attacks [115]. Chang et al. [184] proposed a training method based on dual adversarial examples, which can resist both single-step and iterative adversarial examples. Much adversarial training can only
defend against specific adversarial attacks [21], [185], [186].
Madry et al. [115] proposed the PGD attack for adversarial training. However, Madry et al. [115] only conducted adversarial training on the MNIST [68] and CIFAR-10 [163] datasets. Subsequently, Kannan et al. [187] successfully extended it to the ImageNet dataset [1]: they form pairs of similar samples and use the similarity of the model outputs on the paired samples as part of the loss function. Their method is robust on the ImageNet dataset [1] and exceeds the best-performing ensemble adversarial training method at that time [188]. Adversarial training is currently considered the most effective method of defending against adversarial examples. Its main disadvantage is that it is computationally expensive, and improvements to adversarial training are still in progress.
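A compact sketch of the min-max training loop popularized by [115] is given below in PyTorch; the model, data loader, optimizer, and hyperparameters are placeholders, and the inner PGD attack is repeated here so the sketch stays self-contained.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    """Same l-infinity PGD as sketched earlier, kept here for self-containment."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    """One epoch of min-max training: fit the model on PGD examples only."""
    model.train()
    for x, y in loader:
        x_adv = pgd_linf(model, x, y, eps=eps)    # inner maximization
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)   # outer minimization
        loss.backward()
        optimizer.step()

# Usage: adversarial_training_epoch(model, train_loader, optimizer) inside the
# usual epoch loop; mixing clean and adversarial batches is a common variant.
```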
b: Regularization
A neural network model that fails to learn robust features can be led to a wrong decision by slight changes in the image. To address this problem, Liu et al. [189] proposed a feature prioritization model based on non-linear attention modules and L2 feature regularization to make the model's classification depend on robust features. The attention module encourages the model to rely heavily on robust features by assigning them larger weights while suppressing non-robust features. The L2 regularizer encourages the extraction of similar essential features from the original image and the adversarial example, effectively ignoring the added perturbation.
c: Feature Denoising
Xie et al. [190] suggested that adversarial perturbations are amplified layer by layer in the network, resulting in a large amount of noise in the network's feature maps. The features of a natural image mainly focus on semantically meaningful content, whereas the features of adversarial examples are also activated in semantically irrelevant areas. Therefore, Xie et al. [190] developed a new network architecture that improves model robustness by performing feature denoising. Although the denoising module alone does not improve classification accuracy on the original dataset, combining it with adversarial training significantly improves the robustness of the model under white-box and black-box attacks. In the Competition on Adversarial Attacks and Defenses (CAAD) 2018, their method achieved a classification accuracy of 50.6% against 48 unknown attacks.
d: Convolutional Sparse Coding
Based on convolutional sparse coding [191], [192], Sun et al. [193] proposed a novel defense method that projects adversarial examples into a stratified low-dimensional quasi-natural image space, in which the adversarial examples resemble natural images without adversarial perturbations. In the training phase, they introduce a novel Sparse Transformation Layer (STL) between the input and the first layer of the neural network to efficiently project images into the quasi-natural image space, and train the classification model on the projected images. In the testing phase, the original input is likewise projected into the quasi-natural space before being fed into the classification model. Compared with other adversarial defense methods against unknown attacks, their method is more robust with respect to the size of adversarial perturbations, various image resolutions, and dataset sizes.
e: Blocking the transferability
Adversarial examples crafted on a particular network model are likely to mislead other classifiers with different architectures or trained on different training data, a property known as transferability. To address this problem, Hosseini et al. [194] proposed a NULL-label approach to defend against adversarial transfer attacks under black-box settings. The main idea is to add a NULL label to the output classes and train the classifier to map adversarial inputs to the NULL label. The advantage of their method is its ability to classify adversarial examples into the NULL class rather than into other, incorrect classes, effectively preventing the transfer of adversarial examples while maintaining the accuracy of the model.
3) Malware Detection
a: Stateful Detection
To date, defenses against white-box adversarial examples have proven difficult to achieve, and white-box attacks are often impractical in real-world scenarios, since the ML services provided by cloud platforms are generally query-based.
Therefore, Chen et al. [195] first proposed a black-box defense method based on stateful detection. Compared with the stateless defenses studied so far, their method enhances the capabilities of the defender. An overview of the defense framework is shown in Figure 9. First, to compress the storage of user query records, a similarity encoding is used. Then, each new query is compared with the previous records, and the distance d is calculated using the k-nearest-neighbor algorithm. If d is less than a threshold, the user is considered to be performing a malicious attack. Under the black-box NES attack [196] and the boundary attack [140], the experimental analysis showed that query-based black-box attacks usually require hundreds of thousands to millions of queries, which easily triggers this defense mechanism; even if the defense is not triggered, the storage required for the attack consumes many resources. The disadvantage of stateful detection is that it cannot defend against transfer attacks, which do not require any queries. However, the method can be combined with adversarial training to defend against transfer attacks, which compensates for this deficiency and makes it perform better against black-box attacks.
FIGURE 9. An overview of the stateful detection framework [195].
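The query-monitoring idea can be sketched as follows; the encoder below is a stand-in for the learned similarity encoder of [195] (simple average pooling), and the threshold and neighborhood size k are illustrative.

```python
import numpy as np

class StatefulDetector:
    """Flag users whose queries are suspiciously close to their past queries."""

    def __init__(self, k=10, threshold=0.05):
        self.k = k
        self.threshold = threshold
        self.history = []          # encoded queries of a single user

    def encode(self, image):
        # stand-in for a learned similarity encoder: 4x4 average pooling
        h, w = image.shape[:2]
        small = image[: h - h % 4, : w - w % 4].reshape(h // 4, 4, w // 4, 4, -1)
        return small.mean(axis=(1, 3)).ravel() / 255.0

    def query(self, image):
        code = self.encode(image)
        flagged = False
        if len(self.history) >= self.k:
            dists = np.sort([np.linalg.norm(code - past) for past in self.history])
            flagged = dists[: self.k].mean() < self.threshold  # k-NN mean distance
        self.history.append(code)
        return flagged

detector = StatefulDetector()
base = np.random.randint(0, 256, size=(32, 32, 3)).astype(float)
# a query-based attack submits many tiny variations of the same image
flags = [detector.query(base + np.random.normal(0, 1, base.shape)) for _ in range(50)]
print("queries flagged:", sum(flags))
```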
b: Image Transformation
Tian et al. [197] showed that adversarial examples are usually sensitive to image transformations such as rotation and shifting, whereas natural images are usually immune to such transformations. Based on this observation, they proposed a novel adversarial example detection mechanism that can effectively detect adversarial attacks. First, a set of transformation operations is applied to an input image to generate multiple transformed images. Then, the classification results on these transformed images are used to train a classifier that predicts whether the input image has been perturbed by an adversary. To defend against more sophisticated white-box attacks, they also introduce randomness into the transformation process. The experiments showed that the detection rate for adversarial examples crafted by the C&W [118] algorithm reaches 99%, and the detection rate remains above 70% in the white-box setting. This approach is relatively simple, requiring only simple transformations of the input image; however, the angle of transformation often affects the performance of the defense, and it may fail against stronger adversarial attacks [125].
c: Adaptive Denoising
Traditional denoising has a significant effect on large noise. However, when the noise is small, denoising can blur the image, resulting in low classification performance. To address this problem, Liang et al. [198] introduced two classical image processing techniques, scalar quantization and spatial smoothing filtering, to reduce the impact of adversarial perturbations on the classifier. Cross-entropy is employed as a metric to implement adaptive noise reduction for different kinds of images: it is first used to adaptively choose the quantization interval size and then to decide whether spatial smoothing filtering is required. A classifier then classifies the denoised image and the original image separately. If the predictions on the denoised image and the original image are consistent, the original image is considered a normal sample; otherwise, it is considered an adversarial example. However, their method works poorly against attacks that modify only a fraction of the image pixels.
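The detection logic (though not the adaptive parameter selection of [198]) can be sketched as follows: the input is denoised by scalar quantization and a median filter as the smoothing step, and the input is flagged if the classifier's label changes; `classify` is a hypothetical black-box returning a label.

```python
import numpy as np
from scipy.ndimage import median_filter

def scalar_quantize(img, interval=16):
    """Quantize pixel values into fixed-size intervals (scalar quantization)."""
    return (img // interval) * interval + interval // 2

def is_adversarial(img, classify, interval=16, smooth_size=3):
    """Flag the input if denoising changes the predicted label."""
    denoised = scalar_quantize(img, interval)
    denoised = median_filter(denoised, size=(smooth_size, smooth_size, 1))
    return classify(img) != classify(denoised)

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
fake_classify = lambda x: int(x.mean() > 127)   # stand-in classifier
print(is_adversarial(img, fake_classify))
```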
D. POISONING ATTACKS
An overview of poisoning attacks is shown in Figure 10. The adversary can reduce the performance of the model or manipulate its predictions by injecting malicious samples into the training data during the training phase. A large number of poisoning attacks have been applied to DL, which can be divided into three categories: accuracy drop attacks, targeted misclassification attacks, and backdoor attacks. A summary of poisoning attacks and corresponding defense strategies is provided in Table 9.
FIGURE 10. An overview of poisoning attacks.
1) Accuracy Drop Attack
Muñoz-González et al. [201] proposed a novel poisoning attack based on gradient-based optimization, which targets a wider class of ML algorithms, including DL architectures. The algorithm computes the gradient by back-propagation while reversing the underlying learning process: it traces the parameter updates executed during learning and replays the entire sequence. The algorithm only requires that the learning algorithm update its parameters smoothly during training (for example, by gradient descent) so that these changes can be tracked backward properly. They also demonstrated that attacks designed for a specific learning algorithm remain effective against different learning algorithms.
Creating poisoning samples based on back-gradient optimization is computationally intensive and inefficient. Inspired by GANs [202], Yang et al. [203] proposed a general method to accelerate the generation of poisoning data using a generator and a discriminator. An autoencoder serves as the generator to produce poisoned data, while the target model is treated as the discriminator: it receives the poisoned data, calculates the loss, and sends the resulting gradient back to the generator. The experiments showed that this algorithm is more than 230 times more computationally efficient than direct gradient-based poisoning.
Similarly, Muñoz-González et al. [20] proposed using a GAN to generate poisoning samples that look like real data points but reduce the accuracy of the classifier when used in training. Their model, called pGAN, mainly consists of three parts: a generator, a discriminator, and the target classifier. The purpose of the generator is to produce poisoning points that maximize the error of the target classifier while minimizing the ability of the discriminator to distinguish the poisoning points from the original data points.
The purpose of the classifier is to minimize a loss function evaluated on training data that contains a small portion of poisoning points. The generator maximizes a convex combination of the discriminator's and the classifier's losses on the poisoning points, which gives the model a mechanism to control the detectability of the generated poisoning points: the aggressiveness of the attack can be controlled through the weights of this combination. pGAN can thus be used to test ML classifiers at different risk levels by controlling the trade-off between the effectiveness and detectability of attacks.
2) Targeted Misclassification Attack
The concept of the targeted misclassification attack was first proposed by Koh and Liang [204]. DL systems perform well in various applications, but DL models have poor interpretability. Therefore, Koh and Liang [204] attempted to use the classic influence function from robust statistics [205] to explain the predictions of a black-box model. Influence functions trace a model's prediction back to the training points that are most relevant to that prediction. The adversary observes how the model's prediction changes when a training point is up-weighted or a training input is perturbed, and uses the Euclidean distance to find the training point most relevant to the test point. A poisoning sample is then generated iteratively by modifying the influence of that training point on the test point. Later, Shafahi et al. [206] proposed a new poisoning attack, the clean-label attack, which generates poisoned instances via feature collisions. Clean-label attacks can control the behavior of the classifier on a specific test instance while preserving the overall performance of the classifier. Moreover, the adversary is not required to control the labels of the training data. Compared with the poisoning attack proposed by Koh and Liang [204], the clean-label attack performs better on the same dog-vs-fish classification task: under the end-to-end training scenario, only 50 poisoned instances are needed to achieve an attack success rate of 60%. Furthermore, Zhu et al. [12] proposed transferable clean-label poisoning attacks under the black-box model through a convex polytope strategy, which achieves a higher attack success rate than feature collisions [206] in the black-box setting. The adversary has no knowledge of the parameters of the target model but has access to training data similar to that of the target model. The adversary can then train a substitute model on this data and optimize a novel objective that makes the poisons form a convex polytope wrapping the target image. In this way, a linear classifier that overfits the poisoned dataset classifies the target image into the same class as the poisoning data.
3) Backdoor Attack
Due to the expensive cost of training models, many users outsource the training process to cloud servers or rely on pre-trained models that they fine-tune for specific tasks. Gu et al. [13] proposed BadNets, maliciously trained backdoored networks that perform well in the training and validation phases but misbehave on specific inputs during the testing phase. The adversary selects a backdoor trigger composed of pixels and associated color intensities, which can have any shape, such as a square. The algorithm assumes that the adversary controls the entire training process (e.g., parameters, learning rate) and uses poisoned training data to construct a backdoored network sensitive to the specific backdoor trigger.
TABLE 7. A summary of adversarial defenses. The defense strength evaluates how powerful a defense is against different adversarial attacks (more ★ indicates a stronger defense); (★) means the defense was broken.

| Type | Method | Advantage | Shortcoming | Strength |
|---|---|---|---|---|
| Pre-processing | Random Feature Nullification [159] | Increases the difficulty of crafting examples | Needs to modify the model architecture | ★ |
| Pre-processing | Pixel Deflection [160] | Simple, generic, low computational cost | Sensitive to the size of the perturbation and might be defeated by strong attacks (e.g., the PGD attack [115]) | ★ |
| Pre-processing | Pixel Redrawn [161] | Simple, generic, low computational cost | Sensitive to the size of the perturbation and might be defeated by strong attacks (e.g., the PGD attack [115]) | ★ |
| Pre-processing | JPEG [165] | Simple, generic, low computational cost | Sensitive to the size of the perturbation and might be defeated by strong attacks (e.g., the PGD attack [115]) | ★ |
| Pre-processing | TVM [131] | Simple, generic, low computational cost | Sensitive to the size of the perturbation and might be defeated by strong attacks (e.g., the PGD attack [115]) | ★★ |
| Pre-processing | Defense-GAN [124] | Simple, generic, low computational cost | Sensitive to the size of the perturbation and might be defeated by strong attacks (e.g., the PGD attack [115]) | ★ |
| Pre-processing | Barrage Defense [170] | Does not depend on obfuscated gradients | Needs a large number of simple defense methods | ★★★ |
| Pre-processing | FBGAN [180] | Generates clean examples with semantic features | Depends on the reconstruction accuracy of the FBGAN | ★★ |
| Pre-processing | HGD [175] | Effectively removes perturbations | Depends on the model's representation | ★ |
| Pre-processing | Image Super-Resolution [181] | Improves the accuracy on clean examples | Depends on the representation power of the image super-resolution model | ★★ |
| Improving the robustness of the model | Adversarial Training at Scale [114] | Extends adversarial training [21] to the ImageNet [1] dataset | Does not defend against iterative attacks and increases the training overhead | ★★ |
| Improving the robustness of the model | Feature Prioritization and Regularization [189] | Defends against white-box iterative attacks and improves the decision boundary | The huge computational overhead of adversarial training | ★★★ |
| Improving the robustness of the model | e2SAD [184] | Defends against white-box iterative attacks and improves the decision boundary | The huge computational overhead of adversarial training | ★★★ |
| Improving the robustness of the model | PGD Adversarial Training [115] | Defends against white-box iterative attacks and improves the decision boundary | The huge computational overhead of adversarial training | ★★★ |
| Improving the robustness of the model | Logit Pairing [187] | Defends against white-box iterative attacks and improves the decision boundary | The huge computational overhead of adversarial training | ★★ |
| Improving the robustness of the model | Feature Denoising [190] | Defends against white-box iterative attacks and improves the decision boundary | The huge computational overhead of adversarial training | ★★★ |
| Improving the robustness of the model | BANG [199] | Does not depend on data augmentation or adversarial examples | Needs to adjust hyperparameters for different datasets | ★★ |
| Improving the robustness of the model | Stratified Convolutional Sparse Coding [193] | Does not need to modify the original model | Needs to train an additional network | ★★★ |
| Improving the robustness of the model | Block Transferability [194] | Blocks transfer attacks | Only applies to transfer attacks | ★★ |
| Malware detection | Image Transformation [197] | Simple | The angle of transformation affects the performance | ★ |
| Malware detection | Stateful Detection [195] | Blocks the attack process | Does not defend against transfer attacks and needs to store user query records | ★★★ |
| Malware detection | I-defender [200] | Does not need any knowledge of the attack method | Might be bypassed by strong attacks | ★★ |
| Malware detection | Adaptive Noise Reduction [198] | Adaptively removes perturbations | Does not apply to attacks that only modify a fraction of pixels | ★★ |
The backdoor trigger leads the neural network to misclassify the
backdoor data into the target label specified by the adversary.
Experiments showed that a backdoor network could attack a model
trained on the MNIST dataset [68] with a success rate of over 99%
without affecting the performance of the neural network on clean
inputs. Liu et al. [14] proposed a trojaning attack that does not
depend on the original training data. The trojaning attack takes an
existing model and a target prediction output as input, and modifies
the model so that a small input pattern, called the trojan trigger,
controls its behavior. Any valid input stamped with the trojan trigger
causes the mutated model to output the specific target prediction.
The trojaning attack is implemented in three steps: 1) trojan trigger
generation, where the trojan trigger is a special input pattern that
triggers the misbehavior of the trojaned neural network; 2) training
data generation, where, since the trojaning attack has no knowledge
of the original training data, a set of training data is reverse
engineered to retrain the model; 3) model retraining, where, because
retraining the entire model is very expensive, the trigger and the
reverse-engineered images are used to retrain only part of the model.
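As a concrete illustration of the trigger-based data poisoning that underlies BadNet-style backdoors, the following minimal sketch stamps a small trigger patch onto a random fraction of the training images and relabels them with the attacker's target class. It is a simplified example, assuming grayscale images stored as a float array of shape (N, H, W) with values in [0, 1]; the function and parameter names are hypothetical.

```python
import numpy as np

def poison_with_trigger(images, labels, target_label, rate=0.05, patch=3):
    """Stamp a small white square (the trigger) into the bottom-right corner
    of a random fraction of the training images and relabel them with the
    attacker's target class."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = np.random.choice(len(images), n_poison, replace=False)
    images[idx, -patch:, -patch:] = 1.0    # trigger pixels set to max intensity
    labels[idx] = target_label             # attacker-chosen target label
    return images, labels
```

A model trained on the poisoned set behaves normally on clean inputs but maps any input carrying the trigger to the target label.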
Chen et al. [207] proposed targeted backdoor attacks that
can be applied under a very weak threat model: 1) the adversary
has no knowledge of the target model; 2) the adversary can
inject only a small portion of samples into the training data; 3) the
backdoor key is hard to notice by the human visual system.
The attack strategy is divided into two steps. First, poisoning
samples are generated and added to the training data. Then,
a backdoor instance is created that makes the neural network
misclassify the sample as the target label. The previous backdoor
attacks [13], [14], [207] mainly focused on the construction
of triggers. Furthermore, Li et al. [208] focused on how to
make triggers invisible and proposed an invisible backdoor
attack, which makes the backdoor hardly perceptible
to the human visual system while ensuring that the neural
network can still recognize the backdoor trigger. They used
the Perceptual Adversarial Similarity Score (PASS) to quantify
people's ability to recognize triggers and used ℓ2 and ℓ0
regularization to spread the trigger across the entire image, making
it less obvious.
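As a rough illustration of how steganography can hide a trigger, the sketch below embeds a binary trigger pattern in the least significant bit of every pixel. This is only a simplified example in the spirit of [208], not the authors' method; `trigger_bits` is a hypothetical 0/1 array of the same shape as the image.

```python
import numpy as np

def embed_lsb_trigger(image_uint8, trigger_bits):
    """Hide a binary trigger in the least significant bit of every pixel:
    imperceptible to humans, but still a learnable backdoor signal."""
    assert image_uint8.dtype == np.uint8
    assert trigger_bits.shape == image_uint8.shape
    return (image_uint8 & 0xFE) | trigger_bits.astype(np.uint8)
```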
E. POISONING DEFENSES
TABLE 8. An overview of poisoning attacks. The attack strength (higher for more ★) is based on the impression of the reviewed paper.

Method | Type | Advantage | Shortcoming | Strength
Back-Gradient Optimization [201] | Reducing Accuracy Attack | Simple and effective | Limited to models trained with a large number of training points | ★
Auto-Encoder Generator [202] | Reducing Accuracy Attack | Accelerates the generation rate of the poisoned data | Needs to train an extra model as a generator | ★★
GAN Poisoning Attack [20] | Reducing Accuracy Attack | Manipulates a trade-off between the effectiveness and detectability of the attack | Needs to train an extra GAN (generator and discriminator) | ★★★★
Influence Function [204] | Targeted Classification Attack | Only the final fully connected layer of the network is retrained | Low attack success rate; needs full knowledge of the model and data | ★★
Targeted Clean-Label Poisoning Attacks [206] | Targeted Classification Attack | No need to control the labeling of the training data | Needs knowledge of the model and its parameters | ★★★
Transferable Clean-Label Poisoning Attacks [12] | Targeted Classification Attack | No need to access the target model and dataset | Difficult to achieve the desired results under the end-to-end training model | ★★★★
BadNet [13] | Backdoor Attack | Simple and effective | Needs to tamper with the original training process | ★★
Trojaning Attack [14] | Backdoor Attack | Generates a stealthy semantic trojan trigger | Unequally distributed misprediction results | ★★★
Targeted Backdoor Attacks [207] | Backdoor Attack | With about 50 poisoned samples, the attack success rate can reach more than 90% | Auxiliary pristine data can reduce the attack effect | ★★★★
Invisible Backdoor Attacks [208] | Backdoor Attack | Invisible to detectors and human inspection | Not practical under the data collection strategy | ★★★★
1) Defense against Accuracy Drop/Targeted
Misclassification Attack
Because the space of possible attacks is almost infinite, it is
impossible to conclude from experience alone whether a defense
against a known set of attacks can also defend against future
attacks. Therefore, Steinhardt et al. [209] proposed a framework
that addresses this problem by removing outliers and keeping
them out of the feasible set. For binary classification, a natural
defense strategy is to find the centroids of the positive and
negative classes and remove points that lie too far from their
corresponding centroid. There are two ways to implement this:
the sphere defense removes points that fall outside a sphere of
fixed radius around the centroid, while the slab defense first
projects points onto the line between the two centroids and then
discards points whose projected distance is too large.
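The sketch below illustrates both filters in that binary setting, with labels y in {-1, +1}; the thresholds `sphere_radius` and `slab_width` are hypothetical tuning parameters, and the slab test here uses the projection onto the direction between the two centroids.

```python
import numpy as np

def sphere_slab_filter(X, y, sphere_radius, slab_width):
    """Binary-classification filter: drop points far from their class
    centroid (sphere) or whose projection onto the line between the two
    centroids lies too far out (slab). Labels y are in {-1, +1}."""
    c_pos, c_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    centroids = np.where((y == 1)[:, None], c_pos, c_neg)   # per-point centroid
    sphere_ok = np.linalg.norm(X - centroids, axis=1) <= sphere_radius
    direction = (c_pos - c_neg) / np.linalg.norm(c_pos - c_neg)
    slab_ok = np.abs((X - centroids) @ direction) <= slab_width
    keep = sphere_ok & slab_ok
    return X[keep], y[keep]
```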
The so-called “optimal” attacks choose poisoning samples
to maximize the damage to target models [210]–[212]. However,
such attacks usually focus only on the learning algorithm
while ignoring the data preprocessing step. Based on this
observation, Paudice et al. [213] proposed a defense strategy
based on data pre-filtering with outlier detection to mitigate
the effects of optimal poisoning attacks. The method uses
distance-based anomaly detection to detect poisoned samples
with the help of a small number of trusted data points. They split a
small fraction of trusted data D into its classes, i.e.,
D+ and D−. These curated data were then used to train
a distance-based outlier detector for each class. Next,
the Empirical Cumulative Distribution Function (ECDF) was
used to calculate the outlier detection threshold from
the outlier scores. Finally, the training samples were filtered by this
threshold to obtain a clean dataset with which to retrain the model.
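A minimal sketch of such a per-class filter is given below, assuming `trusted` holds the curated points of one class (e.g., D+) and `untrusted` the remaining training points of the same class; the k-nearest-neighbor scoring and the quantile level alpha are illustrative choices rather than the exact settings of [213].

```python
import numpy as np

def knn_scores(points, reference, k=5, skip_self=False):
    """Mean Euclidean distance from each point to its k nearest reference points."""
    dists = np.linalg.norm(points[:, None, :] - reference[None, :, :], axis=2)
    dists.sort(axis=1)
    start = 1 if skip_self else 0          # drop the zero self-distance
    return dists[:, start:start + k].mean(axis=1)

def filter_one_class(trusted, untrusted, alpha=0.95, k=5):
    """Threshold = alpha-quantile (an empirical-CDF cutoff) of the trusted
    points' own outlier scores; untrusted points scoring above it are dropped."""
    threshold = np.quantile(knn_scores(trusted, trusted, k, skip_self=True), alpha)
    keep = knn_scores(untrusted, trusted, k) <= threshold
    return untrusted[keep]
```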
2) Defense against Backdoor Attacks
Liu et al. [214] proposed “fine-pruning,” a method that
combines pruning and fine-tuning. The pruning defense resists
backdoor behavior by removing neurons of the backdoored network
that remain dormant on clean inputs. Against the pruning defense,
they also developed a stronger pruning-aware attack, which evades
pruning by concentrating clean and backdoor behavior on the same
set of neurons. To defend against the stronger pruning-aware attack,
they considered a fine-tuning defense, which locally retrains the
network on clean training data. However, since the accuracy of the
backdoored DNN on clean inputs does not depend on the weights of
the backdoor neurons, the effect of the fine-tuning defense alone
is not significant. Finally, by combining the advantages of pruning
and fine-tuning, they put forward fine-pruning, which first prunes
the DNN returned by the attacker and then fine-tunes the pruned
network. In some cases, this method can even reduce the success
rate of backdoor attacks to 0%.
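A minimal sketch of the pruning step is shown below, assuming `clean_activations` is an (examples × neurons) array of activations of the pruned layer recorded on clean validation data; the pruning fraction is a hypothetical parameter, and fine-tuning on clean data would follow as a separate training step.

```python
import numpy as np

def dormant_neuron_mask(clean_activations, prune_fraction=0.2):
    """Rank the layer's neurons by their mean activation on clean inputs and
    zero out the most dormant fraction, where backdoor behaviour tends to
    hide; fine-tuning on clean data would follow."""
    mean_act = clean_activations.mean(axis=0)        # one score per neuron
    n_prune = int(prune_fraction * mean_act.size)
    dormant = np.argsort(mean_act)[:n_prune]         # least active first
    mask = np.ones_like(mean_act)
    mask[dormant] = 0.0
    return mask   # multiply element-wise into the layer's output
```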
Chen et al. [215] hypothesized that clean and backdoor
samples produce different activations inside the neural network.
Hence, they proposed an Activation Clustering (AC) method
to detect poisoned training samples. The AC method analyzes
the activations of the last hidden layer of the neural
network to determine whether an input is poisoned. It is
also the first method for detecting backdoor-inserting poisoning
data and repairing models that does not require a verified,
trusted dataset. They demonstrated the effectiveness of the AC
method at detecting and repairing backdoors, and their experiments
also showed that it is robust to multimodal classes and complex
poisoning schemes.
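The sketch below gives a minimal version of this idea for the samples of a single class, using PCA as a stand-in for the dimensionality reduction and 2-means clustering, with the smaller cluster flagged as suspicious; it is an illustration rather than the exact pipeline of [215].

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def flag_suspicious_samples(activations, n_components=10):
    """For one class: reduce the last-hidden-layer activations, split them
    into two clusters, and flag the smaller cluster as likely poisoned."""
    reduced = PCA(n_components=n_components).fit_transform(activations)
    cluster_ids = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
    minority = np.argmin(np.bincount(cluster_ids))
    return cluster_ids == minority   # boolean mask over this class's samples
```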
V. FUTURE RESEARCH WORK
A. DL PRIVACY
1) Lightweight Privacy-preserving Techniques
Recent works have proposed several privacy-preserving DL
techniques to protect sensitive data. However, much work
remains before these techniques can be applied in practice.
TABLE 9. An overview of poisoning defenses. The defense strength (higher for more ★) is based on the impression of the reviewed paper.

Method | Type | Advantage | Shortcoming | Strength
Certified Defenses [209] | Defense against Accuracy Drop/Misclassification Attack | Detects potential attacks | Needs a trusted dataset to train the detector | ★★
Anomaly Detection [213] | Defense against Accuracy Drop/Misclassification Attack | Mitigates the effects of optimal poisoning attacks | Needs a trusted dataset to train the detector | ★★★
Fine-Pruning [214] | Defense against Backdoor | Only requires fine-tuning | Affects the accuracy of the trained model | ★★★
Activation Clustering [215] | Defense against Backdoor | Does not affect the accuracy of the model | Needs access to the trojaned samples | ★★★★
The biggest drawback of privacy-preserving DL techniques is their
computation cost. Because of the non-linear operations of DL,
the computation cost is enormous, which seriously reduces the
usability of DL. An important challenge for privacy-preserving DL
techniques is therefore to decrease the overhead of privacy preservation.
2) Intellectual Property Protection of DL Model
A well-performing ML model requires massive amounts of
training data, considerable hardware resources, and a great
deal of time for parameter tuning. Therefore, the labeled
training dataset, the model architecture, and the model parameters
are regarded as commercial intellectual property
and need to be protected. Currently, there are only a
few works on watermark-based intellectual property protection
for machine learning models [48], [109], [110], and their
effectiveness is still difficult to guarantee. For neural network
models, a more effective and secure intellectual property
protection method remains an open problem.
3) Generic Privacy-preserving Techniques
Most current privacy-preserving technologies can only make
private predictions during the testing phase, and only a
few solutions can be trained directly on encrypted data. Moreover,
inference-time privacy-preserving models are usually obtained by
training a typical model on unencrypted data and then transferring
the trained weights and biases to a different model in which the
activation function is replaced with a simple one, such as a
square function. Differences between the model that was trained
and the model used for inference typically cause a severe
degradation in performance. Therefore, current privacy-preserving
techniques still require a lot of custom work for each DL model. A
general privacy-preserving framework is a challenge to be solved in
the future.
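For illustration, the sketch below shows the kind of activation swap described above: the weights of a plainly trained PyTorch model are kept while every ReLU is replaced by a square function, the sort of low-degree polynomial that HE-based inference can evaluate. This is a generic sketch, not the procedure of any specific scheme surveyed here.

```python
import torch.nn as nn

class Square(nn.Module):
    """x -> x * x, an HE-friendly low-degree polynomial activation."""
    def forward(self, x):
        return x * x

def swap_relu_for_square(model: nn.Module) -> nn.Module:
    """Keep the trained weights but replace every ReLU with Square,
    mimicking the activation swap used for HE-friendly inference."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, Square())
        else:
            swap_relu_for_square(child)  # recurse into nested modules
    return model
```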
B. DL SECURITY
1) Adversarial Examples in the Real World
Adversarial examples were first proposed for the image
classification task [19], but recent works have crafted adversarial
examples for cyberspace attacks [154], [155], stop sign
recognition [216], object detection [153], [217], semantic
segmentation [218], [219], face recognition [157], and authorship
recognition [220]. Although adversarial examples exist for these
different application scenarios, there is still a large gap to
real-world conditions. For example, adversarial examples are
sensitive to physical factors such as lighting, viewing angle, and
transformations. Adversarial examples that remain robust in real
physical environments have therefore attracted the interest of a
large number of researchers.
2) Robust Adversarial Examples
Adversarial attacks have evolved from gradient-based
attacks [21], [115] to decision-based attacks [196], [221];
the amount of information they require has gradually decreased,
and they have been successfully applied to real scenarios.
Algorithms for generating adversarial examples have made great
progress in recent years, but they still have many deficiencies
and limitations. For example, they require too many queries and
are not stable enough across different application scenarios.
More efficient and stable black-box attack algorithms with smaller
perturbations therefore remain a focus of future research.
3) Defenses against Adversarial Examples
As discussed in this paper, the majority of defense strategies
are effective only against specific known attacks and do not
generalize well to unknown attacks. To the best of our
knowledge, there is still no defensive method that can completely
defend against white-box adversarial attacks. Adversarial training
is considered the most effective defense against adversarial
attacks, but it has the drawback of huge computational overhead.
In short, how to train a robust model with negligible overhead
remains a hot topic for future research.
4) Systematic Evaluation of Adversarial Defenses
Adaptive adversaries have (rightfully) become the de facto
standard for evaluating adversarial defenses. However,
Tramer et al. [222] showed that the adaptive evaluations in many
published papers are incomplete or flawed. Properly performing
adaptive attack evaluations of adversarial example defenses is
therefore particularly important. Systematic evaluation criteria
and metrics for defense mechanisms are still an open problem.
5) Why Do Adversarial Examples Exist?
Why are DL models so vulnerable to adversarial examples?
There is some discussion in the current research, but there
is still a lack of consensus. Ilyas et al. [223] pointed out
that models trained on the same dataset tend to learn similar
non-robust features, which accounts for the transferability
of adversarial examples. However, why DL models tend to
learn non-robust features, and how to make them learn robust
features instead, is still an open question.
VI. SUMMARY
DL has been extensively applied in a variety of application
domains such as speech recognition and medical diagnosis, but
its recent security and privacy issues have raised concerns
among researchers. One of the keys to the rise of DL is its
reliance on vast quantities of data, which is also accompanied
by the risk of privacy leakage. In this paper, we first described
the potential risks of DL and then reviewed two types of privacy
attacks, the model extraction attack and the model inversion attack,
together with four typical defense technologies for protecting user
data privacy: DP, HE, SMC, and TEE. We then investigated two types
of security attacks: adversarial attacks and poisoning attacks. For
adversarial attacks, we reviewed representative black-box and
white-box attacks in recent studies, as well as adversarial attacks
under physical conditions. Regarding security defenses, we described
defense approaches from three aspects: pre-processing, improving
model robustness, and malware detection. Finally, we discussed
unresolved problems and directions for future work.
REFERENCES
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in neural infor-
mation processing systems, 2012, pp. 1097–1105.
[2] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly,
A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep neural
networks for acoustic modeling in speech recognition: The shared views
of four research groups,” IEEE Signal Processing Magazine, vol. 29,
no. 6, pp. 82–97, Nov 2012.
[3] G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-
trained deep neural networks for large-vocabulary speech recognition,
IEEE Transactions on Audio, Speech, and Language Processing, vol. 20,
pp. 30–42, 2012.
[4] Y. Sun, Y. Chen, X. Wang, and X. Tang, “Deep learning face rep-
resentation by joint identification-verification,” in Advances in neural
information processing systems, 2014, pp. 1988–1996.
[5] Y. Sun, D. Liang, X. Wang, and X. Tang, “Deepid3: Face recognition
with very deep neural networks,” 2015.
[6] D. Shen, G. Wu, and H.-I. Suk, “Deep learning in medical image
analysis,” Annual review of biomedical engineering, vol. 19, pp. 221–
248, 2017.
[7] L. Yu, S. Wang, and K. K. Lai, “Credit risk assessment with a multistage
neural network ensemble learning approach,” Expert systems with appli-
cations, vol. 34, no. 2, pp. 1434–1444, 2008.
[8] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Driessche,
J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Diele-
man, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap,
M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the
game of go with deep neural networks and tree search,” Nature, vol. 529,
pp. 484–489, 01 2016.
[9] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik,
J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., “Grand-
master level in starcraft ii using multi-agent reinforcement learning,
Nature, vol. 575, no. 7782, pp. 350–354, 2019.
[10] J. Xiong, R. Bi, M. Zhao, J. Guo, and Q. Yang, “Edge-assisted privacy-
preserving raw data sharing framework for connected autonomous vehi-
cles,” IEEE Wireless Communications, vol. 27, no. 3, pp. 24–30, 2020.
[11] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao,
A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks on
deep learning visual classification,” in Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition, 2018, pp. 1625–1634.
[12] C. Zhu, W. R. Huang, A. Shafahi, H. Li, G. Taylor, C. Studer, and
T. Goldstein, “Transferable clean-label poisoning attacks on deep neural
nets,” 2019.
[13] T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities
in the machine learning model supply chain,” 2017.
[14] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang,
“Trojaning attack on neural networks,” in NDSS, 2018.
[15] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing
machine learning models via prediction apis,” in 25th USENIX Security
Symposium (USENIX Security 16), 2016, pp. 601–618.
[16] B. Wang and N. Z. Gong, “Stealing hyperparameters in machine learn-
ing,” in 2018 IEEE Symposium on Security and Privacy (SP). IEEE,
2018, pp. 36–52.
[17] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership infer-
ence attacks against machine learning models,” in 2017 IEEE Symposium
on Security and Privacy (SP). IEEE, 2017, pp. 3–18.
[18] Y. Long, V. Bindschaedler, L. Wang, D. Bu, X. Wang, H. Tang, C. A.
Gunter, and K. Chen, “Understanding membership inferences on well-
generalized learning models,” 2018.
[19] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow,
and R. Fergus, “Intriguing properties of neural networks,” 2013.
[20] L. Muñoz-González, B. Pfitzner, M. Russo, J. Carnerero-Cano, and E. C.
Lupu, “Poisoning attacks with generative adversarial nets,” 2019.
[21] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing
adversarial examples,” 2014.
[22] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning
in computer vision: A survey,” IEEE Access, vol. 6, pp. 14410–14430,
2018.
[23] H. C. Tanuwidjaja, R. Choi, and K. Kim, “A survey on deep learning tech-
niques for privacy-preserving,” in International Conference on Machine
Learning for Cyber Security. Springer, 2019, pp. 29–46.
[24] A. Boulemtafes, A. Derhab, and Y. Challal, “A review of privacy-
preserving techniques for deep learning,” Neurocomputing, vol. 384, pp.
21–45, 2020.
[25] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and
defenses for deep learning,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 30, no. 9, pp. 2805–2824, 2019.
[26] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. Leung, “A survey on
security threats and defensive techniques of machine learning: A data
driven view,” IEEE Access, vol. 6, pp. 12103–12117, 2018.
[27] N. Papernot, P. McDaniel, A. Sinha, and M. P. Wellman, “Sok: Security
and privacy in machine learning,” in 2018 IEEE European Symposium on
Security and Privacy (EuroS&P). IEEE, 2018, pp. 399–414.
[28] C. Dwork, “Differential privacy: A survey of results,” in Interna-
tional conference on theory and applications of models of computation.
Springer, 2008, pp. 1–19.
[29] P. Paillier, “Public-key cryptosystems based on composite degree residu-
osity classes,” in International conference on the theory and applications
of cryptographic techniques. Springer, 1999, pp. 223–238.
[30] X. Liu, R. H. Deng, W. Ding, R. Lu, and B. Qin, “Privacy-preserving
outsourced calculation on floating point numbers,” IEEE Transactions on
Information Forensics and Security, vol. 11, no. 11, pp. 2513–2527, 2016.
[31] X. Liu, K.-K. R. Choo, R. H. Deng, R. Lu, and J. Weng, “Efficient and
privacy-preserving outsourced calculation of rational numbers,IEEE
Transactions on Dependable and Secure Computing, vol. 15, no. 1, pp.
27–39, 2016.
[32] X. Liu, R. H. Deng, K.-K. R. Choo, and J. Weng, “An efficient privacy-
preserving outsourced calculation toolkit with multiple keys,” IEEE
Transactions on Information Forensics and Security, vol. 11, no. 11, pp.
2401–2414, 2016.
[33] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital
signatures and public-key cryptosystems,” Communications of the ACM,
vol. 21, no. 2, pp. 120–126, 1978.
[34] T. ElGamal, “A public key cryptosystem and a signature scheme based on
discrete logarithms,” IEEE Transactions on Information Theory, vol. 31,
no. 4, pp. 469–472, 1985.
[35] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in
Proceedings of the forty-first annual ACM symposium on Theory of
computing, 2009, pp. 169–178.
[36] C. Gentry, A. Sahai, and B. Waters, “Homomorphic encryption
from learning with errors: Conceptually-simpler, asymptotically-faster,
attribute-based,” Cryptology ePrint Archive, Report 2013/340, 2013.
[37] A. Lopez-Alt, E. Tromer, and V. Vaikuntanathan, “On-the-fly multiparty
computation on the cloud via multikey fully homomorphic encryption,”
Cryptology ePrint Archive, Report 2013/094, 2013.
[38] J. Fan and F. Vercauteren, “Somewhat practical fully homomorphic
encryption,” Cryptology ePrint Archive, Report 2012/144, 2012.
[39] Z. Brakerski, “Fully homomorphic encryption without modulus switch-
ing from classical gapsvp,” Cryptology ePrint Archive, Report 2012/078,
2012.
[40] J. W. Bos, K. Lauter, J. Loftus, and M. Naehrig, “Improved security for
a ring-based fully homomorphic encryption scheme,” Cryptology ePrint
Archive, Report 2013/075, 2013.
[41] X. Liu, R. Deng, K.-K. R. Choo, Y. Yang, and H. Pang, “Privacy-
preserving outsourced calculation toolkit in the cloud,” IEEE Transac-
tions on Dependable and Secure Computing, 2018.
[42] ——, “Privacy-preserving outsourced calculation toolkit in the cloud,
IEEE Transactions on Dependable and Secure Computing, 2018.
[43] A. C.-C. Yao, “How to generate and exchange secrets,” in 27th Annual
Symposium on Foundations of Computer Science (sfcs 1986). IEEE,
1986, pp. 162–167.
[44] O. Goldreich, S. Micali, and A. Wigderson, How to Play Any Mental
Game, or a Completeness Theorem for Protocols with Honest Majority.
New York, NY, USA: Association for Computing Machinery, 2019, p.
307–328. [Online]. Available: https://doi.org/10.1145/3335741.3335755
[45] M. O. Rabin, “How to exchange secrets with oblivious transfer.” IACR
Cryptol. ePrint Arch., vol. 2005, no. 187, 2005.
[46] A. Shamir, “How to share a secret,Communications of the ACM, vol. 22,
no. 11, pp. 612–613, 1979.
[47] T. Wang and F. Kerschbaum, “Attacks on digital watermarks for deep
neural networks,” in ICASSP 2019 - 2019 IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 2622–
2626.
[48] Y. Uchida, Y. Nagai, S. Sakazawa, and S. Satoh, “Embedding watermarks
into deep neural networks,” in Proceedings of the 2017 ACM on Interna-
tional Conference on Multimedia Retrieval, 2017, pp. 269–277.
[49] D. Hitaj and L. V. Mancini, “Have you stolen my model? evasion attacks
against deep neural network watermarking techniques,” 2018.
[50] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes,
“Ml-leaks: Model and data independent membership inference attacks
and defenses on machine learning models,” 2018.
[51] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the gan:
information leakage from collaborative deep learning,” in Proceedings of
the 2017 ACM SIGSAC Conference on Computer and Communications
Security, 2017, pp. 603–618.
[52] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,
in Advances in neural information processing systems, 2014, pp. 2672–
2680.
[53] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting
unintended feature leakage in collaborative learning,” in 2019 IEEE
Symposium on Security and Privacy (SP). IEEE, 2019, pp. 691–706.
[54] J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro, “Logan: Mem-
bership inference attacks against generative models,Proceedings on
Privacy Enhancing Technologies, vol. 2019, no. 1, pp. 133–152, 2019.
[55] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy anal-
ysis of deep learning: Passive and active white-box inference attacks
against centralized and federated learning,” in 2019 IEEE Symposium on
Security and Privacy (SP). IEEE, 2019, pp. 739–753.
[56] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar,
and L. Zhang, “Deep learning with differential privacy,” in Proceedings
of the 2016 ACM SIGSAC Conference on Computer and Communications
Security, 2016, pp. 308–318.
[57] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential
privacy,” in 2010 IEEE 51st Annual Symposium on Foundations of
Computer Science. IEEE, 2010, pp. 51–60.
[58] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou, “Differentially private
generative adversarial network,” 2018.
[59] C. Dwork, A. Roth et al., “The algorithmic foundations of differential
privacy,” Foundations and Trends® in Theoretical Computer Science,
vol. 9, no. 3–4, pp. 211–407, 2014.
[60] G. Acs, L. Melis, C. Castelluccia, and E. De Cristofaro, “Differentially
private mixture of generative neural networks,IEEE Transactions on
Knowledge and Data Engineering, vol. 31, no. 6, pp. 1109–1121, 2018.
[61] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521,
no. 7553, pp. 436–444, 2015.
[62] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” 2013.
[63] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component anal-
ysis as a kernel eigenvalue problem,Neural computation, vol. 10, no. 5,
pp. 1299–1319, 1998.
[64] N. Phan, Y. Wang, X. Wu, and D. Dou, “Differential privacy preservation
for deep auto-encoders: an application of human behavior prediction,” in
Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[65] N. Phan, X. Wu, and D. Dou, “Preserving differential privacy in convo-
lutional deep belief networks,” Machine learning, vol. 106, no. 9-10, pp.
1681–1704, 2017.
[66] N. Phan, X. Wu, H. Hu, and D. Dou, “Adaptive laplace mechanism:
Differential privacy preservation in deep learning,” in 2017 IEEE Interna-
tional Conference on Data Mining (ICDM). IEEE, 2017, pp. 385–394.
[67] N. Papernot, M. Abadi, Úlfar Erlingsson, I. Goodfellow, and K. Talwar,
“Semi-supervised knowledge transfer for deep learning from private
training data,” 2016.
[68] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010.
[Online]. Available: http://yann.lecun.com/exdb/mnist/
[69] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Úlfar
Erlingsson, “Scalable private learning with pate,” 2018.
[70] A. Triastcyn and B. Faltings, “Generating differentially
private datasets using GANs,” 2018. [Online]. Available:
https://openreview.net/forum?id=rJv4XWZA-
[71] P. Xie, M. Bilenko, T. Finley, R. Gilad-Bachrach, K. Lauter, and
M. Naehrig, “Crypto-nets: Neural networks over encrypted data,” 2014.
[72] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and
J. Wernsing, “Cryptonets: Applying neural networks to encrypted data
with high throughput and accuracy,” in International Conference on
Machine Learning, 2016, pp. 201–210.
[73] J. W. Bos, K. Lauter, J. Loftus, and M. Naehrig, “Improved security for a
ring-based fully homomorphic encryption scheme,” in IMA International
Conference on Cryptography and Coding. Springer, 2013, pp. 45–64.
[74] H. Chabanne, A. de Wargny, J. Milgram, C. Morel, and E. Prouff,
“Privacy-preserving classification on deep neural network.IACR Cryp-
tology ePrint Archive, vol. 2017, p. 35, 2017.
[75] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep net-
work training by reducing internal covariate shift,” 2015.
[76] E. Hesamifard, H. Takabi, and M. Ghasemi, “Cryptodl: Deep neural
networks over encrypted data,” 2017.
[77] F. Bourse, M. Minelli, M. Minihold, and P. Paillier, “Fast homomorphic
evaluation of deep discretized neural networks,” in Annual International
Cryptology Conference. Springer, 2018, pp. 483–512.
[78] A. Sanyal, M. J. Kusner, A. Gascón, and V. Kanade, “Tapas: Tricks to
accelerate (encrypted) prediction as a service,” 2018.
[79] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachene, “Faster fully
homomorphic encryption: Bootstrapping in less than 0.1 seconds,” in
international conference on the theory and application of cryptology and
information security. Springer, 2016, pp. 3–33.
[80] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio,
“Binarized neural networks: Training deep neural networks with weights
and activations constrained to +1 or -1,” 2016.
[81] Y. Aono, T. Hayashi, L. Wang, S. Moriai et al., “Privacy-preserving deep
learning via additively homomorphic encryption,IEEE Transactions on
Information Forensics and Security, vol. 13, no. 5, pp. 1333–1345, 2017.
[82] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in
Proceedings of the 22nd ACM SIGSAC conference on computer and
communications security, 2015, pp. 1310–1321.
[83] P. Mohassel and Y. Zhang, “Secureml: A system for scalable privacy-
preserving machine learning,” in 2017 IEEE Symposium on Security and
Privacy (SP). IEEE, 2017, pp. 19–38.
[84] J. Liu, M. Juuti, Y. Lu, and N. Asokan, “Oblivious neural network
predictions via minionn transformations,” in Proceedings of the 2017
ACM SIGSAC Conference on Computer and Communications Security,
2017, pp. 619–631.
[85] B. D. Rouhani, M. S. Riazi, and F. Koushanfar, “Deepsecure: Scalable
provably-secure deep learning,” in Proceedings of the 55th Annual De-
sign Automation Conference, 2018, pp. 1–6.
[86] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, “GAZELLE:
A low latency framework for secure neural network inference,” in 27th
USENIX Security Symposium (USENIX Security 18), 2018, pp.
1651–1669.
[87] S. Wagh, D. Gupta, and N. Chandran, “Securenn: Efficient and private
neural network training,” IACR Cryptol. ePrint Arch., vol. 2018, p. 442,
2018.
[88] X. Liu, R. H. Deng, P. Wu, and Y. Yang, “Lightning-fast and privacy-
preserving outsourced computation in the cloud,” Cybersecurity, vol. 3,
no. 1, pp. 1–21, 2020.
[89] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin,
K. Vaswani, and M. Costa, “Oblivious multi-party machine learning on
trusted processors,” in 25th USENIX Security Symposium (USENIX
Security 16), 2016, pp. 619–636.
[90] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi,
V. Shanbhogue, and U. R. Savagaonkar, “Innovative instructions and
software model for isolated execution.Hasp@ isca, vol. 10, no. 1, 2013.
[91] T. Hunt, C. Song, R. Shokri, V. Shmatikov, and E. Witchel, “Chiron:
Privacy-preserving machine learning as a service,” 2018.
[92] T. Hunt, Z. Zhu, Y. Xu, S. Peter, and E. Witchel, “Ryoan: A distributed
sandbox for untrusted computation on secret data,” ACM Transactions on
Computer Systems (TOCS), vol. 35, no. 4, pp. 1–32, 2018.
[93] N. Hynes, R. Cheng, and D. Song, “Efficient deep learning on multi-
source private data,” 2018.
[94] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, H. Shen,
M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and
A. Krishnamurthy, “TVM: An automated end-to-end optimizing
compiler for deep learning,” in 13th USENIX Symposium on Operating
Systems Design and Implementation (OSDI 18). Carlsbad, CA:
USENIX Association, oct 2018, pp. 578–594. [Online]. Available:
https://www.usenix.org/conference/osdi18/presentation/chen
[95] Z. Gu, H. Huang, J. Zhang, D. Su, A. Lamba, D. Pendarakis, and
I. Molloy, “Securing input data of deep learning inference systems via
partitioned enclave execution,ArXiv, vol. abs/1807.00969, 2018.
[96] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolu-
tional networks,” in European conference on computer vision. Springer,
2014, pp. 818–833.
[97] F. Tramer and D. Boneh, “Slalom: Fast, verifiable and private
execution of neural networks in trusted hardware,” in International
Conference on Learning Representations, 2019. [Online]. Available:
https://openreview.net/forum?id=rJVorjCcKQ
[98] R. Freivalds, “Probabilistic machines can use less running time.” in IFIP
congress, vol. 839, 1977, p. 842.
[99] L. Hanzlik, Y. Zhang, K. Grosse, A. Salem, M. Augustin, M. Backes, and
M. Fritz, “Mlcapsule: Guarded offline deployment of machine learning
as a service,” 2018.
[100] T. Lee, B. Edwards, I. Molloy, and D. Su, “Defending against machine
learning model stealing attacks using deceptive perturbations,” 2018.
[101] M. Juuti, S. Szyller, S. Marchal, and N. Asokan, “Prada: protecting
against dnn model stealing attacks,” in 2019 IEEE European Symposium
on Security and Privacy (EuroS&P). IEEE, 2019, pp. 512–527.
[102] T. Zhang and Q. Zhu, “A dual perturbation approach for differential
private admm-based distributed empirical risk minimization,” in Proceed-
ings of the 2016 ACM Workshop on Artificial Intelligence and Security,
2016, pp. 129–137.
[103] J. Hamm, Y. Cao, and M. Belkin, “Learning privately from multiparty
data,” in International Conference on Machine Learning, 2016, pp. 555–
563.
[104] Y. Long, V. Bindschaedler, and C. A. Gunter, “Towards measuring
membership privacy,” 2017.
[105] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song, “The secret sharer:
Evaluating and testing unintended memorization in neural networks,” in
28th USENIX Security Symposium (USENIX Security 19), 2019, pp.
267–284.
[106] M. A. Rahman, T. Rahman, R. Laganière, N. Mohammed, and Y. Wang,
“Membership inference attack against differentially private deep learning
model.” Transactions on Data Privacy, vol. 11, no. 1, pp. 61–79, 2018.
[107] B. Jayaraman and D. Evans, “Evaluating differentially private ma-
chine learning in practice,” in 28th USENIX Security Symposium
(USENIX Security 19), 2019, pp. 1895–1912.
[108] J. Jia, A. Salem, M. Backes, Y. Zhang, and N. Z. Gong, “Memguard:
Defending against black-box membership inference attacks via adversar-
ial examples,” in Proceedings of the 2019 ACM SIGSAC Conference on
Computer and Communications Security, 2019, pp. 259–274.
[109] J. Zhang, Z. Gu, J. Jang, H. Wu, M. P. Stoecklin, H. Huang, and I. Molloy,
“Protecting intellectual property of deep neural networks with water-
marking,” in Proceedings of the 2018 on Asia Conference on Computer
and Communications Security, 2018, pp. 159–172.
[110] Y. Adi, C. Baum, M. Cisse, B. Pinkas, and J. Keshet, “Turning your
weakness into a strength: Watermarking deep neural networks by back-
dooring,” in 27th USENIX Security Symposium (USENIX Security
18), 2018, pp. 1615–1631.
[111] E. Le Merrer, P. Perez, and G. Trédan, “Adversarial frontier stitching for
remote neural network watermarking,” Neural Computing and Applica-
tions, vol. 32, no. 13, pp. 9233–9244, 2020.
[112] B. D. Rouhani, H. Chen, and F. Koushanfar, “Deepsigns: A generic
watermarking framework for ip protection of deep learning models,
2018.
[113] R. Fletcher, Practical methods of optimization. John Wiley & Sons,
2013.
[114] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning
at scale,” 2016.
[115] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards
deep learning models resistant to adversarial attacks,” 2017.
[116] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and
A. Swami, “The limitations of deep learning in adversarial settings,” in
2016 IEEE European symposium on security and privacy (EuroS&P).
IEEE, 2016, pp. 372–387.
[117] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as
a defense to adversarial perturbations against deep neural networks,” in
2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, pp.
582–597.
[118] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural
networks,” in 2017 ieee symposium on security and privacy (sp). IEEE,
2017, pp. 39–57.
[119] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple
and accurate method to fool deep neural networks,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp.
2574–2582.
[120] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal
adversarial perturbations,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2017, pp. 1765–1773.
[121] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple
and accurate method to fool deep neural networks,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp.
2574–2582.
[122] C. Guo, M. Rana, M. Cisse, and L. Van Der Maaten, “Counter-
ing adversarial images using input transformations,” arXiv preprint
arXiv:1711.00117, 2017.
[123] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarial
effects through randomization,” arXiv preprint arXiv:1711.01991, 2017.
[124] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-gan: Protecting
classifiers against adversarial attacks using generative models,” 2018.
[125] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give
a false sense of security: Circumventing defenses to adversarial
examples,” in Proceedings of the 35th International Conference
on Machine Learning, ICML 2018, Jul. 2018. [Online]. Available:
https://arxiv.org/abs/1802.00420
[126] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing robust
adversarial examples,” 2017.
[127] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for
fooling deep neural networks,” IEEE Transactions on Evolutionary
Computation, vol. 23, no. 5, p. 828–841, Oct 2019. [Online]. Available:
http://dx.doi.org/10.1109/TEVC.2019.2890858
[128] R. Storn and K. Price, “Differential evolution–a simple and efficient
heuristic for global optimization over continuous spaces,Journal of
global optimization, vol. 11, no. 4, pp. 341–359, 1997.
[129] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer, “Self-
adapting control parameters in differential evolution: A comparative
study on numerical benchmark problems,” IEEE transactions on evolu-
tionary computation, vol. 10, no. 6, pp. 646–657, 2006.
[130] S. Das and P. N. Suganthan, “Differential evolution: A survey of the
state-of-the-art,” IEEE transactions on evolutionary computation, vol. 15,
no. 1, pp. 4–31, 2010.
[131] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, “Countering adver-
sarial images using input transformations,” 2017.
[132] D. D. Thang and T. Matsui, “Image transformation can make neural
networks more robust against adversarial examples,arXiv preprint
arXiv:1901.03037, 2019.
[133] Y. Luo, X. Boix, G. Roig, T. A. Poggio, and Q. Zhao, “Foveation-based
mechanisms alleviate adversarial examples,CoRR, vol. abs/1511.06292,
2015. [Online]. Available: http://arxiv.org/abs/1511.06292
[134] J. Lu, H. Sibai, E. Fabry, and D. Forsyth, “No need to worry about
adversarial examples in object detection in autonomous vehicles,arXiv
preprint arXiv:1707.03501, 2017.
[135] K. McLaren, “Xiii—the development of the cie 1976 (l* a* b*) uniform
colour space and colour-difference formula,Journal of the Society of
Dyers and Colourists, vol. 92, no. 9, pp. 338–341, 1976.
[136] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, “Zoo: Zeroth
order optimization based black-box attacks to deep neural networks
without training substitute models,” in Proceedings of the 10th ACM
Workshop on Artificial Intelligence and Security, 2017, pp. 15–26.
[137] P. D. Lax and M. S. Terrell, Calculus with applications. Springer, 2014.
[138] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,
2014.
[139] C.-C. Tu, P. Ting, P.-Y. Chen, S. Liu, H. Zhang, J. Yi, C.-J. Hsieh, and
S.-M. Cheng, “Autozoom: Autoencoder-based zeroth order optimization
method for attacking black-box neural networks,” in Proceedings of the
AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 742–749.
[140] W. Brendel, J. Rauber, and M. Bethge, “Decision-based adversarial
attacks: Reliable attacks against black-box machine learning models,”
2017.
[141] T. Brunner, F. Diehl, M. T. Le, and A. Knoll, “Guessing smart: Biased
sampling for efficient black-box adversarial attacks,” in Proceedings of
the IEEE International Conference on Computer Vision, 2019, pp. 4958–
4966.
[142] K. Perlin, “An image synthesizer,” ACM Siggraph Computer Graphics,
vol. 19, no. 3, pp. 287–296, 1985.
[143] J. Chen, M. I. Jordan, and M. J. Wainwright, “Hopskipjumpattack:
A query-efficient decision-based attack,” in 2020 IEEE Symposium
on Security and Privacy (SP). Los Alamitos, CA, USA: IEEE
Computer Society, may 2020, pp. 1277–1294. [Online]. Available:
https://doi.ieeecomputersociety.org/10.1109/SP40000.2020.00045
[144] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and
A. Swami, “The limitations of deep learning in adversarial settings,” in
2016 IEEE European symposium on security and privacy (EuroS&P).
IEEE, 2016, pp. 372–387.
[145] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversar-
ial examples and black-box attacks,” arXiv preprint arXiv:1611.02770,
2016.
[146] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking
the inception architecture for computer vision,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp.
2818–2826.
[147] J. Lu, H. Sibai, E. Fabry, and D. Forsyth, “Standard detectors aren’t
(currently) fooled by physical adversarial stop signs,” arXiv preprint
arXiv:1710.03337, 2017.
[148] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” in
Proceedings of the IEEE conference on computer vision and pattern
recognition, 2017, pp. 7263–7271.
[149] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time
object detection with region proposal networks,” in Advances in neural
information processing systems, 2015, pp. 91–99.
[150] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, D. Song, T. Kohno, A. Rah-
mati, A. Prakash, and F. Tramer, “Note on attacking object detectors with
adversarial stickers,” arXiv preprint arXiv:1712.08062, 2017.
[151] S.-T. Chen, C. Cornelius, J. Martin, and D. H. P. Chau, “Shapeshifter: Ro-
bust physical adversarial attack on faster r-cnn object detector,” in Joint
European Conference on Machine Learning and Knowledge Discovery
in Databases. Springer, 2018, pp. 52–68.
[152] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer,
“Adversarial patch,CoRR, vol. abs/1712.09665, 2017. [Online].
Available: http://arxiv.org/abs/1712.09665
[153] S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillance
cameras: adversarial patches to attack person detection,” in Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops, 2019, pp. 0–0.
[154] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and
A. Swami, “Practical black-box attacks against machine learning,” in
Proceedings of the 2017 ACM on Asia conference on computer and
communications security, 2017, pp. 506–519.
[155] X. Li, S. Ji, M. Han, J. Ji, Z. Ren, Y. Liu, and C. Wu, “Adversarial
examples versus cloud-based detectors: A black-box empirical study,
IEEE Transactions on Dependable and Secure Computing, 2019.
[156] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling
deep neural networks,” IEEE Transactions on Evolutionary Computation,
vol. 23, no. 5, pp. 828–841, 2019.
[157] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to
a crime: Real and stealthy attacks on state-of-the-art face recognition,”
in Proceedings of the 2016 acm sigsac conference on computer and
communications security, 2016, pp. 1528–1540.
[158] Z. Zhou, D. Tang, X. Wang, W. Han, X. Liu, and K. Zhang, “Invisible
mask: Practical attacks on face recognition with infrared,” arXiv preprint
arXiv:1803.04683, 2018.
[159] Q. Wang, W. Guo, K. Zhang, A. G. Ororbia II, X. Xing, X. Liu, and C. L.
Giles, “Adversary resistant deep neural networks with an application to
malware detection,” in Proceedings of the 23rd ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining. ACM,
2017, pp. 1145–1153.
[160] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer, “Deflecting
adversarial attacks with pixel deflection,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2018, pp. 8571–
8580.
[161] J. Ho and D.-K. Kang, “Pixel redrawn for a robust adversarial defense,
2018.
[162] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset
for benchmarking machine learning algorithms,” 2017.
[163] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features
from tiny images,” 2009.
[164] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy, “A study of the effect
of jpg compression on adversarial images,” 2016.
[165] N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, L. Chen, M. E.
Kounavis, and D. H. Chau, “Keeping the bad guys out: Protecting and
vaccinating deep learning with jpeg compression,” 2017.
[166] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based
noise removal algorithms,” Phys. D, vol. 60, no. 1–4, pp. 259–268, Nov.
1992. [Online]. Available: https://doi.org/10.1016/0167-2789(92)90242-F
[167] A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis
and transfer,” in Proceedings of the 28th annual conference on Computer
graphics and interactive techniques, 2001, pp. 341–346.
[168] W. Xu, D. Evans, and Y. Qi, “Feature squeezing mitigates and detects
carlini/wagner adversarial examples,” 2017.
[169] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image
denoising,” in 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR’05), vol. 2. IEEE, 2005, pp.
60–65.
[170] E. Raff, J. Sylvester, S. Forsyth, and M. McLean, “Barrage of random
transforms for adversarially robust defense,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2019, pp.
6528–6537.
[171] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding
using wavelet transform,IEEE Transactions on image processing, vol. 1,
no. 2, pp. 205–220, 1992.
[172] T. Huang, G. Yang, and G. Tang, “A fast two-dimensional median
filtering algorithm,” IEEE Transactions on Acoustics, Speech, and Signal
Processing, vol. 27, no. 1, pp. 13–18, 1979.
[173] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet:
A large-scale hierarchical image database,” in 2009 IEEE conference on
computer vision and pattern recognition. Ieee, 2009, pp. 248–255.
[174] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting
and composing robust features with denoising autoencoders,” in Proceed-
ings of the 25th international conference on Machine learning, 2008, pp.
1096–1103.
[175] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu, “Defense against
adversarial attacks using high-level representation guided denoiser,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2018, pp. 1778–1787.
[176] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks
for biomedical image segmentation,” in International Conference on
Medical image computing and computer-assisted intervention. Springer,
2015, pp. 234–241.
[177] A. Athalye and N. Carlini, “On the robustness of the cvpr 2018 white-box
adversarial example defenses,” 2018.
[178] V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Ar-
jovsky, and A. Courville, “Adversarially learned inference,” 2016.
[179] J. Donahue, P. Krähenbühl, and T. Darrell, “Adversarial feature learning,”
2016.
[180] R. Bao, S. Liang, and Q. Wang, “Featurized bidirectional gan: Adversar-
ial defense via adversarially learned semantic inference,” 2018.
[181] A. Mustafa, S. H. Khan, M. Hayat, J. Shen, and L. Shao, “Image super-
resolution as a defense against adversarial attacks,” IEEE Transactions
on Image Processing, vol. 29, pp. 1711–1724, 2019.
[182] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding
for image denoising and compression,” IEEE transactions on image
processing, vol. 9, no. 9, pp. 1532–1546, 2000.
[183] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual
networks for single image super-resolution,” in Proceedings of the IEEE
conference on computer vision and pattern recognition workshops, 2017,
pp. 136–144.
[184] T.-J. Chang, Y. He, and P. Li, “Efficient two-step adversarial defense for
deep neural networks,” 2018.
[185] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári, “Learning
with a strong adversary,CoRR, vol. abs/1511.03034, 2015. [Online].
Available: http://arxiv.org/abs/1511.03034
[186] U. Shaham, Y. Yamada, and S. Negahban, “Understanding adversarial
training: Increasing local stability of supervised models through robust
optimization,” Neurocomputing, vol. 307, pp. 195–204, 2018.
[187] H. Kannan, A. Kurakin, and I. Goodfellow, “Adversarial logit pairing,
2018.
[188] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and
P. McDaniel, “Ensemble adversarial training: Attacks and defenses,”
2017.
[189] C. Liu and J. JaJa, “Feature prioritization and regularization improve
standard accuracy and adversarial robustness,” 2018.
[190] C. Xie, Y. Wu, L. v. d. Maaten, A. L. Yuille, and K. He, “Feature
denoising for improving adversarial robustness,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2019,
pp. 501–509.
[191] F. Heide, W. Heidrich, and G. Wetzstein, “Fast and flexible convolutional
sparse coding,” in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2015, pp. 5135–5143.
[192] B. Choudhury, R. Swanson, F. Heide, G. Wetzstein, and W. Heidrich,
“Consensus convolutional sparse coding,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, pp. 4280–4288.
[193] B. Sun, N.-h. Tsai, F. Liu, R. Yu, and H. Su, “Adversarial defense
by stratified convolutional sparse coding,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2019, pp.
11447–11456.
[194] H. Hosseini, Y. Chen, S. Kannan, B. Zhang, and R. Poovendran, “Block-
ing transferability of adversarial examples in black-box learning sys-
tems,” 2017.
[195] S. Chen, N. Carlini, and D. Wagner, “Stateful detection of black-box
adversarial attacks,” 2019.
[196] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin, “Black-box adversarial
attacks with limited queries and information,” 2018.
[197] S. Tian, G. Yang, and Y. Cai, “Detecting adversarial examples through
image transformation,” in Thirty-Second AAAI Conference on Artificial
Intelligence, 2018.
[198] B. Liang, H. Li, M. Su, X. Li, W. Shi, and X. Wang,
“Detecting adversarial image examples in deep neural networks
with adaptive noise reduction,IEEE Transactions on Dependable
and Secure Computing, p. 1–1, 2019. [Online]. Available:
http://dx.doi.org/10.1109/TDSC.2018.2874243
[199] A. Rozsa, M. Gunther, and T. E. Boult, “Towards robust deep neural
networks with bang,” 2016.
[200] Z. Zheng and P. Hong, “Robust detection of adversarial attacks by
modeling the intrinsic properties of deep neural networks,” in Advances
in Neural Information Processing Systems, 2018, pp. 7913–7922.
[201] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongras-
samee, E. C. Lupu, and F. Roli, “Towards poisoning of deep learning
algorithms with back-gradient optimization,” in Proceedings of the 10th
ACM Workshop on Artificial Intelligence and Security, 2017, pp. 27–38.
[202] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,
in Advances in neural information processing systems, 2014, pp. 2672–
2680.
[203] C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poisoning attack method
against neural networks,” 2017.
[204] P. W. Koh and P. Liang, “Understanding black-box predictions via
influence functions,” in Proceedings of the 34th International Conference
on Machine Learning-Volume 70. JMLR. org, 2017, pp. 1885–1894.
[205] R. D. Cook and S. Weisberg, “Characterizations of an empirical influence
function for detecting influential cases in regression,” Technometrics,
vol. 22, no. 4, pp. 495–508, 1980.
[206] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras,
and T. Goldstein, “Poison frogs! targeted clean-label poisoning attacks on
neural networks,” in Advances in NeuralInformation Processing Systems,
2018, pp. 6103–6113.
[207] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks
on deep learning systems using data poisoning,” 2017.
[208] S. Li, M. Xue, B. Z. H. Zhao, H. Zhu, and X. Zhang, “Invisible backdoor
attacks on deep neural networks via steganography and regularization,
2019.
[209] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for
data poisoning attacks,” in Advances in neural information processing
systems, 2017, pp. 3517–3529.
[210] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support
vector machines,” 2012.
[211] S. Mei and X. Zhu, “Using machine teaching to identify optimal training-
set attacks on machine learners,” in Twenty-Ninth AAAI Conference on
Artificial Intelligence, 2015.
[212] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli, “Is
feature selection secure against training data poisoning?” in International
Conference on Machine Learning, 2015, pp. 1689–1698.
[213] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, “Detection
of adversarial training examples in poisoning attacks through anomaly
detection,” 2018.
[214] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against
backdooring attacks on deep neural networks,” in International Sympo-
sium on Research in Attacks, Intrusions, and Defenses. Springer, 2018,
pp. 273–294.
[215] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee,
I. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural
networks by activation clustering,” 2018.
[216] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao,
A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks on
deep learning models,” 2017.
[217] X. Wei, S. Liang, N. Chen, and X. Cao, “Transferable adversarial attacks
for image and video object detection,” 2018.
[218] J. Hendrik Metzen, M. Chaithanya Kumar, T. Brox, and V. Fischer, “Uni-
versal adversarial perturbations against semantic image segmentation,” in
Proceedings of the IEEE International Conference on Computer Vision,
2017, pp. 2755–2764.
[219] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, “Adversarial
examples for semantic segmentation and object detection,” in Proceed-
ings of the IEEE International Conference on Computer Vision, 2017,
pp. 1369–1378.
[220] E. Quiring, A. Maier, and K. Rieck, “Misleading authorship attribution
of source code using adversarial learning,” in 28th {USENIX}Security
Symposium ({USENIX}Security 19), 2019, pp. 479–496.
[221] M. Cheng, T. Le, P.-Y. Chen, J. Yi, H. Zhang, and C.-J. Hsieh, “Query-
efficient hard-label black-box attack:an optimization-based approach,”
2018.
[222] F. Tramer, N. Carlini, W. Brendel, and A. Madry, “On adaptive attacks to
adversarial example defenses,arXiv preprint arXiv:2002.08347, 2020.
[223] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry,
“Adversarial examples are not bugs, they are features,” in Advances in
Neural Information Processing Systems, 2019, pp. 125–136.
XIMENG LIU (S’13-M’16) received the B.Sc. degree in electronic engineering from Xidian University, Xi’an, China, in 2010, and the Ph.D. degree in cryptography from Xidian University in 2015. He is currently a full professor with the College of Mathematics and Computer Science, Fuzhou University, China, and a research fellow with the School of Information Systems, Singapore Management University, Singapore. He has published over 100 research articles in venues including IEEE TIFS, IEEE TDSC, IEEE TC, IEEE TII, IEEE TSC, and IEEE TCC. His research interests include cloud security, applied cryptography, and big data security.
LEHUI XIE received the B.Sc. degree from the College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China, in 2019. He is currently pursuing the master’s degree at Fuzhou University. His current research interests include privacy and security in machine learning.
YAOPENG WANG received the B.Sc. degree from the College of Mathematics and Computer Science, Yantai University, Yantai, China, in 2018. He is currently pursuing the master’s degree at Fuzhou University. His current research interests include privacy and security in machine learning.
JIAN ZOU received the B.S. degree in mathematics and applied mathematics from Central China Normal University in 2009, and the Ph.D. degree in information security from the Institute of Software, Chinese Academy of Sciences, Beijing, China, in 2015. He is currently a lecturer at Fuzhou University. His main research interests include block ciphers, hash functions, and post-quantum cryptography.
JINBO XIONG received the M.S. degree in com-
munication and information systems from the
Chongqing University of Posts and Telecommu-
nications, China, in 2006 and the Ph.D. degree
in computer system architecture from Xidian Uni-
versity, China, in 2013. He is currently a Visiting Scholar with the University of North Texas, USA, and
an Associate Professor with the Fujian Provincial
Key Laboratory of Network Security and Cryp-
tology and the College of Mathematics and Infor-
matics, Fujian Normal University. His research interests include cloud data
security, privacy protection, and mobile Internet security. He has published
more than 40 publications and one monograph and holds eight patents in
these fields. He is a member of IEEE and ACM.
ZUOBIN YING was born in Anhui Province, China, in 1982. He received the Ph.D. degree in cryptography from Xidian University, Xi’an, China, in 2016. He is currently a lecturer with the School of Computer Science and Technology, Anhui University, China. His research interests include cloud security and applied cryptography.
ATHANASIOS V. VASILAKOS is with the University of Technology Sydney, Australia; Fuzhou University, Fuzhou, China; and the Lulea University of Technology, Lulea 97187, Sweden. He has served or is serving as an Editor for many technical journals, including the IEEE TNSM, IEEE TCC, IEEE TIFS, IEEE TC, IEEE TNB, IEEE TITB, ACM TAAS, and the IEEE Journal on Selected Areas in Communications.