The security of machine learning.
University of California
Peer Reviewed
Title:
The security of machine learning
Author:
Barreno, Marco; Nelson, Blaine; Joseph, Anthony D.; Tygar, J. D.
Publication Date:
2010
Publication Info:
Postprints, Multi-Campus
Permalink:
http://escholarship.org/uc/item/2p33k2mj
DOI:
10.1007/s10994-010-5188-5
Abstract:
Machine learning’s ability to rapidly evolve to changing and complex situations has helped it
become a fundamental tool for computer security. That adaptability is also a vulnerability: attackers
can exploit machine learning systems. We present a taxonomy identifying and analyzing attacks
against machine learning systems. We show how these classes influence the costs for the attacker
and defender, and we give a formal structure defining their interaction. We use our framework to
survey and analyze the literature of attacks against machine learning systems. We also illustrate
our taxonomy by showing how it can guide attacks against SpamBayes, a popular statistical spam
filter. Finally, we discuss how our taxonomy suggests new lines of defenses.
Mach Learn (2010) 81: 121–148
DOI 10.1007/s10994-010-5188-5
The security of machine learning
Marco Barreno · Blaine Nelson · Anthony D. Joseph · J.D. Tygar
Received: 8 April 2008 / Revised: 15 April 2010 / Accepted: 15 April 2010 / Published online: 20 May 2010
© The Author(s) 2010. This article is published with open access at Springerlink.com
Keywords Security · Adversarial learning · Adversarial environments
1 Introduction
If we hope to use machine learning as a general tool for computer applications, it is incum-
bent on us to investigate how well machine learning performs under adversarial conditions.
When a learning algorithm performs well in adversarial conditions, we say it is an algo-
rithm for secure learning. This raises the natural question: how do we evaluate the quality
of a learning system and determine whether it satisfies requirements for secure learning?
Editors: Pavel Laskov and Richard Lippmann.
M. Barreno (✉) · B. Nelson · A.D. Joseph · J.D. Tygar
Computer Science Division, University of California, Berkeley, CA 94720-1776, USA
e-mail: barreno@cs.berkeley.edu
B. Nelson
e-mail: nelsonb@cs.berkeley.edu
A.D. Joseph
e-mail: adj@cs.berkeley.edu
J.D. Tygar
e-mail: tygar@cs.berkeley.edu
Machine learning advocates have proposed learning-based systems for a variety of secu-
rity applications, including spam detection and network intrusion detection. Their vision is
that machine learning will allow a system to respond to evolving real-world inputs, both hos-
tile and benign, and learn to reject undesirable behavior. The danger is that an attacker will
attempt to exploit the adaptive aspect of a machine learning system to cause it to fail. Fail-
ure consists of causing the learning system to produce errors: if it misidentifies hostile input
as benign, hostile input is permitted through the security barrier; if it misidentifies benign
input as hostile, desired input is rejected. The adversarial opponent has a powerful weapon:
the ability to design training data that will cause the learning system to produce rules that
misidentify inputs. If users detect the failure, they may lose confidence in the system and
abandon it. If users do not detect the failure, then the risks can be even greater.
It is well established in computer security that evaluating a system involves a con-
tinual process of first, determining classes of attacks on the system; second, evaluating
the resilience of the system against those attacks; and third, strengthening the system
against those classes of attacks. Our paper follows exactly this model in evaluating secure
learning.
First, we identify different classes of attacks on machine learning systems (Sect. 2).
While many researchers have considered particular attacks on machine learning systems,
previous research has not presented a comprehensive view of attacks. In particular, we show
that there are at least three interesting dimensions to potential attacks against learning sys-
tems: (1) they may be Causative in that they alter the training process, or they may be
Exploratory and exploit existing weaknesses; (2) they may be attacks on Integrity aimed at
false negatives (allowing hostile input into a system) or they may be attacks on Availabil-
ity aimed at false positives (preventing benign input from entering a system); and (3) they
may be Targeted at a particular input or they may be Indiscriminate in which inputs fail.
Each of these dimensions operates independently, so we have at least eight distinct classes
of attacks on machine learning systems. We can view secure learning as a game between
an attacker and a defender; the taxonomy determines the structure of the game and cost
model.
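The count of eight classes follows from combining two choices on each of the three dimensions. The following minimal Python sketch enumerates them; the axis names `Influence`, `Violation`, and `Specificity` are labels chosen for this illustration, and the short descriptions paraphrase the text above.

```python
from enum import Enum
from itertools import product


class Influence(Enum):
    CAUSATIVE = "alters the training process"
    EXPLORATORY = "exploits existing weaknesses"


class Violation(Enum):
    INTEGRITY = "aims at false negatives"
    AVAILABILITY = "aims at false positives"


class Specificity(Enum):
    TARGETED = "aims at a particular input"
    INDISCRIMINATE = "does not care which inputs fail"


def attack_classes():
    """Return every (influence, violation, specificity) combination."""
    return list(product(Influence, Violation, Specificity))


# Two options on each of three independent axes give 2 * 2 * 2 combinations.
for influence, violation, specificity in attack_classes():
    print(influence.name, violation.name, specificity.name)
```

Because the dimensions are independent, the enumeration is a plain Cartesian product; any further axis added to the taxonomy would multiply the number of classes accordingly.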
Second, we consider how resilient existing systems are against these attacks (Sect. 3).
There has been a rich set of work in recent years on secure learning systems, and we evaluate
many attacks against machine learning systems and proposals for making systems secure
against attacks. Our analysis describes these attacks in terms of our taxonomy and secure
learning game, demonstrating that our framework captures the salient aspects of each attack.
Third, we investigate some potential defenses against these attacks (Sect. 4). Here the
work is more tentative, and it is clear that much remains to be done, but we discuss a variety
of techniques that show promise for defending against different types of attacks.
Finally, we illustrate our different classes of attacks by considering a contemporary ma-
chine learning application, the SpamBayes spam detection system (Sect. 5). We construct
realistic, effective attacks by considering different aspects of the threat model according to
our taxonomy, and we discuss a defense that mitigates some of the attacks.
This paper provides system designers with a framework for evaluating machine learning
systems for security applications (illustrated with our evaluation of SpamBayes) and sug-
gests directions for developing highly robust secure learning systems. Our research not only
proposes a common language for thinking and writing about secure learning, but goes be-
yond that to show how our framework works, both in algorithm design and in real system
evaluation. We present an essential first step if machine learning is to reach its potential as a
tool for use in real systems in potentially adversarial environments.
1.1 Notation
We focus on binary classification for security applications, in which a defender attempts
to separate instances of input (data points), some or all of which come from a malicious
attacker, into harmful and benign classes. This setting covers many interesting security ap-
plications, such as host and network intrusion detection, virus and worm detection, and spam
filtering. In detecting malicious activity, the positive class (label 1) indicates malicious intru-
sion instances while the negative class (label 0) indicates benign normal instances. A classi-
fication error is a false positive (FP) if a normal instance is classified as positive and a false
negative (FN) if an intrusion instance is classified as negative.
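As a small illustration of these definitions, the following Python sketch counts both error types for a list of true labels and classifier outputs. The function name `count_errors` and the sample labels are hypothetical and serve only to make the convention (1 = malicious, 0 = benign) concrete.

```python
def count_errors(true_labels, predicted_labels):
    """Count false positives and false negatives for binary labels.

    Label 1 marks a malicious (positive) instance, label 0 a benign one.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    false_positives = 0
    false_negatives = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 0 and predicted == 1:
            false_positives += 1  # benign instance flagged as malicious
        elif truth == 1 and predicted == 0:
            false_negatives += 1  # malicious instance let through
    return false_positives, false_negatives


# Hypothetical example data: six instances, three of them malicious.
truth = [0, 0, 1, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 0]
fp, fn = count_errors(truth, predicted)
print(fp, fn)  # prints "1 2"
```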
It may be interesting as well to consider cases where a classifier has more than two
classes, or even a real-valued output. Indeed, the spam filter SpamBayes, which we consider
in our experiments in Sect. 5, uses three labels so it can explicitly label some messages
unsure. However, generalizing the analysis of errors to more than two classes is not straight-
forward, and furthermore most systems in practice make a single fundamental distinction
(for example, spam messages that the user will never see vs. non-spam and unsure messages
that the user will see). For these reasons, and in keeping with common practice in the liter-
ature, we limit our analysis to binary classification and leave extension to the multi-class or
real-valued cases as future work.
In the supervised classification problem, the learner trains on a dataset of N instances,
$\mathbb{X} = \{(x, y) \mid x \in \mathcal{X},\ y \in \mathcal{Y}\}^N$, given an instance space $\mathcal{X}$ and the label space $\mathcal{Y} = \{0, 1\}$.
Given some hypothesis class Ω, the goal is to learn a classification hypothesis (classifier)
$f^* \in \Omega$ to minimize errors when predicting labels for new data, or if our model includes a
cost function over errors, to minimize the total cost of errors. The cost function assigns a nu-
meric cost to each combination of data instance, true label, and classifier label. The defender
chooses a procedure H, or learning algorithm, for selecting hypotheses. The classifier may
periodically interleave training steps with the evaluation, retraining on some or all of the
accumulated old and new data. In adversarial environments, the attacker controls some of
the data, which may be used for training. We assume that the learner has some way to get the
true labels for its training data and for the purpose of computing cost; the true label might
come from manual classification of a training set or from observing the effect of instances
on a test system, for example.
The procedure can be any method of selecting a hypothesis; in statistical machine learn-
ing, a common procedure is (regularized) empirical risk minimization. This procedure is an
optimization problem where the objective function has an empirical risk term and a
regularization term. Since true cost is often not representable precisely and efficiently, we calculate
risk as the expected loss given by a loss function $\ell$ that approximates true cost; the
regularization term $\rho$ captures some notion of hypothesis complexity to prevent overfitting the
training data, using a weight $\lambda$ to quantify the trade-off. This procedure finds the hypothesis
minimizing:

$$f^* = \operatorname*{argmin}_{f \in \Omega} \sum_{(x,y) \in \mathbb{X}} \ell(y, f(x)) + \lambda \rho(f) \qquad (1)$$

Many learning methods make a stationarity assumption: training data and evaluation
data are drawn from the same distribution. This assumption allows us to minimize the risk
on the training set as a surrogate for risk on the evaluation data, since evaluation data are
not known at training time. However, real-world sources of data often are not stationary
and, even worse, attackers can easily break the stationarity assumption with some control of
either training or evaluation instances. Analyzing and strengthening learning methods in the
face of a broken stationarity assumption is the crux of the secure learning problem.

Table 1 Notation in this paper

Symbol                                                    Meaning
$\mathcal{X}$                                             Space of data instances
$\mathcal{Y}$                                             Space of data labels; for classification $\mathcal{Y} = \{0, 1\}$
$\mathcal{D}$                                             Space of distributions over $(\mathcal{X} \times \mathcal{Y})$
$\Omega$                                                  Space of hypotheses $f : \mathcal{X} \to \mathcal{Y}$
$P_T \in \mathcal{D}$                                     Training distribution
$P_E \in \mathcal{D}$                                     Evaluation distribution
$P \in \mathcal{D}$                                       Distribution for training and evaluation (Sect. 4.2.3)
$x \in \mathcal{X}$                                       Data instance
$y \in \mathcal{Y}$                                       Data label
$\mathbb{X}, \mathbb{E}, \mathbb{Z}, \mathbb{C}, \mathbb{T}_i, \mathbb{Q}_i \in (\mathcal{X} \times \mathcal{Y})^N$  Datasets
$H : (\mathcal{X} \times \mathcal{Y})^N \to \Omega$       Procedure for selecting hypothesis
$A_T, A_E : \mathcal{X}^N \times \Omega \to \mathcal{D}$  Procedures for selecting distribution
$\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_{0+}$  Loss function
$C : \mathcal{X} \times \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$  Cost function
$f : \mathcal{X} \to \mathcal{Y}$                         Hypothesis (classifier)
$f^* : \mathcal{X} \to \mathcal{Y}$                       Best hypothesis
$N$                                                       Number of data points
$K$                                                       Number of repetitions of a game
$M$                                                       Number of experts (Sect. 4.2.3)
$\lambda$                                                 Trade-off parameter for regularized risk minimization
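Regularized empirical risk minimization and the effect of a broken stationarity assumption can be illustrated with a toy example. The Python sketch below is only one possible instance, under several assumptions that are not part of the paper: instances are single real numbers, the hypothesis class is a small finite set of threshold classifiers, the loss is 0-1 loss, and the complexity term is the absolute value of the threshold. All data values are hypothetical.

```python
def zero_one_loss(y, prediction):
    """Loss of 1 for a misclassification, 0 otherwise."""
    return 0.0 if y == prediction else 1.0


def classify(threshold, x):
    """Threshold hypothesis f(x): label 1 (malicious) when x >= threshold."""
    return 1 if x >= threshold else 0


def total_loss(dataset, threshold):
    """Empirical risk term: summed loss of the hypothesis over the dataset."""
    return sum(zero_one_loss(y, classify(threshold, x)) for x, y in dataset)


def regularized_erm(dataset, thresholds, lam):
    """Pick the threshold minimizing summed loss + lam * rho(f)."""
    def objective(t):
        # rho(f) = |t| is a toy complexity measure assumed for this sketch.
        return total_loss(dataset, t) + lam * abs(t)
    return min(thresholds, key=objective)


# Hypothetical training data: benign points near 1-2, malicious near 3-4.
training = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
# Hypothetical evaluation data in which malicious points have shifted
# toward the benign region, so the two distributions no longer match.
evaluation = [(1.0, 0), (2.0, 1), (2.2, 1), (4.0, 1)]

best = regularized_erm(training, [0.5, 1.5, 2.5, 3.5, 4.5], lam=0.1)
print(best, total_loss(training, best), total_loss(evaluation, best))
```

The selected threshold separates the training set without error, yet it misclassifies the shifted malicious points in the evaluation set. Low training risk is a useful surrogate only while training and evaluation data share a distribution.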
We model attacks on machine learning systems as a game between two players, the
attacker and the defender. The game consists of a series of moves, or steps. Each move
encapsulates a choice by one of the players: the attacker alters or selects data; the defender
chooses a training procedure for selecting the classification hypothesis.
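One round of such a game can be sketched in a few lines of Python. This is an informal toy, not the formal game definition: the attacker's move is assumed to be injecting extra labeled points into the training data, the defender's procedure is assumed to place a threshold midway between the class means, and all names and numbers are hypothetical.

```python
def attacker_move(clean_training_data, injected_points):
    """Attacker's move: add attacker-chosen points to the training data."""
    return list(clean_training_data) + list(injected_points)


def midpoint_procedure(training_data):
    """Defender's move: a toy procedure H that returns a threshold classifier.

    The threshold sits midway between the benign and malicious class means.
    """
    benign = [x for x, y in training_data if y == 0]
    malicious = [x for x, y in training_data if y == 1]
    benign_mean = sum(benign) / len(benign)
    malicious_mean = sum(malicious) / len(malicious)
    threshold = (benign_mean + malicious_mean) / 2
    return lambda x: 1 if x >= threshold else 0


def play_round(clean_training_data, injected_points, procedure):
    """Play one round: the attacker alters data, then the defender trains."""
    training_data = attacker_move(clean_training_data, injected_points)
    return procedure(training_data)


clean = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
# Extreme malicious points drag the malicious class mean upward.
injected = [(30.0, 1), (31.0, 1)]

honest_classifier = play_round(clean, [], midpoint_procedure)
attacked_classifier = play_round(clean, injected, midpoint_procedure)
print(honest_classifier(9.0), attacked_classifier(9.0))
```

In this toy round the injected points raise the learned threshold, so an instance that the untampered classifier flags as malicious is labeled benign after the attack. A more robust choice of procedure is the defender's lever in the same game.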
Table 1 summarizes the notation we use in this paper.
2 Framework
2.1 Security analysis
Properly analyzing the security of a system requires identifying security goals and a threat
model. Security is concerned with protecting assets from attackers. A security goal is a
requirement that, if violated, results in the partial or total compromise of an asset. A threat
model is a profile of attackers, describing motivation and capabilities. Here we analyze the
security goals and threat model for machine learning systems.
Classifiers are used to make distinctions that advance security goals. For example, a virus
detection system has the goal of reducing susceptibility to virus infection, either by detect-
ing the virus in transit prior to infection or by detecting an extant infection to expunge.
Another example is an intrusion detection system (IDS), which identifies compromised systems,
usually by detecting malicious traffic to and from the system or by detecting suspicious