Quantifying eavesdropping vulnerability in sensor networks.
Department of Computer & Information Science
Departmental Papers (CIS)
University of Pennsylvania
Quantifying Eavesdropping Vulnerability
in Sensor Networks
Madhukar Anand∗
Zachary G. Ives†
Insup Lee‡
∗University of Pennsylvania, anandm@cis.upenn.edu
†University of Pennsylvania, zives@cis.upenn.edu
‡University of Pennsylvania, lee@cis.upenn.edu
Postprint version. Copyright ACM, 2005. This is the author’s version of the work. It
is posted here by permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Proceedings of the 2nd International VLDB Workshop on
Data Management for Sensor Networks 2005 (DMSN 2005), pages 3-9.
Publisher URL: http://doi.acm.org/10.1145/1080885.1080887
This paper is posted at ScholarlyCommons.
http://repository.upenn.edu/cis_papers/176
Quantifying Eavesdropping Vulnerability in Sensor
Networks∗
Madhukar Anand
Department of Computer and
Information Science
University of Pennsylvania
anandm@cis.upenn.edu
Zachary Ives
Department of Computer and
Information Science
University of Pennsylvania
zives@cis.upenn.edu
Insup Lee
Department of Computer and
Information Science
University of Pennsylvania
lee@cis.upenn.edu
ABSTRACT
With respect to security, sensor networks have a number of considerations
that separate them from traditional distributed systems.
First, sensor devices are typically vulnerable to physical compromise.
Second, they have significant power and processing constraints.
Third, the most critical security issue is protecting the (statistically
derived) aggregate output of the system, even if individual
nodes may be compromised. We suggest that these considerations
merit a rethinking of traditional security techniques: rather than
depending on the resilience of cryptographic techniques, in this
paper we develop new techniques to tolerate compromised nodes
and to even mislead an adversary. We present our initial work on
probabilistically quantifying the security of sensor network protocols,
with respect to sensor data distributions and network topologies.
Beginning with a taxonomy of attacks based on an adversary’s
goals, we focus on how to evaluate the vulnerability of sensor
network protocols to eavesdropping. Different topologies and aggregation
functions provide different probabilistic guarantees about
system security, and make different trade-offs in power and accuracy.
Categories and Subject Descriptors: C.2.0 [Computer-
Communication Networks]: Security and Protection
General Terms: Security
Keywords: Wireless Sensor Networks, Eavesdropping, Data
Streams, Probability Distribution.
1. INTRODUCTION
As sensor network technology advances, security and privacy
concerns will increasingly move to the forefront. Many real-world
settings in which sensors might be deployed (e.g., security systems,
intelligent buildings, hospitals, automated warehouses) have signif-
icant need not only for privacy policies, but mechanisms for enforc-
ing data security and confidentiality.
∗This work was funded in part by NSF grants IIS-0477972
and CCR-0209024 and ARO grants DAAD19-01-1-0473 and
W911NF-05-1-0182.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
DMSN’05, August 29, 2005, Trondheim, Norway.
Copyright 2005 ACM 1-59593-206-2/05/0008 ...$5.00.
In the Aspenn (Abstraction-based Sensor Programming Environment
from Penn) project, we focus on developing the infrastructure
for such rich sensor applications, in which the sensing devices
and networks may be heterogeneous (including smart card readers,
video cameras, and mobile sensors) and the sensor network may in-
teract with external data sources on the Internet. A major emphasis
of our work lies in protecting application data from eavesdroppers
and hackers.
With respect to security, the sensor network domain has several
important characteristics that differentiate it from traditional dis-
tributed systems. First, sensor devices are frequently vulnerable to
physical compromise or local eavesdropping, as they are embed-
ded within an environment. Second, sensor devices have signifi-
cant power and processing constraints, which often prevent them
from running expensive encryption protocols, but which also limit
the amount of “damage” they can do to the overall sensor network
(e.g., by injecting spurious data or snooping on large volumes of
messages). Third, sensor network applications are generally con-
sensus or aggregation-based, meaning that compromising one or a
few nodes may not significantly affect the overall system.
To this point, security techniques have been adapted for the sen-
sor network domain by reducing the computation requirements of
cryptography (generally by pre-distributing keys [18] or reducing
the key size [2]) in order to operate under the limited processing
capabilities of sensor networks. However, cryptography is not the
only means of providing security in a sensor network application
— in fact, if an attacker has sufficient resources, cryptographic
schemes with small key sizes may provide little protection. More-
over, such techniques do not consider the system-wide effects if an
attacker compromises a few nodes.
We advocate a different approach, which takes advantage of the
fact that any real-world attacker is limited by the properties of
the system he or she is attempting to compromise. In this paper
we present an initial framework, taxonomy, and methodology for
quantifying the privacy and security of sensor network applications,
under the assumption that some nodes may be compromised, and
based on the networks’ size, protocols, and computations. Rather
than providing all-or-nothing guarantees about privacy or security,
our goal is to examine probabilistic guarantees with respect to
compromise, and to understand and improve existing aggregation
strategies with respect to these guarantees. Our focus in this pa-
per is on the problem of eavesdropping, although we are currently
generalizing to other types of attacks. Specifically, we make the
following contributions:
• We propose a taxonomy of attack models for sensor net-
works, based on the goals of the attacker.
• We propose what we believe to be the first quantitative ap-
proach to assessing system-level confidentiality and security,
under the possibility that some nodes are compromised.
• We show how our methods can be used to choose between
different protocols and sampling strategies.
• We discuss how cryptographic and non-cryptographic tech-
niques can be used to improve the confidentiality of a sensor
network.
The remainder of the paper is organized as follows: in Section 2,
we introduce a taxonomy of attacks in sensor networks. In the sub-
sequent section, we develop a model for cost and accuracy in a
sensor network. Section 4 discusses how we model an attacker’s
ability to determine the output of a sensor network, and also her
cost. Next, we identify and assess potential means of combating
eavesdropping. We discuss related work in Section 6, and in Sec-
tion 7 we conclude by highlighting avenues for future work.
2. TAXONOMY OF ATTACKER MODELS
By compromising nodes, eavesdropping, or spoofing, an adver-
sary may attempt to violate the security of a sensor network appli-
cation. In order to evaluate a sensor application’s security charac-
teristics, we must first understand the potential goals of the adver-
sary’s attack. We define a taxonomy of attack models for sensor
networks, based on the goals of the adversary.
1. Eavesdropping. Here, the adversary (eavesdropper) aims to
determine the aggregate data that is being output by the sen-
sor network: it is attempting to see what the system is ob-
serving, e.g., to predict how the owner of the sensor network
will react. The adversary either listens to messages transmit-
ted by the nodes, or directly compromises those nodes. We
further distinguish between two types of eavesdropping:
(a) Passive: The eavesdropper conceals her presence from
the sensor nodes and uses only the broadcast medium
to eavesdrop on all messages.
(b) Active: The eavesdropper actively attempts to discern
information by sending queries to sensors or aggrega-
tion points, or by attacking sensor nodes.
2. Disruption. The intent of the adversary is to disrupt the sen-
sor application. This can be a combination of two types of
techniques:
(a) Semantic: The adversary injects messages, corrupts data,
or changes values in order to render the aggregated data
corrupt or useless.
(b) Physical: The adversary upsets sensor readings by directly
manipulating the environment. For example, generating
heat in the vicinity of sensors will result in erroneous
values being reported.
3. Hijacking. This variation on the disruption model is a case in
which the adversary attempts to direct the aggregated output
of the sensor application towards a value of her choosing.
If the adversary gains control of enough sensors, then this
attack is the hardest to counter.
Our focus. In this paper, which forms the first step towards ad-
dressing the attack models of our taxonomy, we focus strictly on
the case of eavesdropping. As stated above, we assume that the
adversary’s goal is to ascertain the aggregated values output by the
network: while subtly different from the alternative definition —
attempting to precisely ascertain information about the sensed en-
vironment — we believe this is a more likely motivation for at-
tacking a sensor network. In our definition, what we are trying to
protect is what the system sees, and thus the ability to predict how
the user of the system might react, as opposed to merely protecting
information about the environment. We note that our methods can
generalize to handling the latter case as well: the two definitions
will essentially coincide if we constrain our sensor network appli-
cation to return the most accurate information possible about the
environment.
In the next two sections, we first define our network model and
means for determining cost; then we discuss how we evaluate net-
works’ vulnerability to eavesdropping — first for height-two ag-
gregation trees, and then for trees of arbitrary depth.
3. SENSOR NETWORK MODEL
We begin by introducing our model of a sensor network. We first
examine how computation is performed, and then quantify
the quality (accuracy) of the network and its cost. These
factors, as well as the vulnerability of the network to eavesdropping
(next section), will form the basis of assessing sensor networks.
3.1 Streams and Aggregation
Data from sensors is typically continuous and time-varying, as
opposed to actually having discrete values; a formal stream model,
similar to that of [1], is appropriate to capture this aspect of data.
DEFINITION 1. (Sensor Stream) A Sensor Stream R is a possibly
infinite sequence of elements, $\{\langle id, d, \tau, \rho \rangle_n\}_{n \geq 1}$, where $id \in \mathbb{Z}^+$
is an identifier for the sensor, d is a sensor data structure, τ is
the timestamp and ρ is either ∅ or the location of the sensor. □
We reason about two orthogonal types of aggregation over streams:
in-stream aggregation, which occurs over a single stream, generally
over a time window, and multi-stream aggregation, which occurs
across the values of multiple streams, either at the same time or
over a time window.
In-stream aggregation can be thought of as aggregation over all
data from a single sensor within some time window. We can also
define aggregation over streams of data from different sensors within
the same time window. We refer to this form of aggregation as
multi-stream aggregation.
3.2 Hierarchical Aggregation
For purposes of formal analysis, we abstract away specific details
of sensing, communication and computation and view the network
from a pure data collection and aggregation perspective. The hier-
archical aggregation tree is a recursive structure in which, at each
level of the tree, groups of child nodes send their values to a parent
node that aggregates their values. The base station is the interme-
diate point at the highest level. Our model is consistent with most
proposed aggregation algorithms, e.g. [16, 23, 11].
Finally, we assume that the values observed at each sensor are
not identical, but can be characterized according to some proba-
bilistic data distribution. Data from a sensor network will typically
consist of a number of observed attributes; a probability density
function (pdf) can be used to assign a probability for each possible
assignment to the attributes. Such a model can be learned from data
collected over time, using algorithms such as those in [17]. Learn-
ing a model involves maintaining certain parameters, e.g., the mean
and the variance, and coping with noise, outliers, etc. A significant
literature exists on learning models of streams, (e.g., [3, 5]).
Many sensor applications include multiple, dynamic attributes,
and hence correlations and temporal aspects to the data distribution
must also be considered. In [8], the authors used Markovian models
to learn the time-varying effects of sensor readings. In their model,
given the values of all attributes at time t, it is assumed that the values
of the attributes at time t + 1 are independent of those for any time
earlier than t. This is generally sufficient to capture the dynamic
nature of the sensor data. The same authors have extended their
work in [7] to consider correlations between streams.
Our work assumes that such distributions are given (or can be
reasonably approximated). Based on knowledge of the data distribution,
we can provide specific probabilistic guarantees about sensing
and eavesdropping. Additional information, such as the spatial
distribution of sensors, is not assumed, although it can add to the
precision of the metrics we present.
We illustrate an example aggregation tree in Figure 1, where
nodes s0,...,s5 are in a hierarchical group. Each of s1,...,s5
performs aggregation of data in its sub-group and combines its own
data with the result before forwarding it to node s0. Node s0, in addition
to recording its own sensor data, is also the final aggregator for all
the data in the network.
[Figure 1: Sensor network model. Aggregation tree with root s0 and child nodes s1,...,s5.]
We consider the presence of a powerful adversary who has the
capability of listening to the messages in the sensor network, or of
compromising sensor nodes in an undetectable way, with a certain
probability. The higher a compromised node is in the aggregation
tree, the more power the attacker has.
Notation. We denote the set of all sensor data streams within a
group in the hierarchical network with the symbol S. Some subset,
$S_C \subseteq S$, of these data values will be used to compute the stream
aggregate σ (this quantity considers the possibility of dropped messages,
filtering, sampling, etc.). The adversary can eavesdrop on
some set of nodes $S_A \subseteq S$, which may overlap with but differ
from $S_C$.
EXAMPLE 1. Consider the situation depicted in Figure 1, where
the top-level group of an activity monitoring sensor network has
nodes s0,...,s5. Assume the sensors s1,...,s5 perform their local
aggregation tasks and output their values to node s0 once every 5
seconds. Also assume that, for all data streams, the
in-stream aggregation function $\sigma_1$ is the mean of the readings
obtained at each node $s_i$ over the past 4 sampling intervals.
Let the multi-stream aggregation ($\sigma_2$) be applied every 20 seconds,
as the mean of the values from s0,...,s5.
If readings from s0 are {4.82, 4.81, 4.82, 4.83}, then $\sigma_1(s_0) = 4.82$.
Similarly, if $\sigma_1(s_1) = 4.93$, $\sigma_1(s_2) = 5.17$, $\sigma_1(s_3) = 4.92$,
$\sigma_1(s_4) = 4.87$, $\sigma_1(s_5) = 5.04$ and we compute the mean over all
streams, then $\sigma_2(S) = 4.96$. □
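The two aggregation steps of Example 1 can be sketched in a few lines of Python. The function names `in_stream` and `multi_stream` are illustrative choices, not part of the paper; the readings are those given in the example.

```python
from statistics import mean

def in_stream(readings, window=4):
    """sigma_1: mean of the last `window` readings of a single stream."""
    return mean(readings[-window:])

def multi_stream(per_stream_values):
    """sigma_2: mean across the per-stream aggregates of one group."""
    return mean(per_stream_values)

# Readings of s0 over the past 4 sampling intervals (from Example 1).
s0_readings = [4.82, 4.81, 4.82, 4.83]

# In-stream aggregates; s1..s5 are taken directly from Example 1.
sigma1 = {
    "s0": in_stream(s0_readings),
    "s1": 4.93, "s2": 5.17, "s3": 4.92, "s4": 4.87, "s5": 5.04,
}
sigma2 = multi_stream(list(sigma1.values()))

print(round(sigma1["s0"], 2), round(sigma2, 2))  # 4.82 4.96
```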
3.3 Quality of the Sample
Given a model of the distribution of data readings in the environ-
ment, there are several possible metrics for estimating the quality
(accuracy) of the sample. We assume that the readings used to
produce a single aggregate stream element occur within some time
window [T,T + ∆]. The length of the window, ∆, is application-
specific, and it corresponds to the common notion of an epoch [16]
during which computations are performed, but it allows readings to
occur at any point within the window.
In statistics, goodness-of-fit is used to measure the distance between
the data and the hypothesis. For example, if the underlying
distribution is normal, then goodness-of-fit can be determined by
using the standard $\chi^2$ test. We adopt a statistic that works better
for small samples and is simple to compute, the Kolmogorov-Smirnov
test [12]. To compare a data sample consisting of N
events whose cumulative distribution is $S_N(x)$ with a hypothesis
function whose cumulative distribution is $\Phi(x)$, the value η is calculated
as $\eta = \max_x |S_N(x) - \Phi(x)|$. The Cramer-Smirnov-Von-Mises
test is often used to test that a one-dimensional data sample
is compatible with being a random sampling from a given distribution:
if the density function of the data is f(x), then the test
measures the goodness-of-fit by the measure $W^2$, which is given
by $W^2 = \int_{-\infty}^{\infty} [S_N(x) - F(x)]^2 f(x)\,dx$. There are many alternative tests,
depending on the distribution of data; for details we refer the reader
to a standard textbook on statistics (e.g., [12]).
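As an illustration of the Kolmogorov-Smirnov distance η, the following sketch computes $\max_x |S_N(x) - \Phi(x)|$ for a small sample against a normal hypothesis. The helper names are hypothetical, and reading the second parameter of N(5, 0.1) as a variance is an assumption made only for the usage lines.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """eta = max_x |S_N(x) - Phi(x)|, with S_N the empirical CDF of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    eta = 0.0
    for k, x in enumerate(xs):
        phi = cdf(x)
        # S_N steps from k/n to (k+1)/n at x, so check both sides of the step.
        eta = max(eta, abs((k + 1) / n - phi), abs(k / n - phi))
    return eta

# Per-stream aggregates from Example 1, tested against N(5, 0.1).
# Assumption: 0.1 is a variance, so the standard deviation is sqrt(0.1).
sample = [4.82, 4.93, 5.17, 4.92, 4.87, 5.04]
eta = ks_statistic(sample, lambda x: normal_cdf(x, 5.0, math.sqrt(0.1)))
print(eta)
```

A smaller η indicates a sample closer to the hypothesized distribution.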
EXAMPLE 2. If we assume that the data in Example 1 is distributed
N(5, 0.1) and use the $\chi^2$ test as the goodness-of-fit measure,
we have ∆ = 20 s and $\sum_{i=0}^{5} \frac{(\bar{s}_i - 5)^2}{0.1} = 0.911$
(taking the hypothesized mean 5 as the expected value), which implies
that we have a sample close to the actual model. □
3.4 Cost of Sensing
We can estimate the cost of producing a single output element in
the sensor network by considering the cost of acquiring and communicating
the sensor readings. Let the time window be $\mathcal{T} = [T, T + \Delta]$,
the cost of acquiring a reading at sensor node s be
$c_a(s)$, and the cost of transmitting a message from sensor s to the
aggregating point s0 be $c_t(s)$. Then the cost of acquiring the data
to be aggregated is $C_a(\mathcal{T}, S) = \sum_{s \in S} c_a(s)$. For each intermediate
node in the aggregation tree, the cost of transmitting sensor
data is $C_t(\mathcal{T}, S) = \sum_{s \in S} c_t(s)$, where $c_t(s_0) = 0$. (This is because
there is no transmission involved from s0 to itself.) Let the
reception cost for one reading at s0 be $c_r$. Then, the total cost of reception
is $C_r(\mathcal{T}, S \setminus S_0) = |S \setminus S_0| \cdot c_r$, where $S_0$ is the set of readings
obtained at s0. Thus, the total cost for acquiring and aggregating
the data is $C(\mathcal{T}, S) = C_a(\mathcal{T}, S) + C_t(\mathcal{T}, S) + C_r(\mathcal{T}, S \setminus S_0)$ for
any set S of nodes that share a single aggregation point s0.
EXAMPLE 3. Let us assume that the cost of sensing an attribute
is 0.015 J and that transmitting and receiving data each take 0.025 J
of energy for all the sensors in Example 1. In one epoch, the sensors
transmit $\frac{20}{5} = 4$ packets. Hence, $C(S) = 5 \times 4 \times (0.025 + 0.015) + 4 \times 0.015 + 5 \times 4 \times 0.025 = 1.36$ J. □
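The cost model can be checked with a short sketch. The function `total_cost` and its parameter names are illustrative; it assumes, as in Example 3, uniform per-reading costs and a single aggregation point s0 that acquires its own readings but transmits nothing.

```python
def total_cost(n_children, readings_per_epoch, c_acquire, c_transmit, c_receive):
    """C = C_a + C_t + C_r for one aggregation point s0 with n_children senders."""
    # Acquisition: every node, including s0, acquires each of its readings.
    c_a = (n_children + 1) * readings_per_epoch * c_acquire
    # Transmission: only the children transmit, since c_t(s0) = 0.
    c_t = n_children * readings_per_epoch * c_transmit
    # Reception: s0 receives every reading sent by the children.
    c_r = n_children * readings_per_epoch * c_receive
    return c_a + c_t + c_r

# Example 3: 5 children, 20 / 5 = 4 packets per epoch, costs in joules.
cost = total_cost(5, 20 // 5, c_acquire=0.015, c_transmit=0.025, c_receive=0.025)
print(round(cost, 2))  # 1.36
```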
4. MODELING EAVESDROPPING
We now consider the case of an adversary who has access to
some of the sensor readings (either through eavesdropping or compromise),
and who is trying to determine the aggregate value output
by the sensor network.¹ We consider the confidentiality of the network,
in terms of whether the adversary can estimate the output
value within some small tolerance δ. We compute the eavesdropping
vulnerability based on several important parameters. First,
there is the probability that a compromised set of sensor nodes,
$S_A$, greatly resembles the set of nodes that our application is sampling,
$S_C$. This probability is a function of the size of $S_C$, the
specific aggregate function σ, and the data distribution of the sensors
S. For example, if all sensors produce the same reading, then
the adversary can compromise the system from a single reading.
We formalize the probability based on these parameters.
¹ As described in Section 2, this definition is motivated by the fact
that the eavesdropper is most likely to be interested in predicting the
behavior of the person or application monitoring the sensor data.
DEFINITION 2. (Eavesdropping Vulnerability) The eavesdropping
vulnerability (γ) relative to a set of compromised nodes is defined
as $\gamma(\sigma, S, S_A, S_C, \delta) = p(|\sigma(S_C) - \sigma(S_A)| \leq \delta)$, where σ
is the aggregating function and δ the adversary’s error tolerance. □
Although we have considered a single aggregate computation
here, the eavesdropping vulnerability can be generalized to sup-
port multiple aggregate computations over different attributes: the
expected value of γ can be obtained by conditioning on different
parameters.
We can compute the expected eavesdropping vulnerability, in
which the specific $S_A$ is unknown, as
$\bar{\gamma} = \sum_{s} p(S_A = s) \cdot I(|\sigma(S_C) - \sigma(s)| \leq \delta)$,
where I is an indicator function that
evaluates to 1 if the condition is true and 0 otherwise.
This relies on knowledge of the underlying sensor value distribution
of S, and the specific aggregation function, σ. We now show
the derivation of γ values for the most common sensor aggregation
functions (min, max, sum, avg and median) over single attributes
with discrete distributions:
• Min/Max: $I(|\min(S_C) - \min(S_A)| \leq \delta) = 1$ if $\min(S_A)$
lies in $[\min(S_C) - \delta, \min(S_C) + \delta]$. If f is the probability
density function (pdf) and Φ is the cumulative distribution
function (cdf) of the distribution of S, then, for any j,
f(j) is the probability of obtaining a j and (1 − Φ(j)) is
the probability that a reading is greater than j. Thus in a
sample of size i, j will be the minimum with probability
$f(j)(1 - \Phi(j))^{i-1}$. Therefore:
$$\bar{\gamma} = \sum_{i=1}^{|S|} p(|S_A| = i) \sum_{j=\lceil \min(S_C) - \delta \rceil}^{\lfloor \min(S_C) + \delta \rfloor} f(j) \cdot (1 - \Phi(j))^{i-1} \qquad (1)$$
Using a similar argument for Max, we get:
$$\bar{\gamma} = \sum_{i=1}^{|S|} p(|S_A| = i) \sum_{j=\lceil \max(S_C) - \delta \rceil}^{\lfloor \max(S_C) + \delta \rfloor} f(j) \cdot \Phi(j)^{i-1} \qquad (2)$$
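• Sum: $I(|\mathrm{sum}(S_C) - \mathrm{sum}(S_A)| \leq \delta) = 1$ if $\mathrm{sum}(S_A)$ lies
in $[\mathrm{sum}(S_C) - \delta, \mathrm{sum}(S_C) + \delta]$. If $f_{|S_A|}$ is the pdf
of the sum of variables and $\Phi_{|S_A|}$ is the cdf of the sum of
variables, we get:
$$\bar{\gamma} = \sum_{i=1}^{|S|} p(|S_A| = i) \cdot \left(\Phi_{|S_A|}(u) - \Phi_{|S_A|}(l)\right) \qquad (3)$$
where $u = \mathrm{sum}(S_C) + \delta$ and $l = \mathrm{sum}(S_C) - \delta$.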
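• Avg: $I(|\mathrm{avg}(S_C) - \mathrm{avg}(S_A)| \leq \delta) = 1$ if $\mathrm{sum}(S_A)$ lies
in $[|S_A|(\mathrm{avg}(S_C) - \delta), |S_A|(\mathrm{avg}(S_C) + \delta)]$. If $f_{|S_A|}$
is the pdf of the sum of variables and $\Phi_{|S_A|}$ is the cdf of the
sum of variables, then with a similar argument as before, we
get:
$$\bar{\gamma} = \sum_{i=1}^{|S|} p(|S_A| = i) \cdot \left(\Phi_{|S_A|}(u) - \Phi_{|S_A|}(l)\right) \qquad (4)$$
where $u = i(\mathrm{avg}(S_C) + \delta)$ and $l = i(\mathrm{avg}(S_C) - \delta)$.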
• Median: $I(|\mathrm{med}(S_C) - \mathrm{med}(S_A)| \leq \delta) = 1$ if $\mathrm{med}(S_A)$
lies in $[\mathrm{med}(S_C) - \delta, \mathrm{med}(S_C) + \delta]$. If f is the probability
density function (pdf) and Φ is the cumulative distribution
function (cdf) of the distribution of S, then, for any j, f(j) is
the probability of obtaining a j, Φ(j) is the probability that a
reading is less than j, and (1 − Φ(j)) is the probability that
a reading is greater than j. Thus in a sample of size i, j will
be the median with probability
$$p(j) = \binom{i}{\lfloor i/2 \rfloor} \cdot f(j) \cdot \Phi(j)^{\lfloor i/2 \rfloor} \cdot (1 - \Phi(j))^{i - \lfloor i/2 \rfloor - 1}.$$
Therefore:
$$\bar{\gamma} = \sum_{i=1}^{|S|} p(|S_A| = i) \sum_{j=\lceil \mathrm{med}(S_C) - \delta \rceil}^{\lfloor \mathrm{med}(S_C) + \delta \rfloor} p(j) \qquad (5)$$
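For small groups, the expected eavesdropping vulnerability can also be evaluated directly from its definition, by enumerating every possible compromised set. The sketch below does this for concrete readings rather than for a distribution. It assumes that each node is compromised independently with probability `p`, that $S_C = S$, and that the empty set contributes nothing; these assumptions, and the function name, are illustrative rather than taken from the paper.

```python
from itertools import combinations
from statistics import mean, median

def expected_vulnerability(readings, sigma, delta, p):
    """Sum over non-empty S_A of p(S_A) * I(|sigma(S_C) - sigma(S_A)| <= delta)."""
    n = len(readings)
    target = sigma(readings)  # sigma(S_C), assuming S_C = S
    total = 0.0
    for size in range(1, n + 1):
        # Probability that exactly this subset of `size` nodes is compromised.
        prob = p ** size * (1 - p) ** (n - size)
        for subset in combinations(readings, size):
            if abs(target - sigma(subset)) <= delta:
                total += prob
    return total

# Per-stream aggregates from Example 1, tolerance 0.1, per-node probability 0.2.
readings = [4.82, 4.93, 5.17, 4.92, 4.87, 5.04]
for name, sigma in [("min", min), ("max", max), ("sum", sum),
                    ("avg", mean), ("median", median)]:
    print(name, expected_vulnerability(readings, sigma, delta=0.1, p=0.2))
```

When all sensors report the same value, every non-empty compromised set reveals the aggregate, so the result is simply the probability that at least one node is compromised.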
EXAMPLE 4. To evaluate the expected value of γ for the application
in Example 1, let us assume that the probability of the
adversary eavesdropping on a single node is 0.2 and the data is distributed
as N(5, 0.1). Also, let the tolerance δ = 0.1. Noting that
we have $\sigma_2(S) = 4.96$, we can use Equation (4) to evaluate the
expected probability. We get $\bar{\gamma} = \sum_{i=1}^{5} p^i \cdot (\Phi(5.06) - \Phi(4.86))$,²
which on evaluation yields $\bar{\gamma} = 0.2499$. This agrees with our
intuition that if the adversary is able to compromise one node, then
she is far from being able to estimate the aggregate of the network
consisting of 5 nodes. □
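Equation (4) can be evaluated numerically once a model for $p(|S_A| = i)$ and for the readings is fixed. The sketch below is one such instantiation, not the paper's computation: it assumes a binomial $p(|S_A| = i)$ (independent compromise with probability `p`), i.i.d. normal readings, and that the second parameter of N(5, 0.1) is a variance. Values of Φ are obtained by converting to standard normal form. The numbers it prints depend on these assumptions.

```python
import math

def normal_cdf(x, mu, var):
    """Phi for N(mu, var), computed via the standard normal form."""
    z = (x - mu) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_gamma_avg(n, p, mu, var, avg_sc, delta):
    """Equation (4) for i.i.d. N(mu, var) readings and independent compromise."""
    total = 0.0
    for i in range(1, n + 1):
        # Assumed model: |S_A| is binomial with parameters n and p.
        p_i = math.comb(n, i) * p ** i * (1 - p) ** (n - i)
        hi = i * (avg_sc + delta)  # u in Equation (4)
        lo = i * (avg_sc - delta)  # l in Equation (4)
        # The sum of i independent N(mu, var) readings is N(i * mu, i * var).
        total += p_i * (normal_cdf(hi, i * mu, i * var) - normal_cdf(lo, i * mu, i * var))
    return total

gamma_bar = expected_gamma_avg(n=5, p=0.2, mu=5.0, var=0.1, avg_sc=4.96, delta=0.1)
print(gamma_bar)
```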
4.1 Hierarchical Aggregation
Thus far, we have only considered aggregation within a group
with a single aggregation point. We now generalize to eavesdrop-
ping over hierarchical groups: the goal is to consider how close the
adversary gets to an aggregate value higher in the tree when she
eavesdrops on data in the lower levels comprising that group. (If
we assume that the adversary eavesdrops only at one level, then
this problem is identical to the one considered above.) The higher
the adversary listens, the closer she gets to the aggregate of the whole
network.
An example scenario is depicted in Figure 1, where we assume
that the adversary has eavesdropped on groups with nodes s1,...,s5
as the nodes responsible for aggregation. Now, we want to know
how close she gets to the aggregate at s0.
The probability of the adversary learning the result of aggregation
at a level l is called the eavesdropping vulnerability over a hierarchy
and is denoted by $\gamma^l$, where l indicates the hierarchical level
from which the adversary listens with the goal of compromising the
overall system. As with γ, $\gamma^l$ will be a function of $S^l$, $S^l_A$, $S^l_C$, σ
and δ. We consider the effect of a lower-level compromise on a
higher-level node to be a “partial compromise” of the higher node,
i.e., we define $S^l_A = \bigcup_i \sigma(S^{l-1}_{A_i})$, $l > 1$. Note that the adversary’s
set at level l is the union of sets $\sigma(S^{l-1}_{A_i})$, which accounts for the
fact that the sensor values at level l will be aggregates of values at
level l − 1.
² These values can be found by converting the distribution into standard normal
form, for which Φ is well tabulated.
DEFINITION 3. (Eavesdropping Vulnerability over a Hierarchy)
The eavesdropping vulnerability ($\gamma^l$) for the adversary over a hierarchy
is defined as $\gamma^l(\sigma, S^l, S^l_A, S^l_C, \delta) = p\left(|\sigma(S^l_C) - \sigma(S^l_A)| \leq \delta\right)$,
where σ is the aggregating function and δ is the error in estimate,
and $S^l_A = \bigcup_i \sigma(S^{l-1}_{A_i})$, $l \geq 1$. □
Note that with this definition, $\gamma = \gamma^0$. We can compute $\gamma^l$ by
conditioning on various parameters. For example, knowing σ, $S^l$, $S^l_C$
and δ, we can compute:
$$\gamma^l = \sum_{S^{l-1}_{A_1}, \ldots, S^{l-1}_{A_n}} p(S^{l-1}_{A_1}, \ldots, S^{l-1}_{A_n}) \cdot I(d \leq \delta) \qquad (6)$$
where $d = |\sigma(S^l_C) - \sigma(S^l_A)|$.
Computing $\gamma^l$, in general, involves knowing how much the data
from different groups are related. If the data from different groups
at level l − 1 are correlated, then computing $\gamma^l$ can be quite difficult.
Correlations between groups are also undesirable because they can
help the adversary make a good estimate by eavesdropping on
only a few groups.
Although the exact computation of $\gamma^l$ is generally difficult, an
approximate answer can be obtained by making some simplifying assumptions, such
as simultaneous eavesdropping in all the groups. The example below
illustrates this idea.
EXAMPLE 5. Consider the scenario in Example 1. Let us assume
that each of the nodes s0,...,s5 is itself aggregating
data in its group and that the distribution in each group is as
follows: s1 : N(4.9, 1), s2 : N(4.8, 1), s3 : N(4.8, 1), s4 :
N(5, 1), s5 : N(5.2, 1), and the data from node s0 is distributed
N(5, 1). If the data at this level is being averaged, the resulting average
will be normally distributed with mean $\frac{5 + 4.9 + 4.8 + 4.8 + 5 + 5.2}{6}$
and variance $\frac{1 + 1 + 1 + 1 + 1 + 1}{36}$, which has distribution
N(4.95, 0.16). Now, if the probability of eavesdropping simultaneously
in every group is 0.5, the eavesdropping vulnerability for
δ = 0.1 is $\sum_{i=1}^{5} (0.5)^i \cdot (\Phi(5.06) - \Phi(4.86)) = 0.4599$. □
4.2 Performance Ratio
The eavesdropping vulnerability γ or $\gamma^l$ gives us the probability
that an adversary can obtain a good estimate of the actual aggregate.
Obviously, we would like to design sensor networks that minimize
this probability; however, to do this, we will generally have to incur
additional overhead.
If we use benefit to mean how close an estimate is to the target
(in the case of our application, this is the “real” aggregate;
in the case of the adversary, this is our network’s aggregate), we
can define a performance ratio to compare different sensor network
schemes. We define the performance ratio of the adversary relative
to a set of compromised nodes, $\rho_A$, as:
$\rho_A(\sigma, S, S_A, S_C, \delta, C) = \frac{\gamma(\sigma, S, S_A, S_C, \delta)}{C_r(S_A)}$.
The increase in cost incurred to reduce γ can be
measured by $\frac{C(S)}{C'(S)}$. Here, $C'$ is the cost model for any eavesdropping-tolerant
data protocol and C is the cost model for the standard
streaming model, as defined earlier. We can now define the performance
ratio of a sensor network, ρ, as:
$$\rho(\sigma, S, S_A, S_C, \delta, C, C') = \frac{1}{\rho_A(\sigma, S, S_A, S_C, \delta, C)} \cdot \frac{C(S)}{C'(S)} \qquad (7)$$
We can calculate the expected value of ρ by conditioning on various
parameters. Ideally, we would like to design our data protocol to
maximize ρ as much as possible.
EXAMPLE 6. Consider the application in Example 1 with the
cost as computed in Example 3. We assume that the probability of
the adversary eavesdropping on a node is 0.2, yielding a cost of
0.025 × 4 J = 0.1 J. For the adversary, we obtain
$\sum_{i=1}^{5} \frac{(\Phi(5.06) - \Phi(4.86)) \cdot p^i}{0.1\,i} = 1.799$. $\bar{\rho}$ can
now be computed as $\frac{1}{1.799} \cdot \frac{1.36}{1.36} = 0.5558$. Intuitively, higher cost
for the adversary increases the ratio ρ. If we make it harder for
the adversary to eavesdrop, say reducing the probability of eavesdropping
on a single node to 0.1, then we will have $\bar{\rho} = 1.2248$.
Techniques for increasing performance ratio are discussed in the
next section. □
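Equation (7) is straightforward to evaluate once the adversary's ratio and the two costs are known. The sketch below plugs in the figures stated in Example 6; the function names are illustrative.

```python
def adversary_ratio(gamma, adversary_cost):
    """rho_A = gamma / C_r(S_A): vulnerability per unit of adversary cost."""
    return gamma / adversary_cost

def network_ratio(rho_a, cost_standard, cost_tolerant):
    """rho = (1 / rho_A) * C(S) / C'(S), as in Equation (7)."""
    return (1.0 / rho_a) * (cost_standard / cost_tolerant)

# Example 6: expected adversary ratio 1.799, and C(S) = C'(S) = 1.36 J
# because no eavesdropping-tolerant protocol is in use yet.
rho = network_ratio(1.799, cost_standard=1.36, cost_tolerant=1.36)
print(round(rho, 3))  # 0.556
```

A protocol that lowers the adversary's ratio raises ρ, while any extra cost $C'(S) > C(S)$ it incurs lowers ρ again.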
To increase the quality of the sample, we need more observations
($S_C$), which, however, increases both cost and (if the distribution
of values remains the same) γ. Hence, we can identify a trade-off
between quality, cost, and eavesdropping vulnerability.
5. COUNTERMEASURES AGAINST EAVESDROPPING
Given our understanding of the factors that affect eavesdropping
potential, we now present some general techniques to thwart ad-
versaries. We distinguish between traditional, cryptographic tech-
niques and non-cryptographic schemes.
5.1 Cryptographic techniques
Encryption and authentication using cryptographic techniques
make a system significantly more secure against eavesdropping
and other attacks. Encryption can be used to keep data secure
from the adversary, and authentication can be used to safeguard
against spurious data. In essence, these techniques attempt to en-
sure system-level confidentiality by protecting all links. For the
sensor network environment, symmetric key techniques are most
commonly used, but it is unclear how to manage keys and how to
justify the overhead of encryption. Among the many prior works on
cryptographic techniques for privacy in wireless sensor networks,
[18] and [15] describe methods to achieve authenticity and confi-
dentiality.
However, many approaches (e.g., [19, 9]) assume a pre-key distribution,
which impedes network creation and makes dynamic membership
difficult. In [4], Chan and Perrig advocate that end-to-end
encryption is not possible for sensor networks and foresee new
methods as the solution. Moreover, encryption may not help if the
nodes themselves can be compromised. Taking our cue from these
points, we briefly suggest several alternatives below.
5.2 Non-cryptographic techniques
Non-cryptographic techniques make it harder to eavesdrop by
reducing the chance that an adversary’s sensor data sample SA
matches the system’s sample SC.
Data Filtering or Compensation. One technique is to deliber-
ately send spurious data (or data with spurious offsets) from the
sensors, and to filter the noise at the aggregating point. After fil-
tering, the resulting data set will comprise legitimate information
about the underlying network. The adversary, who is not aware of
this shared information, will see data that follows a different distri-
bution.
One such idea, which we are investigating extensively, is termed
confusion [6]. Under such a scheme, whenever the sensor wishes to
transmit a message, it appends the shared secret (token) to the mes-
sage. A set of confusion-generating nodes then could inject spuri-
ous data, which is indistinguishable to a third party, into the net-
work. Such confusion messages could be generated either by third
party nodes or be a subset of sensors themselves. At the receiving
end, the secret can be used to separate the legitimate message from
the noise. Yet while the aggregate node can filter out superfluous
messages from confusers, an eavesdropper with incomplete knowl-
edge cannot make such distinctions. Since the eavesdropper is not
aware of which tokens belong to the sensors and which belong to
the confusers, she cannot identify the legitimate messages. Thus, if
she ends up accepting the “noise,” she will end up with a different
distribution of the data in the network.
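The token-based filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of [6]: the token length, the function names, and the use of a uniform range [4,7] for spurious values are all hypothetical choices made for the example.

```python
import random

TOKEN_BYTES = 16  # assumed token length; the text does not fix one


def random_token(rng):
    """Draw a random token; sensors and confusers use the same format."""
    return bytes(rng.getrandbits(8) for _ in range(TOKEN_BYTES))


def sensor_message(reading, token):
    """A legitimate message: the reading with the sensor's shared token appended."""
    return (reading, token)


def confusion_message(rng, low=4.0, high=7.0):
    """A spurious message: a random value with a token unknown to the aggregator."""
    return (rng.uniform(low, high), random_token(rng))


def filter_legitimate(messages, sensor_tokens):
    """Aggregator side: keep only readings whose token belongs to a known sensor."""
    return [value for value, token in messages if token in sensor_tokens]


rng = random.Random(0)
sensor_tokens = {random_token(rng) for _ in range(3)}
legit = [sensor_message(r, t) for r, t in zip([4.9, 5.0, 5.1], sorted(sensor_tokens))]
noise = [confusion_message(rng) for _ in range(3)]
channel = legit + noise
rng.shuffle(channel)

# The aggregator recovers the legitimate readings; an eavesdropper without
# the token set sees all six values and cannot tell them apart.
recovered = sorted(filter_legitimate(channel, sensor_tokens))
```

An eavesdropper who accepts every value in `channel` computes statistics over a mixture of legitimate and spurious readings, which is the effect the scheme relies on.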
As with encryption techniques, a confusion-based technique
assumes a shared secret unique to a sensor, but it may require
less computational power per sensor node, it is tolerant to the
compromise of a few nodes, and it is resistant to active eavesdropping.
The savings on per-device power in the confusion-based approach
come from the fact that there is no need for the expensive
exponentiation operations involved in encryption. Confusion does require
more message transmissions, but these can be amortized by adding
greater numbers of devices.
EXAMPLE 7. Consider the application in Example 1. Suppose
the sensors double their transmission rate by injecting a spurious
value for every legitimate one. Assume that the legitimate
data is distributed within the range N(5,0.1), while the spurious
data ensures the adversary's sample will be uniformly distributed
in [4,7]. Given the model of Example 6, the cost is
$C(S) = 8 \times 5 \times (0.025+0.015) + 4 \times 0.015 + 8 \times 5 \times 0.025\,\mathrm{J} = 2.66\,\mathrm{J}$.
$\sum_{i=1}^{5} 0.1^{i}\,(\Phi(5.06)-\Phi(4.86)) \cdot p_i = 0.1492$.
$\bar{\rho}$ can now be computed as
$\frac{1}{0.1492} \cdot \frac{1.36}{2.66} = 3.4267$.
Clearly, this technique greatly reduces the vulnerability of the
network, when compared to the baseline model's $\bar{\rho} = 0.5558$.
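The arithmetic of Example 7 can be checked with a short sketch. The numeric terms are copied as printed in the example. The function names are hypothetical, and reading 1.36 as a baseline cost is an assumption; the meaning of the individual terms follows the model of Example 6.

```python
def example7_cost():
    """Cost in joules, with the terms copied as printed in Example 7."""
    return 8 * 5 * (0.025 + 0.015) + 4 * 0.015 + 8 * 5 * 0.025


def rho_bar(vulnerability, baseline_cost, cost):
    """Ratio as printed in the example: (1 / vulnerability) * (baseline_cost / cost)."""
    return (1.0 / vulnerability) * (baseline_cost / cost)


cost = example7_cost()
# 0.1492 and 1.36 are taken from the example; 1.36 is assumed to be the baseline cost.
value = rho_bar(0.1492, 1.36, cost)
print(f"C(S) = {cost:.2f} J, rho_bar = {value:.4f}")
```

The computed ratio agrees with the printed 3.4267 up to rounding in the last digit.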
Data cloaking [10] has been proposed as another approach to
achieving privacy in sensor networks. Cloaking of data involves
perturbing the data by a predefined offset. This has been used to
achieve anonymity within a network. A similar idea can also be
used to counter eavesdropping: 1) First, nodes are partitioned into
disjoint subsets. 2) Then, based on a shared secret, each node
within a partition is assigned an offset. This offset is added to
the actual sensor reading before transmission. Ideally, this offset
should be unique to a partition. 3) At the point of aggregation, the
appropriate offset is subtracted from the reading before aggrega-
tion.
Although this scheme requires maintaining a node-to-offset
mapping at the aggregating point, this requirement can easily be
obviated by having all the nodes within a partition transmit within
a time slot. With such a routing protocol, only the mapping of
different time slots to the offset would have to be stored, and this
information is modest compared to the original mapping.
The adversary, who has no information about the offset, will be
readily misled by the transmitted information. Even if she man-
ages to compromise a few nodes and learn the offset information,
the damage is limited to members of the partition with the compro-
mised nodes.
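The three cloaking steps, together with the time-slot variant, can be sketched as follows. This is a minimal sketch under stated assumptions: deriving the offset from the shared secret with HMAC-SHA256, the scaling of the offset, and all function names are hypothetical choices, since the text does not specify how the offset is computed.

```python
import hashlib
import hmac


def partition_offset(secret, partition_id):
    """Derive a strictly positive offset for a partition from the shared secret.

    HMAC-SHA256 is an assumed derivation; any keyed function would serve.
    """
    digest = hmac.new(secret, str(partition_id).encode(), hashlib.sha256).digest()
    return (1 + int.from_bytes(digest[:2], "big")) / 10000.0


def cloak(reading, offset):
    """Sensor side: add the partition offset before transmission."""
    return reading + offset


def aggregate(slot_values, slot_offsets):
    """Aggregator side: subtract the offset of each time slot, then average."""
    values = [v - slot_offsets[slot] for slot, v in slot_values]
    return sum(values) / len(values)


secret = b"shared-secret"  # hypothetical shared secret
readings = {0: [5.0, 5.2], 1: [4.8, 5.1]}  # partition id -> true readings
# Each partition transmits in its own time slot, so only a slot-to-offset
# mapping is stored at the aggregating point.
slot_offsets = {p: partition_offset(secret, p) for p in readings}
transmitted = [(p, cloak(r, slot_offsets[p])) for p, rs in readings.items() for r in rs]

true_mean = sum(r for rs in readings.values() for r in rs) / 4
recovered_mean = aggregate(transmitted, slot_offsets)
eavesdropped_mean = sum(v for _, v in transmitted) / len(transmitted)
```

The aggregator recovers the true mean, while an eavesdropper averaging the transmitted values obtains a shifted estimate.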
EXAMPLE 8. Consider the scenario in Example 1. Let us assume
that the nodes s1,...,s5 are themselves aggregation points
of their groups and their data is distributed as N(5,1). Also,
let the data at node s0 be also distributed N(5,1). If average is
the aggregation function used, it will be normally distributed with
mean $\frac{5 \times 6}{6} = 5$ and standard deviation
$\frac{1 \times 6}{36} = 0.16$. If we assume that the probability of
eavesdropping on a single message is 0.5, the eavesdropping
vulnerability is
$\sum_{i=1}^{5} (0.5)^{i} \cdot (\Phi(5.06) - \Phi(4.86)) = 0.9843 \times 0.4553 = 0.4482$.
Now, if we assume that each sensor i, i ∈ {0,...,5} adds an
offset 0.1i, which is subtracted out at s0, then the average will
be normally distributed with mean
$\frac{5+5.1+5.2+5.3+5.4+5.5}{6} = 5.25$ and standard deviation
$\frac{1 \times 6}{36} = 0.16$. In this case, the eavesdropping
vulnerability will be
$\sum_{i=1}^{5} (0.5)^{i} \cdot (\Phi(5.06) - \Phi(4.86)) = 0.9843 \times 0.1101 = 0.1083$,
which is a clear reduction in eavesdropping vulnerability from 0.4482
without using the offsets.
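The two window probabilities in Example 8 can be reproduced with the normal CDF. The sketch below is illustrative only: it uses 0.16 as the standard deviation, as stated in the example, and takes the factor 0.9843 as printed rather than recomputing the sum. The function names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def window_probability(low, high, mean, std):
    """Probability mass of N(mean, std) falling in [low, high]."""
    return normal_cdf(high, mean, std) - normal_cdf(low, mean, std)


EAVESDROP_FACTOR = 0.9843  # value of the sum over (0.5)^i as printed in Example 8

# Without offsets the observed average is centred at 5; with offsets, at 5.25.
no_offset = window_probability(4.86, 5.06, mean=5.0, std=0.16)
with_offset = window_probability(4.86, 5.06, mean=5.25, std=0.16)

vulnerability_plain = EAVESDROP_FACTOR * no_offset
vulnerability_cloaked = EAVESDROP_FACTOR * with_offset
print(f"{vulnerability_plain:.4f} -> {vulnerability_cloaked:.4f}")
```

Shifting the mean of the observed distribution away from the window [4.86, 5.06] is what lowers the probability that the adversary's estimate falls inside it.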
Attribute-value Correlation. Yet another possibility is to use cor-
relations between different attributes. If the application at hand is
temperature monitoring and a sensor’s temperature and voltage are
correlated, then, for instance, the sensors might transmit voltages
in certain cases, and temperatures the remainder of the time. If we
assume that the adversary does not have the correlation model, then
such data will be useless to her. Constructing correlations between
attributes has been previously studied (e.g., [7]), with the objec-
tive of reducing the cost for the network. Here we use it as shared
information. Importantly, it takes a considerable amount of time,
energy, and node samples to learn this correlation model, mean-
ing that an attacker would need to devote significant resources to
compromising a large portion of the system.
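A minimal sketch of the attribute-switching idea follows. Several parts are assumptions introduced for illustration: the linear model and its coefficients, the seeded pseudo-random schedule that tells the aggregator which attribute was sent, and the function names. The text itself only requires that the sensors and the aggregating point share the correlation model.

```python
import random

# Hypothetical shared correlation model: temperature = A * voltage + B.
A, B = 20.0, -35.0


def schedule(seed, n, p_voltage=0.5):
    """Shared pseudo-random plan: True means 'send voltage' for that message."""
    rng = random.Random(seed)
    return [rng.random() < p_voltage for _ in range(n)]


def transmit(temperatures, seed):
    """Sensor side: replace some temperatures by the correlated voltage."""
    plan = schedule(seed, len(temperatures))
    return [(t - B) / A if use_voltage else t
            for t, use_voltage in zip(temperatures, plan)]


def recover(values, seed):
    """Aggregator side: map voltages back to temperatures with the shared model."""
    plan = schedule(seed, len(values))
    return [A * v + B if was_voltage else v
            for v, was_voltage in zip(values, plan)]


temperatures = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8]
sent = transmit(temperatures, seed=42)
restored = recover(sent, seed=42)
```

An adversary without the model sees a mixture of temperature and voltage values in `sent` and cannot convert them to a common scale.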
EXAMPLE 9. Again consider Example 7, with the modification
that with probability 0.5, the sensors send voltage readings.
Further, they also output as many spurious messages as temperature
readings, in order to ensure that the adversary's distribution is
uniformly distributed in [4,7]. In this case,
$C(S) = (\frac{1}{2} \times 8 + \frac{1}{2} \times 4) \times 5 \times (0.025+0.015) + 4 \times 0.015 + (\frac{1}{2} \times 8 + \frac{1}{2} \times 4) \times 5 \times (0.025) = 1.76\,\mathrm{J}$.
$\bar{\rho}$ can now be computed as
$\frac{1}{0.1492} \cdot \frac{1.36}{1.76} = 5.1791$, which
is better than strictly using the filtering/compensation approach.
6. RELATED WORK
Prior works on sensor security [22, 14] present attack models,
but our attack taxonomy is a more general classification
based on the goals of the adversary, and our focus is on the security
of the overall system even when individual nodes are compromised.
There is also a significant literature on quantifying security in
a context-specific way. [13] presents a quantitative model of the
security intrusion process based on attacker behavior: their model
is based on empirical data collected from intrusion experiments.
[20] quantifies security strength and risk using economic criteria.
It should be noted that though these are general methods, their ap-
plicability to sensor networks is uncertain. Our approach, in con-
trast, is based on data models for different applications of sensor
networks.
The idea of developing a probabilistic model for data aggrega-
tion in sensor networks was introduced in [8]. We can use the same
techniques to learn a model from the data. However, our focus is
on using the model to understand the security vulnerabilities of a
sensor network, as opposed to minimizing power usage in com-
puting aggregates. This slightly resembles the resilient techniques
for data aggregation of [21], although we focus on quantitatively
ascertaining robustness in the presence of an adversary.
7. CONCLUSIONS AND FUTURE WORK
We have presented an attacker taxonomy for sensor networks
which has three main classes of attackers: eavesdropping, disrup-
tion, and hijacking. So far as we know, our work is the first to
focus on quantifying system-level eavesdropping vulnerability. We
first study a single-level aggregation tree (γ) and then a hierarchical
network (γl), developing a probabilistic scheme for assessing their
eavesdropping vulnerability. We then consider trading off power
consumption versus security and data quality/accuracy. Finally, we
propose a series of solutions using cryptographic techniques, data
filtering, and attribute correlation.
This paper represents an initial step in a much broader plan.
First, we are extending our model to the disruption and hijacking
models. We are also developing a comprehensive characterization
of common sensor network protocols and aggregation functions
with respect to their robustness. We ultimately hope to consider a
range of other issues, such as unreliable networks, temporary out-
ages, and correlations between the values at different sensors.
8. REFERENCES
[1] A. Arasu, S. Babu, and J. Widom. The CQL continuous
query language: Semantic foundations and query execution.
Technical Report 2003-67, Stanford University, 2003.
[2] S. Avancha, J. L. Undercoffer, A. Joshi, and J. Pinkston.
Secure sensor networks for perimeter protection. Computer
Networks, 43(4):421–435, November 2003.
[3] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom.
Models and issues in data stream systems. In PODS ’02:
Proceedings of the twenty-first ACM
SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, pages 1–16, New York, NY, USA, 2002.
ACM Press.
[4] H. Chan and A. Perrig. Security and privacy in sensor
networks. IEEE Computer Magazine, pages 103–105, 2003.
[5] F. Chu, Y. Wang, and C. Zaniolo. An adaptive learning
approach for noisy data streams. In ICDM, pages 351–354,
2004.
[6] E. Cronin, M. Sherr, and M. Blaze. On the reliability of
internet eavesdropping, February 2005. Personal
Communication.
[7] A. Deshpande, C. Guestrin, S. Madden, and W. Hong.
Exploiting correlated attributes in acquisitional query
processing. In ICDE 2005, 2005.
[8] A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein,
and W. Hong. Model-driven data acquisition in sensor
networks. In 30th VLDB Conference, 2004.
[9] W. Du, J. Deng, Y. S. Han, S. Chen, and P. Varshney. A key
management scheme for wireless sensor networks using
deployment knowledge. In Proceedings of The 23rd
Conference of the IEEE Communications Society, 2004.
[10] M. Gruteser, G. Schelle, A. Jain, R. Han, and D. Grunwald.
Privacy-aware location sensor networks. In Proceedings of
HotOS’03: 9th Workshop on Hot Topics in Operating
Systems, pages 163–168. USENIX, May 2003.
[11] J. M. Hellerstein, W. Hong, S. Madden, and K. Stanek.
Beyond average: Towards sophisticated sensing with queries.
In 2nd International Workshop on Information Processing in
Sensor Networks (IPSN ’03), March 2003.
[12] I. Miller and J. E. Freund. Probability and Statistics for
Engineers, 2nd edition. Prentice Hall, Inc., Englewood Cliffs,
NJ, 1977.
[13] E. Jonsson and T. Olovsson. A quantitative model of the
security intrusion process based on attacker behavior. IEEE
Trans. Softw. Eng., 23(4):235–245, 1997.
[14] C. Karlof and D. Wagner. Secure routing in wireless sensor
networks: Attacks and countermeasures. In IEEE Int’l
Workshop on Sensor Network Protocols and Applications,
pages 113–127, May 2003.
[15] Y. W. Law, S. Etalle, and P. H. Hartel. Assessing
Security-Critical Energy-Efficient sensor networks. In Conf.
on Security and Privacy in the Age of Uncertainty (SEC),
pages 459–463, May 2003.
[16] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong.
Design of an acquisitional query processor for sensor
networks. In SIGMOD 2003, pages 491–502, 2003.
[17] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[18] A. Perrig, R. Szewczyk, V. Wen, D. E. Culler, and J. D.
Tygar. SPINS: security protocols for sensor networks. In
Mobile Computing and Networking, pages 189–199, 2001.
[19] B. Przydatek, D. Song, and A. Perrig. SIA: secure
information aggregation in sensor networks. In SenSys ’03,
pages 255–265, 2003.
[20] S. E. Schechter. Computer security strength & risk: A
quantitative approach. Harvard University Doctoral
Dissertation, 2004.
[21] D. Wagner. Resilient aggregation in sensor networks. In
SASN: Proc. Workshop on security of ad hoc and sensor
networks, pages 78–87, 2004.
[22] A. D. Wood and J. A. Stankovic. Denial of service in sensor
networks. Computer, 35(10):54–62, 2002.
[23] Y. Yao and J. Gehrke. Query processing for sensor networks.
In CIDR 2003, 2003.