S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing
Ingmar Baumgart and Sebastian Mies
Institute of Telematics
Universität Karlsruhe (TH)
D–76128 Karlsruhe, Germany
Email: {baumgart, mies}@tm.uka.de
Abstract
Security is a common problem in completely decentral-
ized peer-to-peer systems. Although several suggestions ex-
ist on how to create a secure key-based routing protocol,
a practicable approach is still missing. In this paper
we introduce a secure key-based routing protocol based on
Kademlia that has a high resilience against common attacks
by using parallel lookups over multiple disjoint paths, lim-
iting free nodeId generation with crypto puzzles and intro-
ducing a reliable sibling broadcast. The latter is needed to
store data in a safe replicated way. We evaluate the security
of our proposed extensions to the Kademlia protocol ana-
lytically and simulate the effects of multiple disjoint paths
on lookup success under the influence of adversarial nodes.
1. Introduction
For a long time structured peer-to-peer networks like
Chord [18] or Pastry [14] have only raised interest in the
research community but were not largely used in real net-
works. Nowadays the situation in the Internet has changed
significantly with the success of distributed filesharing ap-
plications. But also the emerging area of Massively Multi-
player Online Games has raised interest in using structured
peer-to-peer networks to build scalable storage and commu-
nication services.
Security is a major problem of completely decentralized peer-to-peer
systems. Several common attacks against structured peer-to-peer
networks have been identified [3, 16, 17]. Although many suggestions exist on how to
deal with such attacks in theory, there has been little work
on building a practicable secure peer-to-peer network.
All widely deployed structured overlay networks used
in the Internet today (i.e. BitTorrent, OverNet and eMule)
are based on the Kademlia [11] protocol and are vulnerable
to several attacks [5]. In this paper we analyze attacks on
Kademlia networks and propose several practicable coun-
termeasures.
The rest of this paper is organized as follows: In section
2 we provide some background on structured peer-to-peer
networks and give an introduction to the Kademlia proto-
col. In section 3 we present an overview of common attacks
against Kademlia. The detailed design of our security ex-
tensions to Kademlia is described in section 4. In section
5 we evaluate our design by simulation. Finally, section 6
covers related work and section 7 concludes.
2. Background
In this section we provide some background on struc-
tured peer-to-peer systems and describe some properties of
the Kademlia protocol. A common service which is pro-
vided by all structured peer-to-peer networks is the key-
based routing layer (KBR) [6]. This layer provides efficient
routing to identifiers called keys from a large identifier space
(typically n-bit integers modulo $2^n$ with n = 128 or 160).
Every participating node in the overlay chooses a unique
nodeId from the same id space and maintains a routing table
with nodeIds and IP addresses of neighbors in the overlay
topology. Depending on the overlay protocol the topology
resembles a ring, hypercube or de Bruijn graph. Every node
is responsible for a particular range of the identifier space,
usually for all keys close to its nodeId in the id space. In
this way KBR can be used to efficiently route a message to
an arbitrary key by successively forwarding the message to
overlay neighbors which have a nodeId closer to the desti-
nation key. This KBR layer can be used as building block
for more complex tasks like a distributed storage service.
Although structured peer-to-peer systems like Chord,
Pastry or Kademlia provide a similar KBR service, they dif-
fer in properties like routing table size and average lookup
path length. In the following we focus on the Kademlia [11]
protocol, because it is already insusceptible to several com-
mon attacks and at the same time simple to implement.
Figure 1. Routing table in Kademlia for a node
with a nodeId prefix of 0011
2.1. Kademlia
Kademlia [11] is a structured peer-to-peer system which
has several advantages compared to protocols like Chord
[18] as a result of using a novel XOR metric for distance
between points in the identifier space. Because XOR
is a symmetric operation, Kademlia nodes receive lookup
queries from the same nodes which are also in their local
routing tables. This is important in order that nodes can
learn useful routing information from lookup queries and
update their routing tables. In contrast Chord needs a ded-
icated stabilization protocol due to the asymmetric nature
of the overlay topology. The XOR metric is also unidirectional:
for any given point x and distance Δ > 0 there exists exactly
one point y with d(x, y) = Δ. This ensures that lookups for the
same key converge along the same path regardless of the
originating node, which is an important property for caching
mechanisms.
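The two properties of the XOR metric can be illustrated with a minimal sketch. It assumes nodeIds are represented as plain Python integers; the function names xor_distance and point_at_distance are hypothetical and not part of Kademlia.

```python
def xor_distance(x: int, y: int) -> int:
    """Kademlia distance: bitwise XOR of two ids, read as an integer."""
    return x ^ y


def point_at_distance(x: int, delta: int) -> int:
    """Unidirectionality: the only y with d(x, y) = delta is x XOR delta."""
    return x ^ delta


x, y = 0b0011, 0b0101
# Symmetry: the distance is the same in both directions.
assert xor_distance(x, y) == xor_distance(y, x)
# Exactly one point lies at a given distance from x.
assert point_at_distance(x, xor_distance(x, y)) == y
```

Because every distance from a fixed x is hit exactly once, sorting contacts by xor_distance never produces ties.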
In Kademlia every node chooses a random 160-bit
nodeId and maintains a routing table consisting of up to
160 k-buckets. Every k-bucket contains at most k entries
with <IP address, UDP port, NodeId> triples of other
nodes. The parameter k is a redundancy factor to make the
routing more robust by spanning several disjoint paths be-
tween overlay nodes. Buckets are arranged as a binary tree
and nodes get assigned to buckets according to the shortest
unique prefix of their nodeIds. An example routing table of
a node with a nodeId prefix of 0011 is shown in figure 1. Ini-
tially, a node’s routing table consists of a single bucket cov-
ering the entire ID space. When a node U learns of a new
contact C, the node U inserts C in the appropriate bucket
according to the prefix of C’s nodeId. If this bucket already
contains k nodes and the bucket’s range includes U’s own
nodeId, the bucket is split into two new buckets. Otherwise
the new contact is simply dropped.
Figure 2. Kademlia routing table irregularities
in a highly unbalanced tree
A problem arises if nodeIds are unequally distributed. In
this case the standard bucket splitting algorithm can lead to
nodes not knowing their complete k neighborhood. An example
of this is given in figure 2. This figure shows the
routing table of a node U with nodeId prefix of 00. According
to the splitting rule given above, there would be only
one bucket for prefix 01 with k entries, which doesn’t get
split any further. In this case the node with prefix 010, which
is the closest node to node U, could get dropped, leading
to an incomplete neighborhood. To avoid this, the authors
of Kademlia propose in these cases to also split buckets in
which node U’s nodeId isn’t contained, resulting in an ir-
regular subtree next to the bucket of U’s nodeId. This ex-
ception to the bucket splitting algorithm makes the protocol
more complex to implement, hence we propose an alterna-
tive approach in section 4.2.
3. Attacks on Kademlia
In this section we describe the taxonomy of possible at-
tacks on Kademlia’s key-based routing and data storage. In
addition we present a theoretical estimation of lookup suc-
cess rates in a Kademlia network with a fraction of adver-
sarial nodes.
3.1. Attacks on the underlying network
We assume that the underlying network layer doesn’t
provide any security properties to the overlay layer. There-
fore an attacker could be able to overhear or modify ar-
bitrary data packets. Furthermore we presume nodes can
spoof IP addresses and there is no authentication of data
packets in the underlay. Consequently, attacks on the un-
derlay can lead to denial of service attacks on the overlay
layer.
3.2. Attacks on overlay routing
Besides the attacks in the underlay there are many attacks
which target overlay routing. This
section summarizes the currently known attacks on overlay
stability.
Eclipse attack: This attack tries to place adversarial
nodes in the network in a way that one or more nodes are
cut off from it, i.e. all messages are routed over at least
one adversarial node. This gives the attacker the control
over a part of the overlay network. Thus the Eclipse attack
can “hide” some nodes from the overlay network. It can be
prevented, first, if a node can not choose its nodeId freely
and secondly, if it is hard to influence other nodes’
routing tables. Because Kademlia favors long-living nodes
in its k-buckets and nodes are only added if a bucket is not
already full, the latter is easy to achieve as soon as the network
has bootstrapped.
Sybil attack: In completely decentralized systems there
is no instance that controls the quantity of nodeIds an at-
tacker can obtain. Thus an attacker can join the network
with lots of nodeIds until he controls a fraction m of all
nodes in the network. Douceur [8] proved that this attack
can not be prevented but only impeded. In centralized systems
one might let nodes pay monetarily to join the network
and bind each node to a particular natural person. In
completely decentralized systems a node can only
“pay” for authorization with system resources (bandwidth,
CPU power etc.).
Churn attack: If the attacker owns some nodes he may
induce high churn in the network until the network stabi-
lization fails. Since a Kademlia node is advised to keep
long-living contacts in its routing table, this attack does not
have a great impact on the Kademlia overlay topology.
Adversarial routing: Since a node is simply removed
from a routing table when it neither responds with routing
information nor routes any packet, the only way of influencing
the network’s routing is to return adversarial routing
information. For example an adversarial node might
just return other collaborating nodes which are closer to the
queried key. This way an adversarial node routes a packet
into its subnet of collaborators and neither the queried nor
the closest node for a given key would be found. This can be
prevented by using a lookup algorithm that considers mul-
tiple disjoint paths. The lookup succeeds when one path is
free of adversarial nodes. Therefore, suppose m is the fraction
of adversarial nodes, d the number of disjoint paths, and
$(h_i)$ the path length distribution, then the probability that a
lookup succeeds is given by:
$$P_K := \sum_{i=1}^{|(h_x)|} h_i \cdot \left(1 - \left(1 - (1 - m)^i\right)^d\right)$$
This shows that particularly with regard to a moderate
number of adversarial nodes, lookup algorithms can signif-
icantly benefit from disjoint paths as well as a low average
path length.
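The estimate above can be evaluated numerically with a short sketch. The function name lookup_success_probability and the example path length distribution are assumptions made for illustration; they are not measured values.

```python
def lookup_success_probability(h, m, d):
    """Evaluate P_K for a path length distribution h (length i -> h_i),
    a fraction m of adversarial nodes and d disjoint paths."""
    return sum(
        h_i * (1 - (1 - (1 - m) ** i) ** d)
        for i, h_i in h.items()
    )


# Hypothetical path length distribution, chosen only for illustration.
example_h = {2: 0.3, 3: 0.5, 4: 0.2}
for d in (1, 2, 4, 8):
    print(d, lookup_success_probability(example_h, m=0.2, d=d))
```

A path of length i is free of adversarial nodes with probability (1 − m)^i, so the inner term is the probability that at least one of the d paths is clean.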
3.3. Other attacks
Denial-of-Service: An adversary may try to make a victim
consume all its resources, i.e. memory, bandwidth,
computational power. Thus the protocol needs to have
mechanisms to allocate resources in a secure way. Physi-
cal attacks, such as jamming or side-band attacks are not
considered in this paper.
Attacks on data storage: Key-based routing protocols
are commonly used as building blocks to realize a dis-
tributed hash table (DHT) for data storage. To make it more
difficult for adversarial nodes to modify stored data items,
the same data item is replicated on a number of neighboring
nodes. Although attacks on data storage are not regarded
in this paper, the key-based routing layer has to provide a
secure neighborhood to a given key.
4. Design
In this section we propose a practicable secure Kademlia
protocol. In section 4.1 we introduce a method to assign
nodeIds in a secure way. Further a reliable sibling broadcast
is proposed, which is needed to store replicated data in a
secure way. Finally we explain how to secure the routing
table maintenance.
4.1. Secure nodeId assignment
As known from section 3.2, it should be hard for an attacker
to generate a large number of nodeIds (Sybil attack)
or to choose the nodeId freely (Eclipse attack). Furthermore
the nodeId should authenticate a node, i.e. no other node
should be able to steal or fake the nodeId. The latter can be
achieved with one of two methods: using a hash value
over IP address and port or by hashing a public key. The first
solution has a significant drawback because with dynamically
allocated IP addresses the nodeId will change whenever the
address changes. It is also not suitable to limit the number of
generated nodeIds if you want to support networks with NAT
in which several nodes appear to have the same public IP
address. Finally there is no way of ensuring integrity of
exchanged messages with this kind of nodeId. This is why
we advocate using the hash over a public key to generate
the nodeId. With this public key it is possible to sign messages
exchanged by nodes. Due to computational overhead
we differentiate between two signature types:
Weak signature: The weak signature does not sign the
whole message. It is limited to IP address, port and
a timestamp. The timestamp specifies how long the
signature is valid. This prevents replay attacks if dynamic
IP addresses are used. For synchronization issues
the timestamp may be chosen in a very coarse-grained
way. The weak signature is primarily used in
FIND_NODE and PING messages where the integrity
of the whole message is dispensable.
Strong signature: The strong signature signs the full
content of a message. This ensures integrity of the
message and resilience against Man-in-the-Middle at-
tacks. Replay attacks can be prevented with nonces
inside the RPC messages.
Those two signature types can authenticate nodes and
ensure integrity of messages. We now need to impede the
Sybil and Eclipse attack. This is done by either using a
crypto puzzle or a signature from a central certificate au-
thority, so we need to combine the signature types above
with one of the following:
Supervised signature: If a signature’s public key addi-
tionally is signed by a trustworthy certificate authority,
this signature is called supervised signature. This sig-
nature is needed to impede a Sybil attack in the net-
work’s bootstrapping phase where only a few nodes
exist in the network. A network size estimation can be
used to decide if this signature is needed.
Crypto puzzle signature: In the absence of a trustwor-
thy authority we need to impede the Eclipse and Sybil
attack with a crypto puzzle. In [3] the use of crypto
puzzles for nodeId generation is rejected because they
cannot be used to entirely prevent an attack. But in
our opinion they are the most effective approach for
distributed nodeId generation in a completely decentralized
environment without trustworthy authority and
should therefore be used to make an attack as hard as
possible in such networks.
For this reason we introduce two crypto puzzles as
shown in figure 3: a static puzzle that prevents
the nodeId from being chosen freely and a dynamic puzzle
that ensures that it is complex to generate a huge
amount of nodeIds. H denotes a cryptographically secure
hash function, ⊕ is the XOR operation, and the
$c_i$ denote crypto puzzle complexity. It is obvious that
an increase of $c_1$ decreases the space of possible public
keys, therefore the size of the public key must be
increased accordingly. $c_1$ is considered as a constant.
Once the nodeId has been generated it stays fixed for
a certain value. $c_2$ can be modified or extended over
time when computational resources become cheaper.
[Figure 3: two flowcharts.
Static puzzle (left): generate a key pair $s_{publ}$, $s_{priv}$; calculate $P := H(H(s_{publ}))$; if the first $c_1$ bits of P are not zero, generate a new key pair; otherwise $NodeID := H(s_{publ})$ is generated.
Dynamic puzzle (right): calculate $NodeID := H(s_{publ})$; choose a random X; calculate $P := H(NodeID \oplus X)$; if the first $c_2$ bits of P are not zero, choose a new X; otherwise the dynamic crypto puzzle (NodeID, X) is solved.]
Figure 3. Static (left) and dynamic (right)
crypto puzzles for nodeId generation
If a node receives a signed message it can now first
validate its signature and then check if the crypto puzzles
were solved. Both operations have O(1) complexity for a
constant public key size while crypto puzzle creation has
$O(2^{c_1} + 2^{c_2})$ complexity.
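The two puzzles of figure 3 can be sketched as follows. This is only an illustration under several assumptions: SHA-1 from Python's hashlib stands in for H, random byte strings stand in for freshly generated public keys (a real implementation would generate actual key pairs), and all function names are hypothetical.

```python
import hashlib
import os


def H(data: bytes) -> bytes:
    """Stand-in for the hash function H (SHA-1 is an assumption)."""
    return hashlib.sha1(data).digest()


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def solve_static(c1, new_public_key=lambda: os.urandom(32)):
    """Static puzzle: draw keys until H(H(s_publ)) starts with c1 zero bits."""
    while True:
        s_publ = new_public_key()  # placeholder for real key generation
        if leading_zero_bits(H(H(s_publ))) >= c1:
            return s_publ, H(s_publ)  # (public key, nodeId)


def solve_dynamic(node_id, c2):
    """Dynamic puzzle: find X so that H(nodeId XOR X) starts with c2 zero bits."""
    while True:
        x = os.urandom(len(node_id))
        if leading_zero_bits(H(xor_bytes(node_id, x))) >= c2:
            return x


def verify(s_publ, x, c1, c2):
    """Receiver side: two hash checks, independent of c1 and c2."""
    node_id = H(s_publ)
    return (leading_zero_bits(H(H(s_publ))) >= c1
            and leading_zero_bits(H(xor_bytes(node_id, x))) >= c2)
```

Solving needs about 2^c1 key generations plus 2^c2 hash evaluations on average, while verify always performs a constant number of hash computations.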
4.2. Reliable sibling broadcast
Siblings are nodes which are responsible for a certain
key-value pair that needs to be stored in a DHT. In the case
of Kademlia those key-value pairs are replicated over the
k closest nodes (recall that k is the bucket size). In
this paper we want to consider this number of nodes inde-
pendently from the bucket size k and introduce the num-
ber of siblings as a parameter s. This makes sense because
the bucket size usually defines the redundancy of overlay
connectivity and not the number of replicas that need to be
stored by the DHT.
A common security problem is the reliability of sibling
information which arises when replicated information needs
to be stored in the DHT which uses a majority decision to
compensate for adversarial nodes. Since Kademlia’s origi-
nal protocol converges to a list of siblings, it is complicated
to analyze and prove the coherency of sibling information.
For this reason we introduce a sibling list of size η · s per
node, which ensures that each node knows at least s siblings
to an ID within the node’s sibling range with high probability.
We now have to determine η and prove that the constraints
hold with high probability. This has already been
done by Gai and Viennot in their paper about the Broose
DHT [10] where a brotherhood list is constructed in the
same way. Since this proof unnecessarily uses Chernoff
bounds we review the proof here:
         c = 1.5          c = 2.0          c = 2.5          c = 3.0
s = 8    0.8950 · 10^-1   1.0000 · 10^-2   0.7786 · 10^-3   0.4750 · 10^-4
s = 16   0.3440 · 10^-1   0.6600 · 10^-3   0.5464 · 10^-5   0.2590 · 10^-7
s = 20   0.2187 · 10^-1   0.1763 · 10^-3   0.4791 · 10^-6   0.6352 · 10^-9
s = 32   0.5925 · 10^-2   0.3617 · 10^-5   0.3506 · 10^-9   0.1022 · 10^-13
Table 1. Probability for an incomplete sibling
list
As mentioned before the XOR metric is unidirectional,
which means that for a fixed x and for each d(x, y) exactly
one y exists. Consider two nodeIds chosen at random, then
the probability that one nodeId y is smaller than nodeId x
is given by $\frac{x}{2^n}$ (n is the number of bits of the nodeId). Let
N be the number of nodes in the network, then the average
distance with the XOR metric between two adjacent nodes
is $\frac{2^n}{N}$. Now consider the distance
$$d_N(\mu) = \mu \cdot \frac{2^n}{N}$$
of two nodes with nodeId x and y, then the expected number
of nodes N(x, y) between x and y is E[N(x, y)] = µ and
the probability that a node is placed between x and y is
given by $\frac{\mu}{N}$. Since nodeIds are randomly chosen the actual
number of nodes between x and y varies from the expected
value µ. Therefore the probability that there are less than s
nodes between x and y at distance $d_N(cs)$ is given by:
$$\Pr[N(x, y) < s] = \sum_{i=0}^{s-1} \binom{N}{i} \left(\frac{cs}{N}\right)^{i} \left(1 - \frac{cs}{N}\right)^{(N-i)}$$
This probability is computable for small s > 0, since the
following holds:
$$\binom{n}{k} = \prod_{i=1}^{k} \frac{n + 1 - i}{i}$$
So the use of Chernoff bounds is not necessary. This
makes this solution more accurate and the following probabilities
depending on c and s with an upper bound network
size of $N = 10^{10}$ can be computed as shown in Table 1.
The remainder of the proof follows [10]. The interesting
fact is that it is shown that a value $\eta \ge (2 \cdot c) \ge 5$ is sufficient
to satisfy our needed constraints w.h.p.
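The probability of an incomplete sibling list can be computed directly from the binomial sum, as a minimal sketch shows. The function name incomplete_sibling_probability is hypothetical; the code builds each binomial term from the previous one using the product form of the binomial coefficient, so no large factorials are needed.

```python
import math


def incomplete_sibling_probability(s, c, N=10**10):
    """Pr[N(x, y) < s] at distance d_N(c*s): fewer than s nodes fall into a
    range that is expected to contain c*s nodes."""
    p = c * s / N                        # probability that one node falls in range
    term = math.exp(N * math.log1p(-p))  # i = 0 term: (1 - p)^N
    total = term
    for i in range(1, s):
        # ratio of consecutive terms: C(N,i)/C(N,i-1) * p/(1-p)
        term *= (N + 1 - i) / i * p / (1 - p)
        total += term
    return total


for s in (8, 16, 20, 32):
    print(s, [incomplete_sibling_probability(s, c) for c in (1.5, 2.0, 2.5, 3.0)])
```

For s = 8 this yields roughly 0.0895 at c = 1.5 and 0.0100 at c = 2.0, in agreement with the first row of Table 1.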
Thus in S/Kademlia the routing table consists of a list of
n k-buckets holding nodes with a distance d with $2^{i-1} \le d < 2^i$,
$0 < i \le n$ and a sorted list of siblings of size
η · s. The special subtree handling introduced in section 2.1
can be omitted because of the newly introduced sibling list.
Because routing tables in Kademlia are implicitly refreshed
by incoming lookup requests and many of the nodes in our
sibling table would otherwise have to be stored in Kadem-
lia’s k-buckets, the additional communication overhead for
maintaining the sibling table is low.
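The resulting routing table structure can be sketched as follows. This is a simplified illustration, not the reference implementation: the class name SKademliaTable is hypothetical, nodeIds are plain integers, and the sketch assumes that a contact may be kept in both the sibling list and its k-bucket.

```python
class SKademliaTable:
    """n flat k-buckets plus a sorted sibling list of size eta * s."""

    def __init__(self, own_id, n=160, k=16, s=16, eta=5):
        self.own_id = own_id
        self.k = k
        # buckets[i - 1] holds contacts with 2^(i-1) <= distance < 2^i
        self.buckets = [[] for _ in range(n)]
        self.sibling_limit = eta * s
        self.siblings = []  # kept sorted by XOR distance to own_id

    def bucket_index(self, node_id):
        distance = self.own_id ^ node_id
        return distance.bit_length() - 1

    def add(self, node_id):
        if node_id == self.own_id:
            return
        # Sibling list: keep only the eta * s closest contacts seen so far.
        if node_id not in self.siblings:
            self.siblings.append(node_id)
            self.siblings.sort(key=lambda n: n ^ self.own_id)
            del self.siblings[self.sibling_limit:]
        # k-bucket: no splitting, a full bucket simply rejects the contact.
        bucket = self.buckets[self.bucket_index(node_id)]
        if node_id not in bucket and len(bucket) < self.k:
            bucket.append(node_id)


table = SKademliaTable(own_id=0b0011, n=4, k=2, s=1, eta=2)
for node in range(16):
    table.add(node)
print(table.siblings, table.buckets)
```

Since the closest contacts always survive in the sibling list, no bucket ever needs to be split, which is why the special subtree handling becomes unnecessary.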
4.3. Routing table maintenance
Kademlia uses a reactive approach to maintain routing
tables. Since the XOR metric ensures that all iterative
lookups converge along the same path, Kademlia can learn
about the existence of new nodes from incoming RPCs. To
secure routing table maintenance in S/Kademlia we categorize
signaling messages into the following classes: incoming
signed RPC requests, responses or unsigned messages.
Each of those messages contains the sender address. If the
message is weakly or strongly signed, this address can not be
forged or associated with another nodeId (see section 4.1).
We call the sender address valid if the message is signed and
actively valid if the sender address is valid and comes from
an RPC response. Kademlia uses those sender addresses to
maintain its routing tables.
Actively valid sender addresses are immediately added
to their corresponding bucket, when it is not full. Valid
sender addresses are only added to a bucket if the nodeId
prefix differs in an appropriate amount of bits χ (for example
χ > 32). This is necessary because an attacker can
easily generate nodeIds that share a prefix with the victim’s
nodeId and flood its buckets, because buckets close to the
own nodeId are only sparsely filled in Kademlia. Sender addresses
that came from unsigned messages will be ignored.
If a message contains more information about other nodes,
then each of them can be added by invoking a ping RPC
on them. If a node already exists in the routing table it is
moved to the tail of the bucket.
4.4. Lookup over disjoint paths
In section 3.2 we have shown the importance of using
multiple disjoint paths to lookup keys in a network with ad-
versarial nodes. The original Kademlia lookup iteratively
queries α nodes with a FIND_NODE RPC for the closest k
nodes to the destination key. α is a system-wide redundancy
parameter such as 2. In each step the returned nodes from
previous RPCs are merged into a sorted list from which the
next α nodes are picked. A major drawback of this approach
is that the lookup fails as soon as a single adversarial node
is queried.
We extended this algorithm to use d disjoint paths and
thus increase the lookup success ratio in a network with ad-
versarial nodes. The initiator starts a lookup by taking the
k closest nodes to the destination key from his local rout-
ing table and distributes them into d independent lookup
buckets. From there on the node continues with d paral-
lel lookups similar to the traditional Kademlia lookup. The
lookups are independent, except for the important fact that
each node is only used once during the whole lookup pro-
cess to ensure that the resulting paths are really disjoint. By
using the sibling list from section 4.2 the lookup doesn’t
converge at a single node but terminates on d close-by
neighbors, which all know the complete s siblings for the
destination key. Hence a lookup is still successful even if
k − 1 of the neighbors are adversarial.
[Figure 4: two plots. Fraction of successful node lookups versus fraction of adversarial nodes (N=10000, k=16, s=16) for d = 1, 2, 4, 8; CDF of path length (N=10000, k=16, s=16) for d = 1 and d = 8.]
Figure 4. Fraction of successful node
lookups with a fixed bucket size k = 16
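The lookup over disjoint paths described in this section can be sketched as follows. The sketch makes several simplifying assumptions: nodeIds are integers, find_node is a hypothetical stub for the FIND_NODE RPC, each path queries one node at a time (α = 1), and the d paths are processed one after another instead of in parallel. The shared set of used nodes is what keeps the paths disjoint.

```python
def disjoint_lookup(target, start_nodes, find_node, d, k):
    """Return the closest node reached on each of d disjoint paths."""

    def distance(node):
        return node ^ target

    # Take the k closest known nodes and deal them into d lookup buckets.
    seeds = sorted(start_nodes, key=distance)[:k]
    paths = [seeds[i::d] for i in range(d)]
    reserved = set(seeds)  # seeds belong to the path they were dealt to
    used = set()           # every node is queried at most once overall
    results = []
    for candidates in paths:
        best = None
        while candidates:
            node = min(candidates, key=distance)
            candidates.remove(node)
            if best is not None and distance(node) >= distance(best):
                break  # no progress towards the target: path has converged
            used.add(node)
            best = node
            for peer in find_node(node, target):
                if (peer not in used and peer not in reserved
                        and peer not in candidates):
                    candidates.append(peer)
        results.append(best)
    return results


def toy_find_node(node, target, network=range(64), count=4):
    """Toy stand-in for the RPC: every node knows the whole 6-bit network."""
    return sorted(network, key=lambda n: n ^ target)[:count]


print(disjoint_lookup(42, [1, 2, 3, 4], toy_find_node, d=2, k=4))
```

In this toy network the first path ends at the target 42 itself and the second at its closest neighbor 43, so the lookup terminates on d close-by nodes rather than on a single one.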
5. Evaluation
In this section we evaluate S/Kademlia with OverSim
[2], a flexible framework for overlay simulation. We de-
scribe the simulation setup and procedure and finally the
results of our simulations.
5.1. Simulation assumptions
We simulate adversarial nodes with the following assumptions:
An adversarial node returns data that compromises
the network in a worst case scenario. So in the case
of a FIND_NODE RPC the worst behaviour would be to return
only other collaborating nodes which are closer to the target
nodeId. The adversary also harvests other existing valid
nodeIds in order to map them to false transport addresses.
This is the worst case since other reactions like an empty
result or invalid data can be detected and the node would be
removed from the network, i.e. the node would be considered
as a stale contact.
[Figure 5: two plots. Fraction of successful node lookups versus fraction of adversarial nodes (N=10000, s=16) for (d=8, k=16), (d=4, k=8), (d=2, k=4), (d=1, k=2); CDF of path length (N=10000, s=16) for (d=1, k=2) and (d=8, k=16).]
Figure 5. Fraction of successful node
lookups with an adaptive bucket size k = 2d
On the other hand we assume that well behaving nodes
return adversarial node information in an equally distributed
manner. This assumption is confirmed in section 4 because
it is nearly impossible for an adversarial node to influence
other nodes’ routing tables.
With these assumptions we expect that a lookup of a
node or siblings is successful if no adversarial node is on
one path to the responsible node. In the case of a parallel
lookup on multiple paths we simply stop pursuing each path
that hits an adversarial node and the lookup is only considered
successful if at least one path is free of any adversarial nodes.
5.2. Simulation procedure
To keep the simulation efficient we first create a static
Kademlia overlay network with N nodes that is fully stabi-
lized. Then we continue by processing N node lookups and
evaluate the fraction of successful queries. This process is
repeated, increasing the fraction of adversarial nodes in steps of 5%, until 90%
of the nodes are adversarial. Since we do not evaluate churn
all simulations are done with a parameter of α = 1 and
we assume that the network stays in a stable state after the
bootstrapping phase, but under the influence of adversarial
nodes.
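The success criterion from section 5.1 can be illustrated with a much simpler stand-in for the OverSim experiments. The following Monte Carlo sketch does not build an overlay at all; it assumes that every hop is adversarial independently with probability m and that all d paths have the same fixed length. The function name simulate_lookups and its parameters are hypothetical.

```python
import random


def simulate_lookups(m, d, path_length, trials, seed=0):
    """Estimate the fraction of lookups with at least one clean path."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # A path is clean if none of its hops is adversarial.
        clean_paths = sum(
            all(rng.random() >= m for _ in range(path_length))
            for _ in range(d)
        )
        successes += clean_paths > 0
    return successes / trials


# Sweep the adversarial fraction in steps of 5% up to 90%.
for step in range(0, 19):
    m = step * 0.05
    print(round(m, 2), simulate_lookups(m, d=4, path_length=3, trials=2000))
```

Under these assumptions the estimate approaches 1 − (1 − (1 − m)^i)^d for path length i, which is the per-length term of the analytic estimate in section 3.2.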
5.3. Results
We simulated two setups with a network size of N =
10000 nodes, s = 16 siblings using d ∈ {1, 2, 4, 8} disjoint
paths. In Figure 4 we show the fraction of successful
lookups dependent on the number of adversarial nodes
for a fixed bucket size of k = 16. For d = 1 the lookup
process is similar to a standard Kademlia lookup. The fig-
ure clearly shows that by increasing the number of paral-
lel disjoint paths d the fraction of successful lookups can
be considerably improved. In this case the communication
overhead increases linearly with d. We also see that with
k = 16 there is enough redundancy in the k-buckets to ac-
tually create d disjoint paths.
In the second setup we adapted k = 2 · d to the num-
ber of disjoint paths to keep a minimum of redundancy in
the routing tables and consequently reduce communication
overhead. The results in figure 5 show that a smaller k
leads to a smaller fraction of successful lookups compared
to figure 4. The reason for this is the increased average path
length due to the smaller routing table as shown in the path
length distribution diagram.
We conclude that d = 4..8 with a k = 8..16 is a good
choice for S/Kademlia. Higher values of d and k seem not
worth the additional communication costs. Larger values
for k would also increase the probability that a large fraction
of buckets are not full for a long time. This unnecessarily
makes the routing table more vulnerable to Eclipse attacks.
Since we present simulations with N = 10000 nodes
only, one might argue that this is a rather small number of
nodes and not comparable to huge networks. In fact the
path length highly correlates with the fraction of successful
lookups. On the other hand the network topology can
be easily tuned to have a smaller diameter and therefore a
shorter average path length. This is usually done by con-
sidering multiple bits b of the nodeId in each step. So the
network can be tuned to the level of security needed in dif-
ferent scenarios.
6. Related Work
Castro et al. [3] study attacks on routing of messages in
structured peer-to-peer overlays. They propose several de-
fenses to secure the join process, routing table maintenance
and message forwarding. The secure assignment of nodeIds
is delegated to a central trusted certification authority.
Sit and Morris [16] present a categorization of attacks
against peer-to-peer distributed hash tables on the basis of
Chord, CAN and Pastry. They state that an important step
to defend against these attacks is detection by defining verifiable
system invariants. For example nodes can detect incorrect
lookup routing by verifying that the lookup gets “closer” to
the destination key.
Srivatsa and Liu [17] investigate three security threats in
DHT-based P2P systems. First they present an attack on
the routing scheme, in which a single adversarial node can
block all lookup requests in the absence of alternate paths.
Therefore they highlight the importance of several alternate
optimal paths in conjunction with the feasibility to detect
incorrect lookup results. Furthermore they present an at-
tack on the data placement scheme and show that replica-
tion alone is not sufficient to tolerate attacks by adversar-
ial nodes, but has to be combined with cryptographic tech-
niques to be effective. Finally they show that the nodeId
selection process has to be restricted to prevent adversarial
nodes from corrupting specific data items.
Awerbuch and Scheideler [1] present a theoretical DHT
which is provably robust against adversarial join-leave as
well as insert-lookup attacks. The design of the DHT is
high level and it is an open question how hard it would be
to transform their ideas into a practicable protocol. Another
DHT which can provably deal with join-leave attacks is S-
Chord [9], though it is limited to a linear number of adver-
sarial join requests.
Cerri et al. [4] focus on attacks that arise from the unlim-
ited choice of nodeIds and exemplify their findings with the
Kademlia protocol. They propose to limit free nodeId selec-
tion by coupling IP address and port to the nodeId by a hash
function. To make it harder for adversarial nodes to attack
specific data items, they propose that data items should be
stored at a temporary key, which is regularly rotated. This
is done by hashing the data item’s key with some temporal
information to compute the temporary key.
There are several papers in which countermeasures
against Sybil attacks are proposed: In [13] Rowaihy et al.
present an admission control system for structured peer-to-peer
systems. The system constructs a tree-like hierarchy
of cooperative admission control nodes, from which a join-
ing node has to gain admission. Another approach [7] to
limit Sybil attacks is to store the IP addresses of partici-
pating nodes in a secure DHT. In this way the number of
nodeIds per IP address can be limited by querying the DHT
if a new node wants to join.
Singh et al. [15] study the impacts of Eclipse attacks on
structured overlay networks and propose to defend against
this attack by letting nodes audit each other’s connectivity.
The idea is that a node mounting an Eclipse attack has a
node degree higher than average.
Nielson et al. [12] regard the class of rational attacks.
They assume that a large fraction of nodes in a peer-to-peer
system are selfish and try to maximize their consumption of
system resources while minimizing the use of their own.
7. Conclusion
In this paper we presented a secure key-based routing
protocol based on Kademlia. Although the elegant routing
table maintenance already makes Kademlia resistant to
some attacks, we have shown that there are several vulnera-
bilities that make it easy for adversarial nodes to gain control
of the network.
We propose several practicable solutions to make
Kademlia more resilient. First, we suggest limiting free
nodeId generation by using crypto puzzles in combination
with public key cryptography. Furthermore, we extend the
Kademlia routing table with a sibling list. This reduces the
complexity of the bucket splitting algorithm and allows a
DHT to store data in a safe replicated way. Finally, we pro-
pose a lookup algorithm which uses multiple disjoint paths
to increase the lookup success ratio.
The evaluation of S/Kademlia in the simulation frame-
work OverSim has shown that, even with 20% adversarial
nodes, 99% of all lookups are still successful if disjoint
paths are used. We believe that the proposed extensions to
the Kademlia protocol are practical and could be used to
easily secure existing Kademlia networks.
Acknowledgment
This research was supported by the German Federal
Ministry of Education and Research as part of the ScaleNet
project 01BU567 and by the BW-FIT support program as
part of the SpoVNet project.
References
[1] B. Awerbuch and C. Scheideler. Towards a scalable and ro-
bust DHT. In SPAA ’06: Proceedings of the eighteenth annual
ACM symposium on Parallelism in algorithms and architec-
tures, pages 318–327, New York, NY, USA, 2006. ACM
Press.
[2] I. Baumgart, B. Heep, and S. Krause. OverSim: A flexible
overlay network simulation framework. In Proceedings of
10th IEEE Global Internet Symposium (GI ’07) in conjunc-
tion with IEEE INFOCOM 2007, Anchorage, AK, USA, May
2007.
[3] M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. S.
Wallach. Secure routing for structured peer-to-peer overlay
networks. SIGOPS Oper. Syst. Rev., 36(SI):299–314, 2002.
[4] D. Cerri, A. Ghioni, S. Paraboschi, and S. Tiraboschi. Id
mapping attacks in p2p networks. In Global Telecommuni-
cations Conference, GLOBECOM’05. IEEE, 2005.
[5] S. A. Crosby and D. S. Wallach. An analysis of two bittor-
rent distributed trackers. Presentation at the South Central
Information Security Symposium (SCISS ’06), 2006.
[6] F. Dabek, B. Zhao, P. Druschel, J. Kubiatowicz, and I. Sto-
ica. Towards a common api for structured peer-to-peer
overlays. In Proceedings of the 2nd International Work-
shop on Peer-to-Peer Systems (IPTPS ’03), volume 2735,
pages 33–44, 2003.
[7] J. Dinger and H. Hartenstein. Defending the sybil attack
in p2p networks: Taxonomy, challenges, and a proposal for
self-registration. ares, 0:756–763, 2006.
[8] J. R. Douceur. The sybil attack. In IPTPS ’02: Revised
Papers from the First International Workshop on Peer-to-
Peer Systems, pages 251–260, London, UK, 2002. Springer-
Verlag.
[9] A. Fiat, J. Saia, and M. Young. Making chord robust to
byzantine attacks. In ESA, Lecture Notes in Computer Sci-
ence, pages 803–814. Springer, 2005.
[10] A.-T. Gai and L. Viennot. Broose: a practical distributed
hashtable based on the de-bruijn topology. In Fourth In-
ternational Conference on Peer-to-Peer Computing, 2004,
pages 167–174, aug 2004.
[11] P. Maymounkov and D. Mazières. Kademlia: A peer-to-peer
information system based on the XOR metric. In Peer-to-Peer
Systems: First International Workshop, IPTPS 2002, Cam-
bridge, MA, USA, March 7-8, 2002. Revised Papers, volume
2429, pages 53–65, 2002.
[12] S. Nielson, S. Crosby, and D. Wallach. A taxonomy of ratio-
nal attacks. In 4th International Workshop on Peer-To-Peer
Systems, Ithaca, New York, USA, February 2005.
[13] H. Rowaihy, W. Enck, P. McDaniel, and T. La Porta. Limit-
ing sybil attacks in structured peer-to-peer networks. Tech-
nical Report NAS-TR-0017-2005, Network and Security
Research Center, Department of Computer Science and En-
gineering, Pennsylvania State University, University Park,
PA, USA, 2005.
[14] A. Rowstron and P. Druschel. Pastry: Scalable, decentral-
ized object location, and routing for large-scale peer-to-peer
systems. In Middleware 2001: IFIP/ACM International
Conference on Distributed Systems Platforms, Heidelberg,
Germany, November 12-16, 2001. Proceedings, volume
2218, pages 329+, 2001.
[15] A. Singh, T.-W. J. Ngan, P. Druschel, and D. Wallach.
Eclipse attacks on overlay networks: Threats and defenses.
In Proceedings of INFOCOM ’06, Barcelona, Spain, April
2006.
[16] E. Sit and R. Morris. Security considerations for peer-to-
peer distributed hash tables. In IPTPS ’02: Revised Papers
from the First International Workshop on Peer-to-Peer Sys-
tems, pages 261–269, London, UK, 2002. Springer-Verlag.
[17] M. Srivatsa and L. Liu. Vulnerabilities and security threats in
structured overlay networks: A quantitative analysis. In AC-
SAC ’04: Proceedings of the 20th Annual Computer Secu-
rity Applications Conference (ACSAC’04), pages 252–261,
Washington, DC, USA, 2004. IEEE Computer Society.
[18] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger,
M. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scal-
able peer-to-peer lookup protocol for internet applications.
IEEE/ACM Transactions on Networking, 11(1):17–32, feb
2003.