
# Guard Placement Attacks on Path Selection Algorithms for Tor

## Abstract

The popularity of Tor has made it an attractive target for a variety of deanonymization and fingerprinting attacks. Location-based path selection algorithms have been proposed as a countermeasure to defend against such attacks. However, adversaries can exploit the location-awareness of these algorithms by strategically placing relays in locations that increase their chances of being selected as a client’s guard. Being chosen as a guard facilitates website fingerprinting and traffic correlation attacks over extended time periods. In this work, we rigorously define and analyze the guard placement attack. We present novel guard placement attacks and show that three state-of-the-art path selection algorithms (Counter-RAPTOR, DeNASA, and LASTor) are vulnerable to these attacks, overcoming defenses considered by all three systems. For instance, in one attack, we show that an adversary contributing only 0.216% of Tor’s total bandwidth can attain an average selection probability of 18.22%, 84× higher than what it would be under Tor currently. Our findings indicate that existing location-based path selection algorithms allow guards to achieve disproportionately high selection probabilities relative to the cost required to run the guard. Finally, we propose and evaluate a generic defense mechanism that provably defends any path selection algorithm against guard placement attacks. We run our defense mechanism on each of the three path selection algorithms, and find that our mechanism significantly enhances the security of these algorithms against guard placement attacks with only minimal impact on the goals or performance of the original algorithms.
Proceedings on Privacy Enhancing Technologies; 2019 (4):272–291
Gerry Wan*, Aaron Johnson, Ryan Wails, Sameer Wagh, and Prateek Mittal
DOI 10.2478/popets-2019-0069
Received 2019-02-28; revised 2019-06-15; accepted 2019-06-16.
*Corresponding Author: Gerry Wan: Princeton University,
E-mail: gwan@princeton.edu
Aaron Johnson: U.S. Naval Research Laboratory, E-mail:
aaron.m.johnson@nrl.navy.mil
Ryan Wails: U.S. Naval Research Laboratory, E-mail:
ryan.wails@nrl.navy.mil
Sameer Wagh: Princeton University, E-mail:
swagh@princeton.edu
Prateek Mittal: Princeton University, E-mail:
pmittal@princeton.edu
1 Introduction
Anonymous communication systems aim to protect the
privacy of Internet users from untrusted entities. These
systems hide the identities of users and prevent third
parties from linking communication partners on the In-
ternet. Today, Tor [10] is the most widely used anony-
mous communication system, serving millions of busi-
nesses, law-enforcement agencies, journalists, whistle-
blowers, and ordinary citizens from around the world.
As of August 2018, the Tor network is comprised of over
6,000 volunteer-run relays and carries terabytes of traﬃc
every day [47]. Each Tor client uses a public consensus
to choose which relays to send its traﬃc through. The
client uses onion routing [17] to send traﬃc, in which a
sequence of relays is selected, a circuit is built through
that sequence, and then encrypted data is forwarded
along the circuit. This process protects user anonymity
by preventing clients from being linked to their destina-
tions on the Internet.
As a popular anonymity system, Tor is an attrac-
tive target for adversaries wishing to deanonymize users.
Researchers have found that Tor is vulnerable to adver-
saries with visibility into Internet traﬃc [15, 21, 26, 27].
Passive attackers can use packet sizes and packet tim-
ings to correlate traﬃc on diﬀerent segments of the Tor
circuit. This ability can be used to associate traﬃc origi-
nating from the client with traﬃc ﬂowing to the destina-
tion and thereby deanonymize the user [10, 13, 28, 29].
Website ﬁngerprinting attacks can allow adversaries to
recognize the encrypted traﬃc patterns of a client as
those of speciﬁc websites [12, 33, 35, 38, 53]. Active at-
tackers can manipulate the underlying Internet topology
to place themselves along the path of Tor traﬃc [41]. To
mitigate these threats, a number of systems have been
developed that modify Tor’s path selection algorithm
to take into account the Internet locations of the re-
lays, such as Counter-RAPTOR [40], DeNASA [5], and
Astoria [30]. Other such path selection algorithms take
into account the geographic location of relays, such as
LASTor [1], which is designed to improve Tor’s latency.
However, the location-awareness of these algorithms
presents a new attack vector, in which malicious re-
lays can be strategically placed in locations that make
them more likely to be selected, leading to easier user
deanonymization. To understand the threat of these
guard placement attacks, we investigate their eﬀective-
ness on several proposed Tor path selection algorithms.
Moreover, we precisely deﬁne the attack and give a
generic defense algorithm that can be applied to all path
selection algorithms. To the best of our knowledge, we
are the ﬁrst to systematically study guard placement
attacks and the ﬁrst to develop a framework for quan-
tifying and mitigating the threat. Our contributions in
this work are:
(A) Theoretical formalization: We formalize a general
framework for analyzing security against guard
placement attacks. This includes deﬁning a formal
threat model with an explicit adversary, giving un-
targeted and targeted attack versions, quantifying
the adversary’s success via a metric, and providing
a deﬁnition that guarantees security against guard
placement attacks.
(B) Attack evaluations: We demonstrate the threat by
running our attacks on three state-of-the-art Tor
path selection algorithms: Counter-RAPTOR [40],
DeNASA [5], and LASTor [1]. For instance, we show
that in LASTor an adversary with a bandwidth of
just 0.216% of the Tor network can increase its average
guard selection probability to 18.22%,
84× the current Tor selection probability. We remark
that we defeat the separate defenses against
guard placement attacks that each individual algo-
rithm already possesses and showcase the impor-
tance of provably secure defenses.
(C) Defense framework: We propose a general tech-
nique to defend against guard placement attacks
that can be applied to any path selection algorithm.
We prove our approach secure under our deﬁnition,
which bounds the advantage of an attack (Theo-
rem 2). Finally, we apply our defense mechanism
to the three algorithms we attack, and ﬁnd that it
is feasible to largely maintain the original goals of
location-aware path selection algorithms while mit-
igating the threat.
Overall, our work provides a critical tool for the design
and analysis of path selection algorithms for Tor.
2 Background
2.1 Tor
The Tor network is a widely deployed and popular
anonymous communication system that primarily aims
to prevent attackers from linking communication part-
ners or associating online communication with a single
user. Tor uses the onion routing protocol [17], in which
data is transmitted over the network through a series
of relays. A Tor client constructs a circuit by choosing
an entry, middle, and exit relay to reach a destination
on the Internet. Tor clients choose these relays from
those listed in the current network consensus, which is
updated hourly. A relay is chosen for a given position
randomly with probability roughly proportional to its
bandwidth weight as given in the consensus. We refer
to the current algorithm for choosing relays in a circuit
as Vanilla Tor. When creating and using the circuit, the
layered encryption of onion routing ensures that each re-
lay learns information about only the previous hop and
the next hop in the circuit, and that no single relay is
able to link the client to the destination [10].
To improve long-term security, Tor clients use entry
guards (or simply guards) for the entry position of their
circuits [31, 59]. Each client selects a small number of
relays to use as guards for a long period of time. Cur-
rently, a typical Tor client selects one guard to be used
for 3–4 months. For all circuits created during this time
period, one of the selected guards will be used as the
entry relay. Each guard is chosen by clients at random
with probability proportional to bandwidth [45, 46].
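As a rough illustration of bandwidth-weighted selection, the following Python sketch samples a guard with probability proportional to its consensus weight. The relay names and weights are hypothetical, and the sketch leaves out whatever further adjustments make Tor's actual selection only "roughly" proportional; it is not Tor's implementation.

```python
import random

# Hypothetical consensus weights for three guards (illustrative values only).
consensus_weights = {"guardA": 2000, "guardB": 6000, "guardC": 12000}


def selection_probabilities(weights):
    """Return each relay's selection probability, proportional to its weight."""
    total = sum(weights.values())
    return {relay: w / total for relay, w in weights.items()}


def pick_guard(weights, rng):
    """Sample one guard at random, weighted by consensus weight."""
    relays = list(weights)
    return rng.choices(relays, weights=[weights[r] for r in relays], k=1)[0]


probs = selection_probabilities(consensus_weights)
rng = random.Random(0)  # seeded only so that the sketch is repeatable
guard = pick_guard(consensus_weights, rng)
```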
To be a guard, a relay must satisfy a number of cri-
teria that are chosen to ensure good performance and
to raise the cost of obtaining the guard position. We
highlight four criteria that aﬀect this cost. First, the re-
lay must be measured by Tor’s bandwidth-measurement
system, which can take up to two weeks after joining the
network [48]. Second, the relay must have enough band-
width for its consensus weight to be at least 2,000, which
we estimate to require 35.5 Mbit/s (see Appendix A).
Third, the relay must be online consistently enough to
be considered “stable”, which can be ensured by keeping
the relay online at all times. Fourth, the relay must be
online long enough to be considered “familiar”, which
takes at most eight days [45].
There are two general types of adversaries.
Relay Adversaries. Relay adversaries run Tor relays with the goal
of performing traﬃc analysis attacks on the circuits
that they are a part of. Since all Tor relays are run
by volunteers with no restrictions, it is diﬃcult to tell
which ones can be trusted [11, 57, 58]. A well-known
threat is when the adversary controls both the entry
and the exit relay in a single circuit, allowing them
to trivially deanonymize the client [10, 13, 21]. An-
other well-studied attack, called website ﬁngerprinting,
only requires the adversary to control the entry relay
[22, 24, 32, 33, 35, 38, 53, 55].
To defend against attacks that can be performed
by malicious entry relays, Tor uses the same guards in
the entry position for all circuit creations during the
lifetime of the guards (typically several months). This
provides a long-term defense by preventing clients from
quickly choosing a malicious entry relay [9]. Elahi et
al. [14] observe that the use of guards prevents a relay
adversary from compromising a large set of clients in a
short amount of time, and Johnson et al. [21] observe
that the frequency of guard selections limits the speed
of compromise by a relay adversary. In this work, we
focus on the threat of malicious guards.
Network Adversaries. Network adversaries do
not run malicious relays. Instead, they leverage their
position as a network operator to observe some por-
tion of a client’s circuit. These can include Autonomous
Systems (ASes), Internet Service Providers (ISPs), and
Internet Exchange Points (IXPs) [27]. A single such
network entity can potentially observe both sides of a
Tor circuit, performing traﬃc correlation attacks to link
a client to its destination [15, 42]. Asymmetric traﬃc
analysis (i.e. correlating traﬃc ﬂows in diﬀerent direc-
tions) and active attacks that exploit BGP dynamics
can deanonymize users even more eﬀectively [41]. Fur-
ther work has also shown that network adversaries can
exploit client mobility and user behavior over time to
deanonymize users or leak information about their net-
work location [51].
2.3 Location-Aware Path Selection
The current Tor network does little to defend against
network adversaries [10]. A number of location-aware
path selection algorithms have been proposed to defend
against passive and active AS-level adversaries, as well
as to enhance Tor’s performance.
Passive AS-level adversary defenses. Edman
and Syverson [13] propose an AS-aware path selection
algorithm that uses AS topology snapshots to avoid
ASes that appear both between the client and guard and
between the exit and destination. Nithyanand et al. [30]
propose Astoria, a similar AS-aware Tor path selection
algorithm that also considers asymmetric attackers, col-
luding attackers, and load-balancing across the Tor net-
work. Furthermore, it ensures that in the case where no
safe paths are available, the Tor client chooses guard
and exit relays in a way that will minimize the chance
of a successful attack. Barton and Wright [5] propose
a destination-naïve AS-aware path selection approach
called DeNASA. DeNASA chooses guards by avoiding
network paths that contain an empirically-determined
list of “suspect” ASes.
Active AS-level adversary defenses. To im-
prove Tor security against BGP hijack attacks [41], Sun
et al. propose a new guard selection algorithm called
Counter-RAPTOR [40]. The resilience of each candidate
guard to BGP hijacks is determined based on the client
and guard location, and this is factored into the guard’s
selection probability. Resilience is deﬁned as the proba-
bility of a client source AS not being deceived by a false
guard AS. Counter-RAPTOR is shown to improve the
resilience experienced by Tor clients up to 36% on av-
erage and up to 166% for certain clients.
Performance improvements. LASTor [1] is a
location-aware path selection algorithm designed to re-
duce Tor latency. LASTor incorporates some awareness
of passive AS-level adversaries, but primarily favors re-
lays that minimize the geographic distance from the
client and through the circuit to the destination. LAS-
Tor is able to reduce median path latencies by 25%.
These location-aware path selection algorithms use
client and guard location to choose guard relays in or-
der to protect against AS-level adversaries and improve
performance. However, in doing so, these systems make
themselves vulnerable to the guard placement attack.
3 Models and Deﬁnitions
To perform a guard placement attack, the adversary
needs bandwidth to contribute to his malicious guards
and IP addresses to host relays (Tor enforces a limit
of two relays per IP). A global and competitive mar-
ket exists for hosting services, and so this attack can
be performed by nearly anyone with a small amount of
money and an Internet connection. In particular, priv-
ileged points of network observation are not necessary.
This is a weak adversary that falls within the threat
models considered by the systems we attack as well as
by Tor itself.
We characterize the adversary’s resources using the following
parameters:
1. Total bandwidth (B): The total amount of bandwidth
the adversary can support (across all relays).
2. Number of guards (K): The total number of guard
relays the adversary runs.
3. Set of candidate guard locations (L): The set of locations
where it is feasible for the adversary to deploy
relays. Possible examples of L include all ASes on
the Internet, all ASes hosting at least one Tor relay,
or all geographic coordinates within certain regions.
Attack Parameters. The attack itself is a function
of the following parameters:
1. Client locations (C): The set of possible locations of
clients that the adversary would like to attack.
2. Guard selection algorithm (A): The guard selection
component of the path selection algorithm used by
Tor clients. We will consider the following algo-
rithms: (1) Vanilla Tor (A=VT), (2) Counter-
RAPTOR (A=CR), (3) DeNASA (A=DN), and
(4) LASTor (A=LT).
3.2 Deﬁnitions
Attack Taxonomy. A guard placement attack is a
type of relay-level attack. The adversary places ma-
licious guards in network locations with the goal of
maximizing the probability that at least one of the
guards is selected by a client under attack. Once a
malicious guard is selected, the adversary can then
mount website ﬁngerprinting or traﬃc correlation at-
tacks [6, 10, 12, 21, 33, 35, 55]. In this work, we con-
sider two types of guard placement attacks: untargeted
and targeted. In the untargeted attack, the adversary
attacks all likely Tor client locations. This case can ap-
ply when an adversary wishes to attack every Tor client
or when he has no information about the locations of
the clients of interest. In the targeted attack, the adver-
sary attacks clients in a single location, i.e. |C|= 1. Of
course, an adversary could target a number of client lo-
cations in between these extremes, but we ﬁnd it useful
to investigate them as they estimate the best and worst
cases for a client.
Attack Success Metrics. We measure the success
of a guard placement attack as the probability that an
attacked client selects a guard of the adversary. Given
an adversary with total bandwidth $B$, total guards $K$,
and candidate guard locations $L$, an attack strategy $s$
is a tuple of location-bandwidth pairs representing guard
placements: $s = ((\ell_1, b_1), \ldots, (\ell_K, b_K))$, where
$\forall i\; \ell_i \in L$, $\forall i\; b_i \ge 0$, and
$\sum_i b_i \le B$. Given guard selection algorithm $A$
and a client in location $c \in C$, let $p_A(c, s)$ be
the probability that the next guard selected by the client
is malicious. Then our main metric for the success of
attack strategy $s$ is
$$\sigma(A, C, s) = \frac{1}{|C|} \sum_{c \in C} p_A(c, s). \tag{1}$$
This metric quantifies the average success over client
locations $C$. This can be viewed as reflecting the average
risk to clients in $C$ or the adversary’s expected chance
of success against a specific client knowing only that the
client location is in $C$. We also analyze the maximum
success over client locations: $\max_{c \in C} p_A(c, s)$. Note
that these metrics are general for both untargeted and
targeted attacks, the latter being the case where $|C| = 1$.
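The two metrics can be sketched in a few lines of Python. Here the dictionary `p` stands in for $p_A(c, s)$ under one fixed strategy; the client location names and probabilities are invented for illustration and do not come from our experiments.

```python
# p[c] stands in for p_A(c, s): the probability that a client in location c
# next selects a malicious guard under a fixed strategy s.
# The location names and probabilities are made-up illustrative values.
p = {"AS_one": 0.0010, "AS_two": 0.0004, "AS_three": 0.0025}


def average_success(p_by_client):
    """Equation 1: mean of p_A(c, s) over the attacked client locations C."""
    return sum(p_by_client.values()) / len(p_by_client)


def maximum_success(p_by_client):
    """Maximum of p_A(c, s) over the attacked client locations C."""
    return max(p_by_client.values())


sigma_avg = average_success(p)
sigma_max = maximum_success(p)

# Targeted case: |C| = 1, so the average and the maximum coincide.
targeted = average_success({"AS_three": p["AS_three"]})
```

In the targeted case the set passed in has a single element, which is why the same function covers both attack types.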
Attack Goal. The adversary’s goal is to find a
strategy $s$ that maximizes $\sigma(A, C, s)$ given the client
locations $C$ and the clients’ guard selection algorithm $A$.
Let $S(B, K, L)$ be the set of all attack strategies given
bandwidth $B$, number of guards $K$, and candidate guard
locations $L$:
$$S(B, K, L) = \{((\ell_1, b_1), \ldots, (\ell_K, b_K)) : \forall i\; \ell_i \in L,\ \forall i\; b_i \ge 0,\ \text{and } \textstyle\sum_i b_i \le B\}.$$
Then we can express the goal of the attacker as selecting
some strategy $s \in \arg\max_{s' \in S(B, K, L)} \sigma(A, C, s')$.
This goal applies to both the untargeted and targeted
attacks, as it is parameterized by the set of client
locations $C$.
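For a single guard ($K = 1$), the maximization can be sketched as an exhaustive search over candidate locations. The sketch below is a simplification under stated assumptions: it assumes success does not decrease with bandwidth, so it only tries strategies that spend all of $B$, and the function `toy_success` with its `affinity` table is a hypothetical stand-in for $p_A$, not any of the algorithms studied here.

```python
def best_single_guard_strategy(success, clients, locations, bandwidth):
    """Search S(B, 1, L), restricted to strategies that spend all of B.

    `success(c, strategy)` plays the role of p_A(c, s); the caller supplies
    it because it depends on the guard selection algorithm A.
    """
    def sigma(strategy):
        return sum(success(c, strategy) for c in clients) / len(clients)

    candidates = [((loc, bandwidth),) for loc in locations]
    return max(candidates, key=sigma)


# Toy stand-in for p_A: a made-up per-(client, location) affinity scaled by
# the guard's bandwidth. None of these numbers come from the paper.
affinity = {
    ("c1", "L1"): 0.2, ("c1", "L2"): 0.5,
    ("c2", "L1"): 0.9, ("c2", "L2"): 0.1,
}


def toy_success(client, strategy):
    return sum(affinity[(client, loc)] * bw for loc, bw in strategy)


untargeted = best_single_guard_strategy(toy_success, ["c1", "c2"], ["L1", "L2"], 0.01)
targeted_c1 = best_single_guard_strategy(toy_success, ["c1"], ["L1", "L2"], 0.01)
```

In this toy instance the untargeted and targeted searches choose different locations, which is the distinction the two attack types are meant to capture.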
Security Deﬁnition. We deﬁne security against
the guard placement attack in a more conservative way
than we measure attack success. While attack success
is measured given an adversary strategy, a meaningful
security deﬁnition must take into account all possible
adversary strategies. However, the resources needed by
these strategies must also be incorporated into the def-
inition. The reason is that Tor (and onion routing in
general) makes no trust assumptions on the relays and
prioritizes performance, and so an adversary that con-
tributes the majority of the guard resources should have
his guards selected by the majority of clients. In order
to quantify the adversary’s maximum possible success
relative to his contribution, we unify the resources con-
tributed by relays under a single cost parameter. This
cost includes the time required to run the relays and
obtain the necessary GUARD ﬂag. We give a speciﬁc cost
model in Section 3.3, but our framework is generic and
applies to any cost model.
We deﬁne the cost of running a guard placement
attack as the cost of running the adversary’s guards
relative to the total cost of running all guards in the
Tor network. Let $\mathrm{relCost}(g)$ be the cost of running
guard $g$ divided by the cost of running all Tor guards.
Costs are additive in our model, and so the relative cost
of running a set of guards $G$ is $\sum_{g \in G} \mathrm{relCost}(g)$.
Note that $\mathrm{relCost}(g) \in [0, 1]$ and
$\sum_{g \in \mathcal{G}} \mathrm{relCost}(g) = 1$, where $\mathcal{G}$ is
the set of all guards in the network.
Using this cost model, we define security as the maximum
success of the attacker relative to his cost. We
will define security over all possible client locations $\mathcal{C}$
(i.e. not just those in $C$ that the adversary intentionally
attacks) to guarantee security to all clients. Let $\mathcal{S}$ be
all possible attack strategies given all possible guard
locations $\mathcal{L}$:
$\mathcal{S} = \bigcup_{B \ge 0,\, K \ge 0,\, L \subseteq \mathcal{L}} S(B, K, L)$.
The maximum success relative to cost is
$$\sigma(A) = \max_{s \in \mathcal{S}} \max_{c \in \mathcal{C}} \frac{p_A(c, s)}{\sum_{g \in s} \mathrm{relCost}(g)}. \tag{2}$$
Observe that, because of the maximization over all
strategies, $\sigma(A)$ bounds the absolute success of any
specific strategy $s$ after adjusting for its cost:
$\sigma(A) \ge \sigma(A, C, s) / \sum_{g \in s} \mathrm{relCost}(g)$.
Thus our security notion, given in Definition 1, simply
bounds $\sigma(A)$.
Definition 1. Path selection algorithm $A$ is secure
against guard placement attacks with parameter $\theta$, i.e.
is $\theta$-GP-secure, if $\sigma(A) \le \theta$.
Deﬁnition 1 provides a strong notion of security. Be-
cause it bounds the maximum success over all strategies,
it applies to all adversaries and attacks. Moreover, be-
cause it considers the maximum over all possible client
locations, it provides the same security guarantee to all
clients. This use of a worst case metric is in contrast to
our attack metric σ, which considers success averaged
over a set of client locations. While the weaker average
metric is useful to understand the threat of speciﬁc at-
tacks, it is less appropriate as a security deﬁnition as
a bounded average could still leave certain client loca-
tions highly vulnerable to attack and may even allow
every client location to be vulnerable when individually
targeted. Note that Deﬁnition 1 applies to all strategy
components—not just the malicious guard locations.
In particular, the deﬁnition covers strategies that sim-
ply vary the number and bandwidths of the malicious
guards.
Proving that an algorithm satisﬁes Deﬁnition 1 is
not straightforward due to the maximization over all
strategies. However, we can show that it is equivalent
to a simpler condition on the path selection algorithm.
Let $f_A(c, g)$ be the probability that a client using path
selection algorithm $A$ in location $c$ chooses $g$ as its next
guard. Let $\mathcal{G}$ be the set of all guards. The maximum
probability-cost ratio for path selection algorithm $A$ is
$$\rho(A) = \max_{c \in \mathcal{C}} \max_{g \in \mathcal{G}} \frac{f_A(c, g)}{\mathrm{relCost}(g)}. \tag{3}$$
Theorem 1 shows that we can just bound $\rho(A)$ to prove
that $A$ satisfies Definition 1. Its proof is in Appendix C.
Theorem 1. Path selection algorithm $A$ is $\theta$-GP-secure
if and only if $\rho(A) \le \theta$.
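The practical value of Theorem 1 is that the check reduces to a scan over (client, guard) pairs. The following Python sketch computes the ratio of Equation 3 and applies the theorem's condition; the tables `f` and `rel_cost` are invented inputs, and the sketch assumes every guard has a strictly positive relative cost.

```python
def max_probability_cost_ratio(f, rel_cost):
    """Equation 3: largest f_A(c, g) / relCost(g) over all clients and guards.

    f[c][g] is the probability that a client at c picks guard g next and
    rel_cost[g] is relCost(g). Assumes rel_cost[g] > 0 for every guard.
    """
    return max(f[c][g] / rel_cost[g] for c in f for g in f[c])


def is_gp_secure(f, rel_cost, theta):
    """Theorem 1: A is theta-GP-secure exactly when rho(A) <= theta."""
    return max_probability_cost_ratio(f, rel_cost) <= theta


# Made-up example with two client locations and three guards.
rel_cost = {"g1": 0.5, "g2": 0.25, "g3": 0.25}
f = {
    "c1": {"g1": 0.5, "g2": 0.25, "g3": 0.25},  # probability matches cost
    "c2": {"g1": 0.25, "g2": 0.25, "g3": 0.5},  # g3 is favored for c2
}
rho = max_probability_cost_ratio(f, rel_cost)
```

In this example the worst pair is client `c2` with guard `g3`, which is selected with twice the probability that its relative cost would suggest.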
3.3 Empirical Cost Model
We propose a cost model derived from empirical analy-
sis of the prices of commercial hosting providers. Using
data from Tor Metrics [47], we identify the top 10 ASes
in the Tor network by total relay consensus weight (see
Appendix D). Of these 10, we ﬁnd 7 that provide com-
mercial hosting and list prices online. For each of these 7
providers, we identify the cheapest server price for each
bandwidth oﬀered. Moreover, we include the possibility
of running two relays on the same IP address (as limited
by Tor) and splitting the bandwidth and cost between
them. We also consider further splitting of the bandwidth, up
to 32 possible IP addresses, which would constitute an
entire /24. Then, to determine the cheapest price for a
given bandwidth, we consider the cheapest possible op-
tion among all providers with at least that bandwidth.
In every case, the cheapest provider was Online
SAS. Each bandwidth could be obtained at the cheap-
est price from one of three of its products: a dedicated
server at 1,000 Mbps for $11.40/month, a cloud server at 200 Mbps for $4.55/month, and a cloud server at 100 Mbps for $2.28/month. We emphasize that these costs are for running the relays for one month, which takes into consideration the time it takes to obtain the GUARD flag. The exact cost model is given in Appendix D.

To obtain a guard’s relative cost (i.e. relCost), the guard’s bandwidth is input to the model to determine the absolute cost, and that value is divided by the sum of the absolute guard costs. We use a linear regression on the past consensus weights and self-advertised bandwidths of Tor’s guards to convert consensus weights to the bandwidths used in the cost model. The regression is necessary because the consensus weights are the result of a load-balancing algorithm [34] that causes them to differ substantially from the true bandwidths. This regression has a coefficient of determination of $r^2 = 0.86$. See Appendix A for more details.

While our cost model is based on limited data, Definition 1 is somewhat robust to any inaccuracies. If the model estimates the relative cost $c$ of any set of guards to be $(1 + \epsilon)c$, then an algorithm with $\theta$-GP-security under the model will instead actually have $(1 + \epsilon)\theta$-GP-security. Moreover, while no cost model will reflect the costs to all adversaries at all times, any guard selection algorithm will give an adversary some success relative to his cost, and we argue that explicitly modeling the cost increases the chance of successfully limiting his relative success.

4 Case Study I: Counter-RAPTOR

Counter-RAPTOR [40] modifies the way guards are chosen in Tor to improve security against BGP hijack attacks. In Counter-RAPTOR, the client computes, for each guard $g$, the guard’s resilience $r(g)$, which is the probability that the client’s AS is not deceived by an equally-specific prefix attack on the guard’s AS. The resiliences are used as a component of the weights used by the client to select guards.
To prevent a client from being too biased towards any one guard, Tille’s algorithm [49] is applied to the resiliences to produce more uniform values $r'(g)$ (for details, see Appendix B). Guard $g$, with normalized bandwidth $b(g)$, is then given weight

$$w(g) = \alpha \cdot r'(g) + (1 - \alpha) \cdot b(g), \tag{4}$$

where $\alpha$ is a parameter that trades off attack resilience and performance (the recommended value is $\alpha = 0.5$). We show that, despite the use of Tille’s algorithm, Counter-RAPTOR is still vulnerable to guard placement attacks.

4.1 Untargeted Attack – Counter-RAPTOR

We first analyze the success of an untargeted attack in which the attacker seeks a high average guard-selection probability over many client locations. When placing a guard, the attacker benefits from choosing a location that has high resilience with respect to many client locations. We consider the success of the attack under different bandwidths and numbers of guards.

Experimental Setup. We attack the locations ($C$) in the list of 368 top Tor client ASes measured by Juen [23]. We let the set of candidate ASes for running the malicious guard ($L$) include all ASes that already contain at least one Tor relay. We obtain Tor network data from CollecTor [44] and retrieve relevant relay information from an August 1, 2018 consensus. All IP to AS mappings are done using Team Cymru data [43], and we use Internet topology data from CAIDA [8]. AS path prediction [16] is used to compute resiliences. This data is used to model Tor and Internet routing throughout the paper unless otherwise noted.

Varying the bandwidth. We use one malicious guard ($K = 1$) and consider seven consensus weight values $B$: 2,000; 3,000; 7,500; 10,000; 30,000; 75,000; and 150,000. These bandwidths are the Tor consensus weights across the range of existing guard bandwidths.
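The weight in Equation 4 can be illustrated with a small numeric sketch in Python. The values of $r'(g)$ below are invented and assumed to be already produced by Tille's algorithm, which is not implemented here; the bandwidth share is likewise only an illustrative number.

```python
ALPHA = 0.5  # recommended value of alpha given in the text


def counter_raptor_weight(r_prime, bandwidth_share, alpha=ALPHA):
    """Equation 4: w(g) = alpha * r'(g) + (1 - alpha) * b(g)."""
    return alpha * r_prime + (1 - alpha) * bandwidth_share


# Hypothetical inputs: r'(g) is assumed to come from Tille's algorithm
# (not implemented here) and b is the guard's normalized bandwidth.
b = 0.00011
w_low_resilience = counter_raptor_weight(r_prime=0.0002, bandwidth_share=b)
w_high_resilience = counter_raptor_weight(r_prime=0.0008, bandwidth_share=b)

# With alpha = 0 the resilience term vanishes and only bandwidth matters.
w_bandwidth_only = counter_raptor_weight(r_prime=0.5, bandwidth_share=b, alpha=0.0)
```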
Our linear regression on consensus weights and advertised bandwidths (Appendix A) shows that a guard with a weight of 2,000, which is 0.011% of the total guard weight, likely has an actual bandwidth of 35.5 Mbit/s, and a guard with a weight of 150,000, which is 0.81% of the total guard weight, has a predicted actual bandwidth of 939.8 Mbit/s. To optimize the attack success $\sigma$ (Equation 1) against all client locations ($C$), the adversary computes the success probability for placing a malicious guard in each of the candidate ASes ($L$), and then chooses the location with the largest attack success. This is the same AS regardless of bandwidth, and we find that it is AS199524 (G-Core Labs).

We consider the attack success probabilities under Counter-RAPTOR from two perspectives. First, to show the increase in guard probability compared to today’s Tor network, we include the success probability under Vanilla Tor. Second, we show the relative cost of the attack (Section 3.3) to present what would be the “ideal” success of the attacker under our cost model.

Figure 1 shows the untargeted attack success for a varying attacker bandwidth. The shaded areas represent the range of guard-selection probabilities over the client locations even though no specific one is being targeted. For the smallest bandwidth guard shown (2,000), we can see that it achieves a selection probability of 0.046%, which is 4.3× greater than the Vanilla Tor success probability and 1.37× greater than the relative cost of the guard. This means that it would be feasible for small adversaries (such as a single person) to have higher success rates on average than they could on today’s Tor network. For all bandwidths, the highest success probability is obtained against AS28885.
The adversary has the highest relative advantage against that client location for a bandwidth of 2,000, giving them a 12.7× increase in success rate compared to Vanilla Tor and a 4.1× increase over the ideal cost-based probability, even without targeting those client locations specifically. We can also see that a single-guard untargeted attack’s relative success compared to the ideal cost-based model stays about the same as bandwidth increases, but loses effectiveness when compared to Vanilla Tor for bandwidths greater than 0.1% of the total. This is because Counter-RAPTOR inherently weights bandwidth less than Vanilla Tor (exactly 1/2, due to $\alpha$). However, this does not exonerate Counter-RAPTOR; a larger adversary can instead run many small guards to obtain high absolute success.

Fig. 1. Success probability of an untargeted attack on Counter-RAPTOR clients with 1 malicious guard and varying bandwidths. Shaded areas show range of success over client locations.

Varying the number of relays. Because Tor does not place restrictions on how many relays one can run, a strategic adversary can deploy a large number of guards in the Tor network to better optimize his attack success probability. For a fixed bandwidth budget, the adversary can split the bandwidth among multiple guards and place them each strategically to increase the likelihood that one of them is selected. The adversary does need to take care not to reduce the individual bandwidth of each relay such that it falls below the minimum threshold (2,000) to be considered a guard. We observe that Counter-RAPTOR is particularly vulnerable to splitting bandwidth among multiple guards due to its additive formula for guard weight (Equation 4). Dividing bandwidth $B$ among $K$ guards instead of one guard reduces the bandwidth term by a factor of $K$ but does not reduce the resilience term. We use a fixed bandwidth weight of 40,000, which represents just 0.216% of the total guard bandwidth.
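The effect of splitting can be made explicit with a short worked calculation. It rests on simplifying assumptions that are ours for illustration: all $K$ malicious guards sit in the same location and share one value $r'$, that value does not change as guards are added, and the weights of Equation 4 can be summed across the adversary's guards. Here $b$ denotes the normalized bandwidth of the whole budget.

```latex
\begin{align*}
w_{\text{single}} &= \alpha\, r' + (1 - \alpha)\, b \\
w_{\text{split}}  &= \sum_{i=1}^{K} \Bigl( \alpha\, r' + (1 - \alpha)\, \tfrac{b}{K} \Bigr)
                   = K \alpha\, r' + (1 - \alpha)\, b \\
w_{\text{split}} - w_{\text{single}} &= (K - 1)\, \alpha\, r'
\end{align*}
```

Under these assumptions the bandwidth contribution is unchanged by splitting, while the resilience contribution grows linearly in $K$.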
This ensures that the adversary can divide the bandwidth evenly among up to 20 guards without falling below the minimum guard threshold (see Section 2.1). We analyze the optimal strategy, which places all malicious guards in the same AS location (AS199524).

| # Guards (K) | relCost (%) | Avg Success (%) | Max Success (%) |
|---|---|---|---|
| 1 | 0.134 | 0.148 | 0.238 |
| 2 | 0.153 | 0.189 | 0.368 |
| 3 | 0.181 | 0.230 | 0.498 |
| 5 | 0.263 | 0.311 | 0.754 |
| 10 | 0.334 | 0.512 | 1.384 |
| 20 | 0.665 | 0.909 | 2.597 |
| Vanilla | 0.134 | 0.216 | 0.216 |

Table 1. Success probability of an untargeted attack on Counter-RAPTOR clients with a fixed bandwidth of 40,000 (0.216% of total guard bandwidth) and varying number of malicious guards.

Table 1 shows the effects of splitting a fixed bandwidth budget among an increasing number of malicious guards in an untargeted attack. Dividing the bandwidth evenly among 20 guards increases the average attack success to 0.909%, which is 4.2× higher than the Vanilla Tor success and 1.4× higher than the relative cost. Deploying multiple guards pushes the success probability for this moderately sized adversary above Vanilla Tor, and it also increases the advantage relative to the cost of running the attack. We can also see this trend in the maximum success over all clients, where the absolute success probability reaches nearly 2.6% for 20 guards, an increase of 12× over Vanilla Tor and 3.9× over the relative cost.

4.2 Targeted Attack – Counter-RAPTOR

We next analyze targeted guard placement attacks on Counter-RAPTOR. We use a single malicious guard with a bandwidth of 2,000 and for each of the 368 top Tor client ASes compare the targeted and untargeted success rates. For each of these locations, the adversary performs an exhaustive search over all candidate ASes (L) to find the one that has the maximum success probability for the client location.

Figure 2 shows the success rates of targeted and untargeted attacks over the client locations with just one guard.
For visual clarity, we rank each client AS in increasing order first by untargeted success and then by targeted success. The largest increase in targeted success probability over all client ASes is 47%. However, for the 13 client ASes where the optimal untargeted placement happens to also be the optimal targeting location, targeting them specifically is no better than attacking all clients at once. For the client AS with the maximum adversary success (index 368), we can see that a single-guard targeted attack with a bandwidth of 2,000 can achieve a success probability of 0.15%. This is over 13.6× higher than what an adversary with the same resources could achieve against Vanilla Tor and over 4.4× higher than the relative cost.

Fig. 2. Success rates of targeted attacks against Counter-RAPTOR clients using 1 malicious guard with a bandwidth of 2,000 (0.011% of total guard bandwidth).

4.3 Summary

We show that strategic guard placement attacks expose new vulnerabilities in Counter-RAPTOR. Similar to Vanilla Tor, adversaries with more bandwidth resources are able to compromise clients with higher probability. However, small adversaries with relatively little bandwidth can attack Counter-RAPTOR more successfully than they can attack today's Vanilla Tor network. We also show that targeted guard placement attacks can boost the attack's likelihood of success, and splitting the same bandwidth resource among multiple malicious guards can increase the probability that one of them is chosen.

5 Case Study II: DeNASA

DeNASA [5] is a proposal for improving security against passive AS attackers that may snoop on Tor circuits. This algorithm chooses guards weighted by bandwidth, except only among those that do not have a Suspect AS on the path to or from the guard. Suspect ASes are those that are frequently in a position to perform traffic correlation attacks on Tor circuits.
According to the authors, the top two Suspect ASes that clients should avoid are AS3356 (Level 3) and AS1299 (Telia Company) [5]. DeNASA uses AS path inference [16] to predict whether or not a Suspect AS is on-path between client-guard pairs. Guards that do not have a Suspect AS on-path are called suspect-free. If there are no suspect-free guards, DeNASA falls back to the Vanilla Tor guard selection algorithm. The number of Suspect ASes is limited to only two as a defense against guard placement attacks, but we show that this is still ineffective.

5.1 Untargeted Attack – DeNASA

We consider an untargeted attack on DeNASA in which the adversary attempts to maximize average guard-selection probability over common Tor client locations. In this attack, the adversary seeks to place a malicious guard in a location that is suspect-free from many client locations. The adversary further benefits from choosing a guard location that is suspect-free from client locations that have few other suspect-free guards and thus are more likely to choose the malicious guard.

Experimental Setup. We again analyze an untargeted attack with the 368 top Tor client ASes [23] as the client locations (C) and the ASes with at least one Tor relay as the candidate ASes (L). We use the same Internet topology and Tor data as in Section 4.

Varying the bandwidth. We again evaluate the attack success for one malicious guard (K = 1) and seven different guard bandwidths (B) ranging from 2,000 to 150,000 and compare it to Vanilla Tor and the ideal cost-based success. To place the guard, the adversary does an exhaustive search over all candidate ASes (L) and chooses the AS that receives the highest average selection probability over all client locations. We discover that the optimal location to place the malicious guard is in AS1659 (Taiwan Academic Network) for smaller adversaries (bandwidth weight less than 30,000), but is AS12637 (Seeweb) for larger adversaries.
This difference arises because once an adversary has enough bandwidth, it is better to give up on attacking client locations that are extremely vulnerable in favor of attacking many more clients that are only somewhat vulnerable.

Figure 3 shows the attack success. For the smallest bandwidth shown (2,000), the adversary achieves on average 0.043% success. This is a relative advantage of 3.9× over Vanilla Tor and 1.29× over the relative cost. The shaded area shows the range over all client locations of the probability that the malicious guard is selected provided that the adversary places the guard in the location that maximizes average success. There is a wide range of attack success across client locations for any given bandwidth, even though this attack is untargeted. Clients in the worst-case location (AS30083) have an extremely high probability of selecting the malicious guard. In particular, for a bandwidth of 2,000, a single malicious guard placed in AS1659 achieves a selection probability of 10.6% (964× that of Vanilla Tor, 316× the relative cost) for clients in AS30083. The reason for such large success probabilities is that AS30083 can only reach one non-malicious suspect-free guard. For large bandwidths, the worst-case client location becomes AS36992. A guard with a bandwidth of 150,000, which represents just 0.81% of total guard bandwidth, achieves success rates of 16.2%, even though AS36992 is not specifically targeted.

Fig. 3. Success probability of an untargeted attack on DeNASA clients with 1 malicious guard and varying bandwidths. Shaded areas show range of success over client locations.

Varying the number of relays. With DeNASA, an adversary must place guards in separate ASes to gain an advantage in running multiple small guards with a fixed total bandwidth. Deploying multiple guards within the same AS will have no effect on average success probability because the entire AS is either suspect-free or not.
By deploying a fraction of the guard bandwidth in a separate AS, the adversary may be able to capture some clients that were not able to reach the first AS due to an on-path Suspect AS. In our attack analysis, we implement a greedy algorithm that places each additional guard in the candidate AS that maximally increases the success probability. Note that this is just a heuristic and may not find the optimal strategy.

| # Guards (K) | relCost (%) | Avg Success (%) | Max Success (%) |
|---|---|---|---|
| 1 | 0.134 | 0.522 | 4.896 |
| 2 | 0.153 | 0.555 | 54.16 |
| 3 | 0.181 | 0.535 | 61.17 |
| 5 | 0.263 | 0.544 | 58.64 |
| 10 | 0.334 | 0.531 | 62.32 |
| 20 | 0.665 | 0.531 | 62.32 |
| Vanilla | 0.134 | 0.216 | 0.216 |

Table 2. Success probability of an untargeted attack on DeNASA clients with a fixed bandwidth of 40,000 (0.216% of total guard bandwidth) and varying number of malicious guards.

Table 2 shows the effect of splitting a bandwidth weight of 40,000 among up to 20 malicious guards. Note that the success probabilities do not strictly increase because we use a heuristic. We can see that running two guards instead of just one vastly increases the maximum success probability to 54.2%. This is 251× the Vanilla Tor success rate and 354× the relative cost. The reason for this huge improvement is that the second guard can be deployed in an AS that is suspect-free from client AS30083, while the optimal location for deploying just one guard of size 40,000 cannot reach this particularly vulnerable client AS. We further see that adding even more guards does not increase the average success or the maximum success, but does increase the relative cost. This is because while adding more guards does allow the adversary to attack more client locations, it also removes bandwidth from the most effective attack vantage points. Since there is some cost to deploying more guards, an adversary attacking DeNASA should use no more than two or three guards placed in separate ASes.
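The greedy heuristic used for the multi-guard DeNASA attack can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy DeNASA model (bandwidth-weighted choice among guards in suspect-free ASes, with a fallback to all guards when none is reachable), the function names, and the three-AS example topology are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def client_success(free_ases: Set[str], honest_bw: Dict[str, float],
                   placed: List[str], bw_each: float) -> float:
    """Toy model: chance that one client picks a malicious guard."""
    honest = sum(bw for a, bw in honest_bw.items() if a in free_ases)
    evil = bw_each * sum(1 for a in placed if a in free_ases)
    if honest + evil == 0:
        # No suspect-free guard is reachable: fall back to plain
        # bandwidth weighting over every guard (assumed Vanilla Tor model).
        honest = sum(honest_bw.values())
        evil = bw_each * len(placed)
    total = honest + evil
    return evil / total if total > 0 else 0.0


def average_success(suspect_free: Dict[str, Set[str]],
                    honest_bw: Dict[str, float],
                    placed: List[str], bw_each: float) -> float:
    """Average success over all client locations (untargeted attack)."""
    rates = [client_success(free, honest_bw, placed, bw_each)
             for free in suspect_free.values()]
    return sum(rates) / len(rates)


def greedy_placement(suspect_free: Dict[str, Set[str]],
                     honest_bw: Dict[str, float],
                     total_bw: float, k: int) -> List[str]:
    """Place k guards of total_bw / k each, one at a time, always in the
    candidate AS that yields the highest average success so far."""
    bw_each = total_bw / k
    placed: List[str] = []
    for _ in range(k):
        best = max(sorted(honest_bw),
                   key=lambda a: average_success(
                       suspect_free, honest_bw, placed + [a], bw_each))
        placed.append(best)
    return placed


# Hypothetical data: honest guard bandwidth per AS, and for each client
# location the set of ASes reachable without crossing a Suspect AS.
honest_bw = {"AS1": 1000.0, "AS2": 50.0, "AS3": 20.0}
suspect_free = {"c1": {"AS1"}, "c2": {"AS2"}, "c3": {"AS3"}}
print(greedy_placement(suspect_free, honest_bw, total_bw=100.0, k=2))
# -> ['AS3', 'AS2']
```

In this toy topology the heuristic spreads the two guards over the two ASes where honest suspect-free bandwidth is scarce, which mirrors the observation that a second guard pays off mainly when it reaches clients the first one cannot.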
5.2 Targeted Attack – DeNASA

In a targeted guard placement attack, the adversary maximizes his success by placing his guard in an AS that is suspect-free from the target client. This is possible as long as the client is not in one of the Suspect ASes and the set of candidate guard locations L is large enough. For example, if the client's AS is in the set of candidate guard locations L, the adversary can simply place the malicious guard in that AS. If there exists no suspect-free AS for a targeted client location, then DeNASA chooses guards based on bandwidth, and so the adversary can choose any location for his guard.

We show in Figure 4 the success rate of targeted attacks using one malicious relay with a bandwidth of 2,000 against each client AS. Again, we rank each AS in increasing order first by untargeted success and then by targeted success. For the 50 of 368 client ASes (13.6%) that had zero probability of compromise in the untargeted attack, we find that all are able to reach at least one suspect-free AS, and so targeting them specifically gives a non-zero success rate. For the other 318 client ASes, targeting them does no better than generally attacking all clients. Figure 4 also shows that the success of a targeted attack in DeNASA is heavily dependent on the target client AS. For about five percent of clients, the probability of a successful attack targeting these ASes is more than an order of magnitude higher than what the same adversary would be capable of in the Vanilla Tor network.

Fig. 4. Success rates of targeted attacks against DeNASA clients using 1 malicious guard relay with a bandwidth of 2,000 (0.011% of total guard bandwidth). For 86.4% of client ASes, the targeted and untargeted success rates are equivalent (the overlapping points).
While a majority of clients can reach most non-adversarial guards without traversing a Suspect AS, there are a few ASes that are especially vulnerable to adversaries that leverage DeNASA's location-awareness.

5.3 Summary

We demonstrate vulnerability in DeNASA to the guard placement attack in that an adversary can strategically place guards in suspect-free locations. We show that for an untargeted attack with a single guard of bandwidth weight 2,000, a DeNASA client chooses the malicious guard with average probability 3.9 times greater than a Vanilla Tor client does, and in the worst case the selection probability can be more than 964 times greater. This shows that the guard placement success rates of an adversary vary significantly across client locations. For large enough guards, the maximum absolute success probability is also large. We also show that splitting the bandwidth among a few guards in separate ASes can greatly increase the untargeted success rate by reaching more clients. However, it is also important to note that such a deployment comes at a higher cost to the adversary. Finally, we show that targeted attacks are successful against clients that were not vulnerable to the untargeted attack.

6 Case Study III: LASTor

LASTor [1] is a location-aware path selection algorithm that primarily aims to reduce latency of communication on Tor. While the proposal offers some additional AS-awareness to defend against passive AS adversaries, it is only incorporated in the path selection algorithm after guards are chosen. LASTor uses a Weighted Shortest Path algorithm that selects a given path with probability inversely proportional to the expected latency between the client and destination. Network latency is approximated by the end-to-end geographical distance. Thus, the algorithm attempts to select a path close to the direct line between the client and the destination.

LASTor includes relay clustering as a defense against guard placement attacks.
This technique limits the success of an adversary strategy that places all of the malicious relays in the same location. However, we demonstrate that it is ineffective in general against guard placement. The clustering algorithm divides the globe into a grid of cells and includes a cluster for each cell containing all the relays within that cell. The recommended edge lengths of each cell are 2 degrees of latitude and longitude. To select guards, all guards are first clustered, and then distance from each cluster is computed as the great-circle distance [50] from the center of the cluster to the client. The client selects a "feasible" cluster uniformly at random from the closest 20% of clusters and then picks one guard uniformly at random among the guards in that cluster. Note that LASTor does not consider relay bandwidth.

6.1 Untargeted Attack – LASTor

We consider an untargeted attack on LASTor in which the adversary seeks high average selection probability over many likely physical locations for Tor clients. The adversary benefits by placing his guards close to many of the client locations. Because LASTor does not take bandwidth into account, the adversary further benefits from having as many guards as possible.

Experimental Setup. We choose the client locations (C) from the ten countries with the most directly-connecting Tor users. We use 200 cities located across these countries as client locations. The locations in the i-th country are the geographic coordinates of the $200 f_i$ most-populous cities in that country, where $f_i$ is that country's fraction of Tor users among the top-ten countries. We obtain the top ten Tor countries and their $f_i$ values using data from Tor Metrics [47]. We let the candidate locations for running the guard (L) be the set of all relay clusters that already contain at least one Tor relay. We use the Maxmind GeoIP database [25] for IP to geo-location mapping.

Fig. 5. Success probability of an untargeted attack on LASTor clients with 1 malicious guard and varying bandwidths. The shaded area shows ranges of success over client locations.

Varying the bandwidth. We again consider a single malicious guard (K = 1) with seven bandwidth weights (B) from 2,000 to 150,000. To choose the guard's location, the adversary finds the optimal strategy by computing for each candidate location its potential guard selection probability from all client locations and choosing the one that maximizes the average probability (i.e., σ, see Equation 1). We find that this optimal location is just north of Moscow (57.8794, 34.9925).

Figure 5 shows the attack's success on the set of client locations. It is clear that LASTor selection probabilities have no dependency on bandwidth, giving an adversary running a small guard a significant advantage compared to both Vanilla Tor and the relative cost. A malicious guard with consensus weight 2,000, which is the minimum weight that we examine, obtains a 1.13% average success probability over all clients, 103× greater than what the same adversary would obtain under Vanilla Tor, and 34× greater than the relative cost. The shading shows that the success rates range widely from 0% to 2.94% for all bandwidths. The constant maximum success probability over all clients indicates that there is always at least one client location that has the malicious guard's cluster within its closest 20%. The constant minimum success probability, which occurs for 61% of client locations, arises because the malicious guard's cluster is not in the closest 20% of clusters for those locations.

Varying the number of relays.
Because the LASTor guard selection algorithm does not depend on bandwidth, it is particularly susceptible to an adversary that has limited bandwidth but can run multiple relays. Moreover, an adversary with multiple relays can strategically place them in separate clusters to obtain positive success probability from more client locations. We consider an adversary implementing a greedy algorithm to insert malicious guards into the Tor network, where each added guard is placed to maximize the increase in average success probability. Note that this is a heuristic and may not find the optimal strategy.

| # Guards (K) | relCost (%) | Avg Success (%) | Max Success (%) |
|---|---|---|---|
| 1 | 0.134 | 1.132 | 2.941 |
| 2 | 0.153 | 2.250 | 5.882 |
| 3 | 0.181 | 3.353 | 8.824 |
| 5 | 0.263 | 4.414 | 14.29 |
| 10 | 0.334 | 10.22 | 27.78 |
| 20 | 0.665 | 18.22 | 34.21 |
| Vanilla | 0.134 | 0.216 | 0.216 |

Table 3. Success probability of an untargeted attack on LASTor clients with a fixed bandwidth of 40,000 (0.216% of total guard bandwidth) and a varying number of malicious guards.

In Table 3 we show the success probabilities of untargeted attacks on the 200 client locations with a bandwidth budget of 40,000 and up to 20 relays (K = 20). As we can see, the success probability increases nearly linearly with the number of relays even while keeping the total bandwidth constant. Thus, the attacker obtains the highest advantage by running 20 relays, in which case he increases the average success probability 84× over Vanilla Tor and 27× over the relative cost. The maximum success probability is 158× higher than Vanilla Tor's and 51× higher than the relative cost.

6.2 Targeted Attack – LASTor

In the targeted attack, the adversary focuses on a single client location and attempts to maximize his success probability with respect to this target. LASTor first chooses a feasible cluster, then chooses a guard within that cluster.
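This two-stage selection can be sketched in Python as follows. The sketch is a hedged illustration rather than LASTor's actual code: taking the geometric center of the 2-degree cell as the cluster center, rounding the 20% cutoff down, the helper names, and the guard coordinates are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def great_circle_km(a: Coord, b: Coord) -> float:
    """Haversine great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dlat, dlon = p2 - p1, math.radians(b[1] - a[1])
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cell_of(pos: Coord, edge: float = 2.0) -> Tuple[int, int]:
    """Grid cell (cluster id) containing a position."""
    return (math.floor(pos[0] / edge), math.floor(pos[1] / edge))


def guard_probabilities(client: Coord, guards: Dict[str, Coord],
                        edge: float = 2.0, frac: float = 0.2) -> Dict[str, float]:
    clusters: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, pos in guards.items():
        clusters[cell_of(pos, edge)].append(name)
    # Assumption: the cluster "center" is the center of its grid cell.
    center = lambda c: ((c[0] + 0.5) * edge, (c[1] + 0.5) * edge)
    ranked = sorted(clusters, key=lambda c: great_circle_km(client, center(c)))
    n_feasible = max(1, math.floor(frac * len(ranked) + 1e-9))
    probs = {name: 0.0 for name in guards}
    for c in ranked[:n_feasible]:
        for name in clusters[c]:
            # Uniform over feasible clusters, then uniform within the cluster.
            probs[name] = 1.0 / (n_feasible * len(clusters[c]))
    return probs


# Hypothetical guards: two share the cell nearest the client, "evil" sits
# alone in the next-nearest cell, and the rest are far away.
guards = {
    "near_a": (55.9, 37.5), "near_b": (55.1, 37.9), "evil": (57.88, 34.99),
    "paris": (48.85, 2.35), "berlin": (52.52, 13.40), "nyc": (40.71, -74.0),
    "tokyo": (35.68, 139.69), "london": (51.5, -0.12),
    "stockholm": (59.33, 18.06), "kyiv": (50.45, 30.52),
    "helsinki": (60.17, 24.94),
}
client = (55.75, 37.62)
probs = guard_probabilities(client, guards)
print(probs["evil"], probs["near_a"], probs["near_b"])  # -> 0.5 0.25 0.25
```

With ten clusters, two are feasible, and the guard that sits alone in its cluster receives that cluster's whole share while the two co-located guards split theirs. Under this reading, a lone guard in a feasible cluster is selected with probability one over the number of feasible clusters; the single-guard maximum of 2.941% in Table 3 equals 1/34, which would be consistent with 34 feasible clusters, although that count is an inference and is not stated in the text.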
Therefore, a strategic adversary would want to place guards in feasible clusters that are geographically close to the client but that contain as few guards as possible. We find that an adversary specifically targeting any one of the 200 client locations is always able to find a feasible cluster that is within the closest 20% and that does not already contain a guard. This gives a malicious guard targeting any specific client a 2.94% chance of being selected from among the 1,935 other guards in the Tor network, regardless of its bandwidth. By comparison, the highest-bandwidth non-malicious guard has a 0.74% chance of selection under Vanilla Tor. We emphasize that this targeted attack success applies to all 200 client locations, even those that have zero chance of selecting a malicious guard in the untargeted attack.

6.3 Summary

We demonstrate serious vulnerabilities in LASTor to the guard placement attack, despite the relay clustering that LASTor uses as a defense against that very attack. With just a single malicious guard, a low-bandwidth adversary can exploit the location-awareness of LASTor to increase its guard selection probability by more than two orders of magnitude over Vanilla Tor. We also show that splitting the adversary's bandwidth among multiple guards and placing them in separate clusters drastically increases the attack success probability.

7 Countermeasures

In this section, we present a meta-algorithm that modifies the guard selection component of any path selection algorithm to provably defend against guard placement attacks. We apply this algorithm to Counter-RAPTOR, DeNASA, and LASTor and evaluate its effect on their security and performance.

7.1 Defense Mechanism

In a successful guard placement attack, the adversary obtains a success probability that is disproportionate to the fraction of Tor's guard resources that he contributes.
We therefore propose a defense mechanism that bounds guard selection probabilities relative to their costs. Our defense mechanism can be applied to any guard selection algorithm, as it operates by interacting with the algorithm to produce a modified guard selection distribution. The mechanism takes as input a desired bound on the guards' probability-cost ratios, and it produces a distribution satisfying this bound while preserving as much as possible the security benefits of the original guard distribution.

Before describing the mechanism, we introduce some notation. Let A be the client's algorithm for selecting its next guard. Let θ ≥ 1 be a security parameter indicating the desired bound on the guards' probability-cost ratios. Intuitively, θ represents the maximum relative advantage the network is willing to give to an adversary running a guard placement attack. Let G be the set of guards in the Tor network. For g ∈ G, recall (Section 3.2) that relCost(g) denotes its cost fraction. Also recall that $f_A(g)$ denotes the selection probability of g for a client using A (we drop the client parameter c for simplicity). We assume that A produces this distribution given the set of guards (i.e., $f_A = A(G)$).

The defense mechanism D is given in Algorithm 1. It takes in A and θ and produces a guard selection distribution $f'_A$ that bounds the probability-cost ratio for all guards. D operates by asking A for its desired guard selection distribution, enforcing the θ bound on that distribution by potentially reducing guard probabilities, removing any guards thus limited, and repeating to assign to the remaining guards the probability in excess of the bound from the removed guards. Thus D repeatedly uses A as the guide for how the current unassigned probability should be allocated among guards that haven't yet met the θ bound. D uses each distribution that A produces as much as possible, only reducing some desired probabilities if they would cause the guard to exceed the θ bound.
We prove in Appendix C that this algorithm terminates (Theorem 2), as it is not obvious from the description that it does so.

7.2 Defense Framework Evaluation

We evaluate the security and performance of our defense framework from three perspectives: (1) how well it reduces vulnerability to a guard placement attack, (2) how well the modified guard selection algorithm maintains its original goal (e.g., increasing hijack resilience), and (3) how it affects the load balancing of clients over guards. We apply Algorithm 1 to Counter-RAPTOR, DeNASA, and LASTor with varying threshold values.

7.2.1 Security from Guard Placement Attack

The defense algorithm bounds the extent to which adversaries can exploit the location-awareness of the guard selection algorithm to achieve high selection probabilities. Tor clients can apply this defense to their location-aware path selection algorithm to mitigate guard placement attacks while maintaining AS-awareness or latency benefits. Theorem 2 shows that modifying a guard selection algorithm using the defense makes it robust to guard placement attacks in general, regardless of attacker strategy.

Algorithm 1: Defense algorithm D

Input: Guard selection algorithm A, security parameter θ
Output: Guard selection distribution f′_A

    forall g ∈ G do
        f′_A(g) ← 0
    end
    B ← G    // Guards below threshold
    p ← 1    // Probability to allocate
    repeat
        x ← 0    // Excess probability
        f_A ← A(B)
        forall g ∈ B do
            if f′_A(g) + p·f_A(g) ≥ θ·relCost(g) then
                x ← x + f′_A(g) + p·f_A(g) − θ·relCost(g)
                f′_A(g) ← θ·relCost(g)
                B ← B \ {g}
            else
                f′_A(g) ← f′_A(g) + p·f_A(g)
            end
        end
        p ← x
    until x = 0
    return f′_A

Theorem 2. Let A be any guard selection algorithm and θ ≥ 1 be the security parameter. Then using the guard selection distribution $f'_A = D(A, \theta)$ is θ-GP-secure.
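As an illustration, Algorithm 1 can be transcribed into a short Python sketch. This is a minimal version under stated assumptions: guards are identified by strings, the selection algorithm A is passed in as a function from a set of guards to a probability distribution over that set, and the exact termination test `x = 0` is replaced by a small floating-point tolerance `eps`. The `toy_select` algorithm and the cost fractions are invented for the example.

```python
from typing import Callable, Dict, Set

Distribution = Dict[str, float]


def defend(select: Callable[[Set[str]], Distribution],
           rel_cost: Dict[str, float],
           theta: float,
           eps: float = 1e-12) -> Distribution:
    """Cap every guard's selection probability at theta * relCost(g)."""
    capped = {g: 0.0 for g in rel_cost}   # f'_A, initially all zero
    below = set(rel_cost)                  # B: guards still under the cap
    p = 1.0                                # probability left to allocate
    while p > eps and below:               # eps stands in for "until x = 0"
        excess = 0.0                       # x
        desired = select(below)            # f_A <- A(B)
        for g in list(below):
            want = capped[g] + p * desired.get(g, 0.0)
            limit = theta * rel_cost[g]
            if want >= limit:
                excess += want - limit     # hand the surplus to later rounds
                capped[g] = limit
                below.discard(g)
            else:
                capped[g] = want
        p = excess
    return capped


def toy_select(guards: Set[str]) -> Distribution:
    """Hypothetical location-aware algorithm that strongly favours guard 'a'."""
    raw = {"a": 0.7, "b": 0.2, "c": 0.1}
    total = sum(raw[g] for g in guards)
    return {g: raw[g] / total for g in guards}


rel_cost = {"a": 0.2, "b": 0.3, "c": 0.5}   # cost fractions, summing to 1
result = defend(toy_select, rel_cost, theta=1.5)
print({g: round(pr, 3) for g, pr in sorted(result.items())})
# -> {'a': 0.3, 'b': 0.45, 'c': 0.25}
```

In this toy run, guard `a` is capped at 1.5 × 0.2 in the first round, its surplus of 0.4 is re-offered to `b` and `c` in the proportions the toy algorithm prefers, `b` then reaches its own cap, and the remainder goes to `c`. The result still sums to 1 and no guard exceeds θ times its cost fraction.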
In other words, applying Algorithm 1 and using the resulting guard selection probability distribution is secure against guard placement even though the original guard selection algorithm may not be. An additional consequence of Theorem 2 is that applying Algorithm 1 limits any advantage the adversary might obtain from splitting his bandwidth among multiple relays, as Definition 1 allows the adversary to vary the number and bandwidths of guards in addition to their locations. We defer the proof of Theorem 2 to Appendix C.

We evaluate how much the defense would improve the security of using the location-aware algorithms on the existing Tor network by computing the probability-cost ratios for each algorithm. The probability-cost ratio of guard g under algorithm A is $\rho(g) = f_A(g)/\mathrm{relCost}(g)$. We determine the values of ρ(g) under our cost model for each existing guard in the August 1, 2018 consensus. We do not insert any malicious guards.

Fig. 6. Maximum selection probability to cost ratio for clients choosing guards in the August 1, 2018 Tor consensus.

Figure 6 shows the maximum probability-cost ratio for each client location under Counter-RAPTOR, DeNASA, and LASTor. These values represent the ability of existing guards to attract clients relative to their costs (recall Equation 3). For Counter-RAPTOR, the worst-case client location has at least one guard with a ratio of 5.4. Some clients under DeNASA could select a guard with a probability 1,490× the cost it took to deploy the guard, but this ratio varies greatly across clients. LASTor always gives some existing guard a large advantage, and 75% of client locations have a maximum probability-cost ratio of greater than 100.

These results show that even on the current network, applying proposed location-aware algorithms would give some guards a chance to observe clients that is disproportionate to their cost.
We emphasize that these results only include relays that currently exist, and a strategic attack on a specific location-aware algorithm would yield even higher success, as shown in Sections 4–6. Furthermore, Theorem 2 shows that applying Algorithm 1 to any of these algorithms would mitigate the threat by guaranteeing that the maximum probability-cost ratio for all client locations would be no higher than the desired limit θ.

7.2.2 Algorithms' Original Goals

The defense meta-algorithm provably limits the advantage of a guard placement attack to a factor of θ, but it does impact the original goals of the location-aware path selection algorithm. However, we show that for reasonable values of θ that impact is generally small. In the following sections, again let $f_A(g)$ denote the probability that a client selects guard g under algorithm A, and let G denote the set of all guards in the network.

Fig. 7. Probability of being resilient to attacks with different threshold (θ) values.

Counter-RAPTOR. Let r(g) indicate the hijack resilience of guard g. The aggregated probability of a client being resilient to a BGP hijack attack can be expressed as $\sum_{g \in G} f_{CR}(g)\, r(g)$. Figure 7 shows the aggregated resilience for θ ∈ {1, 1.1, 1.25, 1.5}, as well as for "pure" Counter-RAPTOR without the guard placement defense. We use the 368 top client ASes [23] and guard data from the August 1, 2018 consensus. Naturally, θ = 1 has the lowest probability of being resilient to an equally-specific prefix hijack attack, while pure Counter-RAPTOR is the most resilient. For θ = 1, 50% of client locations have at least 0.62 resilience probability, and in pure Counter-RAPTOR, 50% of client locations have at least 0.7 resilience probability. Observe that for θ = 1.25 the resilience probabilities over all client locations are nearly identical to those of clients using pure Counter-RAPTOR.
This means that we can relax the θ bound to 1.25 while maintaining the hijack resilience benefits of Counter-RAPTOR. We recommend this value to mitigate guard placement attacks as it bounds the adversary's probability-cost advantage to 1.25 and does not significantly impact the original goals of Counter-RAPTOR.

DeNASA. Let sf(g) be the indicator function for the AS paths between the client and guard being suspect-free, i.e., sf(g) = 1 if the paths in both directions are suspect-free and 0 otherwise. The aggregated suspect-free probability can then be expressed as $\sum_{g \in G} f_{DN}(g)\, sf(g)$. Figure 8 shows the suspect-free probability for θ ∈ {1, 1.25, 1.5, 2, 10}, as well as for pure DeNASA. We use the same Tor client ASes [23] and guards as in the Counter-RAPTOR evaluation. Naturally, as θ increases, the algorithm behaves more similarly to pure DeNASA, which has the highest probability for clients to choose suspect-free guards. For pure DeNASA, 366 out of 368 client locations are guaranteed to select suspect-free guards; the only clients not selecting suspect-free guards are clients located in the two Suspect ASes themselves. As we reduce θ, the modified distribution begins to give guards that are not suspect-free non-zero probability of being selected. This is the trade-off between protecting clients from the threat of Suspect ASes and from guard placement attacks. We recommend θ = 2, which has 80% of client locations still only choosing suspect-free guards while ensuring that no guard is more than twice as likely to be chosen relative to the fraction of the Tor network it contributes.

Fig. 8. Probability of choosing suspect-free guards with different threshold (θ) values.

LASTor. LASTor uses geographical distance as a substitute for network latency [1]. Let d(g) denote the distance to the guard from a client. The expected distance from the client to a chosen guard can then be expressed as $\sum_{g \in G} f_{LT}(g)\, d(g)$.
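The three aggregate metrics in this section (resilience, suspect-free probability, and expected distance) share one form: an expectation of a per-guard quantity under the client's guard selection distribution. A minimal Python sketch, with invented probabilities and per-guard values that are assumptions for illustration only:

```python
from typing import Dict


def expected_metric(select_prob: Dict[str, float],
                    metric: Dict[str, float]) -> float:
    """Sum over guards of f(g) * metric(g)."""
    return sum(p * metric[g] for g, p in select_prob.items())


# Hypothetical selection distribution for one client location.
f = {"g1": 0.5, "g2": 0.3, "g3": 0.2}

resilience = {"g1": 0.9, "g2": 0.5, "g3": 0.0}             # r(g)
suspect_free = {"g1": 1.0, "g2": 1.0, "g3": 0.0}           # sf(g)
distance_km = {"g1": 100.0, "g2": 2000.0, "g3": 9000.0}    # d(g)

print(expected_metric(f, resilience))    # aggregated resilience
print(expected_metric(f, suspect_free))  # suspect-free probability
print(expected_metric(f, distance_km))   # expected guard distance in km
```

Evaluating the same function on the original distribution and on the distribution produced by the defense gives the per-client quantities that the figures in this section compare across θ values.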
Figure 9 shows the expected distance in kilometers for θ ∈ {1, 2, 5, 10, 20}, as well as for pure LASTor. We evaluate the set of 200 client locations used in Section 6.1 and use the same guards as in the previous evaluations. Naturally, as θ decreases (mitigating guard placement attacks), the expected distance of a guard selected by a client increases. The median expected distances range from 1,348 km to 4,375 km, depending on the choice of θ. We recommend θ = 5 for LASTor, which has 50% of client locations still choosing guards within 3,104 km.

7.2.3 Performance Impact

We will show that our defense mechanism can be applied to Counter-RAPTOR, DeNASA, and LASTor without negatively impacting the performance of Tor's load balancing. In fact, application of the defense can improve the network load balance because it prevents attractive relays from being overloaded with client traffic.

Fig. 9. Expected guard distance with different threshold (θ) values.

We examine guards' expected load under these path-selection algorithms, both with and without the application of our defense mechanism. When computing expected guard loads for Counter-RAPTOR and DeNASA, we assume clients are distributed in the top 368 client ASes according to the densities measured by Juen [23]. When computing expected loads for LASTor, we assume clients are geographically distributed according to our experimental setup described in Section 6.1. Following Tor's existing load-balancing strategy, we consider ideal load balancing to be when clients are distributed proportionally to bandwidth, which is most reasonable under the assumption that clients produce similar amounts of traffic. When applying the defense, we use our recommended values of θ for each algorithm: θ = 1.25 for Counter-RAPTOR, θ = 2 for DeNASA, and θ = 5 for LASTor.
Under each algorithm, we compute each guard’s expected load factor, which is the ratio of the guard’s fraction of clients to the guard’s fraction of bandwidth; for example, if a guard is used by 4% of Tor clients and contributes 2% of Tor’s bandwidth, then the guard’s load factor is 0.04/0.02 = 2. Under ideal load balancing, guards would have load factors close to 1. Figure 10 shows the distribution of expected guard load under the location-aware algorithms. The CDFs are over clients and not client locations; e.g., the point at $(x = 10^0, y = 0.5)$ on the Counter-RAPTOR lines indicates that 50% of clients choose a guard with a load factor of at most 1 in Counter-RAPTOR.

Fig. 10. Distribution of expected guard load factors for location-aware algorithms with and without guard placement defense applied.

Applying our defense to Counter-RAPTOR produces a nearly identical load distribution; therefore, 1.25-GP-security can be achieved without disturbing the load balance in Counter-RAPTOR. Applying our defense to DeNASA slightly improves network balance: the median load factor experienced by clients is reduced from 1.25 to 1.07, and the worst load factor is reduced from 3.00 to 1.70. Akhoondi et al. note that one of LASTor’s deficiencies is its poor load balancing [1], which is reflected in our results. Applying our defense significantly improves the load balance of LASTor: the median load factor experienced by clients is reduced from 7.91 to 2.19, and the worst load factor is reduced from 70.1 to 8.70. Our analysis suggests that defending a path selection algorithm against the guard placement attack may result in desirable performance benefits, in addition to improving the security of the algorithm.

## 8 Related Work

Guard Placement Attacks. While previous work on location-aware path selection algorithms in Tor has mentioned guard placement attacks, it has not rigorously studied them. We showed that the defenses proposed for each algorithm we attacked are ineffective.
Sun et al. [40] note that an adversary can run a relay that has a short AS-path length to the client to obtain a high resilience value, making it more likely for a Counter-RAPTOR client to choose the adversary’s relay. To combat this, the resilience of each relay is somewhat randomized using Tille’s algorithm [49]. Nithyanand et al. [30] mention that Astoria clients may be manipulated into connecting to malicious guards if there are few or no safe paths available. In these cases, the authors suggest having a minimum threshold of safe paths to choose from, but no further analysis is provided. Akhoondi et al. [1] also mention that an adversary may try to place relays close to the direct line between a LASTor client and destination, and they introduce a clustering algorithm as a defense. Perhaps most explicitly, Barton and Wright [5] coin the term “guard placement attack”. To mitigate the attack in DeNASA, they limit the list of Suspect ASes to just two (AS3356 and AS1299), as doing so increases the number of guards a client can use.

In contrast to previous works, we formally define and analyze guard placement attacks using a metric that quantifies the attack success. Our work is the first to obtain quantitative estimates of the security of several location-aware Tor path selection algorithms against guard placement attacks. We also present a defense technique that modifies guard selection algorithms to make them provably secure.

Other Path Selection Algorithms. There have also been many alternate path selection algorithms proposed to improve the security and/or performance of Tor [2, 4, 36, 37, 39, 52]. Our security definitions apply to more than just location-aware algorithms in that adversaries can choose any strategy that maximizes their guard placement attack success, which includes choosing the number and bandwidths of their guards in addition to their locations.
Because these other path selection algorithms do not address guard placement attacks, they may also be vulnerable and benefit from our defense.

Traffic Analysis Attacks. Onion routing systems such as Tor are vulnerable to adversaries that can observe traffic as it enters and exits the anonymity network [6, 21, 26–29, 31]. Both relay and network adversaries could monitor this traffic; a relay adversary may control both the entry and exit relay, while a network adversary might observe links on both sides of the circuit. The low-latency requirement of Tor makes correlating traffic patterns easy. This vulnerability has been known since the initial development of Tor [10], and has been demonstrated in a number of works [15, 26–28, 41]. A guard placement attack would allow an adversary to obtain the powerful guard position, making subsequent traffic correlation attacks on the client much easier.

Website fingerprinting is another deanonymization attack in which the adversary aims to associate a client’s traffic patterns with those of specific websites [7, 18, 19, 22, 24, 32, 33, 35, 38, 53–56]. This is a powerful attack because the adversary only needs access to the client’s traffic. Machine learning techniques can be used on features such as timing, volume, and direction to classify encrypted traffic as representative of a certain website. Like traffic correlation attacks, both relay and network adversaries could perform website fingerprinting attacks; a relay adversary would need to control the guard relay, and a network adversary would need to observe the link between the client and its guard. Guard placement attacks make website fingerprinting easier by making it easier for an adversary to induce a client to choose the adversary’s guard.

Cost-based Adversary Models. Backes et al. [3] include an analysis of Tor’s security against a “monetary” adversary, for which they produce a per-relay cost model based on hosting prices and bandwidth-cost statistics.
In addition to considering a relay’s bandwidth, their model also takes into account the specific hosting provider or country, but it does not consider the purchase of additional IP addresses. Jansen et al. [20] produce a cost model for a Tor adversary, but they do not require hosting providers to support Tor relays and do not consider additional IP addresses.

## 9 Conclusion

In this work, we formalize the guard placement attack, in which an adversary strategically places relays to compromise large fractions of Tor users at relatively low cost. We are the first to systematically study the guard placement attack and show that it is highly effective against location-aware path selection algorithms. We provide a definition of security against this attack and describe a general method that modifies a path-selection algorithm to satisfy this definition while minimizing the impact on the algorithm’s original goals.

Our work motivates the following directions for future work: (1) The design of θ-GP-secure path-selection algorithms without requiring the application of our defense. If an algorithm is explicitly designed to achieve θ-GP-security, it may be able to achieve improved traffic-analysis resistance or network performance. (2) An improved cost model. Although we provide a concrete cost model through a study of current hosting providers, defining more sophisticated cost models can further refine our understanding of placement attacks and Tor security in general [36]. (3) The generalization of guard placement attacks to relay placement attacks. Many path-selection algorithms modify the way that clients choose middle and exit relays, making these positions potentially vulnerable to placement attacks. (4) The development of techniques to quantify the security of Tor path selection. Our work highlights the importance of designing Tor algorithms that consider multi-dimensional trade-offs among different types of adversaries.
Techniques that help researchers balance the many considerations in path selection can greatly benefit the development of new Tor algorithms.

## Acknowledgements

This work has been supported by the Office of Naval Research, the Army Research Office Young Investigator Prize (YIP), and National Science Foundation grants CNS-1704105, CNS-1553437, and CNS-1617286. We thank Florentin Rochet for valuable feedback.

## References

[1] Masoud Akhoondi, Curtis Yu, and Harsha V. Madhyastha. LASTor: A Low-latency AS-aware Tor Client. IEEE/ACM Transactions on Networking, 22(6), 2014.

[2] Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. The Path Less Travelled: Overcoming Tor’s Bottlenecks with Traffic Splitting. In Privacy Enhancing Technologies, 2013.

[3] Michael Backes, Sebastian Meiser, and Marcin Slowik. Your Choice MATor(s). Proceedings on Privacy Enhancing Technologies, 2016(2), 2016.

[4] Armon Barton, Mohsen Imani, Jiang Ming, and Matthew Wright. Towards Predicting Efficient and Anonymous Tor Circuits. In Proceedings of the 27th USENIX Conference on Security Symposium, 2018.

[5] Armon Barton and Matthew Wright. DeNASA: Destination-Naive AS-Awareness in Anonymous Communications. In Proceedings on Privacy Enhancing Technologies, 2016.

[6] Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. Low-resource Routing Attacks Against Tor. In Proceedings of the 2007 ACM Workshop on Privacy in Electronic Society, WPES ’07, 2007.

[7] Xiang Cai, Rishab Nithyanand, Tao Wang, Rob Johnson, and Ian Goldberg. A Systematic Approach to Developing and Evaluating Website Fingerprinting Defenses. In ACM Conference on Computer and Communications Security (CCS), 2014.

[8] CAIDA Data. http://www.caida.org/data.

[9] Roger Dingledine and George Kadianakis. One fast guard for life (or 9 months). In HotPETs, 2014.

[10] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The Second-generation Onion Router.
In Proceedings of the 13th Conference on USENIX Security Symposium, 2004.

[11] John R. Douceur. The Sybil Attack. In Revised Papers from the First International Workshop on Peer-to-Peer Systems, 2002.

[12] Kevin P. Dyer, Scott E. Coull, Thomas Ristenpart, and Thomas Shrimpton. Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP ’12, 2012.

[13] Matthew Edman and Paul Syverson. AS-awareness in Tor Path Selection. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS ’09, 2009.

[14] Tariq Elahi, Kevin Bauer, Mashael AlSabah, Roger Dingledine, and Ian Goldberg. Changing of the Guards: A Framework for Understanding and Improving Entry Guard Selection in Tor. In Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society, WPES ’12, 2012.

[15] Nick Feamster and Roger Dingledine. Location Diversity in Anonymity Networks. In Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society, WPES ’04, 2004.

[16] Lixin Gao and Jennifer Rexford. Stable Internet Routing Without Global Coordination. IEEE/ACM Transactions on Networking, 9(6), 2001.

[17] David M. Goldschlag, Michael G. Reed, and Paul F. Syverson. Hiding Routing Information. In Proceedings of the First International Workshop on Information Hiding, 1996.

[18] Jamie Hayes and George Danezis. k-fingerprinting: A Robust Scalable Website Fingerprinting Technique. In 25th USENIX Security Symposium (USENIX Security 16), 2016.

[19] Andrew Hintz. Fingerprinting Websites Using Traffic Analysis. In Proceedings of the 2nd International Conference on Privacy Enhancing Technologies, 2003.

[20] Rob Jansen, Tavish Vaidya, and Micah Sherr. Point Break: A Study of Bandwidth Denial-of-Service Attacks against Tor. In 28th USENIX Security Symposium, 2019.

[21] Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson.
Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries. In ACM Conference on Computer and Communications Security (CCS), CCS ’13, 2013.

[22] Marc Juarez, Sadia Afroz, Gunes Acar, Claudia Diaz, and Rachel Greenstadt. A Critical Evaluation of Website Fingerprinting Attacks. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, 2014.

[23] Joshua Juen. Protecting anonymity in the presence of Autonomous System and Internet exchange level adversaries. Master’s thesis, University of Illinois at Urbana-Champaign, 2012.

[24] Shuai Li, Huajun Guo, and Nicholas Hopper. Measuring Information Leakage in Website Fingerprinting Attacks and Defenses. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018.

[25] Maxmind GeoLite2 Database. https://dev.maxmind.com/geoip/geoip2/geolite2/.

[26] Steven J. Murdoch and George Danezis. Low-Cost Traffic Analysis of Tor. In Proceedings of the 2005 IEEE Symposium on Security and Privacy, SP ’05, 2005.

[27] Steven J. Murdoch and Piotr Zieliński. Sampled Traffic Analysis by Internet-Exchange-Level Adversaries. In Privacy Enhancing Technologies Symposium (PETS), 2007.

[28] Milad Nasr, Alireza Bahramali, and Amir Houmansadr. DeepCorr: Strong Flow Correlation Attacks on Tor Using Deep Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018.

[29] Milad Nasr, Amir Houmansadr, and Arya Mazumdar. Compressive Traffic Analysis: A New Paradigm for Scalable Traffic Analysis. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, 2017.

[30] Rishab Nithyanand, Oleksii Starov, Adva Zair, Phillipa Gill, and Michael Schapira. Measuring and mitigating AS-level adversaries against Tor. In Symposium on Network and Distributed System Security (NDSS), 2016.

[31] Lasse Overlier and Paul Syverson. Locating Hidden Servers.
In Proceedings of the 2006 IEEE Symposium on Security and Privacy, 2006.

[32] Andriy Panchenko, Fabian Lanze, Jan Pennekamp, Thomas Engel, Andreas Zinnen, Martin Henze, and Klaus Wehrle. Website Fingerprinting at Internet Scale. In Symposium on Network and Distributed System Security (NDSS), 2016.

[33] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website Fingerprinting in Onion Routing Based Anonymization Networks. In Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, WPES ’11, 2011.

[34] Mike Perry. TorFlow: Tor Network Analysis. In HotPETs, 2009.

[35] Vera Rimmer, Davy Preuveneers, Marc Juárez, Tom van Goethem, and Wouter Joosen. Automated Website Fingerprinting through Deep Learning. In Symposium on Network and Distributed System Security (NDSS), 2018.

[36] Florentin Rochet and Olivier Pereira. Waterfilling: Balancing the Tor network with maximum diversity. Proceedings on Privacy Enhancing Technologies, 2017(2), 2017.

[37] Micah Sherr, Matt Blaze, and Boon Thau Loo. Scalable Link-Based Relay Selection for Anonymous Routing. In Proceedings of the 9th International Symposium on Privacy Enhancing Technologies, 2009.

[38] Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018.

[39] Robin Snader and Nikita Borisov. A Tune-up for Tor: Improving Security and Performance in the Tor Network. In Proceedings of 16th Annual Network and Distributed System Security Symposium, 2008.

[40] Yixin Sun, Anne Edmundson, Nick Feamster, Mung Chiang, and Prateek Mittal. Counter-RAPTOR: Safeguarding Tor Against Active Routing Attacks. In IEEE Symposium on Security and Privacy, 2017.

[41] Yixin Sun, Anne Edmundson, Laurent Vanbever, Oscar Li, Jennifer Rexford, Mung Chiang, and Prateek Mittal. RAPTOR: Routing Attacks on Privacy in Tor.
In Proceedings of the 24th USENIX Conference on Security Symposium, SEC’15, 2015.

[42] Paul Syverson, Gene Tsudik, Michael Reed, and Carl Landwehr. Towards an Analysis of Onion Routing Security. In International Workshop on Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Unobservability, 2001.

[43] Team-Cymru. http://www.team-cymru.com.

[44] CollecTor - Tor Project. https://metrics.torproject.org/collector.html.

[45] Tor Directory Protocol. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt.

[46] Tor Guard Specification. https://gitweb.torproject.org/torspec.git/tree/guard-spec.txt.

[47] Tor Metrics Portal. https://metrics.torproject.org/.

[48] Torflow Protocol Specification. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt.

[49] Yves Tillé. An elimination procedure of unequal probability sampling without replacement. Biometrika, 83, 1996.

[50] Thaddeus Vincenty. Direct and Inverse Solutions of Geodesics on the Ellipsoid with Application of Nested Equations. In Survey Review, 1975.

[51] Ryan Wails, Yixin Sun, Aaron Johnson, Mung Chiang, and Prateek Mittal. Tempest: Temporal Dynamics in Anonymity Systems. In Privacy Enhancing Technologies Symposium (PETS), 2018.

[52] Tao Wang, Kevin Bauer, Clara Forero, and Ian Goldberg. Congestion-Aware Path Selection for Tor. In International Conference on Financial Cryptography and Data Security, 2012.

[53] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. Effective Attacks and Provable Defenses for Website Fingerprinting. In USENIX Security Symposium, 2014.

[54] Tao Wang and Ian Goldberg. Improved Website Fingerprinting on Tor. In Proceedings of the 12th ACM Workshop on Workshop on Privacy in the Electronic Society, WPES ’13, 2013.

[55] Tao Wang and Ian Goldberg. On realistically attacking Tor with website fingerprinting. In Privacy Enhancing Technologies Symposium (PETS), 2016.

[56] Tao Wang and Ian Goldberg.
Walkie-Talkie: An Efficient Defense Against Passive Website Fingerprinting Attacks. In 26th USENIX Security Symposium (USENIX Security 17), 2017.

[57] Philipp Winter, Roya Ensafi, Karsten Loesing, and Nick Feamster. Identifying and Characterizing Sybils in the Tor Network. In 25th USENIX Security Symposium (USENIX Security 16), 2016.

[58] Philipp Winter and Stefan Lindskog. Spoiled Onions: Exposing Malicious Tor Exit Relays. In Privacy Enhancing Technologies Symposium (PETS), 2014.

[59] Matthew Wright, Micah Adler, Brian N. Levine, and Clay Shields. Defending Anonymous Communications Against Passive Logging Attacks. In Proceedings of the 2003 IEEE Symposium on Security and Privacy, SP ’03, 2003.

## A Interpreting Consensus Weights

Tor has a non-trivial method for computing consensus weights [34, 45]. While these values are ostensibly in units of KByte/s, they differ substantially from the actual bandwidths that relays report in their descriptors. We observe that the consensus weight is correlated to these self-advertised relay bandwidths, however. Therefore, we can use a linear regression to convert the consensus weights to relay bandwidths. The resulting linear regression ($y = 0.7638x + 2908.2712$) expresses relay bandwidth in units of KBytes/s and has a coefficient of determination of $r^2 = 0.86$. The conversion from weights to actual bandwidth can be found in Table 4.

| Consensus weight | Weight fraction (%) | BW (Mbit/s) |
|---|---|---|
| 2,000 | 0.0108 | 35.5 |
| 3,000 | 0.0162 | 41.6 |
| 7,500 | 0.0404 | 69.1 |
| 10,000 | 0.0539 | 84.4 |
| 30,000 | 0.162 | 206.6 |
| 40,000 | 0.216 | 267.7 |
| 75,000 | 0.404 | 481.5 |
| 150,000 | 0.809 | 939.8 |

Table 4. Conversion from consensus weights to actual bandwidth values.

## B Tille’s Algorithm

Counter-RAPTOR sought to provide a preliminary defense against guard placement attacks by adjusting the resilience of guards using Tille’s algorithm [49].
Since high resilience guards have a higher probability of selection, Counter-RAPTOR instead simulates the process of first choosing a resilience-weighted sampling of size $g \cdot N$ and then choosing uniformly within that sample, where $g \in [0,1]$ indicates the fraction of sampled guards and $N$ is the total number of guards. Adding the simulated sampling step makes the selection distribution more uniform. To simulate this process, each guard’s resilience is adjusted from $r(i)$ to $r'(i)$ using Tille’s algorithm and then a guard is selected using Equation 4. Counter-RAPTOR uses a default value of $g = 0.1$ [40]. The steps of Tille’s algorithm applied to the guards are as follows:

1. For each guard $i$, $r'(i) = \frac{k \cdot r(i)}{\sum_{j \in G} r(j)}$, where $k$ is initially equal to the sample size ($g \cdot N$) and set $G$ initially includes all available guards.
2. For each guard $i$, if $r'(i) > 1$, set $r'(i) = 1$, set $k = k - 1$, and exclude relay $i$ from the set $G$.
3. Repeat the above process until each $r'(i)$ is in $[0,1]$.
4. For each relay $i$, $r'(i) = \frac{r'(i)}{g \cdot N}$.

## C Theorems and Proofs

Theorem 1. Path selection algorithm $A$ is θ-GP-secure if and only if $\rho(A) \le \theta$.

Proof. Assume that $\rho(A) \le \theta$. Consider any adversary strategy $s \in S$ and client location $c \in C$. Observe that $f_A(c, g) \le \mathrm{relCost}(g) \cdot \theta$ by the definition of $\rho$. Moreover, $p_A(c, s) = \sum_{g \in s} f_A(c, g)$, and so $p_A(c, s) \le \theta \cdot \sum_{g \in s} \mathrm{relCost}(g)$. Therefore, $p_A(c, s) / \sum_{g \in s} \mathrm{relCost}(g) \le \theta$. Because $s$ and $c$ were arbitrary, $\sigma(A) \le \theta$.

To prove the other direction of the equivalence, assume that $A$ is θ-GP-secure. For any guard $g$, let the adversary strategy $s$ consist of just that guard. Then $f_A(c, g) = p_A(c, s)$. By the definition of θ-GP-secure, $p_A(c, s) \le \theta \cdot \mathrm{relCost}(g)$. Therefore, $f_A(c, g) / \mathrm{relCost}(g) \le \theta$. Because $g$ and $c$ were arbitrary, $\rho(A) \le \theta$.

Theorem 2. $D$ halts.

Proof. Every loop in Algorithm 1 has a constant number of iterations except for the loop in Lines 6–19.
This loop terminates if the condition in Line 10 applies to no guard in $B$. $B$ contains all guards at the beginning of the loop, and if the loop doesn’t terminate at least one guard is removed. Thus, the loop iterates at most $|G|$ times.

Theorem 3. Let $A$ be any guard selection algorithm and $\theta \ge 1$ be the security parameter. Then using the guard selection distribution $f'_A = D(A, \theta)$ is θ-GP-secure.

Proof. At the beginning of Algorithm 1, each guard is in the set $B$. The assigned probability of $g$ (i.e., $f'_A(g)$) only changes while $g$ is in $B$. $f'_A(g)$ only increases because its assignments occur in Lines 12 and 15, where the former assignment is ensured to be an increase by the fact that $\theta \ge 0$ and the latter is ensured by the fact that always $p \ge 0$. The condition in Line 10 guarantees that if increasing $f'_A(g)$ would violate the θ bound, then instead $f'_A(g)$ is set to meet the θ bound, and $g$ is removed from $B$. Therefore, if $D$ terminates, $f'_A$ is such that every guard satisfies the θ bound. Moreover, every iteration of the loop in Lines 6–19 must occur with a non-empty $B$ because $\theta \ge 1$ guarantees that some guard remains strictly below the θ bound while there is unassigned probability. This fact implies that some positive amount of the unassigned probability $p$ gets assigned during each iteration of that loop, which also ensures that Line 11 executes and thus any probability unassigned during the iteration is assigned to $x$, causing another iteration if necessary. By Theorem 2, $D$ does terminate, and so the output $f'_A$ must be a probability distribution satisfying the θ bound. Thus, by Theorem 1, it is θ-GP-secure to use $f'_A = D(A, \theta)$ as the guard selection distribution.
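Theorem 1 reduces checking θ-GP-security to a per-guard comparison. The sketch below assumes, following the proof, that ρ(A) is the largest ratio of a guard's selection probability to its relative cost over all client locations and guards. The function names and the toy numbers are hypothetical and serve only to illustrate the check; this is not the defense mechanism D itself.

```python
from typing import Dict

# f[c][g]: probability that a client at location c selects guard g.
SelectionDist = Dict[str, Dict[str, float]]


def rho(f: SelectionDist, rel_cost: Dict[str, float]) -> float:
    """Largest per-guard ratio of selection probability to relative cost."""
    return max(
        prob / rel_cost[guard]
        for per_client in f.values()
        for guard, prob in per_client.items()
    )


def is_gp_secure(f: SelectionDist, rel_cost: Dict[str, float],
                 theta: float) -> bool:
    """Theorem 1: A is theta-GP-secure if and only if rho(A) <= theta."""
    return rho(f, rel_cost) <= theta


# Toy example: two client locations, three guards (all values made up).
rel_cost = {"g1": 0.5, "g2": 0.25, "g3": 0.25}
f = {
    "c1": {"g1": 0.5, "g2": 0.25, "g3": 0.25},   # proportional to cost
    "c2": {"g1": 0.25, "g2": 0.5, "g3": 0.25},   # favors g2 by a factor of 2
}

worst_ratio = rho(f, rel_cost)
```

In this toy setting the distribution for client location c2 gives g2 twice its cost share, so the algorithm would be 2-GP-secure but not 1.5-GP-secure.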
## D Cost Model Details

The top 10 ASes in the Tor network are listed in Table 5. These ASes contained relays with the largest total consensus weight. Of the 10 top ASes, 7 provide commercial hosting and make their pricing available online. For these ASes, the sites with that pricing information are indicated in the table.

| Autonomous System | Consensus Weight (%) | Relays | Hosting Prices |
|---|---|---|---|
| OVH SAS (AS16276) | 14.06 | 586 | ovh.com |
| Hetzner Online (AS24940) | 12.57 | 408 | hetzner.com |
| Online SAS (AS12876) | 10.84 | 332 | online.net, scaleway.com |
| JP McQuistan (AS200052) | 3.95 | 48 | N/A |
| Next Layer (AS1764) | 1.94 | 17 | nextlayer.at |
| netcup (AS197540) | 1.86 | 71 | netcup.eu |
| Quintex (AS62744) | 1.78 | 23 | N/A |
| iomart (AS20860) | 1.66 | 20 | N/A |
| myLoc (AS24961) | 1.59 | 38 | myloc.de |
| DigitalOcean (AS14061) | 1.54 | 228 | digitalocean.com |
| Total | 51.78 | 1,771 | - |

Table 5. Top 10 ASes in Tor by consensus weight of hosted relays as of 2019-2-26. When hosting prices were available online, sites with pricing information are indicated.

The exact cost model is given in Table 6. Data was obtained from hosting provider sites 2019-2-26–27. The number of relays indicates the number of relays the product’s bandwidth and cost is split among to achieve the given per-relay bandwidth and cost. Costs are given in USD. Prices given in Euros are converted to USD at a rate of 1.14 USD/Euro. The cost for a given bandwidth $B$ is the cost listed in the table for the smallest bandwidth not smaller than $B$. Note that neighboring bandwidths may appear to have identical costs due to rounding.

| Provider | Product | Number of relays | Bandwidth (Mbps) | Cost (USD/month) |
|---|---|---|---|---|
| Online SAS | Dedicated | 1 | 1,000 | 11.4 |
| Online SAS | Dedicated | 2 | 500 | 5.7 |
| Online SAS | Dedicated | 3 | 333.33 | 4.56 |
| Online SAS | Dedicated | 4 | 250 | 3.42 |
| Online SAS | Dedicated | 5 | 200 | 3.19 |
| Online SAS | Dedicated | 6 | 166.67 | 2.66 |
| Online SAS | Dedicated | 7 | 142.86 | 2.61 |
| Online SAS | Dedicated | 8 | 125 | 2.28 |
| Online SAS | Dedicated | 9 | 111.11 | 2.28 |
| Online SAS | Dedicated | 10 | 100 | 2.05 |
| Online SAS | Dedicated | 12 | 83.33 | 1.9 |
| Online SAS | Dedicated | 14 | 71.43 | 1.79 |
| Online SAS | Dedicated | 16 | 62.5 | 1.71 |
| Online SAS | Dedicated | 18 | 55.56 | 1.65 |
| Online SAS | Cloud 1-XS | 2 | 50 | 1.14 |
| Online SAS | Cloud 1-S | 6 | 33.33 | 1.14 |
| Online SAS | Cloud 1-XS | 4 | 25 | 0.85 |
| Online SAS | Cloud 1-XS | 6 | 16.67 | 0.76 |
| Online SAS | Cloud 1-XS | 8 | 12.5 | 0.71 |
| Online SAS | Cloud 1-XS | 10 | 10 | 0.68 |
| Online SAS | Cloud 1-XS | 12 | 8.33 | 0.66 |
| Online SAS | Cloud 1-XS | 14 | 7.14 | 0.65 |
| Online SAS | Cloud 1-XS | 16 | 6.25 | 0.64 |
| Online SAS | Cloud 1-XS | 18 | 5.56 | 0.63 |
| Online SAS | Cloud 1-XS | 20 | 5 | 0.63 |
| Online SAS | Cloud 1-XS | 22 | 4.55 | 0.62 |
| Online SAS | Cloud 1-XS | 24 | 4.17 | 0.62 |
| Online SAS | Cloud 1-XS | 26 | 3.85 | 0.61 |
| Online SAS | Cloud 1-XS | 28 | 3.57 | 0.61 |
| Online SAS | Cloud 1-XS | 30 | 3.33 | 0.61 |
| Online SAS | Cloud 1-XS | 32 | 3.12 | 0.61 |

Table 6. Cost model derived from hosting prices of top Tor ASes. Product in each case is from the Start line.
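The lookup rule for Table 6 (take the smallest listed bandwidth not smaller than B) can be sketched as a sorted search. The example below also chains it with the Appendix A regression to price a relay of a given consensus weight; that chaining, the function names, and the assumptions that 1 KByte = 1000 bytes and that Mbit/s and Mbps denote the same unit are ours, chosen because they reproduce the Table 4 values.

```python
from bisect import bisect_left

# (per-relay bandwidth in Mbps, per-relay cost in USD/month), copied from Table 6.
TABLE_6 = [
    (1000, 11.4), (500, 5.7), (333.33, 4.56), (250, 3.42), (200, 3.19),
    (166.67, 2.66), (142.86, 2.61), (125, 2.28), (111.11, 2.28), (100, 2.05),
    (83.33, 1.9), (71.43, 1.79), (62.5, 1.71), (55.56, 1.65), (50, 1.14),
    (33.33, 1.14), (25, 0.85), (16.67, 0.76), (12.5, 0.71), (10, 0.68),
    (8.33, 0.66), (7.14, 0.65), (6.25, 0.64), (5.56, 0.63), (5, 0.63),
    (4.55, 0.62), (4.17, 0.62), (3.85, 0.61), (3.57, 0.61), (3.33, 0.61),
    (3.12, 0.61),
]
_ROWS = sorted(TABLE_6)                      # ascending by bandwidth
_BANDWIDTHS = [bw for bw, _ in _ROWS]


def weight_to_mbit(consensus_weight: float) -> float:
    """Appendix A regression: weight -> KBytes/s, then converted to Mbit/s."""
    kbytes_per_s = 0.7638 * consensus_weight + 2908.2712
    return kbytes_per_s * 8 / 1000           # assumes 1 KByte = 1000 bytes


def monthly_cost(bandwidth_mbps: float) -> float:
    """Cost of the smallest Table 6 bandwidth that is not smaller than B."""
    idx = bisect_left(_BANDWIDTHS, bandwidth_mbps)
    if idx == len(_ROWS):
        raise ValueError("bandwidth exceeds the largest entry in Table 6")
    return _ROWS[idx][1]


bw = weight_to_mbit(40_000)                  # about 267.7 Mbit/s, as in Table 4
cost = monthly_cost(bw)                      # falls in the 333.33 Mbps row
```

Under these assumptions, a relay with consensus weight 40,000 needs roughly 267.7 Mbit/s, which rounds up to the 333.33 Mbps row of Table 6.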
... Unfortunately, preferentially choosing circuits in this way also selects relays that are correlated with the identity of the user or their destination [12,14,60]. Many attacks show how this can be exploited to deanonymize Tor users, allowing a passive observer to identify information about user locations [12,14,36,60,61,74,85,86]. ...
... We quantify this advantage using MATOR by applying ShorTor to LASTor [5], a location-biased path-selection proposal. We emphasize that LASTor is not integrated in Tor and has known security flaws [86]-we include it as an illustrative example of a location-aware path selection scheme. ...
... There is additionally a large body of work that alters path selection in Tor for purposes of security [13,15,30,40,48,64,73,82,89]. While important, these works are orthogonal to ShorTor and often result in substantially degraded performance [61,74] without clear security advantages over Tor's current protocol [36,85,86]. ...
Preprint
Full-text available
We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection performed by Tor clients. As such, ShorTor reduces latency while preserving Tor's existing security properties. Specifically, the routes taken in ShorTor are in no way correlated to either the Tor user or their destination, including the geographic location of either party. We analyze the security of ShorTor using the AnoA framework, showing that ShorTor maintains all of Tor's anonymity guarantees. We augment our theoretical claims with an empirical analysis. To evaluate ShorTor's performance, we collect a real-world dataset of over 400,000 latency measurements between the 1,000 most popular Tor relays, which collectively see the vast majority of Tor traffic. With this data, we identify pairs of relays that could benefit from ShorTor: that is, two relays where introducing an additional intermediate network hop results in lower latency than the direct route between them. We use our measurement dataset to simulate the impact on end users by applying ShorTor to two million Tor circuits chosen according to Tor's specification. ShorTor reduces the latency for the 99th percentile of relay pairs in Tor by 148 ms. Similarly, ShorTor reduces the latency of Tor circuits by 122 ms at the 99th percentile. In practice, this translates to ShorTor truncating tail latencies for Tor which has a direct impact on page load times and, consequently, user experience on the Tor browser.
... Furthermore, OnionShare leverages Tor, consequently it is inherently susceptible to a variety of tra c analysis a acks [7,36,45]. A variety of proposals a empt to circumvent the a acks by enhancing the route selection process [3,49,52]. Our work, orthogonal to these, takes another approach: we promote a multiplication of relays while being churn tolerant to e ectively improve anonymity. ...
Preprint
Mass surveillance of the population by state agencies and corporate parties is now a well-known fact. Journalists and whistle-blowers still lack means to circumvent global spying for the sake of their investigations. With Spores, we propose a way for journalists and their sources to plan a posteriori file exchanges when they physically meet. We leverage on the multiplication of personal devices per capita to provide a lightweight, robust and fully anonymous decentralised file transfer protocol between users. Spores hinges on our novel concept of e-squads: one's personal devices, rendered intelligent by gossip communication protocols, can provide private and dependable services to their user. People's e-squads are federated into a novel onion routing network, able to withstand the inherent unreliability of personal appliances while providing reliable routing. Spores' performances are competitive, and its privacy properties of the communication outperform state of the art onion routing strategies.
Chapter
This chapter examines PETs that limit exposure by hiding the user’s identity information. As examples of this category, the following PETs are described: mix networks; anonymous remailers; and onion routing networks. For each of these examples, the original scheme is given, enhancements made over the years are presented, and strengths and limitations of the technology are discussed.
Chapter
Tor provides anonymity to millions of users around the globe, which has made it a valuable target for malicious actors. As a low-latency anonymity system, it is vulnerable to traffic correlation attacks from strong passive adversaries, such as large autonomous systems. Estimations of the risk posed by such attackers as well as the evaluation of defense strategies are mostly based on simulations and data retrieved from BGP updates. However, this might only provide an incomplete view of the network and thereby influence the results of such analyses. It has already been acknowledged in previous studies that direct path measurements, e.g. with traceroute, could provide valuable information. But in the past, such measurements were thought to be impossible, because they require the placement of measurement nodes in the same ASes as the respective Tor network nodes. With the rise of new technologies and methodologies, this assumption needs to be re-evaluated. In this paper we present a novel methodology to utilize the RIPE Atlas framework, a network of more than 10,000 probes worldwide, to actively perform traceroute commands from and to Tor guard and exit relays to clients and destinations. Based on multiple global scans our results validate previous results and show the large influence on Tor posed by a limited set of ASes. These are in a strong position to carry out effective correlation attacks on Tor traffic. With this work, we provide an additional source of information that can be used together with BGP route information to increase the accuracy of future models and simulations of Tor and ultimately improve anonymity on the Internet.
Chapter
Tor is the most popular anonymization system with millions of daily users and, thus, an attractive target for attacks, e.g., by malicious autonomous systems (ASs) performing active routing attacks to become a man in the middle and deanonymize users. It was shown that the number of such malicious ASs is significantly larger than previously expected due to the lack of security guarantees in the Border Gateway Protocol (BGP). In response, recent works suggest alternative Tor path selection methods preferring Tor nodes with higher resilience to active BGP attacks. In this work, we analyze the implications of such proposals. We show that Counter-RAPTOR and DPSelect are not as secure as previously thought: for particular users, they leak the user’s location. DPSelect is also less resilient than widely accepted: we show that it achieves only one third of its originally claimed resilience and, hence, does not protect users from routing attacks. We reveal the performance implications of both methods and identify scenarios where their usage leads to significant performance bottlenecks. Finally, we propose a new metric to quantify the user’s location leakage by path selection. Using this metric and performing large-scale analysis, we show to what extent a malicious middle can fingerprint the user’s location and what kind of confidence it can achieve. Our findings shed light on the implications of path selection methods on the users’ anonymity and the need for further research.
Preprint
Full-text available
Flow correlation is the core technique used in a multitude of deanonymization attacks on Tor. Despite the importance of flow correlation attacks on Tor, existing flow correlation techniques are considered to be ineffective and unreliable in linking Tor flows when applied at a large scale, i.e., they incur high false positive rates or require impractically long flow observations to be able to make reliable correlations. In this paper, we show that, unfortunately, flow correlation attacks can be conducted on Tor traffic with drastically higher accuracies than before by leveraging emerging learning mechanisms. We particularly design a system, called DeepCorr, that outperforms the state-of-the-art by significant margins in correlating Tor connections. DeepCorr leverages an advanced deep learning architecture to learn a flow correlation function tailored to Tor's complex network; this is in contrast to previous works' use of generic statistical correlation metrics to correlate Tor flows. We show that with moderate learning, DeepCorr can correlate Tor connections (and therefore break Tor's anonymity) with accuracies significantly higher than existing algorithms, and using substantially shorter lengths of flow observations. For instance, by collecting only about 900 packets of each target Tor flow (roughly 900 KB of Tor data), DeepCorr provides a flow correlation accuracy of 96% compared to 4% by the state-of-the-art system of RAPTOR using the same exact setting. We hope that our work demonstrates the escalating threat of flow correlation attacks on Tor given recent advances in learning algorithms, calling for the timely deployment of effective countermeasures by the Tor community.
Article
Full-text available
Many recent proposals for anonymous communication omit from their security analyses a consideration of the effects of time on important system components. In practice, many components of anonymity systems, such as the client location and network structure, exhibit changes and patterns over time. In this paper, we focus on the effect of such temporal dynamics on the security of anonymity networks. We present Tempest, a suite of novel attacks based on (1) client mobility, (2) usage patterns, and (3) changes in the underlying network routing. Using experimental analysis on real-world datasets, we demonstrate that these temporal attacks degrade user privacy across a wide range of anonymity networks, including deployed systems such as Tor; path-selection protocols for Tor such as DeNASA, TAPS, and Counter-RAPTOR; and network-layer anonymity protocols for Internet routing such as Dovetail and HORNET. The degradation is in some cases surprisingly severe. For example, a single host failure or network route change could quickly and with high certainty identify the client's ISP to a malicious host or ISP. The adversary behind each attack is relatively weak: generally passive and in control of one network location or a small number of hosts. Our findings suggest that designers of anonymity systems should rigorously consider the impact of temporal dynamics when analyzing anonymity.
Conference Paper
Full-text available
Several studies have shown that the network traffic that is generated by a visit to a website over Tor reveals information specific to the website through the timing and sizes of network packets. By capturing traffic traces between users and their Tor entry guard, a network eavesdropper can leverage this meta-data to reveal which website Tor users are visiting. The success of such attacks heavily depends on the particular set of traffic features that are used to construct the fingerprint. Typically, these features are manually engineered and, as such, any change introduced to the Tor network can render these carefully constructed features ineffective. In this paper, we show that an adversary can automate the feature engineering process, and thus automatically deanonymize Tor traffic by applying our novel method based on deep learning. We collect a dataset comprised of more than three million network traces, which is the largest dataset of web traffic ever used for website fingerprinting, and find that the performance achieved by our deep learning approaches is comparable to known methods which include various research efforts spanning over multiple years. The obtained success rate exceeds 96% for a closed world of 100 websites and 94% for our biggest closed world of 900 classes. In our open world evaluation, the most performant deep learning model is 2% more accurate than the state-of-the-art attack. Furthermore, we show that the implicit features automatically learned by our approach are far more resilient to dynamic changes of web content over time. We conclude that the ability to automatically construct the most relevant traffic features and perform accurate traffic recognition makes our deep learning based approach an efficient, flexible and robust technique for website fingerprinting.
Article
Full-text available
We present the Waterfilling circuit selection method, which we designed in order to mitigate the risks of a successful end-to-end traffic correlation attack. Waterfilling proceeds by balancing the Tor network load as evenly as possible on endpoints of user paths. We simulate the use of Waterfilling using the TorPS and Shadow tools. Applying several security metrics, we show that the adoption of Waterfilling considerably increases the number of nodes that an adversary needs to control in order to be able to mount a successful attack. Moreover, we evaluate Waterfilling in Shadow and show that it does not significantly impact the performance of the network. Furthermore, Waterfilling reduces the benefits that an attacker could obtain by hacking into a top bandwidth Tor relay, hence limiting the risks raised by such relays.
Conference Paper
Full-text available
The website fingerprinting attack aims to identify the content (i.e., a webpage accessed by a client) of encrypted and anonymized connections by observing patterns of data flows such as packet size and direction. This attack can be performed by a local passive eavesdropper – one of the weakest adversaries in the attacker model of anonymization networks such as Tor. In this paper, we present a novel website fingerprinting attack. Based on a simple and comprehensible idea, our approach outperforms all state-of-the-art methods in terms of classification accuracy while being dramatically more efficient computationally. In order to evaluate the severity of the website fingerprinting attack in reality, we collected the most representative dataset that has ever been built, where we avoid simplified assumptions made in the related work regarding selection and type of webpages and the size of the universe. Using this data, we explore the practical limits of website fingerprinting at Internet scale. Although our novel approach is orders of magnitude more efficient computationally and superior in terms of detection accuracy, for the first time we show that no existing method – including our own – scales when applied in realistic settings. With our analysis, we explore neglected aspects of the attack and investigate the realistic probability of success for different strategies a real-world adversary may follow.
Conference Paper
Tor provides low-latency anonymous and uncensored network access against a local or network adversary. Due to the design choice to minimize traffic overhead (and increase the pool of potential users) Tor allows some information about the client's connections to leak. Attacks using (features extracted from) this information to infer the website a user visits are called Website Fingerprinting (WF) attacks. We develop a methodology and tools to measure the amount of leaked information about a website. We apply this tool to a comprehensive set of features extracted from a large set of websites and WF defense mechanisms, allowing us to make more fine-grained observations about WF attacks and defenses.
Conference Paper
Website fingerprinting enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art website fingerprinting attacks have been shown to be effective even against Tor. Recently, lightweight website fingerprinting defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we present Deep Fingerprinting (DF), a new website fingerprinting attack against Tor that leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, better than all prior attacks, and it is also the only attack that is effective against WTF-PAD with over 90% accuracy. Walkie-Talkie remains effective, holding the attack to just 49.7% accuracy. In the more realistic open-world setting, our attack remains effective, with 0.99 precision and 0.94 recall on undefended traffic. Against traffic defended with WTF-PAD in this setting, the attack still can get 0.96 precision and 0.68 recall. These findings highlight the need for effective defenses that protect against this new attack and that could be deployed in Tor.
Conference Paper
Traffic analysis is the practice of inferring sensitive information from communication patterns, particularly packet timings and packet sizes. Traffic analysis is increasingly becoming relevant to security and privacy with the growing use of encryption and other evasion techniques that render content-based analysis of network traffic impossible. The literature has investigated traffic analysis for various application scenarios, from tracking stepping stone cybercriminals to compromising anonymity systems. The major challenge to existing traffic analysis mechanisms is scaling to today's exploding volumes of network traffic, i.e., they impose high storage, communications, and computation overheads. In this paper, we aim at addressing this scalability issue by introducing a new direction for traffic analysis, which we call compressive traffic analysis. The core idea of compressive traffic analysis is to compress traffic features, and perform traffic analysis operations on such compressed features instead of on raw traffic features (thereby reducing the storage, communications, and computation overheads of traffic analysis, since fewer features are used). To compress traffic features, compressive traffic analysis leverages linear projection algorithms from compressed sensing, an active area within signal processing. We show that these algorithms offer unique properties that enable compressing network traffic features while preserving the performance of traffic analysis compared to traditional mechanisms. We introduce the idea of compressive traffic analysis as a new generic framework for scalable traffic analysis. We then apply compressive traffic analysis to two widely studied classes of traffic analysis, namely, flow correlation and website fingerprinting. We show that the compressive versions of state-of-the-art flow correlation and website fingerprinting schemes significantly outperform their non-compressive (traditional) alternatives, e.g., the compressive version of the flow correlation scheme of Houmansadr et al. [44] is two orders of magnitude faster, and the compressive version of the fingerprinting system of Wang et al. [77] runs about 13 times faster. We believe that our study is a major step towards scaling traffic analysis.