This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE/ACM TRANSACTIONS ON NETWORKING 1
A New Cell-Counting-Based Attack Against Tor
Zhen Ling, Junzhou Luo, Member, IEEE, Wei Yu, Xinwen Fu, Dong Xuan, and Weijia Jia
Abstract—Various low-latency anonymous communication systems
such as Tor and Anonymizer have been designed to provide
anonymity service for users. In order to hide the communication
of users, most of the anonymity systems pack the application data
into equal-sized cells (e.g., 512 B for Tor, a known real-world,
circuit-based, low-latency anonymous communication network). Via
extensive experiments on Tor, we found that the size of IP packets
in the Tor network can be very dynamic because a cell is an
application concept and the IP layer may repack cells. Based on this
finding, we investigate a new cell-counting-based attack against
Tor, which allows the attacker to confirm anonymous communication
relationship among users very quickly. In this attack, by
marginally varying the number of cells in the target traffic at the
malicious exit onion router, the attacker can embed a secret signal
into the variation of cell counter of the target traffic. The embedded
signal will be carried along with the target traffic and arrive at the
malicious entry onion router. Then, an accomplice of the attacker
at the malicious entry onion router will detect the embedded signal
based on the received cells and confirm the communication
relationship among users. We have implemented this attack against
Tor, and our experimental data validate its feasibility and
effectiveness. There are several unique features of this attack. First, this
attack is highly efficient and can confirm very short communication
sessions with only tens of cells. Second, this attack is effective,
and its detection rate approaches 100% with a very low false positive
rate. Third, it is possible to implement the attack in a way that
Manuscript received May 29, 2011; accepted November 05, 2011; approved
by IEEE/ACM TRANSACTIONS ON NETWORKING Editor M. Allman. This work
was supported in part by the National Key Basic Research Program of China
(973 Program) under Grants 2010CB328104 and 2011CB302800; the National
Science Foundation of China (NSFC) under Grants 60903162, 60903161,
61070158, 61070161, 61003257, 61070221, and 61070222/F020802; the US
National Science Foundation (NSF) under Grants CNS0916584, CNS1065136,
and CNS-1117175; CityU Applied R&D Funding (ARD) under Grants
9681001, 6351006, and 9667052; CityU Strategic Research Grant 7008110;
ShenZhen-HK Innovation Cycle Grant ZYB200907080078A; the China Spe-
cialized Research Fund for the Doctoral Program of Higher Education under
Grant 200802860031; Jiangsu Provincial Natural Science Foundation of China
under Grant BK2008030; Jiangsu Provincial Key Laboratory of Network and
Information Security under Grant BM2003201; and the Key Laboratory of
Computer Network and Information Integration of Ministry of Education of
China under Grant 93K-9. Any opinions, findings, conclusions, and
recommendations in this paper are those of the authors and do not necessarily reflect
the views of the funding agencies. The conference version of this paper was
published in the Proceedings of the 16th ACM Conference on Computer and
Communications Security (CCS), Chicago, IL, November 9–13, 2009.
Z. Ling and J. Luo are with the School of Computer Science and Engineering,
Southeast University, Nanjing 210096, China (e-mail: zhenling@seu.edu.cn;
jluo@seu.edu.cn).
W. Yu is with the Department of Computer and Information Sciences, Towson
University, Towson, MD 21252 USA (e-mail: wyu@towson.edu).
X. Fu is with the Department of Computer Science, University of Massachu-
setts Lowell, Lowell, MA 01854 (e-mail: xinwenfu@cs.uml.edu).
D. Xuan is with the Department of Computer Science and Engineering, The
Ohio State University, Columbus, OH 43210 USA (e-mail: xuan@cse.ohio-
state.edu).
W. Jia is with the Department of Computer Science, City University of Hong
Kong, Kowloon, Hong Kong (e-mail: wei.jia@cityu.edu.hk).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNET.2011.2178036
appears to be very difficult for honest participants to detect (e.g.,
using our hopping-based signal embedding).
Index Terms—Anonymity, cell counting, mix networks, signal,
Tor.
I. INTRODUCTION
CONCERNS about privacy and security have received
greater attention with the rapid growth and public acceptance
of the Internet, which has been used to create our global
E-economy. Anonymity has become a necessary and legitimate
aim in many applications, including anonymous Web browsing,
location-based services (LBSs), and E-voting. In these applications,
encryption alone cannot maintain the anonymity required
by participants [1]–[3]. In the past, researchers have developed
numerous anonymous communication systems. Generally
speaking, mix techniques can be used for either message-based
(high-latency) or flow-based (low-latency) anonymity applications.
E-mail is a typical message-based anonymity application,
which has been thoroughly investigated [4]. Research on
flow-based anonymity applications has recently received
great attention in order to preserve anonymity in low-latency
applications, including Web browsing and peer-to-peer file
sharing [5], [6].
To degrade the anonymity service provided by anonymous
communication systems, traffic analysis attacks have been
studied [3], [7]–[14]. Existing traffic analysis attacks can be
categorized into two groups: passive traffic analysis and active
watermarking techniques. The passive traffic analysis technique
records the traffic passively and identifies the similarity between
the sender's outbound traffic and the receiver's inbound
traffic based on statistical measures [7]–[9], [15], [16]. Because
this type of attack relies on correlating the timings of messages
moving through the anonymous system and does not change the
traffic characteristics, it is also a passive timing attack. For example,
Serjantov et al. [7] proposed a passive packet-counting
scheme to observe the number of packets of a connection that
arrive at a mix node and leave that node. However, they did
not elaborate on how packet counting could be done. To improve
the accuracy of attacks, the active watermarking technique has
recently received much attention. The idea of this technique is
to actively introduce special signals (or marks) into the sender's
outbound traffic with the intention of recognizing the embedded
signal in the receiver's inbound traffic [13], [14], [17].
In this paper, we focus on the active watermarking technique,
which has been actively studied in the past few years. For example,
Yu et al. [13] proposed a flow-marking scheme based on the
direct sequence spread spectrum (DSSS) technique by utilizing
a pseudo-noise (PN) code. By interfering with the rate of a
suspect sender's traffic and marginally changing the traffic rate,
the attacker can embed a secret spread-spectrum signal into the
target traffic. The embedded signal is carried along with the
1063-6692/$26.00 © 2011 IEEE
target traffic from the sender to the receiver, so the investigator
can recognize the corresponding communication relationship,
tracing the messages despite the use of anonymous networks.
However, in order to accurately confirm the anonymous communication
relationship of users, the flow-marking scheme
needs to embed a signal modulated by a relatively long length
of PN code, and also the signal is embedded into the traffic
flow rate variation. Houmansadr et al. [14] proposed a nonblind
network flow watermarking scheme called RAINBOW for stepping
stone detection. Their approach records the traffic timing
of the incoming flows and correlates them with the outgoing
flows. This approach also embeds watermarks into the traffic
by actively delaying some packets. The watermark detection
problem was formalized as detecting a known spread-spectrum
signal with noise caused by network dynamics. Normalized
correlation is used as the detection scheme. Their approach can
classify a typical SSH connection as a stepping stone connection
in 3 min. As we can see, it is hard for the flow-marking
technique to deal with the short communication sessions that
may only last for a few seconds.
A successful attack against anonymous communication
systems relies on accuracy, efficiency, and detectability of
active watermarking techniques. Detectability refers to the
difficulty of detecting the embedded signal by anyone other
than the attackers. Efficiency refers to the quickness of confirming
anonymous communication relationships among users.
Although accuracy and/or detectability have received great attention
[13], [14], [17], to the best of our knowledge, no existing
work can meet all three of these requirements simultaneously.
In this paper, we investigate a new cell-counting-based attack
against Tor, a real-world, circuit-based low-latency anonymous
communication network. This attack is a novel variation
of the standard timing attack. It can confirm anonymous communication
relationship among users accurately and quickly and
is difficult to detect. In this attack, the attacker at the malicious
exit router detects the data transmitted to a suspicious destination
(e.g., server Bob). The attacker then determines whether
the data is a relay cell or a control cell in Tor. After excluding
the control cells, the attacker manipulates the number of relay
cells in the circuit queue and flushes out all cells in the circuit
queue. In this way, the attacker can embed a signal (a series
of "1" or "0" bits) into the variation of the cell count during a
short period in the target traffic. An accomplice of the attacker
at the entry onion router detects and excludes the control cells,
records the number of relay cells in the circuit queue, and recovers
the embedded signal. The signal embedded in the target
traffic might be distorted because the cells carrying the different
bits (units) of the original signal might be combined or separated
at middle onion routers. To address this problem, we develop
the recovery algorithms to accurately recognize the embedded
signal. Our theoretical analysis shows that the detection rate is a
monotonically increasing function of the delay interval
and a monotonically decreasing function of the variance
of the one-way transmission delay along a circuit. In our real-world
experiments, the experimental results match the theoretical results
well. To be specific, our attack needs only 2 s to achieve a
true positive rate of almost 100% and a false positive rate of
almost 0%.
We have implemented the cell-counting-based attack against
Tor and performed a set of real-world Internet experiments to
validate the feasibility and effectiveness of the attack. The attack
presented in this paper is one of the first to exploit the implementation
of known anonymous communication systems such
as Tor by exploiting its fundamental protocol design. There are
several unique features of this attack. First, this attack is highly
efficient and can quickly confirm very short anonymous communication
sessions with tens of cells. Second, this attack is effective,
and its detection rate approaches 100% with a very low
false positive rate. Third, the short and secret signal makes it difficult
for others to detect the presence of the embedded signal.
Our time-hopping-based signal embedding technique makes the
attack even harder to detect. The attack poses a significant threat
to the anonymity provided by Tor because the attack can confirm
over half of communication sessions by injecting around
10% malicious onion routers on Tor [18], [19].
Fig. 1. Tor network.
The remainder of this paper is organized as follows: We introduce
the background in Section II. We present the cell-counting-based
attack, including the basic idea, issues of the attack, and
solutions, in Section III. In Section IV, we discuss various issues,
including some extensions, and the detectability and impact
of the proposed attack. In Section V, we analyze the effectiveness
of the attack. In Section VI, we show experimental
results on Tor and validate our findings. We review related work
in Section VII and conclude this paper in Section VIII.
II. BACKGROUND
In this section, we first give an overview of the components of Tor.
We then present the procedures for creating circuits,
transmitting data in Tor, and processing cells at onion routers.
A. Components of Tor
Tor is a popular overlay network for providing anonymous
communication over the Internet. It is an open-source project
and provides anonymity service for TCP applications [20]. As
shown in Fig. 1, there are four basic components in Tor.
1) Alice (i.e., Client): The client runs local software called the
onion proxy (OP) to anonymize the client data into Tor.
2) Bob (i.e., Server): It runs TCP applications such as a Web
service.
3) Onion routers (ORs): Onion routers are special proxies that
relay the application data between Alice and Bob. In Tor,
transport-layer security (TLS) connections are used for the
overlay link encryption between two onion routers. The
application data is packed into equal-sized cells (512 B as
shown in Fig. 2) carried through TLS connections.
4) Directory servers: They hold onion router information
such as public keys for onion routers. Directory authorities
hold authoritative information on onion routers, and
directory caches download directory information of onion
routers from authorities. A list of directory authorities is
hard-coded into the Tor source code for a client to download
the information of onion routers and build circuits
through the Tor network.
Fig. 2. Cell format by Tor. (a) Tor cell format. (b) Tor relay cell format.
Fig. 2 illustrates the cell format used by Tor. All cells have
a 3-B header, which is not encrypted in the onion-like fashion
so that the intermediate Tor routers can see this header. The
other 509 B are encrypted in the onion-like fashion. There are
two types of cells: the control cell shown in Fig. 2(a) and the relay
cell shown in Fig. 2(b). The command field (Command) of
a control cell can be: CELL_PADDING, used for keepalive
and optionally usable for link padding, although not used
currently; CELL_CREATE or CELL_CREATED, used for
setting up a new circuit; and CELL_DESTROY, used for releasing
a circuit. The command field (Command) of a relay
cell is CELL_RELAY. Note that relay cells are used to carry
TCP stream data from Alice to Bob. The relay cell has an
additional header, namely the relay header. There are numerous
types of relay commands (Relay Command), including
RELAY_COMMAND_BEGIN, RELAY_COMMAND_DATA,
RELAY_COMMAND_END, RELAY_COMMAND_SENDME,
RELAY_COMMAND_EXTEND, RELAY_COMMAND_DROP,
and RELAY_COMMAND_RESOLVE. All of these are defined
in or.h in the released Tor source code package.
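The cell layout described above can be illustrated with a short parsing sketch. This is only a sketch under stated assumptions: the text gives a 3-B header and a 509-B body, and we assume the header splits into a 2-B circuit ID followed by a 1-B command, with the numeric command codes taken from the Tor protocol specification. The function names parse_cell and is_relay_cell are hypothetical and do not appear in the Tor source.

```python
import struct

CELL_LEN = 512    # fixed cell size stated in the text
HEADER_LEN = 3    # unencrypted header; assumed here: 2-B circuit ID + 1-B command

# Command codes as assigned in the Tor protocol specification.
CELL_PADDING, CELL_CREATE, CELL_CREATED, CELL_RELAY, CELL_DESTROY = 0, 1, 2, 3, 4


def parse_cell(cell: bytes):
    """Split one fixed-size cell into (circuit ID, command, onion-encrypted body)."""
    if len(cell) != CELL_LEN:
        raise ValueError("a cell must be exactly %d bytes" % CELL_LEN)
    circ_id, command = struct.unpack(">HB", cell[:HEADER_LEN])
    return circ_id, command, cell[HEADER_LEN:]


def is_relay_cell(cell: bytes) -> bool:
    """True for CELL_RELAY, False for control cells such as CELL_CREATE."""
    return parse_cell(cell)[1] == CELL_RELAY
```

Because the header is not onion-encrypted, a check of this kind is available to every router on the circuit, which is what lets the attacker separate relay cells from control cells.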
B. Circuit Creation and Data Transmission
In Tor, an OR maintains a TLS connection to other ORs or
OPs on demand. The OP uses source routing and
chooses several ORs (preferably ones with high bandwidth and
high uptime) from the locally cached directory, downloaded
from the directory caches. The number of the selected ORs is
referred to as the path length. We use the default path length of three
as an example. The OP iteratively establishes circuits across the
Tor network and negotiates a symmetric key with each OR, one
hop at a time, as well as handles the TCP streams from client
applications. The OR on the other side of the circuit connects to
the requested destinations and relays the data.
We now illustrate the procedure by which the OP establishes a circuit
and downloads a file from the server. OP first sets up a
TLS connection with OR1 using the TLS protocol. Then, tunneling
through this connection, OP sends a CELL_CREATE cell
and uses the Diffie–Hellman (DH) handshake protocol to negotiate
a base key $K_1 = g^{xy}$ with OR1, which responds with a
CELL_CREATED cell. From this base key material, a forward
symmetric key $k_{f1}$ and a backward symmetric key $k_{b1}$ are
produced [21]. In this way, a 1-hop circuit C1 is created. Similarly,
OP extends the circuit to a 2-hop circuit and 3-hop circuit.
After the circuit is set up between the OP and OR3, OP sends a
RELAY_COMMAND_BEGIN cell to the exit onion router, and
the cell is encrypted as
$\{\{\{\text{Begin}\langle \text{IP}, \text{Port}\rangle\}_{k_{f3}}\}_{k_{f2}}\}_{k_{f1}}$,
where the subscript refers to the key used for encryption of one
onion skin. The three layers of onion skin are removed one by
one each time the cell traverses an onion router through the circuit.
When OR3 removes the last onion skin by decryption, it
recognizes that the request intends to open a TCP stream to a
port at the destination IP, which belongs to Bob. Therefore, OR3
acts as a proxy, sets up a TCP connection with Bob, and sends
a RELAY_COMMAND_CONNECTED cell back to Alice's OP.
Then, Alice can download the file.
Fig. 3. Processing the cells at onion routers.
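The layered "onion skin" idea can be sketched with a toy example. The sketch below is not Tor's cipher: it uses a keystream built from SHA-256 purely as a dependency-free stand-in, and the key values, payload string, and function names (keystream, xor_layer, onion_wrap) are hypothetical. It only shows that the OP applies one layer per hop and that each onion router removes exactly one layer.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (a stand-in, not Tor's cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one onion skin; XOR is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def onion_wrap(payload: bytes, hop_keys) -> bytes:
    """Wrap the payload for a circuit; hop_keys are ordered entry to exit.

    The exit hop's layer is applied first (innermost) and the entry hop's
    layer last (outermost), so the entry router peels the first skin.
    """
    for key in reversed(hop_keys):
        payload = xor_layer(key, payload)
    return payload
```

A usage example under the same assumptions: with hop keys [b"kf1", b"kf2", b"kf3"], calling xor_layer with each key in circuit order on the wrapped cell recovers the payload only after the third hop. Since the layers in this toy are XOR keystreams, they commute; the ordering in onion_wrap mirrors the notation in the text rather than a cryptographic necessity of the sketch.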
C. Processing Cells at Onion Routers
Fig. 3 illustrates the procedure of processing cells at
onion routers. Note that the cells mentioned below are all
CELL_RELAY_DATA cells, which are used to carry end-to-end
stream data between Alice and Bob. To begin with, the onion
router receives the TCP data from the connection on the given
port A. After the data is processed by TCP and TLS protocols,
the data will be delivered into the TLS buffer of the connection.
When there is pending data in the TLS buffer, the read event of
this connection will be called to read and process the data. The
connection read event will pull the data from the TLS buffer
into the connection input buffer. Each connection input buffer
is implemented as a linked list with small chunks. The data is
fetched from the head of the list and added to the tail. After
the data in the TLS buffer is pulled into the connection input
buffer, the connection read event will process the cells from the
connection input buffer one by one. As stated earlier, the cell
size is 512 B. Thus, 512-B data will be pulled out from the input
buffer every time until the data remaining in the connection
input buffer is smaller than 512 B. Since each onion router has
a routing table that maintains the map from source connection
and circuit ID to destination connection and circuit ID, the read
event can determine whether the transmission direction of the cell
is forward or backward. Then, the corresponding
symmetric key is used to decrypt/encrypt the payload
of the cell, the present circuit ID is replaced with the destination
circuit ID, and the cell is appended to the destination circuit queue.
If it is the first cell added to this circuit queue, the circuit will
be made active by being added into a doubly linked ring of
circuits with queued cells waiting for room to free up on the
output buffer of the destination connection. Then, if there is no
data waiting in the output buffer for the destination connection,
the cell will be written into the output buffer directly, and
then the write event of this circuit is added to the event queue.
Subsequent incoming cells are queued in the circuit queue.
When the write event of the circuit is called, the data in the
output buffer is flushed to the TLS buffer of the destination connection.
Then, the write event will pull as many cells as possible
from the circuit queue of the currently active circuit to the output
buffer and add the write event of this circuit to the event queue.
The next write event can carry on flushing data to the output
buffer and pull the cells to the output buffer. In other words, the
cells queued in the circuit queue can be delivered to the network
via port B by calling the write event twice.
Fig. 4. Packet sequence versus packet size.
Fig. 5. Number of packets versus packet size.
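The interplay of circuit queue, output buffer, and write event can be sketched as a small simulation. This is a deliberately simplified model, assuming a single circuit, no TLS or TCP layer, and a write event that always pulls every queued cell; the class name MiddleRouter and its attributes are hypothetical and are not taken from the Tor source.

```python
from collections import deque


class MiddleRouter:
    """Simplified single-circuit model of the cell path described above."""

    def __init__(self):
        self.circuit_queue = deque()   # cells waiting for room in the output buffer
        self.output_buffer = []        # cells waiting for the next write event
        self.sent = []                 # chunks of cells delivered to the network

    def receive(self, cell):
        # A cell goes straight to the output buffer only if nothing is
        # waiting there; otherwise it is queued in the circuit queue.
        if not self.output_buffer:
            self.output_buffer.append(cell)
        else:
            self.circuit_queue.append(cell)

    def write_event(self):
        # First flush the output buffer to the network as one chunk ...
        if self.output_buffer:
            self.sent.append(self.output_buffer)
            self.output_buffer = []
        # ... then pull all queued cells into the output buffer.
        while self.circuit_queue:
            self.output_buffer.append(self.circuit_queue.popleft())


router = MiddleRouter()
for cell in ("c1", "c2", "c3"):      # three cells arrive back to back
    router.receive(cell)
router.write_event()                 # sends c1, pulls c2 and c3
router.write_event()                 # sends c2 and c3 together
chunk_sizes = [len(chunk) for chunk in router.sent]   # [1, 2]
```

In this model, three cells that arrive together leave as a chunk of one cell followed by a chunk of two, which matches the statement that queued cells reach the network only after the write event has been called twice.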
III. CELL-COUNTING-BASED ATTACK
In this section, we first show that the size of IP packets in
the Tor network is very dynamic. Based on this finding, we then
introduce the basic idea of the cell-counting-based attack,
list some challenging issues related to the attack, and present
solutions to resolve those issues.
A. Dynamic IP Packet Size of Traffic Over Tor
In Tor, the application data will be packed into equal-sized
cells (e.g., 512 B). Nonetheless, via extensive experiments over
the Tor network, we found that the size of IP packets transmitted
over Tor is dynamic. Fig. 4 shows the size of received IP packets
at the client over time, and Fig. 5 shows the frequency of the IP
packet size. It can be observed that the size of packets from the
sender to the receiver is random over time, and a large number of
packets have varied sizes, other than the cell size or maximum
transmission unit (MTU) size.
These observations can be explained as follows.
1) The varied performance of onion routers may cause cells
not to be promptly processed. According to cell processing
in Fig. 3, if an onion router is overloaded, unprocessed cells
will be queued. Therefore, cells will be merged at the IP
layer and sent out together. Those merged cells may be split
into multiple MTU-sized packets and one non-MTU-sized
packet.
2) Tor network dynamics may incur those non-MTU-sized IP
packets as well. If the network between onion routers is
congested, cells will not be delivered on time. When this
happens, cells will merge, and non-MTU-sized IP packets
will show up.
Fig. 6. Cell-counting-based attack.
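The packet-size effect can be made concrete with a back-of-the-envelope sketch. It assumes a maximum segment payload of 1460 B (a 1500-B MTU minus 40 B of TCP/IP headers) and ignores TLS record overhead entirely, so the numbers are illustrative rather than what would be seen on a real Tor link. The function name packet_sizes is hypothetical.

```python
CELL_LEN = 512   # cell size stated in the text


def packet_sizes(num_cells, mss=1460):
    """Payload sizes of the packets needed to carry num_cells merged cells.

    Assumes the merged cells are sent as one byte stream that is cut into
    full-size segments of `mss` bytes plus one smaller remainder segment.
    """
    total = num_cells * CELL_LEN
    full, rest = divmod(total, mss)
    return [mss] * full + ([rest] if rest else [])


single = packet_sizes(1)   # [512]
merged = packet_sizes(4)   # [1460, 588]
```

Under these assumptions, four merged cells no longer appear as four 512-B payloads but as one full-size segment plus a 588-B remainder, which is the kind of non-cell-sized, non-MTU-sized packet described in the two items above.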
B. Basic Idea of Cell-Counting-Based Attack
As we stated above, the packet size observed at the client
shows a high probability to be random because of the performance
of onion routers and Internet traffic dynamics. Motivated
by this finding, we investigate a new cell-counting-based attack
against Tor, which allows the attacker to confirm anonymous
communication relationship among users very quickly. In addition,
it will be hard for the client to detect our developed attack
described in what follows.
As we mentioned before, this attack intends to confirm
that Alice (client) communicates with Bob (server) over Tor.
In order to do so, we assume that the attacker controls a
small percentage of exit and entry onion routers by donating
computers to Tor. This assumption is also used in other
studies [3], [10], [18], [19]. The assumption is valid since Tor
is operated in a voluntary manner [21]. For example, attackers
may purchase Amazon EC2 virtual machines, which can be
put into Tor. The attack can be initiated at either the malicious
entry onion router or exit onion router, depending on the interest of
the attacker. In the rest of the paper, we assume that the attack
is initiated at an exit onion router connected to server Bob
and intends to confirm that Alice communicates with a known
server Bob.
The basic idea is as follows. An attacker at the exit onion
router first selects the target traffic flow between Alice and Bob.
The attacker then selects a random signal (e.g., a sequence of
binary bits), chooses an appropriate time, and changes the cell
count of target traffic based on the selected random signal. In
this way, the attacker is able to embed a signal into the target
traffic from Bob. The signal will be carried along with the target
traffic to the entry onion router connecting to Alice. An accomplice
of the attacker at the entry onion router will record the variation
of the received cells and recognize the embedded signal.
If the same pattern of the signal is recognized, the attacker confirms
the communication relationship between Alice and Bob.
As shown in Fig. 6, the workflow of the cell-counting-based
attack is illustrated as follows.
Step 1: Selecting the Target: At a malicious exit onion
router connected to the server Bob, the attacker will log
the information, including server Bob's host IP address and
port used for a given circuit, as well as the circuit ID. The
attacker uses CELL_RELAY_DATA cells since those cells
transmit the data stream. According to the description of Tor
in Section II, we know that the attacker is able to obtain the
first cell backward to the client, which is a CELL_CREATED
cell and is used to negotiate a symmetric key with the middle
onion router. The second cell backward to the client will be a
CELL_RELAY_CONNECTED cell. All subsequent cells will be
CELL_RELAY_DATA cells, and the attacker starts the encoding
process shown in Step 2.
Step 2: Encoding the Signal: In Section II, we introduced
the procedure of processing cells at the onion routers. The
CELL_RELAY_DATA cells will be waiting in the circuit queue
of the onion router until the write event is called. Then, the cells
in the circuit queue are all flushed into the output buffer. Hence,
the attacker can benefit from this and manipulate the number
of cells flushed to the output buffer all together. In this way,
the attacker can embed a secret signal (a sequence of binary
bits, e.g., "10101") into the variation of the cell count during a
short period in the target traffic. Particularly, in order to encode
bit "1," the attacker flushes three cells from the circuit queue. In
order to encode bit "0," the attacker flushes only one cell from
the circuit queue. In order to accurately manipulate the number
of the cells to be flushed, the attacker needs to count the number
of cells in the circuit queue. Once the number of the cells is
adequate (i.e., three cells for encoding a "1" bit of the signal,
and one cell for a "0" bit of the signal), the attacker calls the
circuit write event promptly and all the cells are flushed to the
output buffer immediately. Unfortunately, due to the network
congestion and delay, the cells may be combined or separated
at the middle onion routers, or the network link between the
onion routers. We will develop a reliable encoding mechanism
to deal with network dynamics in Section III-C.
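The encoding rule in Step 2 (three cells for bit "1", one cell for bit "0") can be written down as a minimal sketch. It only computes how many cells the attacker would flush per signal bit; the timing between flushes and the interaction with the real circuit queue are left out. The names encode_signal and cells_needed are hypothetical.

```python
CELLS_FOR_ONE = 3    # cells flushed together to encode bit "1"
CELLS_FOR_ZERO = 1   # cells flushed to encode bit "0"


def encode_signal(bits):
    """Map a bit string such as "10101" to the number of cells flushed per bit."""
    sizes = []
    for bit in bits:
        if bit == "1":
            sizes.append(CELLS_FOR_ONE)
        elif bit == "0":
            sizes.append(CELLS_FOR_ZERO)
        else:
            raise ValueError("signal must contain only '0' and '1'")
    return sizes


def cells_needed(bits):
    """Total number of relay cells consumed by embedding the signal."""
    return sum(encode_signal(bits))


flush_sizes = encode_signal("10101")   # [3, 1, 3, 1, 3]
total_cells = cells_needed("10101")    # 11
```

For the five-bit example signal from the text, 11 relay cells are enough to carry the whole signal, consistent with the claim that sessions of only tens of cells can be confirmed.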
Step 3: Recording Packets: After the signal is embedded in
the target traffic in Step 2, it will be transmitted to the entry
onion router along with the target traffic. An accomplice of the
attacker at the entry onion router will record the received cells
and related information, including Alice's host IP address and
port used for a given circuit, as well as the circuit ID. Since
the signal is embedded in the variation of the cell count for
CELL_RELAY_DATA cells, an accomplice of the attacker at the
entry onion router needs to determine whether the received cells
are CELL_RELAY_DATA cells. This can be done in a way
similar to the one in Step 1. We know that the first two cells that
arrive at the entry onion router are CELL_RELAY_EXTENDED
cells, and the third one is a CELL_RELAY_CONNECTED cell.
After these three cells, all cells are CELL_RELAY_DATA cells.
Therefore, starting from this point, the attacker records the cells
arriving at the circuit queue.
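One way to picture the recording in Step 3 is the sketch below. It rests on assumptions that go beyond the text: we assume the accomplice logs one arrival time per cell, that the first three cells on the circuit are the two CELL_RELAY_EXTENDED cells and the CELL_RELAY_CONNECTED cell, and that cells arriving within a chosen gap (max_gap, a hypothetical tuning parameter) belong to the same chunk. The function name record_chunks is hypothetical as well.

```python
SETUP_CELLS = 3   # two EXTENDED cells plus one CONNECTED cell, per Step 3


def record_chunks(arrival_times, max_gap):
    """Group data-cell arrivals into chunks and return the cell count per chunk.

    arrival_times: per-cell arrival times (seconds) on one circuit, in order.
    max_gap: cells no more than this many seconds apart share a chunk.
    """
    data_times = arrival_times[SETUP_CELLS:]   # skip circuit setup cells
    chunks = []
    previous = None
    for t in data_times:
        if previous is not None and t - previous <= max_gap:
            chunks[-1] += 1      # same chunk as the previous cell
        else:
            chunks.append(1)     # a new chunk starts here
        previous = t
    return chunks
```

For example, with three setup cells followed by data cells at 1.000, 1.001, 1.002, 1.500, 2.000, 2.001, and 2.002 s and max_gap set to 0.01 s, the recorded chunk sizes are 3, 1, and 3. Choosing the gap is tied to the delay interval between signal bits, which the discussion of issues and solutions takes up.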
Step 4: Recognizing the Embedded Signal: With recorded
cells, the attacker enters the phase of recognizing the embedded
signal. In order to do so, the attacker uses our developed recovery
mechanisms presented in Section III-C to decode the
embedded signal. Once the original signal is identified, the entry
onion router knows Alice's host IP address, and the exit onion
router knows Bob's host IP address of the TCP stream. Therefore,
the attacker can link the communication relationship between
Alice and Bob. As mentioned earlier, when the signal is
transmitted through Tor, it will be distorted because of network
delay and congestion. For example, when the chunks of three
cells for encoding bit "1" arrive at the middle onion router, the
first cell will be flushed to the output buffer promptly if there
is no data in the output buffer. The subsequent two cells are
queued in the circuit queue. When the write event is called, the
first cell is sent to the network, while the subsequent two cells
are flushed into the output buffer. Therefore, the chunks of the
three cells for carrying bit "1" may be split into two portions.
The first portion contains the first cell, and the second portion
contains the second and third cell together. Therefore, attention
must be paid to take these into account to recognize a signal bit.
Due to the network congestion and delay, the cells may be combined
or separated at the middle onion routers, or the network
link between the onion routers [22]. All these facts cause a distorted
version of the originally embedded signal to be received
at the entry onion router. To deal with these issues, we will design
mechanisms to carefully encode and robustly recover the
embedded signal in Section III-C.
C. Issues and Solutions
From the description above, we know that there are two crit-
ical issues related to the attack: 1) How can an attacker effec-
tively encode the signal at the exit onion router? 2) How can
an attacker accurately decode the embedded signal at the entry
onion router? We address these two issues below.
1) Encoding Signals at Exit Onion Routers: Two Cells for
Encoding “1” Bit Is Not Enough: As stated earlier, this attack
manipulates the number of cells and embeds the
secret signal into the variation of the cell count during a short
period in the target traffic. If the attacker uses two cells to encode
bit “1,” the signal is easily distorted in the network and
is hard to recover. The reason is that when the two cells
arrive at the input buffer of the middle onion router, the first
cell is pulled into the circuit queue. If the output buffer is
empty, the first cell is flushed into the output buffer immediately.
Then, the second cell is pulled into the circuit
queue. Since the output buffer is no longer empty, the second cell
stays in the circuit queue. When the write event is called, the first
cell is delivered to the network, while the second cell
is written to the output buffer and waits for the next write event.
Consequently, two originally combined cells are split into
two separate cells at the middle router. Hence, the attacker at
the entry onion router observes two separate cells arriving
at the circuit queue. These two cells are decoded as two “0”
bits, leading to a wrong detection of the signal. To deal with this
problem, the attacker should choose at least three cells for carrying
bit “1.” If the middle onion router splits them into one cell
and two cells, the attacker can still recognize the pattern and decode
the signal bit correctly at the entry onion router.
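The splitting behavior described above can be illustrated with a small toy model. This is a minimal sketch and not Tor's actual scheduler: the function name `relay_burst` is hypothetical, and the model assumes that the output buffer starts empty and that the whole burst is read before the first write event fires.

```python
from collections import deque


def relay_burst(n_cells):
    """Toy model of one middle-router hop for a burst of n_cells cells.

    Returns the sizes of the groups that leave the router, one group per
    write event. Assumption: the output buffer is empty when the burst
    arrives, and all cells are read before the first write event.
    """
    circuit_queue = deque()
    output_buffer = []
    groups = []

    # Read event: cells are pulled into the circuit queue one by one.
    # Only the first cell is flushed onward, because the output buffer
    # is empty at that moment; the rest wait in the circuit queue.
    for cell in range(n_cells):
        circuit_queue.append(cell)
        if not output_buffer:
            output_buffer.append(circuit_queue.popleft())

    # Write events: send the output buffer to the network, then refill
    # it with everything still waiting in the circuit queue.
    while output_buffer:
        groups.append(len(output_buffer))
        output_buffer = list(circuit_queue)
        circuit_queue.clear()
    return groups


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "cells ->", relay_burst(n))
```

In this model a two-cell burst leaves as two single cells, which looks the same as two “0” bits, whereas a three-cell burst leaves as one cell followed by two cells and remains distinguishable.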
Proper Delay Interval Should Be Selected for Transmitting
Cells: Since the signal modulates the number of cells transmitted
from the exit onion router to the entry onion router, the
delay intervals between cells that carry different units (bits) of the
signal affect both the accuracy and the detectability of the
attack. Hence, care must be taken to select a proper interval for
transmitting those cells. If the delay interval between cells is too
large, users may not tolerate the slow traffic rate and
may choose another circuit to transmit the data. When this happens,
the attack fails. If the delay interval between cells
is too small, the chance increases that cells are combined
at middle onion routers. A simple example
clarifies this. Assume that the delay intervals for three bits
“0,” “1,” and “0” of the signal are very small. The first cell,
carrying the first bit “0,” arrives at the middle onion router and
is written into the queue. This first cell will be flushed into the
output buffer if the output buffer is empty. The write event is
added to the event queue, and the cell waits to be written to
the network by the write event. Since the interval is small, the
three cells for the second bit “1” and the cell for the third bit
“0” also arrive at the middle onion router and stay in the circuit
queue. When the write event is called, the first cell, carrying
the first bit “0,” is written to the network, while the following
three cells carrying the second bit of the signal and the
one cell carrying the third bit of the signal are written to
the output buffer all together. When this happens, the original
signal is distorted (i.e., the third bit “0” of the signal is
lost). Therefore, the attacker needs to choose a proper delay
interval for transmitting cells. We discuss the
types of division and combination of cells in detail
in Section III-C.2.
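The effect of the delay interval can be sketched with a slightly richer toy model that takes arrival times and write-event times as inputs. The function `relay` and all timing values below are illustrative assumptions and are not measurements from Tor; the model only tracks cell counts.

```python
def relay(bursts, write_times):
    """Toy middle-router model driven by timestamps.

    bursts:      list of (arrival_time, n_cells), one entry per signal bit
    write_times: times at which the write event fires
    Returns the list of group sizes written to the network.
    """
    READ, WRITE = 0, 1
    events = [(t, READ, n) for t, n in bursts]
    events += [(t, WRITE, 0) for t in write_times]
    events.sort()

    queue = 0   # cells waiting in the circuit queue
    out = 0     # cells waiting in the output buffer
    sent = []
    for _, kind, n in events:
        if kind == READ:
            queue += n
            # The first cell is flushed onward only if the output
            # buffer is empty when the cells arrive.
            if out == 0 and queue > 0:
                out, queue = 1, queue - 1
        else:
            # Write event: send the output buffer, then flush the whole
            # circuit queue into the output buffer.
            if out:
                sent.append(out)
            out, queue = queue, 0
    return sent


if __name__ == "__main__":
    # Bits "0", "1", "0" are encoded as 1, 3, and 1 cells.
    small_gap = [(0, 1), (1, 3), (2, 1)]
    large_gap = [(0, 1), (100, 3), (200, 1)]
    print(relay(small_gap, [10, 20]))                   # bits merge
    print(relay(large_gap, list(range(5, 250, 10))))    # bits stay apart
```

With the small gap, the three cells of bit “1” and the single cell of the last bit “0” leave as one group of four, so the last bit is lost. With the large gap, bit “1” is only divided into one cell plus two cells.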
We now check conditions that preserve units of the signal
during transmission. Let be the signal,
a series of bits, where is the signal length and
is 0 or 1. When , the attacker will choose three cells
to encode bit “1.” When , the attacker will choose only
one cell to encode bit “0.” Let the time sequence of the signal
that arrives at the OR2 be , and let
be the average time of calling the read event, which pulls
the data of cells for each unit of the signal from the TLS buffer
and writes them to the circuit queue. Let be the average
time of calling the write event, which writes the cells in the
output buffer to the network and flushes the cells in the circuit
queue to the output buffer. Let the delay interval between two
sequential bits of the signal be , and let the delay of transmitting
data between OR3 and OR2 be . The relationship between
and can be represented as follows:
(1)
Let the time of the cells for the signal arriving at the circuit
queue be , where . Let the time of the
cells for the signal arriving at the output buffer be ,
where . Please refer to [22] for
statistics of , and other related random variables.
In order to avoid the combination of cells that belong to different
units of a signal in the circuit queue, the cells carrying
one bit should be flushed to the output buffer or the network before
the cells carrying the next unit of the signal arrive at
the circuit queue. Therefore, we have
(2)
(3)
(4)
(5)
The parameter is affected by the network condition.
Suppose that the network is congested, i.e., ; the
write event in the event queue cannot be called in time to flush
the cells in the output buffer and the circuit queue. Then, the
subsequent cells are queued in the circuit queue along with
the previous cells. Therefore, the cells belonging to different
units of the signal are combined in the circuit queue. If the
network load is light and is small, i.e., , the
cells are transmitted in time at the middle onion router. In
this case, when three cells carrying bit “1” of the signal arrive at
the middle onion router, the first cell is flushed to the output
buffer since the output buffer is empty. Then, the next two cells
are queued in the circuit queue. Therefore, the cells for bit “1”
of the signal are divided into two parts. If the network load
is medium, i.e., , when the cells for the previous
unit of the signal wait in the output buffer, the cells for the next
unit of the signal arrive at the queue. The write event is
called to write the cells for the previous unit of the signal to the
network and flush the cells for the next unit of the signal to the
output buffer. Therefore, cells for different units of the signal
are not combined or divided.
Fig. 7. Signal division and combination. (a) Types I and II. (b) Types III and IV.
2) Decoding Signals at Entry Onion Routers: Distortion of
the Signal: The proper selection of delay interval for transmit-
ting cells for carrying different units of the signal will reduce
the probability that cells will be combined or divided at middle
onion routers. However, due to unpredictable network delay and
congestion, the combination and division of cells will happen
anyway. This will cause the embedded signal to be distorted,
and the probability of recognizing the embedded signal will be
reduced. To deal with the distortion of the signal, we present
a recovery mechanism that robustly recognizes the embedded
signal.
The combination or division of the cells for different units
of the signal can be categorized into four types. Fig. 7(a) illustrates
two types of cell division for a unit of the signal, and
Fig. 7(b) illustrates two types of cell combination for different
units of the signal. Let
be the cell numbers recorded in the circuit queue at the entry
onion router, and is the number of the cells,
which is a positive integer. Recall the original signal is denoted
as . Let be the th signal
bit, as the part of the th signal bit, and let be the integral
signal bits or a remaining signal bit in the packet or a null signal
bit. Type-I distortion indicates that the original signal is divided
into separate cells. Fig. 8 illustrates an example for
Type I with . Suppose signal is bit “1”; the number
of cells should be 3. In fact, the attacker at the
entry onion router records that is 1 and is 2, i.e., the three
cells for signal are divided into one cell and two cells. Moreover,
signal may also be divided into three separate cells,
i.e., . Type-II distortion indicates that the last part of
is merged with the following signal(s) . Fig. 8 illustrates an
example for Type II with . Suppose signal is bit “1”
and is an integral signal for bit “0.” However, the attacker
records that is 1 and is 3, i.e., part of is
merged with the following signal . Type-III distortion indicates
that original signals are merged into a single packet.
Fig. 8 illustrates an example for Type III with . If ,
and are “010,” the attacker records that is 5. In
this case, the cells belonging to three signal units are merged
all together. Type-IV distortion indicates that a part of is
merged into the following cells. Fig. 8 illustrates an example
for Type IV with . If signal , and are “010”
bits, and will be recorded as 2 and 3, respectively. These are
simple examples of the four types of division and combination
listed above. The division or combination of cells on Tor
may be even more complicated.
Fig. 8. Examples of signal division and combination.
Signal Detection Schemes: To deal with those types of
combination and separation, we propose our detection scheme.
Algorithm 1 in Appendix A shows the recovery mechanism. If
the number of cells recorded in the circuit queue is smaller than
the number of cells of the original signal, the signals are recovered
as either Type I or Type II. If the number of cells
recorded in the circuit queue is larger than the number of cells
carrying the signal, the recovered signals are either
Type III or Type IV, depending on whether there
is in . When the signals are recovered in these types
with , we consider that these signals are successfully
identified. Otherwise, the signals cannot be identified.
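The idea of comparing recorded cell counts with the expected counts can be sketched as follows. This is not Algorithm 1 from Appendix A; it is a simplified stand-in that aligns cumulative cell boundaries and labels each signal bit as intact, divided (as in Type I), or merged with a neighbor (as in Types II to IV, which are not told apart here). The names `classify_units` and `CELLS_PER_BIT` are hypothetical.

```python
from itertools import accumulate

# One cell encodes bit "0" and three cells encode bit "1".
CELLS_PER_BIT = {"0": 1, "1": 3}


def classify_units(signal_bits, observed_counts):
    """Label every bit of the known signal by how its cells were observed.

    signal_bits:     the original signal, e.g. "010"
    observed_counts: cell counts recorded per group at the entry router
    """
    expected = [CELLS_PER_BIT[b] for b in signal_bits]
    if sum(expected) != sum(observed_counts):
        raise ValueError("cell totals differ; this sketch cannot align them")

    # Positions in the cell stream where one observed group ends.
    observed_cuts = set(accumulate(observed_counts, initial=0))

    labels = []
    start = 0
    for n in expected:
        end = start + n
        if start in observed_cuts and end in observed_cuts:
            # The bit occupies whole groups; check whether it was split.
            split = any(c in observed_cuts for c in range(start + 1, end))
            labels.append("divided" if split else "intact")
        else:
            # At least one boundary is missing, so the bit shares a
            # group with a neighboring bit.
            labels.append("merged")
        start = end
    return labels


if __name__ == "__main__":
    print(classify_units("1", [1, 2]))
    print(classify_units("010", [5]))
    print(classify_units("010", [1, 3, 1]))
```

A real decoder would also have to cope with unrelated traffic and lost cells, which this sketch rejects outright.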
IV. EXTENSION AND DISCUSSION
In this section, we study various issues, including the impact
of controlling both entry and exit onion routers, how an attacker
uses only Tor exit routers for launching the attack, and the de-
tectability and other impacts of the attack.
A. Impact of Controlling Both Entry and Exit Onion Routers
We now investigate the impact of controlling both entry
and exit onion routers. We assume that the attacker needs to
set up malicious onion routers in the Tor network. As men-
tioned in [23], there are four types of onion routers at the
Tor network—namely, entry router, middle router, exit router,
and both entry and exit router (denoted as EE router). In the
cell-counting-based attack, the attacker controls a number of
onion routers as either entry routers or exit routers. In order to
understand the impact, we need to evaluate the probability that
a TCP stream traverses both the malicious entry onion router
and exit onion router, given that a number of routers in Tor are
malicious and controlled by attackers.
To ensure the performance of circuits, Tor adopts weighted
bandwidth routing algorithms. First, the client chooses an ap-
propriate exit onion router OR3 from the set of exit routers, in-
cluding the pure exit routers and EE routers. The bandwidth
of exit routers is weighted as follows. Assume that the total
bandwidth is , the total exit bandwidth is , and the total
entry bandwidth is . If , i.e., the bandwidth
of exit routers is scarce, the exit routers will not be considered
for nonexit use. The bandwidth of EE routers is weighted by
, where is the bandwidth weight of
entry routers and . If , then .
The probability of selecting the th exit router from the exit set
is , where is the total bandwidth
of EE routers. Second, the client chooses an appropriate entry
onion router OR1 from the set of entry routers, including the
pure entry routers and EE routers. To ensure sufficient entry
bandwidth, if , the entry routers will not be considered
for nonentry use. Then, the probability of selecting the
th entry router from the entry set is ,
where is the exit bandwidth weight and
is the th bandwidth in the entry set. If , then
. Eventually, the client chooses the middle router from the rest
of Tor routers.
Assume that we configure EC2 nodes as malicious entry, exit,
or EE routers. Denote the number of malicious exit routers as ,
the number of malicious entry routers as , and the number of
the malicious EE routers as , where . Based on
the above weighted bandwidth selection algorithm, the weight
can be derived by
(6)
(7)
Then, the catch probability can be calculated as follows:
(8)
According to the above formula, we can derive the maximum
and the corresponding number of exit routers and entry
routers [24].
In our recent study [24], shown in Fig. 9, we showed that by
injecting around 4% of onion routers with long uptime and high
bandwidth, the attack can confirm over 60% of the communication
sessions over Tor.¹ We consider two strategies. In Scheme
1, we donate nodes such as those from Amazon EC2 as either
Tor exit routers or entry routers (not as EE routers). In Scheme
2, we configure EC2 nodes as entry, exit, or EE sentinels. We
can see that these two schemes achieve similar results. Note
that since TorFlow [25] can measure the real bandwidth of
Tor nodes, the attacker should rent sufficient bandwidth for each
EC2 node instead of advertising fake bandwidth. Because
of the pay-as-you-go model of cloud computing, renting such
bandwidth is feasible for malicious organizations or individuals
with modest resources.
According to previous research [26], Tor will suffer severe
TCP performance degradation if it adopts the random path selection
strategy to reduce the impact of the attack. The bandwidth
of 90% of Tor routers is less than 350 kB/s [24]. If
a client uses a random path selection strategy, the probability
¹Note that this is true for any powerful traffic confirmation attack as well as
the proposed attack.
Fig. 9. Probability that a circuit chooses the malicious routers as entry and exit
routers versus number of malicious Tor routers [19].
Fig. 10. Packet format. (a) Packet format of 1 cell. (b) Packet format of 2 cell.
(c) Packet format of 3 cell.
Fig. 11. TLS header.
that it selects Tor routers with low bandwidth for the circuits is
around 90%. Obviously, the low-bandwidth Tor router will be
the bottleneck of the circuit.
B. Controlling Exit Onion Routers Only
If the attacker does not control entry onion routers, the cell-
counting-based attack can still be successful. An attacker can
sniff the packets transmitted between an entry onion router and
a client. The attacker may recover the embedded signal based
on the size of the packet. In this way, the number of required
malicious routers in Tor can also be reduced while the attack
still has a desired impact.
We now introduce the structure of the IP packet that envelops
the cell(s) and passes along the network. Without loss of generality,
we assume that the MTU is 1500 B. Fig. 10(a) illustrates
the structure of an IP packet that envelops one cell, including an IP
header, a TCP header, an empty TLS application record, and a
TLS application record enveloping one cell. The TLS record
incorporates a TLS header (5 B), a TLS message (not to
exceed 2^14 B), a MAC (message authentication code, 20 B),
and TLS padding (12 B). Fig. 11 illustrates the header of the
TLS record, which is 5 B long. The content type field
identifies the record-layer protocol type contained in this record
and is 1 B long. In our case, we are concerned with the
TLS application record, with content type 23. The version field
identifies the major and minor version of TLS for the contained
message and is 2 B long. The length field identifies
the length of the protocol message(s), not to exceed 2^14 B.
Fig. 12. Time-hopping technique.
Fig. 10(b) illustrates the structure of IP packet that envelops
two cells and has a length of 1150 B. Because an IP packet that
envelops three cells exceeds the MTU (1500 B), it
is segmented: one segment is carried in a packet of 1500 B, and
the other in a packet of 214 B. Fig. 10(c) illustrates
the structure of the IP packet that envelops three cells and is segmented.
Hence, the attacker can map bit “0” of the signal to one
IP packet with a length of 638 B. By appropriately choosing
a delay interval at the exit onion router, the “1” bit of the signal
will have two cases: two IP packets with one cell [shown in
Fig. 10(a)] and two cells [shown in Fig. 10(b)], i.e., the signal
is divided as Type I with , as well as two IP packets
with three cells [shown in Fig. 10(c)], which is neither divided
nor combined. Therefore, from packet size pattern, the attacker
is still able to recognize the signal embedded in the IP packet
stream by using our signal detection mechanism. Actually, the
fact that multiple cells can be packed into a packet guarantees
the correct signal encoding via the variation of the cell count.
When such a packet arrives at the TLS buffer, those cells form
a group, which is read into the circuit queue. This is our mech-
anism that generates a signal bit “1” or “0.”
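The packet sizes quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions: the TLS header, MAC, and padding sizes come from the text, while the combined IP and TCP header size of 52 B (20 B IP plus 32 B TCP with options) is an assumption inferred from the quoted totals and is not stated in the text. The function names are hypothetical.

```python
CELL = 512            # Tor cell size in bytes
TLS_HEADER = 5        # from the text
MAC = 20              # from the text
PADDING = 12          # from the text
IP_TCP_HEADERS = 52   # assumption: 20 B IP + 32 B TCP (with options)
MTU = 1500            # from the text


def tls_record(payload_bytes):
    """Size of one TLS application record carrying payload_bytes."""
    return TLS_HEADER + payload_bytes + MAC + PADDING


def ip_packet_sizes(n_cells):
    """IP packet sizes for n_cells cells sent in one TLS write.

    The payload is an empty TLS record followed by one record that
    holds all cells; it is cut into segments that fit within the MTU.
    """
    remaining = tls_record(0) + tls_record(n_cells * CELL)
    max_payload = MTU - IP_TCP_HEADERS
    sizes = []
    while remaining > 0:
        chunk = min(remaining, max_payload)
        sizes.append(chunk + IP_TCP_HEADERS)
        remaining -= chunk
    return sizes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "cell(s):", ip_packet_sizes(n))
```

Under these assumptions, one cell gives a single 638 B packet, two cells give a single 1150 B packet, and three cells give a 1500 B packet followed by a 214 B packet, matching the sizes in the text.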
C. Attack Detectability
The proposed cell-counting-based attack is difficult to detect.
As we know, the attack transmits a short and secret random
signal known only to the attackers. It is difficult to detect
within the target traffic. Based on the evaluation data shown
in Section VI, the success of this attack requires only a short
secret signal (such as 5 b) while achieving a detection rate
of almost 100% and a false positive rate of . It would be
hard to classify such a short sequence of random signals as an
attack sequence in bursty network traffic.
To further improve the attack invisibility, we adopt the
time-hopping-based signal embedding technique, which can
greatly reduce the probability of interception and recognition [27].
Fig. 12 illustrates the principle of the time-hopping
technique. With time hopping, there are random intervals
between signal bits. At the exit onion router, the duration of
those intervals is varied according to a pseudorandom control
code, which is known only to the attackers. To recover the
signal at the entry onion router, an accomplice of the attacker
can use the same secret control code to position the signal
bits and recover the whole signal. Intuitively, if the interval
between the bits is large enough, the inserted signal bits appear
sparse within the target traffic, and it is difficult to determine
whether groups of cells are caused by network dynamics or
by intention. Therefore, the secret signal embedded into the
target traffic is hard to distinguish from noise. In addition, when
a malicious entry node has confirmed the communication
relationship, it can separate the group of cells by adding delay
between the cells so that not even the client can observe the
embedded signal. In Section VI, we demonstrate the effective-
ness of this time-hopping-based technique, and the detailed
approach is shown in Algorithm 2 in Appendix A.
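The role of the shared control code can be sketched as follows. This is not Algorithm 2 from Appendix A; it only illustrates that two parties holding the same secret seed can derive the same sequence of pseudorandom intervals. The function names, the interval bounds, and the use of Python's `random.Random` (which is not a cryptographically secure generator) are all assumptions made for illustration.

```python
import random


def hop_schedule(control_code, n_bits, min_gap_ms, max_gap_ms):
    """Send times (ms) for n_bits signal bits, derived from a shared seed."""
    rng = random.Random(control_code)   # both ends use the same seed
    t = 0.0
    times = []
    for _ in range(n_bits):
        # The gap before each bit is drawn pseudorandomly.
        t += rng.uniform(min_gap_ms, max_gap_ms)
        times.append(t)
    return times


def embed(signal_bits, control_code, min_gap_ms=200, max_gap_ms=2000):
    """Pair each send time with a cell count: 3 for "1" and 1 for "0"."""
    times = hop_schedule(control_code, len(signal_bits), min_gap_ms, max_gap_ms)
    return [(t, 3 if bit == "1" else 1) for t, bit in zip(times, signal_bits)]


if __name__ == "__main__":
    plan = embed("10110", control_code=1234)
    for send_time, n_cells in plan:
        print(f"{send_time:9.1f} ms: {n_cells} cell(s)")
```

The accomplice can call `hop_schedule` with the same control code to know where to look for each bit, after allowing for network delay.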
In our proposed attack, a secret signal is embedded into the
target traffic, which implies a secret sequence of groups of one
and three cells. One may be concerned that if the sequence of
groups of one and three cells is unnatural and the entry node is
honest and aware of the attack, it will detect the sequence and
thus distinguish a traffic flow with an embedded signal from
a flow without a signal. However, with the time-hopping technique,
groups of one and three cells are separated by random
intervals, and it is hard to differentiate them from those caused
by network dynamics. As a side note, the false positives in detecting
signal bits in the figures of Section VI imply that normal network
traffic does contain groups of one and three cells caused by
network dynamics. In addition, since the embedded signal is
very short and known only to the attackers, we conjecture that it is
very difficult to distinguish traffic with embedded signals from
normal traffic based on this very short secret sequence of cell
groups.
D. Difference From Existing Attacks
The proposed cell-counting-based attack may dramatically
degrade the anonymity that Tor maintains. Unlike other existing
attacks, the cell-counting-based attack is accurate, efficient,
and difficult to detect. This attack requires far fewer
packets and incurs little overhead while achieving a higher detection
rate than most traffic analysis attacks, including the traffic
confirmation attacks in [10], [13], [17], [28], and [29]. Since
this attack utilizes the atomic unit of a traffic flow, i.e., cells/
packets (and their size), it is highly efficient and can
confirm very short communication sessions with only tens of
cells. Although the tagging-based attack [19], [30] may require
few packets, it tears down the Tor circuits and is relatively easy
to detect. A simple passive cell-counting attack may count the
cells at the exit and entry onion routers and correlate the
counts. However, there is no guarantee on the detection rate and
false positive rate because of the large number of connections
running through Tor. In addition, our attack achieves a low false
positive rate with a very small amount of target traffic, as demonstrated
in Section VI. Therefore, as a powerful traffic confirmation
attack, the proposed attack poses a great challenge to
Tor.
E. Countermeasures
We now discuss possible countermeasures. It is difficult
for Tor to defeat the cell-counting attack. One possible countermeasure
is for Tor routers to add delay between cells in order to
disrupt malicious cell groups. However, choosing such a delay
is very challenging. Too short a delay cannot separate cells
(at the network layer), while a long delay may dramatically degrade
Tor’s performance, which is already the biggest bottleneck
of using Tor [22], [26], [31]. A second way to reduce the
impact of the proposed attack is to use purely random routing
algorithms and reduce the chance of traffic flows passing through
malicious Tor onion routers. However, such a random routing algorithm
will also degrade Tor performance. Its effect is also very
limited since the attacker can inject more malicious routers into
Tor to increase the impact.
Dummy traffic may be used to distort the timing of the
signal. Constant-rate padding along a circuit may incur too
much overhead. Levine et al. [8] investigated a defensive
dropping scheme, in which dummy traffic can be randomly
dropped at the intermediate routers. End-to-end defensive
dropping cannot be applied to Tor directly. Tor adopts
Advanced Encryption Standard Counter Mode (AES-CTR) to
encrypt the cells. The AES counter at each onion router and
onion proxy is synchronized. Defensive dropping would disrupt
this AES counter and cause decryption errors at the onion
proxy or the exit routers [30]. These errors would tear down the
circuits. Shmatikov et al. [15] proposed an adaptive padding
scheme that injects dummy packets into statistically unlikely
gaps in the packet flow, destroying timing fingerprints without
adding any latency to the application traffic. However, in our
case, the attacker controls the exit router, and the signal can be
embedded in the dummy traffic as well.
This paper provides guidance to anonymous protocol design
and implementation. To design an anonymous communica-
tion system, we have to consider the impact of the design on
all protocol layers. For example, Tor implements an overlay
protocol and preserves equal-sized cells on the application
layer. However, the equal-sized cells on the application layer
cannot guarantee that packets on the network layer are also
equal-sized. Hence, the equal-sized cells on the application
layer cannot guarantee the anonymity provided by Tor. Indeed,
our attack exploits the Tor protocol’s impact on the network
layer.
V. ANALYSIS
In this section, we show the analytical results for the accuracy
and efficiency of the cell-counting-based attack. For attack
accuracy, we derive closed formulas for the detection rate and false
positive rate. Our theoretical analysis shows that the detection
rate is a monotonically increasing function of the
delay interval and a monotonically decreasing function of the
variance of the one-way transmission delay along a circuit. Our
experimental results in Section VI match the theoretical results
well.
A. Detection Rate
We consider the major factor causing detection error to be network
dynamics, which lead to combination and division of
cell groups. Our analysis is based on the network configuration
described in the second paragraph of Section II-B. The
round-trip delay between two onion routers can be modeled by
a log-normal distribution [32]. We first investigate the properties
of the log-normal distribution and then use the delay model
to derive the detection rate analytically.
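A log-normal random variable has the property that its logarithm
has a Gaussian distribution. Let $X$ be a Gaussian random
variable with the probability density function (PDF)
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{9}$$
where $\mu$ and $\sigma$ are the mean and standard deviation, respectively.
Let $Y = e^{X}$, where $Y$ is a random variable with
log-normal distribution; the PDF of $Y$ is given by
$$f_Y(y) = \frac{1}{y\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln y-\mu)^2}{2\sigma^2}\right), \quad y > 0. \tag{10}$$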
Fig. 13. One-way trip time probability density function. (a) Germany.
(b)–(d) US.
Let be the log-normal random variable of the delay between
OR3 and OR2, and be the log-normal random variable
of the delay between OR2 and OR1. Following the widely
used assumption that a sum of independent log-normal random
variables is well approximated by another log-normal random
variable, we have
(11)
where the random variable possesses a Gaussian distribution.
Therefore, the round-trip delay between OR3 and OR1 also follows a
log-normal distribution . Since follows a log-normal distribution,
the arrival time of the signal at the entry onion router is
approximately , which is a log-normal distribution as well.
This fact is formally proved in Appendix B.
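One common way to make the sum-of-log-normals approximation concrete is moment matching, often called the Fenton-Wilkinson method. The text does not say which approximation is used, so the sketch below is an illustration of that named technique and not necessarily the paper's derivation. The function names and the example parameters are hypothetical.

```python
import math


def lognormal_mean_var(mu, sigma):
    """Mean and variance of a log-normal with log-scale parameters mu, sigma."""
    mean = math.exp(mu + sigma ** 2 / 2)
    var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, var


def fenton_wilkinson(params):
    """Approximate a sum of independent log-normals by one log-normal.

    params: list of (mu, sigma) pairs, one per summand.
    Returns (mu, sigma) of the log-normal whose mean and variance match
    the mean and variance of the sum.
    """
    total_mean = 0.0
    total_var = 0.0
    for mu, sigma in params:
        m, v = lognormal_mean_var(mu, sigma)
        total_mean += m   # means add
        total_var += v    # variances add because summands are independent
    sigma_sq = math.log(1 + total_var / total_mean ** 2)
    mu_sum = math.log(total_mean) - sigma_sq / 2
    return mu_sum, math.sqrt(sigma_sq)


if __name__ == "__main__":
    # Two hops with illustrative parameters (log of milliseconds).
    mu, sigma = fenton_wilkinson([(5.0, 0.3), (5.5, 0.4)])
    print(f"approximating log-normal: mu={mu:.3f}, sigma={sigma:.3f}")
```

By construction, the approximating distribution has exactly the same mean and variance as the true sum; only the higher moments differ.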
We have experimentally measured the one-way trip time along
the circuit and verified this fact. In our experiments, the client
sends a cell to the server every 10 s via the OP. We change
the configuration of the client to select our entry node and exit
node for its circuits. We use Network Time Protocol (NTP) to
synchronize the entry node and exit node.² The entry node and exit
node record the timestamps of the incoming cells. The middle
nodes are selected randomly by the client. Therefore, the difference
between the timestamps recorded at the entry node and the exit node
is the one-way trip time between them.
Fig. 13 shows that the measured data can be approximated by the
log-normal distribution. In this figure, the solid line is
the PDF derived from the measured data, and the dashed line is the
log-normal PDF estimated by maximum likelihood estimation (MLE).
The middle node in the experiments producing
Fig. 13(a) is in Germany, while all the other middle nodes
for Fig. 13(b)–(d) are in the US. From these figures, we can
see that the empirical curves match the estimated log-normal
distribution curves well.
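For a log-normal distribution, the maximum likelihood estimates have a closed form: they are the sample mean and the (biased) sample standard deviation of the logarithms of the data. The sketch below shows this fit on synthetic delays; the function names and the parameters of the synthetic data are assumptions, and no measured data from the experiments is used.

```python
import math
import random


def fit_lognormal_mle(samples):
    """Closed-form MLE of (mu, sigma) for positive samples."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_pdf(y, mu, sigma):
    """Log-normal density at y > 0."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-z * z / 2) / (y * sigma * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic one-way delays in ms; the parameters are illustrative only.
    delays = [rng.lognormvariate(6.0, 0.4) for _ in range(20000)]
    mu_hat, sigma_hat = fit_lognormal_mle(delays)
    print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

Plotting `lognormal_pdf` with the fitted parameters against a histogram of the samples gives the kind of comparison described for Fig. 13.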
²NTP version 4 can usually maintain a time accuracy of 10 ms over the public
Internet and can achieve an accuracy of 0.2 ms or better in local area networks.
We obtain statistics to show the trend of the one-way delay between an entry node
and an exit node, and the accuracy provided by NTPv4 is sufficient for our
experiments.
Now we derive the detection error rate. Let be the length
of the original signal, and let the arrival times of the signal bits at the
entry onion router be . Let the delay
interval between two bits of the signal be . Because cells
associated with neighboring signal bits can be combined
(when ), the probability of error becomes
(12)
Letting , we have
(13)
The detection rate is defined as the probability that a 1-b original
signal is recognized correctly. We have
(14)
Let and . We have . Assume
and are independent and identically distributed (i.i.d.).
and are i.i.d. as well. Let and be the mean and standard
deviation of the variable ( or ). Because
(15)
(16)
then
(17)
(18)
(19)
(20)
where .
In addition, the first derivative of the function is given
by
(21)
(22)
Since and , we have . Hence,
and is a monotonically increasing function
of . Therefore, the larger the delay interval we choose, the
higher the detection rate that is achieved. This result is also
validated by our real-world experimental data in Section VI.
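The monotonic trend can be checked numerically with a simplified Monte Carlo model. This sketch does not reproduce the closed-form expressions above. It assumes that two consecutive bits stay separated whenever the delay interval plus the difference of two independent log-normal delays is positive, and it estimates the probability of that event. The function name, the value of `sigma`, and the trial count are assumptions; the 700 ms mean is the value used for Fig. 14.

```python
import math
import random


def separation_rate(delta_ms, mean_ms=700.0, sigma=0.5, trials=20000, seed=1):
    """Estimate P(delta + d2 - d1 > 0) for i.i.d. log-normal delays d1, d2."""
    # Choose mu so that the log-normal mean equals mean_ms.
    mu = math.log(mean_ms) - sigma ** 2 / 2
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        d1 = rng.lognormvariate(mu, sigma)   # delay of the earlier bit
        d2 = rng.lognormvariate(mu, sigma)   # delay of the later bit
        if delta_ms + d2 - d1 > 0:
            kept += 1
    return kept / trials


if __name__ == "__main__":
    for delta in (0, 100, 500, 2000):
        print(f"delta={delta:5d} ms -> {separation_rate(delta):.3f}")
```

Because the same seed is reused for every interval, the estimates are computed on identical delay samples, so the estimated rate never decreases as the interval grows.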
Fig. 14. Theoretical detection rate versus delay interval and variance of the
log-normal distribution.
The detection rate is defined as the detection rate for an
-bit original signal. Given for the detection rate of a 1-b original
signal, we have
(23)
which is a monotonically increasing function of the delay interval
as well.
Fig. 14 illustrates the theoretical results based on the above
theoretical analysis, i.e., (17). It shows the relationship among
the theoretical detection rate, the delay interval (ms), and the variance
of the log-normal distribution. Assume that the mean of the
log-normal distribution is 700 ms. We have two observations
from Fig. 14: 1) the theoretical detection rate is a monotonically
increasing function of the delay interval ; 2) the
theoretical detection rate is a monotonically decreasing function
of the variance of the log-normal distribution.
Our experimental results in Section VI match these observations
well and validate our theoretical analysis.
B. False Positive Rate
When there is no signal embedded into the target traffic, the
detection may still reach an incorrect decision.
Packets in normal traffic have different sizes. Let
the probability of one cell packed in an IP packet be (which
will be recognized as signal bit “0”). Let the probability of three
cells packed in the packet be (which will be recognized as
signal bit “1”). Let be the probability that packets have other
sizes. We have .
The false positive rate for recognizing an -bit signal
can be calculated by
(24)
To obtain the empirical distribution of IP packet size for the
traffic within the Tor network, we downloaded a file with a
size of 20 MB using the Tor network. Fig. 15 shows the cumulative
distribution function of the packet size in normal traffic. It
shows that the sum of and is around 0.5. Then, we have
(25)
Fig. 15. Empirical cumulative distribution function (CDF) of packet size.
Therefore, the false positive rate decreases as the original
signal length becomes longer. Given the false positive
rate in the above formula, we can determine the original
signal length . For example, given a false positive rate of
1.5% (or 0.4%), we can use an original signal of length 3 (or 4).
In our extensive experiments in Section VI, we observed an even
lower false positive rate.
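The relationship between signal length and false positive rate can be illustrated with a short calculation. The per-bit factor of 1/4 used below is not taken from the formulas above; it is an inference from the quoted example, because (1/4)^3 is about 1.56% and (1/4)^4 is about 0.39%, which lines up with the 1.5% and 0.4% figures. The function names are hypothetical.

```python
def false_positive_bound(n_bits, per_bit=0.25):
    """False positive rate for an n-bit signal, assuming a fixed per-bit factor."""
    return per_bit ** n_bits


def shortest_signal(target_rate, per_bit=0.25):
    """Smallest signal length whose bound does not exceed target_rate."""
    n = 1
    while false_positive_bound(n, per_bit) > target_rate:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"n={n}: {false_positive_bound(n):.4%}")
```

Under this assumed factor, each extra signal bit cuts the false positive rate by a factor of four, so a handful of bits is enough to push it well below 1%.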
C. Attack Capacity
We can use an information-theoretic model to analyze the
efficiency of the cell-counting-based attack. Recall that in this
attack, the attacker at the exit onion router embeds a bit of the signal,
and the attacker at the entry router recognizes either a correct bit or
a wrong bit. When a bit is mistakenly
recognized, we say that the signal bit is “erased.” Hence, the
cell-counting-based attack technique uses the host traffic from
the exit onion router to the entry router as a covert channel
to transmit an invisible signal. Using the concept of channel
capacity, we can obtain the efficiency of the investigated attack.
Channel capacity, as defined by Shannon, gives a theoretical bound
on the information transmission capability of a
noisy channel [33].
Assume the channel model in our system is a discrete
memoryless channel (DMC). This attack can be modeled as
a binary erasure communication channel as shown in Fig. 16,
where represents the transmitted signal, represents the received
signal, and represents the probability of transmitting
bit 1. Recall that network congestion or delay can result in
an erased signal bit (i.e., either 1 or 0) in our case. Let the probability
that one bit of the signal is “erased” be and be the amount
of time to transmit one input bit across the DMC.
The random variable is the time to send such a bit. Note that
also includes the network delay caused by network dynamics.
The mean of is represented by . The mutual information
in units of bits per second, considering the transmission
time cost for a channel, is
(26)
Then, the capacity in units of bits per second for a DMC [34]
is given by
(27)
Fig. 16. Channel model.
Based on the channel model shown in Fig. 16, we know that $I(X;Y)$
can be derived by
(28)
(29)
(30)
(31)
(32)
and we have
(33)
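For reference, the textbook derivation for a binary erasure channel, which the steps above presumably follow, can be sketched as below. The notation is an assumption of this sketch: $p$ is the probability of sending bit 1, $\epsilon$ is the erasure probability, $H(\cdot)$ is the entropy in bits, $E(T)$ is the mean time to send one bit, and $C_T$ is the capacity in bits per second. Dividing by $E(T)$ follows the capacity-per-unit-cost idea of [34].

```latex
\begin{align*}
I(X;Y) &= H(Y) - H(Y \mid X) \\
       &= H\big((1-p)(1-\epsilon),\ \epsilon,\ p(1-\epsilon)\big) - H(\epsilon) \\
       &= H(\epsilon) + (1-\epsilon)\,H(p) - H(\epsilon) \\
       &= (1-\epsilon)\,H(p), \\
C_T    &= \max_{p} \frac{I(X;Y)}{E(T)} = \frac{1-\epsilon}{E(T)}
          \quad \text{(attained at } p = 1/2\text{)}.
\end{align*}
```

Under this reading, the capacity grows as fewer bits are erased and shrinks as the mean time per bit grows.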
Note that the mean transmission time per bit determines how quickly the attacker can transmit one bit of signal. In our case, it is determined by the delay interval along with other factors. Recall that the delay interval is the time between two consecutive bits of the transmitted signal. From the above analysis, we also know that the detection rate is a monotonically increasing function of the delay interval. The above formula provides a few important insights into the capability of the cell-counting-based attack. As the delay interval increases, a larger detection rate can improve the capacity, but it causes a larger mean transmission time and deteriorates the capacity. As the delay interval decreases, a smaller mean transmission time improves the capacity, but it causes a smaller detection rate and deteriorates the capacity. From Fig. 14, we observe that the decline of the per-bit transmission rate is slower than the decline of the erasure probability. Consequently, with the increase of the delay interval, the capacity will increase. Nevertheless, the capacity will decrease when the delay interval reaches a certain level.
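The tradeoff described above can be illustrated numerically with a small sketch. It assumes the standard binary-erasure-channel result that the capacity per second is the fraction of bits that survive divided by the mean time per bit, and it approximates the mean time per bit by the delay interval alone. The operating points below are hypothetical values chosen only to show a rise followed by a fall; they are not measurements from the paper.

```python
def bec_capacity_bits_per_second(erasure_prob: float, mean_bit_time_s: float) -> float:
    """Capacity of a binary erasure channel per second: (1 - erasure) / E(T)."""
    if not 0.0 <= erasure_prob <= 1.0:
        raise ValueError("erasure_prob must be a probability")
    if mean_bit_time_s <= 0.0:
        raise ValueError("mean_bit_time_s must be positive")
    return (1.0 - erasure_prob) / mean_bit_time_s


# Hypothetical (delay interval in seconds, erasure probability) pairs.
# Assumption: a longer delay interval lowers the erasure probability.
operating_points = [(0.01, 0.95), (0.02, 0.60), (0.05, 0.10), (0.10, 0.01)]

for delay_s, eps in operating_points:
    capacity = bec_capacity_bits_per_second(eps, delay_s)
    print(f"delay={delay_s:.2f}s erasure={eps:.2f} capacity={capacity:.1f} b/s")
```

With these illustrative numbers the capacity first increases with the delay interval and then drops once almost no bits are erased and the extra delay only slows transmission.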
VI. EXPERIMENTAL EVALUATION
We have implemented the cell-counting-based attack pre-
sented in Section III against Tor [35]. In this section, we use
real-world experiments to demonstrate the feasibility and ef-
fectiveness of this attack. All the experiments were conducted
in a controlled manner, and we experimented on TCP flows
generated by ourselves in order to avoid legal issues.
Fig. 17. Experiment setup.
A. Experiment Setup
In our experiment setting illustrated in Fig. 17, we deployed
two malicious onion routers as the Tor entry onion router and
exit onion router. The entry onion router and client (Alice) lo-
cated in Asia are deployed on PlanetLab [36]. The server (Bob)
is located at one university campus in North America, and the
exit onion router is at an off-campus location in North America
as well. All computers are on different IP address segments and
connected to different Internet service providers (ISPs). Fig. 17
shows the experiment setup.
We modified the Tor client code for attack verification purposes. The Tor client sets up circuits through the designated malicious exit onion router and entry onion router shown in Fig. 17. The middle onion router is selected using the default routing selection algorithm released by Tor. As we stated earlier, the cell-counting-based attack intends to confirm whether the client (Alice) communicates with the server (Bob). For verification purposes, we set up a server (Bob) and have the client (Alice) download a file from it. The downloading software at the client is the command line utility wget. By configuring wget’s parameters http_proxy and ftp_proxy, we let wget download files through Privoxy, the proxy server used by Tor. By using the Tor configuration file and manipulatable parameters, such as EntryNodes, ExitNodes, StrictEntryNodes, and StrictExitNodes [23], we let the client choose both the malicious entry and exit onion routers along the circuit.
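A configuration of this kind can be sketched as follows. The four option names are the ones listed in the text and may differ in other Tor releases. The router nicknames are hypothetical, and the proxy address assumes Privoxy's default listening address of 127.0.0.1:8118. The helper functions are illustrative only and are not part of Tor, Privoxy, or wget.

```python
def build_torrc(entry_nickname: str, exit_nickname: str) -> str:
    """Return torrc lines that pin the entry and exit routers of a circuit.

    Option names are taken from the text; the nicknames are placeholders.
    """
    lines = [
        f"EntryNodes {entry_nickname}",
        f"ExitNodes {exit_nickname}",
        "StrictEntryNodes 1",
        "StrictExitNodes 1",
    ]
    return "\n".join(lines) + "\n"


def wget_proxy_env(proxy_url: str = "http://127.0.0.1:8118") -> dict:
    """Environment variables that make wget send requests through Privoxy."""
    return {"http_proxy": proxy_url, "ftp_proxy": proxy_url}


# Hypothetical router nicknames for the two controlled onion routers.
print(build_torrc("EvilEntry", "EvilExit"), end="")
print(wget_proxy_env())
```

The returned dictionary could be merged into the environment of a wget process, while the generated lines would go into the client's Tor configuration file.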
B. Experimental Results
To obtain the empirical property of IP packet size for the traffic within the Tor network, we downloaded a file with the size of 20 MB using the Tor network. Fig. 15 shows the empirical cumulative distribution function (CDF) of the IP packet size in the traffic. As shown in Fig. 5, the packets with non-MTU size are around 50%. This validates that the size of packets transmitted over Tor is dynamic. Consequently, it also indicates that our embedded signal will be hidden in the normal traffic and hard for victims to detect.
To validate the accuracy of the cell-counting-based attack, we let the client download 30 files in our experiments. The size of each file is around 10 MB. At the exit onion router, we generate a random signal with 100 b. When the target traffic from server Bob arrives at the exit onion router, we vary the number of cells in the circuit and embed the signal into the variation of the cell count during a short period in the target traffic. At the entry onion router, the cells in the circuit queue are recorded in the log, and the recovery mechanisms are applied to recognize the embedded signal. In addition, we chose different thresholds and types in our recovery mechanism as discussed in Section III-C.
In particular, we chose to recover Type I and III with
Fig. 18. Detection rate versus delay interval (Note: The rate is for detecting
one bit).
Fig. 19. Detection rate versus delay interval and signal length with detection
scheme 1 (Note: The rate is for detecting one bit).
as detection scheme 1. Moreover, we chose to recover all types
with as detection scheme 2.
When we evaluate the false positive rate, the client downloads 30 files via Tor again. However, no signal is embedded into the traffic at the exit onion router. Denote the traffic with no signal as clean traffic. We generate a 100-b random signal and apply detection schemes 1 and 2 to the clean traffic collected at the entry onion router. By checking how many bits of this signal show up in the clean traffic, we can calculate the false positive rate.
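The bit-matching idea behind this measurement can be sketched in a simplified form. The sketch assumes the mapping used in the analysis (one cell for bit “0”, three cells for bit “1”) and compares expected and observed cell counts position by position. Unlike the recovery mechanisms of Section III-C, it does not handle split or combined signals, and the clean-traffic counts are randomly generated placeholders rather than experimental data.

```python
import random

# Mapping taken from the text: one cell encodes "0", three cells encode "1".
CELLS_FOR_BIT = {0: 1, 1: 3}


def encode(bits):
    """Turn signal bits into the cell counts an embedder would emit."""
    return [CELLS_FOR_BIT[b] for b in bits]


def per_bit_match_rate(signal_bits, observed_counts):
    """Fraction of positions where the observed count equals the expected one."""
    pairs = list(zip(encode(signal_bits), observed_counts))
    if not pairs:
        raise ValueError("need at least one bit and one observation")
    return sum(expected == seen for expected, seen in pairs) / len(pairs)


rng = random.Random(7)
signal = [rng.randint(0, 1) for _ in range(100)]
marked = encode(signal)                                 # traffic carrying the signal
clean = [rng.choice([1, 2, 3, 4]) for _ in range(100)]  # hypothetical clean counts

print(per_bit_match_rate(signal, marked))
print(per_bit_match_rate(signal, clean))
```

Marked traffic matches every bit in this idealized setting, while the match rate on the placeholder clean traffic reflects chance agreement only.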
We conduct the above experiment to evaluate the true positive and false positive rates by using a 100-b random signal. Fig. 18 illustrates the correlation between the detection rate (true positive) and the delay interval for transmitting cells associated with different units of the signal. As we can see from this figure, the detection rate increases dramatically when the delay interval is slightly increased in both detection schemes. As expected, the detection rate of scheme 2 is higher than that of scheme 1 with a slightly higher false positive rate, while the overall false positive rate for each scheme is a fixed value. When the delay interval approaches 100 ms, the detection rate of both schemes approaches 100%. All these findings validate that our investigated attack can significantly degrade the anonymity service provided by Tor.
Fig. 20. Detection rate versus delay interval and signal length with detection scheme 1.
Fig. 21. Detection rate versus delay interval and signal length with detection scheme 2 (Note: The rate is for detecting one bit).
Fig. 19 illustrates the detection rate in terms of signal length and the delay interval for scheme 1. Note that the detection rate in Fig. 19 is for detecting one bit. As we can see from this figure, when we increase the signal length from 20 to 100, the detection rate decreases slightly, and the false positive rate remains very low (less than 5%). When the signal length is 20 and the delay interval between signals is 100 ms, a 100% detection rate can be achieved. In addition, Fig. 20 illustrates the detection rate for detecting the whole signal in terms of signal length and the delay interval for scheme 1. When the signal length is 20 and the delay interval between the signals is 100 ms, a detection rate of 100% can be achieved. In Fig. 20, the false positive rate approaches 0%, and this matches our theoretical analysis in Section V-B. This validates that the investigated attack only requires tens of cells and is highly efficient in confirming very short communication sessions on Tor. Fig. 21 illustrates the detection rate in terms of signal length and the delay interval for scheme 2. Note that the rate shown in Fig. 21 is for detecting one bit. The false positive rate decreases quickly with the increasing signal length. Additionally, the detection rate can approach 100% with a delay interval of 100 ms and a signal length of 100 with a low false positive rate. Fig. 22 illustrates the detection rate for detecting the whole signal in terms of signal length and the delay interval for scheme 2. The detection rate can approach 100% with a delay interval of 100 ms and a signal length of 20, and the false positive rate approaches 0%.
To further improve the detectability of cell-counting-based
attack, we also investigated the improved encoding mecha-
nism, called the hopping-based encoding, which randomly
Fig. 22. Detection rate versus delay interval and signal length with detection
scheme 2.
Fig. 23. Correlation between detection rate and mean of the Poisson distri-
bution (Note: The rate is for detecting one bit).
embeds units of a signal into the target traffic, as introduced in Section IV-B. For this encoding scheme, we generate an array of cell counts by using a Poisson distribution with a given mean. For each entry of the array, we first send that number of cells without embedding signals, and then embed a signal bit. In this set of experiments, we also chose a signal length of 100. Since the units of the signal are embedded randomly in a hopping fashion in the time domain, it is hard for the multiflow attack [37] to detect the embedded signal in the traffic. Fig. 23 illustrates the relationship between the detection rate (true positive) and the mean number of nonwatermarked cells (which corresponds to the random time interval; no signal is embedded into those cells). From Fig. 23, we can see that this improved encoding scheme can still achieve very high detection rates along with a very low false positive rate. Since this new encoding scheme does not embed the signal into all CELL_RELAY_DATA cells, the attack will require more cells in order to be successful. Additionally, based on Algorithm 1, we use Algorithm 2 to recognize the signal embedded in the sparsely encoded cells.
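The hopping pattern described above can be sketched as a schedule that alternates a Poisson-distributed run of unmarked cells with one embedded signal bit. The sketch below is an illustration under stated assumptions: the function names, the mean of 5 cells, and the seed are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is a convenience choice here and not necessarily what the authors used.

```python
import math
import random


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw one Poisson variate with Knuth's method (suitable for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def hopping_schedule(bits, mean_gap: float, seed: int = 0):
    """Return steps: ("plain", n) sends n unmarked cells, ("bit", b) embeds bit b."""
    rng = random.Random(seed)
    steps = []
    for bit in bits:
        steps.append(("plain", poisson_sample(mean_gap, rng)))
        steps.append(("bit", bit))
    return steps


# Hypothetical 3-bit signal with a mean gap of 5 unmarked cells.
for step in hopping_schedule([1, 0, 1], mean_gap=5.0, seed=42):
    print(step)
```

A detector that knows the same array of gap lengths can discard the unmarked cells before matching bits, which is the role the array of nonwatermark cell counts plays in Algorithm 2.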
Fig. 24. Variance of IP packet size.
We also use tcpdump to capture the IP packets transmitted between the entry node and the client and demonstrate that an attacker may also use packet size to recognize the embedded signal. Fig. 24 illustrates the variance of IP packet size. As we can see, there are three types of IP packet sizes, and the corresponding packet structures are shown in Fig. 10. According to detection scheme 1 (blind detection approach), we can map bit “0” of the signal to the IP packet size of 638 B. Bit “1” of the signal has two cases: an IP packet of 638 B followed by an IP packet of 1150 B, or an IP packet of 1500 B followed by one of 214 B, as we discussed in Section IV-B. Therefore, we can decode the signal between 0 and 1 s as “0010100.” Note that since the delay interval is very small among the second (638 B), the third (1500 B), and the fourth (214 B) IP packets, they mostly overlap in the figure. Packet drops incur TCP retransmissions, which may result in a distorted signal. One way to address the packet drop issue is to use tcpdump and check the TCP sequence numbers to assist the signal recovery. A second way is to increase the delay interval between the signal bits and reduce the impact of packet drops. As illustrated in Fig. 24, the attacker is able to recognize the signal based on the size of sniffed IP packets using the signal detection mechanism discussed in Section IV-B, in addition to using the cell count. In our recent work, we proposed a packet-size-based attack [38] that compromises Tor’s communication anonymity with no need to control Tor routers. An attacker can manipulate the size of packets between a Web site and an exit onion router and embed a signal into the target traffic. An accomplice at the user side can sniff the traffic and recognize this signal. If the victim traffic is marked by our signal four times, the detection rate exceeds 90% with a delay interval of 400 ms, and the false positive rate can be suppressed to less than 4%.
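The size-to-bit mapping described above can be sketched as a small decoder. It uses only the four packet sizes named in the text and ignores timing, retransmissions, and packet drops, so it is a simplification of the detection mechanism of Section IV-B. The constant names and the input sequence are hypothetical; the sequence was constructed for illustration and is not the captured trace.

```python
# Packet sizes (bytes) named in the text.
SMALL, MEDIUM, LARGE, TAIL = 638, 1150, 1500, 214


def decode_packet_sizes(sizes):
    """Map a sequence of IP packet sizes to signal bits.

    638 alone -> "0"; 638 followed by 1150, or 1500 followed by 214 -> "1".
    Sizes that fit none of these patterns are skipped.
    """
    bits = []
    i = 0
    while i < len(sizes):
        following = sizes[i + 1] if i + 1 < len(sizes) else None
        if sizes[i] == SMALL and following == MEDIUM:
            bits.append("1")
            i += 2
        elif sizes[i] == LARGE and following == TAIL:
            bits.append("1")
            i += 2
        elif sizes[i] == SMALL:
            bits.append("0")
            i += 1
        else:
            i += 1  # unrecognized size, e.g. a retransmission fragment
    return "".join(bits)


# Hypothetical capture that uses both encodings of bit "1".
print(decode_packet_sizes([638, 638, 1500, 214, 638, 638, 1150, 638, 638]))
```

The one-packet lookahead is what distinguishes a lone 638 B packet (bit “0”) from the first half of a 638 B plus 1150 B pair (bit “1”).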
VII. RELATED WORK
A good review of mix systems can be found in [4] and [5]. There has been much research on degrading anonymous communication through mix networks. Existing traffic analysis attacks against anonymous communication can largely be categorized into two groups: passive traffic analysis and active watermarking techniques. Passive traffic analysis techniques record the traffic passively and identify the similarity between the server’s outbound traffic and the client’s inbound traffic [8], [9]. Other recent research works have shown that attackers can infer sensitive information from encrypted network traffic by examining patterns in packet sizes and timing [1], [39]–[41]. For example, Liberatore and Levine [40] showed that the packet sizes of HTTP traffic transmitted over persistent connections or tunneled via SSH port forwarding can be used to statistically identify Web pages. Wright et al. [41] investigated the statistical distribution of packet sizes in encrypted Voice over IP (VoIP) connections
and identified the language spoken based on the distribution in each conversation. Later work of Wright et al. [42] also investigated how an eavesdropper could identify spoken phrases in encrypted VoIP.
The active watermarking techniques intend to embed a specific secret signal (or marks) into the target traffic [10], [13], [17], [43]. Such techniques can reduce the false positive rate significantly if the signal is long enough, and they do not require the massive training study of traffic cross correlation required in passive traffic analysis. For example, Yu et al. [13] proposed a flow-marking scheme based on the DSSS technique. This approach could be used by attackers to secretly confirm the communication relationship via mix networks. Øverlier and Syverson [3] studied a scheme using one compromised mix router to identify the “hidden server” anonymized by Tor. Wang et al. [17] also investigated the feasibility of a timing-based watermarking scheme in identifying encrypted peer-to-peer VoIP calls. Peng et al. [44] analyzed the secrecy of the timing-based watermarking traceback proposed in [43], based on the distribution of traffic timing. Kiyavash et al. [37] proposed a multiflow approach for detecting the interval-based watermarks [12], [45] and DSSS-based watermarks [13]. This multiflow-based approach averages the rate of multiple synchronized watermarked flows and expects to observe an unusually long silence period without packets or an unusually long period of low-rate traffic.
VIII. CONCLUSION
In this paper, we introduced a novel cell-counting-based attack against Tor. This attack is difficult to detect and is able to quickly and accurately confirm the anonymous communication relationship among users on Tor. An attacker at the malicious exit onion router slightly manipulates the transmission of cells from a target TCP stream and embeds a secret signal (a series of binary bits) into the cell counter variation of the TCP stream. An accomplice of the attacker at the entry onion router recognizes the embedded signal using our developed recovery algorithms and links the communication relationship among users. Our theoretical analysis shows that the detection rate is a monotonically increasing function of the delay interval and a monotonically decreasing function of the variance of the one-way transmission delay along a circuit. Via extensive real-world experiments on Tor, the effectiveness and feasibility of the attack are validated. Our data showed that this attack could drastically and quickly degrade the anonymity service that Tor provides.
Due to Tor’s fundamental design, defending against this attack
remains a very challenging task that we will investigate in our
future research.
APPENDIX A
Algorithm 1 shows the signal recovery mechanism with con-
tinuously embedded bits at a malicious Tor entry node. Algo-
rithm 2 gives the signal recovery mechanism at a malicious Tor
entry node when the time-hopping-based approach is used for
embedding a signal into the target traffic.
Algorithm 1: Recovery Mechanism for Continuously
Embedded Bits
Require:
(a) , an array storing the number of cell counter
variation in the circuit queue at the entry router;
(b) , an array storing the original signal bit;
1: ;
2: while do
3: if then
4: Signal is matched.
5: else if then
6: Signal is split.
7: if then
8: Signal is processed as Type I with .
9: else if then
10: Signal and are processed as Type II
with .
11: else if then
12: Find the value of
13: if then
14: Signal is processed as Type I with .
15: else
16: Signal and are processed as Type II
with .
17: end if
18: ;
19: end if
20: else if then
21: Two or more signals are combined together.
22: if then
23: Signal and are processed as Type II
with .
24: else if then
25: Signal and are processed as Type IV
with .
26: else if then
27: Find the value of
28: if then
29: These combined signals are processed as
Type III with .
30: else
31: These combined signals are processed as
Type IV with .
32: end if
33:
34: end if
35: end if
36: ;
37: end while
Algorithm 2: Recovery Mechanism for Hopping-Based
Encoding
Require:
(a) , an array storing the number of cell counter
variation in the circuit queue at the entry router;
(b) , an array storing the original signal bit;
(c) , an array storing the number of nonwatermark
cells.
1: ;
2: while do
3: Remove the nonwatermark packets from .
4: while do
5:
6: end while
7: if then
8: ;is removed.
9: Detect with C[i] by using Algorithm 1
10: else if then
11: The signal is combined with .
12:
13: Detect with by using Algorithm 1
14: end if
15: ;
16: end while
APPENDIX B
We now prove that if the round-trip delay along the circuit is a random variable with a log-normal distribution, the one-way delay also has a log-normal distribution. The one-way delay through the circuit is approximately half of the round-trip delay along the circuit. Since the CDF of the one-way delay is derived by
(34)
(35)
(36)
(37)
(38)
the PDF of the one-way delay can be derived by
(39)
(40)
(41)
(42)
(43)
As we can see, the PDF of the one-way delay follows a log-normal distribution as well.
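The argument can be summarized with the standard scaling property of the log-normal distribution. The sketch below uses notation assumed for illustration: $R$ is the round-trip delay with $\ln R \sim \mathcal{N}(\mu, \sigma^2)$, $D = R/2$ is the one-way delay, and $\Phi$ is the standard normal CDF. It is a compact version of the reasoning and not a line-by-line reproduction of the numbered steps.

```latex
\begin{align*}
F_D(d) &= P\!\left(\tfrac{R}{2} \le d\right) = P(R \le 2d)
        = \Phi\!\left(\frac{\ln(2d) - \mu}{\sigma}\right)
        = \Phi\!\left(\frac{\ln d - (\mu - \ln 2)}{\sigma}\right), \\
f_D(d) &= \frac{\mathrm{d}F_D(d)}{\mathrm{d}d}
        = \frac{1}{d\,\sigma\sqrt{2\pi}}
          \exp\!\left(-\frac{\big(\ln d - (\mu - \ln 2)\big)^2}{2\sigma^2}\right),
          \qquad d > 0 .
\end{align*}
```

Under these assumptions, $D$ is log-normal with parameters $\mu - \ln 2$ and $\sigma^2$: halving the delay shifts the location parameter and leaves the shape parameter unchanged.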
REFERENCES
[1] Q. X. Sun, D. R. Simon, Y. Wang, W. Russell, V. N. Padmanabhan,
and L. L. Qiu, “Statistical identification of encrypted Web browsing
traffic,” in Proc. IEEE S&P, May 2002, pp. 19–30.
[2] X. Fu, Y. Zhu, B. Graham, R. Bettati, and W. Zhao, “On flow marking
attacks in wireless anonymous communication networks,” in Proc.
IEEE ICDCS, Apr. 2005, pp. 493–503.
[3] L. Øverlier and P. Syverson, “Locating hidden servers,” in Proc. IEEE
S&P, May 2006, pp. 100–114.
[4] G. Danezis, R. Dingledine, and N. Mathewson, “Mixminion: Design
of a type III anonymous remailer protocol,” in Proc. IEEE S&P, May
2003, pp. 2–15.
[5] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-
generation onion router,” in Proc. 13th USENIX Security Symp., Aug.
2004, p. 21.
[6] “Anonymizer, Inc.,” 2009 [Online]. Available: http://www.
anonymizer.com/
[7] A. Serjantov and P. Sewell, “Passive attack analysis for connection-
based anonymity systems,” in Proc. ESORICS, Oct. 2003, pp. 116–131.
[8] B. N. Levine, M. K. Reiter, C. Wang, and M. Wright, “Timing attacks
in low-latency MIX systems,” in Proc. FC, Feb. 2004, pp. 251–565.
[9] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao, “On flow correlation
attacks and countermeasures in Mix networks,” in Proc. PET,
May 2004, pp. 735–742.
[10] S. J. Murdoch and G. Danezis, “Low-cost traffic analysis of Tor,” in
Proc. IEEE S&P, May 2006, pp. 183–195.
[11] K. Bauer, D. McCoy, D. Grunwald, T. Kohno, and D. Sicker, “Low-
resource routing attacks against anonymous systems,” in Proc. ACM
WPES, Oct. 2007, pp. 11–20.
[12] X. Wang, S. Chen, and S. Jajodia, “Network flow watermarking attack
on low-latency anonymous communication systems,” in Proc. IEEE
S&P, May 2007, pp. 116–130.
[13] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao, “DSSS-based flow
marking technique for invisible traceback,” in Proc. IEEE S&P, May
2007, pp. 18–32.
[14] N. B. Amir Houmansadr and N. Kiyavash, “RAINBOW: A robust and
invisible non-blind watermark for network flows,” in Proc. 16th NDSS,
Feb. 2009, pp. 1–13.
[15] V. Shmatikov and M.-H. Wang, “Timing analysis in low-latency MIX
networks: Attacks and defenses,” in Proc. ESORICS, 2006, pp. 18–31.
[16] V. Fusenig, E. Staab, U. Sorger, and T. Engel, “Slotted packet counting
attacks on anonymity protocols,” in Proc. AISC, 2009, pp. 53–60.
[17] X. Wang, S. Chen, and S. Jajodia, “Tracking anonymous peer-to-peer
VoIP calls on the internet,” in Proc. 12th ACM CCS, Nov. 2005, pp.
81–91.
[18] K. Bauer, D. McCoy, D. Grunwald, T. Kohno, and D. Sicker, “Low-
resource routing attacks against anonymous systems,” Univ. Colorado
Boulder, Boulder, CO, Tech. Rep., Aug. 2007.
[19] X. Fu, Z. Ling, J. Luo, W. Yu, W. Jia, and W. Zhao, “One cell is enough
to break Tor’s anonymity,” in Proc. Black Hat DC, Feb. 2009 [Online]. Available: http://www.blackhat.com/presentations/bh-dc-09/Fu/BlackHat-DC-09-Fu-Break-Tors-Anonymity.pdf
[20] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: Anonymity online,” 2008 [Online]. Available: http://tor.eff.org/index.html.en
[21] R. Dingledine and N. Mathewson, “Tor protocol specification,” 2008 [Online]. Available: https://gitweb.torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=tor-spec.txt
[22] J. Reardon, “Improving Tor using a TCP-over-DTLS tunnel,” Master’s
thesis, University of Waterloo, Waterloo, ON, Canada, Sep. 2008.
[23] R. Dingledine and N. Mathewson, “Tor path specification,” 2008 [Online]. Available: https://gitweb.torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=path-spec.txt
[24] X. Fu, Z. Ling, W. Yu, and J. Luo, “Network forensics through cloud
computing,” in Proc. 1st ICDCS-SPCC, Jun. 2010, pp. 26–31.
[25] M. Perry, “TorFlow: Tor network analysis,” in Proc. 2nd HotPETs,
2009, pp. 1–14.
[26] R. Pries, W. Yu, S. Graham, and X. Fu, “On performance bottleneck
of anonymous communication networks,” in Proc. 22nd IEEE IPDPS,
Apr. 14–28, 2008, pp. 1–11.
[27] G. Smillie, Analogue, Digital Communication Techniques. London,
U.K.: Butterworth-Heinemann, 1999.
[28] N. S. Evans, R. Dingledine, and C. Grothoff, “A practical congestion
attack on Tor using long paths,” in Proc. 18th USENIX Security Symp.,
Aug. 10–14, 2009, pp. 33–50.
[29] S. J. Murdoch, “Hot or not: Revealing hidden services by their clock
skew,” in Proc. 13th ACM CCS, Nov. 2006, pp. 27–36.
[30] R. Pries, W. Yu, X. Fu, and W. Zhao, “A new replay attack against
anonymous communication networks,” in Proc. IEEE ICC, May
19–23, 2008, pp. 1578–1582.
[31] D. Mccoy, K. Bauer, D. Grunwald, T. Kohno, and D. Sicker, “Shining
light in dark places: Understanding the Tor network,” in Proc. 8th
PETS, 2008, pp. 63–76.
[32] S. U. Khaunte and J. O. Limb, “Packet-level traffic measurements from
a Tier-1 IP backbone,” Georgia Institute of Technology, Atlanta, GA,
Tech. Rep., 1997.
[33] T. M. Cover and J. A. Thomas, Elements of Information Theory. New
York: Wiley-Interscience, 1991.
[34] S. Verdu, “On channel capacity per unit cost,” IEEE Trans. Inf. Theory,
vol. 36, no. 5, pp. 1019–1030, Nov. 1990.
[35] “Tor: Anonymity online,” The Tor Project, Inc., 2008 [Online]. Avail-
able: http://tor.eff.org/
[36] “PlanetLab An open platform for developing, deploying, and
accessing planetary-scale services,” PlanetLab, 2011 [Online]. Avail-
able: http://www.planet-lab.org/
[37] N. Kiyavash, A. Houmansadr, and N. Borisov, “Multi-flow attacks
against network flow watermarking schemes,” in Proc. USENIX Security
Symp., 2008, pp. 307–320.
[38] Z. Ling, J. Luo, W. Yu, and X. Fu, “Equal-sized cells mean equal-sized
packets in Tor?,” in Proc. IEEE ICC, Jun. 2011, pp. 1–6.
[39] D. X. Song, D. Wagner, and X. Tian, “Timing analysis of keystrokes
and timing attacks on SSH,” in Proc. 10th USENIX Security Symp.,
Aug. 2001, p. 25.
[40] M. Liberatore and B. N. Levine, “Inferring the source of encrypted
HTTP connections,” in Proc. ACM CCS, Oct. 2006, pp. 255–263.
[41] C. V. Wright, L. Ballard, F. Monrose, and G. M. Masson, “Language
identification of encrypted VoIP traffic: Alejandra y Roberto or Alice
and Bob?,” in Proc. 16th Annu. USENIX Security Symp., Aug. 2007,
pp. 43–54.
[42] C. V. Wright, L. Ballard, S. E. Coull, F. Monrose, and G. M. Masson,
“Spot me if you can: Uncovering spoken phrases in encrypted VoIP
conversation,” in Proc. IEEE S&P, May 2008, pp. 35–49.
[43] X. Wang and D. S. Reeves, “Robust correlation of encrypted attack
traffic through stepping stones by manipulation of inter-packet delays,”
in Proc. ACM CCS, Nov. 2003, pp. 20–29.
[44] P. Peng, P. Ning, and D. S. Reeves, “On the secrecy of timing-based
active watermarking trace-back techniques,” in Proc. IEEE S&P, May
2006, pp. 335–349.
[45] Y. J. Pyun, Y. H. Park, X. Wang, D. S. Reeves, and P. Ning, “Tracing
traffic through intermediate hosts that repacketize flows,” in Proc.
IEEE INFOCOM, May 2007, pp. 634–642.
Zhen Ling received the B.S. degree in computer sci-
ence from Nanjing Institute of Technology, Nanjing,
China, in 2005, and is currently pursuing the Ph.D.
degree in computer science and engineering at South-
east University, Nanjing, China.
He joined Department of Computer Science, City
University of Hong Kong, Hong Kong, from 2008
to 2009 as a Research Associate, and then joined
the Department of Computer Science, University of
Victoria, Victoria, BC, Canada, in 2011 as a visiting
scholar. His research interests include network
security, privacy, and forensics.
Junzhou Luo (M’10) received the B.S. degree in
applied mathematics and M.S. and Ph.D. degrees in
computer network from Southeast University, Nan-
jing, China, in 1982, 1992, and in 2000, respectively.
He is a Full Professor with the School of Computer Science and Engineering, Southeast University.
His research interests are next-generation network,
protocol engineering, network security and manage-
ment, grid and cloud computing, and wireless LAN.
Prof. Luo is Co-Chair of the IEEE SMC Technical
Committee on Computer Supported Cooperative
Work in Design.
Wei Yu received the B.S. degree in electrical en-
gineering from Nanjing University of Technology,
Nanjing, China, in 1992, the M.S. degree in elec-
trical engineering from Tongji University, Shanghai,
China, in 1995, and the Ph.D. degree in computer
engineering from Texas A&M University, College
Station, in 2008.
He is an Assistant Professor with the Department
of Computer and Information Sciences, Towson
University, Towson, MD. Before that, he worked for
Cisco Systems, Inc., San Jose, CA, for almost nine
years. His research interests include cyberspace security, computer network,
and distributed systems.
Xinwen Fu received the B.S. degree in electrical
engineering from Xi’an Jiaotong University, Xi’an,
China, in 1995, the M.S. degree in electrical
engineering from the University of Science and
Technology of China, Hefei, China, in 1998, and the
Ph.D. degree in computer engineering from Texas
A&M University, College Station, in 2005.
He is an Assistant Professor with the Department
of Computer Science, University of Massachusetts
Lowell, Lowell, which he joined in the summer of
2008 as a faculty member. From 2005 to 2008, he
was an Assistant Professor with the College of Business and Information Sys-
tems, Dakota State University, Madison, SD. His current research interests are
in network security and privacy.
Dong Xuan received the B.S. and M.S. degrees in
electronic engineering from Shanghai Jiao Tong Uni-
versity (SJTU), Shanghai, China, in 1990 and 1993,
respectively, and the Ph.D. degree in computer en-
gineering from Texas A&M University, College Sta-
tion, in 2001.
Currently, he is an Associate Professor with the
Department of Computer Science and Engineering,
The Ohio State University (OSU), Columbus. He
was on the faculty of Electronic Engineering at SJTU
from 1993 to 1998. His research interests include
distributed computing, computer networks, and cyberspace security.
Dr. Xuan received the NSF CAREER Award in 2005 and the Lumley Re-
search Award from the College of Engineering, OSU, in 2009.
Weijia Jia received the B.Sc. and M.Sc. degrees from
Center South University, Changsha, China, in 1982
and 1984, respectively, and the Master of Applied
Science and Ph.D. degrees from the Polytechnic Fac-
ulty of Mons, Mons, Belgium, in 1992 and 1993, re-
spectively, all in computer science.
He is currently a Full Professor with the Depart-
ment of Computer Science and the Director of Future
Networking Center, ShenZhen Research Institute,
City University of Hong Kong (CityU), Hong Kong.
He joined the German National Research Center for
Information Science (GMD), Bonn (St. Augustine), Germany, from 1993 to
1995 as a Research Fellow. In 1995, he joined the Department of Computer
Science, CityU, as an Assistant Professor. His research interests include
next-generation wireless communication, protocols and heterogeneous net-
works; distributed systems, and multicast and anycast QoS routing protocols.
... In this scenario, timely delivery of information needed by diverse apps is a challenge in IoT. To maximise the usage of network resources [86][87][88], existing resource sharing solutions generally rely on spectrum sharing. Spectrum sharing has three dimensions: space, frequency, and time. ...
Article
Full-text available
With the improvements in machine-to-machine (M2M) communication, ubiquitous computing, and wireless sensor networks, the Internet of Things (IoT) has become a notion that is constantly rising in importance. Using uniquely addressable IDs, the Internet of Things links diverse physical items and allows them to communicate with one another through the Internet. A general overview of the IoT in the context of the architecture and associated technologies is provided in this article. On the other hand, the Internet of Things does not follow a standardised architecture model. This is accomplished by describing widely recognised architectural concepts that are subsequently refined with the associated technology in various tiers. Also included are some solutions that have been developed and future directions for addressing the obstacles faced by the IoT paradigm. Finally, the article discusses several Internet of Things applications to demonstrate the viability of the IoT idea in real-world settings.
... Furthermore, multiple watermarks are can be imbedded into one image. In addition, watermarking schemes have been used to trace anonymous Internet malicious traffic flows for identifying the malicious sources for the purpose of forensics [139], [164], [165]. ...
... In 2012, Ling et al. [64] proposed a type of attack requiring the attacker to control a few of the Tor network's entry guards and exit nodes. This type of attack is motivated by the observation that even though Tor uses equal-sized cells at the application layer, the network's IP packets' size generally varies. ...
Article
Full-text available
Anonymity networks are becoming increasingly popular in today’s online world as more users attempt to safeguard their online privacy. Tor is currently the most popular anonymity network in use and provides anonymity to both users and services (hidden services). However, the anonymity provided by Tor is also being misused in various ways. Hosting illegal sites for selling drugs, hosting command and control servers for botnets, and distributing censored content are but a few such examples. As a result, various parties, including governments and law enforcement agencies, are interested in attacks that assist in de-anonymising the Tor network, disrupting its operations, and bypassing its censorship circumvention mechanisms. In this survey paper, we review known Tor attacks and identify current techniques for the de-anonymisation of Tor users and hidden services. We discuss these techniques and analyse the practicality of their execution method. We conclude by discussing improvements to the Tor framework that help prevent the surveyed de-anonymisation attacks.
... Another network analysis method is locating entry and exit nodes to identify attacks [145]. Although TOR has a powerful infrastructure which is hard to break, the communication of the users and the routing behaviour can be detected by implementing some attack techniques in the TOR network [6], [19], [58], [100]. ...
Article
Full-text available
The Dark Web is one of the most challenging and untraceable mediums adopted by cyber criminals, terrorists, and state-sponsored spies to fulfil their illicit motives. Cyber-crimes happening inside the Dark Web resemble real-world crimes. However, the sheer size, unpredictable ecosystem and anonymity provided by Dark Web services are the main obstacles to tracing the criminals. Evaluating the Dark Web crime threats is a crucial step towards discovering potential solutions to cyber-crimes. In this paper, we appraise the Dark Web by analysing the crimes with their consequences and enforced methods as well as future manoeuvres to lessen the crime threats. We used the Systematic Literature Review (SLR) method with the aspiration to provide the direction and aspect of emerging crime threats in the Dark Web for researchers and specialists in the cyber security field. For this SLR, the 65 most relevant articles from leading electronic databases were selected for data extraction and synthesis to answer our predefined research questions. The result of this systematic literature review provides (i) comprehensive knowledge of the growing crimes proceeding with the Dark Web, (ii) an assessment of the social, economic and ethical impacts of the cyber-crimes happening inside the Dark Web, and (iii) an analysis of the challenges, established techniques and methods to locate the criminals, and their drawbacks. Our study reveals that more in-depth research is required to identify criminals in the Dark Web in new ways; that analysis of crypto markets and Dark Web discussion forums is crucial for forensic investigations; that the anonymity provided by Dark Web services can be used as a weapon to catch the criminals; and that digital evidence should be analysed and processed in a way that complies with the law so as to enable the seizure of criminals and the shutdown of illicit sites in the Dark Web.
... In 2012, Ling et al. [60] proposed a type of attack requiring the attacker to control a few of the Tor network's entry guards and exit nodes. This type of attack is motivated by the observation that even though Tor uses equal-sized cells at the application layer, the size of the network's IP packets generally varies. ...
Preprint
Anonymity networks are becoming increasingly popular in today's online world as more users attempt to safeguard their online privacy. Tor is currently the most popular anonymity network in use and provides anonymity to both users and services (hidden services). However, the anonymity provided by Tor is also being misused in various ways. Hosting illegal sites for selling drugs, hosting command and control servers for botnets, and distributing censored content are but a few such examples. As a result, various parties, including governments and law enforcement agencies, are interested in attacks that assist in de-anonymising the Tor network, disrupting its operations, and bypassing its censorship circumvention mechanisms. In this paper, we survey known Tor attacks and identify currently available techniques that lead to improved de-anonymisation of users and hidden services.
Article
Privacy is currently one of the most pressing concerns in cyberspace. Tor is the most widely used system in the world for anonymously accessing the Internet. However, Tor is known to be vulnerable to end-to-end traffic correlation attacks when an adversary is able to monitor traffic at both communication endpoints. In this paper, we present a set of novel Trapper Attacks that can be used to deanonymize user activities by both AS-level adversaries and node-level adversaries in a Tor network. First, AS-level adversaries can exploit the occasional failures of a censored network to selectively control the entry guards of Tor users. Second, the adversaries can exploit the poor reliability of Tor communication (e.g., natural churn) to compromise the exit nodes and the anonymous path. Once the adversaries gain control of the routes, they can identify and inspect any traffic entering and leaving the Tor network and, consequently, deanonymize a Tor user's activity in the network. To demonstrate the effectiveness and feasibility of these attacks, we implemented a tool that can launch the proposed Trapper Attacks to automatically reveal communication relationships between a Tor user and its destinations on a live Tor network. We also present a formal analysis framework to evaluate the integrity of the Tor network. With this framework, we successfully obtained quantitative estimates of Tor's security vulnerability. The proposed Trapper Attacks are also designed to scale up in real-world Tor networks. Namely, they allow an adversary to perform deanonymization in honey relays effectively and to compromise the anonymity of Tor clients in real time. Our experimental results show that the proposed attacks succeed in less than 40 seconds, achieving a 100% accuracy rate and a false positive rate close to 0.
Chapter
Anonymous communication networks (ACNs) aim to thwart an adversary, who controls or observes chunks of the communication network, from determining the respective identities of two communicating parties. We focus on low-latency ACNs such as Tor, which target a practical level of anonymity without incurring an unacceptable transmission delay. While several definitions have been proposed to quantify the level of anonymity provided by high-latency, message-centric ACNs (such as mix-nets and DC-nets), this approach is less relevant to Tor, where user–destination pairs communicate over secure overlay circuits. Moreover, existing evaluation methods of traffic analysis attacks on Tor appear somewhat ad hoc and fragmented. We propose a fair evaluation framework for such attacks against onion routing systems by identifying and discussing the crucial components for evaluation, including how to consider various adversarial goals, how to factor in the adversarial ability to collect information relevant to the attack, and how these components combine to suitable metrics to quantify the adversary’s success. Keywords: Anonymity, Onion routing, Tor, Traffic analysis
Article
Full-text available
Overlay mix-networks are widely used to provide low-latency anonymous communication services. It is generally accepted that, if an adversary can compromise the endpoints of a path through an anonymous mix-network, then it is possible to ascertain the identities of a requesting client and the responding server. However, theoretical analyses of anonymous mix-networks show that the likelihood of such an end-to-end attack becomes negligible as the network size increases. We show that if the mix-network attempts to optimize performance by utilizing a preferential routing scheme, then the system is highly vulnerable to attacks from non-global adversaries with only a few malicious servers. We extend this attack by exploring methods for low-resource nodes to be perceived as high-resource nodes by reporting false resource claims to centralized routing authorities. To evaluate this attack on a mature and representative system, we deployed an isolated Tor network on the PlanetLab testbed. We introduced low-resource malicious nodes that falsely gave the illusion of high-performance nodes, which allowed them to be included on a disproportionately high number of paths. Our results show that our malicious low-resource nodes are highly effective at compromising the end-to-end anonymity of the system. We present several extensions to this general attack that further improve the performance and minimize the resources required. In order to prevent low-resource nodes from exploiting preferential routing, we present several methods to verify resource claims, including a distributed reputation system. Our attacks suggest what seems to be a fundamental problem in multi-hop systems that attempt to simultaneously provide anonymity and high performance.
Chapter
The description of an arbitrary real number requires an infinite number of bits, so a finite representation of a continuous random variable can never be perfect. How well can we do? To frame the question appropriately, it is necessary to define the “goodness” of a representation of a source. This is accomplished by defining a distortion measure which is a measure of distance between the random variable and its representation. The basic problem in rate distortion theory can then be stated as follows: given a source distribution and a distortion measure, what is the minimum expected distortion achievable at a particular rate? Or, equivalently, what is the minimum rate description required to achieve a particular distortion? One of the most intriguing aspects of this theory is that joint descriptions are more efficient than individual descriptions. It is simpler to describe an elephant and a chicken with one description than to describe each alone. This is true even for independent random variables. It is simpler to describe X1 and X2 together (at a given distortion for each) than to describe each by itself. Why don't independent problems have independent solutions? The answer is found in the geometry. Apparently rectangular grid points (arising from independent descriptions) do not fill up the space efficiently. Rate distortion theory can be applied to both discrete and continuous random variables. The zero-error data compression theory of Chapter 5 is an important special case of rate distortion theory applied to a discrete source with zero distortion. We consider the simple problem of representing a single continuous random variable by a finite number of bits.
Article
The wide use of digital communication has presented designers with the task of providing cheap, but reliable, electronic test instrumentation. To meet this need, designers need to use the latest techniques and components, selecting genuine advances and ignoring the gimmicks.
Article
In this paper we present a slotted packet counting attack against anonymity protocols. Common packet counting attacks make strong assumptions on the setup and can easily lead to wrong conclusions, as we will show in our work. To overcome these limitations, we account for the variation of traffic load over time. We use correlation to express the relation between sender and receiver nodes. Our attack is applicable to many anonymity protocols. It assumes a passive attacker and works with partial knowledge of the network traffic.
Article
Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Explicitly, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., "Is this a Spanish or English conversation?") rate of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.
Article
The Tor Network is a low-latency anonymity, privacy, and censorship resistance network whose servers are run by volunteers around the Internet. This distribution of trust creates resilience in the face of compromise and censorship; but it also creates performance, security, and usability issues. The TorFlow suite attempts to address this by providing a library and associated tools for measuring Tor nodes for reliability, capacity and integrity, with the ultimate goal of feeding these measurements back into the Tor directory authorities.