PIE: A Lightweight Control Scheme to Address the
Bufferbloat Problem
Rong Pan, Preethi Natarajan, Chiara Piglione, Mythili Suryanarayana Prabhu,
Vijay Subramanian, Fred Baker and Bill VerSteeg
Advanced Architecture & Research Group, Cisco Systems Inc., San Jose, CA 95134, U.S.A.
{ropan, prenatar, cpiglion, mysuryan, vijaynsu, fred, versteb}
Abstract—Bufferbloat is a phenomenon where excess buffers in the
network cause high latency and jitter. As more and more interactive
applications (e.g. voice over IP, real time video conferencing and financial
transactions) run in the Internet, high latency and jitter degrade
application performance. There is a pressing need to design intelligent
queue management schemes that can control latency and jitter; and hence
provide desirable quality of service to users.
We present here a lightweight design, PIE (Proportional Integral
controller Enhanced), that can effectively control the average queueing
latency to a reference value. The design does not require per-packet extra
processing, so it incurs very small overhead and is simple to implement
in both hardware and software. In addition, the design parameters are
self-tuning, and hence PIE is robust and optimized for various network
scenarios. Simulation results, theoretical analysis and Linux testbed
results show that PIE can ensure low latency and achieve high link
utilization under various congestion situations.
Index Terms—bufferbloat, Active Queue Management (AQM), Quality
of Service (QoS), Explicit Congestion Notification (ECN)
I. INTRODUCTION
The explosion of smart phones, tablets and video traffic in the
Internet brings about a unique set of challenges for congestion
control. To avoid packet drops, many service providers or data center
operators require vendors to put in as much buffer as possible.
With the rapid decrease in memory chip prices, these requests are easily
accommodated to keep customers happy. However, this solution
of large buffers fails to take into account the nature of TCP, the
dominant transport protocol running in the Internet. The TCP protocol
continuously increases its sending rate and causes network buffers to
fill up. TCP cuts its rate only when it receives a packet drop or mark
that is interpreted as a congestion signal. However, drops and marks
usually occur when network buffers are full or almost full. As a result,
excess buffers, initially designed to avoid packet drops, would lead
to highly elevated queueing latency and jitter. The phenomenon was
detailed in 2009 [1] and the term, “bufferbloat” was introduced by
Jim Gettys in late 2010 [2].
Figure 1 shows an example of extremely long latencies that are
caused by the bufferbloat problem. Ping messages were sent overnight
from a hotel in Ireland to San Jose, CA on January 27, 2012. Figure
1 depicts frequent delays in the neighborhood of 8 to 9 seconds.
A review of the round trip time distribution, shown in Figure 2,
is instructive. It suggests that as many as eight copies of the same
TCP segment, spaced one Retransmission Time Out (RTO) apart, were
present at some time in the hotel's DSL service. This obviously
reduces effective bandwidth; any bandwidth used for an unnecessary
retransmission is unavailable for valuable data. It also reduces the
value of the service and the willingness of a consumer to pay for it.
If we want to see additional value-add services, such as high volume
video content delivery in these networks, we need to get control of
the data buffered in networks.
Active queue management (AQM) schemes, such as RED [3],
BLUE [4], PI [5], AVQ [6], etc, have been around for well over
a decade. AQM schemes could potentially solve the aforementioned
Fig. 1. An Example of Extremely Long Latency: RTTs measured using
ping messages sent overnight from a hotel in Ireland to San Jose, CA.
Fig. 2. Experiment RTT Distributions: number of occurrences as a
function of RTTs. There are spikes that are RTOs apart.
problem. RFC 2309 [7] strongly recommends the adoption of AQM
schemes in the network to improve the performance of the Internet.
RED is implemented in a wide variety of network devices, both in
hardware and software. Unfortunately, due to the fact that RED needs
careful tuning of its parameters for various network conditions, most
network operators do not turn RED on. In addition, RED is designed
to control the queue length, which affects delay only implicitly; it
does not control latency directly.
Note that the delay bloat caused by poorly managed big buffers
is really the issue here. If latency can be controlled, bufferbloat,
i.e., adding more buffers for bursts, is not a problem. More buffer
space would allow larger bursts of packets to pass through as long
as we control the average queueing delay to be small. Unfortunately,
the Internet today still lacks an effective design that can control buffer
latency to improve the quality of experience for latency-sensitive
applications. In addition, it is a delicate balancing act to design a
queue management scheme that not only allows short-term bursts to
pass smoothly, but also controls the average latency when long-term
congestion persists.
Recently, a new AQM scheme, CoDel [8], was proposed to
control the latency directly to address the bufferbloat problem. CoDel
requires per packet timestamps. Also, packets are dropped at the
dequeue function after they have been enqueued for a while. Both of
these requirements consume excessive processing and infrastructure
resources, making CoDel expensive to implement and operate,
especially in hardware.
In this paper, we present a lightweight algorithm, PIE (Proportional
Integral controller Enhanced), which combines the benefits of
RED and CoDel: it is easy to implement like RED while directly
controlling latency like CoDel. Similar to RED, PIE randomly drops
a packet at the onset of congestion. The congestion detection, however,
is based on the queueing latency, as in CoDel, instead of the queue
length, as in conventional AQM schemes such as RED. Furthermore,
PIE also uses the latency moving trend, i.e., whether latency is
increasing or decreasing, to help determine congestion levels.
Our simulation and lab test results show that PIE can control
latency around the reference under various congestion conditions. It
can quickly and automatically respond to network congestion changes
in an agile manner. Our theoretical analysis guarantees that the PIE
design is stable for an arbitrary number of flows with heterogeneous
round trip times under a predetermined limit.
In what follows, Section II specifies our goals of designing the
latency-based AQM scheme. Section III explains the scheme in
detail. Section IV presents simulation and lab studies of the proposed
scheme. In Section V, we present a control theory analysis of PIE.
Section VI concludes the paper and discusses future work.
II. DESIGN GOALS
We explore a queue management framework where we aim
to improve the performance of interactive and delay-sensitive
applications. The design of our scheme follows a few basic criteria.
Low Latency Control. We directly control queueing latency
instead of controlling queue length. Queue sizes change with
queue draining rates and various flows’ round trip times. Delay
bloat is the real issue that we need to address as it impairs
real time applications. If latency can be controlled to be small,
bufferbloat is not an issue. As a matter of fact, we would allow
more buffers for sporadic bursts as long as the latency is under control.
High Link Utilization. We aim to achieve high link utilization.
The goal of low latency shall be achieved without suffering
link under-utilization or losing network efficiency. An early
congestion signal could cause TCP to back off and avoid queue
buildup. On the other hand, however, TCP’s rate reduction
could result in link under-utilization. There is a delicate balance
between achieving high link utilization and low latency.
Simple Implementation. The scheme should be simple to
implement and easily scalable in both hardware and software.
The wide adoption of RED over a variety of network
devices is a testament to the power of simple random early
dropping/marking. We strive to maintain similar design simplicity.
Fig. 3. Overview of the PIE Design. The scheme comprises three simple
components: a) random dropping at enqueuing; b) latency based drop
probability update; c) dequeuing rate estimation.
Guaranteed Stability and Fast Responsiveness. The scheme
should ensure system stability for various network topologies
and scale well with an arbitrary number of streams. The system
should also be agile to sudden changes in network conditions.
Design parameters shall be set automatically. One only needs to
set performance-related parameters such as target queue delay,
not design parameters.
We aim to find an algorithm that achieves the above goals. It is
noted that, although important, fairness is orthogonal to the AQM
design whose primary goal is to control latency for a given queue.
Techniques such as Fair Queueing [9] or approximations of it, such as
AFD (Approximate Fair Dropping) [10], can be combined with any
AQM scheme to achieve fairness. Therefore, in this paper, we focus
on controlling a queue's latency while ensuring that flow fairness is
no worse than under the standard DropTail or RED designs.
III. THE PIE SCHEME
In this section, we describe in detail the design of PIE and its
operations. As illustrated in Figure 3, our scheme comprises three
simple components: a) random dropping at enqueuing; b) periodic
drop probability update; c) dequeuing rate estimation.
The following subsections describe these components in further
detail, and explain how they interact with each other. At the end of
this section, we will discuss how the scheme can be easily augmented
to precisely control bursts.
A. Random Dropping
Like most state-of-the-art AQM schemes, PIE would drop packets
randomly according to a drop probability, p, that is obtained from
the “drop probability calculation” component. No extra step, like
timestamp insertion, is needed. The procedure is as follows:
Random Dropping:
Upon packet arrival
randomly drop a packet with a probability p.
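As a concrete illustration, the enqueue-side logic might look like the following Python sketch (the function and variable names, and the tail-drop fallback for a genuinely full buffer, are ours, not from the paper):

```python
import random

def pie_enqueue(queue, qlen_bytes, pkt_size, p, limit_bytes):
    """Illustrative PIE-style enqueue: drop randomly with probability p,
    otherwise tail-drop only when the buffer is genuinely full.
    Returns the new queue length in bytes and whether the packet was kept."""
    if random.random() < p:
        return qlen_bytes, False   # early drop (or ECN mark) as a congestion signal
    if qlen_bytes + pkt_size > limit_bytes:
        return qlen_bytes, False   # buffer full: ordinary tail drop
    queue.append(pkt_size)
    return qlen_bytes + pkt_size, True
```

Note that no per-packet timestamp or other per-packet state is touched here; p is maintained separately by the periodic update described next.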
B. Drop Probability Calculation
The PIE algorithm updates the drop probability periodically as follows:
estimate current queueing delay using Little's law:
    cur_del = qlen / avg_drate;
calculate drop probability p as:
    p = p + α·(cur_del − ref_del) + β·(cur_del − old_del);
update previous delay sample as:
    old_del = cur_del.
The average draining rate of the queue, avg_drate, is obtained
from the "departure rate estimation" block. The variables cur_del and
old_del represent the current and previous estimates of the queueing
delay. The reference latency value is expressed as ref_del. The
update interval is denoted as Tupdate. The parameters α and β are scaling factors.
Note that the calculation of the drop probability is based not only on
the current estimate of the queueing delay, but also on the direction
in which the delay is moving, i.e., whether the delay is getting longer
or shorter. This direction can simply be measured as the difference
between cur_del and old_del. The parameter α determines how the
deviation of the current latency from the target value affects the drop
probability; β exerts an additional adjustment depending on whether
the latency is trending up or down. The drop probability is stabilized
when the latency is stable, i.e., cur_del equals old_del, and the
latency equals ref_del. The relative weight between α and β determines
the final balance between latency offset and latency jitter. This is
the classic Proportional Integral controller design [11], which has
previously been adopted for controlling queue length in [5] and [12].
We adopt it here for controlling queueing latency. In addition, to
further enhance the performance, we improve the design by making
it auto-tuning as follows:
if p < 1%: α = α̃/8; β = β̃/8;
else if p < 10%: α = α̃/2; β = β̃/2;
else: α = α̃; β = β̃;
where α̃ and β̃ are statically configured parameters. Auto-tuning would
help us not only maintain stability but also respond quickly to
sudden changes. The intuition is as follows: to avoid big swings in
adjustments, which often lead to instability, we would like to tune p
in small increments. Suppose that p is in the range of 1%; then we
would want the values of α and β to be small enough for, say, a
0.1% adjustment in each step. If p is in a higher range, say above
10%, then the situation warrants a larger single-step adjustment, for
example 1%. The procedure of drop probability calculation can be
summarized as follows.
Drop Probability Calculation:
Every Tupdate interval
1. Estimate current queueing delay:
    cur_del = qlen / avg_drate.
2. Based on current drop probability, p, determine the suitable step scale:
    if p < 1%, α = α̃/8; β = β̃/8;
    else if p < 10%, α = α̃/2; β = β̃/2;
    else, α = α̃; β = β̃.
3. Calculate drop probability as:
    p = p + α·(cur_del − ref_del) + β·(cur_del − old_del);
4. Update previous delay sample as:
    old_del = cur_del.
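In Python, one Tupdate step of the procedure above might be sketched as follows (a sketch, not the authors' code: variable names follow the pseudocode, the clamp of p to [0, 1] is our addition, and the defaults α̃ = 0.125 and β̃ = 1.25 mirror the values the paper uses in its evaluations):

```python
def update_drop_prob(p, qlen_bytes, avg_drate, ref_del, old_del,
                     alpha0=0.125, beta0=1.25):
    """One Tupdate step of the PIE drop-probability calculation.
    alpha0/beta0 play the role of the statically configured
    alpha-tilde and beta-tilde (units ignored in this sketch)."""
    # 1. Estimate current queueing delay via Little's law.
    cur_del = qlen_bytes / avg_drate if avg_drate > 0 else 0.0

    # 2. Auto-tune the step size based on the current probability.
    if p < 0.01:
        alpha, beta = alpha0 / 8, beta0 / 8
    elif p < 0.1:
        alpha, beta = alpha0 / 2, beta0 / 2
    else:
        alpha, beta = alpha0, beta0

    # 3. Proportional-Integral update on the latency error and trend.
    p = p + alpha * (cur_del - ref_del) + beta * (cur_del - old_del)
    p = min(max(p, 0.0), 1.0)   # clamp: p is a probability (our addition)

    # 4. Return updated probability and this delay sample (the new old_del).
    return p, cur_del
```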
We have discussed packet drops so far. The algorithm can easily be
applied to networks where Explicit Congestion Notification (ECN) is
enabled: the drop probability p would simply become a marking probability.
C. Departure Rate Estimation
The draining rate of a queue in the network often varies either
because other queues are sharing the same link, or the link capacity
fluctuates. Rate fluctuation is particularly common in wireless
networks. Hence, we decide to measure the departure rate directly
as follows:
Departure Rate Calculation:
Upon packet departure
1. Decide to be in a measurement cycle if:
    qlen > dq_threshold;
2. If the above is true, update departure count dq_count:
    dq_count = dq_count + dq_pktsize;
3. Update departure rate once dq_count > dq_threshold, and reset counters:
    dq_int = now − start;
    dq_rate = dq_count / dq_int;
    avg_drate = (1 − ε)·avg_drate + ε·dq_rate;
    start = now;
    dq_count = 0;
From time to time, short, non-persistent bursts of packets result
in empty queues; this would make the measurement less accurate.
Hence we only measure the departure rate, dq_rate, when there
is sufficient data in the buffer, i.e., when the queue length is over
a certain threshold, dq_threshold. Once this threshold is crossed,
we obtain a measurement sample. The samples are exponentially
averaged, with averaging parameter ε, to obtain the average dequeue
rate, avg_drate. The variable dq_count represents the number
of bytes departed since the last measurement. The threshold is
recommended to be set to 10KB, assuming a typical packet size of
around 1KB or 1.5KB. This threshold allows a period, dq_int, long
enough to obtain an average draining rate, yet short enough
to reflect sudden changes in the draining rate. Note that this threshold
is not crucial for the system's stability.
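A stateful Python sketch of this measurement logic follows (the class structure and the averaging parameter ε = 0.25 are our choices for illustration; the 10KB threshold follows the text):

```python
class DepartureRateEstimator:
    """Sketch of PIE's dequeue-rate measurement. A sample is taken only
    while at least dq_threshold bytes are queued, and samples are
    smoothed with an exponential moving average."""

    def __init__(self, dq_threshold=10_000, eps=0.25):
        self.dq_threshold = dq_threshold   # 10KB default from the text
        self.eps = eps                     # averaging parameter (our choice)
        self.dq_count = 0                  # bytes departed since last sample
        self.start = None                  # None = not in a measurement cycle
        self.avg_drate = 0.0               # smoothed departure rate, bytes/s

    def on_departure(self, qlen_bytes, pkt_size, now):
        # Enter a measurement cycle only when the backlog is large enough.
        if self.start is None:
            if qlen_bytes > self.dq_threshold:
                self.start = now
            return
        self.dq_count += pkt_size
        if self.dq_count > self.dq_threshold:
            dq_int = now - self.start
            if dq_int > 0:
                dq_rate = self.dq_count / dq_int
                self.avg_drate = ((1 - self.eps) * self.avg_drate
                                  + self.eps * dq_rate)
            # Reset counters for the next sample.
            self.start = now
            self.dq_count = 0
```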
D. Handling Bursts
The above three components form the basis of the PIE algorithm.
Although we aim to control the average latency of a congested queue,
the scheme should allow short term bursts to pass through the system
without hurting them. We would like to discuss how PIE manages
bursts in this section.
Bursts are well tolerated in the basic scheme for the following
reasons: first, the drop probability is updated periodically. Any short
term burst that occurs within this period could pass through without
incurring extra drops as it would not trigger a new drop probability
calculation. Secondly, PIE’s drop probability calculation is done
incrementally. A single update would only lead to a small incremental
change in the probability. So if it happens that a burst does occur
at the exact instant that the probability is being calculated, the
incremental nature of the calculation would ensure its impact is kept small.
Nonetheless, we would like to give users a precise control of the
burst. We introduce a parameter, max burst, that is similar to the
burst tolerance in the token bucket design. By default, the parameter
is set to be 100ms. Users can certainly modify it according to their
application scenarios. The burst allowance is added into the basic
PIE design as follows:
Burst Allowance Calculation:
Upon packet arrival
1. If burst_allow > 0:
    enqueue packet, bypassing random drop;
Upon dq_rate update
2. Update burst allowance:
    burst_allow = burst_allow − dq_int;
3. If p = 0 and both cur_del and old_del are less than ref_del/2,
reset burst_allow:
    burst_allow = max_burst;
The burst allowance, denoted by burst_allow, is initialized to
max_burst. As long as burst_allow is above zero, an incoming
packet will be enqueued, bypassing the random drop process. When-
ever dq_rate is updated, the value of burst_allow is decremented
by the departure rate update period, dq_int. When the congestion
goes away, defined by us as p equal to 0 and both the current and
previous samples of estimated delay less than ref_del/2, we
reset burst_allow to max_burst.
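The burst-allowance bookkeeping can be sketched as follows (the split into two small functions is ours; the 100ms default for max_burst follows the text):

```python
MAX_BURST = 0.100  # seconds; the default burst tolerance from the text

def bypasses_random_drop(burst_allow):
    """True if an arriving packet skips the random-drop step."""
    return burst_allow > 0

def on_rate_update(burst_allow, dq_int, p, cur_del, old_del, ref_del):
    """Decrement the allowance each rate-update period; replenish it
    once congestion has cleared (p == 0 and both delay samples are
    below half the reference)."""
    burst_allow = max(0.0, burst_allow - dq_int)
    if p == 0 and cur_del < ref_del / 2 and old_del < ref_del / 2:
        burst_allow = MAX_BURST
    return burst_allow
```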
IV. PERFORMANCE EVALUATION
We evaluate the performance of the PIE scheme in both ns-2
simulations and testbed experiments using Linux machines. As the
latest design, CoDel, is available in the Linux release, we compare PIE's
performance against RED in simulations and against CoDel in testbed
experiments.
A. Simulations Evaluation
In this section we present our ns-2 [13] simulation results. We first
demonstrate the basic functions of PIE using a few static scenarios,
and then we compare PIE and RED performance using dynamic
scenarios. We focus our attention on the following performance metrics:
instantaneous queue delay, drop probability, and link utilization.
The simulation setup consists of a bottleneck link at 10Mbps
with an RTT of 100ms. Unless otherwise stated, the buffer size is
200KB. We use both TCP and UDP traffic for our evaluations.
All TCP traffic sources are implemented as TCP New Reno
with SACK running an FTP application, while UDP traffic is
implemented using Constant Bit Rate (CBR) sources. Both UDP
and TCP packets are configured to have a fixed size of 1000B.
Unless otherwise stated, the PIE parameters are configured to
their default values, i.e., delay_ref = 20ms, Tupdate = 30ms,
α̃ = 0.125Hz, β̃ = 1.25Hz, dq_threshold = 10KB, max_burst = 100ms.
Fig. 4. Queueing Latency Under Various Traffic Loads: a) 5 TCP flows;
b) 50 TCP flows; c) 5 TCP + 2 UDP flows. Queueing latency is controlled
at the reference level of 20ms regardless of the traffic intensity.
Fig. 5. Link Throughput Under Various Traffic Loads: a) 5 TCP flows; b)
50 TCP flows; c) 5 TCP + 2 UDP flows. High link utilization is achieved
regardless of traffic intensity, even in the low multiplexing case.
Function Verification: We first validate the functionalities of PIE,
making sure it performs as designed using static traffic sources with
various loads.
1) Light TCP traffic: Our first simulation evaluates PIE's performance
under a light traffic load of 5 TCP flows. Figure 4(a) shows the
queueing delay and Figure 5(a) plots the instantaneous throughput.
Figure 4(a) demonstrates that the PIE algorithm is able to
maintain the queue delay around the equilibrium reference of 20ms.
Due to low multiplexing, TCP’s sawtooth behavior is still slightly
visible in Figure 4(a). Nonetheless, PIE regulates the TCP traffic
quite well so that the link is close to its full capacity as shown in
Figure 5(a). The average total throughput is 9.82Mbps. Individual
Fig. 6. PIE's Burst Control: at simulation time 1s, the UDP flow starts
sending traffic. With a burst allowance of 100ms, the flow either
passes through without incurring drops or starts incurring drops at 1.1s.
With a burst allowance of 0ms, the flow incurs drops right from
the beginning. The longer the burst, the higher the number of drops.
flows’ throughputs are 1.86Mbps, 2.15Mbps, 1.80Mbps, 1.92Mbps
and 2.09Mbps respectively, close to their fair share.
2) Heavy TCP traffic: In this test scenario, we increase the number
of TCP flows to 50. With higher traffic intensity, the link utilization
reaches 100% as clearly shown in Figure 5(b). The queueing delay,
depicted in Figure 4(b), is controlled around the desired 20ms,
unaffected by the increased traffic intensity. The effect of sawtooth
fades in this heavy traffic case. The queueing delay fluctuates more
evenly around the reference level. The average throughput reaches
full capacity of 10Mbps as shown in Figure 5(b).
3) Mixture of TCP and UDP traffic: To demonstrate PIE's
performance under persistent heavy congestion, we adopt a mixture of
TCP and UDP traffic. More specifically, we have 5 TCP flows and
2 UDP flows (each sending at 6Mbps). The corresponding latency
plot can be found in Figure 4(c). Again, PIE is able to contain the
queueing delay around the reference level regardless of the traffic mix,
while achieving 100% throughput as shown in Figure 5(c).
4) Bursty Traffic: As the last functionality test, we show PIE's
ability to tolerate bursts, whose maximum value is denoted by
max_burst. We construct a test where one short-lived UDP flow
sends at a peak rate of 25Mbps over a period of length burst_len.
We set burst_len to 100ms and 200ms respectively. We also set
max_burst to 0 (i.e., no burst tolerance) and 100ms. The
UDP flow starts sending at simulation time 1s. Figure 6 plots
the number of dropped packets as a function of the simulation time
for four combinations of burst_len and max_burst. It is obvious
from the graph that if burst_len is less than max_burst,
no packets are dropped. When burst_len equals 200ms, the
first 100ms of the burst are allowed into the queue and the PIE
algorithm starts dropping only afterward. Similarly, if we set
the PIE scheme to have no burst tolerance (i.e., max_burst =
0), the algorithm starts dropping as soon as the queue starts filling up.
Performance Evaluation and Comparison: The functions of PIE
are verified above. This section evaluates PIE under dynamic traffic
scenarios, compares its performance against RED and shows how
PIE is better suited for controlling latency in today’s Internet. The
simulation topology is similar to the above. The core router runs
either the RED or PIE scheme. The buffer size is 2MB for the two
schemes. The RED queue is configured with the following parameters:
min_th = 20% of the queue limit, max_th = 80% of the queue limit,
max_p = 0.1 and q_weight = 0.002. The PIE queue is configured
with the same parameters as in the previous section.
Fig. 7. PIE vs. RED Performance Comparison Under Varying Link
Capacity: 0s-50s, link capacity = 100Mbps; 50s-100s, link capacity =
20Mbps; 100s-150s, link capacity = 100Mbps. By only controlling the
queue length, RED suffers long queueing latency when the queue draining
rate changes. PIE is able to quickly adapt to changing network conditions
and consistently control the queueing latency to the reference level of 20ms.
5) Varying Link Capacity: This test starts with the core capacity
of 100Mbps; 100 TCP flows start randomly in the initial second.
At 50s, the available bandwidth drops to 20Mbps and jumps back
to 100Mbps at 100s. The core router's queue limit is set to 2MB.
Figure 7 plots the queuing latency experienced by the RED and
PIE queues. The figure shows that both AQM schemes converge
to equilibrium after a couple of seconds from the beginning of the
simulation. The queues experience minimal congestion during the
first 50s. RED operates around min_th, and the queue latency settles
around 30ms accordingly. Similarly, PIE converges to the equilibrium
value of 20ms. When available bandwidth drops to 20Mbps (at 50s),
the queue size and latency shoot up for both RED and PIE queues.
RED's queue size moves from min_th to max_th to accommodate
higher congestion, and the queuing latency stays high around 300ms
between 50s and 100s. On the other hand, PIE’s drop probability
quickly adapts, and in about three seconds, PIE is able to bring
down the latency around the equilibrium value (20ms). At 100s,
the available bandwidth jumps back to 100Mbps. Since congestion
is reduced, RED's queue size comes back down to min_th, and the
latency is reduced as well. PIE’s drop probability scales down quickly,
allowing PIE’s latency to be back at the equilibrium value. Note that
due to static configurations of RED’s parameters, it cannot provide
consistent performance under varying network conditions. PIE, on
the other hand, automatically adapts itself and can provide steady
latency control for varying network conditions.
6) Varying Traffic Intensity: We also verify both schemes'
performance under varying network traffic load, with the number of TCP
flows ranging from 10 through 50. The simulation starts with 10
TCP flows and the number of TCP flows jumps to 30 and 50 at 50s
and 100s, respectively. The traffic intensity reduces at 150s and 200s
when the number of flows drops back to 30 and 10, respectively. Figure 8
plots the queuing latency experienced under the two AQM schemes.
Every time the traffic intensity changes, RED’s operating queue size
and latency are impacted. On the other hand, PIE quickly adjusts the
dropping probability in a couple of seconds and restores control
of the queuing latency to around the equilibrium value.
B. Testbed Experiments
We also implemented PIE in the Linux kernel. In this section, we
evaluate PIE in our lab setup and compare it against CoDel, whose
design is the most up to date in Linux. The current implementation
is on kernel version 3.5-rc1.
Fig. 8. PIE vs. RED Performance Comparison Under Varying Traffic
Intensity: 0s-50s, traffic load is 10 TCP flows; 50s-100s, traffic load is
30 TCP flows; 100s-150s, traffic load is increased to 50 TCP flows; traffic
load is then reduced to 30 and 10 at 150s and 200s respectively. Due to
statically configured parameters, the queueing delay increases under RED as
the traffic intensifies. The auto-tuning feature of PIE, however, allows the
scheme to control the queueing latency quickly and effectively.
Fig. 9. The Testbed Setup: our experiment testbed consists of four unique
Linux-based machines. The router implements AQM schemes.
Our experiment testbed consists of four unique Linux-based
machines as shown in Figure 9. The sender is directly connected to the
router with the PIE implementation. A hierarchical token bucket (htb)
qdisc is used to create a bandwidth constraint of 10Mbps. The
router is connected to the receiver through another server. The delay
is added in the forward direction using a delay emulator. The Iperf tool is
run on the sender and receiver to generate traffic. All measurements
are done at the router. We obtain statistics and measure throughput
at the router through the tc interface.
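A setup along these lines can be reproduced on a modern Linux box roughly as follows. This is a sketch, not the authors' scripts: the device name `eth1` is an example, and the `pie` qdisc shown is the mainline descendant of the implementation evaluated here (see tc-pie(8) for the exact parameters):

```shell
# Constrain egress to 10 Mbps with an htb class, as in the testbed.
tc qdisc add dev eth1 root handle 1: htb default 10
tc class add dev eth1 parent 1: classid 1:10 htb rate 10mbit

# Attach PIE to that class: 20ms target delay, 30ms update interval,
# 200-packet buffer (the defaults used in the paper's lab tests).
tc qdisc add dev eth1 parent 1:10 handle 110: pie limit 200 target 20ms tupdate 30ms

# Inspect queue statistics (drops, marks, probability) at the router.
tc -s qdisc show dev eth1
```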
For all our lab test scenarios, we use the following PIE parameters:
α̃ = 0.125, β̃ = 1.25, buffer size = 200 packets, Tupdate = 30ms.
CoDel is used for comparison, with its parameters set to the
defaults: interval = 100ms and queue limit = 200 packets.
Packet sizes are 1KB for both schemes.
(a) Reference Delay = 5ms (b) Reference Delay = 20ms
Fig. 10. Cdf of Queueing Delay Comparison Between PIE and CoDel:
20 TCP flows. When the reference delay is 5ms, 70% of the delays under
PIE vs. 30% of delays under CoDel are less than 5ms. PIE and CoDel
behave similarly when the reference delay is 20ms.
1) TCP Traffic: We evaluate the performance of PIE and CoDel
in a moderately congested scenario. We use 20 NewReno TCP flows
with RTT = 100ms and run the test for 100 seconds. Figure 10 plots
the cdf curve of the queueing delay for PIE and CoDel when the
(a) Reference Delay = 5ms (b) Reference Delay = 20ms
Fig. 11. Cdf of Queueing Delay Comparison Between PIE and CoDel: 5
TCP flows and 2 UDP flows. It is obvious that, under heavy congestion,
CoDel cannot control the latency to the target values while PIE
behaves consistently according to the design.
reference delay is 5ms and 20ms respectively.1 Both schemes are
able to control the queueing delay reasonably well. When the target
delay is 5ms, more than 90% of packets under both schemes experience
delays of less than 20ms. It is clear that PIE performs better:
70% of its delays are less than 5ms while CoDel has only 30%. When
the reference delay equals 20ms, the performance of the two schemes
looks similar. PIE still performs slightly better: 50% of packet delays
are less than 20ms under PIE, while only 40% are under CoDel.
For the 20ms target delay case, the throughput is 9.87Mbps for PIE
vs. 9.94Mbps for CoDel. For the 5ms target delay case, the throughput
is 9.66Mbps for PIE vs. 9.84Mbps for CoDel. CoDel's throughput is
slightly better than PIE's.
2) Mixture of TCP and UDP Traffic: In this test, we show the
stability of both schemes in a heavily congested scenario. We set up 5
TCP flows and 2 UDP flows (each transmitting at 6Mbps). The 2 UDP
flows result in a 20% oversubscription of the 10Mbps bottleneck.
Figure 11 shows the cdf plot for the delay. Under the mixture of TCP
and UDP traffic, it is obvious that CoDel cannot control the latency
under the target values of 5ms and 20ms respectively. The majority of
packets experience long delays of over 100ms. PIE, on the other hand,
behaves consistently according to the design: 70% of delays are less
than the target of 5ms, and 60% less than the target of 20ms, respectively.
The vast majority of packets, close to 90%, do not experience delay that
is more than twice the target value. In this test, when the target
delay equals 20ms, the throughput is 9.88Mbps for PIE vs. 9.91Mbps
for CoDel. When the target delay equals 5ms, the throughput is
9.79Mbps for PIE vs. 9.89Mbps for CoDel. The throughputs are similar.
V. CONTROL THEORETIC ANALYSIS
We formulate our analysis model based on the TCP fluid-flow and
stochastic differential equations originated in the work by Misra et
al. [14] and [15]. We consider multiple TCP streams passing through
a network that consists of a congested link with capacity C_l. It is
shown in [14] that the TCP window and queue evolution can be
approximated as:

\frac{dW(t)}{dt} = \frac{1}{R(t)} - \frac{W(t)\,W(t-R(t))}{2R(t-R(t))}\,p(t-R(t)); \quad (1)

\frac{dq(t)}{dt} = \frac{W(t)}{R(t)}\,N(t) - C_l; \quad (2)

where W(t) and q(t) denote the TCP window size and the expected
queue length at time t. The load factor, i.e., the number of flows, is indicated
1 There is a subtle difference in the definition of terms: in CoDel, the
parameter target represents the target latency, while del_ref refers to the
reference equilibrium latency in PIE. For easy comparison, we use
"Reference Delay" to refer to both the target latency, target, in CoDel and the
reference equilibrium latency, del_ref, in PIE.
Fig. 12. The Feedback Loop of PIE: it captures TCP, Queue and PIE
dynamics; and also models the RTT delay.
Fig. 13. Phase Margin as a Function of Parameters αand β: in order
for a system to be stable, we need a phase margin above 0.
as N(t). R(t) represents the round trip time including the queueing
delay; note that R(t) denotes the harmonic mean of the flows' round
trip times. The drop or mark probability is indicated as p(t).
AQM schemes determine the relationship between the drop or mark
probability p(t) and the queue length q(t). This relationship in PIE is
detailed in Section III-B. PIE increases its drop or mark probability
based on the current queueing delay and the delay moving trend. Using
the Bilinear Transformation, we can convert the PIE design into a fluid
model as follows:
\tau(t) = \frac{q(t)}{C_l}; \quad (3)

\frac{dp(t)}{dt} = \alpha\,\frac{\tau(t)-\tau_{ref}}{T} + \left(\beta + \frac{\alpha}{2}\right)\frac{d\tau(t)}{dt}; \quad (4)

where τ and τ_ref represent the queueing latency and its equilibrium
reference value, and T is the update interval, which equals T_update.
Note that although Tdoes not show up directly in the discrete form
of the algorithm, it does play a role in the overall system’s behavior.
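Although T appears only in the fluid model, the discrete form of the algorithm is simply a periodic probability update executed every T_update seconds; a minimal sketch (variable names and the clamping are ours, not taken from a reference implementation) looks like:

```python
def pie_update(p, tau, tau_old, alpha=0.125, beta=1.25, tau_ref=0.02):
    """One PIE drop-probability update, executed every T_update seconds.

    p       -- current drop/mark probability
    tau     -- current queueing-delay estimate in seconds
    tau_old -- delay estimate from the previous update
    The alpha term pulls the delay toward tau_ref; the beta term reacts
    to the delay moving trend, mirroring Eq. (4).
    """
    p += alpha * (tau - tau_ref) + beta * (tau - tau_old)
    return min(max(p, 0.0), 1.0)  # keep p a valid probability
```

Starting from p = 0 with the delay at 30ms and rising from 20ms, a single update with these defaults yields p = 0.125·0.01 + 1.25·0.01 = 0.01375.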
These fluid equations describe our control system's overall behavior,
from which we can derive the equilibrium and dynamic characteristics
as follows.
A. Linearized Continuous System
When the system reaches steady state, the operating point
(W_o, q_o, p_o) is defined by \dot{W} = 0, \dot{q} = 0 and \tau = \tau_{ref}, so that

W_o^2\,p_o = 2 \quad \text{and} \quad W_o = \frac{R_o C_l}{N}. \quad (5)
Fig. 14. An Illustration of the Stability Margin for Different RTTs: for
R_o < R+, the gain of the system reduces and the poles move towards
higher frequencies. As a result, the phase margin increases.
We can approximate our system's dynamics by their small-signal
linearization about an operating point based on small perturbations,
e.g. W = W_o + \delta W. Our system equations above lead to:

\delta\dot{q}(t) = \frac{N}{R_o}\,\delta W(t) - \frac{1}{R_o}\,\delta q(t); \quad (6)

\delta\dot{W}(t) = -\frac{N}{R_o^2 C_l}\left(\delta W(t) + \delta W(t-R_o)\right) - \frac{R_o C_l^2}{2N^2}\,\delta p(t-R_o); \quad (7)

\delta\dot{p}(t) = \frac{\alpha}{T C_l}\,\delta q(t) + \left(\beta + \frac{\alpha}{2}\right)\frac{1}{C_l}\,\delta\dot{q}(t). \quad (8)
From Equations (5) to (8), we can obtain the loop transfer function of
the feedback system in Figure 12 in the Laplace domain as:

G(s) \approx -\frac{\kappa\,(s/z_1 + 1)}{s\,(s/s_1 + 1)(s/s_2 + 1)}\,e^{-sR_o}; \quad (9)

where \kappa = \alpha R_o/(p_o T), z_1 = \alpha/((\beta + \alpha/2)T), s_1 = \sqrt{2p_o}/R_o
and s_2 = 1/R_o. Note that the loop gain scales with C_l/N, which can be
derived from the drop probability p_o and R_o.
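The constants in Equation (9) follow directly from the network and controller parameters; the helper below (our own illustration) makes the gain and corner frequencies explicit and exposes the monotonic scaling used in the stability argument of the next subsection:

```python
import math

def loop_constants(alpha, beta, T, Ro, po):
    """Gain and corner frequencies of the loop transfer function, Eq. (9).

    alpha, beta -- PIE control parameters
    T           -- probability-update interval in seconds
    Ro          -- round trip time in seconds
    po          -- equilibrium drop probability
    Returns (kappa, z1, s1, s2); z1, s1, s2 are in rad/s.
    """
    kappa = alpha * Ro / (po * T)              # loop gain
    z1 = alpha / ((beta + alpha / 2.0) * T)    # controller zero
    s1 = math.sqrt(2.0 * po) / Ro              # TCP-window pole
    s2 = 1.0 / Ro                              # queue pole
    return kappa, z1, s1, s2
```

Note the monotonicity exploited below: shrinking R_o or growing p_o only lowers the gain and pushes both poles to higher frequencies.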
B. Determining Control Parameters α, β
Although many combinations of T, α and β lead to system
stability, we choose our parameter settings according to the following
guideline. We choose T so that we sample the queueing latency two
or three times per round trip time. Then, we set the values of α and β
so that a Bode diagram analysis yields enough phase margin.
Figure 13 shows the phase margin for various values of
α and β, given T = 30ms, R+ = 100ms and p− = 0.01%. Clearly,
in order to have system stability, we need to choose α
and β values that give a phase margin above 0.
Suppose we choose the values of α and β so that the feedback system
in Equation (9) is stable for R+ and p−. For all systems with R_o <
R+ and p_o > p−, the gain of the loop κ = αR_o/(p_oT) <
αR+/(p−T), s_1 > √(2p−)/R+ and s_2 > 1/R+. Hence, if we make
the system stable for R+ and p−, the system will be stable for all
R_o < R+ and p_o > p−. Figure 14 illustrates this point, showing how
the phase margin improves from 29.5° to 105.0° when R_o is 50ms
instead of R+ of 100ms.
We could fix the values of α and β and still guarantee stability
across various network conditions. However, we would need to choose
conservative values of α and β to achieve that stability. For example,
we need to set α = 0.0156 and β = 0.156 in the above case to
guarantee a phase margin of 25° for p_o ≥ 0.01%. However, when
the drop probability increases, the system response time takes
a hit. Figure 15 shows the phase margin as a function of √p_o (see footnote 2)
for α:β values of 0.125:1.25, (0.125/2):(1.25/2) = 0.0625:0.625, and
(0.125/8):(1.25/8) = 0.0156:0.156, respectively. Their corresponding
loop bandwidths, which directly determine their response times, are
shown in Figure 16. As shown in Figure 15, if we choose α:β
values of 0.0625:0.625 or 0.125:1.25, we do not have enough
phase margin to ensure stability when p_o < 1%, i.e. √p_o < 0.1.
On the other hand, these two higher-value pairs lead to faster
response times, as depicted in Figure 16.

Fig. 15. Phase Margin as a Function of √p_o: three pairs of α:β settings.
Lower values of α and β give a higher phase margin. Autotuning picks
a different pair for each p_o range to optimally trade off stability vs.
response time.

Fig. 16. Loop Frequency as a Function of √p_o: three pairs of α:β
settings. Lower values of α and β have a slower response time. Autotuning
picks a different pair for each p_o range to optimally trade off stability
vs. response time.
Auto-tuning in PIE addresses the above problem by adapting
its control parameters α and β based on the congestion level. How
congested a link is can easily be inferred from the drop probability
p_o. When the network is lightly congested, say with p_o under 1%, we choose
values that guarantee stability. When the network is moderately
congested, say with p_o under 10%, we can increase the values to speed up
the system response. When the network is heavily congested, we can
increase the values even further without sacrificing stability. While
the adjustment could be continuous, we choose discrete values for
simplicity. As demonstrated in Figures 15 and 16, the auto-tuning
design in PIE greatly improves the response time of the loop
without losing stability. Our test results in Section IV also show that
auto-tuning works well in varying congestion environments.

²We choose √p_o instead of p_o because s_1 scales with √p_o.
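The discrete adaptation described above can be sketched as a simple lookup on the current drop probability; the thresholds and the scaled α:β pairs below mirror the three settings discussed with Figures 15 and 16, though the exact switching logic of a production PIE implementation may differ:

```python
def autotune(p):
    """Choose (alpha, beta) from the congestion level inferred from the
    drop probability p, trading response time against phase margin."""
    if p < 0.01:     # lightly congested: most conservative, always stable
        return 0.125 / 8, 1.25 / 8   # = 0.0156:0.156 (rounded)
    elif p < 0.1:    # moderately congested: faster response
        return 0.125 / 2, 1.25 / 2   # = 0.0625:0.625
    else:            # heavily congested: fastest response, still stable
        return 0.125, 1.25
```

Because the pairs are picked per probability range rather than continuously, the controller gains change only at the two thresholds, keeping the implementation trivial.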
In this paper we have described PIE, a latency-based design for
controlling bufferbloat in the Internet. The PIE design bases its
random dropping decisions not only on the current queueing delay but
also on the delay moving trend. In addition, the scheme self-tunes
its parameters to optimize system performance. As a result, PIE is
effective across a diverse range of network scenarios. Our simulation
studies, theoretical analysis and testbed results show that PIE can
ensure low latency under various congestion situations. It achieves
high link utilization while consistently maintaining stability. It is a
lightweight, enqueue-based design that works with both TCP and
UDP traffic. The PIE design requires only a low-speed drop probability
update, so it incurs very small overhead and is simple to
implement in both hardware and software.
Going forward, we will study how to automatically set latency
references based on link speeds: set low latency references for high
speed links while being conservative for lower speed links. We will
also explore efficient methods to provide weighted fairness under PIE.
There are two ways to achieve this: either via differential dropping
for flows sharing the same queue, or through a class-based fair queueing
structure where flows are queued into different queues. There are pros
and cons with either approach. We will study the tradeoffs between
these two methods.
[1] B. Turner, "Has AT&T Wireless Data Congestion Been Self-Inflicted?"
[Online]. Available: BroughTurnerBlog
[2] J. Gettys, "Bufferbloat: Dark buffers in the internet," IEEE Internet
Computing, vol. 15, pp. 95–96, 2011.
[3] S. Floyd and V. Jacobson, "Random early detection gateways for
congestion avoidance," IEEE/ACM Transactions on Networking, vol. 1,
no. 4, pp. 397–413, Aug. 1993.
[4] W. Feng, K. Shin, D. Kandlur, and D. Saha, "The BLUE active queue
management algorithms," IEEE/ACM Transactions on Networking, vol. 10,
no. 4, pp. 513–528, Aug. 2002.
[5] C. V. Hollot, V. Misra, D. Towsley, and W.-B. Gong, "On designing
improved controllers for AQM routers supporting TCP flows," in
Proceedings of IEEE Infocom, 2001, pp. 1726–1734.
[6] S. Kunniyur and R. Srikant, "Analysis and design of an adaptive virtual
queue (AVQ) algorithm for active queue management," in Proceedings of
ACM SIGCOMM, 2001, pp. 123–134.
[7] B. Braden, D. Clark, J. Crowcroft, et al., "Recommendations on
Queue Management and Congestion Avoidance in the Internet," RFC
2309 (Proposed Standard), 1998.
[8] K. Nichols and V. Jacobson, "A Modern AQM is just one
piece of the solution to bufferbloat." [Online]. Available: http:
[9] A. Demers, S. Keshav, and S. Shenker, "Analysis and simulation of
a fair queueing algorithm," Journal of Internetworking Research and
Experience, pp. 3–26, Oct. 1990.
[10] R. Pan, B. Prabhakar, F. Bonomi, and R. Olsen, "Approximate fair
bandwidth allocation: a method for simple and flexible traffic management,"
in Proceedings of the 46th Annual Allerton Conference on Communication,
Control and Computing, 2008.
[11] G. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control
of Dynamic Systems, 1995.
[12] R. Pan, B. Prabhakar, et al., "Data center bridging - congestion
notification." [Online]. Available:
[13] "NS-2." [Online]. Available:
[14] V. Misra, W.-B. Gong, and D. Towsley, "Fluid-based analysis of a
network of AQM routers supporting TCP flows with an application to RED,"
in Proceedings of ACM SIGCOMM, 2000, pp. 151–160.
[15] C. V. Hollot, V. Misra, D. Towsley, and W.-B. Gong, "A control theoretic
analysis of RED," in Proceedings of IEEE Infocom, 2001, pp. 1510–1519.
Virtual queue-based marking schemes have been recently proposed for Active Queue Management (AQM) in Internet routers. We consider a particular scheme, which we call the Adaptive Virtual Queue (AVQ), and study its following properties: its stability in the presence of feedback delays, its ability to maintain small queue lengths, and its robustness in the presence of extremely short flows (the so-called web mice). Using a linearized model of the system dynamics, we present a simple rule to design the parameters of the AVQ algorithm. We then compare its performance through simulation with several well-known AQM schemes such as RED, REM, Proportional Integral (PI) controller, and a nonadaptive virtual queue algorithm. With a view toward implementation, we show that AVQ can be implemented as a simple token bucket using only a few lines of code.