Timer Interaction in Route Flap Damping.
ABSTRACT Route Flap Damping is a mechanism generally used in network routing protocols. Its goal is to limit the global impact of unstable routes by temporarily suppressing routes with rapid changes over short time periods. Although route damping is a clearly defined and simple procedure at each router, its effect in a large network setting is not well understood. We show that the current damping design leads to the intended behavior only under persistent route flapping. When the number of flaps is small, the global routing dynamics deviates significantly from the expected behavior with a longer convergence delay. Previous work observed that a single route flap can falsely trigger route suppression due to path exploration. However our simulations show that this false suppression only accounts for 30% of the convergence delay after a single route flap. Our study reveals previously unknown interactions between reuse timers at different routers. Route suppression and reuse at different routers are triggered at different times and thus affect the number of updates received by other routers. In turn, this impacts other routers' damping behavior. We propose to use Root Cause Notification to eliminate both false suppression and undesirable timer interaction
Colorado State University
It remains a challenge to design a responsive and efficient
routing protocol for large scale networks. In this paper
we examine a specific problem caused by unexpected interactions
among multiple nodes in a large network. Since dynamic
routing protocols adapt to topological changes, a single
unstable link can potentially cause a large number of updates
to be propagated throughout the entire network, consuming
router CPU cycles and link bandwidth. To limit
the global impact of individual unstable routes, Route Flap
Damping was added to the Border Gateway Protocol
(BGP) several years ago. It is commonly believed that
damping has played an essential role in putting the global
Internet routing update overhead under control.
The goal of BGP damping is to allow updates of stable
routes to propagate while suppressing updates of unstable
routes. Briefly, route flap damping works as follows. A
router associates a penalty value with each destination (i.e.,
an IP prefix) advertised by a neighbor router. A route flaps
whenever the neighbor router changes its route to the desti-
nation. When it happens, the penalty value is increased. In
the absence of route changes, the penalty value decays over
time. When the penalty value exceeds a predefined cut-off
threshold, further updates from the same neighbor for the
same destination will no longer be propagated. That is, the
route is suppressed. When the penalty value drops below a
predefined reuse threshold, the router will start propagating
updates for that destination again, i.e., the route is reused.
Throughout this paper we will use the word damping as an
abbreviation for “route flap damping” to refer to the whole
mechanism, and the word suppression to refer to the spe-
cific action of stopping propagating updates.
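The per-route rules above can be sketched in a few lines of Python. This is a minimal illustration, not a router implementation: the class name `DampingState`, its method names, and the numeric defaults (a 15-minute half-life, a cut-off of 2000, a reuse threshold of 750, and an increment of 1000 per flap) are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class DampingState:
    """Penalty bookkeeping for one (neighbor, prefix) pair (hypothetical model)."""
    half_life: float = 900.0   # seconds; assumed value
    cutoff: float = 2000.0     # assumed cut-off threshold
    reuse: float = 750.0       # assumed reuse threshold
    penalty: float = 0.0
    last_time: float = 0.0
    suppressed: bool = False

    def _decay_to(self, now: float) -> None:
        # Exponential decay since the last event, then check for reuse.
        lam = math.log(2) / self.half_life
        self.penalty *= math.exp(-lam * (now - self.last_time))
        self.last_time = now
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def on_flap(self, now: float, increment: float = 1000.0) -> bool:
        """Record a route change; return True if the update may be propagated."""
        self._decay_to(now)
        self.penalty += increment
        if self.penalty > self.cutoff:
            self.suppressed = True
        return not self.suppressed

    def is_suppressed(self, now: float) -> bool:
        self._decay_to(now)
        return self.suppressed
```

With these assumed defaults, two flaps a few seconds apart are still propagated, while a third flap shortly afterwards pushes the penalty over the cut-off and the route stays suppressed until the penalty has decayed below the reuse threshold.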
Despite its simple rules of operation at each router, the
overall effect of damping is not fully understood. A recent
study  showed that, after a single route flap, path explo-
ration can falsely trigger route suppression and prolong the
convergence time. Yet our simulations show that this false
suppression alone can only account for 30% of the conver-
gence delay after a single route flap, and cannot explain the
damping behavior after two or more route flaps. We will
show that the current BGP damping mechanism achieves
the intended behavior only under persistent route flapping.
When the number of route flaps is small, the global routing
dynamics deviates significantly from the intended behavior.
Because the current damping implementation counts all received
updates in calculating the penalty value, and because
route suppression and reuse at different routers happen at
different times, false damping can be triggered not only by
path exploration, but also by the updates due to route reuse
at neighbor routers. We propose to add Root Cause Notifi-
cation (RCN)  to routing updates, in order to eliminate
both false suppression and undesirable timer interactions.
The remainder of the paper is organized as follows. Section
2 describes the BGP damping mechanism and previous
work. Section 3 analyzes the intended damping behavior.
Section 4 analyzes the actual damping behavior in detail,
including how timer interactions shape routing dynamics
during damping. Section 5 presents simulation results. Section
6 proposes the use of RCN to facilitate damping. Section
7 discusses the implication of routing policies on damping,
and Section 8 concludes the paper.
Figure 1. Example
Figure 2. A router’s RIBs
Figure 3. Damping Penalty
2 Route Flap Damping and Previous Work
Figure 1 shows a general network scenario that will be
used throughout our analysis and simulations in this paper.
A router in a customer network, the originAS, is connected
to a router in its provider network, the ispAS. When the link
[originAS,ispAS] comes up, the router in ispAS will an-
nounce to the rest of the network the route to originAS;
when the link goes down, the ispAS router will withdraw
the route.
Generally speaking, each BGP router peers with a num-
ber of neighboring routers and exchanges routing updates.
A router stores the routes received from each peer in the
corresponding RIB-IN table (Figure 2). For each destina-
tion prefix, the router picks the best route among all the
RIB-INs and stores this best route in the Local-RIB table.
Depending on the routing policy, the router may announce
all or part of its best routes to its peers. It stores the routes to
be announced to each peer in the corresponding RIB-OUT
table.
Damping associates a penalty value with each entry in
a RIB-IN. That is, there is a penalty value associated with
each peer and destination prefix pair. Whenever a new up-
date message is received, the corresponding RIB-IN entry
is updated and so is its penalty value. Different types of
updates are assigned different penalty increments. If the
penalty exceeds the cut-off threshold, the RIB-IN entry will
no longer be used in selecting the best route. Note that during
this route suppression, new routing updates for the same
entry may continue to arrive, and if so the penalty value
will continue to increase accordingly. Because a suppressed
route does not enter Local-RIB, none of the new changes
will be propagated any further.
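The effect of suppression on best-route selection can be illustrated as follows. This is a simplified sketch: the RIB-IN is modeled as a dictionary keyed by (peer, prefix), and the shortest AS path is used as the only selection criterion. Both choices, and the names `RibIn` and `select_best`, are assumptions made for illustration; the real BGP decision process has more steps.

```python
from typing import Dict, Optional, Tuple

# (peer, prefix) -> (AS-path length, suppressed flag); hypothetical layout
RibIn = Dict[Tuple[str, str], Tuple[int, bool]]


def select_best(rib_in: RibIn, prefix: str) -> Optional[str]:
    """Return the peer whose route is selected for `prefix`, or None.

    Suppressed RIB-IN entries are skipped, so they can never enter
    Local-RIB and their changes are not propagated.
    """
    candidates = [
        (length, peer)
        for (peer, pfx), (length, suppressed) in rib_in.items()
        if pfx == prefix and not suppressed
    ]
    return min(candidates)[1] if candidates else None
```

If the only unsuppressed entry is a longer path, that path is selected; if every entry for the prefix is suppressed, the router has no usable route even though its RIB-IN is not empty.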
Parameter                       Symbol
Withdrawal Penalty              PW
Re-announcement Penalty         PA
Attributes Change Penalty       -
Cut-off Threshold               Pcut
Half Life (minutes)             H
Reuse Threshold                 Preuse
Max Hold-down Time (minutes)    -
Table 1. Default Damping Parameters
When the penalty value is greater than zero, it decays
exponentially over time. More formally, if the penalty is
$p(t_0)$ at time $t_0$ and becomes $p(t)$ at time $t$, then
$$p(t) = p(t_0)\, e^{-\lambda (t - t_0)}$$
where $\lambda$ is often configured by the half-life $H = \ln 2 / \lambda$.
A suppressed route will be reused when the penalty drops
below the reuse threshold. This is often implemented by
setting a reuse timer based on the current penalty value and
reusing the route when the reuse timer expires.
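As a small worked example of the decay rule and the reuse timer, the sketch below computes the decayed penalty and the time until the penalty falls to the reuse threshold. The function names and the default values (half-life of 900 seconds, reuse threshold of 750) are assumptions for illustration only.

```python
import math


def decayed_penalty(p0: float, elapsed_s: float, half_life_s: float = 900.0) -> float:
    """p(t) = p(t0) * exp(-lambda * (t - t0)), with lambda = ln 2 / H."""
    lam = math.log(2) / half_life_s
    return p0 * math.exp(-lam * elapsed_s)


def reuse_timer(penalty: float, reuse_threshold: float = 750.0,
                half_life_s: float = 900.0) -> float:
    """Seconds until the penalty decays to the reuse threshold (0 if already below)."""
    if penalty <= reuse_threshold:
        return 0.0
    return half_life_s * math.log2(penalty / reuse_threshold)
```

Under these assumed values, a penalty of 3000 is two half-lives away from 750, so the reuse timer would be set to 1800 seconds.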
Table 1 lists the default parameter settings from two major
router vendors, and Figure 3 shows an example of how the
penalty value (with Cisco default parameters) changes over
time in response to a few route flaps.
Route flap damping was introduced in the mid 1990s, and has been widely supported in com-
mercial routers. RFC 2439  documents its design ra-
tionale, algorithm, and implementation strategy. RFC 3221
 states that damping is widely deployed and helps stabi-
lize the routing infrastructure, but it is also well known that
different implementations use inconsistent parameters, and
damping is not universally deployed.
The intended effect of damping is to allow occasional
routing changes to propagate without delay, while suppressing
persistent route flapping. However, as early as 1998, Panigl
observed that a single route withdrawal followed by a
re-announcement in Europe triggered route suppression in
North America. The cause of this behavior was not explained
until 2002, when Mao et al. showed that path exploration
was the reason.
Path exploration was studied by Labovitz et al. For example,
in Figure 1, assume that X can reach originAS via three
peers. When link
[originAS,ispAS] fails and ispAS sends a withdrawal to
the rest of the network, X will receive the withdrawal from
one of its peers first. Not knowing link [originAS,ispAS]
has failed, X will switch to another peer to reach originAS,
thus it “explores” alternate paths. Every time X changes
its route, it will send an update to Y. Only after receiving
withdrawals from all its peers will X finally send its
own withdrawal to Y. This path exploration happens at
every router in the network with alternative paths, and can
amount to a large number of updates. Depending on the
timing of these updates, Y can receive multiple updates on
link [X,Y] even though link [originAS,ispAS] only flaps once.
Mao et al. were the first to point out the interplay between
path exploration and damping. It shows that path explo-
ration can amplify one single flap into many updates which
falsely trigger suppression somewhere in the network. This
unexpected interplay highlights the complexity introduced
by the scale of a large network: one cannot easily predict
the overall network behavior even if one knows exactly how
each individual node works.
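The path exploration example with router X and its three peers can be sketched as follows. The function `explore`, the peer names P1 to P3, and the use of AS-path length as the only preference are hypothetical simplifications; the point is only that the number of updates X sends to Y depends on the order in which the withdrawals reach X.

```python
from typing import Dict, List, Optional, Tuple


def explore(paths: Dict[str, int],
            withdrawal_order: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return the updates X sends to Y as withdrawals arrive one by one.

    paths maps each peer of X to the AS-path length of its route to originAS.
    """
    available = dict(paths)
    best = min(available, key=available.get)
    updates: List[Tuple[str, Optional[str]]] = []
    for peer in withdrawal_order:
        del available[peer]
        if peer != best:
            continue  # a non-best route vanished: nothing to tell Y
        if available:
            # X "explores" the next-best alternate path and announces it.
            best = min(available, key=available.get)
            updates.append(("announce", best))
        else:
            best = None
            updates.append(("withdraw", None))
    return updates
```

If the withdrawals arrive in order of preference (P1, P2, P3), X sends two announcements and then a withdrawal to Y; if they arrive in the reverse order, X sends only the final withdrawal. In the first case a single flap at the origin appears to Y as three route changes.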
However, false suppression caused by path exploration
alone cannot fully explain the observed long convergence
delay. In , the simulation results show that convergence
delay can be as long as one hour. In section 5.2, we will
explain why it is unlikely to reach such a high penalty value
damping would behave in response to more than one flap.
In this paper, we will give a detailed analysis of the
damping process under one or more flaps, and show that it
is the reuse timer interaction among multiple routers that
stretches the convergence delay to be much longer than
what path exploration alone could do. We will also show
how to enhance the damping mechanism with RCN to pre-
vent the undesirable behavior due to path exploration and
reuse timer interaction.
3 The Intended Behavior of BGP Damping
Before analyzing its actual behavior in a network, we
first quantify damping’s intended behavior. We are inter-
ested in how damping affects routing dynamics in response
to one or more route flaps. To quantify the effect, we use
two metrics: convergence time and message count. Conver-
gence time is defined as the time from when the originAS
stops flapping (i.e., sends its final route announcement) to
when the last update message is observed in the network.
The message count is the total number of updates observed
in the network starting from the first flap.
Convergence time can be calculated given the flapping
intervals and damping parameters. For occasional flaps of
link [originAS,ispAS], route suppression should not be
triggered, and the convergence time is the normal BGP
convergence time, usually between seconds and a few minutes.
When link [originAS,ispAS] flaps persistently, the
excessive routing updates will increase the penalty value at
ispAS and cause ispAS to suppress its route to originAS.
After the flapping stops, ispAS will wait for the penalty
value p to drop below the reuse threshold Preuse before
re-announcing the route. The announcement will trigger a
BGP $T_{up}$ event (i.e., a previously unreachable destination
becomes reachable), which takes time $t_{up}$ for the network
to converge. Let $r$ denote the time it takes for the penalty
value to drop below the reuse threshold; then the total
convergence time should be:
$$t = r + t_{up} \approx r = \frac{1}{\lambda} \ln \frac{p}{P_{reuse}}$$
From Table 1, we can see that with the Cisco default setting, $r$ is
at least 20 minutes and therefore $r \gg t_{up}$. The actual value
of $r$ depends on the penalty value $p$, the reuse threshold, and
the half-life. To calculate $p$, let $w(i)$ be the time between the
$i$-th flap and the $(i-1)$-th flap, $f(i)$ be the penalty increment
caused by the $i$-th flap, $i = 1, 2, \ldots, k-1, k$, and $w(1) = 0$.
Right after the $k$-th flap, the penalty value $p(k)$ is
$$p(k) = p(k-1)\, e^{-\lambda w(k)} + f(k) = \sum_{i=1}^{k-1} \left[ f(i)\, e^{-\lambda \sum_{j=i+1}^{k} w(j)} \right] + f(k)$$
Later in the paper, Figure 8 shows the calculation results
of damping’s intended convergence delay under a varying
number of route flaps.
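The penalty recursion for p(k) can be checked numerically against its expanded form, in which each increment f(i) is decayed over all the intervals that follow it. The sketch below implements both; the function names and the 900-second half-life are assumptions for illustration, and the lists are 0-indexed so that `w[0]` plays the role of w(1) = 0.

```python
import math
from typing import Sequence


def penalty_recursive(f: Sequence[float], w: Sequence[float],
                      half_life_s: float = 900.0) -> float:
    """p(k) = p(k-1) * exp(-lambda * w(k)) + f(k), starting from p = 0."""
    lam = math.log(2) / half_life_s
    p = 0.0
    for inc, gap in zip(f, w):
        p = p * math.exp(-lam * gap) + inc
    return p


def penalty_closed_form(f: Sequence[float], w: Sequence[float],
                        half_life_s: float = 900.0) -> float:
    """Sum of each earlier increment decayed over the later intervals, plus f(k)."""
    lam = math.log(2) / half_life_s
    k = len(f)
    total = f[k - 1]
    for i in range(k - 1):
        total += f[i] * math.exp(-lam * sum(w[i + 1:k]))
    return total
```

Both functions agree for any flap schedule. For instance, three flaps of 1000 each, spaced 60 seconds apart, leave a penalty somewhat below 3000 because the first two increments have partly decayed by the time of the third.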
A precise message count generally cannot be obtained
analytically, since it depends on the network topology and
timing of updates. Nevertheless, the general trend can be
predicted. As the number of flaps increases, the number of
updates also increases since each new flap triggers some updates
in the network. After a certain number of flaps, however,
the message count is expected to be almost constant,
since new flaps are suppressed by ispAS and no update is
propagated beyond ispAS.
Damping reduces the number of updates by suppress-
ing routing updates but it also increases convergence time.
Our analysis suggests that ispAS can largely control the
trade-off by setting appropriate penalty increments, cut-off
threshold, and reuse threshold. The configuration can be
tuned so that a small number of flaps does not trigger any
damping delay, while a large number of flaps is suppressed,
keeping the overall updates injected into the network at a
reasonable level. Therefore, the overall intended behavior
in a network relies only on how the unstable link flaps and
how the incident routers set their damping parameters, re-
gardless of the rest of the network.
4 Damping Behavior in Distributed Systems
The previous section describes damping’s intended be-
havior based on the rules applied to each individual router.
However, the overall behavior of a network cannot be di-
rectly derived by examining individual routers separately.
As we will show in this section, the network damping be-
havior is largely driven by previously unknown reuse timer
interactions among different routers.
4.1 Stages of Damping Behavior
Our simulation studies show that, when an unstable des-
tination exists and all the routers in a network perform BGP
damping, the whole network goes through different states
during damping. We will first give definitions to these
states, then explain them in more detail, and discuss two
types of reuse timer interactions.
• Charging: This state starts with the first flap of the route to the
unstable destination. During the charging period, routing
updates are exchanged among routers and each update
increases (charges) the router’s damping penalty.
Charging ends when there is no update in transit
or waiting to be sent in the whole network.
• Suppression: After charging, if there is at least one
router whose best route is unavailable due to suppression,
the network enters the suppression state, which ends
when a reuse timer expires and triggers a new routing
update.
• Releasing: This period follows the suppression period
and lasts until all the routing updates have been delivered.
• Converged: After releasing, the network enters the
converged state, where every route in each router’s Local-RIB
is the best route from all its RIB-IN entries. Note
that some RIB-IN entries might still be suppressed, but
they are not the best route and thus their unavailability
has no impact on Local-RIB.
Figure 4 illustrates the transitions between different
states.¹ Some routing flaps make the network move from
the converged state to the charging state, during which updates
are propagated in the network and each update increases the
penalty value at the receiving router, until eventually either
the flapping stops or the flapping routes are suppressed. Mao et al.
showed that path exploration can amplify a single flap during
the charging period and falsely trigger route suppression.
In the rest of this section, we will analyze other states
and show that reuse timer interaction plays a major role.
¹ In the real Internet, due to its large scale, different parts of the network
may be in different states and these four states may not be clearly separated.
4.2 Secondary Charging Effect
After charging ends, there is no update in flight or
queued for transmission. However, some routes may be
suppressed by some routers. In other words, the penalty
values of some RIB-IN entries are over the threshold. This
can occur in both the converged state and the suppression
state. To understand the difference
between these two states, one must determine whether the
reuse timer will be silent or noisy when it expires.
Figure 5 shows an example of a silent reuse timer.
Router A has received two routes, $R_B$ and $R_C$, from neighbors
B and C respectively. $R_B$ is the best path and is
currently installed in Local-RIB, while $R_C$ is suppressed
and cannot be considered as a candidate for use in Local-RIB.
When the reuse timer for $R_C$ expires, $R_C$ will become
available and A will re-run its path selection algorithm.
However, in this case $R_C$ is irrelevant and $R_B$ remains
the best path. We say this reuse timer is silent
since its expiration will have no effect on Local-RIB and
will not trigger any update by A. The network is in a
converged state if there is no reuse timer at all or every reuse
timer is silent.
Figure 6 shows an example of a noisy reuse timer. Again
router A has received $R_B$ and $R_C$ from B and C respectively.
But in this case, $R_B$ is currently suppressed and
cannot be considered as a candidate for use in Local-RIB.
When its reuse timer expires, $R_B$ becomes available and A
will select it as the new best path. A will update its Local-RIB
and RIB-OUT, and announce this change to its neighbors.
In turn this new message may cause A’s neighbors to
update their routes. The network is in the suppression state
if there is no pending update, and at least one router has a
noisy reuse timer waiting to expire.
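The silent/noisy distinction can be expressed as a comparison of the best route before and after a suppressed entry is released. In this sketch, `routes` maps each neighbor to a rank where a lower value is preferred; the rank model and the function names are assumptions for illustration.

```python
from typing import Dict, Optional, Set


def best_peer(routes: Dict[str, int], suppressed: Set[str]) -> Optional[str]:
    """Pick the most preferred neighbor among the unsuppressed routes."""
    usable = {peer: rank for peer, rank in routes.items() if peer not in suppressed}
    return min(usable, key=usable.get) if usable else None


def timer_is_noisy(routes: Dict[str, int], suppressed: Set[str],
                   expiring_peer: str) -> bool:
    """A reuse timer is noisy if releasing the route changes the best path."""
    before = best_peer(routes, suppressed)
    after = best_peer(routes, suppressed - {expiring_peer})
    return before != after
```

With the route from B ranked ahead of the route from C, releasing a suppressed route from C leaves the best path unchanged (silent, as in Figure 5), while releasing a suppressed route from B changes the best path and triggers an update (noisy, as in Figure 6).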
When a noisy reuse timer expires, the network moves
from the suppression state to the releasing state, during
which messages triggered by noisy route reuse can charge
remaining reuse timers. For example, consider nodes X and
Y in Figure 1, and assume Y has suppressed link (X,Y).
If a noisy reuse timer expires at X, it will trigger an update
sent to Y. Although this update was not directly caused by
route flapping, Y will follow the damping rule and increase
its penalty value, thus Y’s reuse timer is charged again. We
call this charging of one reuse timer by the expiration of another
reuse timer the Secondary Charging effect. Combined with
path exploration, secondary charging will not only lengthen
existing reuse timers, but can also sometimes lead to new route
suppressions. This drives the network to a new suppression
period even though no new route flap has occurred.
The network can converge only when all noisy reuse timers
have expired.
Figure 4. Four states of a damping process
Figure 5. Silent Reuse Timer
Figure 6. Noisy Reuse Timer
Figure 7 shows an example of simulated route penalty
over time after a single route flap. In this case, the router
computing the penalty is not adjacent to the flapping link;
more precisely it is 7 hops away from originAS. The
charging period happens within the first 100 seconds, dur-
ing which path exploration amplifies one flap into several
updates and triggers route suppression, as described in .
With path exploration alone, the network would converge
around the 2000th second when the route is reused. However,
due to secondary charging, the penalty value is pushed up
over the cut-off threshold again. Before the route is eventually
reused after the 5000th second, secondary charging
pushes the penalty up three more times. In this case, sec-
ondary charging accounts for more than 60% of the total
convergence delay! We will discuss details of the simula-
tions in Section 5.
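The way late updates stretch a reuse timer can be illustrated with a small penalty trace. The update times and increments below are made up for illustration and are not the simulation results of Figure 7; the thresholds and half-life are likewise assumed values, and `final_reuse_time` is a hypothetical helper.

```python
import math
from typing import List, Optional, Tuple


def final_reuse_time(updates: List[Tuple[float, float]], cutoff: float = 2000.0,
                     reuse: float = 750.0, half_life_s: float = 900.0) -> Optional[float]:
    """updates: (arrival time in seconds, penalty increment), sorted by time.

    Returns the time at which the route is last released from suppression,
    or None if it was never suppressed.
    """
    lam = math.log(2) / half_life_s
    p, t_prev, suppressed, release = 0.0, 0.0, False, None
    for t, inc in updates:
        if suppressed:
            # Would the reuse timer have fired before this update arrived?
            t_reuse = t_prev + math.log(p / reuse) / lam
            if t_reuse <= t:
                suppressed, release = False, t_reuse
        p = p * math.exp(-lam * (t - t_prev)) + inc
        t_prev = t
        if p > cutoff:
            suppressed = True
    if suppressed:
        release = t_prev + math.log(p / reuse) / lam
    return release


# Hypothetical updates seen during the charging period (path exploration).
charging = [(0.0, 1000.0), (30.0, 500.0), (60.0, 500.0), (90.0, 500.0)]
# One extra update caused by a neighbor's noisy reuse timer (also hypothetical).
with_secondary = charging + [(1500.0, 500.0)]
```

Comparing `final_reuse_time(charging)` with `final_reuse_time(with_secondary)` shows the extra update arriving while the route is still suppressed, which pushes the release time later even though no new flap occurred at the origin.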
4.3 Muffling Effect
For a single or a small number of route flaps, ispAS
does not suppress or delay any update. After a number
of flaps, however, route suppression will be triggered at is-
pAS, and further flaps will be blocked from entering the
network. Since the link (originAS,ispAS) is suppressed,
ispAS has no route to reach originAS. As a result, ispAS
sends a route withdrawal to all its peers, which is then propagated
throughout the network. Note that when a router
receives a withdrawal message, it removes the route and in-
creases the penalty value for that route. When a remote
router’s reuse timer expires, it will find no route to the orig-
inAS, thus cannot trigger any update. Any reuse timer that
expires before ispAS reuses link (originAS,ispAS) will
be silent. We call this effect the Muffling effect. The muf-
fling effect is removed after ispAS reuses its route to the
originAS and sends an announcement to the network.
4.4 Overall Damping Behavior
The above discussion shows that there are two types of
reuse timer interaction: secondary charging prolongs convergence
time, while muffling by ispAS’ reuse timer renders
other reuse timer expirations silent. These two types of timer interactions
compete with each other, and the net result depends on the
number of flaps sent by originAS.
Let $RT_h$ be ispAS’ reuse timer, and $RT_{net}$ be the last
noisy reuse timer in the rest of the network. Initially $RT_h$ is
zero as route suppression is not triggered at ispAS. But once
it is triggered, any further flaps from originAS will increase
$RT_h$ only and have no effect on $RT_{net}$ at all. As the number
of flaps increases, a critical point ($N_h$) is reached when
$$RT_h \ge RT_{net}$$
That is, when the number of flaps is greater than a certain
number $N_h$, $RT_h$ will outlast all noisy reuse timers in the
network, making the muffling effect dominant. When $RT_h$
expires, it is the only reuse timer in the entire network, and
there will be no secondary charging at all. The convergence
delay is then determined by $RT_h$ alone, which
brings the convergence time in line with the intended behavior,
as we described in Section 3. The overall results can be
summarized as follows:
• After a small number of route flaps, due to path exploration
and secondary charging, a network with damping
can have a longer convergence time than intended.
• When the number of flaps is greater than a certain
number, due to the muffling effect, a system with damping
follows the intended behavior.
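The critical number of flaps can be sketched numerically under strong simplifying assumptions: flaps arrive at a fixed interval, each adds the same penalty at ispAS, the maximum hold-down time is ignored, and the last noisy reuse timer in the rest of the network is given as a fixed number of seconds after the final flap. The function names and all parameter values are hypothetical.

```python
import math
from typing import Optional


def ispas_reuse_wait(k: int, interval_s: float = 60.0, increment: float = 1000.0,
                     cutoff: float = 2000.0, reuse: float = 750.0,
                     half_life_s: float = 900.0) -> float:
    """Seconds ispAS waits after the k-th flap before reusing the route."""
    lam = math.log(2) / half_life_s
    p, suppressed = 0.0, False
    for i in range(k):
        decayed = p * math.exp(-lam * interval_s) if i > 0 else 0.0
        if suppressed and decayed < reuse:
            suppressed = False  # route was reused between two flaps
        p = decayed + increment
        if p > cutoff:
            suppressed = True
    return math.log(p / reuse) / lam if suppressed else 0.0


def critical_flaps(rt_net_s: float, max_flaps: int = 50, **params) -> Optional[int]:
    """Smallest flap count at which ispAS' reuse wait reaches rt_net_s, if any."""
    for k in range(1, max_flaps + 1):
        if ispas_reuse_wait(k, **params) >= rt_net_s:
            return k
    return None
```

Because the penalty at ispAS saturates when flaps are evenly spaced, `critical_flaps` returns `None` if the assumed network timer is longer than ispAS' reuse wait can ever become under this model.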
Predicting the actual damping behavior in a network is
difficult. It depends on the degree of path exploration, tim-
ing of updates, and order of reuse timer expirations. There-