
An Algorithm for Distributed Computation of a Spanning Tree in an Extended LAN


A protocol and algorithm are given in which bridges in an extended Local Area Network of arbitrary topology compute, in a distributed fashion, an acyclic spanning subset of the network. The algorithm converges in time proportional to the diameter of the extended LAN, and requires a very small amount of memory per bridge, and communications bandwidth per LAN, independent of the total number of bridges or the total number of links in the network.

Algorhyme

I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree which must be sure to span
So packets can reach every LAN.
First the Root must be selected
By ID it is elected.
Least cost paths from Root are traced.
In the tree these paths are placed.
A mesh is made by folks like me
Then bridges find a spanning tree.
Radia Perlman
Digital Equipment Corporation
1925 Andover St., Tewksbury MA 01876
Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct
commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery.
To copy
otherwise, or to republish, requires a fee and/or specific permission.
Local area networks are limited in
geography, traffic, and number of
stations. A single local area network
will often not meet the needs of an
organization for these reasons.
Conventional LAN interconnection
mechanisms, for instance those of
Sytek [2] and IBM [3], require
cooperation from the stations, with
compatible protocols layered above the
protocol necessary to connect to a single
LAN.
An approach that is transparent to
stations, and thus allows a station to
participate in an extended LAN with no
modification, is presented in [6], [7], and
[8]. In this approach, a bridge connected
to two or more links will listen
"promiscuously" to all packets transmitted
on each of its links, and forward packets
received on one of the links onto the
others. A bridge also learns of the
location of stations relative to itself,
so that it will not forward traffic for a
station onto a link unnecessarily.
This approach assumes that the topology is
a tree (loop-free). However, requiring a
topology to be loop-free means there are
no backup paths in the case of bridge or
LAN failures. Also, because the
technology allows network growth so
easily, it might be difficult to prevent
someone from adding a bridge and creating
a loop. A loop in the topology might
cause performance degradation in
the entire extended network due to
congestion caused by infinitely
circulating packets. It is undesirable to
have a network that can be brought down so
easily, merely by plugging a cable into
the wrong place.
© 1985 ACM 0-89791-164-4/85/0009/0044 $00.75
Thus we have designed an algorithm that
allows the extended network to consist of
an arbitrary topology. The algorithm is
run by the bridges, and computes a subset
of the topology that connects all LANs yet
is loop-free (a spanning tree). The
algorithm is self-configuring. The only a
priori information necessary in a bridge
is its own unique ID (MAC address), which
we are assuming can be attained in some
manner, for instance with a hardware ROM
containing the value.
1. station -- a node in the extended LAN
that does not forward packets. It is
connected to the extended LAN solely
for the purpose of communicating with
other stations. It does not
participate in the spanning tree
algorithm.
2. bridge -- a node connected to two or
more LANs, for the purpose of
forwarding packets between the LANs.
3. link -- a connection from a bridge to
a single LAN.
4. extended LAN -- the collection of
LANs connected by bridges.
5. Root -- the bridge chosen to be the
root of the tree to be formed. Note
that there is exactly one Root in the
network, chosen dynamically.
6. Designated Bridge -- the bridge on a
LAN closest to the Root, with a
tie-breaker to enforce uniqueness.
Thus there is exactly one Designated
Bridge per LAN in the network, chosen
dynamically.
Goals of the Spanning Tree Algorithm
. Allow interconnection of LANs
compatibly with LAN standards, so
that stations need not know whether
they are attached to a single LAN or
to a network of LANs.
. Allow redundant bridges and LANs in
an extended LAN, for instance so that
connectivity can be preserved after
bridge or LAN failures.
. Be self-configuring, so that each
bridge need only know its own ID and
a well-known generic group address.
. Memory requirements of a bridge
should not grow with the number of
LANs or bridges, so that a network
cannot be misconfigured.
. The communications bandwidth consumed
by the algorithm on any particular
LAN should be a constant (and small)
amount, regardless of the total
number of bridges or LANs in the net.
. The algorithm should stabilize as
quickly as possible to a
deterministic spanning tree (no
loops, complete connectivity) in any
legal sized network. Assuming no lost
messages, it should stabilize in some
small multiple of the round trip
delay across the network.
. The algorithm should allow bridges to
keep caches of station membership (a
forwarding data base), to cut down on
unnecessary traffic.
. No permanent loops should ever form.
If no messages are lost, no transient
loops should form (in any legal sized
network). The algorithm should
minimize the probability of transient
loops, given the possibility of lost
messages and the possibility of a
network larger than designed for.
. Similarly, transient behavior should
not cycle a link (turn it off, then
back on again) in a legal sized
network if no messages are lost.
. The algorithm should be
self-stabilizing, once any
malfunctioning equipment is repaired
or disconnected.
. The algorithm should break loops even
if some of the bridges in the loop do
not implement the algorithm. (Of
course it cannot break a loop in
which none of the bridges implement
the algorithm.)
. The algorithm should require only a
connectionless service (LLC type 1)
for any LANs on which it is run.
. The algorithm should allow tuning of
the configuration by adjustment of
parameters. This tuning of the
configuration may help performance,
but the algorithm should still work
acceptably with default values.
Link States
Each bridge has two or more links. For
each LAN to which a bridge is attached,
the bridge computes a state for that link.
In all states, the algorithm continues to
run. The only difference between these
states is whether data packets are
forwarded to and from the LAN.
. FORWARDING -- Forward data packets.
. BACKUP -- Do not forward data
packets.
. PRE-FORWARDING -- Do not forward
data packets yet; however, unless an
event reverts the link to BACKUP,
the state will shortly become
FORWARDING.
. PRE-BACKUP -- Continue forwarding
data packets; however, unless an
event reverts the link to
FORWARDING, the state will
shortly become BACKUP.
HELLO Messages
The Designated Bridge on each LAN
periodically transmits a HELLO Message,
with period HELLO-TIME, broadcast to the
well-known group address "all bridges".
The contents of the HELLO Message are:
1. Transmitting bridge ID
2. ID of bridge assumed to be Root
3. Length of best known path to Root
4. Age of the HELLO (time since
information from the Root has
propagated on this path)
5. Link identifier (of local
significance to the transmitting
bridge)
6. MAX-AGE (a parameter, passed in
HELLOs so that the network will
agree on timer values)
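The six fields listed above can be pictured as a small record. The following is a minimal sketch, not the paper's wire format: the class name `Hello`, the field names, and the helper `root_hello` are hypothetical, and IDs are modelled as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    """Illustrative in-memory form of a HELLO Message (names are assumptions)."""
    transmitter_id: int  # 1. transmitting bridge ID
    root_id: int         # 2. ID of bridge assumed to be Root
    distance: int        # 3. length of best known path to Root
    age: float           # 4. time since the Root's information propagated on this path
    link_id: int         # 5. link identifier, of local significance to the transmitter
    max_age: float       # 6. MAX-AGE, so the network agrees on timer values


def root_hello(bridge_id: int, link_id: int, max_age: float) -> Hello:
    """HELLO sent by a bridge that currently believes it is the Root."""
    # A Root names itself as Root, is at distance 0, and originates age 0.
    return Hello(bridge_id, bridge_id, 0, 0.0, link_id, max_age)
```

For instance, `root_hello(7, 1, 20.0)` describes bridge 7 announcing itself as Root on its local link 1.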
If the Root or a Designated Bridge along
the path to the Root fails, some time
threshold will be exceeded in which no
HELLO messages were received on that path.
This is detected by keeping track of the
age of the HELLO message. When the age
exceeds MAX-AGE, the information is
discarded, triggering bridges to recompute
a new Root or new path.
The AGE field is included in HELLO
messages so that a bridge other than the
Root can initiate sending a HELLO message,
without making the Root's last HELLO seem
more recent than it should be. When a
HELLO is initiated by the Root, the age
will be 0. As the information about the
HELLO is held in memory, the age is
increased. If a Designated Bridge (other
than the Root) initiates transmission of a
HELLO, the age field in the transmitted
HELLO is filled in to be the age of the
stored HELLO. When a HELLO is received,
the age is set to the received value
initially, and then increased as the
information is held.
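The age bookkeeping described above can be sketched as follows. This is an illustration under assumptions: the class `StoredHello`, its fields, and the use of a local clock reading passed in as `now` are not from the paper.

```python
from dataclasses import dataclass


@dataclass
class StoredHello:
    """HELLO information held in a bridge's memory (hypothetical structure)."""
    root_id: int
    received_age: float  # age field carried in the received HELLO
    received_at: float   # local clock reading when the HELLO arrived

    def age(self, now: float) -> float:
        # The age starts at the received value and grows while the
        # information is held in memory.
        return self.received_age + (now - self.received_at)

    def expired(self, now: float, max_age: float) -> bool:
        # Information older than MAX-AGE is discarded.
        return self.age(now) > max_age
```

A Designated Bridge that initiates a HELLO itself would copy `age(now)` into the outgoing age field, so the Root's last HELLO never appears more recent than it is.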
The Algorithm
As the algorithm is described here,
primary/secondary status is ignored. In
the section "Extensions" the modifications
necessary to provide such designation are
described.
Electing the Root and Designated Bridges
The Root is the bridge in the network with
the smallest ID. When a bridge receives a
HELLO Message, it compares the ID of the
Root in the HELLO Message with the
currently best known Root. If the Root in
the HELLO message has a lower ID than the
currently known Root, then information
about the current Root is discarded and
overwritten by information about the newly
discovered Root.
The Designated Bridge on a LAN is the
bridge with the shortest path to the Root,
on that LAN. In case of ties, the bridge
with the smallest ID is the Designated
Bridge on that LAN. The Root is distance
0 from itself, and claims 0 as the length
of its path to the Root in its transmitted
HELLO Messages. Other bridges compute
their distance from the Root as 1 greater
than the minimum distance received from a
HELLO Message from the Designated Bridge
on any attached LAN.
A bridge B initially attempts to become
Root (and therefore Designated Bridge on
each of its LANs). When a Root with a
lower ID is discovered in a received HELLO
message, B then computes its distance from
that Root, and attempts to become
Designated Bridge on each of its links.
If a HELLO Message is received on a link
from a bridge that is closer to the Root
than B, or equally close with a
transmitting bridge's ID lower than B's
(as a tie breaker to ensure a unique
Designated Bridge per LAN), B defers to
the other bridge and stops sending HELLO
messages on that link.
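The election rules above amount to an ordered comparison: lower Root ID first, then shorter distance to the Root, then lower transmitting bridge ID. A minimal sketch, assuming IDs and distances are integers and representing a heard HELLO as a `(root_id, distance, transmitter_id)` tuple (the function name and tuple layout are illustrative choices):

```python
from typing import Tuple

HelloKey = Tuple[int, int, int]  # (root_id, distance, transmitter_id)


def should_defer(own_id: int, own_root: int, own_distance: int,
                 heard: HelloKey) -> bool:
    """True if bridge B should stop sending HELLOs on the link where
    `heard` arrived, because the sender has the better claim."""
    # Python compares tuples left to right, which matches the rule:
    # lower Root ID wins, then smaller distance, then smaller bridge ID.
    return heard < (own_root, own_distance, own_id)
```

With B having ID 5, Root 1 and distance 2, B defers to `(1, 1, 9)` (closer to the Root) and to `(1, 2, 3)` (equally close, lower ID), but not to `(1, 2, 9)`.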
The Root is Designated Bridge on each of
its LANs, since it is the only bridge
which is distance 0 from the Root. Any other
bridge B is some distance K from the Root.
B derived its distance K by receiving a
HELLO from the Designated Bridge on one of
its links containing HOPS of K-1. Thus
bridges other than the Root can not be
Designated Bridge on all of their LANs --
one of the LANs has to contain a bridge
closer to the Root.
Transmission of HELLO Messages
The Root periodically broadcasts a HELLO
on each of its links, with period
HELLO-TIME. Other bridges only broadcast
a HELLO on a link if they are Designated
Bridge on that LAN. Since there is
exactly one Designated Bridge on a LAN,
exactly one bridge on each LAN will
periodically broadcast a HELLO message.
The HELLO packet has an age in it,
indicating the age of the information
contained therein. Usually a Designated
Bridge will transmit a HELLO only upon
receipt of a HELLO on the link towards the
Root, with a younger age. However, a
Designated Bridge will also transmit a
HELLO on a particular link upon receipt of
a HELLO from a different bridge on that
LAN that should not be Designated Bridge.
If the age field exceeds the maximum, the
information is discarded. In this way
failure of the Root or Designated Bridge
will be detected, and the other bridges
will be triggered to compute a new Root or
a new path to the Root.
Link States and Databases
Each bridge, A, keeps a per-link database
for each of its links, of the state of that
link, the information from the latest
HELLO Message from the Designated Bridge
on that LAN, whether A should send a HELLO
on that link, and a timer, activated in
the PRE-FORWARDING and PRE-BACKUP states,
whose expiration will move the link from
PRE-FORWARDING to FORWARDING state, or
from PRE-BACKUP to BACKUP state.
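The per-link database just described might be laid out as below. This is a sketch under assumptions: the names `LinkState`, `LinkRecord`, and `tick`, and the representation of the timer as seconds remaining, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class LinkState(Enum):
    FORWARDING = "FORWARDING"
    BACKUP = "BACKUP"
    PRE_FORWARDING = "PRE-FORWARDING"
    PRE_BACKUP = "PRE-BACKUP"


@dataclass
class LinkRecord:
    """One entry of the per-link database (illustrative layout)."""
    state: LinkState
    # Latest HELLO from the Designated Bridge on this LAN, kept here as
    # (root_id, distance, transmitter_id); None until one is stored.
    designated_hello: Optional[Tuple[int, int, int]] = None
    send_hello: bool = False        # should this bridge send HELLOs on the link?
    timer: Optional[float] = None   # seconds remaining in a PRE-* state

    def tick(self, elapsed: float) -> None:
        """Advance the timer; on expiry complete the pending transition."""
        if self.timer is None:
            return
        self.timer -= elapsed
        if self.timer <= 0:
            if self.state is LinkState.PRE_FORWARDING:
                self.state = LinkState.FORWARDING
            elif self.state is LinkState.PRE_BACKUP:
                self.state = LinkState.BACKUP
            self.timer = None
```

Memory use is one such record per attached link, which is the property the paper relies on when it argues that memory is independent of network size.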
State Transitions
Upon startup, bridge B initializes each
link to be in the PRE FORWARDING state,
with the link database for each link
initialized to claim B as the Root, B as
the Designated Bridge on that link, B's
distance from the Root to be 0, and the
The events that cause the algorithm to run
are receipt of a HELLO Message, and a
timer pulse. If a HELLO Message is
received that supersedes what is stored in
the link database, the information in the
link database is overwritten. The timer
pulse triggers the Root to transmit a new
HELLO message (if HELLO-TIME has
transpired since the last HELLO),
triggers each bridge to update the age of
the information in the link database for
each link (causing the information to be
discarded if MAX-AGE is exceeded), and
triggers links in the PRE-FORWARDING or
PRE-BACKUP states to change state (if
the delay time has transpired).
When an event causes information in the
link database to change (either receipt of
a different and superior HELLO message, or
expiration of the stored HELLO
information), bridge B recomputes its
distance from the Root, and examines each
link to determine if B should take over as
Designated Bridge. If all information
about the current Root has expired, then B
will attempt to become the new Root, and
therefore Designated Bridge on each of its
links.
If B is not the Root, it selects a single
link which is closest to the Root. If
more than one link is equally closest to
the Root, B chooses one arbitrarily. B
attempts to make the state of that link
FORWARDING as follows:
. If the link is in FORWARDING or
PRE-BACKUP, the state of the link
is changed immediately to
FORWARDING.
. If the link is in PRE-FORWARDING,
nothing is done.
. If the link is in BACKUP, the
state is changed to
PRE-FORWARDING, and the timer is
set to PRE-FORWARDING-DELAY.
Likewise, for each link for which B is
Designated Bridge, B attempts to make the
state of that link FORWARDING.
For each other link (links other than the
one closest to the Root or for which B is
Designated Bridge), B attempts to make the
state BACKUP as follows:
. If the link is in BACKUP or
PRE-FORWARDING, the state of the
link is changed immediately to
BACKUP.
. If the link is in PRE-BACKUP,
nothing is done.
. If the link is in FORWARDING, the
state is changed to PRE-BACKUP,
and the timer is set to
PRE-BACKUP-DELAY.
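The two sets of rules above are symmetric and can be written as a pair of small functions. This is a sketch, not the paper's code: the function names are hypothetical, and each function returns the new state together with a timer value to start, or `None` when no new timer is started (a timer already running in a PRE state simply continues).

```python
from enum import Enum
from typing import Optional, Tuple


class LinkState(Enum):
    FORWARDING = "FORWARDING"
    BACKUP = "BACKUP"
    PRE_FORWARDING = "PRE-FORWARDING"
    PRE_BACKUP = "PRE-BACKUP"


Transition = Tuple[LinkState, Optional[float]]


def toward_forwarding(state: LinkState, pre_forwarding_delay: float) -> Transition:
    """Apply the 'attempt to make the link FORWARDING' rules."""
    if state in (LinkState.FORWARDING, LinkState.PRE_BACKUP):
        return LinkState.FORWARDING, None      # immediate
    if state is LinkState.PRE_FORWARDING:
        return state, None                     # nothing is done
    return LinkState.PRE_FORWARDING, pre_forwarding_delay  # from BACKUP


def toward_backup(state: LinkState, pre_backup_delay: float) -> Transition:
    """Apply the 'attempt to make the link BACKUP' rules."""
    if state in (LinkState.BACKUP, LinkState.PRE_FORWARDING):
        return LinkState.BACKUP, None          # immediate
    if state is LinkState.PRE_BACKUP:
        return state, None                     # nothing is done
    return LinkState.PRE_BACKUP, pre_backup_delay  # from FORWARDING
```

Note the asymmetry this encodes: a link that was never forwarding can be switched off at once, and a link that is still forwarding can be confirmed at once, so only genuine changes of forwarding behavior are delayed.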
Handling Data Traffic
Bridges maintain a forwarding data base of
stations. An entry in the forwarding data
base consists of a station address,
together with the link which is in the
direction from which traffic from that
station was seen.
Assuming for the moment that all links are
in FORWARDING state, the use of the
forwarding data base is as follows. The
forwarding data base is constructed based
on the source field of packets. Traffic
is forwarded based on the destination
field. If a packet is received on link L,
with destination D and source S, then an
entry is made in the forwarding data base
that S resides in the direction of link L.
If no entry for D exists, then the packet
is forwarded onto all links except
link L. If an entry for D exists, with
link L, then the packet is not forwarded.
If an entry for D exists with some other
link Q, then the packet is forwarded onto
Q.
But links can be in other states than
FORWARDING. The rules are as follows:
. Data traffic is ignored when
received from links in the BACKUP
state.
. Data traffic is examined, for the
purpose of updating the forwarding
data base, when received from links
in the other states.
. Data traffic is forwarded to and
from links in the FORWARDING and
PRE-BACKUP states.
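Combining the forwarding data base with the link-state rules gives a compact packet handler. The sketch below is illustrative only: `handle_packet`, the dictionary-based forwarding data base, and the integer link numbers are assumptions made for the example.

```python
from enum import Enum
from typing import Dict, List


class LinkState(Enum):
    FORWARDING = "FORWARDING"
    BACKUP = "BACKUP"
    PRE_FORWARDING = "PRE-FORWARDING"
    PRE_BACKUP = "PRE-BACKUP"


# States in which data packets are forwarded to and from the link.
RELAYING = {LinkState.FORWARDING, LinkState.PRE_BACKUP}


def handle_packet(fdb: Dict[str, int], states: Dict[int, LinkState],
                  in_link: int, src: str, dst: str) -> List[int]:
    """Update the forwarding data base and return the links to forward on."""
    if states[in_link] is LinkState.BACKUP:
        return []                       # traffic from BACKUP links is ignored
    fdb[src] = in_link                  # learn: S lies in the direction of L
    if states[in_link] not in RELAYING:
        return []                       # PRE-FORWARDING: learn but do not forward
    out = fdb.get(dst)
    if out == in_link:
        return []                       # destination is on the arrival side
    if out is None:
        candidates = [l for l in states if l != in_link]  # unknown: all but L
    else:
        candidates = [out]
    return [l for l in candidates if states[l] in RELAYING]
```

With links 1 and 2 FORWARDING and link 3 in BACKUP, a packet from S to an unknown D arriving on link 1 is sent only on link 2, and S is recorded against link 1.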
Interaction with Simple Bridges
Simple bridges are ones that do not
participate in this algorithm, but pass
HELLO messages through transparently.
Thus LANs connected with simple bridges
will look to this algorithm like a single
LAN.
participating in the algorithm in every
loop, the algorithm will not even detect
the simple bridges. However, if there is
only one bridge in a loop that
participates in the algorithm, the effect
is that it will look to that bridge as
though it has multiple attachments to the
same LAN. This topology will be detected
by the algorithm because a bridge will
receive its own HELLO message back on a
different link.
The HELLO message should include a link
number, of local significance to the
Designated Bridge. If the Designated
Bridge B receives a HELLO from itself on
link L1, that claims to have been
transmitted on L2, then B should place the
link with the larger link number into
PRE-FORWARDING state, with timer
PRE-FORWARDING-DELAY. Each time such a
HELLO is received, the timer should be
reset. If multiple links are involved in
the loop, then only one of them will
remain in the FORWARDING state; the
others will remain in the PRE-FORWARDING
state, and the loop will not exist for
data traffic.
Timer Values
To discuss the appropriate timer settings
below, let us define the following:
. MaxDelay == the maximum one-way
delay across the extended LAN
. MaxLostMsgs == the maximum number
of consecutive messages that
(within an acceptable
probability) can get lost
. MaxPropTime == MaxDelay +
(HELLO-TIME * MaxLostMsgs)
In state PRE-BACKUP, if newer information
arrives suggesting the link should be in
FORWARDING state, the link is switched to
state FORWARDING without delay.
MAX-AGE -- MAX-AGE is a parameter that
should be greater than or equal to
MaxPropTime. If the age of a received
HELLO message exceeds MAX-AGE, the
information is discarded, and the bridge
which discarded the information will
attempt to take over as Designated Bridge,
based on a different path towards the
Root, if a HELLO Message that has not
timed out is stored for some other link.
If all information about the current Root
has timed out and been discarded, the
bridge will attempt to take over as the
new Root, until and unless it hears about
a better Root in a subsequent received
HELLO message.
The reason MAX-AGE must be at least
MaxPropTime is so that a Root that is
currently operational will not be
mistakenly timed out. The Root can
transmit one HELLO that can make it
through the extended LAN virtually
immediately. (Technically, the Root's
HELLO does not propagate through the net,
but is received only by the bridges
attached to the same LANs as the Root.
However, those neighbor bridges transmit
HELLOs as a result of receiving a HELLO,
so the effect is similar.) The next few
HELLOS can get lost (up to MaxLostMsgs).
Then the next HELLO might encounter
maximum delays, and take the MaxDelay time
to reach the periphery of the net.
If MAX AGE were shorter, then the bridges
on the periphery would have timed out the
Root before the second HELLO were
received, in worst case behavior.
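The MAX-AGE requirement can be checked with simple arithmetic. The sketch below uses the paper's definition MaxPropTime = MaxDelay + (HELLO-TIME * MaxLostMsgs); the function names and the sample numbers in the usage note are hypothetical.

```python
def max_prop_time(max_delay: float, hello_time: float, max_lost_msgs: int) -> float:
    """MaxPropTime = MaxDelay + (HELLO-TIME * MaxLostMsgs)."""
    return max_delay + hello_time * max_lost_msgs


def max_age_is_safe(max_age: float, max_delay: float,
                    hello_time: float, max_lost_msgs: int) -> bool:
    """True if an operational Root cannot be timed out by mistake,
    i.e. MAX-AGE >= MaxPropTime."""
    return max_age >= max_prop_time(max_delay, hello_time, max_lost_msgs)
```

For example, with an assumed MaxDelay of 2 time units, HELLO-TIME of 1, and up to 3 consecutive lost messages, MaxPropTime is 5, so a MAX-AGE of 5 or more satisfies the inequality and a MAX-AGE of 4 does not.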
PRE-BACKUP-DELAY should be greater than or
equal to MAX-AGE + MaxPropTime.
The reason it needs to be that large is so
that sufficient time will have elapsed so
that all bridges have received information
about the current topology before any
bridges stop forwarding.
The worst case is when the Root fails. As
described above, it can take a MaxPropTime
from the time the final HELLO was
transmitted by the Root to the time it is
received on the periphery of the net.
Bridges near the old Root (in the worst
case) will have timed out the Root
MaxPropTime before bridges on the
periphery. MAX-AGE after receipt of the
last HELLO, a bridge will have timed out
the Root. If the bridge destined to be
the new Root is maximally far from the old
Root, it will issue its first HELLO as the
Root MAX-AGE + MaxPropTime after
transmission of the final HELLO by the old
Root. If the bridge destined to be the
new Root is maximally far from the old
Root, news of the new Root can take
MaxPropTime to reach the rest of the net.
Thus following the failure of the old
Root, it can take MAX-AGE + 2 *
MaxPropTime for news of the new Root to
reach all portions of the net.
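The worst-case timeline in the preceding paragraph can be written out as arithmetic. This is a small sketch; the function name and the sample values in the usage note are assumptions, and times are measured from the old Root's final HELLO.

```python
from typing import Tuple


def root_failover_bound(max_age: float, max_prop_time: float) -> Tuple[float, float]:
    """Worst-case times, measured from the old Root's final HELLO.

    Returns (time of the new Root's first HELLO,
             time by which news of the new Root has reached the whole net).
    """
    # The final HELLO may need MaxPropTime to reach the farthest bridge,
    # which then needs MAX-AGE to time the old Root out.
    first_new_root_hello = max_age + max_prop_time
    # News of the new Root may need another MaxPropTime to cross the net.
    news_everywhere = first_new_root_hello + max_prop_time
    return first_new_root_hello, news_everywhere
```

With an assumed MAX-AGE of 20 and MaxPropTime of 5, the new Root first speaks at 25 and the whole net has heard of it by 30, matching MAX-AGE + 2 * MaxPropTime.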
Thus to be completely safe, bridges should
wait 2 * MaxPropTime after timing out the
Root to ensure they will be making
decisions based on the new Root. The
delay should therefore be greater than or
equal to 2 * MaxPropTime.
For simplicity of parameter setting,
PRE-BACKUP-DELAY -- To prevent transient
partitions, a bridge keeps a link in state
PRE-BACKUP for a time PRE-BACKUP-DELAY
before switching the link from state
FORWARDING to state BACKUP.
PRE-FORWARDING-DELAY -- To prevent
transient loops, a bridge keeps a link in
state PRE-FORWARDING for a time
PRE-FORWARDING-DELAY before switching the
link from state BACKUP to state
FORWARDING. While in state PRE-FORWARDING,
if newer information arrives suggesting
the link should be in BACKUP state, the
link is switched to state BACKUP without
delay.
Since loops are far more serious than
temporary partitions, it is highly
desirable that after a topology change all
bridges that should not be forwarding in
the new topology will have stopped
forwarding before any bridges start
forwarding.
News of the new topology will have spread
to all bridges within a window of
MaxPropTime. Thus MaxPropTime after
PRE-BACKUP-DELAY, all links that will be
turning off in the new topology will have
turned off. At this point it is safe to
start turning links on. Thus
PRE-FORWARDING-DELAY should be greater
than or equal to PRE-BACKUP-DELAY +
MaxPropTime.
For simplicity of parameter setting,
Setting Parameters
A network management facility, such as
discussed in [8], allows parameters to be
set at a node.
To allow tuning of a particular
configuration, parameters HELLO-TIME and
MAX-AGE are network management settable
(the other timers are derived from
MAX-AGE).
HELLO-TIME can differ from bridge to
bridge, provided that the Root's
HELLO-TIME not be so large as to violate
the inequality that MAX-AGE be greater
than or equal to MaxPropTime. Thus
HELLO-TIME can be set independently at
each bridge, with an appropriate default
so that performance would be adequate
without the necessity of setting this
parameter.
However, the network must agree on
MAX-AGE. To ensure consistency, MAX-AGE
(from which the other parameters are
derived) is passed in the HELLO messages
and the value passed by the Root is used.
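Since the other timers are derived from MAX-AGE, one way to picture the derivation is sketched below. The multipliers 2 and 3 are this example's own choice, not values confirmed by the text; they are picked only because, whenever MAX-AGE >= MaxPropTime, they satisfy the lower bounds PRE-BACKUP-DELAY >= MAX-AGE + MaxPropTime and (as this sketch reads the Timer Values section) PRE-FORWARDING-DELAY >= PRE-BACKUP-DELAY + MaxPropTime. All function and key names are hypothetical.

```python
from typing import Dict


def derive_timers(max_age: float) -> Dict[str, float]:
    """Derive the delay timers from the MAX-AGE value passed by the Root.

    The factors 2 and 3 are assumptions made for this sketch.
    """
    return {
        "pre_backup_delay": 2 * max_age,
        "pre_forwarding_delay": 3 * max_age,
    }


def satisfies_bounds(timers: Dict[str, float], max_age: float,
                     max_prop_time: float) -> bool:
    """Check the timer inequalities against a given MaxPropTime."""
    return (
        max_age >= max_prop_time
        and timers["pre_backup_delay"] >= max_age + max_prop_time
        and timers["pre_forwarding_delay"]
        >= timers["pre_backup_delay"] + max_prop_time
    )
```

Because every bridge applies the same derivation to the same Root-supplied MAX-AGE, all bridges end up with consistent delays without any further configuration.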
Exactly one Root in the net
Suppose there are two Roots, A and B, with
A having the numerically smaller ID. If
there is physical connectivity between A
and B, there will be some LAN with a path
to both A and B. On that LAN, as soon as
a bridge issues a Hello with information
about A, the other bridges on that LAN
will overwrite their information about B.
This will propagate eventually to B, which
will stop attempting to be Root.
Exactly one Designated Bridge per LAN
Each bridge has a unique ID. Of the set
of bridges on a LAN that have the smallest
hop count from the Root, only one of them
can have the smallest ID. Thus there is a
unique Designated Bridge per LAN.
A bridge will attempt to become Designated
Bridge on a LAN unless a different bridge
which has a "better" HELLO message (closer
to the Root, or equally close, with lower
ID) transmits.
Thus if any bridges are attached to a LAN,
one of them will become Designated Bridge.
All LANs are included in the formed graph
If there is physical connectivity between
a LAN and the bridge that is Root, then
news of the Root will have spread to that
LAN, and the Designated Bridge will be
chosen because of its path to the Root. A
bridge which is Designated Bridge on a LAN
will be in the formed graph, and will
include that LAN in the formed graph.
Since every LAN has a Designated Bridge,
every LAN (to which there is physical
connectivity from the Root) is included in
the formed graph.
Amount of Memory
The amount of memory required for finding
an acyclic spanning subset of the topology
is just the amount needed for the per link
database.
To keep the IDs of the Root and the
Designated Bridge requires theoretically
the log of the number of nodes, but in
practice the fixed value of 48 bits for
each is sufficient.
To keep the distance from the Root
requires the log of the network diameter,
but in practice a fixed value of 16 bits
or even 8 bits would suffice.
The other pieces of information are of
fixed length.
Bridge B must keep this information for
each link attached to B, not each link in
the network.
Communications Bandwidth
Maintenance of the algorithm when topology
has not recently changed requires one
HELLO per HELLO TIME to be transmitted on
each LAN by that LAN's Designated Bridge.
In the absence of topology changes, only
the Designated Bridge on a LAN will
transmit HELLOS, and the only event to
cause that bridge to transmit a HELLO is
receipt of a HELLO
on its upstream
LAN (or for the Root, the periodic
HELLO-TIME trigger).
Thus regardless of the total size of the
network, only one control message will be
sent each HELLO TIME on each LAN, unless
the topology changes.
Following a topology change, a few extra
HELLO messages might be transmitted, for
instance by a bridge that has just come
up, or by all bridges on a LAN following
information having timed out.
Additionally, the Designated Bridge issues
a HELLO in response to a "worse" HELLO, to
more quickly synchronize bridges that have
just come up. However, there is no need
to do this more frequently than
HELLO TIME, so a hold down is instituted
to prevent a bridge from transmitting more
than one HELLO per HELLO TIME. Thus the
maximum amount of control traffic on a LAN
even during topological changes is limited
by the total number of bridges on that
LAN, not by the total number of bridges in
the network.
Note that the overhead of the algorithm is
independent of the characteristics of the
stations' data traffic (e.g. factors such
as frequent connection establishment).
The formed graph is a Tree
The properties of a tree are:
1. It has a unique root.
2. Each node other than the root has
a unique predecessor closer to
the root.
Analysis of the algorithm is simplified by
assuming that the formed graph is
bipartite, i.e. that there are two types
of nodes:
1. Bridges, and
2. LANs.
The ancestor node of a Bridge is the LAN
in the direction of the Root. The
ancestor node of a LAN is the Designated
Bridge. Thus all nodes have a unique
ancestor. Likewise, there is a unique
Root, which is the bridge with the lowest
ID.
Deterministic Behavior
Given a particular physical topology,
behavior of the algorithm is completely
predictable. The routes chosen do not
depend on the order in which messages were
received, or on the order in which bridges
were brought up, or on the history of
previous topologies.
There are many advantages to deterministic
behavior:
1. reproducibility -- It is easier
to maintain a network if
conditions are reproducible. If
routes depended on chance
occurrences such as the order in
which bridges were brought up, or
the order in which messages were
received, then it is difficult to
diagnose problems, since behavior
is not reproducible.
2. configurability -- It is easier
to configure a network if
behavior of a particular topology
can be calculated.
3. predictability -- Users will be
more satisfied with a particular
level of performance, if the
same conditions always produce the
same level of performance.
However, if sometimes performance
is noticeably better, due to
chance occurrences, users will
not be satisfied with the usual
level of performance. Similarly
if sometimes performance is
noticeably worse, due to chance
occurrences under what appears to
the user as identical conditions,
the network will not be viewed as
reliable.
Extensions
It might be desirable to influence the
topology that is computed by the spanning
tree algorithm. One facility that might
be desirable is the ability to designate
some bridges as "primary" bridges, and
others as "secondary" bridges. The
algorithm should compute a topology in
which no secondary bridges appear, unless
no topology exists consisting solely of
primary bridges. Also, it might be
desirable to minimize through-traffic on
some LANs, for instance those that might
be lower speed or more congested with
local traffic. Also, it might be
desirable to configure the network so that
changes will have minimal impact.
All of these facilities are trivial
extensions to the algorithm. These
facilities can be provided as follows.
The bridge ID should actually consist of a
priority field, settable via network
management, concatenated as the most
significant byte onto the unique 48 bit ID
assigned by the hardware.
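The concatenation described above can be sketched with integer arithmetic. This is an illustration under assumptions: the function name `bridge_id` is hypothetical, the priority is taken to be exactly one byte, and IDs are compared as plain integers with the smallest winning the Root election.

```python
def bridge_id(priority: int, hardware_id: int) -> int:
    """Concatenate a one-byte priority, as the most significant byte,
    onto the unique 48 bit hardware ID."""
    if not 0 <= priority <= 0xFF:
        raise ValueError("priority must fit in one byte")
    if not 0 <= hardware_id < (1 << 48):
        raise ValueError("hardware ID must fit in 48 bits")
    return (priority << 48) | hardware_id
```

Because the priority occupies the most significant byte, a bridge with priority 0 always has a smaller ID, and so is preferred as Root, over any bridge with priority 1, whatever their hardware IDs. The hardware ID only breaks ties between bridges of equal priority.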
If the topology is mostly a set of
tree-like structures off a backbone, then
the priority should be set to be the
number of levels away from the backbone
LAN that the bridge is. In this way, a
bridge will be chosen as Root that is
closest to the actual backbone, which is
preferable, since it minimizes
through-traffic on local segments, and
minimizes actual topology changes that
would occur due to changing Roots. (If
the topology doesn't change, then the
endnode caches are still valid, and the
network is not partitioned temporarily
while nodes switch over to other routes.
If the backup Root is very near the old
Root, then the topology will not change
significantly when the old Root dies, and
the net will experience no disruptions due
to Root switchover.)
If the only switch is "primary" or
"secondary", then a "1" will be the high
order bit of the ID if "secondary", and a
"0" if the bridge is designated "primary".
HOPS should be a double length field, with
a high order part and a low order part.
If a bridge is designated "secondary",
instead of adding 1 to HOPS to get its
distance from the Root (incrementing the
low order part of HOPS), it will increment
the high order part of HOPS.
To minimize traffic on some LANs, links
can be assigned a cost. Then, instead of
adding 1 to the distance to the Root given
by the Designated Bridge on that LAN to
obtain this bridge's distance from the
Root, it would add the cost of the link.
This will tend to place links with higher
costs towards the leaves of the tree,
which will minimize through traffic.
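The double length HOPS field and the link cost can be combined in one distance update. The sketch below is an assumption-laden illustration: HOPS is modelled as a `(high, low)` tuple, the function name is hypothetical, and the choice that a secondary bridge increments only the high order part (ignoring link cost) is this example's reading, not something the text spells out.

```python
from typing import Tuple

Hops = Tuple[int, int]  # (high order part, low order part)


def next_distance(hops: Hops, link_cost: int = 1, secondary: bool = False) -> Hops:
    """Distance a bridge claims, given the HOPS heard from the
    Designated Bridge on the link towards the Root."""
    high, low = hops
    if secondary:
        # A secondary bridge increments the high order part instead.
        return (high + 1, low)
    # A primary bridge adds the cost of the link (1 by default).
    return (high, low + link_cost)
```

Tuples compare lexicographically, so any path through a secondary bridge, such as `(1, 0)`, is longer than every path through primary bridges only, such as `(0, 1000)`. That is what keeps secondary bridges out of the tree unless no all-primary path exists.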
One Way Connectivity
Connectivity between bridges can be
one-way if one transmitter is broken, or
the other's receiver is broken, or some
channel hardware works in only one
direction. One-way connectivity might be
a common hardware failure mode and it is
desirable that the algorithm prevent loops
in the face of one-way links.
This algorithm can prevent one-way links
from causing loops. A Designated Bridge
will detect a problem if another bridge on
the same LAN persists in sending HELLO
messages, because that would indicate that
the other bridge is not receiving the
Designated Bridge's HELLOS. Note that if
the other node disagrees about the
identity of the Root, there is no loop.
If the other node agrees about the Root,
there is a loop.
If there is no loop, then no harm is done
by continuing to forward packets to and
from the LAN. Thus if the other node
disagrees about the identity of the Root,
no further action is taken (other than
perhaps logging the condition).
If the other node does agree about the
Root, the Designated Bridge must stop
forwarding packets to and from the LAN.
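The decision described in this section reduces to a single comparison. A minimal sketch, in which the function name and the string action labels are hypothetical:

```python
def on_persistent_foreign_hello(own_root_id: int, heard_root_id: int) -> str:
    """Action for a Designated Bridge when another bridge on the same LAN
    keeps sending HELLOs, suggesting it cannot hear this bridge."""
    if heard_root_id == own_root_id:
        # Both bridges agree on the Root, so there is a loop: the
        # Designated Bridge must stop forwarding to and from the LAN.
        return "stop_forwarding"
    # Disagreement about the Root means no loop; at most log it.
    return "log_only"
```

A caller would invoke this only after the other bridge has persisted in sending HELLOs, since a single stray HELLO is normal while bridges are still synchronizing.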
The algorithm presented here maintains a
spanning acyclic subset of a general mesh
topology. It requires a very small,
bounded amount of memory per bridge,
independent of the total number of LANs or
the total number of bridges. It requires
a very small, bounded amount of
communications bandwidth on each LAN,
independent of the total number of LANs or
the total number of bridges. It tolerates
lost messages and efficiently utilizes the
broadcast nature of multiaccess LANs. It
requires no effort on the part of
stations. The computed topology converges
in at most twice the round trip delay
across the extended network. The computed
topology is deterministic. Bridges
implementing this algorithm can coexist
with simpler bridges that do not implement
this algorithm, and loops will still be
broken, provided that no loop exists
composed solely of bridges that do not
implement the algorithm.