A scalable architecture for end-to-end QoS provisioning
Spiridon Bakiras*, Victor O.K. Li
Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
Received 23 September 2003; revised 18 March 2004; accepted 13 April 2004
Available online 4 May 2004
The Differentiated Services (DiffServ) architecture has been proposed by the Internet Engineering Task Force as a scalable solution for
providing end-to-end Quality of Service (QoS) guarantees over the Internet. While the scalability of the data plane emerges from the
definition of only a small number of different service classes, the issue of a scalable control plane is still an open research problem. The initial
proposal was to use a centralized agent, called Bandwidth Broker, to manage the resources within each DiffServ domain and make local
admission control decisions. In this article, we propose an alternative decentralized approach, which increases significantly the scalability of
both the data and control planes. We discuss in detail all the different aspects of the architecture, and indicate how to provide end-to-end QoS
support for both unicast and multicast flows. Furthermore, we introduce a simple traffic engineering mechanism, which enables the more
efficient utilization of the network resources.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Admission control; Differentiated services; Quality of service; Resource management; Traffic engineering
1. Introduction
In the past few years, the dramatic increase in the
capacity of the Internet core, and the development of
powerful compression techniques, have allowed the
deployment of new applications such as Internet telephony,
video-conferencing, streaming audio/video, etc. These
applications are called real-time, since they require the
periodic and timely delivery of the content from the source
to the destination. Clearly, the traditional best-effort service
that is provided in the current Internet cannot offer
an acceptable level of service quality to this type of
applications. To address this problem, the Internet Engineering
Task Force (IETF) has proposed the Differentiated
Services (DiffServ) architecture as a scalable solution for
providing end-to-end Quality of Service (QoS) guarantees
over the Internet. The scalability issue is of utmost
importance, since, in the future, the number of flows that
will require some QoS guarantees is expected to be very
large. Consequently, a core router should be able to
accommodate thousands of QoS-sensitive flows at any given time.
The basic idea of the DiffServ architecture is that only
edge routers should manage traffic on a per flow basis. Core
routers should not keep any kind of per flow state, and
should process traffic on a much coarser granularity. At the
data plane this goal is achieved by specifying different Per
Hop Behaviors (PHBs), where packets belonging to the
same PHB form a Behavior Aggregate (BA) and receive
identical service at the core routers. Specifically, the edge
routers will be equipped with flow classifiers, policers, and
markers that will properly mark the incoming packets by
setting a number of bits in the DiffServ Codepoint (DSCP)
field of the IP packet header. The DSCP value will
indicate the corresponding PHB, and the core routers will
forward the packets based on their DSCP value (by utilizing
several scheduling and buffer management techniques).
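As a rough illustration of the marking step, the following Python sketch writes a 6-bit DSCP into the upper bits of the 8-bit IPv4 TOS / IPv6 Traffic Class byte, leaving the two low-order bits untouched. The function names are hypothetical, and the value 46 (binary 101110), the codepoint recommended for the EF PHB, is used only as an example.

```python
EF_DSCP = 0b101110  # recommended codepoint for the EF PHB (decimal 46)


def mark(traffic_class: int, dscp: int) -> int:
    """Return the TOS / Traffic Class byte with its upper 6 bits set to dscp.

    The two low-order bits of the byte are preserved.
    """
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return (dscp << 2) | (traffic_class & 0x03)


def dscp_of(traffic_class: int) -> int:
    """Extract the DSCP that a core router would use to select the PHB."""
    return (traffic_class >> 2) & 0x3F
```

A core router in this sketch would only ever call `dscp_of`, which is what keeps its per-packet work independent of the number of flows.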
The IETF has currently specified two different PHBs.
The Expedited Forwarding (EF) PHB offers the
equivalent of a leased line (i.e. low delay, loss, and jitter)
between a source and a destination. This is accomplished
by giving EF traffic strict priority over the traditional best-
effort traffic inside the DiffServ domain. However, each
flow has to specify in advance the required bandwidth so
that the appropriate resources may be reserved inside the
network. In addition, the maximum burst size that is allowed
is equal to two Maximum Transmission Units (MTUs).
Computer Communications 27 (2004) 1330–1340
0140-3664/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
* Corresponding author. Tel.: +852-2857-8487; fax: +852-2559-8738.
E-mail addresses: email@example.com (S. Bakiras); firstname.lastname@example.org
The edge routers will police each flow, and the non-
conforming packets will either be dropped or shaped. The
Assured Forwarding (AF) PHB group does not offer
hard QoS guarantees, but instead defines four different AF
classes with three different levels of drop precedence within
each class. Each AF class is assigned a certain amount of
bandwidth at each node, and when the amount of traffic
exceeds this bandwidth, packets will be dropped according
to their drop precedence value.
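The policing of an EF flow at the edge can be sketched with a token bucket whose depth equals the two-MTU burst limit mentioned above. This is only one possible realization: the text does not prescribe a token bucket, and the class name, the 1500-byte default MTU, and the drop-only behavior (no shaping) are assumptions made for illustration.

```python
class TokenBucketPolicer:
    """Hypothetical per-flow EF policer: reserved rate plus a 2-MTU burst."""

    def __init__(self, rate_bps: float, mtu_bytes: int = 1500):
        self.rate = rate_bps / 8.0          # token fill rate in bytes/second
        self.depth = 2 * mtu_bytes          # maximum burst: two MTUs
        self.tokens = float(self.depth)     # bucket starts full
        self.last = 0.0                     # time of the previous packet

    def conforms(self, now: float, size_bytes: int) -> bool:
        """Return True if the packet conforms; otherwise it would be dropped."""
        # Refill tokens for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

With a reserved rate of 8000 bit/s, for example, two back-to-back 1500-byte packets conform, a third one sent immediately afterwards does not, and another packet conforms again 1.5 s later.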
While the scalability of the data plane emerges from the
definition of only a small number of PHBs, the issue of a
scalable control plane is still an open research problem. The
initial proposal was to use a centralized agent, called
Bandwidth Broker (BB), to manage the resources within
each DiffServ domain and make local admission control
decisions. The centralized approach removes the burden of
admission control from the core routers, but there might be
some scalability considerations if the BB has to process
thousands of requests per second. Moreover, this approach
has certain disadvantages that are inherent to any centralized
architecture:
• The links around the BB will become very congested
when the traffic load from the signaling messages is high.
• The BB must maintain per flow information about every
flow that is currently active inside its domain.
• The BB is a single point of failure (i.e. undesirable in
In this article, we propose an alternative decentralized
architecture, where the local admission decisions are made
independently at the edge routers of each domain. The BB in
each domain is only responsible for periodically updating
the allocation of the resources inside the domain, according
to some measurements of the traffic load at the edge routers.
We discuss in detail all the aspects of the proposed
architecture (i.e. intra- and inter-domain routing, admission
control, packet forwarding, etc.), and indicate how to
provide end-to-end QoS support for both unicast and
multicast flows. Furthermore, we introduce a simple traffic
engineering mechanism, which enables the more efficient
utilization of the network resources.
The remainder of this article is organized as follows.
In Section 2 some related work on DiffServ resource
management is presented. In Section 3 we give the details of
the proposed architecture, and also discuss various
implementation issues. In Section 4 the results of the
simulation experiments are presented, while Section 5
concludes our work.
2. Related work
The standardization of the DiffServ architecture by the
IETF triggered the initiation of several projects, which aim
to provide DiffServ-based QoS guarantees over the Internet.
The largest of these projects is the Internet2 project, which
involves over 200 universities, corporations, and other
organizations worldwide. The main objective of the
Internet2 QBone initiative  is to build an experimental
testbed for providing end-to-end QoS guarantees in a
scalable manner. Their approach to resource management
follows the initial proposal of a centralized BB, which is
responsible for managing the resources within a DiffServ
domain, and performing intra-domain admission control.
For end-to-end resource reservations, inter-BB signaling is
required between the BBs of adjacent domains.
One direction towards improving the scalability of the
resource management is based on aggregated resource
reservations between DiffServ domains. The BB is still the
centralized agent responsible for resource reservation, but
the scalability is improved by reserving resources for
aggregate traffic between different domains. In Ref.  a
two-tier model is introduced, where each domain is assumed
to have long-term bilateral agreements with each of its
neighbors, specifying the amount of traffic that will be
exchanged between them. Whenever there is an increase in
the traffic between two domains, the BBs will re-negotiate
and make new agreements. In Ref.  a Clearing House
architecture is proposed, where multiple basic domains are
clustered to form a logical domain. In this way, a
hierarchical tree is created, where the BB of the logical
domain is responsible for resource reservation across the
basic domains. The BBs at the basic domains forward only
aggregation of inter-domain requests to the BB of the
logical domain, thus enhancing the scalability of this
Alternatively, an approach based on the Multiprotocol
Label Switching (MPLS)  architecture has also been
considered in Refs. [10,11]. In these two architectures,
reservations for aggregate traffic are made between pairs of
edge routers on specific Label Switched Paths (LSPs) inside
the domain. All the QoS-sensitive flows will then follow the
appropriate LSPs, in order to receive the requested QoS.
The work in Ref.  is introduced as part of the Traffic
Engineering for Quality of Service in the Internet at Large
Scale (TEQUILA) project.
3. An architecture for end-to-end QoS provisioning
In this section we introduce an architecture for DiffServ-
based networks, which enhances the scalability of both the
data and control planes. The goal is to push most of the
functionality to the edge of the network, and maintain a
simple core, which only performs a standard packet
forwarding function. Our assumption is that the Internet
consists of several independently administered DiffServ
domains that are interconnected in order to provide
global connectivity. One typical example is shown in
Fig. 1, where the sender S and the receivers R1 and R2
are interconnected through three different domains.
S. Bakiras, V.O.K. Li / Computer Communications 27 (2004) 1330–1340 1331
Each DiffServ domain consists of a BB (not shown in the
figure), and the core and edge routers. The BB will
periodically exchange control messages with the edge
routers for the purpose of resource management.
3.1. Intra-domain routing
Routing is the process of correctly identifying the next
hop at each node (router) so that a packet will be able to
reach its final destination. In this work we focus on intra-
domain routing, and assume that all the DiffServ domains
use a standard inter-domain routing protocol, such as the
Border Gateway Protocol (BGP) , to exchange reach-
ability information with their neighbor domains. All the
edge routers in each domain will participate in this
Routing with QoS guarantees requires the reservation
of enough resources along the path from the source to the
destination. Therefore, unlike best-effort traffic routing, a
path has to be established in advance between the source
and destination nodes, and all the packets should follow
the same path. In our architecture, we adopt a source
routing scheme, where the ingress router of the domain
will identify the complete path towards an egress router.
Specifically, during the initialization of the network, the
BB will pre-compute k different paths to carry the traffic
between each pair of edge routers, and it will distribute
this information to all the routers in its domain. In the
simulation experiments presented in Section 4, we used
the well known k-shortest path algorithm  for the
path selection, where the hop count was used as the
The source routing approach was adopted for several
reasons: (i) it facilitates fast packet forwarding which will
be further discussed in Section 3.2, (ii) it follows the general
principles of the DiffServ architecture, by completely
isolating the core routers from the admission control
procedure, and (iii) it provides the means for implementing
traffic engineering mechanisms inside the domain.
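The pre-computation of the k paths between a pair of edge routers can be sketched as follows. This is a simple best-first enumeration of loop-free paths ordered by hop count, not necessarily the k-shortest path algorithm used in the simulations, and the adjacency list is an assumed reconstruction of part of domain B based on the router names used in the text.

```python
import heapq


def k_shortest_paths(adj, src, dst, k):
    """Return up to k loop-free paths from src to dst, fewest hops first."""
    heap = [(0, [src])]          # (hop count, partial path)
    found = []
    while heap and len(found) < k:
        hops, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:  # never revisit a router (no loops)
                heapq.heappush(heap, (hops + 1, path + [nxt]))
    return found


# Assumed topology (hypothetical reconstruction of domain B).
DOMAIN_B = {
    "B.E1": ["B.C2"],
    "B.E2": ["B.C2"],
    "B.E3": ["B.C3"],
    "B.C1": ["B.C2", "B.C3"],
    "B.C2": ["B.E1", "B.E2", "B.C1", "B.C3"],
    "B.C3": ["B.E3", "B.C1", "B.C2"],
}
```

The enumeration is exponential in the worst case, so for large domains a dedicated k-shortest path algorithm would be preferable; it is adequate here because the computation happens only once, at network initialization.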
Finally, we assume that a standard link state routing
protocol, such as OSPF, operates inside each domain, in
order for the edge routers to advertise their routes to other
networks, and also to exchange information with all the core
routers regarding link or node failures. In other words, our
intra-domain routing protocol will operate on top of any link
state protocol, and the ingress routers of the domain will use
the existing link state database to identify the corresponding
egress router where a packet should be forwarded to.
Fig. 1. The Differentiated Services architecture.
3.2. Packet forwarding with IPv6
The slowest process in the forwarding path of an IP
router is the multi-field classification and routing procedure.
When a packet is received at a router, the next hop is
decided by looking into several fields in the IP header (e.g.
IP addresses, TCP/UDP port numbers, etc.), and then
finding the appropriate entry in the local routing table. This
operation will be even more complicated for QoS-sensitive
flows, since their packets should follow exactly the same
path. Clearly, this procedure will become the bottleneck in a
The IPv6 packet header contains a new 20-bit field,
which does not exist in the earlier IPv4, called flow label.
Using this field in the context of a source routing
architecture, enables us to increase considerably the speed
of the forwarding path. As we mentioned earlier, for each
pair of edge routers inside a domain, there will be k pre-
computed paths connecting them. We may then assign one
flow label value to each one of these paths, and construct
new (much smaller) routing tables inside the core of the
domain, based only on flow labels. We should emphasize
that the flow label in our approach is not related to the
traditional definition of a flow (i.e. a connection between a
certain source–destination pair). Instead, we use the flow
label field in the IP header, in order to identify a unique path
within an AS domain. As a result, any path within a domain
will be assigned a specific flow label value, and all the
packets (from any packet flow) that have to follow this path
will be marked with that exact flow label value. Therefore,
the unicast routing table within a domain will be static, and
its maximum size will be equal to the total number of paths
that we choose to identify. Furthermore, during the resource
reservation procedure, the ingress router will select one of
the k paths for each new flow, and then mark all the packets
that belong to this flow with the corresponding flow label
An alternative method would be to use the routing header
option in IPv6, since it provides exactly the same
functionality (i.e. source routing). The reason why we did
not choose this option is due to the large overhead that it
introduces. Each IPv6 address entry is 16 bytes, and for a
multi-hop path this approach could increase substantially
the protocol overhead (especially for small packet sizes). In
addition, using the routing header option does not help in the
case of multicast communication. In the following para-
graphs we will indicate how the flow label approach may be
exploited to facilitate multicast routing.
Notice that our forwarding scheme is similar to the
MPLS architecture, but it does not require inter-router
signaling as in the case of the label distribution protocol
(LDP) in MPLS networks. The BB will be the centralized
agent responsible for distributing the routing tables to the
domain routers. In particular, after the BB has selected the k
paths for each pair of edge routers, it will send the
appropriate routing table entries to all the routers in
the domain. These entries will be in the form of
⟨flow_id, link_id⟩, where link_id indicates the outgoing
interface where the packet should be forwarded to. Having a
centralized agent distribute all the routing information does
not affect the scalability of the architecture, since this
information will be distributed only once (at the initializa-
tion of the network) and will be updated only if a new link or
router is introduced in the topology.
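A minimal sketch of how the BB might derive the per-router entries from the selected paths is given below. It assumes that each path already carries a flow label, that labels are written as dotted strings, and that an outgoing interface is identified by the name of the next-hop router; all three are illustrative choices, not details given in the text.

```python
def build_tables(labelled_paths):
    """Map each router to its {flow_id: link_id} forwarding entries.

    labelled_paths: dict from flow label to the ordered list of routers
    on that path (ingress first, egress last).
    """
    tables = {}
    for flow_id, path in labelled_paths.items():
        # Every router except the egress needs one entry per path.
        for router, next_hop in zip(path, path[1:]):
            tables.setdefault(router, {})[flow_id] = next_hop
    return tables


# Hypothetical example: two labelled paths between the same edge routers.
PATHS = {
    "0.2.0": ["B.E1", "B.C2", "B.C3", "B.E3"],
    "0.2.1": ["B.E1", "B.C2", "B.C1", "B.C3", "B.E3"],
}
TABLES = build_tables(PATHS)
```

Each router's table holds one entry per path that traverses it, so its size is bounded by the number of pre-computed paths rather than by the number of active flows.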
One example of how to assign flow labels to the different
paths inside a domain is depicted in Fig. 2. With this
assignment, for instance, we are able to identify a maximum
of 16 different paths between any pair of 256 edge routers.
To further illustrate the concept of flow-based routing,
consider the edge routers B.E1 and B.E3 in domain B
(Fig. 1). Assume that the ID of B.E1 is 0, and the ID of B.E3
is 2. There are exactly two paths connecting these routers
(through B.C2 → B.C3 or B.C2 → B.C1 → B.C3), and
suppose we assign them the IDs 0 and 1, respectively.
These two paths may then be represented by the flow
label values ‘0.2.0’ and ‘0.2.1’. As a result, if B.C2
receives a packet with a flow label value 0.2.1, it will
forward it towards B.C1 and not towards the shortest path
(i.e. through B.C3).
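One way to realize such an assignment within the 20-bit flow label is sketched below. The split into 8 bits for the ingress ID, 8 bits for the egress ID, and 4 bits for the path ID matches the figures quoted above (256 edge routers, 16 paths), but the ordering of the three fields inside the label and the function names are assumptions.

```python
INGRESS_BITS, EGRESS_BITS, PATH_BITS = 8, 8, 4  # 20 bits in total


def pack(ingress: int, egress: int, path: int) -> int:
    """Encode (ingress ID, egress ID, path ID) into a 20-bit flow label."""
    if not (0 <= ingress < 2 ** INGRESS_BITS
            and 0 <= egress < 2 ** EGRESS_BITS
            and 0 <= path < 2 ** PATH_BITS):
        raise ValueError("field out of range")
    return (ingress << (EGRESS_BITS + PATH_BITS)) | (egress << PATH_BITS) | path


def unpack(label: int):
    """Decode a flow label back into (ingress ID, egress ID, path ID)."""
    return ((label >> (EGRESS_BITS + PATH_BITS)) & 0xFF,
            (label >> PATH_BITS) & 0xFF,
            label & 0xF)


def dotted(label: int) -> str:
    """Render a label in the 'ingress.egress.path' notation of the text."""
    return ".".join(str(field) for field in unpack(label))


# Forwarding table of core router B.C2 for the two example paths.
C2_TABLE = {pack(0, 2, 0): "B.C3", pack(0, 2, 1): "B.C1"}
```

A lookup such as `C2_TABLE[pack(0, 2, 1)]` then returns the link towards B.C1, which is the behavior described in the example.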
For multicast communication, packet forwarding is
slightly more complicated, but we may still utilize the
above flow-based routing mechanism to maintain a scalable
forwarding path. Let us consider Fig. 1 again, and assume
that receivers R1 and R2 wish to join a multicast group
created by the sender S in domain A. Suppose node R1 joins
the multicast group first. Then, B.E1 will send a control
message to the core routers B.C2 and B.C3 in the form of
⟨INSERT, MG, 0.2.0⟩, or in other words “if you see a packet with
destination address MG, use flow label 0.2.0 to forward it”. This
message will create an entry in a local multicast routing
table inside the core routers. When R2 joins the group, the
message ⟨INSERT, MG, 0.1.0⟩ will be sent to B.C2, forcing
it to forward the packets towards both B.C3 and B.E2. If a
node decides to leave the multicast group, similar DELETE
messages will be sent, if necessary.
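The handling of these control messages at a core router might look like the following sketch. The class and method names are hypothetical, labels are written as dotted strings, and the unicast entries for B.C2 (including the assumption that label 0.1.0 leads directly to B.E2) are inferred from the example rather than taken from Fig. 1.

```python
class CoreRouter:
    """Hypothetical core router with a static unicast and a multicast table."""

    def __init__(self, name, unicast):
        self.name = name
        self.unicast = dict(unicast)   # flow label -> outgoing link
        self.multicast = {}            # multicast group -> set of flow labels

    def handle(self, op, group, label):
        """Process an INSERT or DELETE control message from an edge router."""
        if op == "INSERT":
            self.multicast.setdefault(group, set()).add(label)
        elif op == "DELETE":
            labels = self.multicast.get(group, set())
            labels.discard(label)
            if not labels:             # drop the group once it has no labels
                self.multicast.pop(group, None)
        else:
            raise ValueError("unknown operation: " + op)

    def out_links(self, group):
        """Links on which a packet addressed to group is replicated."""
        return {self.unicast[label] for label in self.multicast.get(group, ())}


c2 = CoreRouter("B.C2", {"0.2.0": "B.C3", "0.1.0": "B.E2"})
c2.handle("INSERT", "MG", "0.2.0")     # R1 joins
c2.handle("INSERT", "MG", "0.1.0")     # R2 joins
```

After both joins, `c2.out_links("MG")` contains B.C3 and B.E2; the multicast table reuses the unicast flow labels, so no separate tree state is needed beyond the set of labels per group.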
3.3. End-to-end admission control
Resource reservation is an essential part of any network
that provides QoS guarantees, and an appropriate signaling
protocol is necessary in order to perform this function. In
our architecture, the receiver nodes will initiate the
signaling procedure for the resource reservation, while the
intermediate ingress routers will be responsible for admit-
ting or rejecting the reservation requests. In the following
paragraphs we illustrate how admission control may be
Fig. 2. Flow label assignment.
performed across multiple DiffServ domains for the case of
unicast and multicast flows.
Let us consider unicast flows first, and assume that R1
(Fig. 1) wishes to receive some QoS-sensitive data from the
sender S at domain A. Then, the end-to-end admission
control will be performed as follows (with an RSVP-like
(1) R1 will send a PATH message towards S, indicating the
required amount of bandwidth b.
(2) The PATH message will reach B.E1, which will be the
ingress router for that particular flow. Therefore, it will
check whether there are enough resources to carry this
flow towards B.E3. The details of the admission control
decision will be discussed in Section 3.4, where we
introduce the traffic engineering mechanism.
(3) If there are insufficient resources, the request will
be rejected. Otherwise one of the k available paths
towards B.E3 will be selected, and the PATH message
will be forwarded towards S.
(4) The PATH message will reach A.E1, which will also
perform the admission control as in steps (2) and (3).
(5) If this request can be accommodated, A.E1 will
forward the PATH message to the source node S.
(6) If S wishes to establish this connection, it will send the
RESV message back to R1.
(7) While the RESV message travels back to the destina-
tion node, all the intermediate edge routers will
configure their traffic shapers, policers, and markers
to account for the new connection.
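The seven steps above can be sketched as a small simulation. The admission test used here (pick the first path with enough unreserved bandwidth) is only a placeholder for the decision discussed in Section 3.4, and the class name, the `pending` bookkeeping, and the capacity figures are assumptions made for illustration.

```python
class IngressRouter:
    """Hypothetical ingress router tracking free bandwidth per path."""

    def __init__(self, name, path_capacity):
        self.name = name
        self.free = dict(path_capacity)  # flow label -> unreserved bandwidth
        self.pending = {}                # flow -> (label, b) awaiting RESV

    def on_path(self, flow, b):
        """Steps (2)-(3): admit the request if some path can carry b."""
        for label, free in self.free.items():
            if free >= b:
                self.pending[flow] = (label, b)
                return True
        return False

    def on_resv(self, flow):
        """Step (7): commit the reservation when the RESV passes by."""
        label, b = self.pending.pop(flow)
        self.free[label] -= b
        return label


def reserve(flow, b, ingress_chain, sender_accepts=True):
    """Carry a PATH from receiver to sender, then a RESV back.

    ingress_chain lists the ingress routers in the order the PATH message
    meets them. Returns {router name: chosen label}, or None on rejection.
    """
    visited = []
    for router in ingress_chain:
        if not router.on_path(flow, b):
            break
        visited.append(router)
    else:
        if sender_accepts:               # step (6)
            return {r.name: r.on_resv(flow) for r in reversed(visited)}
    for router in visited:               # rejected: discard tentative state
        router.pending.pop(flow, None)
    return None
```

Only the ingress routers appear in the exchange, which reflects the point that core routers take no part in admission control.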
The signaling procedure for multicast flows is essentially
identical to the one described above, with only a few minor
additions. We assume that a multicast group is identified by
the pair (S, MG), i.e. the IP address of both the source and
the multicast group. For multicast groups where any
receiver can also be a sender, we assume that only one
node is allowed to create the multicast group, and this node
will be the designated source. This is a very reasonable
assumption, which greatly simplifies the routing of the
signaling messages when new nodes join a multicast group.
Going back to our example, let us consider the case where
node R1 joins the multicast group initiated by node S. The
signaling message flow will then be as follows.
(1) R1 will send a PATH (or join) message towards S,
indicating the required amount of bandwidth b.
(2) If R1 may also be a sender (i.e. in multipoint-to-
multipoint communication), B.E3 will reserve the
appropriate resources towards B.E1, and forward the
PATH message towards S.
(3) Steps (2)–(7) from the unicast case will be performed
in the same manner. In the case of multipoint-to-
multipoint communication, both the ingress and egress
routers will perform admission control, since the
multicast data may flow in both directions.
(4) A.E1, A.E2, B.E1 and B.E3 will send the appropriate
control messages to the core routers of their domains,
in order to set the corresponding entries in the multicast
routing tables (as described in Section 3.2).
If R2 decides to join the multicast group, the same steps
as above will take place. However, as soon as B.E1 receives
the PATH message, it will not forward it to S, since it
already has a reservation entry for that particular multicast
address. Instead, it will send the RESV message back to R2,
given that there are enough resources to carry the multicast
traffic towards B.E2.
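The way an edge router short-circuits later joins can be sketched as follows. The class, the returned action strings, and the choice to record the entry as soon as the first PATH is forwarded (rather than when the RESV returns) are simplifying assumptions; the resource check is passed in as a flag because the actual decision is deferred to Section 3.4.

```python
class EdgeRouter:
    """Hypothetical edge router remembering which groups it already serves."""

    def __init__(self, name):
        self.name = name
        self.groups = {}   # (source, group) -> set of egress routers served

    def on_join(self, source, group, egress, has_resources):
        """Decide what to do with a PATH (join) message for (source, group)."""
        if not has_resources:
            return "REJECT"
        key = (source, group)
        if key in self.groups:
            # A reservation entry already exists: answer locally.
            self.groups[key].add(egress)
            return "RESV"
        # First member behind this router: the PATH continues towards S.
        self.groups[key] = {egress}
        return "FORWARD_PATH"
```

In the example, the join from R1 would yield `FORWARD_PATH` at B.E1 and the later join from R2 would yield `RESV`, so the signaling load towards the sender does not grow with the number of receivers in the domain.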
Notice that in the multipoint-to-multipoint scenario we
have assumed that only one member is allowed to send
packets at any given time. If this is not the case (i.e. if all
members are allowed to transmit simultaneously), then the
PATH messages will always have to travel back to the
initiator of the group (node S in our case), in order to reserve
additional bandwidth for each new member. Also, we have
made the implicit assumption that the ingress routers
corresponding to the initiator of the group will act as the
core nodes of the multicast tree inside their own domain. In
our example, since node S initiated the multicast group,
A.E1 will be the core of the tree in domain A, B.E1 will be
the core in domain B, and C.E1 will be the core in domain C.
In other words, if R1 wishes to send a packet to the multicast
group, this packet will be forwarded towards B.E1, and each
router along the path will forward it to all the corresponding
interfaces (based on the local multicast routing table),
except the one that the packet was received from. Clearly,
this is not the optimal way to distribute the multicast traffic,
but it simplifies greatly the multicast routing protocol, since
it avoids the construction of a separate multicast tree for
each member of the group.
paradigm, which is the basis of the IntServ architecture.
IntServ, by (i) eliminating the core routers from the resource
reservation procedure, and (ii) decentralizing the resource
reservation procedure. Moreover, the resource reservation
phase is completed by the exchange of only two messages.
The first message (PATH) is sent from the receiver towards
the sender, and is followed by the reply message (RESV or
reject), which is sent back to the receiver. These messages are
only intercepted by the corresponding edge routers (i.e.
ingress and egress) of each domain, and are not processed at
all by any of the core routers. The advantages of this
decentralized architecture may be summarized as follows.
• The signaling overhead for connection set up is spread
across multiple links, avoiding the congestion of the links
around the centralized BB.
• The BB does not need to maintain per flow information
about every flow that is currently active inside its domain.
of the domain.