Performance Analysis of Greedy Shapers in Real-Time Systems
Ernesto Wandeler, Alexander Maxiaguine, Lothar Thiele
Computer Engineering and Networks Laboratory
Swiss Federal Institute of Technology (ETH)
8092 Zürich, Switzerland
{wandeler,maxiagui,thiele}@tik.ee.ethz.ch
Abstract— Traffic shaping is a well-known technique in the area of networking and is proven to reduce global buffer requirements and end-to-end delays in networked systems. Due to these properties, shapers also play an increasingly important role in the design of multiprocessor embedded systems that exhibit a considerable amount of on-chip traffic. Despite their growing importance in this area, no methods exist to analyze shapers in distributed embedded systems, or to incorporate them into a system-level performance analysis. Hence it has so far not been possible to determine the effect of shapers on end-to-end delay guarantees or buffer requirements in these systems. In this work, we present a method to analyze greedy shapers, and we embed this analysis method into a well-established modular performance analysis framework. The presented approach enables system-level performance analysis of complete systems with greedy shapers, and we prove its applicability by analyzing two case-study systems.
1. Introduction
In the area of broadband networking, traffic shaping is a well-known and well-studied technique to regulate connections and to avoid buffer overflow in network nodes, see e.g. [3] or [6]. A traffic shaper in a network node buffers the data packets of an incoming traffic stream and delays them such that the output stream conforms to a given traffic specification. A shaper may ensure, for example, that the output stream has limited burstiness, or that packets on the output stream have a specified minimum inter-arrival time. A greedy shaper is a special instance of a traffic shaper that not only ensures an output stream that conforms to a given traffic specification, but that also guarantees that no packets get delayed any longer than necessary.
By limiting the burstiness of the output stream of a network node, shapers typically drastically reduce the buffer requirements on subsequent network nodes. And if some sort of priority scheduling is used on a network node to share bandwidth among several incoming streams, then a limited burstiness of high-priority streams leads to better responsiveness of lower-priority streams.
In addition, under some circumstances, shaping comes for free from a performance point of view. To be more specific, if the output stream of a node is shaped with a greedy shaper to conform again to the input traffic specification, and if the buffer of the shaper accesses the same memory as the input buffer of the node, then the end-to-end delay of the stream and the total buffer requirements on the network node are not affected by adding the shaper.
Due to these favorable properties, shapers also play an increasingly important role in the design of real-time embedded systems, particularly since modern embedded systems are often implemented as multiprocessor systems with a considerable amount of on-chip traffic.
In this domain, we may identify two main application areas for traffic shaping. First, shapers may be used internally, to re-shape internal traffic streams to reduce global buffer requirements and end-to-end delays, and secondly, shapers may be added at the boundaries of a system, to ensure conformant input streams and to thereby prevent internal buffer overflows caused by malicious input. Figure 1 shows two simple example systems from these two application areas.
[Figure: on the left, a distributed system with internal re-shaping — streams S1 and S2 are processed by tasks TS1 and TS2 on CPU1 and CPU2, the outputs S1' and S2' are shaped by greedy shapers σ1 and σ2, and the shaped streams S1'' and S2'' are sent via network interfaces CNI1–CNI4 over a shared bus; on the right, an MPSoC with external input shaping — input streams S1–S3 are shaped by σ1–σ3 before being processed by tasks TS1–TS3.]
Figure 1. Two systems with greedy shapers.
The analysis of traffic shapers in communication networks is well-known [4]. But to the best of our knowledge, none of the existing frameworks for modular system-level performance analysis of real-time embedded systems considers traffic shapers at this time, see e.g. [5], [7] or [1, 9].
Only [7] introduces a restricted kind of traffic shaping through so-called event adaptation functions (EAFs). But EAFs play a crucial role in the fundamental ability of [7] to analyze systems, and a designer therefore has very limited freedom to place, omit, or parameterize EAFs.
In this work, we will extend the framework presented in [1, 9] to enable system-level performance analysis of real-time embedded systems with traffic shapers. It has to be noted here that in [4], Le Boudec and Thiran challenge the ability of the methods in [9] to analyze traffic shapers, and in [8], Schiøler et al. even claim that it is not possible to analyze traffic shapers within the framework of [1, 9].
Contributions of this work:
• We present a method to analyze greedy shapers in the area of multiprocessor embedded systems.
• We embed this new analysis method into the well-established modular performance analysis framework of [1, 9]. This enables system-level performance analysis of complete systems with greedy shapers, i.e. amongst others, we may analyze end-to-end delay guarantees and global buffer requirements of such systems.
• We prove the applicability of the presented methods by analyzing two small case-study systems with greedy shapers.
2. Modular Performance Analysis
In the domain of communication networks, powerful abstractions have been developed to model the flow of data through a network. In particular, Network Calculus [4] provides means to deterministically reason about timing properties of data flows in queuing networks. Real-Time Calculus [9] extends the basic concepts of Network Calculus to the domain of real-time embedded systems, and in [1] a unifying approach to Modular Performance Analysis with Real-Time Calculus has been proposed. It is based on a general event and resource model, allows for hierarchical scheduling and arbitration, and can take computation and communication resources into account. In the following, we introduce some concepts of Network and Real-Time Calculus.
2.1. A General Event Stream Model
A trace of an event stream can conveniently be described by means of a cumulative function R(t), defined as the number of events seen on the event stream in the time interval [0, t]. While any R always describes one concrete trace of an event stream, a tuple α(Δ) = [αu(Δ), αl(Δ)] of upper and lower arrival curves [2] provides an abstract event stream model, representing all possible traces of an event stream.
αu(Δ) provides an upper bound on the number of events seen on the event stream in any time interval of length Δ, and analogously, αl(Δ) denotes a lower bound on the number of events in a time interval Δ. R, αu and αl are related to each other as follows:

  αl(t − s) ≤ R(t) − R(s) ≤ αu(t − s)   ∀s < t   (1)

with αl(0) = αu(0) = 0.
Arrival curves substantially generalize traditional event models such as sporadic, periodic, periodic with jitter, or any other arrival pattern with deterministic timing behavior. For example, an event stream with a period p, a jitter j, and a minimum inter-arrival distance d can be modeled by the following arrival curves:

  αl(Δ) = ⌊(Δ − j)/p⌋ ;  αu(Δ) = min{ ⌈(Δ + j)/p⌉ , ⌈Δ/d⌉ }   (2)
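As an illustration, the arrival curves of Eq. (2) can be evaluated directly. The following is a small sketch; the function names and the example parameters are ours, not from the paper:

```python
import math

def alpha_upper(delta, p, j, d):
    """Upper arrival curve of Eq. (2): min{ceil((Δ+j)/p), ceil(Δ/d)}."""
    if delta <= 0:
        return 0
    return min(math.ceil((delta + j) / p), math.ceil(delta / d))

def alpha_lower(delta, p, j):
    """Lower arrival curve of Eq. (2): floor((Δ-j)/p), clipped at 0."""
    if delta <= 0:
        return 0
    return max(0, math.floor((delta - j) / p))

# A stream with period 1 ms, jitter 0.5 ms, minimum inter-arrival 0.25 ms:
# at most 2 events in any 1 ms window, at least 1 event in any 2 ms window.
print(alpha_upper(1.0, p=1.0, j=0.5, d=0.25))  # 2
print(alpha_lower(2.0, p=1.0, j=0.5))          # 1
```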
2.2. A General Resource Model
Analogously to the cumulative function R(t), the concrete availability of a computation or communication resource can be described by a cumulative function C(t), defined as the number of available resources, e.g. processor or bus cycles, in the time interval [0, t]. To provide an abstract resource model, we define a tuple β(Δ) = [βu(Δ), βl(Δ)] of upper, βu, and lower, βl, service curves. Then, C, βu and βl are related to each other as follows:

  βl(t − s) ≤ C(t) − C(s) ≤ βu(t − s)   ∀s < t   (3)

with βl(0) = βu(0) = 0.
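A common concrete instance, which reappears in Section 4, is a resource that may be unavailable for at most T time units and otherwise runs at a fixed rate r. Its service curves can be sketched as follows (function names are ours):

```python
def beta_upper(delta, r):
    """Upper service curve of a rate-r resource: at most r*Δ units of
    service in any time interval of length Δ."""
    return r * max(0.0, delta)

def beta_lower(delta, r, T):
    """Lower service curve of a rate-r resource that may be unavailable
    for up to T: nothing is guaranteed in intervals shorter than T,
    full rate r afterwards."""
    return r * max(0.0, delta - T)

# A CPU delivering 5 events/ms that may be blocked for up to 5 ms:
print(beta_lower(5.0, r=5.0, T=5.0))  # 0.0  (no guarantee yet)
print(beta_lower(7.0, r=5.0, T=5.0))  # 10.0
```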
2.3. From Components to Abstract Components
In a real-time system, an incoming event stream is typically processed on a sequence of HW/SW components, which we will interpret as tasks that are executed on possibly different hardware resources.
[Figure: a concrete FP component that transforms an input trace R(t) and a resource availability C(t) into an output trace R'(t) and remaining availability C'(t), next to its abstract counterpart that transforms arrival and service curves α(Δ) and β(Δ) into α'(Δ) and β'(Δ).]
Figure 2. A component and its abstraction.
Figure 2 shows on the left side such a component. An event stream, described by R(t), enters the component and is processed using a hardware resource whose availability is modeled by C(t). After being processed, the events are emitted on the component's output, resulting in an outgoing event stream, modeled by R'(t), and the remaining resources that were not consumed are made available to other components and are described by an outgoing resource availability trace C'(t).
The relations between R(t), C(t), R'(t) and C'(t) depend on the component's processing semantics, and the outgoing event stream R'(t) will typically not equal the incoming event stream R(t), as it may, for example, exhibit more or less jitter. Analogously, C'(t) will differ from C(t).
For modular performance analysis with Real-Time Calculus, we model such a HW/SW component as an abstract component, as shown on the right side of Fig. 2. Here, an abstract event stream α(Δ) enters the abstract component and is processed using an abstract hardware resource β(Δ). The output is then again an abstract event stream α'(Δ), and the remaining resources are expressed again as an abstract hardware resource β'(Δ).
Internally, an abstract component is specified by a set of relations that relate the incoming arrival and service curves to the outgoing arrival and service curves:

  α' = fα(α, β)
  β' = fβ(α, β)

Again, these relations depend on the processing semantics of the modeled component, and must be determined such that α'(Δ) correctly models the event stream with event trace R'(t) and that β'(Δ) correctly models the resource availability C'(t).
As an example, consider a component modeling a task that greedily uses the resources offered to it. This component can be described by the relations fα as follows¹ [1]:

  α'u_FP = min{(αu ⊗ βu) ⊘ βl, βu}   (4)
  α'l_FP = min{(αl ⊘ βu) ⊗ βl, βl}   (5)

Such a component is very common in the area of real-time embedded systems, and we will refer to it as a Fixed Priority (FP) component.
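On curves sampled at Δ = 0, 1, …, N, the operators ⊗ and ⊘ and relation (4) can be sketched as follows. This is our own illustration: the deconvolution is a finite-horizon approximation, and all names are ours:

```python
def minplus_conv(f, g):
    """(f ⊗ g)(Δ) = inf over 0<=λ<=Δ of f(λ) + g(Δ-λ), on sampled curves."""
    return [min(f[l] + g[d - l] for l in range(d + 1)) for d in range(len(f))]

def minplus_deconv(f, g):
    """(f ⊘ g)(Δ) = sup over λ>=0 of f(Δ+λ) - g(λ),
    truncated to the sampling horizon."""
    n = len(f)
    return [max(f[d + l] - g[l] for l in range(n - d)) for d in range(n)]

def fp_output_upper(au, bu, bl):
    """Outgoing upper arrival curve of an FP component, Eq. (4):
    α'u = min{(αu ⊗ βu) ⊘ βl, βu}."""
    tmp = minplus_deconv(minplus_conv(au, bu), bl)
    return [min(x, y) for x, y in zip(tmp, bu)]

# One event per time unit through a resource guaranteeing two per unit:
# the stream passes through unchanged.
au = [0, 1, 2, 3, 4]
beta = [0, 2, 4, 6, 8]
print(fp_output_upper(au, beta, beta))  # [0, 1, 2, 3, 4]
```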
2.4. Abstract Performance Models
To analyze the performance of a concrete system, we need to capture its essential properties in an abstract performance model that consists of a set of interconnected abstract components. For this, first all concrete system components are modeled using their abstract representation (as described in the preceding section). And then, the arrival curve inputs and outputs of these abstract components are interconnected to reflect the flow of event streams through the system.
¹See the Appendix for a definition of ⊗ and ⊘.
When several components of the concrete system are
allocated to the same hardware resource, they must share
this resource according to a scheduling policy. In the per
formance model, the scheduling policy on a resource can
be expressed by the way the abstract resources β are dis
tributed among the different abstract components.
For example, consider preemptive fixed-priority scheduling: Abstract component A with the highest priority may use all available resources on a hardware resource, whereas abstract component B with the second highest priority only gets the resources that were not consumed by A. This is modeled by using the service curves β'_A that exit A as input to B. For some other scheduling policies, such as GPS or TDMA, resources must be distributed differently, while for some scheduling policies, such as EDF or non-preemptive scheduling, different abstract components, with tailored internal relations, must be used.
2.5. Analysis
In the performance model of a system, various perfor
mance measures can be computed analytically.
For instance, for an FP component the maximum delay dmax experienced by an event is bounded by [4, 1]:

  dmax ≤ sup_{λ≥0} { inf{τ ≥ 0 : αu(λ) ≤ βl(λ + τ)} } =: Del(αu, βl)   (6)

and when processed by a sequence of components, the total end-to-end delay experienced by an event is bounded by [4]:

  dmax ≤ Del(αu, βl_1 ⊗ βl_2 ⊗ ... ⊗ βl_n)   (7)

Similarly, the maximum buffer space bmax required to buffer an event stream in front of such an FP component is bounded by:

  bmax ≤ sup_{λ≥0} {αu(λ) − βl(λ)} =: Buf(αu, βl)   (8)

and when the buffers of consecutive components access the same shared memory, the total buffer space is bounded by:

  bmax ≤ Buf(αu, βl_1 ⊗ βl_2 ⊗ ... ⊗ βl_n)   (9)
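On sampled curves, Del and Buf can be computed directly from (6) and (8). This is a finite-horizon sketch with names of our choosing:

```python
def buf_bound(au, bl):
    """Buf(αu, βl) = sup over λ>=0 of αu(λ) - βl(λ), Eq. (8):
    the maximum vertical distance between the curves."""
    return max(a - b for a, b in zip(au, bl))

def del_bound(au, bl):
    """Del(αu, βl), Eq. (6): the maximum horizontal distance between
    the curves, i.e. the longest time βl may lag behind αu.
    If βl never catches up within the sampling horizon, the remaining
    horizon length is returned (the bound is then not conclusive)."""
    n = len(au)
    worst = 0
    for lam in range(n):
        tau = 0
        while lam + tau < n and bl[lam + tau] < au[lam]:
            tau += 1
        worst = max(worst, tau)
    return worst

# A burst of two simultaneous events, served one per time unit:
au = [0, 2, 3, 4, 5]
bl = [0, 1, 2, 3, 4]
print(buf_bound(au, bl), del_bound(au, bl))  # 1 1
```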
3. Performance Analysis of Greedy Shapers
To enable analysis of systems with greedy shapers in the Modular Performance Analysis framework, we need to introduce a new abstract component that models a greedy shaper, as depicted in Fig. 3. We will first explain the behavior and the implementation of concrete greedy shapers, and will then introduce the internal relations for abstract greedy shapers.
[Figure: a concrete greedy shaper with shaping curve σ that transforms an input trace R(t) into an output trace R'(t), next to its abstract counterpart that transforms an arrival curve α(Δ) into α'(Δ).]
Figure 3. A greedy shaper and its abstraction.
3.1. Concrete Greedy Shapers
A greedy shaper with a shaping curve σ delays events of an input event stream, so that the output event stream has σ as an upper arrival curve, and it outputs all events as soon as possible.
Consider a greedy shaper with shaping curve σ, which is subadditive and with σ(0) = 0. Assume that the shaper buffer is empty at time 0, and that it is large enough so that there is no event loss. In [4], Le Boudec and Thiran proved that for an input event trace R to such a greedy shaper, the output event trace R' is given by:

  R' = R ⊗ σ   (10)
In practice, a greedy shaper with a shaping curve σ(Δ) = min_i{b_i + r_i·Δ} with σ(0) = 0 can be implemented using a cascade of leaky buckets. Every leaky bucket has a bucket size b_i and a leaking rate r_i, and the leaky buckets are arranged with decreasing leaking rate within the cascade. Initially all buckets are empty. When an event arrives at a leaky-bucket stage, a token is generated. If there is enough space in the bucket, the token is put into the bucket and the event is sent to the next stage immediately. Otherwise, the event is buffered until the bucket has emptied enough to put the token in.
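The leaky-bucket cascade described above can be sketched as a small discrete-event simulation. This is our own illustration, not code from the paper; one stage with parameters (b, r) enforces σ(Δ) = b + r·Δ on the passing stream:

```python
class LeakyBucket:
    """One leaky-bucket stage with bucket size b and leaking rate r.

    One token is added per passing event; the bucket drains at rate r.
    An event may pass as soon as its token fits, which enforces that at
    most b + r*Δ events pass in any time window of length Δ.
    """
    def __init__(self, b, r):
        self.b, self.r = b, r
        self.level = 0.0   # current bucket fill
        self.last = 0.0    # time of the last update

    def earliest_pass(self, t):
        """Earliest time >= t at which one more event may pass this stage."""
        # drain the bucket up to time t
        self.level = max(0.0, self.level - self.r * (t - self.last))
        # wait until one more token fits (level + 1 <= b)
        wait = 0.0 if self.level + 1 <= self.b else (self.level + 1 - self.b) / self.r
        out = t + wait
        self.level = max(0.0, self.level - self.r * wait) + 1.0
        self.last = out
        return out

def greedy_shape(arrivals, cascade):
    """Output times of a greedy shaper built as a cascade of leaky buckets."""
    out = []
    for t in arrivals:
        ready = max(t, out[-1]) if out else t  # FIFO: keep event order
        for stage in cascade:
            ready = stage.earliest_pass(ready)
        out.append(ready)
    return out

# A burst of three events at t = 0, shaped with sigma(Δ) = 1 + Δ:
print(greedy_shape([0.0, 0.0, 0.0], [LeakyBucket(1.0, 1.0)]))  # [0.0, 1.0, 2.0]
```

Already-conformant arrivals pass through undelayed, reflecting the "no longer than necessary" property of greedy shapers.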
3.2. Abstract Greedy Shapers
Theorem 1 (Abstract Greedy Shapers) Assume an event stream that can be modeled as an abstract event stream with arrival curves [αu, αl] serves as input to a greedy shaper with a subadditive shaping curve σ with σ(0) = 0. Then, the output of the greedy shaper is an event stream that can be modeled as an abstract event stream with arrival curves

  α'u_GS = αu ⊗ σ   (11)
  α'l_GS = αl ⊗ (σ ⊘ σ)   (12)

Further, the maximum delay and the maximum backlog at the greedy shaper are bounded by

  dmax,GS = Del(αu, σ)   (13)
  bmax,GS = Buf(αu, σ)   (14)
Proof: To prove (11) we use the fact that R ⊘ R is the minimum upper arrival curve of a cumulative function R, and we use the properties

  (f ⊘ g) ⊘ h = f ⊘ (g ⊗ h)
  (f ⊗ g) ⊘ g ≤ f ⊗ (g ⊘ g)

that were proven in [4]. We can then compute

  R' ⊘ R' = (R ⊗ σ) ⊘ (R ⊗ σ)
          = ((R ⊗ σ) ⊘ R) ⊘ σ = ((σ ⊗ R) ⊘ R) ⊘ σ
          ≤ (σ ⊗ (R ⊘ R)) ⊘ σ
          ≤ (σ ⊗ αu) ⊘ σ = (αu ⊗ σ) ⊘ σ
          = αu ⊗ σ
To prove (12) we use the fact that R ⊘̄ R, with ⊘̄ the max-plus deconvolution, is the maximum lower arrival curve of a cumulative function R. We can then compute

  R' ⊘̄ R' = (R ⊗ σ) ⊘̄ (R ⊗ σ)
          = inf_{λ≥0} sup_{0≤v≤λ} inf_{0≤u≤λ+μ} {R(u) − R(v) + σ(μ + λ − u) − σ(λ − v)}

When we separately evaluate this formula for 0 ≤ u ≤ v, for v ≤ u ≤ v + μ, and for v + μ ≤ u ≤ λ + μ, we get

  (R ⊗ σ) ⊘̄ (R ⊗ σ) ≥ min{(R ⊘̄ R) ⊗ (σ ⊘ σ), R ⊘̄ R, σ ⊘ σ} = (R ⊘̄ R) ⊗ (σ ⊘ σ)
The complete proofs for (13) and (14) are omitted here due to space restrictions, but they were deduced starting from the following relations:

  d(t) = inf{τ ≥ 0 : R(t) ≤ R'(t + τ)}
       = inf{τ ≥ 0 : 0 ≤ inf_{0≤u≤t+τ} {σ(t + τ − u) + R(u) − R(t)}}

  b(t) = R(t) − R'(t) = R(t) − (σ ⊗ R)(t)
       = sup_{0≤u≤t} {R(t) − R(u) − σ(t − u)}   □
Relations (11) and (12) can now be used as internal relations of an abstract greedy shaper, and (13) and (14) can be used to analyze delay guarantees and buffer requirements of greedy shapers in a performance model.
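In the sampled-curve representation, relation (11) amounts to a single min-plus convolution. A sketch (names and example values are ours):

```python
def shaper_output_upper(au, sigma):
    """α'u_GS = αu ⊗ σ, Eq. (11): the shaper caps the upper arrival curve
    by the shaping curve while preserving long-term behavior."""
    return [min(au[l] + sigma[d - l] for l in range(d + 1))
            for d in range(len(au))]

# A stream that may burst 3 events at once, shaped to one event per time
# unit: the output bursts are removed, the long-term count is kept.
au = [0, 3, 3, 3, 4]
sigma = [0, 1, 2, 3, 4]
print(shaper_output_upper(au, sigma))  # [0, 1, 2, 3, 4]
```

An already-conformant curve (αu ≤ σ and αu subadditive) is left unchanged, matching the intuition that a greedy shaper never delays conformant traffic.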
4. Applications & Case Studies
In this section, we analyze the two system designs depicted in Fig. 1. The analysis results will clearly reveal the positive influence of greedy shapers on a system's performance and buffer requirements when applied internally, and on a system's robustness when applied externally. We deliberately chose two small system designs that clearly focus on the influence of the greedy shapers, and that do not dilute the analysis results with hardly recognizable influences of other system properties. Modular Performance Analysis with Real-Time Calculus has however already been used several times to analyze bigger and more complex system designs, and the abstract greedy shapers can seamlessly be integrated into bigger performance models.
4.1. Internal Shaping for System Improvement
Consider a distributed real-time system with 2 CPUs that communicate via a shared bus, as depicted on the left side in Fig. 1. CPU1 and CPU2 both process an incoming event stream S1 and S2, and send the resulting event streams S1'' and S2'' via the shared bus to other components. The shared bus implements a fixed-priority protocol, where sending the events from CPU1 has priority over sending the events from CPU2. Events that are ready to be sent get buffered in the communication network interfaces CNI1 and CNI2 that connect CPU1 and CPU2 with the shared bus.
In this system, S1'' may differ considerably from S1. For example, S1'' may be bursty even when S1 is a strictly periodic event stream. This may happen, for example, if besides TS1, other tasks are executed on CPU1 using a TDMA scheduling policy. Or also if FP scheduling is used and TS1 has a low priority. In both cases, the processor may not be available to TS1 during some time interval in which all arriving events of S1 get buffered, and it may be fully available to TS1 during a later time interval in which all the buffered events will be processed and emitted, leading to a burst on S1''.
Now suppose that event stream S1'' is bursty. Whenever a burst of events arrives on S1'', the shared bus gets fully occupied until all buffered events of S1'' are sent. During this period, event stream S2'' will receive no service, and S2'' will experience a delay caused by the burstiness of S1''. Moreover, also the buffer demand in CNI2 will increase with increasing burstiness of S1''.
In this system, it may be an interesting option to place a greedy shaper at the output of CPU1 that shapes event stream S1'. This greedy shaper will limit the burstiness of S1'', and will therefore reduce the influence of CPU1 and S1 on the delay of S2' and the buffer requirements of CNI2.
To investigate the effect of adding greedy shapers to the system with internal re-shaping in Fig. 1, we analyze it with Modular Performance Analysis, using the abstract greedy shaper component that we introduced in Section 3.
We assume that S1 and S2 are both strictly periodic with a period p = 1 ms. In both CPUs, the CPU may not be available to process the tasks TSi for up to 5 ms. After this period of at most 5 ms, the processor is fully available and can process 5 events per ms (βu_CPU1 = βu_CPU2 = 5Δ [e/ms], βl_CPU1 = βl_CPU2 = 5·max{0, Δ − 5} [e/ms]). The bus can send 2.5 events per ms (βu_BUS = βl_BUS = 2.5Δ [e/ms]).
With this specification, we analyze four different system designs. First, we analyze the system without greedy shapers; secondly, we place a greedy shaper only at the output of CPU1 to shape S1'; then, we place a greedy shaper only at the output of CPU2 to shape S2'; and finally we add two greedy shapers to shape both S1' as well as S2'. We use the upper arrival curves αu_S1 and αu_S2 as shaping curves σ1 and σ2, respectively, and we assume that the buffers of the greedy shapers and the corresponding processing tasks access the same memory. On the left side of Fig. 4, the abstract performance model of the fourth system design is depicted.
[Figure: the abstract performance models. For internal re-shaping, two chains α1 and α2 each pass through an FP component (βCPU1 resp. βCPU2), a GS component (σ1 resp. σ2), and an FP component (βBUS), yielding α1''' and α2'''. For external input shaping, streams α1–α3 each pass through a GS component (σ1–σ3) and then an FP component on the shared βCPU, yielding α1''–α3''.]
Figure 4. Performance models.
Using the four performance models, we analyzed the maximum required buffer spaces of the different buffers, as well as the end-to-end delays of both event streams S1 and S2. The results are shown in Table 1.
  shapers |         buffer [events]         |  delay [ms]
          | CPU1  CPU2  CNI1  CNI2   Tot   |   S1     S2
  none    |   6     6     4     9     25   |   5.4    9
  σ1      |   6     6     1     6     19   |   5      5.8
    Δ%    |   –     –   −75%  −33%  −24%   | −7.4%  −36%
  σ2      |   6     6     4     4     20   |   5.4    8.6
    Δ%    |   –     –     –   −56%  −20%   |   –    −4.4%
  both    |   6     6     1     1     14   |   5      5.4
    Δ%    |   –     –   −75%  −89%  −44%   | −7.4%  −40%

Table 1. Effect of Re-Shaping.
From the results, we learn that placing greedy shapers helps to reduce the total buffer requirements from 25 down to 14 events that need to be buffered at most. Moreover, the greedy shapers also reduce the end-to-end delay of both event streams, namely by 7.4% for S1, and by a total of 40% for S2.
When we look at the results, we also recognize the well-known property of greedy shapers that re-shaping is for free [4]. Since we use σ1 = αu_S1 and σ2 = αu_S2, the greedy shapers effectively only re-shape S1' and S2', and therefore the buffer requirements of CPU1 and CPU2 are not affected by adding the greedy shapers.