Rate analysis for streaming applications with on-chip buffer constraints.
ABSTRACT While mapping a streaming (such as multimedia or network packet processing) application onto a specified architecture, an important issue is to determine the input stream rates that can be supported by the architecture for any given mapping. This is subject to typical constraints such as on-chip buffers should not overflow, and specified playout buffers (which feed audio or video devices) should not underflow, so that the quality of the audio/video output is maintained. The main difficulty in this problem arises from the high variability in execution times of stream processing algorithms, coupled with the bursty nature of the streams to be processed. We present a mathematical framework for such a rate analysis for streaming applications, and illustrate its feasibility through a detailed case study of an MPEG-2 decoder application. When integrated into a tool for automated design-space exploration, such an analysis can be used for fast performance evaluation of different stream processing architectures.
In Proceedings of Asia South Pacific Design Automation Conference 2004
Yokohama, Japan, January 2004
Alexander Maxiaguine
ETH Zürich
maxiagui@tik.ee.ethz.ch
Simon Künzli
ETH Zürich
kuenzli@tik.ee.ethz.ch
Samarjit Chakraborty
National University of Singapore
samarjit@comp.nus.edu.sg
Lothar Thiele
ETH Zürich
thiele@tik.ee.ethz.ch
I. INTRODUCTION
Lately, there has been a tremendous increase in portable and
mobile devices running algorithms for processing streams of
audio and video data, and sometimes network packets. These
include hand-held computers and mobile phones, and it is
expected that their usage will increase even more in the fu-
ture. Such devices typically have very stringent constraints
pertaining to cost, size, and power consumption, and have
posed several challenges towards developing appropriate models,
methodologies, languages and tools for designing them
(for example, see [10, 19, 20]).
The architecture of such devices typically consists of mul-
tiple processing elements (PEs) onto which parts of an appli-
cation are mapped, and they are integrated on a single chip
following a system-on-a-chip (SoC) design paradigm. In this
setup, a system-level view of stream processing is as follows:
the input stream enters a PE, gets processed by a function or
algorithm implemented on this PE, and the processed stream
enters another PE for further processing. Between two such
PEs there is a buffer which stores the intermediate stream. Fi-
nally, the fully processed stream emerges out of a PE and gets
stored in a playout buffer which feeds some real-time client
such as an audio or video output device. The process of map-
ping a stream processing application onto such a target archi-
tecture gives rise to the problem of determining the range of
input stream rates that can be supported by the architecture for
a given mapping. Any feasible implementation, or mapping, of
an algorithm onto an architecture is subject to constraints such
as (i) the buffers between any two PEs should not overflow,
and (ii) the playout buffer, which is read by the real-time client
at some specified rate (depending on the quality of the audio
or video output required) should not underflow at any point
in time. Determining the range of feasible input stream rates,
subject to the above constraints, is difficult because of two main
reasons. Firstly, there is a high data-dependent variability in
the execution time of many stream processing algorithms, because
it depends on the properties of the particular audio/video
sample being processed. Secondly, the input streams them-
selves tend to be bursty in nature. These two factors coupled
together can result in increasing the burstiness of the stream
coming out of a PE, thereby necessitating a large amount of
on-chip buffer space for its storage. Here, it may be noted that
in contrast to the simple setup described above, there might be
multiple streams being processed by a PE, where the different
streams are processed by different functions—all of which are
implemented on the same PE. The burstiness of the outgoing
processed streams in such cases would additionally depend on
the scheduling policy used to schedule them on the PE [16].
The importance of the above problem of rate analysis stems
from the fact that on-chip buffers are available only at a
premium, because of their large area requirements (see [9]).
Therefore, when mapping a streaming application onto a spec-
ified architecture, it is necessary to accurately identify the fea-
sible range of input stream rates (and bursts) that can be sup-
ported by the available on-chip buffers. This also includes the
minimum rate that should be maintained to ensure the quality
of the audio/video output.
A. Our results and relation to previous work
There is a large body of work on on-chip traffic analysis and
SoC communication architecture design (see [13, 14] and the
references therein) which is relevant to the problem addressed
in this paper. However, most of this relies on simulation based
approaches. In the context of our problem, it typically requires
several hours to simulate a few minutes of audio/video data for
any reasonably detailed processor model. Further, simulation
based approaches often fail to accurately characterize the al-
lowed input rates and burst ranges, and strongly depend on the
audio or video data used, which itself is difficult to select.
In this paper we present a mathematical framework for rate
analysis for streaming applications, with the aim of overcom-
ing the main problems associated with simulation based meth-
ods. For the sake of generality, we consider any stream to be
made up of a potentially infinite sequence of stream objects,
where a stream object might be a macroblock, a video frame,
an audio sample, or a network packet, depending on the ap-
plication in question. Given a specification of the architecture
along with the different buffer sizes, the scheduling policies
implemented at the different PEs, the execution requirement
per stream object of each processing function implemented on
a PE, and the output rate required to drive the real-time client
(such as the audio or video terminal), the proposed framework
can be used to compute the minimum and maximum rate of
the input stream. Here, rate refers to the precise characteri-
zation of the stream—including the allowed burst range/jitter
and the long-term arrival rate—which are described more pre-
cisely in Section II. Any input stream whose rate is in between
the computed minimum and maximum rates is guaranteed to
satisfy all the constraints pertaining to buffer overflow and un-
derflow. We also substantiate these theoretical results through
a detailed case study of mapping an MPEG-2 decoder application
on a specified architecture.
The proposed framework can be used to drive a system-level
design space exploration where different possible mappings of
a streaming application onto an architecture with fixed buffers
need to be evaluated. It can also be used for optimal buffer
sizing in an architecture, which is of crucial importance due to
the high space requirements of on-chip buffers.
Our framework is based on the theory of Network Calculus
[4], which extends the concept of service curves proposed in
[7, 8] by placing it in an algebraic setting developed in [2].
Although Network Calculus was originally developed, and is still
being largely used in the domain of communication networks,
very recently it was used to analyse SoC architectures in the
context of network processors [6, 18]. This work was further
extended in [5, 11]. Our work in this paper follows this line
of development. On an abstract level, all the previous papers
considered the problem: given an input stream and the scheduling
policy at each PE, what is the worst-case buffer requirement
and what is the nature of the output stream? However,
the problem addressed in this paper is the “reverse problem”,
where the output stream and the buffer size are given and the
nature of the input stream needs to be computed. It turns out
that none of the previous results can be extended to address this
question, and a more elaborate theory based on [2] is required.
Our work is also related to the problem of multimedia smooth-
ing in the domain of communication networks [4], which ad-
dresses the problems of shaping an input stream to meet buffer
constraints and that of computing the optimal playback delay
or buffering time to maintain quality of service. It may be noted
here that the main differences between the domains of communication
networks and on-chip communication in SoC architectures
are that in the former case (i) buffers are more readily
available since there are no space constraints, (ii) packet
dropping is a feasible option, which might not be possible in
the latter case due to power/performance constraints, and (iii)
shaping can be employed to reduce buffer consumption, which
might be too costly to employ in the latter case. Lastly, related
to this paper is also the work in [20], which proposes that on-chip
traffic for multimedia applications exhibits self-similarity,
and uses this property for optimal buffer sizing.
The problem is formally defined in the next section, fol-
lowed by the details of the proposed framework. In Section III
we present the case study of mapping an MPEG-2 decoder on a
specified architecture. Finally, in Section IV we outline some
of the possible directions in which this work may be extended.
Fig. 1. A node with a processing element and an internal buffer of size b,
processing an input stream and feeding the processed stream into a playout
buffer of size B.
II. RATE ANALYSIS WITH BUFFER CONSTRAINTS
In this section we first state the problem definition, followed
by some notation and then the case of a single PE with a play-
out buffer. This is then extended to consider the case of a
stream passing through multiple PEs.
As mentioned in the last section, a stream is processed by
multiple nodes, where each node consists of a PE and an in-
ternal buffer. Let us first consider the last node in the path
of a stream, which feeds the processed stream into a playout
buffer of size B, as shown in Figure 1. Let this node consist
of a PE and an internal buffer of size b. The playout buffer
is read by a real-time client such as an audio/video output de-
vice, at some specified rate. Let the input stream entering the
node be denoted by x(t), where x(t) denotes the number of
stream objects that arrived during the time interval [0,t]. The
PE provides a guaranteed service β(∆) of the following form:
within any time interval of length ∆, it will be able to process
at least β(∆) number of input stream objects. The function
β therefore provides a lower bound on the service provided
by the PE, and is determined by the time required to process
each stream object and the scheduling policy implemented at
the PE (in case multiple streams or other tasks are also being
processed by it). Let us denote the processed output stream
entering the playout buffer by y(t), which (like x(t)) denotes
the number of stream objects coming out during the time interval
[0,t]. The real-time client consumes stream objects from
the playout buffer at a rate C(t), which denotes the number of
stream objects consumed within the time interval [0,t].
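
As a small illustration of the cumulative functions defined above, the following sketch builds x(t) from a list of arrival instants. The helper name `cumulative_arrivals` and the sample arrival times are hypothetical and serve only as an example; they are not taken from the case study.

```python
from bisect import bisect_right


def cumulative_arrivals(arrival_times):
    """Return x(t): the number of stream objects that arrived during [0, t]."""
    times = sorted(arrival_times)

    def x(t):
        if t <= 0:
            return 0  # convention used in this paper: f(t) = 0 for t <= 0
        # bisect_right counts every arrival instant that is <= t
        return bisect_right(times, t)

    return x


# Hypothetical arrival instants in seconds; two objects arrive together at 0.5.
x = cumulative_arrivals([0.1, 0.5, 0.5, 2.0])
print([x(t) for t in (0.0, 0.4, 0.5, 3.0)])  # [0, 1, 3, 4]
```

The resulting x is wide-sense increasing by construction, which is the only property the analysis assumes about it.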
Therefore, x, y and C are functions denoting cumulative
values, while the function β denotes values over time interval
lengths and is referred to as a service curve [7]. Throughout
this paper, we assume all functions f to be wide-sense increas-
ing (which means f(s) ≤ f(t), ∀s ≤ t), and f(t) = 0 for
t ≤ 0. Now, given β, C, and the buffer sizes b and B, the
problem is to compute the function (or set of possible func-
tions) x(t), such that (i) the playout buffer does not overflow,
(ii) it does not underflow, and (iii) the internal buffer at the
node does not overflow. These constraints are subject to the
real-time client consuming stream objects at the specified rate
C(t) and the processing element providing a guaranteed ser-
vice β. The version of the problem with multiple PEs is a
simple extension of this, and is stated later.
As mentioned before, the constraint on playout buffer un-
derflow is to maintain the quality of the audio/video output.
The constraint on buffer overflow is motivated by the fact
that typically static scheduling policies are implemented on the
PEs (for simplicity), and hence checking buffer fill levels and
stalling a processor in case an output buffer is full cannot be
easily implemented.
A. Notation
For any two functions f and g, the min-plus convolution of
f and g is given by:
$$(f \otimes g)(t) = \inf_{0 \le s \le t} \{ f(t - s) + g(s) \}$$
The min-plus deconvolution of f and g is given by:
$$(f \oslash g)(t) = \sup_{u \ge 0} \{ f(t + u) - g(u) \}$$
We use f ∧ g to denote the infimum of f and g, or the minimum
if it exists, and f ∨ g to denote the supremum of f and g, or
the maximum if it exists.
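
The two operators can be sketched for functions sampled on a uniform time grid, as below. This is only an illustration under stated assumptions: the functions are given as equal-length Python lists indexed by time step, the names `min_plus_conv` and `min_plus_deconv` are hypothetical, and the supremum in the deconvolution is cut off at the end of the grid, so values near the horizon may be under-estimated.

```python
def min_plus_conv(f, g):
    """Min-plus convolution: result[t] = min over 0 <= s <= t of f[t - s] + g[s]."""
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(len(f))]


def min_plus_deconv(f, g):
    """Min-plus deconvolution: result[t] = max over u >= 0 of f[t + u] - g[u].

    The index u is limited so that t + u stays inside the sampled horizon.
    """
    n = len(f)
    return [max(f[t + u] - g[u] for u in range(n - t)) for t in range(n)]


f = [0, 1, 2, 3]
g = [0, 2, 4, 6]
print(min_plus_conv(f, g))    # [0, 1, 2, 3]
print(min_plus_deconv(g, f))  # [3, 4, 5, 6]
```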
B. Buffer underflow and overflow constraints
Following our problem description, the constraint on the playout
buffer underflow can be stated as:
$$y(t) \ge C(t) \quad \forall t \ge 0 \tag{1}$$
Since the PE provides a service guarantee of $\beta$, it can be shown
that $y(t) \ge (x \otimes \beta)(t)$ (see [4] for details). Hence, the
minimum value of $y(t)$ is equal to $(x \otimes \beta)(t)$ and the constraint
given by Eqn.(1) can be reformulated as:
$$(x \otimes \beta)(t) \ge C(t) \quad \forall t \ge 0 \tag{2}$$
It can be shown that for any functions $f$, $g$ and $h$, $g \otimes h \ge f$ if
and only if $h \ge f \oslash g$. Using this, Eqn.(2) can be stated as:
$$x(t) \ge (C \oslash \beta)(t) \quad \forall t \ge 0 \tag{3}$$
Similarly, the constraint on the playout buffer overflow can be
stated as: $y(t) - C(t) \le B$, $\forall t \ge 0$. Since $y(t)$ can be as
large as $x(t)$ but not larger, this constraint can be reformulated
as:
$$x(t) \le C(t) + B \quad \forall t \ge 0 \tag{4}$$
Finally, the constraint that the internal buffer at the node should
not overflow is given by: $x(t) - y(t) \le b$, $\forall t \ge 0$. Since
$y(t) \ge (x \otimes \beta)(t)$, the minimum value of $y(t)$ is $(x \otimes \beta)(t)$
and the above constraint, as before, can be formulated as:
$$x(t) \le (x \otimes \beta)(t) + b \quad \forall t \ge 0 \tag{5}$$
Eqns.(3), (4) and (5) therefore state all the constraints that the
input stream $x(t)$ is required to satisfy.
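
The three buffer constraints of this subsection can be checked for a candidate input stream on a sampled time grid, as in the sketch below. It tests the constraints in their worst-case form, i.e. with y(t) replaced by its lower bound (x ⊗ β)(t) for playout underflow and internal overflow, and by its upper bound x(t) for playout overflow. The function names and the sample curves are hypothetical, chosen only for illustration.

```python
def min_plus_conv(f, g):
    """result[t] = min over 0 <= s <= t of f[t - s] + g[s]."""
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(len(f))]


def check_constraints(x, beta, C, B, b):
    """Evaluate the worst-case buffer constraints for a sampled input x."""
    y_min = min_plus_conv(x, beta)  # least output the PE is guaranteed to deliver
    return {
        # (x conv beta)(t) >= C(t): the playout buffer never runs empty
        "playout_underflow_free": all(y >= c for y, c in zip(y_min, C)),
        # x(t) - C(t) <= B: the playout buffer never exceeds its size B
        "playout_overflow_free": all(xv - c <= B for xv, c in zip(x, C)),
        # x(t) - (x conv beta)(t) <= b: the internal buffer never exceeds b
        "internal_overflow_free": all(xv - y <= b for xv, y in zip(x, y_min)),
    }


# Hypothetical curves: one object per step, with a start-up delay of one step.
beta = [0, 0, 1, 2, 3, 4]
C = [0, 0, 1, 2, 3, 4]
print(check_constraints([0, 1, 2, 3, 4, 5], beta, C, B=3, b=2))  # all True
print(check_constraints([0, 5, 5, 5, 5, 5], beta, C, B=3, b=2))  # both overflow checks fail
```

A burst of five objects in the first step satisfies the underflow constraint but exceeds both buffer sizes, which is the kind of behaviour the upper bound on x(t) is meant to exclude.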
C. Computing bounds on x(t)
Eqns.(4) and (5) can be combined and stated as follows:
$$x(t) \le (C(t) + B) \wedge ((x \otimes \beta)(t) + b) \quad \forall t \ge 0$$
Let $x_{\max}(t)$ be the maximum value of $x(t)$ which satisfies the
above inequality. This inequality is of the form:
$$x \le (C + B) \wedge \Pi(x) \tag{6}$$
where $x$ and $C$ are functions and $\Pi$ is an operator given by
$\Pi(x) = (x \otimes \beta) + b$. It follows from [4] (see also [2] for further
details) that the maximum solution which satisfies Eqn.(6) is
given by $x_{\max}(t) = \overline{\Pi}(C(t) + B)$, where $\overline{\Pi}$ is the sub-additive
closure of $\Pi$ and is defined as
$$\overline{\Pi}(x) = x \wedge \Pi(x) \wedge \Pi(\Pi(x)) \wedge \dots$$
Since $\Pi(x) = (x \otimes \beta) + b$, it follows that:
$$\overline{\Pi}(x) = x \wedge (x \otimes \beta + b) \wedge (x \otimes \beta \otimes \beta + 2b) \wedge \dots$$
or, $\overline{\Pi}(x) = x \wedge \inf_{n \ge 1} \{ x \otimes \beta^{(n)} + nb \}$, where $\beta^{(n)}$ is the n-th
self-convolution of $\beta$. Now, it is known that for any functions
$f$, $g$ and $h$, $(f \wedge g) \otimes h = (f \otimes h) \wedge (g \otimes h)$ [4]. Using this result,
it follows that: $\overline{\Pi}(x) = (x \otimes \delta_0) \wedge (x \otimes \inf_{n \ge 1} \{ \beta^{(n)} + nb \})$,
where $\delta_0$ is a function defined as $\delta_0(t) = +\infty$ for all $t > 0$, and
$\delta_0(t) = 0$ for all $t \le 0$. Hence, $\overline{\Pi}(x) = x \otimes \inf_{n \ge 0} \{ \beta^{(n)} + nb \}$
since, for any function $f$, by convention $f^{(0)} = \delta_0$ (see [4]).
The sub-additive closure of any function $f$, denoted by $\overline{f}$,
is defined as $\overline{f} = \inf_{n \ge 0} \{ f^{(n)} \}$ (which is similar to the sub-additive
closure of an operator as described above). Hence, it
follows that $\overline{\Pi}(x) = x \otimes \overline{(\beta + b)}$ and therefore,
$$x_{\max}(t) = (C(t) + B) \otimes \overline{(\beta(t) + b)} \tag{7}$$
Similarly, to obtain a lower bound on $x(t)$, we recast Eqn.(5)
as follows: $x(t) \ge (x(t) - b) \oslash \beta(t)$. By combining this with
Eqn.(3), we obtain: $x(t) \ge (C \oslash \beta)(t) \vee (x(t) - b) \oslash \beta(t)$.
This is of the form:
$$x \ge (C \oslash \beta) \vee \Gamma(x) \tag{8}$$
where $\Gamma$ is an operator given by $\Gamma(x) = (x - b) \oslash \beta$. Using a
result [4] analogous to the existence of the maximum solution
to Eqn.(6), it follows that Eqn.(8) has one minimum solution
which is given by:
$$x_{\min}(t) = \underline{\Gamma}(C(t) \oslash \beta(t)) \tag{9}$$
where $\underline{\Gamma}$ is the super-additive closure of $\Gamma$ and is defined as
$$\underline{\Gamma}(x) = x \vee \Gamma(x) \vee \Gamma(\Gamma(x)) \vee \dots$$
Unlike Eqn.(7), Eqn.(9) unfortunately does not give a
closed-form solution to $x_{\min}(t)$ and must be iteratively computed
for any given problem instance. Eqns.(7) and (9) therefore
give upper and lower bounds on the function $x(t)$, and this
is summarized in the following theorem, which is the main result
of this paper.
Theorem 1 Any non-decreasing function x(t) which satisfies
the inequality xmin(t) ≤ x(t) ≤ xmax(t), ∀t ≥ 0, respects
both the buffer overflow and the playout buffer underflow
constraints, where xmax and xmin are computed using Eqns.(7)
and (9), respectively.
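
The two bounds of Theorem 1 can be sketched numerically on a sampled time grid as follows. This is not the Mathematica implementation used for the case study; it is a minimal illustration under several assumptions: all curves are equal-length lists indexed by time step, b > 0, the deconvolution is truncated at the end of the grid (so the last samples of the lower bound may be under-estimated), and all function names are hypothetical. The lower bound is obtained by iterating acc ← acc ∨ Γ(acc) from C ⊘ β until nothing changes; because Γ distributes over the pointwise maximum, this accumulates the terms x ∨ Γ(x) ∨ Γ(Γ(x)) ∨ ... of the closure.

```python
INF = float("inf")


def conv(f, g):
    """Min-plus convolution on a sampled grid."""
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(len(f))]


def deconv(f, g):
    """Min-plus deconvolution, truncated at the end of the grid."""
    n = len(f)
    return [max(f[t + u] - g[u] for u in range(n - t)) for t in range(n)]


def subadditive_closure(f, max_iter=10_000):
    """inf over n >= 0 of the n-fold self-convolution of f, with f^(0) = delta_0."""
    closure = [0.0] + [INF] * (len(f) - 1)  # delta_0 sampled on the grid
    for _ in range(max_iter):
        nxt = [min(a, c) for a, c in zip(closure, conv(closure, f))]
        if nxt == closure:  # fixed point reached
            break
        closure = nxt
    return closure


def upper_bound(C, beta, B, b):
    """Eqn.(7): (C + B) convolved with the sub-additive closure of (beta + b)."""
    return conv([c + B for c in C], subadditive_closure([v + b for v in beta]))


def lower_bound(C, beta, b, max_iter=10_000):
    """Eqn.(9): iterate acc <- max(acc, Gamma(acc)), Gamma(x) = (x - b) deconv beta."""
    acc = deconv(C, beta)
    for _ in range(max_iter):
        gamma = deconv([v - b for v in acc], beta)
        nxt = [max(a, g) for a, g in zip(acc, gamma)]
        if nxt == acc:  # fixed point reached
            break
        acc = nxt
    return acc


# Hypothetical curves: one object per step after a start-up delay of one step.
beta = [0, 0, 1, 2, 3, 4]
C = [0, 0, 1, 2, 3, 4]
print(upper_bound(C, beta, B=3, b=2))
print(lower_bound(C, beta, b=2))
```

In this toy instance the lower bound leads C(t) by the one-step latency of β, while the upper bound coincides with C(t) + B because the playout buffer is the tighter of the two overflow constraints.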
D. The case of multiple processing elements
PEs in the path of a stream, other than those considered in
the preceding subsections (i.e. those which do not feed their
output into a playout buffer), process an input stream x′(t), and
the output y′(t) is fed into another PE for further processing.
The only constraint for any such PE is that the associated internal
buffer should not overflow. The required output y′(t)
for such a PE is determined from the input x(t) of the PE
that it feeds, which is the PE analysed in the preceding step.
Following this composition scheme, we first fix an
input x(t) of the PE which feeds the playout buffer (where
xmin(t) ≤ x(t) ≤ xmax(t) from the previous subsection).
This x(t) is the required output y′(t) of the immediate next PE
(moving upstream along the path of the stream),
for which we compute bounds x′max and x′min (using similar
techniques as described above) and choose some x′(t) lying in
between these bounds. This is the required output of the immediate
next PE, and this process is followed until we compute
x′(t) (or bounds on it) for the input stream entering the first
PE in the path of the stream. Any input stream conforming
to this computed value is guaranteed to respect all the buffer
constraints.
Fig. 2. Mapping the MPEG-2 decoder application onto a multiprocessor SoC. (Diagram labels: tasks VLD, IQ, IDCT, MP and MC; processors PE1 and PE2; video output port VOUT; buffers B1, B2 and Bout; streams x(t) and C(t).)
III. RATE ANALYSIS FOR AN MPEG-2 DECODER
In this section we apply the rate analysis methodology developed
in the previous section to study the mapping of an MPEG-2
decoder [3] application onto a multiprocessor SoC architecture
with fixed buffers. By comparing the results from our mathematical
framework with those obtained from a system simulator,
we show that our framework is able to provide useful
bounds on the allowed rates of the input stream. For any input
sequence x(t) and the computed bounds xmax and xmin,
the results obtained by our framework were in conformance
with the simulation results in terms of predicting buffer overflow,
underflow, and all cases where the buffer constraints are
satisfied.
A. The MPEG-2 decoder application
Our target architecture consists of a number of PEs interconnected
by an on-chip communication network. This network
can be seen as a system of point-to-point buffered FIFO chan-
nels of limited capacity. The PEs exchange packetized data by
writing/reading to/from these channels.
The part of the architecture onto which the MPEG-2 decoder
application is mapped, along with the mapping of the application's
task graph onto it, is shown in Figure 2. In this figure, PE1
and PE2 are programmable processors and VOUT denotes a
video output port.
The task graph of the MPEG-2 decoder includes several
tasks such as variable length decoding (VLD), inverse quan-
tization (IQ), inverse discrete cosine transform (IDCT), for-
mation of motion predictors (MP) and motion compensation
(MC). Based on profiling information, we partitioned this set
of tasks into two subsets, with one being executed on PE1,
and the other on PE2.
A compressed video bit stream arrives into a buffer B1 (as
shown in Figure 2) and is processed by the VLD and the IQ
tasks running on PE1. Decompressed macroblocks are written
into the buffer B2, which is read by PE2 for further processing.
Note that the data exchange between PE1 and PE2
can be seen as a single stream of packets with each packet
encapsulating IDCT coefficients and motion vectors for one
macroblock. This information is processed by the IDCT, MP and
MC tasks mapped onto PE2. Finally, the decoded video samples
are written (one macroblock at a time) into the playout
buffer Bout.
Bout is read by an output process located in the video port,
which reads one macroblock at a time, at a constant rate that is
determined by the frame rate and the resolution of the decoded
video stream. The constraints associated with Bout are that
it should never be empty when the output process attempts to
read from it, and it should also not overflow when PE2 writes
into it. We do not want to adopt the option of stalling PE2
when Bout is full, so that simple static scheduling
algorithms can be used on PE2 when multiple streams are processed by
it. Additionally, we also require the buffers B1 and B2 not to
overflow.
Satisfying all the above conditions simultaneously is difficult,
because tasks executing on PE1 produce bursty, time-varying
traffic on its output. This is mainly due to the fact that
the execution time of the VLD task exhibits high variability,
which depends on the structure of the compressed stream and
the properties of the encoded video information (see also [20]).
Using the proposed rate analysis technique, we can compute
upper and lower bounds on the macroblock output rate that is
to be satisfied by the processor PE1. Any macroblock output
stream from PE1 which conforms to these bounds is guaranteed
not to overflow the buffers B2 and Bout and also not to
underflow the buffer Bout. Depending on such a chosen output
rate of PE1, upper and lower bounds on the input rate to the
buffer B1 can also be computed, as discussed in the previous
section.
B. Analytical results
Due to space restrictions, we will only concentrate on buffer
overflow and underflow constraints associated with B2 and
Bout and compute bounds that the macroblock output stream
coming out of PE1 is required to satisfy. Following the notation
of Section II (see also Figure 1), the output process located
in the video port is a real-time client characterized by a cumulative
rate function C(t). Bout is the playout buffer of size B.
The processing capacity of PE2 is characterized by a guaranteed
service of at least β(t). The buffer B2 corresponds to the
internal buffer of PE2 with size b. Let x(t) be the cumulative
rate function of any macroblock stream on the output of
PE1. The curves denoted by xmax(t) and xmin(t), which are
computed analytically using the framework developed in Section II,
are the bounds to which the macroblock stream x(t) on
PE1's output has to conform in order to guarantee that buffers
B2 and Bout do not overflow and Bout does not underflow.
C(t) can be derived from the parameters of the video stream
to be decoded, and is given as follows:
$$C(t) = \begin{cases} 0 & \text{if } t \le t_0 \\ N F t & \text{if } t > t_0 \end{cases}$$
where N is the number of macroblocks per video frame, F is
the video frame rate and t0 is the time, starting from which the
real-time client starts reading data out of the playout buffer. t0
can be referred to as the playback delay or buffering time. All
video streams used in our experiments have F = 25 frames
per second and N = 1620. The playback delay t0, in general,
can be chosen arbitrarily. We have set t0 to be equal to the time
required by the real-time client to read half a frame from the
playout buffer.
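
The parameters of C(t) can be reproduced with a few lines of arithmetic. The sketch below assumes 16×16-pixel macroblocks (the standard MPEG-2 macroblock size) and the 720×576 resolution of the clips used in the experiments, and it assumes that "half a frame" is read at the steady rate of N·F macroblocks per second. The piecewise form follows the expression as printed in this section; the names `consumption`, `MB_SIDE` and `T0` are hypothetical.

```python
WIDTH, HEIGHT = 720, 576  # resolution of the clips used in the experiments
MB_SIDE = 16              # assumption: 16x16-pixel MPEG-2 macroblocks
F = 25                    # frames per second

# Macroblocks per frame: 45 columns times 36 rows.
N = (WIDTH // MB_SIDE) * (HEIGHT // MB_SIDE)

# Assumption: half a frame (N / 2 macroblocks) read at N * F macroblocks/s.
T0 = 0.5 / F


def consumption(t):
    """C(t) as printed above: 0 up to the playback delay, N * F * t afterwards."""
    return 0.0 if t <= T0 else N * F * t


print(N)                 # 1620
print(T0)                # playback delay in seconds (20 ms)
print(consumption(1.0))  # 40500.0 macroblocks after one second
```

The computed value N = 1620 agrees with the figure quoted for the experiments.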
A system configuration is defined by a set of parameters for
β(t), B, and b. In our experiments we varied the size of the
playout buffer B. Assuming that the processor PE2 is unloaded
and hence its entire capacity is available for processing
the IDCT, MP and MC tasks mapped onto it, the service curve
β(t) was modeled by a straight line. The slope of this line was
set to be equal to the long-term average macroblock production
rate on the output of PE1 (which was measured using the
system simulator described in the next subsection). This was
based on the assumption that in the long term, PE2 has sufficient
capacity to process all the incoming macroblocks. B2
was always set to a fixed size of b = 500 macroblocks. Figure 3
shows the resulting bounds xmin(t) and xmax(t) computed
by implementing Eqns.(7) and (9) of Section II in Mathematica
(from Wolfram Research), with the system configuration
and β(t) described above and the playout buffer size B set
to 2430 macroblocks.
Fig. 3. The analytical bounds xmin(t) and xmax(t) computed for a system
configuration with B = 2430 macroblocks. (Horizontal axis: t [ms]; vertical axis: # macroblocks.)
C. Simulation Setup
We performed simulations of the MPEG-2 decoder applica-
tion (shown in Figure 2) using a transaction level model of the
system architecture (see [12]). The system model was writ-
ten in SystemC [17], and the models of the programmable
PEs were based on the Sim-Profile configuration of the
SimpleScalar [1] instruction set simulator. Both PE1 and PE2
had a RISC core (similar to the MIPS3000 processor) augmented
with MPEG-2 specific hardware accelerators. PE1
was enhanced with bit-stream access operations, while PE2
had special support for application kernels such as IDCT,
Add Clip and Block Average and could prefetch memory in a
special video-block mode. Floating point operations were not
used on either of the PEs. The implementation of the MPEG-2
decoder was based on the source code available from [15].
We simulated the decoding of several MPEG-2 video clips.
All video clips had parameters as described in the previous
subsection. The clips were encoded using a constant bit rate of
9.78 Mb/s and a resolution of 720x576 pixels (typically used
in DVD applications). A selection of the simulated scenarios
(with each scenario being a combination of a video clip and
the playout buffer size), representing corner cases for our ex-
periments, is summarized in Table I. Video sequence A cor-
responds to a video clip with global motion, whereas video
sequence B corresponds to a video clip with moving objects
and still background and video sequence C represents a still
picture.
For all the simulation scenarios listed in Table I, using our
simulation setup we measured x(t) at the output of PE1 and
the maximum and minimum fill levels of B2 and the playout
buffer Bout.
D. Comparing the analytical bounds with simulation
In this subsection we evaluate the usefulness of the analytical
bounds on the input rate x(t), by comparing the predictions
on buffer overflow/underflow/conformance deduced from the
analytical framework, with the results obtained by simulation.
TABLE I
SIMULATION SCENARIOS
Scenario    Video Clip    Buffer Size B [# macroblocks]
1           Sequence A    1620
2           Sequence A    2430
3           Sequence B    2430
4           Sequence C    2430
5           Sequence C    3240
Fig. 4. (a) The difference plot for a macroblock stream x(t) which is
compliant with the computed upper and lower bounds; (b) Corresponding
playout buffer fill levels; (c) The difference plot for a non-compliant stream
x(t); (d) Corresponding playout buffer fill levels indicating buffer overflow.
The horizontal-axis shows time in ns and the vertical-axis shows the number
of processed macroblocks in the playout buffer.
For ease of interpretation of the simulation results, we
always show a difference plot. Such a plot does not show the
absolute values of x(t) (obtained from the simulation) versus
xmin(t) and xmax(t) (which are computed following Theorem 1
of Section II). Instead, it shows the curves corresponding
to the differences xmax(t) − xmin(t) and x(t) − xmin(t). From
such a plot it is possible to detect when an input stream x(t)
(where x(t) is measured from the input stream resulting from
simulating the decoding algorithm on a video clip) violates the
computed bounds. A violation occurs whenever the curve representing
x(t) − xmin(t) crosses the curve xmax(t) − xmin(t),
or goes below 0.
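
The violation test read off a difference plot can be sketched as below, assuming that x(t), xmin(t) and xmax(t) are available as equal-length lists sampled at the same time steps. The function name `find_violations` and the sample values are hypothetical.

```python
def find_violations(x, x_min, x_max):
    """Return (time step, kind) for every sample at which x leaves the bounds."""
    violations = []
    for t, (value, low, high) in enumerate(zip(x, x_min, x_max)):
        diff = value - low  # curve x(t) - xmin(t) of the difference plot
        band = high - low   # curve xmax(t) - xmin(t) of the difference plot
        if diff < 0:
            violations.append((t, "below lower bound"))
        elif diff > band:
            violations.append((t, "above upper bound"))
    return violations


x_min = [0, 1, 2, 3]
x_max = [3, 4, 6, 8]
print(find_violations([0, 2, 5, 9], x_min, x_max))  # [(3, 'above upper bound')]
print(find_violations([0, 0, 2, 3], x_min, x_max))  # [(1, 'below lower bound')]
```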
Figure 4(a) shows an example where x(t) resulting from a
video clip is compliant with the bounds xmin(t) and xmax(t).
In Figure 4(b) the corresponding playout buffer fill level (as
measured from simulation) is shown, which confirms that no
buffer overflow or underflow occurs. Figure 4(c) depicts an example
of a sequence x(t) (obtained from the simulation Scenario 1),
which violates the upper bound. The corresponding
buffer fill level plot in Figure 4(d) shows that the playout buffer
overflows.
In Figure 5 we show the difference plots corresponding to
the Scenarios 2–5 which are outlined in Table I. All the fig-
ures in this subsection show an excerpt of 1.36 seconds (corre-
sponding to 34 frames) of video sequences from some simula-
tion scenario (1 to 5). The left bar plot in Figure 6 shows nor-
malized values of largest buffer fill levels observed at the play-