A Model-Based Schedule Representation for Heterogeneous Mapping of Dataflow Graphs
Hsiang-Huang Wu, Chung-Ching Shen, Nimish Sane, William Plishker, Shuvra S. Bhattacharyya
Department of Electrical & Computer Engineering, and
Institute for Advanced Computer Studies
University of Maryland
College Park, Maryland, USA
{hhwu, ccshen, nsane, plishker, ssb}@umd.edu
Abstract: Dataflow-based application specifications are widely used in model-based design methodologies for signal processing systems. In this paper, we develop a new model called the dataflow schedule graph (DSG) for representing a broad class of dataflow graph schedules. The DSG provides a graphical representation of schedules based on dataflow semantics. In conventional approaches, applications are represented using dataflow graphs, whereas schedules for the graphs are represented using specialized notations, such as various kinds of sequences or looping constructs. In contrast, the DSG approach employs dataflow graphs for representing both application models and schedules that are derived from them.

Our DSG approach provides a precise, formal framework for unambiguously representing, analyzing, manipulating, and interchanging schedules. We develop detailed formulations of the DSG representation, and present examples and experimental results that demonstrate the utility of DSGs in the context of heterogeneous signal processing system design.
Keywords: dataflow graphs, heterogeneous computing, models of computation, scheduling.
I. INTRODUCTION
Dataflow models of computation are widely used for expressing the functionality of digital signal processing (DSP) applications (e.g., see [1]). In DSP-oriented dataflow models of computation, applications are modeled as directed graphs, where vertices (actors) represent computational modules for executing (or firing) tasks, and edges represent first-in-first-out channels for storing data values (tokens) and imposing data dependencies between actors. Whenever an actor fires, it consumes tokens from its input edges and produces tokens on its output edges.
Scheduling has been studied extensively in the context of dataflow-based modeling of DSP systems. Dataflow graph scheduling involves assigning actors to processors and sequencing subsets of actors that share common processing resources. In the context of dataflow scheduling of DSP systems, a “processor” is typically taken to be a hardware resource on which execution is time-multiplexed among the actors that are assigned to it. In addition to ensuring that dataflow graph dependencies are respected, scheduling is often geared towards exploiting parallelism (performance improvement) and efficient memory utilization (buffer management). Given the fundamental role of scheduling in dataflow-based design flows, and its heavy impact on key implementation metrics, a wide variety of techniques for scheduling DSP dataflow graphs has evolved over the years and continues to evolve. These techniques address, for example, buffer optimization [2], joint code and data minimization [3], quasi-static scheduling [4], adaptive scheduling [5], [6], and throughput optimization [7].
As the range of dataflow graph scheduling techniques continues to expand, driven by the heterogeneity of application modeling styles and implementation objectives and by the increasing degree of dynamics in applications, it becomes increasingly important to develop a common representation for modeling and working with dataflow schedules. Such a representation is desirable to enable systematic reuse of design tool code, analysis techniques, and back-end implementation methodologies across various scheduling strategies. Furthermore, a formal representation helps to integrate different scheduling techniques so that they can be mixed and matched across different subsystems of a design based on the characteristics and objectives associated with those subsystems.
In this paper, we address this problem by introducing a formal framework, called the dataflow schedule graph (DSG), for precisely representing, analyzing, manipulating, and interchanging schedules. We have designed the DSG representation with two major objectives: 1) it should be rooted in formal dataflow semantics, and 2) it should accommodate a wide range of schedule classes, including static, quasi-static, and dynamic schedules, as well as both sequential and parallel schedule formats. Furthermore, because DSGs are based on the same dataflow semantic framework as the application representations from which the schedules are derived, they can naturally represent structures in which schedules are adapted dynamically (e.g., in response to changes in input data characteristics).
In Proceedings of the International Heterogeneity in Computing Workshop,
pages 66-77, Anchorage, Alaska, May 2011. Published through the IEEE
Computer Society Press as part of the IPDPS CD-ROM.
II. RELATED WORK
A number of dataflow schedule representations have been explored previously. The generalized schedule tree (GST) representation provides a tree-based representation of arbitrary looped schedules [8]. A novel schedule format based on dynamic loop counts, geared towards buffer memory minimization for synchronous dataflow (SDF) graphs, is developed in [9]. The interprocessor communication graph and synchronization graph models provide dataflow-based schedule representations for parallel schedules of homogeneous SDF (HSDF) graphs [10]. HSDF is a restricted form of SDF in which the dataflow rate on each input and output port is always equal to 1 [11].
A distinguishing characteristic of our proposed DSG representation is that it is both dataflow based and capable of handling dynamic schedule structures as well as dynamic dataflow application models. This is in contrast to execution-sequence based representations, which can usually be characterized formally but lack dataflow semantics and are often restricted to static schedules.
The most closely related modeling technique is the synchronization graph model. In this model, self-timed multiprocessor schedules are represented as interacting dataflow graph cycles, where each cycle corresponds to the periodic execution of the actors that are assigned to a given processor [10]. A significant body of theory and algorithms has been developed for this model. We are therefore motivated to generalize the synchronization graph concept beyond self-timed schedules and HSDF graphs.
The DSG can be viewed as such a generalization. The DSG model can represent dynamic schedules, which can be applied to static or dynamic application models to improve flexibility (e.g., robustness of load balancing, or data-dependent control structures). Furthermore, the model is fully based on dataflow principles; together with its accommodation of dynamic dataflow semantics, this allows for integration with dynamic parameter control methods for dataflow graphs, such as those provided by parameterized dataflow [5] and scenario-aware dataflow [6].
The DSG representation can be used in conjunction with existing task graph scheduling techniques, such as those developed in [12], [13], [14], [15], [16]. For example, the DSG can be used to model the sequencing structures derived by these scheduling techniques (e.g., as a standard interface for code generation), or to bridge subsystems that are scheduled using different techniques. Indeed, exploring the optimized integration of DSG-based schedule control with new and existing task graph scheduling techniques is an interesting direction for further investigation, and one that is especially relevant to heterogeneous computing systems.
III. CORE FUNCTIONAL DATAFLOW
For concreteness, we develop the DSG in the context of a specific form of dataflow: the core functional dataflow (CFDF) model of computation, which can be viewed as a deterministic subclass of enable-invoke dataflow (EIDF) graphs [17]. CFDF is a highly expressive (Turing complete), dynamic dataflow model. In Section XI, we discuss how the DSG model can be adapted to other forms of dataflow (beyond CFDF).
In CFDF, actors are specified as sets of modes, where each mode has a fixed consumption rate associated with each input port and a fixed production rate associated with each output port. Each actor has an associated current mode, which is maintained as part of its state. When an actor is invoked, it executes its current mode, consumes and produces data (as in other dataflow models), and updates its current mode. Since different modes of an actor can have different production and consumption rates, dynamic dataflow behavior can be modeled flexibly in CFDF.
A distinguishing aspect of CFDF (and of its non-deterministic superset, EIDF) is that the separation of enable and invoke functionality for actors is defined as a first-class characteristic of the model. Specifically, each actor has an associated enable function, which can be called at any time between firings (e.g., by a run-time scheduler), and which returns a Boolean value indicating whether or not there is sufficient data available on the actor's input ports to fire (invoke) the actor in its current mode. Since such an isolated enable check is available, the invoke function of an actor assumes that sufficient data is present, and reads its input data without blocking.

In the implementation of dataflow tools, the functionalities corresponding to the enable and invoke methods are often interleaved; for example, an actor firing may have computations that are interleaved with blocking reads of data that provide successive inputs to those computations. In contrast, there is a clean separation of enable and invoke capabilities in EIDF. This separation helps to improve the predictability of an actor invocation (since availability of the required data can be guaranteed in advance by the enable method), and helps in prototyping efficient scheduling and synthesis techniques (since enable and invoke functionality can be called separately by the scheduler). This separation also leads naturally to a concept of guarded execution, whereby an actor firing is conditionally executed depending on whether or not the actor is enabled.
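The mode, enable, and invoke concepts above can be illustrated with a small Python sketch. The actor `AddOrCopy`, its three modes, and the helper `guarded_fire` are hypothetical illustrations, not part of CFDF or of any tool API: in mode "control" the actor reads one control token and selects its next mode, in mode "add" it consumes two data tokens and produces their sum, and in mode "copy" it forwards one data token. Edges are modeled as unbounded FIFOs (`collections.deque`), so `enable()` checks only input token counts.

```python
from collections import deque  # FIFO edges


class AddOrCopy:
    """Hypothetical CFDF actor with three modes (illustration only).

    Each mode has fixed rates: (tokens consumed on ctrl, tokens consumed on
    data, tokens produced on out). Production rates are listed for
    documentation; output FIFOs are unbounded, so enable() ignores them.
    """

    RATES = {
        "control": (1, 0, 0),  # read one control token, choose the next mode
        "add": (0, 2, 1),      # consume two data tokens, produce their sum
        "copy": (0, 1, 1),     # forward one data token
    }

    def __init__(self, ctrl, data, out):
        self.ctrl, self.data, self.out = ctrl, data, out  # FIFO edges
        self.mode = "control"  # the current mode is part of the actor state

    def enable(self):
        """Return True if the inputs hold enough tokens for the current mode."""
        need_ctrl, need_data, _ = self.RATES[self.mode]
        return len(self.ctrl) >= need_ctrl and len(self.data) >= need_data

    def invoke(self):
        """Fire in the current mode and update the mode.

        Assumes enable() returned True, so the reads below never block.
        """
        if self.mode == "control":
            self.mode = "add" if self.ctrl.popleft() else "copy"
        elif self.mode == "add":
            a = self.data.popleft()
            b = self.data.popleft()
            self.out.append(a + b)
            self.mode = "control"
        else:  # "copy"
            self.out.append(self.data.popleft())
            self.mode = "control"


def guarded_fire(actor):
    """Guarded execution: invoke the actor only if it is enabled."""
    if actor.enable():
        actor.invoke()
        return True
    return False
```

Because `enable()` is separate from `invoke()`, a scheduler can poll an actor without blocking. With control tokens [1, 0] and data tokens [3, 4, 5], repeated guarded firings produce [7, 5] and then stop once the control FIFO is empty.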
IV. THE DATAFLOW SCHEDULE GRAPH REPRESENTATION
Given a CFDF representation $G_A$ of an application, a dataflow schedule graph (DSG) is a dataflow graph that satisfies certain technical constraints (described later in this section) and represents the time-multiplexed execution of $G_A$ across a set of hardware resources. Here, a hardware resource represents an arbitrary computational resource, such as a processor core, dedicated accelerator, or FPGA subsystem, that executes actors sequentially. Constraints imposed on the DSG ensure that each hardware resource can execute at most one actor from $G_A$ at any given time. Tokens that flow along edges of the DSG serve to enable actors for execution (as it becomes their turn to execute). DSG tokens can also contain values that are manipulated and queried during execution of the DSG to achieve various forms of data- or parameter-dependent schedule control.
In DSGs, special actors, called schedule control actors (SCAs) and reference actors (RAs), are selected or developed as an integral part of the schedule modeling framework. In contrast to conventional dataflow actors, which represent functional components from the original application specification (application actors), SCAs are dataflow actors that are dedicated to coordinating control flow in derived schedules. RAs, on the other hand, can be viewed as “pointers” to application actors. These pointers are equipped with optional auxiliary computations. Intuitively, an RA represents a scheduling “wrapper” that specifies the computation that is executed when the corresponding actor is “visited” during schedule execution. The simplest form of RA just performs a guarded execution of the actor that it points to. However, more capabilities can be incorporated into RAs using the optional auxiliary computations mentioned above.
V. REFERENCE ACTORS
An RA has a single input port and a single output port. An RA is a homogeneous synchronous dataflow actor in the enclosing DSG; that is, on each firing it consumes a single token from its input and produces a single token on its output.
Given an RA $A$, we represent the application graph actor pointed to by $A$ with the symbol $\mathrm{ref}(A)$, and we refer to $\mathrm{ref}(A)$ as the referenced actor of $A$.
As illustrated in Figure 1, an RA $A$ consists of two functions, $\mathrm{pre}_A$ and $\mathrm{post}_A$, which are executed, respectively, before and after the guarded execution phase of $A$. This guarded execution phase, represented by the block labeled “guarded firing” in Figure 1, represents the guarded execution of the referenced actor $\mathrm{ref}(A)$ in terms of CFDF semantics (see Section III).
Figure 1. The internal structure of an RA: pre, guarded firing, and post blocks, together with buffers and the state of the actor.
We refer to the functions $\mathrm{pre}_A$ and $\mathrm{post}_A$ as subfunctions of the enclosing RA. Intuitively, the RA subfunctions provide a mechanism to process and manipulate data that is used throughout the graph to control the execution of actors (e.g., to facilitate conditional execution or data-dependent iteration in various parts of the graph). The data manipulated by RA subfunctions is encapsulated within the DSG tokens that are produced and consumed by the enclosing RA.
To clarify the operational structure of DSGs, it is useful to emphasize that the tokens flowing on a DSG are strictly for schedule control purposes. Furthermore, because actors in the application graph are allowed to execute only when they have sufficient data (as specified by the CFDF enabling conditions), and CFDF is a deterministic dataflow model, schedule control by DSGs does not violate determinacy: such control only dictates how actors are time-multiplexed when they are mapped to the same hardware resource.
RAs can contain internal state. Such local (actor-specific) state is widely known to be compatible with dataflow representations, since in dataflow graphs, state can be modeled as self-loops with delays (initial tokens) [11], [18]. Thus, the use of state in RAs does not violate our ability to interpret DSGs as genuine dataflow representations.
The following categories of data can be used as inputs in RA subfunctions:
• The value represented by the current DSG token, i.e., the DSG token that is consumed by the enclosing RA firing ($\mathrm{pre}_A$ only). This value can be of any type. The type is a design issue of the particular DSG control structure that is being developed for a specific schedule, or of the particular class of control structures that is being targeted by a particular scheduling tool.
• The state of the enclosing RA.
• The state of the referenced actor.
The following categories of data serve as outputs for (i.e., can be modified by) RA subfunctions:
• The state of the enclosing RA.
• The value of the token that is produced by the RA ($\mathrm{post}_A$ only).
Firing of an RA involves the following sequence of steps:
1) The RA consumes a token from its input edge. This token is passed as input to $\mathrm{pre}_A$, which executes and updates the state of the RA.
2) A guarded execution of $\mathrm{ref}(A)$ is carried out; that is, $\mathrm{ref}(A)$ is fired once if it is enabled.
3) An execution of $\mathrm{post}_A$ is carried out. This execution operates on the state of the RA. The output value from this execution is produced as the output of the RA firing.
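A minimal Python sketch of the three-step RA firing follows, assuming that DSG edges are FIFOs (`collections.deque`) and that the referenced actor exposes `enable()` and `invoke()` methods in the spirit of the CFDF enable/invoke split. `ReferenceActor`, `CountingActor`, and the subfunction signatures `pre(ra, token)` and `post(ra)` are illustrative names chosen for this sketch. By default, $\mathrm{pre}_A$ stores the incoming token value in RA state and $\mathrm{post}_A$ returns it, which gives a pass-through RA.

```python
from collections import deque  # DSG edges are modeled as deques


class ReferenceActor:
    """Sketch of an RA A: one input edge, one output edge, HSDF in the DSG.

    `ref` plays the role of ref(A) and is assumed to expose enable() and
    invoke(). `pre` and `post` are optional subfunctions; the defaults
    pass the DSG token value through unchanged.
    """

    def __init__(self, ref, pre=None, post=None):
        self.ref = ref
        self.state = {}
        self.pre = pre or self._pass_pre
        self.post = post or self._pass_post

    @staticmethod
    def _pass_pre(ra, token):
        ra.state["value"] = token  # remember the incoming token value

    @staticmethod
    def _pass_post(ra):
        return ra.state["value"]  # emit it unmodified

    def fire(self, in_edge, out_edge):
        token = in_edge.popleft()  # 1) consume one DSG token ...
        self.pre(self, token)  # ... and run pre_A on it
        if self.ref.enable():  # 2) guarded execution of ref(A)
            self.ref.invoke()
        out_edge.append(self.post(self))  # 3) post_A produces the output token


class CountingActor:
    """Hypothetical referenced actor that is always enabled."""

    def __init__(self):
        self.firings = 0

    def enable(self):
        return True

    def invoke(self):
        self.firings += 1
```

Supplying a custom `post`, for example `lambda ra: ra.state["value"] + 1`, turns the same wrapper into an RA that modifies the DSG token on its way through.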
The general purpose of $\mathrm{pre}_A$ and $\mathrm{post}_A$ is to manipulate DSG tokens. The values of DSG tokens, in conjunction with SCAs, contribute to overall schedule control. Computations in $\mathrm{pre}_A$ and $\mathrm{post}_A$ are optional. For example, an RA can simply execute the referenced actor unconditionally, maintain no internal (RA) state, and pass input DSG values from input to output without modification. Such “lightweight” RAs are typical in the construction of static scheduling structures, as well as in dynamic structures where dynamic schedule control is managed by SCAs. When code is generated from DSGs, such lightweight RAs can easily be detected and “optimized away” so that they do not result in run-time overhead.
An example of a non-lightweight RA is one that updates DSG tokens with estimates of the amount of energy or execution time taken by the associated firings. Such information can then be used by the enclosing DSG to adapt overall schedule control, e.g., when the DSG is embedded within a parameterized dataflow system or another kind of reconfigurable dataflow graph framework (e.g., see [5], [6]).
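As a hedged illustration of such a non-lightweight RA, the hypothetical `TimedRA` below treats the DSG token value as a running execution-time estimate. Consistent with the input and output categories listed earlier, $\mathrm{pre}_A$ reads the consumed token and the referenced actor's state (here a firing counter, which is an assumption of this sketch), and $\mathrm{post}_A$ uses only RA and referenced-actor state to produce the updated token. The per-firing `cost` is a made-up constant, not a measured value, and `Worker` is a hypothetical referenced actor.

```python
from collections import deque  # DSG edges are modeled as deques


class Worker:
    """Hypothetical referenced actor; its state includes a firing counter."""

    def __init__(self, pending):
        self.pending = pending  # firings for which input data is available
        self.firings = 0

    def enable(self):
        return self.pending > 0

    def invoke(self):
        self.pending -= 1
        self.firings += 1


class TimedRA:
    """Hypothetical non-lightweight RA: the DSG token carries a running
    execution-time estimate, increased by `cost` for each actual firing."""

    def __init__(self, ref, cost):
        self.ref = ref
        self.cost = cost  # assumed per-firing estimate (illustrative)
        self.state = {}

    def pre(self, token):
        # Inputs: the consumed DSG token and the referenced actor's state.
        self.state["elapsed"] = token
        self.state["firings_before"] = self.ref.firings

    def post(self):
        # Inputs: RA state and referenced actor state; output: new token value.
        fired = self.ref.firings - self.state["firings_before"]
        return self.state["elapsed"] + fired * self.cost

    def fire(self, in_edge, out_edge):
        self.pre(in_edge.popleft())
        if self.ref.enable():  # guarded execution of ref(A)
            self.ref.invoke()
        out_edge.append(self.post())
```

With a `Worker` that has data for one firing and `cost=2.5`, a token valued 0.0 leaves the first RA firing as 2.5; on a second visit the actor is not enabled, so the value stays 2.5. An SCA downstream could then branch on this accumulated value.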
VI. SCHEDULE CONTROL ACTORS
To model dynamic scheduling structures, SCAs generally play an important role in conjunction with RAs. An SCA is an actor that can have any positive number of input ports and any positive number of output ports. In other words, an SCA must have at least one input port and one output port, and may have any number of additional input or output ports. The dataflow behavior of an SCA satisfies the following lumped homogeneous synchronous dataflow (LHSDF) condition: for every firing $f$ of an SCA $C$, we have $n_c = n_p = 1$, where $n_c$ represents the total number of tokens consumed by $C$ across all input ports during $f$, and $n_p$ represents the total number of tokens produced across all output ports during $f$.
Note that an SCA $C$ can have internal state; if we model that state as a self-loop edge for $C$, then this edge is treated independently of the LHSDF condition. That is, such a self-loop edge is a standard HSDF edge whose dataflow does not “count towards” the values of $n_c$ and $n_p$.
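The LHSDF condition can be checked mechanically from the per-port token counts of a single firing. The function `satisfies_lhsdf` and its dictionary-based inputs are illustrative assumptions; ports named in `self_loop_ports` model internal state and are excluded from $n_c$ and $n_p$, as described above.

```python
def satisfies_lhsdf(consumed, produced, self_loop_ports=frozenset()):
    """Check the LHSDF condition n_c = n_p = 1 for one SCA firing.

    `consumed` and `produced` map port names to the number of tokens
    consumed or produced on that port during the firing. Ports listed in
    `self_loop_ports` model internal state and do not count.
    """
    n_c = sum(n for port, n in consumed.items() if port not in self_loop_ports)
    n_p = sum(n for port, n in produced.items() if port not in self_loop_ports)
    return n_c == 1 and n_p == 1
```

For example, a loop firing that consumes one token on in2 and produces one on out1 passes, while a hypothetical firing that emits tokens on two outputs fails, since it would create a second program counter.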
A token in a DSG can be interpreted loosely as an “actor-level program counter” for a given target processor. The LHSDF condition for SCAs, along with the HSDF semantics of RAs, guarantees that there is only one such program counter (thread of control) that is “demanded of” each target processor. This ensures that the schedule execution modeled by the DSG conforms to the assumption that individual target processors execute actors sequentially.
Note that while our proposed DSG model is used to model schedules for CFDF graphs, SCAs, and hence DSGs, do not necessarily conform to CFDF semantics. The primary requirement for SCAs, in the context of the associated actor-level program counter concept, is most naturally captured by LHSDF semantics as opposed to CFDF.
We introduce several types of SCAs that will be used in this paper. Table I summarizes the properties of these actors. The loop actor has two pairs of inputs and outputs. One pair is used to repeatedly perform the computations within the loop, while the other pair is used for conditionally branching into and exiting the loop based on certain control conditions. Since there is only one DSG token, execution always proceeds unambiguously either inside or outside the loop.
SCAs can be paired with other SCAs to provide special control functions that involve their coordination. For example, if and fi provide DSGs with the capability of selecting computations conditionally. The number of outputs of a given if actor must match the number of inputs to the corresponding fi actor, so that the pair provides conditional selection among the computations that are enclosed by the matching if and fi.
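The coordination between a matching if and fi pair can be sketched as follows, again modeling DSG edges as deques. `IfActor`, `FiActor`, and the `select` predicate are hypothetical names for this illustration; the sketch only shows how each actor consumes and produces exactly one DSG token per firing, as the LHSDF condition requires, and it omits the RAs that would sit on each branch.

```python
from collections import deque  # DSG edges are modeled as deques


class IfActor:
    """Hypothetical `if` SCA: one input edge and k >= 2 output edges.

    Routes the single DSG token to the branch chosen by `select`.
    """

    def __init__(self, select, outputs):
        self.select = select  # maps the token value to a branch index
        self.outputs = outputs

    def fire(self, in_edge):
        token = in_edge.popleft()  # consume exactly one token
        self.outputs[self.select(token)].append(token)  # produce exactly one


class FiActor:
    """Hypothetical `fi` SCA: k >= 2 input edges and one output edge.

    Forwards the token from whichever branch currently holds it.
    """

    def __init__(self, inputs, out_edge):
        self.inputs = inputs
        self.out_edge = out_edge

    def fire(self):
        edge = next(e for e in self.inputs if e)  # the unique non-empty input
        self.out_edge.append(edge.popleft())
```

Passing the same list of branch edges to both constructors enforces the matching requirement: the number of if outputs equals the number of fi inputs.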
The pair snd and rec is used for interprocessor communication and synchronization in concurrent DSGs (CDSGs), which are discussed further in Section VIII.
Table I
EXAMPLES OF SCAS.
SCA     # of inputs   # of outputs
loop    2             2
if      1             ≥ 2
fi      ≥ 2           1
snd     1             2
rec     2             1
VII. SEQUENTIAL DATAFLOW SCHEDULE GRAPHS
A DSG for a single-processor schedule represents the time-multiplexed (sequential) execution of a set of actors on a single processing resource. Execution of the DSG models the evolution of actor firings in the associated sequential schedule. To preserve this sequential execution property, a sequential DSG (SDSG) imposes the restriction that at most one token can be present in the entire DSG at any given time. This requirement formally captures the interpretation of DSG tokens as actor-level program counters in the context of single-processor schedules. Just as the program counter in a conventional processor “points to” a single instruction at any given time, the unique SDSG token points to a single SDSG actor, which is the next actor to execute.
As an example, consider the class of single appearance schedules for SDF graphs [3]. These schedules are represented as looped schedules in which each actor appears exactly once, which implies, for instance, minimal code size under inline implementation. For example, the looped schedule (3(2ab)c), involving 3 actors a, b, c and 2 loops represented by the two nested, parenthesized terms, represents the firing sequence ababcababcababc.
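The expansion from a looped schedule to its firing sequence can be sketched with a small recursive-descent parser. This sketch assumes single-character actor names and an optional iteration count at the start of each parenthesized term (a term without a count is executed once); `expand` is an illustrative helper, not a function from [3] or [8].

```python
def expand(schedule: str) -> str:
    """Expand a looped schedule such as "(3(2ab)c)" into its firing sequence.

    Assumptions for this sketch: actor names are single characters, and a
    parenthesized term starts with an optional iteration count (default 1).
    """
    pos = 0

    def parse_seq() -> str:
        nonlocal pos
        parts = []
        while pos < len(schedule) and schedule[pos] != ")":
            ch = schedule[pos]
            if ch == "(":
                pos += 1
                start = pos
                while schedule[pos].isdigit():
                    pos += 1
                count = int(schedule[start:pos]) if pos > start else 1
                body = parse_seq()  # the loop body
                pos += 1  # skip the closing ")"
                parts.append(body * count)
            elif ch.isspace():
                pos += 1
            else:
                parts.append(ch)  # a single actor firing
                pos += 1
        return "".join(parts)

    return parse_seq()
```

Under these assumptions, `expand("(3(2ab)c)")` yields ababcababcababc and `expand("(A(2B)C)")` yields ABBC. Note that the looped form grows with the number of actors, while the expanded sequence grows with the product of the loop counts.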
To demonstrate SDSGs for single appearance schedules, we apply the loop SCA that was introduced in Section VI. Figure 2(a) shows an SDF graph ($G_A$) and an associated single appearance schedule (A(2B)C). A simple SDSG ($G_S$) is shown in Figure 2(b). In this example, loop1, which is an instance of the loop actor, implements an outer loop that models a finite blocking factor $J$. This blocking factor gives the number of times that the schedule is to be repeated. If the schedule is to be repeated indefinitely ($J = \infty$), then loop1 should be removed, and the output of $R_C$ should be connected directly to $R_A$.
The actor loop2, which is also an instance of the loop SCA defined in Section VI, implements control for an inner loop that corresponds to the nested subschedule (2B). A token in this SDSG does not carry any values; it simply points to the next actor in the SDSG that is to be executed.
The “D” symbols on the graph in Figure 2 correspond to delays, which are implemented as initial tokens in the graph. Functionally, a delay corresponds to the $z^{-1}$ operator in signal processing.
Execution of the SDSG shown in Figure 2(b) proceeds as follows. The delay (initial token) on the edge ($R_C$, loop1) causes execution to begin with a firing of loop1. The actor loop1 has one input port, two output ports (one connected to END and one connected to $R_A$), and an internal state that maintains a loop iteration count $n_o$, which corresponds to the number of remaining schedule iterations and is initialized to the blocking factor value $J$. Each time loop1 fires, it first checks the value of $n_o$. If $n_o = 0$, then the firing completes with an output token produced on the output edge that is connected to END. On the other hand, if $n_o > 0$, then the value of $n_o$ is decremented, and the firing completes with a token produced on the output edge that is connected to $R_A$.
This token has the effect of passing processor control to $R_A$, which then fires the referenced actor A once and passes control (through its output token) to loop2.
The actor loop2 has two input ports, in1 and in2, and two output ports, out1 and out2, as shown in Figure 2(b). loop2 also has a state variable $n_i$, which maintains the number of iterations remaining in the current inner loop invocation. When loop2 consumes a DSG token from in1, it resets $n_i$ to 2 and produces an output token on out1 to enable $R_B$. On the other hand, when loop2 consumes its input from in2, it first decrements the value of $n_i$. If after this decrement operation $n_i > 0$, then it again produces an output token on out1; otherwise, it produces an output token on out2, which effectively exits the inner loop and passes control to $R_C$.
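The loop2 behavior just described can be captured as a small state machine. The class name `InnerLoop` and its constructor arguments are hypothetical, edges are modeled as deques, and the sketch relies on the SDSG property that at most one of in1 and in2 holds a token when the actor fires.

```python
from collections import deque  # DSG edges are modeled as deques


class InnerLoop:
    """Sketch of the loop SCA as used for loop2 in Figure 2(b).

    Port names follow the text; `count` is the iteration count of the
    nested subschedule (2 for (2B)).
    """

    def __init__(self, count, in1, in2, out1, out2):
        self.count = count
        self.in1, self.in2 = in1, in2
        self.out1, self.out2 = out1, out2
        self.n_i = 0  # iterations remaining in the current loop invocation

    def fire(self):
        if self.in1:  # entering the loop
            token = self.in1.popleft()
            self.n_i = self.count
            self.out1.append(token)  # enable the loop body (R_B)
        else:  # returning from the loop body
            token = self.in2.popleft()
            self.n_i -= 1
            # Stay in the loop while iterations remain, otherwise exit to R_C.
            (self.out1 if self.n_i > 0 else self.out2).append(token)
```

Driving the body by hand (moving the token from out1 back to in2 each time $R_B$ would fire) shows the body executing exactly twice before the token leaves on out2.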
Actors $R_B$ and $R_C$, like $R_A$, operate by consuming a single token from their unique input edges, firing their associated referenced actors, and producing a single output token on their unique output edges. In the case of $R_C$, the output token produced has the effect of passing control back to loop1 for the next outer loop iteration.
We emphasize that under correct operation, an SDSG contains at most one token. Thus, for an enabled SCA that has multiple input edges, there is never ambiguity about which input edge the next firing will consume data from: the SCA simply consumes the input token from the unique edge that has a nonzero buffer population.
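To tie the walkthrough together, the following sketch simulates the SDSG of Figure 2(b) token by token and checks the single-token property at every step. The edge labels, the `run_sdsg` helper, and the choice to let each RA fire its referenced actor without a further enable check (the SDF schedule is assumed valid, so the actors are enabled) are assumptions of this sketch; END is modeled as a sink edge.

```python
from collections import deque


def run_sdsg(J):
    """Simulate the SDSG of Figure 2(b) for blocking factor J.

    Returns the firing sequence of the application actors A, B, C.
    Edge labels are illustrative; the token carries no value.
    """
    names = ["loop1->RA", "RA->loop2.in1", "loop2.out1->RB",
             "RB->loop2.in2", "loop2.out2->RC", "RC->loop1", "loop1->END"]
    edges = {name: deque() for name in names}
    edges["RC->loop1"].append("pc")  # delay D: the initial token
    n_o, n_i, fired = J, 0, []

    def move(src, dst):
        edges[dst].append(edges[src].popleft())

    while not edges["loop1->END"]:
        # SDSG invariant: exactly one token (program counter) in the graph.
        assert sum(len(e) for e in edges.values()) == 1
        if edges["RC->loop1"]:  # loop1 fires
            if n_o == 0:
                move("RC->loop1", "loop1->END")
            else:
                n_o -= 1
                move("RC->loop1", "loop1->RA")
        elif edges["loop1->RA"]:  # R_A fires A
            fired.append("A")
            move("loop1->RA", "RA->loop2.in1")
        elif edges["RA->loop2.in1"]:  # loop2 entered via in1
            n_i = 2
            move("RA->loop2.in1", "loop2.out1->RB")
        elif edges["loop2.out1->RB"]:  # R_B fires B
            fired.append("B")
            move("loop2.out1->RB", "RB->loop2.in2")
        elif edges["RB->loop2.in2"]:  # loop2 fires via in2
            n_i -= 1
            move("RB->loop2.in2",
                 "loop2.out1->RB" if n_i > 0 else "loop2.out2->RC")
        else:  # R_C fires C, returning control to loop1
            fired.append("C")
            move("loop2.out2->RC", "RC->loop1")
    return "".join(fired)
```

Under these assumptions, `run_sdsg(2)` returns ABBCABBC, that is, two repetitions of the schedule (A(2B)C), and `run_sdsg(0)` returns an empty sequence because loop1 immediately routes the token to END.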
Figure 2. (a) An SDF graph $G_A$ with actors A, B, C and its single appearance schedule (A(2B)C). (b) A design example of an SDSG $G_S$ for the single appearance schedule (A(2B)C), built from reference actors $R_A$, $R_B$, $R_C$, the loop SCAs loop1 and loop2 (ports in1, in2, out1, out2), and delays D.
VIII. CONCURRENT DATAFLOW SCHEDULE GRAPHS
Efficient parallel computation is an important motivation for the use of dataflow graphs in many implementation contexts. For this purpose, the concept of the DSG can be naturally extended to handle concurrent execution of multiple SDSG “threads”. Multiple SDSGs can be integrated to execute concurrently through the use of a special kind of actor, called an inter-SDSG coordination actor (ICA). We refer to the resulting class of communicating, concurrent SDSGs as concurrent DSGs (CDSGs).
Two specific ICAs are snd and rec, which perform communication and the associated synchronization of data that is passed between different processors. As shown in Figure 3, snd and rec each have one pair of input and output ports, $\mathrm{IN}_{PC}$ and $\mathrm{OUT}_{PC}$, for the execution-enabling SDSG token (i.e., the token that is analogous to a program counter or “PC”, as described in Section VII). Additionally, the snd actor has a second output port that is used to send data to another processor, and similarly, the rec actor has a second input port that is used to receive interprocessor communication (IPC) data. We refer to these output and input ports as $\mathrm{OUT}_{IPC}$ and $\mathrm{IN}_{IPC}$, respectively.
Every instance of a snd actor is paired with a corresponding rec actor, in the sense that the $\mathrm{OUT}_{IPC}$ port of each snd actor is connected to the $\mathrm{IN}_{IPC}$ port of the corresponding rec actor. The snd actor represents the communication of a single token, including any necessary synchronization functionality (e.g., checking for available buffer space), from the sending processor to the processor on which the corresponding rec actor resides. Similarly, the rec actor represents the receipt of a single token, including any associated synchronization functionality (e.g., checking whether the corresponding interprocessor communication buffer is non-empty before reading).
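The pairing of snd and rec over a bounded IPC buffer can be sketched with Python threads, where each thread plays the role of one processor's SDSG and a local `pc` value stands in for the PC token that enters on $\mathrm{IN}_{PC}$ and leaves on $\mathrm{OUT}_{PC}$. Using `queue.Queue(maxsize=...)` as the IPC buffer is an assumption of this sketch: its blocking `put` models snd waiting for available buffer space, and its blocking `get` models rec waiting for a non-empty buffer. The producer and consumer computations, and the helper names `processor1`, `processor2`, and `run`, are hypothetical placeholders.

```python
import queue
import threading


def snd(pc, ipc, value):
    """snd: take the PC token, send one data token on OUT_IPC, return the PC.

    put() blocks while the bounded IPC buffer is full (space check).
    """
    ipc.put(value)
    return pc


def rec(pc, ipc):
    """rec: take the PC token, receive one data token on IN_IPC, return both.

    get() blocks while the IPC buffer is empty (non-empty check).
    """
    value = ipc.get()
    return pc, value


def processor1(ipc, n):
    pc = "pc1"
    for k in range(n):
        value = k * k  # hypothetical producer actor fires
        pc = snd(pc, ipc, value)


def processor2(ipc, n, results):
    pc = "pc2"
    for _ in range(n):
        pc, value = rec(pc, ipc)
        results.append(value + 1)  # hypothetical consumer actor fires


def run(n=5, capacity=2):
    ipc = queue.Queue(maxsize=capacity)  # bounded IPC buffer
    results = []
    threads = [threading.Thread(target=processor1, args=(ipc, n)),
               threading.Thread(target=processor2, args=(ipc, n, results))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because there is a single sender and a single receiver on a FIFO buffer, the received sequence is deterministic: `run()` returns [1, 2, 5, 10, 17] regardless of how the two threads interleave.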
In general, the synchronization and data communication features of the rec and snd actors can be decou-