Tradeoff exploration between reliability, power
consumption, and execution time
Ismail Assayad¹, Alain Girault², and Hamoudi Kalla³
¹ ENSEM (RTSE team), University Hassan II of Casablanca, Morocco.
² INRIA and Grenoble University (POP ART team and LIG lab), France.
³ University of Batna (SECOS team), Algeria.
Abstract. We propose an off-line scheduling heuristic which, from a given software application graph and a given multiprocessor architecture (homogeneous and fully connected), produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). Our tricriteria scheduling heuristic, TSH, uses the active replication of the operations and the data-dependencies to increase the reliability, and uses dynamic voltage and frequency scaling to lower the power consumption.
1 Introduction
For autonomous critical real-time embedded systems (e.g., satellites), guaranteeing a very high level of reliability is as important as keeping the power consumption as low as possible. We present an off-line scheduling heuristic that, from a given software application graph and a given multiprocessor architecture, produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). We target homogeneous distributed architectures, such as multicore processors. Our tricriteria scheduling heuristic uses the active replication of the operations and the data-dependencies to increase the reliability, and uses dynamic voltage and frequency scaling (DVFS) to lower the power consumption. However, DVFS has an impact on the failure rate of processors, because lower voltage leads to smaller critical energy, hence the system becomes sensitive to lower energy particles. As a result, the failure probability increases. The two criteria power consumption and reliability are thus antagonistic with each other and with the schedule length, which makes this problem all the more difficult.
Let us address the issues raised by multicriteria optimization. Figure 1 illustrates the particular case of two criteria to be minimized. Each point $x_1$ to $x_7$ represents a solution, that is, a different tradeoff between the $Z_1$ and $Z_2$ criteria: the points $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are Pareto optima [17]; the points $x_2$, $x_3$, and $x_4$ are strong optima while the points $x_1$ and $x_5$ are weak optima. The set of all Pareto optima is called the Pareto front.
Fig. 1: Pareto front for a bicriteria minimization problem (axes: first criterion $Z_1$ and second criterion $Z_2$; plotted points $x_1$ to $x_7$).
hal-00655478, version 1 - 29 Dec 2011
Author manuscript, published in "SAFECOMP 6894 (2012) 437--451"
It is fundamental to understand that no single solution among the points
$x_2$, $x_3$, and $x_4$ (the strong Pareto optima) can be said, a priori, to be the best
one. Indeed, those three solutions are non-comparable, so choosing among them
can only be done by the user, depending on the precise requirements of his/her
application. This is why we advocate producing, for a given problem instance,
the Pareto front rather than a single solution. Since we have three criteria, it
will be a surface in the 3D space (length,reliability,power).
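As an illustration only, the following Python sketch filters a set of bicriteria solutions down to its non-dominated points, which correspond to what the text calls strong optima. The sample (Z1, Z2) values are hypothetical and are not read from Figure 1.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (Z1, Z2), both criteria are minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical solutions; the last two are dominated by (4.0, 4.0).
solutions = [(1.0, 9.0), (2.0, 6.0), (4.0, 4.0), (7.0, 2.0), (5.0, 5.0), (8.0, 7.0)]
front = pareto_front(solutions)
```

The remaining points are pairwise non-comparable: improving one criterion always degrades the other, which is why the choice among them is left to the user.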
The main contribution of this paper is TSH, the first tricriteria scheduling
heuristics able to produce a Pareto front in the space (length,reliability,power),
and taking into account the impact of voltage on the failure probability. Thanks
to the use of active replication, TSH is able to provide any required level of relia-
bility. TSH is an extension of our previous bicriteria (length,reliability) heuristics
called BSH [6]. The tricriteria extension presented in this paper is necessary be-
cause of the crucial impact of the voltage on the failure probability.
2 Principle of the method and overview
To produce a Pareto front, the usual method involves transforming all the criteria except one into constraints, and then minimizing the last remaining criterion iteratively [17]. Figure 2 illustrates the particular case of two criteria $Z_1$ and $Z_2$. To obtain the Pareto front, $Z_1$ is transformed into a constraint, with its first value set to $K_1^1 = +\infty$. The first run involves minimizing $Z_2$ under the constraint $Z_1 < +\infty$, which produces the Pareto point $x_1$. For the second run, the constraint is set to the value of $x_1$, that is $K_1^2 = Z_1(x_1)$: we therefore minimize $Z_2$ under the constraint $Z_1 < K_1^2$, which produces the Pareto point $x_2$, and so on. Another way is to slice the interval $[0, +\infty)$ into a finite number of contiguous sub-intervals of the form $[K_1^i, K_1^{i+1}]$.
Fig. 2: Transformation method to produce the Pareto front (axes $Z_1$ and $Z_2$; points $x_1$ to $x_5$; thresholds $K_1^1 = +\infty$, $K_1^2$, $K_1^3$, $K_1^4$).
The application algorithm graphs we are dealing with are large (tens to hundreds of operations, each operation being a software block), which makes exact scheduling methods infeasible, and even approximate methods with backtracking, such as branch-and-bound. We therefore have to use list scheduling heuristics, which have demonstrated their good performance in the past [10]. We propose in this paper a suitable list scheduling heuristic, adapted from [6].
Using list scheduling to minimize a criterion $Z_2$ under the constraint that another criterion $Z_1$ remains below some threshold $K_1$ (as in Figure 2) requires that $Z_1$ be an invariant measure, not a varying one. For instance, the energy is a strictly increasing function of the schedule: if $S'$ is a prefix schedule of $S$, then the energy consumed by $S$ is strictly greater than the energy consumed by $S'$. Hence, the energy is not an invariant measure. As a consequence, if we attempt to use the energy as a constraint (i.e., $Z_1 = E$) and the schedule length as a criterion to be minimized (i.e., $Z_2 = L$), then we are bound to fail. Indeed, the fact that all the scheduling decisions made at the stage of any intermediary schedule $S'$ meet the constraint $E(S') < K_1$ cannot guarantee that the final schedule $S$ will meet the constraint $E(S) < K_1$. In contrast, the power consumption is an invariant measure (being the energy divided by the time), and this is why we take the power consumption as a criterion instead of the energy consumption (see Section 3.5).
The reliability too is not an invariant measure: it is neither an increasing
nor a decreasing function of the schedule. So the same reasoning applies if the
reliability is taken as a constraint. This is why we take instead, as a criterion,
the global system failure rate per time unit (GSFR), first defined in [6]. By
construction, the GSFR is an invariant measure of the schedule’s reliability (see
Section 3.7).
For these reasons, each run of our tricriteria scheduling heuristic TSH minimizes the schedule length under the double constraint that the power consumption and the GSFR remain below some thresholds, respectively $P_{obj}$ and $\Lambda_{obj}$. By running TSH with decreasing values of $P_{obj}$ and $\Lambda_{obj}$, starting with $+\infty$ and $+\infty$, we are able to produce the Pareto front in the 3D space (length, GSFR, power). This Pareto front shows the existing tradeoffs between the three criteria, allowing the user to choose the solution that best meets his/her application needs.
Finally, our method for producing a Pareto front could work with any other
scheduling heuristics minimizing the schedule length under the constraints of
both the reliability and the power.
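The iterative transformation method of this section can be sketched for the bicriteria case as follows. This is a minimal sketch under stated assumptions: `minimize_z2_under` is a hypothetical stand-in for one run of a scheduling heuristic (here it simply searches a finite list of candidate solutions), and the candidate values are invented for the example.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (Z1, Z2)


def minimize_z2_under(candidates: List[Point], bound: float) -> Optional[Point]:
    """Stand-in for one heuristic run: best Z2 among candidates with Z1 < bound."""
    feasible = [c for c in candidates if c[0] < bound]
    if not feasible:
        return None
    # Ties on Z2 are broken in favour of the smaller Z1.
    return min(feasible, key=lambda c: (c[1], c[0]))


def transformation_method(candidates: List[Point]) -> List[Point]:
    """Tighten the constraint on Z1 after each run, starting from +infinity."""
    front: List[Point] = []
    bound = math.inf                      # first threshold: +infinity
    while True:
        point = minimize_z2_under(candidates, bound)
        if point is None:                 # no solution satisfies the constraint
            return front
        front.append(point)
        bound = point[0]                  # next threshold: Z1 of the point just found


candidates = [(1.0, 9.0), (2.0, 6.0), (4.0, 4.0), (7.0, 2.0), (5.0, 5.0), (8.0, 7.0)]
front = transformation_method(candidates)
```

The loop terminates because the bound decreases strictly at each iteration. With three criteria, the same idea applies with two constraints tightened over a grid of values.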
3 Models
3.1 Application algorithm graph
Most embedded real-time systems are reactive, and therefore consist of some algorithm executed periodically, triggered by a periodic execution clock. Our model is therefore that of an application algorithm graph Alg which is repeated infinitely. Alg is an acyclic oriented graph (X, D) (see Figure 3(a)). Its nodes (the set X) are software blocks called operations. Each arc of Alg (the set D) is a data-dependency between two operations. If X → Y is a data-dependency, then X is a predecessor of Y, while Y is a successor of X. Operations with no predecessor (resp. successor) are called input operations (resp. output). Operations do not have any side effect, except for input/output operations: an input operation (resp. output) is a call to a sensor driver (resp. actuator).
Fig. 3: (a) An example of algorithm graph Alg: I1, I2, and I3 are input operations, O1 and O2 are output operations, A-G are regular operations; (b) An example of an architecture graph Arc with four processors, P1 to P4, and six communication links (L12, L13, L14, L23, L24, L34).
The Alg graph is acyclic but it is infinitely repeated in order to take into
account the reactivity of the modeled system, that is, its reaction to external
stimuli produced by its environment.
3.2 Architecture model
We assume that the architecture is a homogeneous and fully connected multiprocessor one. It is represented by an architecture graph Arc, which is a non-oriented bipartite graph (P, L, A) whose set of nodes is P ∪ L and whose set of edges is A (see Figure 3(b)). P is the set of processors and L is the set of communication links. A processor is composed of a computing unit, to execute operations, and one or more communication units, to send or receive data to/from communication links. A point-to-point communication link is composed of a sequential memory that allows it to transmit data from one processor to another. Each edge of Arc (the set A) always connects one processor and one communication link. Here we assume that the Arc graph is complete.
3.3 Execution characteristics
Along with the algorithm graph Alg and the architecture graph Arc, we are also given a function $Exe : (X \times P) \cup (D \times L) \to \mathbb{R}^+$ giving the worst-case execution time (WCET) of each operation onto each processor and the worst-case communication time (WCCT) of each data-dependency onto each communication link. An intra-processor communication takes no time to execute. Since the architecture is homogeneous, the WCET of a given operation is identical on all processors (similarly for the WCCT of a given data-dependency).
The WCET analysis is the topic of much work [18]. Knowing the execution characteristics is not a critical assumption since WCET analysis has been applied with success to real-life processors actually used in embedded systems, with branch prediction, caches, and pipelines. In particular, it has been applied to one of the most critical embedded systems that exists, the Airbus A380 avionics software [16].
3.4 Static schedules
The graphs Alg and Arc are the specification of the system. Its implementation involves finding a multiprocessor schedule of Alg onto Arc. It consists of two functions: the spatial allocation function Ω gives, for each operation of Alg (resp. for each data-dependency), the subset of processors of Arc (resp. the subset of communication links) that will execute it; and the temporal allocation function Θ gives the starting date of each operation (resp. each data-dependency) on its processor (resp. its communication link): $\Omega : X \to 2^P$ and $\Theta : X \times P \to \mathbb{R}^+$.
In this work we only deal with static schedules, for which the function Θ is static, and our schedules are computed off-line; i.e., the start time of each operation (resp. each data-dependency) on its processor (resp. its communication link) is statically known. A static schedule is without replication if for each operation X (and for each data-dependency), $|\Omega(X)| = 1$. In contrast, a schedule is with (active) replication if for some operation X (or some data-dependency), $|\Omega(X)| \geq 2$. The number $|\Omega(X)|$ is called the replication factor of X. A schedule is partial if not all the operations of Alg have been scheduled, but all the operations that are scheduled are such that all their predecessors are also scheduled. Finally, the length of a schedule is the max of the termination times of the last operation scheduled on each of the processors of Arc. For a schedule S, we note it L(S).
3.5 Voltage, frequency, and power consumption
The maximum supply voltage is noted $V_{max}$ and the corresponding highest operating frequency is noted $f_{max}$. For each operation, its WCET assumes that the processor operates at $f_{max}$ and $V_{max}$ (and similarly for the WCCT of the data-dependencies). Because the circuit delay is almost linearly related to 1/V [3], there is a linear relationship between the supply voltage V and the operating frequency f. From now on, we will assume that the operating frequencies are normalized, that is, $f_{max} = 1$ and any other frequency f is in the interval (0,1). Accordingly, the execution time of the operation or data-dependency X placed onto the hardware component C, be it a processor or a communication link, which is running at frequency f (taken as a scaling factor) is:
$$Exe(X, C, f) = Exe(X, C)/f \qquad (1)$$
The power consumption P of a single operation placed on a single processor is computed according to the classical model of Zhu et al. [19]:
$$P = P_s + h\,(P_{ind} + P_d), \qquad P_d = C_{ef}\,V^2 f \qquad (2)$$
where $P_s$ is the static power (power to maintain basic circuits and to keep the clock running), h is equal to 1 when the circuit is active and 0 when it is inactive, $P_{ind}$ is the frequency independent active power (the power portion that is independent of the voltage and the frequency; it becomes 0 when the system is put to sleep, but doing so is very expensive [5]), $P_d$ is the frequency dependent active power (the processor dynamic power and any power that depends on the voltage or the frequency), $C_{ef}$ is the switch capacitance, V is the supply voltage, and f is the operating frequency. $C_{ef}$ is assumed to be constant for all operations, which is a simplifying assumption, since one would normally need to take into account the actual switching activity of each operation to compute accurately the consumed energy. However, such an accurate computation is infeasible for the application sizes we consider here.
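To make Eqs (1) and (2) concrete, here is a small Python sketch. The numeric values are hypothetical and chosen only for the example; they are not parameters reported in the paper.

```python
def scaled_exe(wcet_at_fmax: float, f: float) -> float:
    """Eq (1): execution time at normalized frequency f, with f_max = 1."""
    if not 0.0 < f <= 1.0:
        raise ValueError("normalized frequency must lie in (0, 1]")
    return wcet_at_fmax / f


def power(p_s: float, p_ind: float, c_ef: float, v: float, f: float,
          active: bool = True) -> float:
    """Eq (2): P = Ps + h (Pind + Pd), with Pd = Cef * V^2 * f."""
    h = 1 if active else 0
    p_d = c_ef * v ** 2 * f
    return p_s + h * (p_ind + p_d)


# Hypothetical operating point: half voltage and half frequency.
p_half = power(p_s=0.05, p_ind=0.1, c_ef=1.0, v=0.5, f=0.5)
t_half = scaled_exe(4.0, 0.5)
```

In this example the dynamic power drops by a factor of eight while the execution time only doubles, which is why scaling voltage and frequency together reduces the dynamic energy.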
For a multiprocessor schedule S, we cannot apply directly Eq (2). Instead,
we must compute the total energy E(S) consumed by S, and then divide by the
schedule length L(S):
P(S) = E(S)/L(S) (3)
We compute E(S) by summing the contribution of each processor, depending on the voltage and frequency of each operation placed onto it. On the processor $p_i$, the energy consumed by each operation is the product of the active power $P^i_{ind} + P^i_d$ by its execution time. As a conclusion, the total consumed energy is:
$$E(S) = \sum_{i=1}^{|P|} \sum_{o_j \in p_i} (P^i_{ind} + P^i_d) \cdot Exe(o_j, p_i) \qquad (4)$$
3.6 Failure hypothesis
Both processors and communication links can fail, and they are fail-silent (a behavior which can be achieved at a reasonable cost [1]). Classically, we adopt the failure model of Shatz and Wang [15]: failures are transient and the maximal duration of a failure is such that it affects only the current operation executing onto the faulty processor; this is the “hot” failure model. The occurrence of failures on a processor (same for a communication link) follows a Poisson law with a constant parameter λ, called its failure rate per time unit. Modern fail-silent processors can have a failure rate around $10^{-6}$/hr [1].
Transient failures are the most common failures in modern embedded systems, all the more when processor voltage is lowered to reduce the energy consumption, because even very low energy particles are likely to create a critical charge leading to a transient failure [19]. Besides, failure occurrences are assumed to be statistically independent events. For hardware faults, this hypothesis is reasonable, but this would not be the case for software faults [9].
The reliability of a system is defined as the probability that it operates correctly during a given time interval. According to our model, the reliability of the processor P (resp. the communication link L) during the duration d is $R = e^{-\lambda d}$. Hence, the reliability of the operation or data-dependency X placed onto the hardware component C (be it a processor or a communication link) is:
$$R(X, C) = e^{-\lambda_C\,Exe(X, C)} \qquad (5)$$
From now on, the function R will either be used with two variables as in Eq (5), or with only one variable to denote the reliability of a schedule (or a part of a schedule).
Since the architecture is homogeneous, the failure rate per time unit is identical for each processor (noted $\lambda_p$) and similarly for each communication link (noted $\lambda_\ell$).
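Eq (5) can be evaluated directly. The sketch below uses the processor failure rate quoted later in the simulation section, while the two execution times are hypothetical.

```python
import math


def reliability(failure_rate: float, duration: float) -> float:
    """Eq (5): probability of no failure during `duration` under a Poisson law."""
    return math.exp(-failure_rate * duration)


lambda_p = 1e-5                         # failure rate per time unit of a processor
r_op1 = reliability(lambda_p, 10.0)     # hypothetical operation of WCET 10
r_op2 = reliability(lambda_p, 5.0)      # hypothetical operation of WCET 5

# With independent failures, the reliability of both operations executed in
# sequence is the product of their reliabilities.
r_both = r_op1 * r_op2
```

Because the exponents add, the product equals the reliability of a single operation of duration 15 on the same processor.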
3.7 Global system failure rate (GSFR)
As we have demonstrated in Section 2, we must use the global system failure rate (GSFR) instead of the system’s reliability as a criterion. The GSFR is the failure rate per time unit of the obtained multiprocessor schedule, seen as if it were a single operation scheduled onto a single processor [6]. The GSFR of a static schedule S, noted Λ(S), is computed by Eq (6):
$$\Lambda(S) = \frac{-\log R(S)}{U(S)} \quad \text{with} \quad R(S) = \prod_{(o_i,p_j) \in S} R(o_i, p_j) \quad \text{and} \quad U(S) = \sum_{(o_i,p_j) \in S} Exe(o_i, p_j) \qquad (6)$$
Eq (6) uses the reliability R(S), which, in the case of a static schedule S without replication, is simply the product of the reliability of each operation of S (by definition of the reliability, Section 3.6). Eq (6) also uses the total processor utilization U(S) instead of the schedule length L(S), so that the GSFR can be computed compositionally. According to Eq (6), the GSFR is invariant: for any schedules $S_1$ and $S_2$ such that $S = S_1 \circ S_2$, where “◦” is the concatenation of schedules, if $\Lambda(S_1) \leq K$ and $\Lambda(S_2) \leq K$, then $\Lambda(S) \leq K$ (Proposition 1 in [6]).
Finally, it is very easy to translate a reliability objective $R_{obj}$ into a GSFR objective $\Lambda_{obj}$: one just needs to apply the formula $\Lambda_{obj} = -\log R_{obj} / D$, where D is the mission duration. This shows that the GSFR criterion is usable in practice.
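The following Python sketch evaluates Eq (6) for a schedule without replication and checks the invariance property on one example. A schedule is represented here, as an assumption made for the example, by a list of (failure rate, execution time) pairs; all numeric values are hypothetical.

```python
import math
from typing import List, Tuple

Placement = Tuple[float, float]  # (failure rate of the component, execution time)


def gsfr(schedule: List[Placement]) -> float:
    """Eq (6) without replication: -log R(S) / U(S)."""
    # log R(S) is the sum of the exponents of Eq (5) over all placements.
    log_r = sum(-lam * exe for lam, exe in schedule)
    utilization = sum(exe for _, exe in schedule)
    return -log_r / utilization


def gsfr_objective(r_obj: float, mission_duration: float) -> float:
    """Translate a reliability objective into a GSFR objective."""
    return -math.log(r_obj) / mission_duration


s1 = [(1e-5, 10.0), (1e-5, 20.0)]
s2 = [(4e-5, 5.0)]
combined = gsfr(s1 + s2)                 # GSFR of the concatenated schedule
lam_obj = gsfr_objective(0.999, 100.0)
```

Without replication, the GSFR is the execution-time-weighted average of the failure rates, so the GSFR of the concatenation can never exceed the larger of the two GSFRs. This is consistent with the invariance property stated above.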
4 The tricriteria scheduling algorithm TSH
4.1 Decreasing the power consumption
Two operation parameters of a chip can be modified to lower the power consumption: the frequency and the voltage. We assume that each processor can be operated with a finite set of supply voltages, noted $\mathcal{V}$. We thus have $\mathcal{V} = \{V_0, V_1, \ldots, V_{max}\}$. To each supply voltage V corresponds an operating frequency f. We choose not to modify the operating frequency and the supply voltage of the communication links.
We assume that the cache size is adapted to the application, therefore ensuring that the execution time of an application is inversely proportional to the frequency [12] (i.e., the execution time is doubled when the frequency is halved).
To lower the energy consumption of a chip, we use Dynamic Voltage and Frequency Scaling (DVFS), which lowers the voltage and increases proportionally the cycle period. However, DVFS has an impact on the failure rate [19]. Indeed, lower voltage leads to smaller critical energy, hence the system becomes sensitive to lower energy particles. As a result, the fault probability increases both due to the longer execution time and to the lower energy: the voltage-dependent failure rate λ(f) is:
$$\lambda(f) = \lambda_0 \cdot 10^{\frac{b(1-f)}{1-f_{min}}} \qquad (7)$$
where $\lambda_0$ is the nominal failure rate per time unit, b > 0 is a constant, f is the frequency scaling factor, and $f_{min}$ is the lowest operating frequency. At $f_{min}$ and $V_{min}$, the failure rate is maximal: $\lambda_{max} = \lambda(f_{min}) = \lambda_0 \cdot 10^b$.
We apply DVFS to the processors and we assume that the voltage switch time can be neglected compared to the WCET of the operations. To take into account the voltage in the schedule, we modify the spatial allocation function Ω to give the supply voltage of the processor for each operation: $\Omega : X \to Q$, where Q is the domain of the sets of pairs $\langle p, v \rangle \in P \times \mathcal{V}$.
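A short Python sketch of Eq (7), combined with Eqs (1) and (5), shows the two effects of lowering the frequency on the reliability of one operation. The value b = 2 and the WCET are hypothetical; the nominal failure rate and the lowest frequency are those quoted in the simulation section.

```python
import math


def failure_rate(f: float, lambda0: float, b: float, f_min: float) -> float:
    """Eq (7): failure rate at normalized frequency f."""
    return lambda0 * 10 ** (b * (1.0 - f) / (1.0 - f_min))


def operation_reliability(wcet_at_fmax: float, f: float, lambda0: float,
                          b: float, f_min: float) -> float:
    """Eq (5) with the scaled execution time of Eq (1) and the rate of Eq (7)."""
    return math.exp(-failure_rate(f, lambda0, b, f_min) * wcet_at_fmax / f)


lambda0, b, f_min = 1e-5, 2.0, 0.25    # b is a hypothetical constant

rate_nominal = failure_rate(1.0, lambda0, b, f_min)     # equals lambda0
rate_lowest = failure_rate(f_min, lambda0, b, f_min)    # equals lambda0 * 10**b
r_full = operation_reliability(10.0, 1.0, lambda0, b, f_min)
r_half = operation_reliability(10.0, 0.5, lambda0, b, f_min)
```

At half frequency the operation both runs twice as long and is exposed to a higher failure rate, so its reliability is lower than at full frequency.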
Figure 4 shows a simple schedule S where operations X and Z are placed onto P1, operation Y onto processor P2, and the data-dependency X → Y is placed onto the link L12. Since we do not apply DVFS to the communication links, we only compute the energy consumed by the processors (see Eq (4)):
– On P1: $E(P_1) = P_{ind} L + C_{ef} V_1^2 f_1 Exe(X, P_1, f_1) + C_{ef} V_2^2 f_2 Exe(Z, P_1, f_2)$.
– On P2: $E(P_2) = P_{ind} L + C_{ef} V_3^2 f_3 Exe(Y, P_2, f_3)$.
Fig. 4: A simple schedule of length L: X at $(V_1, f_1)$ and Z at $(V_2, f_2)$ on P1, Y at $(V_3, f_3)$ on P2, and X → Y on the link L12; both processors have parameters $(C_{ef}, P_{ind})$.
By applying Eqs (1) and (3), we thus obtain:
$$P(S) = \frac{E(P_1) + E(P_2)}{L} = 2 P_{ind} + \frac{C_{ef}}{L} \left( V_1^2\, Exe(X, P_1) + V_2^2\, Exe(Z, P_1) + V_3^2\, Exe(Y, P_2) \right)$$
The general formula for a schedule S is therefore:
$$P(S) = |P| \cdot P_{ind} + \frac{C_{ef}}{L(S)} \sum_{i=1}^{|P|} \sum_{o_j \in p_i} V(o_j)^2 \cdot Exe(o_j, p_i) \qquad (8)$$
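As a sanity check of Eq (8), the sketch below computes the power of a schedule shaped like the one of Figure 4 in two ways: directly with Eq (8), and from the per-processor energies divided by the length. The voltages, frequencies, WCETs, length, and power parameters are hypothetical.

```python
from typing import Dict, List, Tuple

# processor -> list of (voltage, frequency, WCET at f_max) of its operations;
# idle processors must still appear as keys so that their Pind is counted.
Schedule = Dict[str, List[Tuple[float, float, float]]]


def schedule_power(schedule: Schedule, length: float,
                   p_ind: float, c_ef: float) -> float:
    """Eq (8): |P| * Pind + (Cef / L) * sum of V^2 * Exe over all operations."""
    dynamic = sum(v ** 2 * wcet for ops in schedule.values() for v, _, wcet in ops)
    return len(schedule) * p_ind + c_ef * dynamic / length


def schedule_power_from_energy(schedule: Schedule, length: float,
                               p_ind: float, c_ef: float) -> float:
    """Same quantity from per-processor energies, then Eq (3)."""
    energy = 0.0
    for ops in schedule.values():
        energy += p_ind * length                       # Pind over the whole length
        for v, f, wcet in ops:
            energy += c_ef * v ** 2 * f * (wcet / f)   # Pd times scaled time, Eq (1)
    return energy / length


fig4_like = {
    "P1": [(0.5, 0.5, 2.0), (0.75, 0.75, 1.0)],   # X then Z
    "P2": [(1.0, 1.0, 3.0)],                      # Y
}
p_direct = schedule_power(fig4_like, length=10.0, p_ind=0.1, c_ef=1.0)
p_energy = schedule_power_from_energy(fig4_like, length=10.0, p_ind=0.1, c_ef=1.0)
```

The frequency cancels out between the dynamic power and the scaled execution time, which is why Eq (8) only involves the voltage and the WCET at the maximum frequency.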
4.2 Decreasing the GSFR
According to Eq (6), decreasing the GSFR is equivalent to increasing the reliability. Several techniques can be used to increase the reliability of a system. Their common point is to include some form of redundancy (this is because the target architecture Arc, with the failure rates of its components, is fixed). We have chosen the active replication of the operations and the data-dependencies, which consists in executing several copies of a same operation onto as many distinct processors (resp. data-dependencies onto communication links).
To compute the GSFR of a static schedule with replication, we use Reliability Block-Diagrams (RBD) [11]. An RBD is an acyclic oriented graph (N, E), where each node of N is a block representing an element of the system, and each arc of E is a causality link between two blocks. Two particular connection points are its source S and its destination D. An RBD is operational if and only if there exists at least one operational path from S to D. A path is operational if and only if all the blocks in this path are operational. The probability that a block be operational is its reliability. By construction, the probability that an RBD be operational is thus the reliability of the system it represents.
In our case, the system is the multiprocessor static schedule, possibly partial, of Alg onto Arc. Each block represents an operation X placed onto a processor P or a data-dependency X → Y placed onto a communication link L. The reliability of a block is therefore computed according to Eq (5).
Computing the reliability in this way requires the occurrences of the failures to be statistically independent events. Without this hypothesis, the fact that some blocks belong to several paths from S to D makes the reliability computation infeasible. At each step of the scheduling heuristic, we compute the RBD of the partial schedule obtained so far, then we compute the reliability based on this RBD, and finally we compute the GSFR of the partial schedule with Eq (6).
Finally, computing the reliability of an RBD with replications is, in general, exponential in the size of the schedule. To avoid this problem, we insert routing operations so that the RBD of any partial schedule is always serial-parallel (i.e., a sequence of parallel macro-blocks), hence making the GSFR computation linear [6]. The idea is that, for each data dependency X → Y such that it has been decided to replicate X k times and Y ℓ times, a routing operation R will collect all the data sent by the k replicas of X and send it to the ℓ replicas of Y (see Figure 5).
Fig. 5: A routing operation R between the replicas $X_1, \ldots, X_k$ and $Y_1, \ldots, Y_\ell$.
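For a serial-parallel RBD, the reliability can be computed in one pass over the macro-blocks. The Python sketch below is a simplified illustration that assumes independent failures, ignores routing operations and communication links, and sums the execution times of all replicas in U(S); these simplifications and the numeric values are assumptions made for the example.

```python
import math
from typing import List, Tuple

Replica = Tuple[float, float]  # (failure rate of the processor, execution time)


def macro_block_reliability(replicas: List[Replica]) -> float:
    """A parallel macro-block fails only if every replica fails."""
    failure = 1.0
    for lam, exe in replicas:
        failure *= 1.0 - math.exp(-lam * exe)
    return 1.0 - failure


def serial_parallel_reliability(macro_blocks: List[List[Replica]]) -> float:
    """A sequence of macro-blocks is operational only if all of them are."""
    return math.prod(macro_block_reliability(block) for block in macro_blocks)


def gsfr(macro_blocks: List[List[Replica]]) -> float:
    """Eq (6), with U(S) summed over every replica of every operation."""
    utilization = sum(exe for block in macro_blocks for _, exe in block)
    return -math.log(serial_parallel_reliability(macro_blocks)) / utilization


single = [[(1e-3, 10.0)]]                     # one operation, no replication
replicated = [[(1e-3, 10.0), (1e-3, 10.0)]]   # the same operation replicated twice
```

In this example, replicating the operation raises its reliability and lowers the GSFR, at the price of doubling the processor utilization.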
4.3 Scheduling heuristics
To obtain the Pareto front in the space (length, GSFR, power), we pre-define a virtual grid in the objective plane (GSFR, power), and for each cell of the grid we solve one different single-objective problem constrained to this cell, by using the scheduling heuristic TSH presented below.
TSH is a greedy list scheduling heuristic. It takes as input an algorithm graph Alg, a homogeneous architecture graph Arc, the function Exe giving the WCETs and WCCTs, and two constraints $\Lambda_{obj}$ and $P_{obj}$. It produces as output a static multiprocessor schedule S of Alg onto Arc, such that the GSFR of S is smaller than $\Lambda_{obj}$, the power consumption is smaller than $P_{obj}$, and such that its length is as small as possible. TSH uses active replication of operations to meet the $\Lambda_{obj}$ constraint, dynamic voltage scaling to meet the $P_{obj}$ constraint, and the power-efficient schedule pressure as a cost function to minimize the schedule length.
TSH works with two lists of operations of Alg: the candidate operations $O^{(n)}_{cand}$ and the already scheduled operations $O^{(n)}_{sched}$. The superscript (n) denotes the current iteration of the scheduling algorithm. One operation is scheduled at each iteration. Initially, $O^{(0)}_{sched}$ is empty while $O^{(0)}_{cand}$ contains the input operations of Alg. At any iteration (n), all the operations in $O^{(n)}_{cand}$ are such that all their predecessors are in $O^{(n)}_{sched}$.
The power-efficient schedule pressure is a variant of the schedule pressure cost function [8], which tries to minimize the length of the critical path of the algorithm graph by exploiting the scheduling margin of each operation. The schedule pressure σ is computed for each operation $o_i$ and each processor $p_j$ as:
$$\sigma^{(n)}(o_i, p_j) = ETS^{(n)}(o_i, p_j) + LTE^{(n)}(o_i) - CPL^{(n-1)} \qquad (9)$$
where $CPL^{(n-1)}$ is the critical path length of the partial schedule composed of the already scheduled operations, $ETS^{(n)}(o_i, p_j)$ is the earliest time at which the operation $o_i$ can start its execution on the processor $p_j$, and $LTE^{(n)}(o_i)$ is the latest start time from end of $o_i$, defined to be the length of the longest path from $o_i$ to Alg’s output operations; this path contains the “future” operations of $o_i$. When computing $LTE^{(n)}(o_i)$, since the future operations of $o_i$ are not scheduled yet, we do not know their actual voltage, and therefore neither what their execution time will be (this will only be known when these future operations are actually scheduled). Hence, for each future operation, we compute its average WCET for all existing supply voltages.
First, we generalize the schedule pressure σ to a set of processors:
$$\sigma^{(n)}(o_i, P_k) = ETS^{(n)}(o_i, P_k) + LTE^{(n)}(o_i) - CPL^{(n-1)} \qquad (10)$$
where $ETS^{(n)}(o_i, P_k) = \max_{p_j \in P_k} ETS^{(n)}(o_i, p_j)$.
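The quantities entering Eqs (9) and (10) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: communication times are ignored, the longest path is assumed to include the execution time of the operation itself, and the graph, WCETs, and ETS values are hypothetical.

```python
from typing import Dict, List, Optional


def average_wcet(wcet_at_fmax: float, frequencies: List[float]) -> float:
    """Average of the scaled execution times over all available frequencies."""
    return sum(wcet_at_fmax / f for f in frequencies) / len(frequencies)


def lte(op: str, succ: Dict[str, List[str]], wcet: Dict[str, float],
        memo: Optional[Dict[str, float]] = None) -> float:
    """Longest path from op to an output operation (communications ignored)."""
    if memo is None:
        memo = {}
    if op not in memo:
        tail = max((lte(s, succ, wcet, memo) for s in succ.get(op, [])), default=0.0)
        memo[op] = wcet[op] + tail
    return memo[op]


def schedule_pressure(ets_per_processor: List[float], lte_value: float,
                      cpl_prev: float) -> float:
    """Eq (10): the ETS on a set of processors is the max over its processors."""
    return max(ets_per_processor) + lte_value - cpl_prev


# Hypothetical diamond-shaped graph A -> {B, C} -> D with averaged WCETs.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
wcet = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 1.0}
lte_a = lte("A", succ, wcet)
sigma = schedule_pressure([3.0, 5.0], lte_value=lte("C", succ, wcet), cpl_prev=6.0)
avg = average_wcet(3.0, [0.25, 0.5, 0.75, 1.0])
```

With a single processor in the set, `schedule_pressure` reduces to Eq (9).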
Then, we consider the schedule length as a criterion to be minimized, and the GSFR and the power as two constraints to be met: for each candidate operation $o_i \in O^{(n)}_{cand}$, we compute the best subset of pairs ⟨processor, voltage⟩ to execute $o_i$ with the power-efficient schedule pressure of Eq (11):
$$Q^{(n)}_{best}(o_i) = Q_j \ \text{s.t.:} \ \sigma^{(n)}(o_i, Q_j) = \min_{Q_k \in Q} \left\{ \sigma^{(n)}(o_i, Q_k) \;\middle|\; \Lambda^{(n)}(o_i, Q_k) \leq \Lambda_{obj} \wedge P^{(n)}(o_i, Q_k) \leq P_{obj} \right\} \qquad (11)$$
where Q is the set of all subsets of pairs ⟨p, v⟩ such that $p \in P$ and $v \in \mathcal{V}$ (see Section 4.1), and $\Lambda^{(n)}(o_i, Q_k)$ (resp. $P^{(n)}(o_i, Q_k)$) is the GSFR (resp. the power consumption) of the partial schedule after replicating and scheduling $o_i$ on all the processors of $Q_k$ with their respective specified voltages. When computing $\Lambda^{(n)}(o_i, Q_k)$, the failure rate of each processor is computed by Eq (7) according to its voltage in $Q_k$. Finally, $P^{(n)}(o_i, Q_k)$ is computed by Eq (8).
To guarantee that the constraint $\Lambda^{(n)}(o_i, Q_k) \leq \Lambda_{obj}$ is met, the subset $Q_k$ is selected such that the GSFR of the parallel macro-block that contains the replicas of $o_i$ on the processors of $Q_k$ is less than $\Lambda_{obj}$. If this last macro-block B is such that $\Lambda(B) \leq \Lambda_{obj}$ and if $\Lambda^{(n-1)} \leq \Lambda_{obj}$, then $\Lambda^{(n)} \leq \Lambda_{obj}$ (thanks to the invariance property of the GSFR).
Similarly, the subset $Q_k$ is selected such that the power constraint $P^{(n)}(o_i, Q_k) \leq P_{obj}$ is met. There can exist several valid possibilities for the subset $Q_k$ (valid in the sense that the power constraint is met). However, some of them may lead to the impossibility of finding a valid schedule for the next scheduled operation, during step n+1. In particular, this is the case when the next scheduled operation does not increase the schedule length, because it fits in a slack of the previous schedule: $L^{(n+1)} = L^{(n)}$. At the same time, the total energy increases strictly because of the newly scheduled operation: $E^{(n+1)} > E^{(n)}$. By hypothesis, we have $P^{(n)} = E^{(n)}/L^{(n)} \leq P_{obj}$, but it follows that $P^{(n+1)} = E^{(n+1)}/L^{(n+1)} = E^{(n+1)}/L^{(n)} > E^{(n)}/L^{(n)} = P^{(n)}$, so even though $P^{(n)} \leq P_{obj}$, it may very well be the case that $P^{(n+1)} > P_{obj}$. To prevent this and guarantee the invariance property of P, we over-estimate the power consumption, by computing the consumed energy as if all the ending slacks were “filled” by an operation executed at $P_{max}$. $P_{max}$ is the computed power under the highest frequency $f_m$ such that $P_{ind} + P_{max} = P_{ind} + C_{ef} V^2 f \leq P_{obj}/N$, where N is the number of processors. If the consumed power with $f_m$ exceeds $P_{obj}$, then the next highest operating frequency $f \leq f_m$ is selected, and so on. Thanks to this over-estimation, even if the next scheduled operation fits in a slack and does not increase the length, we are sure that it will not increase the power consumption either. This is illustrated in Figure 6. For lack of space, we do not study in this paper the impact of this over-estimation on the total schedule length.
Fig. 6: Over-estimation of the energy consumption.
Once we have computed, for each candidate operation $o_i$ of $O^{(n)}_{cand}$, the best subset of pairs ⟨processor, voltage⟩ to execute $o_i$, with the power-efficient schedule pressure of Eq (11), we compute the most urgent of these operations by:
$$o_{urg} = o_i \in O^{(n)}_{cand} \ \text{s.t.} \ \sigma^{(n)}\!\left(o_i, Q^{(n)}_{best}(o_i)\right) = \max_{o_j \in O^{(n)}_{cand}} \left\{ \sigma^{(n)}\!\left(o_j, Q^{(n)}_{best}(o_j)\right) \right\} \qquad (12)$$
Finally, we schedule this most urgent operation $o_{urg}$ on the processors of $Q^{(n)}_{best}(o_{urg})$, and we finish the current iteration (n) by updating the lists of scheduled and candidate operations: $O^{(n)}_{sched} := O^{(n-1)}_{sched} \cup \{o_{urg}\}$ and $O^{(n+1)}_{cand} := O^{(n)}_{cand} - \{o_{urg}\} \cup \{t' \in succ(o_{urg}) \mid pred(t') \subseteq O^{(n)}_{sched}\}$.
5 Simulation results
We perform two kinds of simulations. Firstly, Figure 7 shows the Pareto fronts produced by TSH for a randomly generated Alg graph of 30 operations, and a fully connected and homogeneous Arc graph of respectively 3 and 4 processors; we have used the same random graph generator as in [6]. The nominal failure rate per time unit of all the processors is $\lambda_p = 10^{-5}$; the nominal failure rate per time unit of all the links is $\lambda_\ell = 5 \cdot 10^{-4}$; these values are reasonable for modern fail-silent processors [1]; the set of supply voltages is $\mathcal{V} = \{0.25, 0.50, 0.75, 1.0\}$ (scaling factor).
Fig. 7: Pareto front generated for a random graph of 30 operations on 3 processors (left) or 4 processors (right).
The virtual grid of the Pareto front is defined such that both high and small values of $P_{obj}$ and $\Lambda_{obj}$ are covered within a reasonable grid size. Hence, the decreasing values of $P_{obj}$ and $\Lambda_{obj}$, starting with $+\infty$ and $+\infty$, are selected from two sets of values: $\Lambda_{obj} \in \{\alpha \cdot 10^{-\beta}\}$ where $\alpha \in \{4, 8\}$ and $\beta \in \{1, 2, \ldots, 20\}$, and $P_{obj} \in \{0.8, 0.6, 0.4, 0.2\}$. TSH being a heuristic, changing the parameters of this grid could change locally some points of the Pareto front, but not its overall shape.
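The grid of objectives described above can be enumerated as in the following Python sketch. Prepending $+\infty$ to each axis, and pairing every GSFR objective with every power objective, are assumptions made for the example; the text only states that the values start with $+\infty$ and decrease.

```python
import math
from itertools import product

alphas = (4, 8)
betas = range(1, 21)                      # beta in {1, 2, ..., 20}

# GSFR objectives alpha * 10^(-beta), sorted in decreasing order.
gsfr_objectives = sorted({a * 10.0 ** -b for a in alphas for b in betas}, reverse=True)
power_objectives = [0.8, 0.6, 0.4, 0.2]

# Each axis starts with +infinity, that is, with no constraint.
lambda_values = [math.inf] + gsfr_objectives
power_values = [math.inf] + power_objectives

# One cell per pair (GSFR objective, power objective); each cell would be one run of TSH.
grid = list(product(lambda_values, power_values))
```

With these sets, the grid contains 41 × 5 = 205 cells, the first of which is the unconstrained run.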
The two figures connect the set of non-dominated Pareto optima (the surface
obtained in this way is only depicted for a better visual understanding; by no
means do we assume that points interpolated in this way are themselves Pareto
optima, only the computed dots are). The figures show an increase of the sched-
ule length for points with decreasing power consumptions and/or failure rates.
The “cuts” observed at the top and the left of the plots are due to low power
constraints and/or low failure rates constraints.
Figure 7 exposes to the designer a choice of several tradeoffs between the execution time, the power consumption, and the reliability level. For instance in Figure 7 (right), we see that, to obtain a GSFR of $10^{-10}$ with a power consumption of 1.5 V, we must accept a schedule three times longer than if we impose no constraint on the GSFR nor the power. We also see that, by providing a 4-processor architecture, we can obtain schedules with a shorter execution length even though we impose identical constraints on the GSFR and the power.
Fig. 8. Average schedule length as a function of the power (left) or the GSFR (right).
Secondly, Figure 8 shows how the schedule length varies as a function of the
required power consumption (left) or of the required GSFR (right).
Both curves are averaged over 30 randomly generated Alg graphs. We can see
that the average schedule length increases when the constraint Pobj on the power
consumption decreases. This was expected since the two criteria, schedule length
and power consumption, are antagonistic. Similarly, the average schedule length
increases when the constraint Λobj on the GSFR decreases. Again, the two
criteria, schedule length and GSFR, are antagonistic.
6 Related work
Many solutions exist in the literature to optimize the schedule length and the
energy consumption (e.g., [13]), or to optimize the schedule length and the
reliability (e.g., [4,7,2]), but very few tackle the problem of optimizing the three
criteria (length,reliability,energy). The closest to our work are [19,14].
Zhu et al. have studied the impact of the supply voltage on the failure
rate [19], in a passive redundancy framework (primary backup approach). They
use DVFS to lower the energy consumption and they study the tradeoff between
the energy consumption and the performability (defined as the probability of
finishing the application correctly within its deadline in the presence of faults).
A lower frequency implies a higher execution time and therefore less slack time
for scheduling backup replicas, meaning a lower performability. However, their
input problem is not a multiprocessor scheduling one since they study the
system as a single monolithic operation executed on a single processor. Thanks to
this simpler setting, they are able to provide an analytical solution based on the
probability of failure, the WCET, the voltage, and the frequency.
Pop et al. have addressed the (length,reliability,energy) tricriteria optimization
problem on a heterogeneous architecture [14]. Both length and reliability
are taken as constraints. These two criteria are not invariant measures, and
we have demonstrated in Section 2 that such a method cannot always guarantee
that the constraints are met. Indeed, their experimental results show that the re-
liability decreases with the number of processors, therefore making it impossible
to meet an arbitrary reliability constraint. Secondly, they assume that the user
will specify the number of processor failures to be tolerated in order to satisfy
the desired reliability constraint. Thirdly, they assume that all the communica-
tions take place through a reliable bus. For these three reasons, it is not possible
to compare TSH with their method.
7 Conclusion
We have presented a new off-line tricriteria scheduling heuristics, called TSH,
to minimize the schedule length, its global system failure rate (GSFR), and its
power consumption. TSH uses the active replication of the operations and the
data-dependencies to increase the reliability, and uses dynamic voltage and fre-
quency scaling to lower the power consumption. Both the power and the GSFR
are taken as constraints, so TSH attempts to minimize the schedule length while
satisfying these constraints. By running TSH with several values of these con-
straints, we are able to produce a set of non-dominated Pareto solutions, which is
a surface in the 3D space (length,GSFR,power). This surface exposes the exist-
ing tradeoffs between the three antagonistic criteria, allowing the user to choose
the solution that best meets his/her application needs. TSH is an extension of
our previous bicriteria (length,reliability) heuristics BSH [6]. The tricriteria ex-
tension is necessary because of the crucial impact of the voltage on the failure
probability.
To the best of our knowledge, this is the first reported method that allows
the user to produce the Pareto front in the 3D space (length,GSFR,power). This
advance comes at the price of several assumptions: the architecture is assumed
to be homogeneous and fully connected, the processors are assumed to be fail-
silent and their failures are assumed to be statistically independent, the power
switching time is neglected, and the failure model is assumed to be exponential.
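To make the last assumptions concrete, the following sketch shows what an exponential failure model with statistically independent, fail-silent components implies for a single replicated activity. It is only the elementary textbook building block, not the GSFR computation used by TSH, and it ignores the impact of the voltage on the failure rate. The function names and the durations in the usage note are hypothetical.

```python
import math


def reliability(failure_rate, duration):
    """Probability that a component with a constant failure rate per time unit
    stays operational during `duration` time units (exponential model)."""
    return math.exp(-failure_rate * duration)


def replicated_reliability(failure_rate, duration, replicas):
    """Reliability of an activity actively replicated on `replicas` identical
    components whose failures are assumed to be statistically independent:
    the activity fails only if every replica fails."""
    if replicas < 1:
        raise ValueError("at least one replica is required")
    # 1 - exp(-x), computed with expm1 for accuracy when x is small.
    single_failure = -math.expm1(-failure_rate * duration)
    return 1.0 - single_failure ** replicas
```

For instance, with the link failure rate of 5·10^−4 per time unit used in the experiments and a hypothetical communication lasting 10 time units, `replicated_reliability(5e-4, 10, 2)` is strictly greater than `reliability(5e-4, 10)`: replicating the communication on a second link increases its reliability.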
References
1. M. Baleani, A. Ferrari, L. Mangeruca, M. Peri, S. Pezzini, and A. Sangiovanni-
Vincentelli. Fault-tolerant platforms for automotive safety-critical applications. In
International Conference on Compilers, Architectures and Synthesis for Embedded
Systems, CASES’03, San Jose (CA), USA, November 2003. ACM, New-York.
2. A. Benoit, F. Dufossé, A. Girault, and Y. Robert. Reliability and performance
optimization of pipelined real-time systems. In International Conference on Parallel
Processing, ICPP’10, San Diego (CA), USA, September 2010.
3. T.D. Burd and R.W. Brodersen. Energy efficient CMOS micro-processor design.
In Hawaii International Conference on System Sciences, HICSS’95, Honolulu (HI),
USA, 1995. IEEE, Los Alamitos.
4. A. Dogan and F. Özgüner. Matching and scheduling algorithms for minimizing
execution time and failure probability of applications in heterogeneous computing.
IEEE Trans. Parallel and Distributed Systems, 13(3):308–323, March 2002.
5. E. Elnozahy, M. Kistler, and R. Rajamony. Energy-efficient server clusters. In
Workshop on Power-Aware Computing Systems, WPACS’02, pages 179–196, Cam-
bridge (MA), USA, February 2002.
6. A. Girault and H. Kalla. A novel bicriteria scheduling heuristics providing a
guaranteed global system failure rate. IEEE Trans. Dependable Secure Comput.,
6(4):241–254, December 2009.
7. A. Girault, E. Saule, and D. Trystram. Reliability versus performance for critical
applications. J. of Parallel and Distributed Computing, 69(3):326–336, March 2009.
8. T. Grandpierre, C. Lavarenne, and Y. Sorel. Optimized rapid prototyping for
real-time embedded heterogeneous multiprocessors. In International Workshop on
Hardware/Software Co-Design, CODES’99, Rome, Italy, May 1999. ACM, New-
York.
9. J.C. Knight and N.G. Leveson. An experimental evaluation of the assumption
of independence in multi-version programming. IEEE Trans. Software Engin.,
12(1):96–109, 1986.
10. J.Y-T. Leung, editor. Handbook of Scheduling: Algorithms, Models, and
Performance Analysis. Chapman & Hall/CRC Press, 2004.
11. D. Lloyd and M. Lipow. Reliability: Management, Methods, and Mathematics,
chapter 9. Prentice-Hall, 1962.
12. R. Melhem, D. Mossé, and E.N. Elnozahy. The interplay of power management and
fault recovery in real-time systems. IEEE Trans. Comput., 53(2):217–231, 2004.
13. T. Pering, T.D. Burd, and R.W. Brodersen. The simulation and evaluation of
dynamic voltage scaling algorithms. In International Symposium on Low Power
Electronics and Design, ISLPED’98, pages 76–81, Monterey (CA), USA, August
1998. ACM, New-York.
14. P. Pop, K. Poulsen, and V. Izosimov. Scheduling and voltage scaling for
energy/reliability trade-offs in fault-tolerant time-triggered embedded systems. In
International Conference on Hardware-Software Codesign and System Synthesis,
CODES+ISSS’07, Salzburg, Austria, October 2007. ACM, New-York.
15. S.M. Shatz and J.-P. Wang. Models and algorithms for reliability-oriented
task-allocation in redundant distributed-computer systems. IEEE Trans. Reliability,
38(1):16–26, April 1989.
16. J. Souyris, E.L. Pavec, G. Himbert, V. Jégu, G. Borios, and R. Heckmann.
Computing the worst case execution time of an avionics program by abstract
interpretation. In International Workshop on Worst-case Execution Time, WCET’05,
pages 21–24, Mallorca, Spain, July 2005.
17. V. T’kindt and J.-C. Billaut. Multicriteria Scheduling: Theory, Models and
Algorithms. Springer-Verlag, 2006.
18. R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley,
G. Bernat, C. Ferdinand, R. Heckmann, F. Mueller, I. Puaut, P. Puschner,
J. Staschulat, and P. Stenström. The determination of worst-case execution times
— overview of the methods and survey of tools. ACM Trans. Embedd. Comput.
Syst., 7(3), April 2008.
19. D. Zhu, R. Melhem, and D. Mossé. The effects of energy management on reliability
in real-time embedded systems. In International Conference on Computer Aided
Design, ICCAD’04, pages 35–40, San Jose (CA), USA, November 2004.