# Tradeoff Exploration between Reliability, Power Consumption, and Execution Time

**ABSTRACT** For autonomous critical real-time embedded systems (e.g., satellites), guaranteeing a very high level of reliability is as important as keeping the power consumption as low as possible. We propose an off-line scheduling heuristic which, from a given software application graph and a given multiprocessor architecture (homogeneous and fully connected), produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). Our tricriteria scheduling heuristic, called TSH, uses the active replication of the operations and the data-dependencies to increase the reliability and uses dynamic voltage and frequency scaling to lower the power consumption. We demonstrate the soundness of TSH. We also provide extensive simulation results to show how TSH behaves in practice: first, we run TSH on a single instance to provide the whole Pareto front in 3D; second, we compare TSH with the ECS heuristic (Energy-Conscious Scheduling) from the literature; and third, we compare TSH with an optimal Mixed Integer Linear Program.




Ismail Assayad¹, Alain Girault², and Hamoudi Kalla³

¹ ENSEM (RTSE team), University Hassan II of Casablanca, Morocco.

² INRIA and Grenoble University (POP ART team and LIG lab), France.

³ University of Batna (SECOS team), Algeria.

Abstract. We propose an off-line scheduling heuristic which, from a given software application graph and a given multiprocessor architecture (homogeneous and fully connected), produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). Our tricriteria scheduling heuristic, TSH, uses the active replication of the operations and the data-dependencies to increase the reliability, and uses dynamic voltage and frequency scaling to lower the power consumption.

## 1 Introduction

For autonomous critical real-time embedded systems (e.g., satellites), guaranteeing a very high level of reliability is as important as keeping the power consumption as low as possible. We present an off-line scheduling heuristic that, from a given software application graph and a given multiprocessor architecture, produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). We target homogeneous distributed architectures, such as multicore processors. Our tricriteria scheduling heuristic uses the active replication of the operations and the data-dependencies to increase the reliability, and uses dynamic voltage and frequency scaling (DVFS) to lower the power consumption. However, DVFS has an impact on the failure rate of processors, because a lower voltage leads to a smaller critical energy, hence the system becomes sensitive to lower-energy particles. As a result, the failure probability increases. The two criteria power consumption and reliability are thus antagonistic with each other and with the schedule length, which makes this problem all the more difficult.

Let us address the issues raised by multicriteria optimization. Figure 1 illustrates the particular case of two criteria to be minimized. Each point $x_1$ to $x_7$ represents a solution, that is, a different tradeoff between the $Z_1$ and $Z_2$ criteria: the points $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are Pareto optima [17]; the points $x_2$, $x_3$, and $x_4$ are strong optima while the points $x_1$ and $x_5$ are weak optima. The set of all Pareto optima is called the Pareto front.

Fig. 1: Pareto front for a bicriteria minimization problem (axes: first criterion $Z_1$ and second criterion $Z_2$; points $x_1$ to $x_7$).

hal-00655478, version 1 - 29 Dec 2011

Author manuscript, published in "SAFECOMP 6894 (2012) 437--451"

It is fundamental to understand that no single solution among the points $x_2$, $x_3$, and $x_4$ (the strong Pareto optima) can be said, a priori, to be the best one. Indeed, those three solutions are non-comparable, so choosing among them can only be done by the user, depending on the precise requirements of his/her application. This is why we advocate producing, for a given problem instance, the Pareto front rather than a single solution. Since we have three criteria, it will be a surface in the 3D space (length, reliability, power).
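To make the notion of non-comparable solutions concrete, here is a minimal Python sketch that filters a set of candidate solutions down to the non-dominated ones. It assumes the usual dominance relation for minimization (at least as good on every criterion, strictly better on one), which the text does not spell out; the function names and the sample triples are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every criterion and strictly
    better on at least one (all criteria are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (length, failure rate, power) triples, all to be minimized.
candidates = [
    (10.0, 1e-5, 3.0),
    (12.0, 1e-6, 2.5),
    (12.0, 1e-5, 3.0),  # dominated by the first triple
    (15.0, 1e-6, 2.5),  # dominated by the second triple
]
front = pareto_front(candidates)
```

The two surviving triples cannot be ranked against each other: one is shorter, the other is more reliable and consumes less power. Choosing between them is left to the user, as argued above.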

The main contribution of this paper is TSH, the first tricriteria scheduling heuristic able to produce a Pareto front in the space (length, reliability, power) while taking into account the impact of voltage on the failure probability. Thanks to the use of active replication, TSH is able to provide any required level of reliability. TSH is an extension of our previous bicriteria (length, reliability) heuristic called BSH [6]. The tricriteria extension presented in this paper is necessary because of the crucial impact of the voltage on the failure probability.

## 2 Principle of the method and overview

To produce a Pareto front, the usual method involves transforming all the criteria except one into constraints, and then minimizing the last remaining criterion iteratively [17]. Figure 2 illustrates the particular case of two criteria $Z_1$ and $Z_2$. To obtain the Pareto front, $Z_1$ is transformed into a constraint, with its first value set to $K_1^1 = +\infty$. The first run involves minimizing $Z_2$ under the constraint $Z_1 < +\infty$, which produces the Pareto point $x_1$. For the second run, the constraint is set to the value of $x_1$, that is $K_1^2 = Z_1(x_1)$: we therefore minimize $Z_2$ under the constraint $Z_1 < K_1^2$, which produces the Pareto point $x_2$, and so on. Another way is to slice the interval $[0, +\infty)$ into a finite number of contiguous sub-intervals of the form $[K_1^i, K_1^{i+1}]$.

Fig. 2: Transformation method to produce the Pareto front.

The application algorithm graphs we are dealing with are large (tens to hundreds of operations, each operation being a software block), which makes exact scheduling methods infeasible, as well as approximate methods with backtracking such as branch-and-bound. We therefore have to use list scheduling heuristics, which have demonstrated their good performance in the past [10]. We propose in this paper a suitable list scheduling heuristic, adapted from [6].

Using list scheduling to minimize a criterion $Z_2$ under the constraint that another criterion $Z_1$ remains below some threshold $K_1$ (as in Figure 2) requires that $Z_1$ be an invariant measure, not a varying one. For instance, the energy is a strictly increasing function of the schedule: if $S'$ is a prefix schedule of $S$, then the energy consumed by $S$ is strictly greater than the energy consumed by $S'$. Hence, the energy is not an invariant measure. As a consequence, if we attempt to use the energy as a constraint (i.e., $Z_1 = E$) and the schedule length as a criterion to be minimized (i.e., $Z_2 = L$), then we are bound to fail. Indeed, the fact that all the scheduling decisions made at the stage of any intermediary schedule $S'$ meet the constraint $E(S') < K_1$ cannot guarantee that the final schedule $S$ will meet the constraint $E(S) < K_1$. In contrast, the power consumption is an invariant measure (being the energy divided by the time), and this is why we take the power consumption as a criterion instead of the energy consumption (see Section 3.5).

The reliability is not an invariant measure either: it is neither an increasing nor a decreasing function of the schedule. So the same reasoning applies if the reliability is taken as a constraint. This is why we take instead, as a criterion, the global system failure rate per time unit (GSFR), first defined in [6]. By construction, the GSFR is an invariant measure of the schedule's reliability (see Section 3.7).

For these reasons, each run of our tricriteria scheduling heuristic TSH minimizes the schedule length under the double constraint that the power consumption and the GSFR remain below some thresholds, respectively $P_{obj}$ and $\Lambda_{obj}$. By running TSH with decreasing values of $P_{obj}$ and $\Lambda_{obj}$, starting with $+\infty$ and $+\infty$, we are able to produce the Pareto front in the 3D space (length, GSFR, power). This Pareto front shows the existing tradeoffs between the three criteria, allowing the user to choose the solution that best meets his/her application needs.

Finally, our method for producing a Pareto front could work with any other scheduling heuristic that minimizes the schedule length under the constraints of both the reliability and the power.
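The iterative scheme of this section, in its bicriteria form, can be sketched as follows in Python. This is only an illustration under stated assumptions: `solve` is a hypothetical stand-in for one run of a constrained scheduling heuristic (it is not TSH), the candidate set is invented, and the toy solver breaks ties on $Z_1$, which is a choice made here rather than something the paper specifies.

```python
import math
from typing import Callable, List, Optional, Tuple

Solution = Tuple[float, float]  # (Z1, Z2), both to be minimized


def transformation_method(
    solve: Callable[[float], Optional[Solution]], max_runs: int = 100
) -> List[Solution]:
    """Repeatedly minimize Z2 under Z1 < bound, tightening the bound to the
    Z1 value of the last solution found, starting from bound = +infinity."""
    front: List[Solution] = []
    bound = math.inf
    for _ in range(max_runs):
        solution = solve(bound)
        if solution is None:  # no solution satisfies Z1 < bound any more
            break
        front.append(solution)
        bound = solution[0]
    return front


# Hypothetical finite set of (Z1, Z2) candidates, searched exhaustively.
CANDIDATES: List[Solution] = [
    (5.0, 1.0), (4.0, 2.0), (3.0, 2.0), (3.0, 4.0), (1.0, 6.0), (2.0, 7.0),
]


def toy_solve(bound: float) -> Optional[Solution]:
    """Stand-in solver: smallest Z2 among candidates with Z1 < bound,
    ties broken by the smaller Z1 (an assumption of this sketch)."""
    feasible = [c for c in CANDIDATES if c[0] < bound]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c[1], c[0]))


front = transformation_method(toy_solve)
```

With three criteria, the same loop would be nested over two thresholds (here $P_{obj}$ and $\Lambda_{obj}$), each run minimizing the schedule length.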

## 3 Models

### 3.1 Application algorithm graph

Most embedded real-time systems are reactive, and therefore consist of some algorithm executed periodically, triggered by a periodic execution clock. Our model is therefore that of an application algorithm graph Alg which is repeated infinitely. Alg is an acyclic oriented graph (X, D) (see Figure 3(a)). Its nodes (the set X) are software blocks called operations. Each arc of Alg (the set D) is a data-dependency between two operations. If X → Y is a data-dependency, then X is a predecessor of Y, while Y is a successor of X. Operations with no predecessor (resp. successor) are called input operations (resp. output). Operations do not have any side effect, except for input/output operations: an input operation (resp. output) is a call to a sensor driver (resp. actuator).

Fig. 3: (a) An example of algorithm graph Alg: I1, I2, and I3 are input operations, O1 and O2 are output operations, A-G are regular operations; (b) An example of an architecture graph Arc with four processors, P1 to P4, and six communication links.

The Alg graph is acyclic but it is infinitely repeated in order to take into account the reactivity of the modeled system, that is, its reaction to external stimuli produced by its environment.
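As an illustration of the algorithm graph model, the following Python sketch stores data-dependencies as a successor map and derives the predecessors, the input operations, and the output operations. The graph below is hypothetical (it is not the graph of Figure 3(a), whose arcs are not listed in the text), and the helper names are chosen here for the example. The acyclicity check relies on the standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical algorithm graph: operation -> list of successors.
ALG: Dict[str, List[str]] = {
    "I1": ["A"],
    "I2": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": ["O1"],
    "O1": [],
}


def predecessors(alg: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the successor map: operation -> list of predecessors."""
    preds: Dict[str, List[str]] = {x: [] for x in alg}
    for x, successors in alg.items():
        for y in successors:
            preds[y].append(x)
    return preds


def input_operations(alg: Dict[str, List[str]]) -> List[str]:
    """Operations with no predecessor."""
    return sorted(x for x, p in predecessors(alg).items() if not p)


def output_operations(alg: Dict[str, List[str]]) -> List[str]:
    """Operations with no successor."""
    return sorted(x for x, s in alg.items() if not s)


# static_order() raises graphlib.CycleError if the graph has a cycle,
# so obtaining an order also confirms that the graph is acyclic.
order = list(TopologicalSorter(predecessors(ALG)).static_order())
```

Any such topological order lists each operation after all of its predecessors, which is the property a partial schedule must respect.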

### 3.2 Architecture model

We assume that the architecture is a homogeneous and fully connected multiprocessor. It is represented by an architecture graph Arc, which is a non-oriented bipartite graph (P, L, A) whose set of nodes is P ∪ L and whose set of edges is A (see Figure 3(b)). P is the set of processors and L is the set of communication links. A processor is composed of a computing unit, to execute operations, and one or more communication units, to send or receive data to/from communication links. A point-to-point communication link is composed of a sequential memory that allows it to transmit data from one processor to another. Each edge of Arc (the set A) always connects one processor and one communication link. Here we assume that the Arc graph is complete.

### 3.3 Execution characteristics

Along with the algorithm graph Alg and the architecture graph Arc, we are also given a function $\mathit{Exe} : (X \times P) \cup (D \times L) \to \mathbb{R}^+$ giving the worst-case execution time (WCET) of each operation onto each processor and the worst-case communication time (WCCT) of each data-dependency onto each communication link. An intra-processor communication takes no time to execute. Since the architecture is homogeneous, the WCET of a given operation is identical on all processors (similarly for the WCCT of a given data-dependency).

WCET analysis is the topic of much work [18]. Knowing the execution characteristics is not a critical assumption since WCET analysis has been applied with success to real-life processors actually used in embedded systems, with branch prediction, caches, and pipelines. In particular, it has been applied to one of the most critical embedded systems that exist, the Airbus A380 avionics software [16].

### 3.4 Static schedules

The graphs Alg and Arc are the specification of the system. Its implementation involves finding a multiprocessor schedule of Alg onto Arc. A schedule consists of two functions: the spatial allocation function Ω gives, for each operation of Alg (resp. for each data-dependency), the subset of processors of Arc (resp. the subset of communication links) that will execute it; and the temporal allocation function Θ gives the starting date of each operation (resp. each data-dependency) on its processor (resp. its communication link): $\Omega : X \to 2^P$ and $\Theta : X \times P \to \mathbb{R}^+$.

In this work we only deal with static schedules, for which the function Θ is static, and our schedules are computed off-line; i.e., the start time of each operation (resp. each data-dependency) on its processor (resp. its communication link) is statically known. A static schedule is without replication if for each operation X (and for each data-dependency), $|\Omega(X)| = 1$. In contrast, a schedule is with (active) replication if for some operation X (or some data-dependency), $|\Omega(X)| \geq 2$. The number $|\Omega(X)|$ is called the replication factor of X. A schedule is partial if not all the operations of Alg have been scheduled, but all the operations that are scheduled are such that all their predecessors are also scheduled. Finally, the length of a schedule is the maximum of the termination times of the last operation scheduled on each of the processors of Arc. For a schedule S, we denote it by L(S).
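The definitions of this section can be illustrated with a small Python sketch. It represents a static schedule as a list of placements and computes Ω(X), the replication factor, and the length L(S). The `Placement` type and the sample schedule are hypothetical, and the sketch ignores communications on links, which is a simplification made here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Placement:
    """One replica of an operation: where it runs and when it starts."""
    op: str
    proc: str
    start: float  # value of the temporal allocation for (op, proc)
    wcet: float   # worst-case execution time on that processor

    @property
    def end(self) -> float:
        return self.start + self.wcet


def omega(schedule: List[Placement], op: str) -> Set[str]:
    """Spatial allocation: the set of processors that execute op."""
    return {p.proc for p in schedule if p.op == op}


def replication_factor(schedule: List[Placement], op: str) -> int:
    return len(omega(schedule, op))


def schedule_length(schedule: List[Placement]) -> float:
    """Latest termination time over all processors (0 if nothing is scheduled)."""
    return max((p.end for p in schedule), default=0.0)


# Hypothetical schedule: A is actively replicated on P1 and P2, B is not.
S = [
    Placement("A", "P1", start=0.0, wcet=2.0),
    Placement("A", "P2", start=0.0, wcet=2.0),
    Placement("B", "P1", start=2.0, wcet=3.0),
]
```

Here `replication_factor(S, "A")` is 2 and `schedule_length(S)` is 5.0, the termination time of B on P1.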

### 3.5 Voltage, frequency, and power consumption

The maximum supply voltage is noted $V_{max}$ and the corresponding highest operating frequency is noted $f_{max}$. For each operation, its WCET assumes that the processor operates at $f_{max}$ and $V_{max}$ (and similarly for the WCCT of the data-dependencies). Because the circuit delay is almost linearly related to $1/V$ [3], there is a linear relationship between the supply voltage $V$ and the operating frequency $f$. From now on, we will assume that the operating frequencies are normalized, that is, $f_{max} = 1$ and any other frequency $f$ is in the interval $(0, 1)$. Accordingly, the execution time of the operation or data-dependency $X$ placed onto the hardware component $C$, be it a processor or a communication link, which is running at frequency $f$ (taken as a scaling factor) is:

$$\mathit{Exe}(X, C, f) = \mathit{Exe}(X, C) / f \tag{1}$$

The power consumption $P$ of a single operation placed on a single processor is computed according to the classical model of Zhu et al. [19]:

$$P = P_s + h\,(P_{ind} + P_d), \qquad P_d = C_{ef} V^2 f \tag{2}$$

where $P_s$ is the static power (power to maintain basic circuits and to keep the clock running), $h$ is equal to 1 when the circuit is active and 0 when it is inactive, $P_{ind}$ is the frequency independent active power (the power portion that is independent of the voltage and the frequency; it becomes 0 when the system is put to sleep, but doing so is very expensive [5]), $P_d$ is the frequency dependent active power (the processor dynamic power and any power that depends on the voltage or the frequency), $C_{ef}$ is the switching capacitance, $V$ is the supply voltage, and $f$ is the operating frequency. $C_{ef}$ is assumed to be constant for all operations, which is a simplifying assumption, since one would normally need to take into account the actual switching activity of each operation to compute accurately the consumed energy. However, such an accurate computation is infeasible for the application sizes we consider here.
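Equations (1) and (2) translate directly into code. The Python sketch below is a minimal illustration: the function names and all numeric values are hypothetical, and both the voltage and the frequency are assumed to be normalized so that the maximum is 1.

```python
def exec_time(wcet: float, f: float) -> float:
    """Eq. (1): execution time at normalized frequency f, with 0 < f <= 1."""
    if not 0.0 < f <= 1.0:
        raise ValueError("normalized frequency must lie in (0, 1]")
    return wcet / f


def dynamic_power(c_ef: float, v: float, f: float) -> float:
    """Frequency dependent active power: P_d = C_ef * V^2 * f."""
    return c_ef * v ** 2 * f


def power(p_s: float, p_ind: float, c_ef: float, v: float, f: float,
          active: bool = True) -> float:
    """Eq. (2): P = P_s + h * (P_ind + P_d), with h = 1 when active, else 0."""
    h = 1 if active else 0
    return p_s + h * (p_ind + dynamic_power(c_ef, v, f))


# Hypothetical normalized parameters.
P_S, P_IND, C_EF = 0.25, 0.5, 1.0

full_speed = power(P_S, P_IND, C_EF, v=1.0, f=1.0)  # 0.25 + 0.5 + 1.0
half_speed = power(P_S, P_IND, C_EF, v=0.5, f=0.5)  # 0.25 + 0.5 + 0.125
slowed = exec_time(4.0, 0.5)                        # 4.0 / 0.5
```

With these values, halving both the voltage and the frequency divides the dynamic power by eight but doubles the execution time, which is the tension between power consumption and schedule length that the heuristic has to arbitrate.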

For a multiprocessor schedule $S$, we cannot apply Eq. (2) directly. Instead, we must compute the total energy $E(S)$ consumed by $S$, and then divide by the schedule length $L(S)$:

$$P(S) = E(S) / L(S) \tag{3}$$

We compute $E(S)$ by summing the contribution of each processor, depending on the voltage and frequency of each operation placed onto it. On the processor $p_i$, the energy consumed by each operation is the product of the active power $P^i_{ind} + P^i_d$ by its execution time. As a conclusion, the total consumed energy is:

$$E(S) = \sum_{i=1}^{|P|} \sum_{o_j \in p_i} \left(P^i_{ind} + P^i_d\right) \cdot \mathit{Exe}(o_j, p_i) \tag{4}$$
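The computation of the energy and power of a schedule can be sketched in Python as follows. The sketch sums, over all scheduled operations, the active power multiplied by the execution time, then divides by the schedule length as in Eq. (3). Two assumptions are made here and are not stated in the text: the execution time of each operation is taken at its chosen frequency using Eq. (1), and $P_{ind}$ and $C_{ef}$ are the same on all processors. The names and values are hypothetical.

```python
from typing import List, Tuple

# One scheduled operation: (processor, WCET at f_max, voltage V, frequency f).
ScheduledOp = Tuple[str, float, float, float]


def energy(ops: List[ScheduledOp], p_ind: float, c_ef: float) -> float:
    """Sum over all operations of (P_ind + P_d) times the execution time."""
    total = 0.0
    for _proc, wcet, v, f in ops:
        active_power = p_ind + c_ef * v ** 2 * f  # P_ind + P_d
        total += active_power * (wcet / f)        # assumed: time scaled by Eq. (1)
    return total


def average_power(ops: List[ScheduledOp], length: float,
                  p_ind: float, c_ef: float) -> float:
    """Eq. (3): P(S) = E(S) / L(S)."""
    if length <= 0.0:
        raise ValueError("schedule length must be positive")
    return energy(ops, p_ind, c_ef) / length


# Hypothetical schedule: one operation at full speed on P1, one slowed on P2.
OPS: List[ScheduledOp] = [
    ("P1", 2.0, 1.0, 1.0),  # active power 1.5 for 2.0 time units
    ("P2", 2.0, 0.5, 0.5),  # active power 0.625 for 4.0 time units
]
E = energy(OPS, p_ind=0.5, c_ef=1.0)
P = average_power(OPS, length=4.0, p_ind=0.5, c_ef=1.0)
```

In this toy case the slowed operation consumes less energy (2.5 instead of 3.0) but determines the schedule length, so both the total energy and the length enter the resulting power value.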