Off-Line Real-Time Fault-Tolerant Scheduling
ABSTRACT We address the problem of off-line fault tolerant scheduling of an algorithm onto a multiprocessor architecture with distributed memory and provide a generic algorithm which solves this problem. We take into account two kinds of failures: fail-silent and omission. The basic technique we use is the replication of operations and data communications. We then discuss the principles which govern the execution of schedulings with replication under the state-machine and the primary/backup arbitrations between replicas. We also show how to compute the execution date for each operation and the timeouts which are used for detecting failures. We end with a heuristic which, using this calculus, computes a possibly non optimal scheduling by finding plain schedulings for each failure pattern and then combining them into a scheduling with replication.
Cătălin Dima, Alain Girault,
INRIA Rhône-Alpes, ZIRST - 655 Av. de l'Europe,
38330 Montbonnot St. Martin, France
Christophe Lavarenne, Yves Sorel,
INRIA Rocquencourt, Domaine de Voluceau,
B.P. 105 - 78153, Le Chesnay Cedex, France
Keywords: Fault-tolerance, Embedded distributed systems,
Scheduling, Dependable systems.
1 Introduction
Embedded systems are almost always associated with hard real-time constraints, i.e., deadlines whose missing may produce irrecoverable damage to the system. Moreover, such systems are often implemented on distributed architectures for reasons of performance increase, fault-tolerance or topological distribution. One of the main problems when programming such systems is the scheduling of the tasks onto the distributed target architecture such that the deadlines are always met. The two classical options are off-line and on-line scheduling. The off-line technique assures better real-time properties than the on-line technique, i.e., the possibility to meet tighter real-time constraints. In contrast, the on-line technique is more resilient and does not presuppose complete determinacy of the behavior of the system.
This work has been funded by the INRIA TOLÈRE research action. Published in Euromicro Workshop on Parallel and Distributed Processing, Mantova, Italy, February 2001.
For embedded systems, complete determinacy of the behavior is desirable, so off-line scheduling is often preferred to on-line scheduling.
The problem changes dramatically when failures have to be taken into consideration. Since failures cannot, by their nature, be predicted, the very basic assumption for off-line scheduling, determinism, is demolished, and it seems that this technique should give way to the other [4]. But, as some other studies have shown, this is not the case [2, 5, 1]: a certain degree of nondeterminism can be permitted at scheduling time within the system. However, these studies have focused on processor failures, assuming restricted architecture graphs with reliable channels [2, 5] or independent tasks [1]. On the other hand, the studies which focus on channel failures tend to consider only on-line scheduling since they are naturally connected to communication protocols [8, 9, 11].
We investigate here the problem of off-line fault-tolerant scheduling, where both processors and channels may become faulty and no assumption is made on the topology of the architecture. We provide a heuristic-based algorithm which solves this problem. Concretely, our algorithm takes as input a specification of the algorithm to be distributed, a specification of the target architecture, a list of failure patterns, some placement constraints, some real-time constraints, and information about the execution duration of the algorithm on the architecture. It gives as output a schedule of the algorithm onto the architecture, satisfying the placement constraints and tolerant to the listed failures. It also indicates, thanks to the execution duration information, whether or not this schedule satisfies the real-time constraints.
We are not interested here in an algorithm that gives the best fault-tolerant scheduling w.r.t. the execution durations. This problem embodies the problem of real-time scheduling, which is a well-known NP-complete problem [3], and therefore our problem is NP-complete too. Rather, we provide a heuristic that gives one scheduling, possibly not the best. This scheduling is then checked against the given real-time constraints. In the event of a negative answer, the user can modify the placement constraints or even add more hardware and start the heuristic again with the modified problem instance. The whole process ends when the given real-time constraints are met.
The contributions of this paper are:
• The statement of the off-line fault-tolerant scheduling problem for a generic class of distributed systems lacking centralized control and for two types of failures: fail-silent and omission.
• The statement of the principles that govern the execution of fault-tolerant schedulings on distributed architectures.
• An algorithm which solves the stated off-line fault-tolerant scheduling problem.
The paper runs as follows: we introduce, in Section 2, the notion of scheduling with replication. We then discuss the principles that govern the execution of these schedulings in Section 3. The algorithm is presented in Section 4, illustrated with an example. We end with a short section containing conclusions and directions of further study.
2 Schedulings
We work with distributed systems composed of processors and channels. We model this by an undirected bipartite graph $(P, C, L)$ where $P$ is the set of nodes that represent the processors, $C$ is the set of nodes that represent the channels and $L \subseteq P \times C$ represents the processor-channel connections. We call two channels $c$ and $c'$ adjacent iff there exists some processor $p$ connected to both, i.e., $(p, c) \in L$ and $(p, c') \in L$.
Processors perform different operations and deliver their results to other processors by means of channels. Each processor-channel connection is governed by a communication coprocessor. The connection between the processor and each of its coprocessors is by means of several buffers of length 1, with blocking read/write facility, meaning that a processor cannot put more than one data item into the same buffer¹. We also assume that distinct communications use distinct buffers. Each coprocessor runs some communication protocol which assures real-time message delivery with a known upper bound on the duration of the communication in the absence of competition for getting control of the channel. Communications on multipoint channels are assumed to be broadcast-like. This feature, combined with a strict order on the send and receive actions on each coprocessor, is necessary in order to achieve a deterministic behavior as well as the predictability of the maximal duration of execution.
¹ Hence our model of the architecture is a bounded asynchronous one.
The algorithms to be scheduled are represented by directed acyclic graphs, or task dags for short, $(O, D)$. The nodes $o \in O$ represent operations and the edges $(o_1, o_2) \in D$ represent data dependencies between operations, i.e., the fact that a specific output of the source operation $o_1$ is needed by the target operation $o_2$ for its computation. The task dags are to be executed repeatedly on the architecture graph, that is, their execution must be cyclic, and the duration of this cycle depends upon the placement of the operations onto the architecture.
An example of a task dag and an architecture graph is presented in Figure 1 ((a) and (b) resp.). In the task dag, data are supposed to flow from left to right. In the architecture graph, the nodes $p_i$ are the processors and the nodes $c_1$ and $c_2$ are the channels.
Figure 1. (a) A task dag; (b) An architecture graph.
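As an illustration only, the architecture graph and the task dag can be written down as plain Python sets. The concrete processors, channels, links and operations below are hypothetical (they are not those of Figure 1), the helper names are invented for this sketch, and treating adjacency as a relation between two distinct channels is an assumption.

```python
# Hypothetical architecture: four processors, two channels.
processors = {"p1", "p2", "p3", "p4"}
channels = {"c1", "c2"}
# Processor-channel connections, stored as (processor, channel) pairs.
# c1 is a multipoint channel shared by p1, p2 and p3.
links = {("p1", "c1"), ("p2", "c1"), ("p3", "c1"), ("p3", "c2"), ("p4", "c2")}

# Hypothetical task dag: operations and data dependencies (source, target).
operations = {"a", "b", "c"}
dependencies = {("a", "b"), ("a", "c")}


def channels_of(p, links):
    """Channels connected to processor p."""
    return {c for (q, c) in links if q == p}


def adjacent(c1, c2, links):
    """True iff some processor is connected to both channels.

    Assumption of this sketch: a channel is not adjacent to itself.
    """
    procs1 = {p for (p, c) in links if c == c1}
    procs2 = {p for (p, c) in links if c == c2}
    return c1 != c2 and bool(procs1 & procs2)
```

Here `c1` and `c2` are adjacent because `p3` is connected to both of them.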
The failures we want to tolerate are of two types:
1. fail-silent: once a component is faulty it will never recover and it will not provide any output for all inputs it receives henceforth.
2. omission: a component may fail to produce the output it should have produced when receiving a certain input, but a new input may produce the correct output for this new input, as if the previous input was discarded.
A failure pattern is a pair $(P', C')$ with $P' \subseteq P$ and $C' \subseteq C$. The intuition is that the processors in $P'$ and the channels in $C'$ are faulty and therefore it is the rest of the architecture graph that has to assure the execution of the task dag. As fail-silent behaviors are particular cases of omission behaviors, when a failure pattern is a mix of both, we consider, for the ease of reasoning, that all the failures are omission failures.
When studying the fault-tolerance problem one is usually concerned with tolerating a given number of faults within each cycle of the execution. This can be generalized by considering also that only some designated failure patterns may occur. For example we may consider that for a certain type of processors a certain ratio of failures is tolerable, different types having different ratios. It could also happen that some of our processor nodes are actuators whose action cannot be replicated, therefore we cannot tolerate any failure of these actuators. Hence we will take into account families of failure patterns, i.e., families $\{(P_i, C_i)\}_{1 \le i \le n}$ where $P_i \subseteq P$ and $C_i \subseteq C$.
The placement constraints and execution durations are encoded in the following two functions:
• $f : P \times O \to \mathbb{R}_+ \cup \{\infty\}$ defines, for each operation $o$ and processor $p$, the maximal duration of executing $o$ on $p$. The value $\infty$ means that $o$ cannot be scheduled on $p$.
• $v : C \times D \to \mathbb{R}_+$ defines, for each data dependency $(o_1, o_2) \in D$ and channel $c \in C$, the maximal duration of transmitting the data produced by $o_1$ and needed by $o_2$ along $c$.
We assume that any type of data can be sent onto any channel. In contrast, the restriction that some operation cannot be executed onto some processor stands for situations like the lack of sufficient local resources or for input/output operations.
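A minimal sketch of how these two functions might be stored, assuming dictionaries keyed by (processor, operation) and (channel, dependency). All names and numeric values are hypothetical; `math.inf` plays the role of the special value meaning that an operation cannot be placed on a processor.

```python
import math

INF = math.inf  # "this operation cannot be scheduled on this processor"

# Maximal execution durations, keyed by (processor, operation).
# Hypothetical values: the input/output operation "io" is only allowed on p2.
exec_time = {
    ("p1", "a"): 2.0,
    ("p2", "a"): 3.0,
    ("p1", "io"): INF,
    ("p2", "io"): 1.0,
}

# Maximal transmission durations, keyed by (channel, (source, target)).
comm_time = {("c1", ("a", "io")): 0.5}


def allowed_processors(op, processors, exec_time):
    """Processors on which `op` may be placed (finite duration).

    A missing entry is treated as forbidden; that is a choice of this sketch.
    """
    return {p for p in processors if exec_time.get((p, op), INF) < INF}
```

With these tables, `allowed_processors("io", {"p1", "p2"}, exec_time)` keeps only `p2`, which is how a placement constraint shows up in the data.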
A scheduling is a mapping associating to each processor and channel in the architecture some sequence of operations, resp. data dependencies in the task dag such that the architecture "behaves like" the task dag. As we want the scheduling to be tolerant to failures we need to have replicas of the same operation on several distinct processors, and similarly for data dependencies. Hence what we call scheduling is slightly more general than the usual notion of scheduling of an algorithm onto an architecture [6].
Formally, a scheduling is a pair of functions $S = (\sigma, \gamma)$ with $\sigma : P \to O^*$ and $\gamma : C \to D^*$, where $O^*$ and $D^*$ are the sets of sequences over $O$, resp. $D$. We denote by $|\sigma(p)|$ the length of the sequence $\sigma(p)$, by $\sigma(p)_i$ the $i$-th element in this sequence, and by $\varepsilon$ the empty sequence. A scheduling must satisfy the following requirements:
1. If $(o', o) \in D$ and $\sigma(p)_i = o$ then
   • there exists $i' < i$ such that $\sigma(p)_{i'} = o'$, or
   • there exists $c \in C$ with $(p, c) \in L$ and $j \le |\gamma(c)|$ such that $\gamma(c)_j = (o', o)$.
2. If $\gamma(c)_i = (o_1, o_2)$ then there exists some $p \in P$ with $(p, c) \in L$ such that
   • there exists $j \le |\sigma(p)|$ such that $\sigma(p)_j = o_1$, or
   • there exists another $c' \in C$ with $(p, c') \in L$ and $j \le |\gamma(c')|$ such that $\gamma(c')_j = (o_1, o_2)$.
3. If $\sigma(p)_i = o$ and $\sigma(p)_{i'} = o$ then $i = i'$; similarly, if $\gamma(c)_i = (o_1, o_2)$ and $\gamma(c)_{i'} = (o_1, o_2)$ then $i = i'$.
4. For each $p \in P$ and $i' < i \le |\sigma(p)|$, $(\sigma(p)_i, \sigma(p)_{i'}) \notin D$.
5. For each $o \in O$, there exists $p \in P$ and $i \le |\sigma(p)|$ such that $\sigma(p)_i = o$.
The ors above are inclusive, i.e., both conditions may occur. Requirement 2 allows routing of data dependencies and assumes that this takes no time from the "router" processor.
A plain scheduling is a scheduling in which each operation is scheduled only once. Hence plain schedulings are the schedulings without failure tolerance.
Finally we are given a real-time constraint as an upper bound on the duration of the scheduling within each cycle.
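Some of the requirements above lend themselves to a mechanical check. The following is a minimal sketch, not the authors' tool: it assumes a scheduling is given as two dictionaries (processor to list of operations, channel to list of dependencies) and checks only three conditions: no duplicate on a processor or channel, every operation scheduled at least once, and every input of a replica available either from an earlier local operation or from a connected channel. The function and argument names are hypothetical.

```python
def check_schedule(sigma, gamma, operations, dependencies, links):
    """Return a list of violations found (an empty list means none found).

    sigma: dict processor -> list of operations
    gamma: dict channel   -> list of (source, target) dependencies
    """
    problems = []
    # No operation (resp. dependency) twice on one processor (resp. channel).
    for name, seq in sorted(list(sigma.items()) + list(gamma.items())):
        if len(seq) != len(set(seq)):
            problems.append(f"duplicate entry on {name}")
    # Every operation has at least one replica.
    for o in sorted(operations):
        if not any(o in seq for seq in sigma.values()):
            problems.append(f"operation {o} is never scheduled")
    # Each input of a replica is produced earlier on the same processor
    # or carried by a channel connected to that processor.
    for p, seq in sorted(sigma.items()):
        connected = {c for (q, c) in links if q == p}
        for i, o in enumerate(seq):
            for (src, dst) in sorted(dependencies):
                if dst != o:
                    continue
                local = src in seq[:i]
                remote = any((src, dst) in gamma.get(c, []) for c in connected)
                if not (local or remote):
                    problems.append(f"{o} on {p} has no source for {src}")
    return problems
```

For a two-processor bus with `a` on `p1` and `b` on `p2`, the check fails until the dependency `("a", "b")` is placed on the shared channel.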
3 How to "Execute" a Fault-Tolerant Scheduling
We discuss here the principles which govern the executions of schedulings. This section might also be seen as a discussion of the principles of code generation. This discussion is necessary since our model abstracts from the existence of coprocessors. This implies that, after a fault-tolerant scheduling is obtained, the sequence of send or receive operations on each coprocessor has to be deduced from the sequence of communications on each channel.
The first principle is related to normal executions of a scheduling, i.e., executions within each cycle in the absence of faults, while the second principle is related to transitory executions, i.e., to cycles in which some failure pattern occurs.
The first principle governs "normal" executions of schedulings (i.e., in the absence of faults) and concerns the arbitration between multiple replicas of an operation:
Assume operation $o$ is scheduled onto processor $p$ and needs some data provided by operation $o'$. As fault tolerance is achieved by replication, it will often be the case that multiple copies of $o'$ are scheduled on different processors and send their data to $o$ on different channels. Two problems occur:
1. Which copy of $(o', o)$ is to be used by $o$ when these copies arrive at $o$ on different channels.
2. Which copy of $o'$ will send the data dependency $(o', o)$ on the same channel if two or more copies compete for the channel.
In the first case it is natural to assume that the first communication arrived is the one actually used by $o$, the others being simply discarded upon their arrival. This is a state machine principle [10] of arbitration between copies.
In the second case we require that on each channel $c$ there exists at most one copy of the data dependency $(o', o)$. That is, the different copies of $o'$ compete for sending $(o', o)$ and only the winning copy will proceed. Classical choices for the arbitration mechanism are:
• The state-machine arbitration [10]: the first operation completing its execution is the one that wins the arbitration, the others simply discarding their communication operations.
• The primary/backup arbitration [7]: the copies of the same operation are divided into primary and backup. In a normal execution, it is the primary copy which always delivers the data. When its failure is detected by the backup copies, a coherent choice of a new primary copy is performed using a table of choices computed beforehand.
In the state-machine case there is nothing to choose at runtime: the concurrency between the copies assures that the first that completes its execution is the one that sends the data. In the primary/backup case it is still natural to design the choice of the primary copy as the one that assures the minimal latest time of delivery.
Hence the first principle implies that, for each processor $p_i$ holding a copy of $o'$, the generated code for the coprocessor of the connection $(p_i, c)$ contains a send, while the coprocessor of the connection $(p, c)$ contains a receive. For the state-machine approach, the sends must be "conditioned by success", while for the primary/backup approach the send of the primary copy is "unconditioned" and the others are "triggered" by some watchdog timeout.
This principle allows us to compute the starting and ending execution time (in the absence of faults) for each scheduling. These are in fact two partial functions $\mathit{start}$ (the starting time) and $\mathit{end}$ (the ending time) whose definition is presented in the sequel, together with the intuitive explanation:
1. The domains of both $\mathit{start}$ and $\mathit{end}$ consist of
   • Pairs $(p, o)$ where $p \in P$ and $o \in O$ and $o$ is scheduled on $p$, i.e. $\sigma(p)_i = o$ for some $i \le |\sigma(p)|$.
   • Triples $(c, o, o')$ where $c \in C$, $(o, o') \in D$ and $(o, o')$ is scheduled on $c$, i.e. $\gamma(c)_i = (o, o')$ for some $i \le |\gamma(c)|$.
   $\mathit{start}(p, o)$ and $\mathit{end}(p, o)$ represent the starting and ending execution time for the replica of $o$ which is scheduled on $p$, while $\mathit{start}(c, o, o')$ and $\mathit{end}(c, o, o')$ represent the starting and ending execution time for the replica of $(o, o')$ which is scheduled on $c$.
2. The replica of $o$ executed on $p$ is executed only after the operation which precedes $o$ on $p$ was executed and only after the reception of at least one copy of each data dependency $(o', o)$ needed by $o$:
$$\mathit{start}(p, o) = \max\Big( \big\{ \min \{ \mathit{end}(c, o', o) \mid c \in C,\ \exists i \le |\gamma(c)| \text{ s.t. } \gamma(c)_i = (o', o) \} \;\big|\; (o', o) \in D \big\} \cup \{ \mathit{end}(p, \sigma(p)_{i-1}) \mid \sigma(p)_i = o \} \Big)$$
   Here, in the innermost set $\{ \mathit{end}(c, o', o) \mid c \in C,\ \exists i \le |\gamma(c)| \text{ s.t. } \gamma(c)_i = (o', o) \}$ the data dependency $(o', o)$ is fixed, only $c$ may vary.
3. The replica of $(o, o')$ transmitted on $c$ is executed only after the communication which precedes $(o, o')$ on $c$ was executed and only after at least one processor connected to $c$ has ended $o$ or after this data dependency has been transmitted on a channel adjacent to $c$:
$$\mathit{start}(c, o, o') = \max\Big( \{ \mathit{end}(c, o_1, o_2) \mid \gamma(c)_{i-1} = (o_1, o_2),\ \gamma(c)_i = (o, o') \} \cup \big\{ \min\big( \{ \mathit{end}(p, o) \mid p \in P,\ (p, c) \in L,\ \exists j \le |\sigma(p)| \text{ s.t. } \sigma(p)_j = o \} \cup \{ \mathit{end}(c', o, o') \mid c' \in C,\ c' \text{ adjacent to } c \text{ and } \exists j \le |\gamma(c')| \text{ s.t. } \gamma(c')_j = (o, o') \} \big) \big\} \Big)$$
4. $\mathit{end}(p, o) = \mathit{start}(p, o) + f(p, o)$ and $\mathit{end}(c, o_1, o_2) = \mathit{start}(c, o_1, o_2) + v(c, o_1, o_2)$.
Also, we denote by $T(S)$ the largest ending time of execution of all operations,
$$T(S) = \max\Big( \{ \mathit{end}(p, o) \mid \exists i \le |\sigma(p)| \text{ s.t. } \sigma(p)_i = o \} \cup \{ \mathit{end}(c, o, o') \mid \exists i \le |\gamma(c)| \text{ s.t. } \gamma(c)_i = (o, o') \} \Big)$$
Concerning the computation of the starting and ending execution times in the primary/backup approach, we note that an execution without failures in this approach can be seen as an execution of a plain scheduling, since each operation is entitled to receive all its data dependencies from a single source. Hence, instead of the mixed max-min calculus above, we would have a plain max-calculus, as in, e.g., [6].
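The max-min calculus for the state-machine case can be sketched on a tiny instance. This is an illustrative sketch under stated assumptions, not the authors' implementation: the instance, the durations and all function names are hypothetical; only channels connected to the consuming processor are considered as sources; a dependency carried by no such channel contributes nothing beyond the predecessor term; and routing through adjacent channels is left out to keep the recursion acyclic.

```python
from functools import lru_cache

# Hypothetical instance: "a" is replicated on p1 and p2, "b" runs on p3.
sigma = {"p1": ["a"], "p2": ["a"], "p3": ["b"]}
gamma = {"c1": [("a", "b")], "c2": [("a", "b")]}
links = {("p1", "c1"), ("p3", "c1"), ("p2", "c2"), ("p3", "c2")}
dependencies = {("a", "b")}
exec_time = {("p1", "a"): 2, ("p2", "a"): 3, ("p3", "b"): 1}
comm_time = {("c1", ("a", "b")): 4, ("c2", ("a", "b")): 1}


@lru_cache(maxsize=None)
def start_op(p, o):
    """Max over the predecessor on p and, per input, the earliest arriving copy."""
    i = sigma[p].index(o)
    candidates = [0]
    if i > 0:
        candidates.append(end_op(p, sigma[p][i - 1]))
    for dep in dependencies:
        if dep[1] != o:
            continue
        arrivals = [end_comm(c, dep) for (q, c) in links
                    if q == p and dep in gamma.get(c, [])]
        if arrivals:
            candidates.append(min(arrivals))  # first copy to arrive wins
    return max(candidates)


def end_op(p, o):
    return start_op(p, o) + exec_time[(p, o)]


@lru_cache(maxsize=None)
def start_comm(c, dep):
    """Max over the previous communication on c and the earliest producer."""
    i = gamma[c].index(dep)
    previous = end_comm(c, gamma[c][i - 1]) if i > 0 else 0
    producers = [end_op(p, dep[0]) for (p, ch) in links
                 if ch == c and dep[0] in sigma.get(p, [])]
    if not producers:
        raise ValueError("routing is not handled in this sketch")
    return max(previous, min(producers))


def end_comm(c, dep):
    return start_comm(c, dep) + comm_time[(c, dep)]


def makespan():
    """Largest ending time over all scheduled operations and communications."""
    ends = [end_op(p, o) for p in sigma for o in sigma[p]]
    ends += [end_comm(c, d) for c in gamma for d in gamma[c]]
    return max(ends)
```

In this instance the copy of `a` on `p2` finishes later than the one on `p1`, yet its data reaches `p3` first because `c2` is faster, so `b` starts at time 4 rather than 6; the makespan is still 6 because the slower communication on `c1` runs to completion.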
The second principle is related to a transitory execution of a scheduling, i.e., the cycle in which a failure pattern occurs, and we call it the principle of reconfiguration. Since we want the system to perform real-time computation and not to stop upon the occurrence of a failure and run some reconfiguration protocol, we need to settle a consistent reconfiguration policy for each processor. The only information that is accessible to each processor at the time of a failure is the absence of a certain communication, which implies that either the source processor, or the communication channel, or both are faulty.
We detect failures using the watchdog mechanism: each communication operation on a coprocessor (including the send operations!) is guarded by a watchdog which is armed at the moment when the coprocessor has finished the previous communication. The timeout value each watchdog is loaded with represents the latest time at which the communication should take place. When the watchdog reaches its timeout, the coprocessor interrupts its processor and notifies it about the absence of communication.
The processor which is interrupted by a coprocessor due to a watchdog timeout must perform some reconfigurations on its scheduling. These reconfigurations are dependent upon the failure type, i.e., fail-silent or omission:
• In the fail-silent case, the processor drops the faulty communication operation from the sequence of operations to be executed on the coprocessor, since this communication will never take place any more.
• In the omission case the processor has nothing to reconfigure since the faulty communication might take place during a future cycle.
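The two reactions to a watchdog timeout can be sketched as a single function. This is a minimal illustration, assuming the coprocessor's communication sequence is a Python list and the failure model is passed as a string; the function name and both string labels are hypothetical.

```python
def on_watchdog_timeout(comm_sequence, missed, failure_model):
    """Return the communication sequence to use from the next cycle on.

    comm_sequence: communications scheduled on one coprocessor, in order
    missed:        the communication whose watchdog expired
    failure_model: "fail-silent" or "omission" (labels chosen for this sketch)
    """
    if failure_model == "fail-silent":
        # The communication will never take place again: drop it for good.
        return [comm for comm in comm_sequence if comm != missed]
    if failure_model == "omission":
        # The communication may succeed in a later cycle: keep everything.
        return list(comm_sequence)
    raise ValueError(f"unknown failure model: {failure_model}")
```

Returning a new list rather than mutating the argument keeps the schedule of the current cycle untouched while the next one is prepared.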
We will also need timeouts for each operation $o$ scheduled on some processor $p$: each operation $o$ must wait for the arrival of all its data dependencies. Then, this waiting is guarded by a watchdog which is loaded with a timeout representing the latest time $o$ should receive its data dependencies. When the watchdog reaches its timeout, the processor is interrupted from its waiting and drops $o$: $o$ is skipped and the processor starts its waiting for the next operation. In the case of the fail-silent assumption, $o$ is removed definitively from the scheduling on $p$. However we require no reconfigurations on the communications which were triggered after the execution of $o$: the failure of just one replica of $o$ does not imply that on the channels on which $o$ was supposed to send its data, say $(o, o')$, there will be no more replicas of $(o, o')$ sent by other processors. It is only after the occurrence of a failure of all replicas of $(o, o')$ that we need a reconfiguration on that channel, and this reconfiguration is assured by the use of watchdogs for the communication operations, including the sends.
We will present the timeout computation in the full version of this paper. We just mention it is based upon a calculus of maximal execution times, done at scheduling time. Also we mention that the timeouts corresponding to the primary/backup approach can be much larger than the computed timeouts for the state machine approach, because the delay when the primary copy and several backup copies become faulty is the sum of the delays for each of the copies, while in the state machine approach it is simply the max of the delays.
The reconfiguration process can be formally described as follows:
1. Suppose $\sigma(p)_i = o$ for some $i \le |\sigma(p)|$ and there exists $o'$ with $(o', o) \in D$ such that there exists no $i' \le |\sigma(p)|$ such that $\sigma(p)_{i'} = o'$ and there exists no $c \in C$ with $(p, c) \in L$ and $i' \le |\gamma(c)|$ such that $\gamma(c)_{i'} = (o', o)$. Then drop the operation $o$ from $\sigma(p)$ and reduce the size of $\sigma(p)$ by one by shifting the other operations.
2. Suppose $\gamma(c)_i = (o, o')$ for some $i \le |\gamma(c)|$. Suppose also that there exists no $p \in P$ with $(p, c) \in L$ such that for some $i' \le |\sigma(p)|$ we would have $\sigma(p)_{i'} = o$. Moreover, suppose that for no $c' \in C$ with $c'$ adjacent to $c$ and no $i'' \le |\gamma(c')|$ do we have $\gamma(c')_{i''} = (o, o')$. Then drop the tuple $(o, o')$ from $\gamma(c)$ and reduce the size of $\gamma(c)$ by one by shifting the other operations.
If we cannot apply any of these steps and we did not get a void pair $(\sigma, \gamma)$ (i.e. with $\sigma(p) = \varepsilon$ for all $p \in P$) then this pair is a correct scheduling.
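The two reconfiguration steps amount to a fixpoint computation: keep dropping entries until neither step applies. The sketch below is one possible reading, assuming schedules are dictionaries of lists and that the sequences of failed processors and channels have already been emptied by the caller; the function name is hypothetical.

```python
def reconfigure(sigma, gamma, dependencies, links):
    """Apply steps 1 and 2 until neither applies; return the reduced pair."""
    sigma = {p: list(seq) for p, seq in sigma.items()}
    gamma = {c: list(seq) for c, seq in gamma.items()}

    def chans(p):
        return {c for (q, c) in links if q == p}

    def procs(c):
        return {p for (p, d) in links if d == c}

    changed = True
    while changed:
        changed = False
        # Step 1: drop an operation one of whose inputs has no source left.
        for p, seq in sigma.items():
            for o in list(seq):
                for dep in dependencies:
                    if dep[1] != o:
                        continue
                    local = dep[0] in seq
                    remote = any(dep in gamma.get(c, []) for c in chans(p))
                    if not local and not remote:
                        seq.remove(o)
                        changed = True
                        break
        # Step 2: drop a dependency with no producer and no adjacent carrier.
        for c, seq in gamma.items():
            for dep in list(seq):
                produced = any(dep[0] in sigma.get(p, []) for p in procs(c))
                routed = any(dep in gamma[c2] for c2 in gamma
                             if c2 != c and procs(c) & procs(c2))
                if not produced and not routed:
                    seq.remove(dep)
                    changed = True
    return sigma, gamma
```

If the only producer of a dependency has failed, the dependency disappears from its channel and the consumer disappears in the following pass, leaving a void pair; with a surviving replica of the producer connected to the same channel, nothing is dropped.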
4 The Heuristic
We start with the given dag $(O, D)$, the architecture graph $(P, C, L)$, the placement constraints and execution durations $f$ and $v$, and the family of failure patterns $\{(P_i, C_i)\}_{1 \le i \le n}$. We denote by $(\bar P_i, \bar C_i, \bar L_i)$ the reduced architecture graph that results due to the occurrence of the failure pattern $(P_i, C_i)$, defined as:
$$\bar P_i = P \setminus P_i, \qquad \bar C_i = C \setminus C_i, \qquad \bar L_i = L \cap (\bar P_i \times \bar C_i).$$
We consider that the failure patterns are incomparable one to another, i.e., that for each $i \ne j$ we have $P_i \not\subseteq P_j$ or $C_i \not\subseteq C_j$. Smaller failure patterns, i.e., $(P', C')$ with $P' \subseteq P_i$ and $C' \subseteq C_i$ for some $i$, are tolerated by the fact that their occurrence reduces less the architecture graph.
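The reduced architecture graph and the incomparability condition are simple set computations. A minimal sketch, assuming the graph is given as Python sets and each failure pattern as a pair of sets; the function names are hypothetical.

```python
def reduced_architecture(processors, channels, links, failed_procs, failed_chans):
    """Remove the failed nodes and every link touching one of them."""
    procs = processors - failed_procs
    chans = channels - failed_chans
    kept = {(p, c) for (p, c) in links if p in procs and c in chans}
    return procs, chans, kept


def incomparable(patterns):
    """True iff no pattern is contained (component-wise) in another one."""
    for i, (pi, ci) in enumerate(patterns):
        for j, (pj, cj) in enumerate(patterns):
            if i != j and pi <= pj and ci <= cj:
                return False
    return True
```

A family containing both `({"p1"}, set())` and `({"p1", "p2"}, set())` is rejected by `incomparable`, since tolerating the larger pattern already covers the smaller one.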
We first try to schedule the task dag on each of the reduced architecture graphs that result due to some failure pattern. This phase is done with the aid of some scheduling algorithm, e.g., SynDEx's [6]. At this time the schedulings are plain. Then, if at this phase, for some of the failure patterns there exists no plain scheduling, the algorithm stops with a negative answer because no scheduling with replication can be found to support this failure pattern. Note that some of the scheduling algorithms may work only if the reduced architecture graphs are connected, that is, if none of the failure patterns splits the architecture graph into two or more connected components. If we fall in this case, we might need to run the plain scheduling algorithm for each of the connected components resulting from one failure pattern.
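The first phase can be sketched as a loop over the failure patterns. The plain scheduler used here is a deliberately naive, hypothetical stand-in (it is not SynDEx's algorithm): it puts every operation, in a topological order, on the first surviving processor and therefore needs no channel at all. Only the control flow around it reflects the description above.

```python
def plain_schedule_stub(operations, dependencies, processors):
    """Hypothetical stand-in for a real plain scheduling algorithm.

    Returns {processor: [operations in order]} or None if nothing fits.
    """
    if not processors:
        return None
    order = []
    remaining = set(operations)
    while remaining:
        ready = sorted(o for o in remaining
                       if all(src in order
                              for (src, dst) in dependencies if dst == o))
        if not ready:
            return None  # cyclic dependencies: the input is not a dag
        order.append(ready[0])
        remaining.remove(ready[0])
    return {sorted(processors)[0]: order}


def first_phase(operations, dependencies, processors, failure_patterns):
    """One plain scheduling per failure pattern, or None as a negative answer."""
    schedules = []
    for failed_procs, _failed_chans in failure_patterns:
        surviving = processors - failed_procs
        schedule = plain_schedule_stub(operations, dependencies, surviving)
        if schedule is None:
            return None  # this pattern cannot be supported at all
        schedules.append(schedule)
    return schedules
```

A realistic scheduler would also take the surviving channels and the duration tables into account; the stub ignores them only to keep the example short.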
The first phase provides a family of plain schedulings $\{(\sigma_i, \gamma_i)\}_{1 \le i \le n}$. We say that the operation $o$ is assigned