Off-Line Real-Time Fault-Tolerant Scheduling

05/2002
Source: CiteSeer

ABSTRACT: We address the problem of off-line fault-tolerant scheduling of an algorithm onto a multiprocessor architecture with distributed memory, and we provide a generic algorithm that solves this problem. We consider two kinds of failures: fail-silent and omission. The basic technique we use is the replication of operations and of data communications. We then discuss the principles that govern the execution of schedules with replication under two kinds of arbitration between replicas: state-machine and primary/backup. We also show how to compute the execution date of each operation and the timeouts used to detect failures. We end with a heuristic that uses this calculus to compute a possibly non-optimal schedule: it finds a plain schedule for each failure pattern and then combines these into a single schedule with replication.
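The abstract does not spell out the calculus of execution dates and timeouts. The following Python code is only a minimal sketch of that kind of computation, built on a simplified model that is an assumption and not the paper's method. The model assumes a fixed inter-processor transfer delay (`COMM`), a known worst-case execution time per replica, every replica of an operation on a distinct processor, and each processor running its replicas in the order they are visited. The hypothetical `execution_dates` function computes failure-free dates under state-machine arbitration: a replica starts once its processor is idle and the first copy of each input has arrived. The hypothetical `backup_timeout` function gives, under primary/backup arbitration, the date after which a backup assumes its primary has failed.

```python
from dataclasses import dataclass

# Hypothetical cost model (an assumption, not from the paper): a transfer
# between two distinct processors takes COMM time units; a transfer on the
# same processor is free.
COMM = 2.0


@dataclass
class Replica:
    proc: str    # processor running this replica
    wcet: float  # assumed worst-case execution time on that processor


def execution_dates(order, preds, replicas):
    """Failure-free start/end dates under state-machine arbitration.

    order:    operations in a topological order of the algorithm graph
    preds:    op -> list of predecessor ops
    replicas: op -> list of Replica, each on a distinct processor
    Each processor runs its replicas in the order they are visited here.
    """
    proc_free = {}       # proc -> date at which the processor becomes idle
    start, end = {}, {}  # (op, proc) -> date
    for op in order:
        for r in replicas[op]:
            ready = 0.0
            for p in preds.get(op, []):
                # The first copy of p's result to arrive is enough.
                arrival = min(
                    end[(p, q.proc)] + (0.0 if q.proc == r.proc else COMM)
                    for q in replicas[p]
                )
                ready = max(ready, arrival)
            s = max(ready, proc_free.get(r.proc, 0.0))
            start[(op, r.proc)] = s
            end[(op, r.proc)] = s + r.wcet
            proc_free[r.proc] = s + r.wcet
    return start, end


def backup_timeout(end, op, primary_proc, backup_proc):
    """Primary/backup: date at which the backup on backup_proc stops waiting
    for the primary's result and assumes the primary has failed."""
    delay = 0.0 if primary_proc == backup_proc else COMM
    return end[(op, primary_proc)] + delay


# Example: operation A feeds operation B; both are replicated on P1 and P2.
replicas = {"A": [Replica("P1", 3.0), Replica("P2", 4.0)],
            "B": [Replica("P1", 2.0), Replica("P2", 2.0)]}
start, end = execution_dates(["A", "B"], {"B": ["A"]}, replicas)
print(end[("B", "P1")], end[("B", "P2")])    # 5.0 6.0
print(backup_timeout(end, "B", "P1", "P2"))  # 7.0
```

This sketch covers only the failure-free case. If the earliest replica of a predecessor fails silently, the input arrives with the next copy instead, so the dates would have to be recomputed for each failure pattern. The abstract assigns that role to the plain schedules computed per failure pattern.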
