[show abstract][hide abstract] ABSTRACT: The tolerance theory by Arora and Kulkarni views a fault-tolerant program as the composition of a fault-intolerant program and fault tolerance components called detectors and correctors.At its core, the theory assumes that the correctness specifications under consideration are fusion closed.In general, fusion closure of specifications can be achieved by adding history variables to the program. However, addition of history variables causes an exponential growth of the state space of the program.To redress this problem, we present a method which can be used to add history information to a program in a way that (in a certain sense) minimizes the additional states. Hence, automated methods that add fault tolerance can now be efficiently applied to environments with not fusion closed specifications.
[show abstract][hide abstract] ABSTRACT: Self-stabilizing systems can automatically recover from arbitrary state perturbations in finite time. They are therefore well-suited for dynamic, failure prone environments. Spanning-tree construction in distributed systems is a fundamental task which forms the basis for many other network algorithms (like token circulation or routing).This paper surveys self-stabilizing algorithms that construct a spanning tree within a network of processing entities. Lower bounds and related work are also discussed.
[show abstract][hide abstract] ABSTRACT: We present a theory of fast and perfect detector components that extends the theory of detectors and correctors of Arora and Kulkarni, and based on which, we develop an algorithm that automatically transforms a fault-intolerant program into a fail-safe fault-tolerant program. Apart from presenting novel insights into the working principles of detectors, the theory also allows the definition of a detection latency efficiency metric for a fail-safe fault-tolerant program. We prove that in contrast to an earlier algorithm by Kulkarni and Arora, our algorithm produces fail-safe fault-tolerant programs with optimal detection latency. The application area of our results is in the domain of distributed embedded applications.
[show abstract][hide abstract] ABSTRACT: The Byzantine failure model allows arbitrary behavior of a certain fraction of network nodes in a distributed system. It was introduced to model and analyse the effects of very severe hardware faults in aircraft control systems. Lately, the Byzantine failure model has been used in the area of network security where Byzantine-tolerance is equated with resilience against malicious attackers. We discuss two reasons why one should be careful in doing so. Firstly, Byzantine-tolerance is not concerned with secrecy and so special means have to be employed if secrecy is a desired system property. Secondly, in contrast to the domain of hardware faults, in a security setting it is difficult to compute the assumption coverage of the Byzantine failure model, i.e., the probability that the failure assumption holds in practice. For this latter point we develop a methodology which allows to estimate the reliability of a Byzantine-tolerant solution exposed to attackers of different strengths.