Self-Stabilization by Local Checking and Global Reset (Extended Abstract).
ABSTRACT We describe a method for transforming asynchronous network protocols into protocols that can sustain any transient fault, i.e., become self-stabilizing. We combine the known notion of local checking with a new notion of internal reset, and prove that given any self-stabilizing internal reset protocol, any locally-checkable protocol can be made self-stabilizing. Our proof is constructive in the sense that we provide explicit code. The method applies to many practical network problems, including spanning tree construction, topology update, and virtual circuit setup.
To appear in Proc. 8th WDAG, October 1994
Self-Stabilization by Local Checking and Global Reset
Boaz Patt-Shamir, George Varghese, and Shlomi Dolev
Dept. of Computer Science, Johns Hopkins University
Lab. for Computer Science, MIT
Dept. of Computer Science, Washington University
Dept. of Computer Science, Texas A&M University
School of Computer Science, Carleton University
Abstract. We describe a method for transforming asynchronous network protocols
into protocols that can sustain any transient fault, i.e., become self-stabilizing. We
combine the known notion of local checking with a new notion of internal reset, and
prove that given any self-stabilizing internal reset protocol, any locally-checkable
protocol can be made self-stabilizing. Our proof is constructive in the sense that
we provide explicit code. The method applies to many practical network problems,
including spanning tree construction, topology update, and virtual circuit setup.
A network protocol is called self-stabilizing (or stabilizing for short) if when started from
an arbitrary state, it eventually exhibits the desired behavior. In the context of computer
networks, a self-stabilizing system may have an initial state with arbitrary messages at the
links and arbitrary corruption of the state variables at the nodes. The practical appeal of
stabilizing protocols is that they are simpler (i.e., they avoid a slew of mechanisms to deal
with a catalog of anticipated faults), and they are more robust (e.g., they can recover from
transient faults such as memory corruption as well as common faults such as link and node
Since the pioneering work of Dijkstra , the theory of self-stabilization has been
extensively studied (e.g., [9, 16, 12, 2, 5]). While most of the work was directed at self-
stabilization of specific tasks, some work was devoted to designing general algorithmic
transformers that take a protocol as input, and produce as their output a self-stabilizing
version of that protocol. These transformers typically exhibit trade-offs between their
generality (i.e., the range of input protocols they can transform) and the efficiency of
the resulting protocols. One such general transformation is given by Katz and Perry ,
where they show how to compile an arbitrary asynchronous protocol into a stabilizing
equivalent. Briefly, the idea in  is that a leader node periodically takes “snapshots” of
the global network state, and resets the system if some inconsistency is detected. We call
this method global checking and correction. Due to its generality, this transformation is
expensive in terms of space and communication; another drawback of this approach is that
it requires an additional self-stabilizing mechanism that maintains routes that connect all
nodes to some leader.
Afek, Kutten and Yung  suggested that global inconsistency could sometimes be
detected by checking the states of neighbors — i.e., by local means. Using the idea of
to maintain diffusing computations. Their reset protocol requires an underlying stabilizing
The idea of local detection of faults is formalized in [6, 7, 28] under the name of
local checking. In [6, 28], the class of locally correctable protocols is also defined; these are
a transformer that uses local checking and local correction is described. The transformer of
 is efficient, but it can be applied only to protocols that are both locally checkable and
locally correctable. Unfortunately, many interesting network protocols can be shown to be
locally checkable, but not locally correctable.
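To make the idea of local checking concrete, the following is a minimal sketch for a rooted spanning tree, one of the tasks the paper names as locally checkable. The state layout (a parent pointer and a distance per node), the predicate, and all function names are illustrative assumptions, not the formal definition given later in the paper.

```python
from typing import Dict, List, Optional, Tuple

Node = int
# Hypothetical per-node state: (parent pointer, claimed distance to the root).
State = Tuple[Optional[Node], int]


def locally_ok(v: Node, root: Node, nbrs: Dict[Node, List[Node]],
               state: Dict[Node, State]) -> bool:
    """Check node v using only its own state and the state of one neighbor."""
    parent, dist = state[v]
    if v == root:
        return parent is None and dist == 0
    if parent not in nbrs[v]:
        return False  # the parent must be a neighbor (this also rejects None)
    # Distances strictly decrease toward the root, which rules out cycles.
    return dist >= 1 and dist == state[parent][1] + 1


def failed_checks(root: Node, nbrs: Dict[Node, List[Node]],
                  state: Dict[Node, State]) -> List[Node]:
    """Return the nodes whose local check fails (empty list: all checks pass)."""
    return [v for v in nbrs if not locally_ok(v, root, nbrs, state)]


nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
good = {0: (None, 0), 1: (0, 1), 2: (0, 1), 3: (2, 2)}
bad = dict(good)
bad[3] = (2, 7)  # a transient fault corrupts the distance stored at node 3
cycle = {0: (None, 0), 1: (2, 5), 2: (1, 4), 3: (2, 5)}  # 1 and 2 point at each other
```

In this sketch a corrupted state is caught by at least one node that inspects only itself and a single neighbor. Repairing the tree, by contrast, may require changes far from the node that detects the fault, which is the sense in which such a task is checkable locally without being correctable locally.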
In this paper, motivated on the one hand by the inefficiency of the transformer of ,
and by the narrowness of the transformer of  on the other hand, we introduce a new
algorithmic transformer that can be used to make a wide class of protocols self-stabilizing.
The idea is to combine local checking and global correction: bad states are detected by
a local checking mechanism, and a global correction action (called “reset”) is used to recover
from faults. We contend that local checking and global reset is the right balance in many
practical situations. First, we argue that global detection mechanisms such as the self-stabilizing
snapshot  incur unnecessarily large overhead (in terms of time, space and
communication) practically always, since networks are fairly failure-free. Local checking
detects faults quickly, and it can be done, as we show in this paper, with only a small
increase in communication cost. Secondly, as mentioned above, there are many protocols
that are locally checkable but not locally correctable (e.g., spanning tree construction and
topology update [27, 20, 21, 22]). In these cases we are forced to use other techniques —
Even though resetting an entire network may seem drastic and inefficient, there is
evidence that this is not the case. For instance, consider routing protocols. The stabilization
which is the time that it takes for many protocols to compute their results anyhow, even after
being started in a good state. Empirical results also support the claim that resets perform
quite well in practice. Specifically, DEC SRC’s AN-1 network  employs a variant of
global reset for dealing with topology changes (by making a reset request whenever a
link fails or comes up). The AN-1 reset is performed using a version of Finn’s unbounded counter
protocol . The AN-1 designers found that the protocol recovered from link
failures very fast . The reason is that usually the routing protocol only operates for a
small fraction of the time at a node; the remaining processing is devoted to forwarding
data. During a reset, however, no data forwarding is done; all processing and bandwidth is
devoted to the reset. The moral from the AN-1 experience is that reset schemes work well
for small-sized networks; for larger networks, the same approach should work if the routing
protocol is hierarchical  and each level is reset independently.
The main result of this paper is a precise description and statement of the method of
local checking and global reset. We provide formalization, analysis, and code. We believe
that in doing so we contribute something that will help both theoreticians and practitioners.
We remark that the ideas of local checking and global reset are not new; for instance, the
stabilizing spanning tree protocol of  uses local detection, and Arora and Gouda  use
reset to maintain diffusing computations. The contribution of this paper is in introducing
a general transformer that can be used to stabilize any locally checkable protocol. The
description of the transformer entails a description of a local checking mechanism, detailed
requirements that the reset protocol being used must meet, and a description of the way to
construct the resulting self-stabilizing protocol.
It is important to observe that the classical notion of reset [14, 1, 6] is insufficient for our
purposes. In these papers, the task is specified in terms of an external entity that triggers the
reset: for example, the reset can be triggered by a change in the topology (e.g., link crash).
The important point is that this specification formalizes a reset that is invoked regardless of
the way it affects the system. Below, we call such resets external. Notice that an external
reset is inadequate for a general transformer: in our method there is no external entity.
We have a protocol, which is checked by the local checking mechanism, that can trigger
the reset, which in turn changes the state of the protocol being checked. If while resetting
inconsistent states of the original protocol are created, the local checking mechanism might
invoke the reset again, resulting in an endless vicious cycle of reset invocations. Therefore,
another notion of reset is required in this setting. One of the contributions in this paper is an
appropriate specification of a stronger reset, hereafter called internal reset. Intuitively, the
requirement of external reset that there are only finitely many reset invocations is replaced
in internal reset by a specification that guarantees that when used properly, the reasons for
invoking reset eventually disappear.
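The vicious cycle described above can be illustrated with a deliberately simplified, sequential toy. It is only a sketch under stated assumptions: the local predicate (neighbors hold equal values), the two reset functions, and the bound `max_resets` are hypothetical and stand in for the asynchronous mechanisms the paper actually specifies.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[int, int]  # hypothetical global state: node id -> stored value


def run_with_reset(state: Config, edges: List[Tuple[int, int]],
                   reset: Callable[[Config], Config],
                   max_resets: int = 5) -> Tuple[Config, int]:
    """Check every edge locally; any failed check triggers a global reset.

    Returns the final state and the number of resets that were invoked.
    """
    resets = 0
    while resets < max_resets:
        # Toy local predicate: the two endpoints of each edge agree.
        if all(state[u] == state[v] for u, v in edges):
            return state, resets
        state = reset(state)
        resets += 1
    return state, resets  # bound reached: the checks never stopped failing


def consistent_reset(state: Config) -> Config:
    """Reset that leaves every pair of neighbors consistent."""
    return {v: 0 for v in state}


def sloppy_reset(state: Config) -> Config:
    """Reset that itself creates states the local check rejects."""
    return {v: v % 2 for v in state}


edges = [(0, 1), (1, 2)]
corrupted = {0: 4, 1: 4, 2: 9}
```

With `consistent_reset` the checks pass after a single reset, so the reason for invoking reset disappears. With `sloppy_reset` every reset is followed by another failed check, and the loop stops only because of the artificial bound.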
Interestingly, some reset implementations [1, 6] are known to produce intermediate
global inconsistencies [1, 28]. In this paper, however, we show that a certain “pairwise
consistency” is sufficient; fortunately, it turns out that the above protocols (although designed
as external resets) meet the requirements of internal reset.
The remainder of the paper is organized as follows. We start, in Section 2, with an
overview of the network model and the definition of stabilization used in this paper. In
Section 3 we define the notion of local checkability (this is a straightforward formulation
of the ideas in [5, 6]). In Section 4 we give a definition of the requirements of internal
reset protocols. Then, in Section 5, we give our main result, that connects the known notion
of local checkability with the new notion of internal reset. Namely, we present a theorem
that says that any locally checkable protocol can be made self-stabilizing using any self-
and outline a proof of correctness for the combined protocol (explicit code is omitted from
this extended abstract.) Some applications of our main result are mentioned in Section 6.
In this section we describe our network model. We first review briefly the underlying
formal model of Input/Output Automata (see [17, 18] for full definitions), and establish the
notation we use throughout this paper. We also formalize the notion of self-stabilization in
this framework. In the second part of this section, we specify the network model we are
dealing with in this paper.
IO Automata, Stabilization, Time Complexity. An Input/Output Automaton (abbreviated
IOA henceforth) is a state machine whose state transitions are given labels called actions.
There are three kinds of actions. The environment affects the automaton through input
actions which must be responded to in any state. The automaton affects the environment
through output actions; these actions are controlled by the automaton. Internal actions only
change the state of the automaton without affecting the environment. Formally, an IOA
$N$ is defined by a state set $S(N)$, an action set $A(N)$, a signature $G(N)$ (that classifies
the action set into input, output, and internal actions), a transition relation
$R(N) \subseteq S(N) \times A(N) \times S(N)$, and a non-empty set of initial states
$I(N) \subseteq S(N)$. We omit the automaton’s name when it is clear from the context. An action
$a$ is said to be enabled in state $s$ if there exists $s' \in S$ such that $(s, a, s') \in R$.
Input actions are always enabled. For an automaton $N$ and non-empty set $L \subseteq S(N)$,
we define $N|L$ to be the automaton that is obtained from $N$ by setting the initial states to
be $L$. In this paper we often deal with uninitialized IOA for which $S$ is finite. IOAs
communicate by means of shared actions. More formally, IOAs can be composed (under certain
compatibility conditions) to generate a composite state machine; an action which is output of
one of the components and input of the other is performed simultaneously.
When an IOA “runs” it produces an execution. Formally, an execution fragment is an
alternating sequence of states and actions $(s_0, a_1, s_1, \ldots)$ such that
$(s_i, a_{i+1}, s_{i+1}) \in R$ for all $i \geq 0$. An execution fragment is fair if any internal
or output action that is continuously enabled eventually occurs. An execution is an execution
fragment that begins with an initial state and is fair. A schedule is a subsequence of an
execution consisting only of the actions. A behavior is a subsequence of a schedule consisting
only of its input and output actions. Each IOA generates a set of behaviors. An IOA $A$
implements another IOA $B$ if the behaviors of $A$ are a subset of the behaviors of $B$. For
stabilization, we weaken this definition and require only that $A$ eventually exhibit a
behavior of $B$. Formally, we say that $A$ stabilizes to $B$ if every behavior of $A$ has a
suffix which is a behavior of $B$. Note that this definition (based on a definition by Nancy
Lynch) is formulated in terms of external behavior, as opposed to a (somewhat circular)
state-based definition.
For time complexity, we use the timed IOA model of  (see  for formal details).
Informally, we assume that every internal or output action that is continuously enabled
occurs in 1 unit of time. We say that $A$ stabilizes to $B$ in time $t$ if every behavior of
$A$ has a suffix that occurs within time $t$ and is a behavior of $B$. The stabilization time
from $A$ to $B$ is the smallest $t$ such that $A$ stabilizes to $B$ in time $t$. For any
automaton $N$ and time scale $x$, the correspondingly scaled automaton is defined to be
identical to $N$ except that the time associated with each action is now $x$ time units
instead of 1 time unit.
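The definitions above can be sketched in code. This is a minimal illustration under stated assumptions: the class `IOA`, its field names, and `stabilization_index` are hypothetical, behaviors are treated as finite lists of actions (the formal model uses fair, possibly infinite executions), and the one-slot buffer is a toy automaton invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Sequence, Tuple

State = Hashable
Action = str


@dataclass(frozen=True)
class IOA:
    states: FrozenSet[State]                        # S(N)
    inputs: FrozenSet[Action]                       # signature: input actions
    outputs: FrozenSet[Action]                      # signature: output actions
    internals: FrozenSet[Action]                    # signature: internal actions
    trans: FrozenSet[Tuple[State, Action, State]]   # R(N), a subset of S x A x S
    initial: FrozenSet[State]                       # I(N), a subset of S(N)

    def enabled(self, s: State, a: Action) -> bool:
        """a is enabled in s if some s' exists with (s, a, s') in R."""
        return any((s, a, t) in self.trans for t in self.states)

    def restrict(self, new_initial: Iterable[State]) -> "IOA":
        """N|L: the same automaton with the initial states replaced by L."""
        L = frozenset(new_initial)
        if not L or not L <= self.states:
            raise ValueError("L must be a non-empty subset of the state set")
        return replace(self, initial=L)


def stabilization_index(behavior: Sequence[Action],
                        is_behavior_of_b: Callable[[Sequence[Action]], bool]
                        ) -> Optional[int]:
    """Smallest i such that behavior[i:] is accepted as a behavior of B."""
    for i in range(len(behavior) + 1):
        if is_behavior_of_b(behavior[i:]):
            return i
    return None


# Toy automaton: a one-slot buffer with input "put" and output "get".
buf = IOA(
    states=frozenset({"empty", "full"}),
    inputs=frozenset({"put"}),
    outputs=frozenset({"get"}),
    internals=frozenset(),
    trans=frozenset({("empty", "put", "full"),
                     ("full", "put", "full"),   # inputs are enabled in every state
                     ("full", "get", "empty")}),
    initial=frozenset({"empty"}),
)


def alternates(seq: Sequence[Action]) -> bool:
    """Toy specification B: put, get, put, get, ... starting with put."""
    return all(a == ("put", "get")[i % 2] for i, a in enumerate(seq))
```

Starting the buffer in the arbitrary state `"full"` (that is, running `buf.restrict({"full"})`) can yield the behavior `["get", "put", "get"]`, which is not a behavior of the specification, but its suffix from index 1 is. This mirrors the external, behavior-based form of the stabilization definition.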
Network Model. For the remainder of this paper we fix an underlying network topology,
modeled by a directed symmetric graph $G = (V, E)$ with unique node identifiers. Each node
$v \in V$ represents a processor, and each directed edge represents a unidirectional
communication link. We denote the number of network nodes by $n = |V|$, and the network
diameter is denoted $d = \mathrm{diam}(G)$. Each node and link is modeled by an IOA. Below,
we describe verbally the links and node automata. Formal definitions are omitted from this
extended abstract.
The IOA model specifies fairness in terms of equivalence classes; here we assume each action is in
a separate class.
$G = (V, E)$ is called symmetric if for all $(u, v) \in E$ we have that $(v, u) \in E$.
In our model, links have bounded storage, i.e., only a bounded number of outstanding
packets are stored on each link at any instant. The justification for this assumption is
twofold: first, not much can be done with unbounded links in a stabilizing setting , and
secondly, real links are inherently bounded anyway. In this paper we abstract this property
by postulating that a link can store at any given instant at most one outstanding packet.
Formally, a link from $u$ to $v$ stores at most one packet from some packet alphabet $\Sigma$
at any instant. The external interface to the link (see Figure 1) includes an input action
SEND$_{u,v}(p)$ (interpreted as “send packet $p$”), an output action RECEIVE$_{u,v}(p)$
(interpreted as “deliver packet $p$ to $v$”), and an output action FREE$_{u,v}$ (interpreted
as “the $(u, v)$ link is currently free”). If a SEND$_{u,v}(p)$ occurs when the link is empty,
the effect is that $p$ is stored in the link; while $p$ is stored, RECEIVE$_{u,v}(p)$ is enabled,
and when it is taken, its effect is to set the link to empty. While the link is empty,
FREE$_{u,v}$ is enabled. If SEND$_{u,v}(p)$ occurs when the link is not empty, there is no
change of state (intuitively, the incoming packet is just dropped). We note that by our timing
assumptions, a packet stored in a link will be delivered in one unit of time.
Fig. 1. Schematic representation of a single link, connecting queued node automaton $u$ to
$v$. The link from $v$ to $u$ is symmetric and is not shown.
A node automaton $u$ has, for each neighbor $v$, output actions SEND$_{u,v}(p)$ to send
packets to $v$, input actions RECEIVE$_{v,u}(p)$ to receive packets from $v$, and an input
action FREE$_{u,v}$ to obtain indications of the $(u, v)$-link state. In this paper, we use a
special node automaton which we call a queued node automaton. A queued node automaton $u$
has the following discipline for sending packets. It has a bounded output queue called
queue$[v]$ and a boolean flag free$[v]$ for each neighbor $v$ of $u$ (see Figure 1). Whenever
a FREE$_{u,v}$ is received from the link at $u$, the free$[v]$ flag is set. A SEND$_{u,v}(p)$
action is only performed if $p$ is the head of queue$[v]$ and free$[v]$ is set; its effect is
to clear free$[v]$.
Queued node automata allow us to easily superimpose a local checking process. The
local checking process at each node needs access to the state of the node and also requires
the sending of control packets. (The requirement of access to state rules out the possibility
of formalizing the local checking process as a separate automaton). Queued node automata
use a particular discipline for sending data packets on a link; this discipline makes it easy
to multiplex data and control packets on each link. Any node automaton requires some
discipline anyway to deal with bounded links. Thus the use of a particular discipline is not
overly restrictive and makes it easy to add local checking.
For the given graph $G = (V, E)$, we define the automaton for $G$ by the composition
Our convention for action subscripts is that the first represents the sender and the second represents