Probabilistic Analysis of Self-Stabilizing Systems:
A Case Study on a Mutual Exclusion Algorithm
Mehran Alidoost Nia and Fathiyeh Faghih
DRTS Research Lab, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
{alidoostnia, f.faghih}@ut.ac.ir
Abstract— The heterogeneity in cyber-physical systems (CPS)
and the diverse situations that they may face, along with
environmental hazards, raise the need for self-stabilization. The
uncertain nature of CPS necessitates a probabilistic view for
analyzing the system stabilization-time, which is a highly critical
metric in distributed and time-sensitive applications. Calculating
the worst-case expected stabilization-time and identifying possible
improvements help to produce safer designs of CPS applications. In
this paper, a mutual exclusion algorithm based on the PIF
(Propagation of Information with Feedback) self-stabilizing
algorithm in a synchronous environment is selected as a case study.
Using probabilistic analysis, we present a set of guidelines for
utilizing this algorithm in time-sensitive applications. We have
also utilized an approximation method to improve the scalability of
our probabilistic analysis, and conducted a set of experiments to
show how this analysis can be used to design topologies with an
optimal worst-case expected stabilization-time. Our results show
that this approach can significantly improve the worst-case
expected stabilization-time.
Keywords— cyber-physical systems, probabilistic formal
analysis, self-stabilizing algorithms, propagation of information
with feedback, real-time systems.
I. INTRODUCTION
Cyber-Physical Systems (CPS) cover various time-sensitive
applications, ranging from medical systems to wireless sensor
networks (WSN). Environmental hazards play a decisive role in
the reliability and performance of the system. In such an
unstable environment, transient faults that cause temporary
disorder in the system's global state should be considered to avoid
system failure in safety-critical and time-sensitive applications.
Self-stabilization refers to a special type of fault-tolerance in
distributed systems, first introduced by Dijkstra [1]. In a self-
stabilizing system, if the system goes outside its set of legitimate
states (LS) because of a set of transient faults or bad
initialization, it is guaranteed to recover back to its LS in a finite
number of steps. Self-stabilization has been studied extensively,
and several self-stabilizing algorithms have been proposed for
different problems used in applications such as networking and
robotics. Among those, we can mention leader election [2],
mutual exclusion [3], spanning tree [4], and three coloring [5].
Convergence is the key property of self-stabilizing systems.
However, quantitative metrics such as recovery time (namely,
stabilization-time) can be equally critical when self-stabilization
is used to design a system for real-time and safety-critical tasks.
The worst-case stabilization-time of a self-stabilizing algorithm is
defined as the largest number of steps it takes to recover to its
set of legitimate states starting from any arbitrary initial state.
Self-stabilization has been proved to be impossible for a number of
problems, including token circulation and leader election in
anonymous networks. To deal with these impossibility results, a
number of variants of self-stabilization have been introduced,
including weak-stabilization [6] and probabilistic-stabilization
[6]. In probabilistic self-stabilization, the system recovers to
its set of legitimate states with probability one. Herman [7]
proposed a probabilistic token ring algorithm for anonymous rings.
A precise evaluation of the stabilization-time of this
probabilistic algorithm is known to be a very difficult task.
Kwiatkowska et al. [8] used a probabilistic verification method to
evaluate the worst-case expected stabilization-time. Using the
PRISM model checker, they conducted a number of experiments on
rings of different sizes and used the analysis results to improve
the worst-case expected stabilization-time of the algorithm.
Mutual exclusion is a well-known problem in distributed
computing that has also been studied extensively in the
self-stabilization community. Dijkstra was the first to propose
three solutions for mutual exclusion in a ring in his seminal
paper on self-stabilization [1]. Since then, a number of
self-stabilizing algorithms have been proposed for mutual exclusion
in different topologies and settings [9-11]. Jubran and Theel [12]
proposed an algorithm for mutual exclusion in a tree with the
goal of achieving a better response time in a synchronous setting
where requests to enter the critical section are rare. For
example, consider a set of sensors that are used to detect rare
events such as fire or flood. Whenever a sensor detects
something unusual, it should be able to acquire all resources in a
mutually exclusive manner. Therefore, it requests to enter the
critical section, and the expectation is that its request is
granted within a short time. The authors carried out a theoretical
analysis of the response time as well as the stabilization-time (in
case of transient faults). We argue that the stabilization-time can
be analyzed more precisely if the probability of a request in each
node is taken into account. To perform the probabilistic analysis,
we have used an approach similar to the one introduced in [7] and
computed the worst-case expected stabilization-time. Note that
this algorithm was not initially designed as a probabilistic
self-stabilizing protocol. However, considering the probabilistic
nature of the nodes, we have first translated it into a
probabilistic algorithm, and then used the PRISM tool to analyze
the worst-case expected stabilization-time for different request
probabilities in the nodes of different topologies. We have used
the analysis results to give a number of suggestions to designers
who want to use this protocol in real-time applications.
State space explosion is a well-known problem in the
model-checking community, and probabilistic model checking suffers
from the same limitation. To deal with it, we use an approximation
method [13] to reduce the state space and then perform the
probabilistic analysis. Our results show that this method can
substantially improve the scalability of our analysis. We argue
that this approach can also be used to improve the scalability of
the analysis of other probabilistic self-stabilizing algorithms,
such as the one presented in [14].
The rest of the paper is structured as follows. In the next
section, the technical background of the work is discussed. In
Section 3, we briefly review the SSUPS self-stabilizing
algorithm. Section 4 is dedicated to the analysis of basic tree
structures and the resulting guidelines. In Section 5, we propose
a method to deal with the complexity issue in large-scale
systems. In Section 6, we review related work. Finally, in Section
7, we give the concluding remarks.
II. BACKGROUND
In this section, we give a brief overview of Discrete-Time
Markov Chains (DTMCs), which are the basis for modeling and
analyzing self-stabilizing systems in this work [15].
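Definition 1. A Discrete-Time Markov Chain (DTMC) over a
set of atomic propositions $AP$ is defined as a tuple
$M = (S, S_0, P, \mathcal{L})$, where $S$ is a (finite) set of
states, $S_0 \subseteq S$ is a set of possible initial states,
$P : S \times S \to [0, 1]$ is a probabilistic transition relation,
in which $\sum_{s' \in S} P(s, s') = 1$ for all $s \in S$,
and $\mathcal{L} : S \to 2^{AP}$ maps each state to a subset of
atomic propositions.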
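Properties to be verified over DTMCs are written in logics such
as PCTL. A key element of these logics is the probabilistic
operator $P$. A formula $P_{\bowtie p}[\psi]$, where
$\bowtie \in \{<, \leq, \geq, >\}$, $p$ is a probability
($p \in [0, 1]$) and $\psi$ is a path formula, asserts that the
probability of $\psi$ being true satisfies the bound $\bowtie p$.
More formally, the syntax of PCTL is defined as follows, where
$U$ denotes the Until operator, $X$ indicates the Next operator,
$a \in AP$, and $k \in \mathbb{N}$.
$\phi ::= true \mid a \mid \phi \wedge \phi \mid \neg \phi \mid P_{\bowtie p}[\psi]$ (1)
$\psi ::= X\,\phi \mid \phi\, U^{\leq k}\, \phi \mid \phi\, U\, \phi$ (2)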
A distributed system is defined over a set of variables V, and
is composed of a set of processes Π. A state of the system is a
valuation of all variables. The set of all possible states of the
system is called the state space, represented by $S$. In a
self-stabilizing distributed system, the set of legitimate states
$LS$ is the subset of the state space in which the system is in a
valid configuration ($LS \subseteq S$). A self-stabilizing system
can be represented as a DTMC, where the set of atomic propositions
is $AP = \{L\}$, in which $L$ stands for legitimate states.
Definition 4. A distributed system is self-stabilizing if and
only if the following two conditions hold:
A. Convergence: Starting from any arbitrary state, the system
will eventually converge to its set of legitimate states LS, with
probability 1.
B. Closure: For each $s \in LS$, if a state $s'$ exists with
$P(s, s') > 0$, then it follows that $s' \in LS$. Note that
$P(s, s')$ denotes the probability of transition from $s$ to $s'$.
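The two conditions can be checked mechanically on a small explicit DTMC. The following Python sketch is an illustration only, under assumptions that are not part of the paper: the chain is given as a row-stochastic matrix `P`, the legitimate set as a set of state indices, and the function names are hypothetical. It relies on the fact that, in a finite DTMC, LS is reached with probability 1 from every state exactly when LS is reachable in the transition graph from every state.

```python
def is_closed(P, legit):
    """Closure: no positive-probability transition leaves the legitimate set."""
    n = len(P)
    return all(P[s][t] == 0 or t in legit for s in legit for t in range(n))


def converges_with_prob_one(P, legit):
    """Convergence for a finite DTMC: every state can reach `legit` in the graph."""
    n = len(P)
    can_reach = set(legit)
    changed = True
    while changed:  # backward reachability fixpoint
        changed = False
        for s in range(n):
            if s in can_reach:
                continue
            if any(P[s][t] > 0 and t in can_reach for t in range(n)):
                can_reach.add(s)
                changed = True
    return len(can_reach) == n


# Toy 3-state chain (assumed for illustration); state 2 is the only legitimate state.
P_ok = [[0.5, 0.5, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0]]
# Variant in which state 0 is a trap outside LS, so convergence fails.
P_bad = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
```

Here `P_ok` satisfies both conditions, while `P_bad` is closed but does not converge.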
In order to measure the expected stabilization time over a
DTMC, a reward rate is given to each transition.
Definition 4. The expected reward of a path $\pi$ over a DTMC $M$
is computed by (3) from the reward rates assigned to the
transitions of the path; the reward rate of the i-th transition is
also called the step size of the i-th step.
In a self-stabilizing system, we are interested in computing
the maximum expected reward (MER), or worst-case expected
stabilization-time. Considering the set of paths starting from any
arbitrary initial state to LS, the MER is computed according to
(4). In this paper, we use MER and worst-case expected
stabilization-time interchangeably.
We map the notion of expected reward to expected time, as
the reward rate is adjusted by the timing parameters of the system.
We then derive the maximum expected time using the MER formula.
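As a rough illustration of a worst-case expected stabilization-time, the sketch below computes, for a toy DTMC, the expected number of steps needed to reach a legitimate state from every state and then takes the maximum over the illegitimate states. It uses the textbook state-based equations for expected reachability reward with a reward of 1 per transition, which may differ in detail from (3) and (4). The four-state chain, its probabilities, and the function names are assumptions made for the example.

```python
from fractions import Fraction


def expected_steps(P, legit):
    """Expected number of transitions needed to reach `legit` from each state.

    Solves x_s = 1 + sum_t P[s][t] * x_t for s outside `legit` (x_s = 0 inside)
    by Gauss-Jordan elimination. Assumes `legit` is reached with probability 1
    from every state, so that the linear system has a unique solution.
    """
    n = len(P)
    others = [s for s in range(n) if s not in legit]
    idx = {s: i for i, s in enumerate(others)}
    m = len(others)
    # Augmented matrix of (I - Q) x = 1, where Q restricts P to `others`.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for s in others:
        for t in others:
            identity = Fraction(1) if s == t else Fraction(0)
            A[idx[s]][idx[t]] = identity - Fraction(P[s][t])
        A[idx[s]][m] = Fraction(1)
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    x = [Fraction(0)] * n
    for s in others:
        x[s] = A[idx[s]][m]
    return x


def worst_case_expected_steps(P, legit):
    """Maximum of the expected step counts over all illegitimate states."""
    x = expected_steps(P, legit)
    return max(x[s] for s in range(len(P)) if s not in legit)


half = Fraction(1, 2)
# Toy chain (assumed): states 0-2 are illegitimate, state 3 is legitimate.
P = [[0, half, 0, half],
     [0, 0, 1, 0],
     [0, 0, half, half],
     [0, 0, 0, 1]]
steps = expected_steps(P, {3})
worst = worst_case_expected_steps(P, {3})
```

In this toy chain the expected step counts are 5/2, 3 and 2 for states 0, 1 and 2, so the worst case is 3 steps, attained from state 1.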
Fig. 1. A simple example of a self-stabilizing system modeled by DTMC.
As an example, consider the self-stabilizing system modeled
as a DTMC in Figure 1, in which the legitimate states are labeled.
For the sake of simplicity, we consider the reward rate of every
transition to be equal to 1.
Considering all paths starting from outside the legitimate states
to a state in LS, we must compare 13 values, computed by (3).
For instance, the expected value of path $\pi_1 = s_0 s_3 s_5 s_8$
is obtained by applying (3) to its three transitions.
Applying (4), the MER equals 2.7, which corresponds to the
paths $\pi_2 = s_0 s_3 s_6 s_9$ and $\pi_4 = s_1 s_3 s_6 s_9$.
III. PIF SELF-STABILIZING ALGORITHM
The Propagation of Information with Feedback (PIF) algorithm is
usually designed for tree topologies. It starts by sending a
broadcast message from the initiator (root), and all processes
except for the terminal ones (leaves) participate in the broadcast
by sending the message to their descendants. Once the message
reaches a leaf node, the leaf acknowledges the initiator by sending
a feedback to its ancestors. The broadcast finishes once the root
gets feedback from all its children. Different self-stabilizing PIF
algorithms have been proposed in the literature [16]. In this
paper, we focus on one of the PIF algorithms proposed for a
synchronous network, called SSUPS [12]. It injects
synchronicity into PIF and follows Dijkstra's 4-state machine [1].
The main application of the algorithm is to assign mutually
exclusive privileges to the processes in a tree to access critical
sections.
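The broadcast and feedback phases of a single PIF wave can be sketched in a few lines of Python. This is a simplified illustration and not the SSUPS algorithm itself: it assumes that every message takes exactly one synchronous round per hop, that there are no faults, and that the tree is given as a hypothetical `children` dictionary.

```python
def pif_wave(children, root):
    """Return (recv, done) round numbers for one fault-free PIF wave on a tree.

    recv[v] is the round in which v receives the broadcast; done[v] is the
    round in which v has collected feedback from all of its children
    (for a leaf, the round in which it received the broadcast).
    """
    recv, done = {root: 0}, {}
    order = [root]
    for v in order:                      # breadth-first broadcast phase
        for c in children.get(v, []):
            recv[c] = recv[v] + 1
            order.append(c)
    for v in reversed(order):            # feedback phase, leaves first
        kids = children.get(v, [])
        done[v] = recv[v] if not kids else max(done[c] for c in kids) + 1
    return recv, done


# Hypothetical tree: r has children a and b, and a has a single child c.
tree = {"r": ["a", "b"], "a": ["c"]}
recv, done = pif_wave(tree, "r")
```

For this tree the wave completes at the root after 4 rounds, which is twice the height of the tree under the one-hop-per-round assumption.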
A. Topology
The nodes are structured in a tree, where each node i has a set
of local variables <x_i, up_i, l_i, a_i, p_i>. x_i and up_i are
the state variables (similar to the variables in Dijkstra's 4-state
machine), l_i is a pointer to one of the process's neighbors (or to
no process), a_i indicates the activation status (whether the
process requests a privilege or not), and p_i determines the
privilege received by the process. By default, the up variable is
true for the root and false for the leaf nodes. Each node can read
its own local variables as well as those of its neighbors, i.e.,
its parent and its set of children, and can change its own
variables according to the guarded commands in its local algorithm.
B. Algorithm
The system uses four types of tokens: the search token, the
positive/negative feedback token, the execute token, and the
complete token, which are listed in (5)-(9), respectively. Note
that ch indicates a child of node p, θp refers to the parent of
node p, and Cp indicates the set of children of node p.
The algorithm is divided into two cycles. In the first cycle,
the algorithm searches for active nodes using the first two
tokens (Equations (5)-(7)). The second cycle consists of sending
the execute token and receiving the completion signal at the
root, which is accomplished by the other two tokens (Equations
(8)-(9)). The root initiates the entire feedback loop and has
two states: id (idle) and rq (request). Once it sends a
request signal to its children, they propagate the same
request to their descendants. Intermediate nodes have three
states: id (idle), rq (request), and rp (reply). In the first
cycle, they behave just like the root and issue requests to their
children. In the second cycle, when feedback is being sent,
they behave like leaf nodes and issue replies to their ancestors.
Leaves are non-root processes with a single neighbor.
The algorithm consists of three sub-algorithms associated
with the root, the intermediates, and the leaves, listed as
Algorithms 1-3. We have used a notation similar to the input
language of PRISM. In the algorithms, indices 1, 2, and 3 stand
for the root, intermediate, and leaf nodes, respectively. The
primary condition for a process to receive an execute token is that
it must be active and referenced by a causality chain of other
nodes from the root to itself. The chain is formed by the l
variables.
The variable l is essential for detecting an execution path. It
works like a pointer that refers to another node. A set of such
references constructs an execution (active) path. If there is no
execution path in the tree, the root takes the execute token.
Algorithm 1. Formal implementation of root process using PRISM style.
Algorithm 2. Formal implementation of intermediates using PRISM style.
Algorithm 3. Formal implementation of leaf process using PRISM style.
C. Probabilistic Modification of the Algorithm
In order to perform a probabilistic analysis of the algorithm, we
first need to find its nondeterministic parameter. Investigating
Algorithms 1-3, we find that the activation variable of the
processes is nondeterministic and cannot be controlled by the
algorithm. The variable “a” reflects whether a process is active or
not. We can use it to transform the algorithm into a probabilistic
one, since changes in activation are independent of the algorithm.
Note that a process is active when it has a request to enter its
critical section. Activation or deactivation of a process directly
affects the existence of an execution path.
The original algorithm is modified by composing commands. These
commands are composed as follows, where each ri is a probabilistic
variable and “+” is the choice operator in the PRISM syntax.
module process1
  x1 : bool;
  l1 : [-1..2];
  a1 : bool;
  p1 : bool;
  ...
endmodule

module process2
  x2 : bool;
  up2 : bool;
  l2 : [-1..3];
  a2 : bool;
  p2 : bool;
  ...
  up2 & !x2 & ( ...
endmodule

module process3
  x3 : bool;
  l3 : [-1..3];
  a3 : bool;
  p3 : bool;
  ...
  (x2 != x3) ...
  ...
endmodule
Original commands in Algorithm 1:
  (!x1) & ... = true)
  (!x1) & (x1 = x2 & !u...
The corresponding probabilistic (composed) command:
  (!x1) & (x1 = x2 & !up2) -> r0 : ... = true) & ... + (1-r0) : ...
Original commands in Algorithm 2:
  ( ...
The corresponding probabilistic (composed) command:
  (x1 != x2) & !x2 -> r1 : ... + (1-r1) : ...
Original commands in Algorithm 3:
  ...
The corresponding probabilistic (composed) command:
  ... -> r2 : ... + (1-r2) : ...
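The effect of such a composition can be sketched outside PRISM. The Python fragment below merges two alternative updates of a state into a single probabilistic step, taking the first with probability r and the second with probability 1 - r. The function name `compose`, the dictionary-based state, and the two activation updates are hypothetical; they only mirror the `r : ... + (1-r) : ...` pattern and are not the actual commands of Algorithms 1-3.

```python
from fractions import Fraction


def compose(r, update_a, update_b):
    """Build a probabilistic step: update_a with probability r, update_b with 1 - r."""
    def step(state):
        dist = {}
        for prob, update in ((r, update_a), (1 - r, update_b)):
            # Work on a copy so that the two branches do not interfere.
            successor = tuple(sorted(update(dict(state)).items()))
            dist[successor] = dist.get(successor, 0) + prob
        return dist
    return step


def deactivate(state):
    state["a"] = False   # hypothetical update: the process withdraws its request
    return state


def activate(state):
    state["a"] = True    # hypothetical update: the process requests the privilege
    return state


# r is treated as the deactivation probability; 3/10 is an arbitrary example value.
step = compose(Fraction(3, 10), deactivate, activate)
dist = step({"a": True, "x": False})
```

The resulting `dist` maps each successor state to its probability, and the probabilities of the two branches always sum to one.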
To make sure that the modified representation of the
algorithm satisfies the self-stabilization conditions, we have
checked the closure and convergence properties, written in PCTL
in the syntax of PRISM, in (10) and (11). Briefly, the
convergence property checks whether, starting from any arbitrary
initial state, the system reaches a legitimate state with
probability 1, and the closure property checks whether the
system stays in LS after reaching it. To write these
properties, we have used the formula stable, which formulates
the conditions of the legitimate states (LS conditions). The LS
conditions are taken from [12] and presented in Algorithm 4.
Note that the LS conditions depend on factors such as the existence
of execution paths, and hence on the tree structure.
The formulation in Algorithm 4 corresponds to a simple tree
structure consisting of a root process, an intermediate node, and a
leaf node.
In order to find the maximum expected stabilization-time of
the algorithm in PRISM, each transition is assigned a reward
denoted by R, which for simplicity is considered to be 1 in this
work. Using (12), PRISM finds the MER value of the system.
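For orientation, properties of this kind are commonly phrased as shown below. This is a hedged sketch in PCTL-style notation and not necessarily the exact form of (10)-(12). Here $\Diamond$ ("eventually") abbreviates $true\ U$, $\Box$ ("always") is its dual, and $R_{=?}$ denotes the expected reward accumulated until the target is reached. Taking the maximum over all states $s$ is an assumption about how the worst case is obtained.

```latex
\begin{align*}
\text{Convergence:} \quad & P_{\geq 1}\,[\,\Diamond\, stable\,] \\
\text{Closure:}     \quad & stable \Rightarrow P_{\geq 1}\,[\,\Box\, stable\,] \\
\text{MER:}         \quad & \max_{s \in S} \; R_{=?}\,[\,\Diamond\, stable\,](s)
\end{align*}
```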
Algorithm 4. LS conditions for a simple tree of 3 nodes in a line, in PRISM
IV. PROBABILISTIC ANALYSIS & EXPERIMENTAL RESULTS
In this section, we discuss the results of our probabilistic
analysis of the algorithm, where we investigate the effect of the
activation probability of each processing node on the maximum
expected stabilization-time. In the following, the index of each
variable corresponds to the process index. As shown in Figure
2, we have selected a set of basic topologies for the probabilistic
analysis. For lack of space, we cannot include all tree
topologies consisting of three or four processes.
To denote the activation probability of the root, we use the
probabilistic variable r0. In other words, in each
analysis, the root is considered to be active with probability
(1-r0) and deactivated with probability r0. Similarly, the
deactivation probabilities of the other processes are denoted by
r1, r2, …, rn. The effect of these probabilistic
variables is shown in a set of probability-MER charts in Figure
3, where the MER denotes the worst-case expected
stabilization-time of the system.
From now on, we present our findings as a set of observations
and analytical results. The former are observations taken from the
experimental results on a specific tree structure, while the latter
summarize what we have observed across the results of all our
experiments on tree structures of limited size.
Fig. 2. Basic PIF tree structures used for probabilistic analysis.
Figure 3 depicts the effect of the activation probability of
different nodes on the worst-case expected stabilization-time in
the tree structures shown in Figure 2. For better presentation of
the results, we have placed the probability on the y-axis and the
MER on the x-axis.
Figure 3-(a) shows the relationship between r0 and the MER
in T1 (Figure 2). As can be observed, the MER decreases as the
deactivation probability of the root increases in this tree
structure.
In Figure 3-(b), we show the worst-case expected
stabilization-time for different values of r1 (the deactivation
probability of node P1) in T1. As can be seen, the MER has a direct
relationship with r1. We have also studied the MER for different
values of r2 and r3. Interestingly, we have observed that r2 and r3
have a negligible effect on the worst-case expected
stabilization-time. Another interesting result is that, as we
repeated this experiment for different values of r1, the effect of
r2 and r3 on the MER decreased as r1 increased.
Figure 3-(c) depicts the relationship between r0 and the
MER in T2. We have magnified the chart so that the peaks around
probabilities 0.45 and 0.5 can be easily observed. The best
stabilization-time is achieved with a deactivation probability of
0.4.
In Figure 3-(d), we illustrate the relationship between the MER
and the other probabilistic variables, namely r1, r2, r3 and r4, in
T2. As we can see, the MER values corresponding to the deactivation
probabilities of the middle branch (r2 and r4) are symmetric to the
values of the other branches (r1 and r3).
formula f1 = (up2 & x2 & l2 = -1 & a3);
formula f2 = (!up2 & x2 & l1 = id2) | ((!up2 & x2 & l1 = id1) & (x3 & l2 = id3)) | ((!up2 & x2 & l2 = -1 & !a2) & (x3 & l3 = -1 & !a3));
formula f3 = (x1) & ((x1 & l1 = -1) | (x3 & up3 & l3 = -1)) & ((x2 & l1 = id2) | (x3 & l2 = id3 & x2));
formula f4 = (!x1) & ((a2 & !a3 & l1 = id2) | (a3 & l2 = id3 & l1 = id2));
formula stable = f1 | f2 | f3 | f4;
Fig. 3. Probability-MER (Worst-Case Expected Stabilization-Time) graph generated by different tree structures.
As an important observation, we found that increasing the
deactivation probability of the nodes in longer paths (having
intermediate nodes) increases the system MER. We can see this
trend in r2 and r4 corresponding to nodes P2 and P4 in T2.
T3 has a simple linear structure. Figure 3-(e) illustrates
the relationship between r0 and the MER of the system. We can see
that the MER has an inverse relationship with the deactivation
probability of the root. Figure 3-(f) shows the relationship
between the deactivation probability of the first intermediate (r1)
and the MER. We have also observed that in linear paths, the
deactivation probability of the first intermediate (r1) has a
higher impact on the MER than that of the other intermediates
(e.g., r2). Overall, in this structure, the optimal MER is achieved
when the deactivation probability of the intermediates is
increased.
Figure 3-(g) shows the relationship between the MER and
the deactivation probability of the root (r0) in T4, which has a
balanced topology. As we can observe, the best MER is achieved
at the deactivation probability r0=0.5. Note that for r0<0.5 the
MER was very large, and hence we removed this range for the sake of
presentation. Figure 3-(h) shows the relationship between the
deactivation probabilities of the other nodes (r1, r2, r3, r4) and
the MER in T4. It shows that the optimal MER is achieved when the
activation probability of the intermediate nodes is kept very
small. In our experiment, we set r2 to 1 and decreased the value of
r1. As we can observe, this led to an increase in the MER
of the system. If we repeat the same experiment by keeping r1=1
and changing r2, we get similar results. We conducted this
experiment with a constant deactivation probability for the leaves.
The effect of the leaves’ probabilities (r3 and r4) is negligible.
Figure 3-(i) depicts the relationship between r0 and the MER
in T5, which shows that the optimal worst-case expected
stabilization-time is achieved when r0 is equal to 0.5.
Figure 3-(j) illustrates the relationship between (r1, r2,
r3, r4) and the MER in the same tree. We can observe that an
increase in the activation probability of the nodes in the longer
branch of T5 (P2, P3, P4) leads to a better MER. Note that the left
branch does not include any intermediate node, which affects this
experiment as we saw earlier. Therefore, we keep the probability of
this branch constant when experimenting with the other
probabilities (r2, r3, r4).
Studying the analysis results on a set of tree structures leads
to the following analytical facts.
Analysis 1. The existence of at least one active intermediate node
improves the stabilization-time of the system. This is highlighted
by T1 and Figure 3-(b).
Analysis 2. In the design process, we can place the nodes with
the highest probability of being deactivated in longer paths. See
the structure T2 and the experiment shown in Figure 3-(d).
Analysis 3. If the tree has a linear structure,
activation/deactivation of the first intermediate node has the most
impact on the MER of the system. As an example, see T3 and the
analysis result depicted in Figure 3-(f).
Analysis 4. If the tree has symmetric paths, the optimal
probability distribution among active branches is symmetric. We can
verify this fact by looking at r1 and r2 in Figure 3-(h).
Table 1 shows the state space of each tree, which ranges
from millions to billions of states. We have used the MTBDD
engine [17], which enables us to analyze billions of states. To run
the experiments, we have developed an automatic code
generator for different tree structures, which is available at [18].
The generated code can be used directly in the PRISM model
checker, which helps automate the process of probabilistic
analysis.
TABLE I. MER IN DIFFERENT SCENARIOS RESULTED FROM PIF TREES.
Structure   # of States   # of Transitions   Avg. Comp. Time (s)
T1          5120000       17280000           1.51
T2          509607936     2436562944         6.13
T3          10240000      29120000           1.295
T4          1019215872    4013162496         20.692
T5          1019215872    4299816960         17.478
V. DISCUSSION
In this section, we first discuss the results of our analysis.
Then, we extend the analysis findings to large-scale
systems.
A. Discussion and Comparison
So far, we have analyzed the SSUPS algorithm under
different tree topologies of limited size. The results show the
effect of the activation probability of each process on the MER. In
this section, we discuss the significance of our analysis results
by demonstrating how much the MER can be decreased if these
findings are used in the design of trees that run this algorithm.
Figure 4 shows the best and the worst MER we could
measure for each tree structure in Figure 2. We achieved the best
MER by utilizing the results of our probabilistic analysis, as
discussed in the previous section. This chart shows how much a
good design, guided by probabilistic analysis, can reduce the MER
compared to the MER resulting from a naive design.
Fig. 4. Min and Max MER measured for each tree structure (see Figure 2)
TABLE II. MER IN DIFFERENT SCENARIOS RESULTED FROM PIF TREES.
Structure   Min MER   Max MER   # of dist. processes
T1          9.49      37.38     4
T2          9.37      23.75     5
T3          7.47      18.5      4
T4          10.28     16.3      5
T5          9.54      14.99     5
Table 2 makes the difference between the minimum and maximum
MER more perceptible. One might expect the MER to increase with the
number of processes. However, our results on T1 (with four
processes) and T2 (with five processes) show that this is not
always the case: the maximum MER of T1 (37.38) is larger than that
of T2 (23.75). This shows that in this algorithm, the number of
processes is not the only parameter that affects the worst-case
stabilization-time.
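The gap between the best and worst configurations can be quantified directly from the values in Table II. The short Python computation below is an illustrative calculation only; the helper name `relative_reduction` and the choice of normalizing by the maximum MER are assumptions.

```python
# Min and max MER per structure, copied from Table II.
mer = {
    "T1": (9.49, 37.38),
    "T2": (9.37, 23.75),
    "T3": (7.47, 18.5),
    "T4": (10.28, 16.3),
    "T5": (9.54, 14.99),
}


def relative_reduction(min_mer, max_mer):
    """Fraction of the worst measured MER that the best configuration saves."""
    return (max_mer - min_mer) / max_mer


reduction = {name: relative_reduction(lo, hi) for name, (lo, hi) in mer.items()}
for name, value in reduction.items():
    print(f"{name}: {value:.1%}")
```

Under this measure the reduction ranges from roughly 75% for T1 to about 36% for T5.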
We argue that the result of the probabilistic analysis of a
self-stabilizing algorithm can be used in practice. In the case of
this paper, the result can be applied easily given a good estimate
of the activation probabilities of the different nodes. In general,
if the probability of the valuation of a system variable is studied,
one way to utilize the result of the probabilistic analysis is
state encoding. In this approach, the probability of a specific
value of the variable is decreased or increased purposefully [20].
B. Extending the Analysis
The main limitation of our analysis results is that they are
restricted to small trees, which is due to the well-known
state space explosion problem in model checking [19]. To extend our
results and increase the scalability of our analysis, we use an
approximation technique called ε-approximate probabilistic
bisimulation [13]. It reduces the state space of a DTMC using
bisimulation under ε, which is an approximation level for the
transition probability function. On the one hand, this approach
reduces the state space of the system and facilitates the analysis
procedure, while preserving performance metrics of the DTMC, such
as the MER, under the bisimulation reduction. On the other hand,
this reduction has a cost: we treat as bisimilar some states whose
transition probabilities differ. According to the ε-approximate
rule, these differences in transition probability must not exceed
ε, so the results are acceptable within the approximation level ε.
In effect, we obtain a realistic estimate of the MER of the system
at large scale.
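The ε condition can be illustrated with a small check on an explicit transition matrix. The sketch below is a simplified reading of the idea and may differ from the precise definition in [13]: it assumes a candidate partition of the states is already given, that states in the same block carry the same labels, and that the lumped chain is built by averaging. The function names and the toy matrix are hypothetical.

```python
def block_prob(P, s, block):
    """Probability of moving from state s into any state of `block`."""
    return sum(P[s][t] for t in block)


def is_eps_bisimulation(P, partition, eps):
    """Check that states sharing a block differ by at most eps in the
    probability they assign to every block of the partition."""
    for block in partition:
        for target in partition:
            probs = [block_prob(P, s, target) for s in block]
            if max(probs) - min(probs) > eps:
                return False
    return True


def quotient(P, partition):
    """Lump each block into one state, averaging block-to-block probabilities."""
    return [[sum(block_prob(P, s, target) for s in block) / len(block)
             for target in partition]
            for block in partition]


# Toy chain (assumed): states 0 and 1 behave almost identically, state 2 is absorbing.
P = [[0.50, 0.00, 0.50],
     [0.00, 0.48, 0.52],
     [0.00, 0.00, 1.00]]
partition = [[0, 1], [2]]
reduced = quotient(P, partition)
```

With this partition, states 0 and 1 differ by 0.02 in the probability of each block, so the check passes for ε = 0.05 and fails for ε = 0.01.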
Enhancing the scalability of our analysis helps us use
it in practice. As an example, consider a scenario where this
algorithm is implemented in a WSN. If we want to add a new
sensor to the network, we can use our probabilistic analysis to
find a good position for the new node, with the goal of achieving
a small MER. To do so, we first model the large-scale
system as a DTMC, then approximate the model by ε-approximate
probabilistic bisimulation, add the node to the tree,
and measure the MER of the new system. We repeat this
experiment for the different possible positions and choose the
position that leads to the best MER.
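The selection loop described above amounts to evaluating the MER once per candidate position and keeping the minimum. A minimal Python sketch follows, assuming a hypothetical callable `mer_with_node_at` that builds the extended model and returns its MER (for instance by invoking a model checker). The placeholder values in the usage lines are invented for the demonstration and are not results from the paper.

```python
def best_position(candidates, mer_with_node_at):
    """Return (position, mer) for the candidate position with the smallest MER."""
    scored = [(mer_with_node_at(pos), pos) for pos in candidates]
    mer, pos = min(scored)
    return pos, mer


# Placeholder evaluator: a lookup table standing in for a real MER computation.
placeholder_mer = {"u": 4.0, "v": 3.0, "w": 5.0}
position, value = best_position(["u", "v", "w"], placeholder_mer.get)
```

In practice each evaluation is a full model-checking run, so the number of candidate positions directly determines the cost of this search.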
Fig. 5. A binary tree topology for examination of PIF algorithm.
As an example, we use a binary tree topology with 15
nodes, as shown in Figure 5. After approximation, the model
is verified by the PRISM probabilistic model checker. Depending on
the activation/deactivation probabilities, we obtain the Min and
Max values of the MER shown in Figures 6 and 7.
In our experiment, we first add a node under the left-most child
of the tree, p8, and compute the difference between the MER of the
current tree and that of the new one, which has 16 nodes. We then
repeat the experiment by putting the new node under p10, p12, and
p14, respectively. According to the harmonic activation/deactivation
probability assignment rule (Analysis 2), and under the condition
that at least one active intermediate node exists (Analysis 1), we
expect to get the best stabilization-time by putting the new node
under p8. Figure 6 confirms this in comparison with the other
positions.
In another experiment, we add two nodes to the binary tree. Here,
we are interested in finding the topology of the tree that makes
the worst-case expected stabilization-time as small as possible.
This experiment is more complicated than the previous one because
of the number of choices that can affect the topology of the PIF
tree: one can either put both nodes under one node of the tree
or distribute them under two different nodes (e.g., under p8 and
p10). According to Analysis 2 and Analysis 4, we expect to
obtain the optimal topology by putting the first node under the
left-most child (p8). It remains to determine whether the second
node should also be added under p8. To do that, we add the second
node under p8 and compute the difference of the MERs. We then
repeat this computation by putting the second node under p10,
p12, and p14, respectively. Figure 7 illustrates this experiment
for the different positions. The result shows that adding both
nodes under p8 decreases the stabilization-time of the system. This
follows from Analysis 4: a balanced tree topology has the optimal
structure in terms of stabilization-time.
Fig. 6. The effect of adding the first node on stabilization-time in PIF tree.
Fig. 7. The effect of adding the second node on stabilization-time in PIF tree.
Note that forming an optimal tree topology is not always
viable in the real world. In the design process, we may face
physical constraints that limit the possible positions of the
nodes. In this case, our analysis helps the designer find
a near-optimal tree topology, which leads to a better MER.
VI. RELATED WORK
Self-stabilization was first introduced by Dijkstra in [1],
where he proposed three solutions for token ring. His idea was
to design local algorithms for distributed nodes in such a way
that the system can tolerate any transient fault that may change
the state of the system. Later, researchers looked into designing
self-stabilizing algorithms for well-known problems in
distributed computing, such as spanning tree of three coloring.
Self-stabilization properties, namely convergence to the set of
legitimate states and remaining there (in the absence of faults), are
essential. However, when these algorithms are used in
real-time applications, their quantitative metrics, such as
recovery time, become equally important. Different metrics for recovery
time, including worst-case and average-case recovery time, have
been introduced [7]. Self-stabilization has been proven to be impossible
for a number of problems, such as token ring in anonymous
rings. One of the solutions proposed for these cases is
probabilistic algorithms [6]. In these algorithms, a process has
more than one choice for action in its local algorithm. Later,
Kwiatkowska et al. studied the effect of choosing different
probabilities for non-deterministic actions on the worst-case
expected recovery-time [8]. Probabilistic formal analysis has
also been applied in other domains, such as
the probabilistic analysis of Air Traffic Control (ATC)
[21] and the Small Aircraft Transportation System (SATS) [22]. In
such applications, synchronicity is significant because the
concurrency of aircraft movements must be modelled. To model these systems,
we need, on the one hand, to implement synchronicity and, on the
other hand, to reflect real-world uncertainties
as probabilistic variables. Probabilistic analysis has
helped to analyze timing metrics in the presence of random
variables; in this case, the metric is the worst-case expected time [22].
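To make the notion of worst-case expected recovery-time concrete, the following Python sketch computes it for a small synchronous token ring in the style of Herman's algorithm [7], for a given coin bias p. It is a simplified illustration under stated assumptions, not the PRISM model analyzed in [8]: the encoding of the ring (a process holds a token when its bit equals that of its left neighbour, token holders draw a random bit, and all other processes copy their left neighbour), the function names, and the fixed number of value-iteration sweeps are assumptions made for this sketch.

```python
from itertools import product


def token_count(state):
    """Number of processes whose bit equals the bit of their left neighbour."""
    return sum(1 for i in range(len(state)) if state[i] == state[i - 1])


def step_distribution(state, p):
    """Return {next_state: probability} for one synchronous step.

    Assumed rule: a token holder sets its bit to 0 with probability p
    and to 1 with probability 1 - p; every other process copies the bit
    of its left neighbour.
    """
    choices = []
    for i in range(len(state)):
        left = state[i - 1]  # index -1 wraps around the ring
        if state[i] == left:
            choices.append([(0, p), (1, 1.0 - p)])
        else:
            choices.append([(left, 1.0)])
    dist = {}
    for combo in product(*choices):
        nxt = tuple(bit for bit, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


def worst_case_expected_steps(n, p, sweeps=500):
    """Maximum, over all initial states, of the expected number of steps
    needed to reach a state with exactly one token (value iteration)."""
    states = list(product((0, 1), repeat=n))
    legitimate = {s for s in states if token_count(s) == 1}
    expected = {s: 0.0 for s in states}
    for _ in range(sweeps):
        updated = {}
        for s in states:
            if s in legitimate:
                updated[s] = 0.0  # already stabilized
            else:
                updated[s] = 1.0 + sum(
                    q * expected[t] for t, q in step_distribution(s, p).items()
                )
        expected = updated
    return max(expected.values())


if __name__ == "__main__":
    for bias in (0.3, 0.5, 0.7):
        print(bias, worst_case_expected_steps(3, bias))
```

For a ring of three processes, the only illegitimate configurations under this encoding are those in which all bits are equal, so the worst-case expected time reduces to 1/(3p(1-p)), which equals 4/3 for a fair coin; the sketch reproduces this value.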
VII. CONCLUSION
Self-stabilization is a solution to guarantee system recovery
in case of any transient fault. In real-time applications, the
system must be able to give quick feedback, and hence
quantitative metrics, specifically stabilization-time, are
important factors. In this paper, we studied a self-stabilizing
mutual exclusion algorithm designed for safety-critical
applications. We argue that, by considering the probability of requests
to enter the critical section in different parts of the tree and
calculating the system's worst-case expected stabilization-time,
we can provide useful suggestions for network designers. We
have analyzed different tree topologies and
proposed a set of guidelines, comprising a set of observations and
four analytical results, that help designers improve the
stabilization-time of the system. The guidelines can be used in
practice by giving hints on how to organize the nodes in large-scale
tree structures so that the best MER is achieved. We have
also utilized an approximation method to improve the scalability
of our probabilistic analysis. As future work, we plan to
further improve the scalability of the analysis, with the goal of proposing
a parametric analysis of algorithms that is
independent of the number of nodes in the tree.
REFERENCES
[1] E. W. Dijkstra, “Self-stabilizing systems in spite of distributed control”,
Communications of the ACM, Vol. 17, Issue 11, pp. 643-644,
1974.
[2] D. Fajardo-Delgado, J. A. Fernández-Zepeda and A. G. Bourgeois,
"Randomized self-stabilizing leader election in preference-based
anonymous trees," 2010 IEEE International Symposium on Parallel &
Distributed Processing, Atlanta, GA, 2010, pp. 1-8.
[3] O. Jubran and O. Theel, "Recurrence in Self-Stabilization," 2015 IEEE
34th Symposium on Reliable Distributed Systems (SRDS), Montreal, QC,
2015, pp. 58-67.
[4] L. Blin, F. Boubekeur and S. Dubois, "A Self-Stabilizing Memory
Efficient Algorithm for the Minimum Diameter Spanning Tree under an
Omnipotent Daemon," 2015 IEEE International Parallel and Distributed
Processing Symposium, Hyderabad, 2015, pp. 1065-1074.
[5] A. Mansouri and M. S. Bouhlel, "An efficient self-stabilizing vertex
coloring algorithm," 2016 SAI Computing Conference (SAI), London,
2016, pp. 655-660.
[6] S. Devismes, S. Tixeuil and M. Yamashita, "Weak vs. Self vs.
Probabilistic Stabilization," 2008 The 28th International Conference on
Distributed Computing Systems, Beijing, 2008, pp. 681-688.
[7] T. Herman, “Probabilistic self-stabilization”, In Information Processing
Letters, Vol. 35, Issue 2, pp. 63-67, 1990.
[8] M. Kwiatkowska, G. Norman and D. Parker, “Probabilistic Verification of
Herman's Self-Stabilisation Algorithm”, Springer Journal of Formal
Aspects of Computing, Vol. 24, Issue 4, pp. 661-670, 2012.
[9] S. Dolev, Self-Stabilization, MIT Press, Cambridge, MA, 2000, pp. 5-56.
[10] R. W. Buskens and R. P. Bianchini, "Self-stabilizing mutual exclusion in
the presence of faulty nodes," Twenty-Fifth International Symposium on
Fault-Tolerant Computing, Pasadena, CA, USA, 1995, pp. 144-153.
[11] M. Mizuno, M. Nesterenko and H. Kakugawa, "Lock-based self-
stabilizing distributed mutual exclusion algorithms," Proceedings of 16th
International Conference on Distributed Computing Systems, 1996, pp.
708-716.
[12] O. Jubran and O. Theel, "Exploiting Synchronicity for Immediate
Feedback in Self-Stabilizing PIF Algorithms," 2014 IEEE 20th Pacific
Rim International Symposium on Dependable Computing, Singapore,
2014, pp. 106-115.
[13] G. Bian and A. Abate, “On the relationship between bisimulation and
trace equivalence in an approximate probabilistic context,” 20th
International Conference on Foundations of Software Science and
Computation Structures, Vol. 10203 of LNCS, 2017, pp. 321–337.
[14] M. Demirbas and A. Arora, "Specification-Based Design of Self-
Stabilization," in IEEE Transactions on Parallel and Distributed Systems,
vol. 27, no. 1, pp. 263-270, Jan. 1 2016.
[15] A. Abate, “Approximation Metrics Based on Probabilistic Bisimulations
for General State-Space Markov Processes: A Survey,” In Electronic
Notes in Theoretical Computer Science, Vol. 297, 2013, pp. 3-25.
[16] D. Bein, A. K. Datta, M. H. Karaata and S. Zaman, "An optimal snap-
stabilizing multi-wave algorithm," 25th IEEE International Conference on
Distributed Computing Systems Workshops, 2005, pp. 35-41.
[17] P. Kissmann and J. Hoffmann, “BDD ordering heuristics for classical
planning,” Journal of Artificial Intelligence Research, Vol. 51, pp. 779-804,
2014.
[18] M. Alidoost Nia, “An automated PRISM code generator for PIF self-
stabilizing algorithm”, 2017, https://github.com/alidoostnia/pif/.
[19] C. Baier and J. P. Katoen, Principles of model checking, MIT Press,
Cambridge, MA, 2008, pp. 19-89.
[20] N. Fallahi, B. Bonakdarpour and S. Tixeuil, "Rigorous Performance
Evaluation of Self-Stabilization Using Probabilistic Model Checking,"
IEEE 32nd International Symposium on Reliable Distributed Systems,
Braga, 2013, pp. 153-162.
[21] Y. Zhao and K. Y. Rozier, "Probabilistic model checking for comparative
analysis of automated air traffic control systems," 2014 IEEE/ACM
International Conference on Computer-Aided Design (ICCAD), San Jose,
CA, 2014, pp. 690-695.
[22] M. U. Sardar, N. Afaq, K. A. Hoque, T. T. Johnson, and O. Hasan,
“Probabilistic Formal Verification of the SATS Concept of Operation,”
8th NASA Formal Methods Symposium, MN, USA, 2016, pp. 191-205.