Content uploaded by Mehran Alidoost Nia

Author content

All content in this area was uploaded by Mehran Alidoost Nia on Jul 01, 2018

Content may be subject to copyright.


Probabilistic Analysis of Self-Stabilizing Systems:

A Case Study on a Mutual Exclusion Algorithm

Mehran Alidoost Nia and Fathiyeh Faghih

DRTS Research Lab, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran

{alidoostnia, f.faghih}@ut.ac.ir

Abstract— The heterogeneity of cyber-physical systems (CPS), the diverse situations they may face, and environmental hazards raise the need for self-stabilization. The uncertain nature of CPS necessitates a probabilistic view for analyzing the system stabilization-time, which is a highly critical metric in distributed/time-sensitive applications. Calculating the worst-case expected stabilization-time and its possible improvements helps achieve safer designs of CPS applications. In this paper, a mutual exclusion algorithm based on the PIF (Propagation of Information with Feedback) self-stabilizing algorithm in a synchronous environment is selected as a case study. Using probabilistic analysis, we present a set of guidelines for utilizing this algorithm in time-sensitive applications. We also utilize an approximation method to improve the scalability of our probabilistic analysis, and we perform a set of experiments to show how this analysis can be used in designing topologies with an optimal worst-case expected stabilization-time. Our results show that, using this approach, we can significantly improve the worst-case expected stabilization-time.

Keywords— cyber-physical systems, probabilistic formal

analysis, self-stabilizing algorithms, propagation of information

with feedback, real-time systems.

I. INTRODUCTION

Cyber-Physical Systems (CPS) cover various time-sensitive applications, ranging from medical systems to wireless sensor networks (WSN). Environmental hazards play a decisive role in the reliability and performance of such systems. In such an unstable environment, transient faults that temporarily disorder the system's global state must be considered to avoid system failure in safety-critical and time-sensitive applications.

Self-stabilization refers to a special type of fault tolerance in distributed systems, first introduced by Dijkstra [1]. In a self-stabilizing system, if the system leaves its set of legitimate states (LS) because of transient faults or bad initialization, it is guaranteed to recover to LS in a finite number of steps. Self-stabilization has been studied extensively, and several self-stabilizing algorithms have been proposed for problems arising in applications such as networking and robotics, among them leader election [2], mutual exclusion [3], spanning tree construction [4], and three-coloring [5].

Convergence is the key property of self-stabilizing systems. However, quantitative metrics such as recovery time (also called stabilization-time) can be just as critical when self-stabilization is used to design systems for real-time and safety-critical tasks. The worst-case stabilization-time of a self-stabilizing algorithm is defined as the largest number of steps it takes to recover to its set of legitimate states, starting from any arbitrary initial state.

Self-stabilization has been proved impossible for a number of problems, including token circulation and leader election in anonymous networks. To deal with these impossibility results, a number of variants of self-stabilization have been introduced, including weak-stabilization [6] and probabilistic-stabilization [6]. In probabilistic self-stabilization, the system recovers to its set of legitimate states with probability one. Herman [7] proposed a probabilistic token circulation algorithm for anonymous rings. A precise evaluation of the stabilization-time of this probabilistic algorithm is known to be very difficult. Kwiatkowska et al. [8] used probabilistic verification to evaluate its worst-case expected stabilization-time. Using the PRISM model checker, they performed experiments on rings of different sizes and used the analysis results to improve the worst-case expected stabilization-time of the algorithm.

Mutual exclusion is a well-known problem in distributed computing that has also been studied extensively in the self-stabilization community. Dijkstra was the first to propose three solutions for mutual exclusion in a ring in his seminal paper on self-stabilization. Since then, a number of self-stabilizing algorithms have been proposed for mutual exclusion in different topologies and settings [9-11]. Jubran and Theel [12] proposed an algorithm for mutual exclusion in a tree, with the goal of achieving a better response time in a synchronous setting where requests to enter the critical section are rare. For example, consider a set of sensors used to detect rare events such as fire or flood. Whenever a sensor detects something unusual, it should be able to acquire all resources in a mutually exclusive manner. It therefore requests to enter the critical section, and the request is expected to be granted within a short time. The authors carried out a theoretical analysis of the response time as well as the stabilization-time (in the case of transient faults). We argue that a better analysis of the stabilization-time is possible if the probability of a request at each node is taken into account. For the probabilistic analysis, we used an approach similar to the one introduced in [7] and computed the worst-case expected stabilization-time. Note that this algorithm was not originally designed as a probabilistic self-stabilizing protocol. However, considering the probabilistic nature of the nodes, we first translated it into a probabilistic algorithm and then used the PRISM tool to analyze the worst-case expected stabilization-time for different request probabilities in the nodes of different topologies. We used the analysis results to give a number of suggestions to designers who want to use this protocol in real-time applications.

State space explosion is a well-known problem in the model-checking community, and probabilistic model checking suffers from the same limitation. To deal with it, we use an approximation method [13] to reduce the state space before performing the probabilistic analysis. Our results show that this method can significantly improve the scalability of our analysis. We argue that this approach can also be used to improve the scalability of the analysis of probabilistic self-stabilizing algorithms, such as the one presented in [14].

The rest of the paper is structured as follows. In the next section, the technical background of the work is discussed. Section 3 briefly reviews the SSUPS self-stabilizing algorithm. Section 4 is dedicated to the analysis of basic tree structures and the resulting guidelines. In Section 5, we propose a method to deal with the complexity issue in large-scale systems. Section 6 reviews related work, and Section 7 gives the concluding remarks.

II. BACKGROUND

In this section, we give a brief overview of Discrete-Time Markov Chains (DTMCs), which form the basis for modeling and analyzing self-stabilizing systems in this work [15].

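Definition 1. A Discrete-Time Markov Chain (DTMC) over a set of atomic propositions $AP$ is a tuple $M = (S, S_0, \mathbf{P}, \mathcal{L})$, where $S$ is a (finite) set of states, $S_0 \subseteq S$ is a set of possible initial states, $\mathbf{P} : S \times S \to [0,1]$ is a probabilistic transition relation in which $\sum_{s' \in S} \mathbf{P}(s, s') = 1$ for all $s \in S$, and $\mathcal{L} : S \to 2^{AP}$ maps each state to a subset of atomic propositions.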

Properties to be verified over DTMCs are written in logics such as PCTL. A key element of these logics is the probabilistic operator $\mathrm{P}$. A formula $\mathrm{P}_{\bowtie p}[\psi]$, where $\bowtie \in \{<, \leq, \geq, >\}$, $p \in [0,1]$ is a probability bound, and $\psi$ is a path formula, asserts that the probability of $\psi$ being true satisfies the bound $\bowtie p$.

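More formally, the syntax of PCTL is defined as follows, where $\mathrm{U}$ denotes the Until operator, $\mathrm{X}$ denotes the Next operator, $a \in AP$, and $k \in \mathbb{N}$:

$$\phi ::= \mathit{true} \mid a \mid \neg \phi \mid \phi \wedge \phi \mid \mathrm{P}_{\bowtie p}[\psi] \qquad (1)$$

$$\psi ::= \mathrm{X}\,\phi \mid \phi \;\mathrm{U}^{\leq k}\; \phi \mid \phi \;\mathrm{U}\; \phi \qquad (2)$$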

A distributed system is defined over a set of variables $V$ and is composed of a set of processes $\Pi$. A state of the system is a valuation of all variables. The set of all possible states of the system is called the state space, denoted by $S$. In a self-stabilizing distributed system, the set of legitimate states $LS \subseteq S$ is the subset of the state space in which the system is in a valid configuration. A self-stabilizing system can be represented as a DTMC with the set of atomic propositions $AP = \{L\}$, in which $L$ labels the legitimate states.

Definition 2. A distributed system is self-stabilizing if and only if the following two conditions hold:

A. Convergence: Starting from any arbitrary state, the system eventually converges to its set of legitimate states LS with probability 1.

B. Closure: For each $s \in LS$, if a state $s'$ exists with $\mathbf{P}(s, s') > 0$, then $s' \in LS$. Note that $\mathbf{P}(s, s')$ denotes the probability of the transition from $s$ to $s'$.

In order to measure the expected stabilization time over a DTMC, a reward rate is assigned to each transition.

Definition 3. The expected reward of a path over a DTMC M is computed by (3), where the reward rate assigned to the transition taken in the i-th step is also called the step size of that step:

In a self-stabilizing system, we are interested in computing the maximum expected reward (MER), i.e., the worst-case expected stabilization-time. Considering the set of paths that start from any arbitrary initial state and end in LS, the MER is computed according to (4) as the maximum of the expected path rewards given by (3). In this paper, we use MER and worst-case expected stabilization-time interchangeably.

We map the notion of expected reward to expected time, since the reward rate is adjusted by the timing parameters of the system. We then derive the maximum expected time using the MER formulation above.

Fig. 1. A simple example of a self-stabilizing system modeled by DTMC.

As an example, consider the self-stabilizing system modeled as a DTMC in Figure 1, whose states and legitimate states (labeled by L) are shown in the figure. For simplicity, we set the reward rate of every transition to 1. Considering all paths that start outside the legitimate states and end in a state of LS, we must compare 13 values computed by (3). For instance, the expected value of path π1 = s0s3s5s8 is calculated by (3).

Applying (4), the MER equals 2.7, which corresponds to the paths π2 = s0s3s6s9 and π4 = s1s3s6s9.
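For readers who want to reproduce such numbers outside PRISM, the following is a minimal Python sketch, not the authors' tool, of the standard state-based computation a probabilistic model checker performs for an expected reachability reward: the expected accumulated reward until LS is first reached, maximized over non-legitimate start states. It assumes a unit reward per transition by default and assumes convergence holds (LS is reached with probability 1). Because this formulation averages over all paths leaving a state, its values need not coincide with the per-path values compared in (3) and (4); the 4-state chain is hypothetical and is not Figure 1.

```python
from typing import Dict, Hashable, Iterable

State = Hashable
# transitions[s] maps each successor s' to P(s, s'); every row must sum to 1
Transitions = Dict[State, Dict[State, float]]


def expected_steps_to_ls(
    transitions: Transitions,
    legit: Iterable[State],
    reward: float = 1.0,
    tol: float = 1e-12,
    max_iter: int = 1_000_000,
) -> Dict[State, float]:
    """Expected accumulated reward before the first visit to a legitimate state.

    Solves x(s) = 0 for s in LS and x(s) = reward + sum_s' P(s, s') * x(s')
    otherwise, using in-place (Gauss-Seidel) value iteration. Assumes LS is
    reached with probability 1 from every state; otherwise it cannot converge.
    """
    ls = set(legit)
    x = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for s, succ in transitions.items():
            if s in ls:
                continue  # legitimate states contribute no further reward
            new = reward + sum(p * x[t] for t, p in succ.items())
            delta = max(delta, abs(new - x[s]))
            x[s] = new
        if delta < tol:
            return x
    raise RuntimeError("value iteration did not converge")


def worst_case_expected_time(transitions: Transitions, legit: Iterable[State]) -> float:
    """Maximum expected time to stabilize over all non-legitimate start states."""
    ls = set(legit)
    x = expected_steps_to_ls(transitions, ls)
    return max((v for s, v in x.items() if s not in ls), default=0.0)


# Hypothetical 4-state chain (not Figure 1): state 3 is the only legitimate state.
chain = {0: {1: 1.0}, 1: {2: 0.5, 3: 0.5}, 2: {3: 1.0}, 3: {3: 1.0}}
print(worst_case_expected_time(chain, {3}))  # 2.5
```

Value iteration is used instead of a direct linear solve only to keep the sketch dependency-free; for large models a symbolic or sparse solver, as in PRISM, is the practical choice.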

III. PIF SELF-STABILIZING ALGORITHM

The Propagation of Information with Feedback (PIF) algorithm is usually designed for tree topologies. It starts by sending a broadcast message from the initiator (root), and all processes except the terminal ones (leaves) participate in the broadcast by forwarding the message to their descendants. Once the message reaches a leaf node, the leaf acknowledges the initiator by sending feedback to its ancestors. The broadcast finishes once the root has received feedback from all its children. Different self-stabilizing PIF algorithms have been proposed in the literature [16]. In this paper, we focus on one of the PIF algorithms proposed for synchronous networks, called SSUPS [12]. It injects synchronicity into PIF and follows Dijkstra's 4-state machine [1]. The main application of the algorithm is to assign mutually exclusive privileges to the processes in a tree for accessing critical sections.
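To make the wave structure concrete, here is a minimal Python sketch of one fault-free PIF wave on a rooted tree under synchronous rounds. It is a generic illustration, not SSUPS: the state names and the rule that a node reports feedback once all of its children have done so are assumptions of this sketch, and neither faults nor self-stabilization are modeled.

```python
from typing import Dict, List

IDLE, BCAST, FBACK = "idle", "broadcast", "feedback"


def pif_wave_rounds(children: Dict[str, List[str]], root: str) -> int:
    """Number of synchronous rounds until the root has collected all feedback."""
    parent = {c: p for p, cs in children.items() for c in cs}
    nodes = {root, *parent}
    state = {n: IDLE for n in nodes}
    state[root] = BCAST  # the initiator starts the wave
    rounds = 0
    while state[root] != FBACK:
        old = dict(state)  # synchronous step: every node reads last round's states
        for n in nodes:
            kids = children.get(n, [])
            if old[n] == IDLE and n in parent and old[parent[n]] == BCAST:
                state[n] = BCAST  # receive the broadcast from the parent
            elif old[n] == BCAST and all(old[c] == FBACK for c in kids):
                state[n] = FBACK  # leaves answer at once; others wait for all children
        rounds += 1
    return rounds


print(pif_wave_rounds({"r": ["a", "b"], "a": ["c"]}, "r"))  # 5
```

Under these assumptions a tree of height h needs 2h + 1 rounds: h rounds for the message to reach the deepest leaf, one round for that leaf to answer, and h rounds for the feedback to climb back to the root.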

A. Topology

The nodes are structured in a tree, where each node has a set of local variables <xi, upi, li, ai, pi>. xi and upi are the state variables (similar to the variables in Dijkstra's 4-state machine), li is a pointer to one of the process's neighbors (or to no process), ai indicates the activation status (whether the process requests a privilege), and pi determines the privilege received by the process. By default, the up variable is true for the root and false for the leaf nodes. Each node can read its own local variables as well as those of its neighbors, including its parent and its set of children, and can change its own variables according to the guarded commands in its local algorithm.
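The local variables above can be pictured as a small record per node. The following Python sketch is illustrative only: field names follow the text, encoding "no process" as -1 is an assumption that mirrors the [-1..n] ranges of the l variables in Algorithms 1-3, and the helper make_node is hypothetical. Fields other than up start from an arbitrary (here all-false) valuation, since a self-stabilizing system may start in any state.

```python
from dataclasses import dataclass

NO_PROCESS = -1  # assumed encoding of "l points to no process"


@dataclass
class NodeVars:
    x: bool   # state bit, as in Dijkstra's 4-state machine
    up: bool  # direction bit, as in Dijkstra's 4-state machine
    l: int    # id of the neighbor pointed to, or NO_PROCESS
    a: bool   # activation status: True if the node requests a privilege
    p: bool   # privilege currently held by the node


def make_node(role: str, up: bool = False) -> NodeVars:
    """Create a node with the default up value for its role (hypothetical helper).

    The text fixes up = True for the root and up = False for leaves; for an
    intermediate node up is a genuine state variable, so the caller supplies it.
    """
    if role == "root":
        up = True
    elif role == "leaf":
        up = False
    elif role != "intermediate":
        raise ValueError(f"unknown role: {role}")
    return NodeVars(x=False, up=up, l=NO_PROCESS, a=False, p=False)


print(make_node("root"))  # NodeVars(x=False, up=True, l=-1, a=False, p=False)
```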

B. Algorithm

The system uses four types of tokens, namely the search token, the positive/negative feedback token, the execute token, and the complete token, which are listed in (5)-(9), respectively. Note that ch denotes a child of node p, θp refers to the parent of node p, and Cp denotes the set of children of node p.


The algorithm is divided into two cycles. In the first cycle, the algorithm searches for active nodes using the first two tokens (Equations (5)-(7)). The second cycle consists of sending the execute token and receiving the completion signal at the root, which is accomplished by the other two tokens (Equations (8)-(9)). The root initiates the entire feedback loop and has two states: id (idle) and rq (request). Once it sends a request signal to its children, they propagate the same request to their descendants. Intermediate nodes have three states: id (idle), rq (request), and rp (reply). In the first cycle, they behave just like the root and issue requests to their children, but in the second cycle, when feedback is being sent, they behave like leaf nodes and issue replies to their ancestors. Leaves are non-root processes with a single neighbor.

The algorithm consists of three sub-algorithms associated with the root, the intermediate nodes, and the leaves, listed as Algorithms 1-3. We use a notation similar to the input language of PRISM. In the algorithms, indices 1, 2, and 3 stand for the root, intermediate, and leaf nodes, respectively. The primary condition for a process to receive an execute token is that it must be active and be referenced by a causality chain of other nodes from the root to itself. This chain is formed by the l variables.

The variable l is essential for detecting an execution path. It works like a pointer that references another node, and a sequence of such references constructs an execution (active) path. If there is no execution path in the tree, the root takes the execute token.
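The rule above (follow the chain of l pointers from the root; the node at its end receives the execute token if it is active, otherwise the root keeps it) can be sketched in Python as follows. This is a hypothetical helper, not code from [12]: it assumes -1 encodes "no process", that a valid chain only follows pointers to children, and it does not require intermediate nodes on the chain to be active, since the text does not say so.

```python
from typing import Dict, List

NO_PTR = -1  # assumed encoding of "l points to no process"


def execute_token_holder(
    root: int,
    children: Dict[int, List[int]],
    l: Dict[int, int],
    active: Dict[int, bool],
) -> int:
    """Node that receives the execute token, or the root if no execution path exists.

    Follows the l pointers downward from the root. The chain is valid only while
    each pointer targets a child of the current node; its last node must be active.
    """
    node = root
    while l[node] != NO_PTR:
        nxt = l[node]
        if nxt not in children.get(node, []):
            return root  # corrupted pointer (e.g. after a transient fault): no valid path
        node = nxt
    return node if active[node] else root


tree = {0: [1, 2], 1: [3], 2: [], 3: []}
l = {0: 1, 1: 3, 2: NO_PTR, 3: NO_PTR}
active = {0: False, 1: False, 2: True, 3: True}
print(execute_token_holder(0, tree, l, active))  # 3
```

Because pointers may only move downward in a finite tree, the loop always terminates, even when transient faults leave the l variables in an arbitrary configuration.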

Algorithm 1. Formal implementation of root process using PRISM style.

Algorithm 2. Formal implementation of intermediates using PRISM style.

Algorithm 3. Formal implementation of leaf process using PRISM style.

C. Probabilistic Modification of the Algorithm

To perform probabilistic analysis on the algorithm, we first needed to identify its nondeterministic parameter. Inspecting Algorithms 1-3, we find that the activation variable of the processes is nondeterministic and cannot be controlled by the algorithm. The variable a reflects whether a process is active. Since activation changes are independent of the algorithm, we can use this variable to transform the algorithm into a probabilistic one. Note that a process is active when it has a request to enter its critical section. Activation/deactivation of a process directly affects the existence of an execution path.

The original algorithm is modified by composing commands. These commands are composed as follows, where each ri is a probabilistic variable and “+” is the probabilistic choice operator of the PRISM syntax.

// Algorithm 1: root process
module process1
  x1 : bool;
  l1 : [-1..2];
  a1 : bool;
  p1 : bool;
  // guarded commands ...
endmodule

// Algorithm 2: intermediate process
module process2
  x2 : bool;
  up2 : bool;
  l2 : [-1..3];
  a2 : bool;
  p2 : bool;
  // guarded commands ..., including one guarded by up2 & !x2 & (...)
endmodule

// Algorithm 3: leaf process
module process3
  x3 : bool;
  l3 : [-1..3];
  a3 : bool;
  p3 : bool;
  // guarded commands ..., including one guarded by (x2 != x3) ...
endmodule

Original commands in Algorithm 1:

(!x1) & ... = true) ...

(!x1) & (x1 = x2 & !up2 ...

The corresponding probabilistic (composed) command:

(!x1) & (x1 = x2 & !up2) -> r0 : ... + (1-r0) : ...

Original commands in Algorithm 2: ...

The corresponding probabilistic (composed) command:

(x1 != x2) & !x2 -> r1 : ... + (1-r1) : ...

Original commands in Algorithm 3: ...

The corresponding probabilistic (composed) command:

... -> r2 : ... + (1-r2) : ...
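In PRISM, a command of the form [] guard -> p1 : u1 + p2 : u2; chooses update u1 with probability p1 and update u2 with probability p2. The Python sketch below mimics this composition outside PRISM, merging two commands that share a guard into one probabilistic command. It treats r as the deactivation probability, consistent with Section IV; the guard and updates in the usage example are hypothetical and are not the actual commands of Algorithms 1-3.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Update = Callable[[State], State]
Guard = Callable[[State], bool]


def compose(
    guard: Guard, deactivate: Update, activate: Update, r: float
) -> Callable[[State], List[Tuple[float, State]]]:
    """Merge two commands sharing `guard` into one probabilistic command.

    Mirrors the PRISM shape  [] guard -> r : deactivate + (1-r) : activate;
    where r is the deactivation probability of the process.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")

    def command(state: State) -> List[Tuple[float, State]]:
        if not guard(state):
            return []  # command not enabled: no outgoing probabilistic choice
        return [(r, deactivate(state)), (1.0 - r, activate(state))]

    return command


# Hypothetical root-like command: the guard and updates are illustrative only.
cmd = compose(
    guard=lambda s: not s["x1"],
    deactivate=lambda s: {**s, "a1": False},
    activate=lambda s: {**s, "a1": True},
    r=0.3,
)
for prob, succ in cmd({"x1": False, "a1": True}):
    print(round(prob, 2), succ)
```

Keeping both branches under one guard is what turns the former nondeterministic choice between the two commands into a probability distribution, so the model becomes a DTMC rather than an MDP.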

To make sure that the modified algorithm satisfies the self-stabilization conditions, we checked the closure and convergence properties, written in PCTL using the PRISM syntax, in (10) and (11). Briefly, the convergence property checks whether, starting from any arbitrary initial state, the system reaches a legitimate state with probability 1, and the closure property checks whether the system stays in LS once it has reached it. To write these properties, we used the formula stable, which encodes the legitimate-state conditions (LS conditions). The LS conditions are taken from [12] and presented in Algorithm 4. Note that the LS conditions depend on factors such as the existence of execution paths, and hence on the tree structure. The formulation in Algorithm 4 corresponds to a simple tree structure consisting of a root process, an intermediate node, and a leaf node.
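PRISM checks (10) and (11) symbolically. For small explicit models, the same two conditions of Definition 2 can also be checked directly on the transition graph; the following Python sketch assumes every state may be an initial state and uses the fact that, in a finite DTMC, LS is reached with probability 1 from every state exactly when every state has some path to LS. The helper names and the example chains are illustrative.

```python
from typing import Dict, Hashable, Iterable, Set

State = Hashable
Transitions = Dict[State, Dict[State, float]]


def check_closure(P: Transitions, legit: Iterable[State]) -> bool:
    """Closure: every positive-probability successor of a legitimate state is legitimate."""
    ls = set(legit)
    return all(t in ls for s in ls for t, p in P[s].items() if p > 0)


def check_convergence(P: Transitions, legit: Iterable[State]) -> bool:
    """Convergence with probability 1 from every state of a finite DTMC.

    In a finite DTMC, LS is reached with probability 1 from every state iff
    every state has some path to LS, so a backward graph search suffices.
    """
    preds: Dict[State, Set[State]] = {s: set() for s in P}
    for s, succ in P.items():
        for t, p in succ.items():
            if p > 0:
                preds[t].add(s)
    can_reach = set(legit)
    stack = list(can_reach)
    while stack:
        t = stack.pop()
        for s in preds[t]:
            if s not in can_reach:
                can_reach.add(s)
                stack.append(s)
    return can_reach == set(P)


P = {0: {1: 1.0}, 1: {1: 0.5, 2: 0.5}, 2: {2: 1.0}}
print(check_closure(P, {2}), check_convergence(P, {2}))  # True True
```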

To find the maximum expected stabilization-time of the algorithm in PRISM, each transition is assigned a reward, denoted by R, which for simplicity is set to 1 in this work. Using (12), PRISM finds the MER value of the system.

Algorithm 4. LS conditions for a simple tree of 3 nodes in a line, in PRISM

IV. PROBABILISTIC ANALYSIS & EXPERIMENTAL RESULTS

In this section, we discuss the results of our probabilistic analysis of the algorithm, investigating the effect of the activation probability of each processing node on the maximum expected stabilization-time. In the following, the index of each variable corresponds to the process index. As shown in Figure 2, we selected a set of basic topologies for probabilistic analysis; due to space limitations, we cannot include all tree topologies consisting of three or four processes.

To express the activation probability of the root, we use the probabilistic variable r0: in each analysis, the root is active with probability (1-r0) and deactivated with probability r0. Similarly, the deactivation probabilities of the other processes are denoted by r1, r2, …, rn. The effect of these probabilistic variables is shown in the set of probability-MER charts in Figure 3, where MER denotes the worst-case expected stabilization-time of the system.

From now on, we present our findings as a set of observations and analytical results. The former are taken from the experimental results on specific tree structures, while the latter summarize what we observed across all our experiments on the limited-size tree structures.

Fig. 2. Basic PIF tree structures used for probabilistic analysis.

Figure 3 depicts the effect of the activation probability of different nodes on the worst-case expected stabilization-time in the tree structures shown in Figure 2. For better presentation, we placed the probability on the y-axis and the MER on the x-axis.

Figure 3-(a) shows the relationship between r0 and the MER in T1 (Figure 2). As can be observed, the MER decreases as the deactivation probability of the root increases in this tree structure.

In Figure 3-(b), we show the worst-case expected stabilization-time for different values of r1 (the deactivation probability of node P1) in T1. As can be seen, the MER increases with r1. We also studied the MER for different values of r2 and r3 and, interestingly, observed that r2 and r3 have a negligible effect on the worst-case expected stabilization-time. Moreover, when we repeated this experiment for different values of r1, the effect of r2 and r3 on the MER decreased as r1 increased.

Figure 3-(c) depicts the relationship between r0 and the MER in T2. The chart is magnified to make the peaks around probabilities 0.45 and 0.5 visible. The best stabilization-time is achieved with deactivation probability 0.4.

In Figure 3-(d), we illustrate the relationship between the MER and the other probabilistic variables, r1, r2, r3, and r4, in T2. As we can see, the MER values corresponding to the deactivation probabilities of the middle branch (r2 and r4) are symmetric to those of the other branches (r1 and r3).

// Algorithm 4: LS conditions for a simple tree of 3 nodes in a line
formula f1 = (up2 & x2 & l2 = -1 & a3);
formula f2 = (!up2 & x2 & l1 = id2) | ((!up2 & x2 & l1 = id1) & (x3 & l2 = id3)) | ((!up2 & x2 & l2 = -1 & !a2) & (x3 & l3 = -1 & !a3));
formula f3 = (x1) & ((x1 & l1 = -1) | (x3 & up3 & l3 = -1)) & ((x2 & l1 = id2) | (x3 & l2 = id3 & x2));
formula f4 = (!x1) & ((a2 & !a3 & l1 = id2) | (a3 & l2 = id3 & l1 = id2));
formula stable = f1 | f2 | f3 | f4;

Fig. 3. Probability-MER (Worst-Case Expected Stabilization-Time) graph generated by different tree structures.

As an important observation, we found that increasing the deactivation probability of the nodes in longer paths (those containing intermediate nodes) increases the system MER. This trend can be seen for r2 and r4, corresponding to nodes P2 and P4 in T2.

T3 has a simple linear structure. Figure 3-(e) illustrates the relationship between r0 and the MER of the system: the MER has an inverse relationship with the deactivation probability of the root. Figure 3-(f) shows the relationship between the deactivation probability of the first intermediate node (r1) and the MER. We also observed that in linear paths, the deactivation probability of the first intermediate node (r1) has a higher impact on the MER than those of the other intermediates (e.g., r2). Overall, in this structure, the optimal MER is achieved when the deactivation probability of the intermediates is increased.

Figure 3-(g) shows the relationship between the MER and the deactivation probability of the root (r0) in T4, which has a balanced topology. As we can observe, the best MER is achieved at r0 = 0.5. Note that for r0 < 0.5 the MER was very large, so we omitted this range for the sake of presentation. Figure 3-(h) shows the relationship between the deactivation probabilities of the other nodes (r1, r2, r3, r4) and the MER in T4. It shows that the optimal MER is achieved when the activation probability of the intermediates is kept very small. In our experiment, we set r2 to 1 and decreased r1, which led to an increase in the MER of the system. Repeating the same experiment with r1 = 1 and varying r2 yields similar results. Throughout this experiment, the leaves have a constant deactivation probability; the effect of the leaves' probabilities (r3 and r4) is negligible.

Figure 3-(i) depicts the relationship between r0 and the MER in T5, showing that the optimal worst-case expected stabilization-time is achieved at r0 = 0.5. Figure 3-(j) illustrates the relationship between (r1, r2, r3, r4) and the MER in the same tree. We can observe that increasing the activation probability of the nodes in the longer branch of T5 (P2, P3, P4) leads to a better MER. Note that the left branch does not include any intermediate node and, as we saw earlier, it affects this experiment. Therefore, we keep the probability of this branch constant while varying the other probabilities (r2, r3, r4).

Studying the analysis results on this set of tree structures leads to the following analytical findings.

Analysis 1. The existence of at least one active intermediate node improves the stabilization-time of the system. This is illustrated by T1 and Figure 3-(b).

Analysis 2. In the design process, the nodes with the highest probability of being deactivated can be placed in longer paths. See structure T2 and the experiment shown in Figure 3-(d).

Analysis 3. If the tree has a linear structure, activation/deactivation of the first intermediate node has the most impact on the MER of the system. See T3 and the analysis result depicted in Figure 3-(f).

Analysis 4. If the tree has symmetric paths, the probability distribution among active branches that achieves the best MER is symmetric. This can be verified by looking at r1 and r2 in Figure 3-(h).

Table 1 shows the state space of each tree, which ranges from millions to over a billion states. We used the MTBDD engine [17], which enables us to analyze billions of states. To run the experiments, we developed an automatic code generator for different tree structures, which is available at [18]. The generated code can be used directly in the PRISM model checker, which helps automate the probabilistic analysis.
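As a rough illustration of what such a generator does, and not the tool available at [18], the following Python sketch emits a PRISM DTMC skeleton for an arbitrary tree: one undefined constant per deactivation probability (so it can be set from the command line or swept in an experiment) and one module per node with the declarations of Algorithms 1-3 chosen by the node's role. The node ids 0..n-1, the module naming, and the uniform [-1..n-1] range for the l variables are assumptions of this sketch; the guarded commands are left as placeholders.

```python
from typing import Dict, List


def prism_skeleton(children: Dict[int, List[int]], root: int = 0) -> str:
    """Emit a PRISM DTMC skeleton for a PIF tree (hypothetical generator sketch).

    Assumes node ids are 0..n-1; the root is node `root`.
    """
    parent = {c: p for p, cs in children.items() for c in cs}
    nodes = sorted({root, *parent})
    n = len(nodes)
    lines = ["dtmc", ""]
    # one deactivation probability per process, left undefined so a sweep can set it
    lines += [f"const double r{i};" for i in nodes]
    for i in nodes:
        if i == root:
            role = "root"
        elif children.get(i):
            role = "intermediate"
        else:
            role = "leaf"
        lines += ["", f"// {role}", f"module process{i}", f"  x{i} : bool;"]
        if role == "intermediate":
            lines.append(f"  up{i} : bool;")  # only intermediates declare up
        lines += [
            f"  l{i} : [-1..{n - 1}];",
            f"  a{i} : bool;",
            f"  p{i} : bool;",
            f"  // guarded commands for the {role} (Algorithms 1-3) go here",
            "endmodule",
        ]
    return "\n".join(lines)


print(prism_skeleton({0: [1], 1: [2]}))
```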

TABLE I. STATE SPACE AND COMPUTATION TIME OF THE PIF TREES.

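| Structure | # of States | # of Transitions | Avg. Comp. Time (s) |
|---|---|---|---|
| T1 | 5,120,000 | 17,280,000 | 1.51 |
| T2 | 509,607,936 | 2,436,562,944 | 6.13 |
| T3 | 10,240,000 | 29,120,000 | 1.295 |
| T4 | 1,019,215,872 | 4,013,162,496 | 20.692 |
| T5 | 1,019,215,872 | 4,299,816,960 | 17.478 |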

V. DISCUSSION

In this section, we first discuss the results of our analysis and then extend the set of analysis findings to large-scale systems.

A. Discussion and Comparison

So far, we have analyzed the SSUPS algorithm on different tree topologies of limited size. The results show the effect of the activation probability of each process on the MER. In this section, we discuss the significance of these results by showing how much the MER can be reduced if these findings are used in designing the trees that run this algorithm.

Figure 4 shows the best and the worst MER we measured for each tree structure in Figure 2. We achieved the best MER by utilizing the results of our probabilistic analysis, as discussed in the previous section. This chart shows how much a good design, based on probabilistic analysis, can reduce the MER compared to the MER resulting from a superficial design.

Fig. 4. Min and Max MER measured for each tree structure (see Figure 2)

TABLE II. MER IN DIFFERENT SCENARIOS RESULTED FROM PIF TREES.

| Structure | Min MER | Max MER | # of dist. processes |
|---|---|---|---|
| T1 | 9.49 | 37.38 | 4 |
| T2 | 9.37 | 23.75 | 5 |
| T3 | 7.47 | 18.5 | 4 |
| T4 | 10.28 | 16.3 | 5 |
| T5 | 9.54 | 14.99 | 5 |

Table 2 makes the difference between the minimum and maximum MER more apparent. One might expect the MER to increase with the number of processes. However, our results on T1 (with four processes) and T2 (with five processes) show that this is not always the case: T2 has both a lower minimum and a lower maximum MER than T1. Hence, in this algorithm, the number of processes is not the only parameter affecting the worst-case stabilization-time.
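A short calculation from the values in Table II quantifies the gap: the relative reduction of the best measured MER with respect to the worst one, (Max MER - Min MER) / Max MER, for each tree. The numbers are copied from the table; no new measurements are involved.

```python
# (min MER, max MER) for each tree, copied from Table II
table2 = {
    "T1": (9.49, 37.38),
    "T2": (9.37, 23.75),
    "T3": (7.47, 18.5),
    "T4": (10.28, 16.3),
    "T5": (9.54, 14.99),
}

# relative reduction achieved by the best design over the worst one
reductions = {tree: (hi - lo) / hi for tree, (lo, hi) in table2.items()}
for tree, red in reductions.items():
    print(f"{tree}: {red:.1%}")
```

This gives about 74.6% for T1, 60.5% for T2, 59.6% for T3, 36.9% for T4, and 36.4% for T5.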

We argue that the results of the probabilistic analysis of a self-stabilizing algorithm can be used in practice. In the case of this paper, the results can be applied easily given a good estimate of the activation probabilities of the different nodes. In general, if the probability of the valuation of a system variable is studied, one way to utilize the results of the probabilistic analysis is state encoding, in which the probability of a specific value of the variable is purposefully decreased or increased [20].

B. Extending the Analysis

The main limitation of our analysis is that it is restricted to small trees, due to the well-known model-checking problem of state space explosion [19]. To extend our results and increase the scalability of our analysis, we use an approximation technique called ε-approximate probabilistic bisimulation [13]. It reduces the state space of the DTMC using bisimulation under ε, an approximation level for the transition probability function. On the one hand, this approach reduces the state space of the system and facilitates the analysis. On the other hand, it preserves performance metrics of the DTMC, such as the MER, under bisimulation reduction. However, this reduction has a cost of its own: states whose transition probabilities differ slightly are treated as bisimilar. According to the ε-approximate rule, these differences in transition probability must not exceed ε, so the results are acceptable within the approximation level ε. In effect, we obtain a realistic estimate of the MER of the system at large scale.
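The ε rule can be illustrated with one common formulation of approximate probabilistic bisimulation: two states may share a block only if they carry the same label (legitimate or not) and, for every block, their probabilities of moving into that block differ by at most ε. The Python sketch below only checks a given candidate partition under this formulation; it does not compute the coarsest partition or the quotient DTMC, and the exact definition used in [13] may differ. The example chain is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set

State = Hashable
Transitions = Dict[State, Dict[State, float]]


def block_prob(dist: Dict[State, float], block: Set[State]) -> float:
    """Probability of jumping into `block` under one transition distribution."""
    return sum(p for t, p in dist.items() if t in block)


def eps_close(P: Transitions, s: State, t: State, partition: List[Set[State]], eps: float) -> bool:
    """True if s and t reach every block with probabilities that differ by at most eps."""
    return all(abs(block_prob(P[s], B) - block_prob(P[t], B)) <= eps for B in partition)


def is_eps_bisimulation(
    P: Transitions, partition: List[Set[State]], legit: Iterable[State], eps: float
) -> bool:
    """Check a candidate partition: labels agree within blocks and pairs are eps-close."""
    ls = set(legit)
    for B in partition:
        if len({s in ls for s in B}) > 1:
            return False  # a block must not mix legitimate and non-legitimate states
        for s, t in combinations(list(B), 2):
            if not eps_close(P, s, t, partition, eps):
                return False
    return True


P = {0: {2: 0.50, 3: 0.50}, 1: {2: 0.52, 3: 0.48}, 2: {3: 1.0}, 3: {3: 1.0}}
blocks = [{0, 1}, {2}, {3}]
print(is_eps_bisimulation(P, blocks, {3}, 0.05))  # True
print(is_eps_bisimulation(P, blocks, {3}, 0.01))  # False
```

Merging states 0 and 1 is accepted at ε = 0.05 because their transition probabilities differ by only 0.02, which is exactly the kind of small discrepancy the approximation trades for a smaller state space.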

Enhancing the scalability of our analysis makes it usable in practice. As an example, consider a scenario in which this algorithm is implemented in a WSN. If we want to add a new sensor to the network, we can use our probabilistic analysis to find a position for the new node that yields a small MER. To do so, we first model the large-scale system as a DTMC, approximate the model using ε-approximate probabilistic bisimulation, add the node to the tree, and measure the MER of the new system. We repeat this experiment for each possible position and choose the position that leads to the best MER.
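The placement procedure is essentially a search over candidate parents. The following Python sketch shows that loop under stated assumptions: the tree is a dictionary of child lists, and the MER evaluator is passed in as a function. In the workflow described above, that function would build the DTMC, apply the ε-approximation, and query PRISM; here a toy stand-in (tree height) is used purely so the example runs, and it is not an MER.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Tree = Dict[str, List[str]]  # node -> list of children


def add_leaf(tree: Tree, parent: str, new: str) -> Tree:
    """Copy of `tree` with `new` attached as a child of `parent`."""
    grown = {n: list(cs) for n, cs in tree.items()}
    grown.setdefault(parent, []).append(new)
    grown.setdefault(new, [])
    return grown


def best_position(
    tree: Tree, candidates: Iterable[str], new: str, mer: Callable[[Tree], float]
) -> Tuple[str, float]:
    """Try each candidate parent and keep the one whose extended tree has the smallest MER."""
    scored = [(mer(add_leaf(tree, c, new)), c) for c in candidates]
    if not scored:
        raise ValueError("no candidate positions given")
    best_mer, best_parent = min(scored)  # ties go to the lexicographically smallest name
    return best_parent, best_mer


def height(tree: Tree, node: str = "p0") -> int:
    """Toy stand-in for an MER evaluator: the height of the tree below `node`."""
    kids = tree.get(node, [])
    return 0 if not kids else 1 + max(height(tree, k) for k in kids)


tree = {"p0": ["p1", "p2"], "p1": ["p3"], "p2": [], "p3": []}
print(best_position(tree, ["p2", "p3"], "p4", height))  # ('p2', 2)
```

Adding two nodes, as in the second experiment below, can reuse the same loop by fixing the first placement and searching again for the second.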

Fig. 5. A binary tree topology for examination of PIF algorithm.

As an example, we use a binary tree topology with 15 nodes, as shown in Figure 5. After approximation, the model is verified with the PRISM probabilistic model checker. Depending on the activation/deactivation probabilities, we obtain the minimum and maximum MER values shown in Figures 6 and 7.

In our experiment, we first add a node under the left-most child of the tree, p8, and compute the difference between the MER of the current tree and that of the new tree with 16 nodes. We then repeat the experiment by placing the new node under p10, p12, and p14, respectively. According to the harmonic activation/deactivation probability assignment rule (Analysis 2), and given that at least one intermediate node is active (Analysis 1), we expect the best stabilization-time when the new node is placed under p8. Figure 6 confirms this in comparison with the other positions.

In another experiment, we add two nodes to the binary tree, aiming to find the topology that minimizes the worst-case expected stabilization-time. This experiment is more complicated than the previous one because of the number of choices that can affect the topology of the PIF tree: one can put both nodes under the same node of the tree or distribute them under two different nodes (e.g., under p8 and p10). According to Analysis 2 and Analysis 4, we expect to obtain the optimal topology by putting the nodes under the left-most child (p8). We therefore need to determine whether to add the second node under p8 as well. To do so, we add the second node under p8 and compute the difference of the MERs, and then repeat this computation by putting the second node under p10, p12, and p14, respectively. Figure 7 illustrates this experiment for the different positions. The result shows that adding both nodes under p8 decreases the stabilization-time of the system. This agrees with Analysis 4, according to which a balanced tree topology has the optimal structure in terms of stabilization-time.

Fig. 6. The effect of adding the first node on stabilization-time in PIF tree.

Fig. 7. The effect of adding the second node on stabilization-time in PIF tree.

Note that forming an optimal tree topology is not always viable in the real world. In the design process, we may face physical constraints that limit the positions of the nodes. In this case, our analysis helps the designer find a near-optimal tree topology that leads to a better MER.

VI. RELATED WORK

Self-stabilization was first introduced by Dijkstra in [1], where he proposed three solutions for the token ring problem. His idea was to design local algorithms for distributed nodes in such a way that the system can tolerate any transient fault that may change the state of the system. Later, researchers designed self-stabilizing algorithms for well-known problems in distributed computing, such as spanning tree construction and three-coloring. The defining properties of self-stabilization, namely convergence to the set of legitimate states and closure (staying there in the absence of faults), are essential. However, when these algorithms are used in real-time applications, quantitative metrics such as recovery time become equally important. Different metrics for recovery time, including worst-case and average-case recovery time, have been introduced [7]. Deterministic self-stabilization is proven to be impossible for a number of problems, such as token circulation in anonymous rings. One of the solutions proposed for these cases is probabilistic algorithms [6], in which a process has more than one possible action in its local algorithm and chooses among them randomly. Later, Kwiatkowska et al. studied the effect of choosing different probabilities for these non-deterministic actions on the worst-case expected recovery time [8].

Probabilistic formal analysis has also been applied in other domains, such as the analysis of Air Traffic Control (ATC) [21] and SATS (Small Aircraft Transportation System) [22]. In such applications, synchrony is significant for modelling concurrent aircraft movements. To model these systems, synchrony must be captured on the one hand, and real-world uncertainties must be reflected as probabilistic variables on the other. In this setting, probabilistic analysis has helped to evaluate timing metrics in the presence of random variables, namely the worst-case expected time [22].
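To make the worst-case and average-case expected recovery-time metrics concrete, the following is a minimal sketch for Herman's probabilistic token ring, the algorithm analysed in [7] and [8]. It assumes the standard formulation (an odd number N of processes in a synchronous ring, one bit each; process i holds a token iff its bit equals its left neighbour's bit; token holders draw a random bit, all others copy their left neighbour) and one simple, assumed way of biasing the coin (a token holder draws 1 with probability p); the exact parametrization in [8] may differ. It computes the expected number of steps to reach a single-token state from every initial state by value iteration, then reports the worst case (maximum over initial states) and, assuming a uniform initial distribution, the average case. As a hand check, for N = 3 and p = 1/2, a state with all bits equal reaches a single-token state in one step with probability 3/4, so the worst case is 4/3 steps.

```python
import itertools
from typing import Dict, List, Tuple

State = Tuple[int, ...]  # one bit per process; process i's left neighbour is i-1


def tokens(state: State) -> List[int]:
    """Process i holds a token iff its bit equals its left neighbour's bit."""
    return [i for i in range(len(state)) if state[i] == state[i - 1]]


def successors(state: State, p: float) -> Dict[State, float]:
    """Distribution over next states after one synchronous step.
    Token holders draw a fresh bit (1 with probability p, an assumed way of
    biasing the coin); all other processes copy their left neighbour's bit."""
    holders = tokens(state)
    dist: Dict[State, float] = {}
    for draws in itertools.product((0, 1), repeat=len(holders)):
        nxt = [state[i - 1] for i in range(len(state))]
        prob = 1.0
        for i, bit in zip(holders, draws):
            nxt[i] = bit
            prob *= p if bit == 1 else 1.0 - p
        key = tuple(nxt)
        dist[key] = dist.get(key, 0.0) + prob
    return dist


def expected_stabilization_times(n: int, p: float = 0.5,
                                 tol: float = 1e-12) -> Dict[State, float]:
    """Expected steps to reach a single-token (legitimate) state from every
    initial state, computed by in-place value iteration."""
    if n % 2 == 0 or n < 3:
        raise ValueError("Herman's ring needs an odd number (>= 3) of processes")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    states = list(itertools.product((0, 1), repeat=n))
    trans = {s: successors(s, p) for s in states if len(tokens(s)) != 1}
    times = {s: 0.0 for s in states}  # legitimate states stay at 0
    while True:
        change = 0.0
        for s, dist in trans.items():
            new = 1.0 + sum(q * times[t] for t, q in dist.items())
            change = max(change, abs(new - times[s]))
            times[s] = new
        if change < tol:
            return times


def worst_and_average(n: int, p: float = 0.5) -> Tuple[float, float]:
    """Worst case (max over initial states) and average case (uniform
    initial distribution, an assumption) expected stabilization time."""
    times = expected_stabilization_times(n, p)
    return max(times.values()), sum(times.values()) / len(times)


if __name__ == "__main__":
    for bias in (0.5, 0.3):
        print(bias, worst_and_average(3, bias), worst_and_average(5, bias))
```

For N = 3 the worst case is 1/(1 - p^3 - (1 - p)^3), i.e. 4/3 for p = 0.5 and about 1.59 for p = 0.3, so sweeping p in this way is a small-scale analogue of the question studied in [8]. Exhaustive iteration over all 2^N states only scales to small rings; larger instances call for a probabilistic model checker.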

VII. CONCLUSION

Self-stabilization is a solution to guarantee system recovery after any transient fault. In real-time applications, the system must be able to respond quickly; hence, quantitative metrics, in particular stabilization-time, are important factors. In this paper, we studied a self-stabilizing mutual exclusion algorithm designed for safety-critical applications. We argue that by considering the probability of requests to enter the critical section in different parts of the tree and calculating the system's worst-case expected stabilization-time, we can give useful suggestions to network designers. We analyzed different tree topologies and proposed a set of guidelines, including a set of observations and four analytical results, that help designers improve the stabilization-time of the system. The guidelines can be used in practice as hints on how to organize the nodes in large-scale tree structures so that the best MER is achieved. We also utilized an approximation method to improve the scalability of our probabilistic analysis. As future work, we plan to further improve the scalability of the analysis, with the goal of proposing a parametric analysis method that is independent of the number of nodes in the tree.

REFERENCES

[1] E. W. Dijkstra, "Self-stabilizing systems in spite of distributed control," Communications of the ACM, vol. 17, no. 11, pp. 643-644, 1974.

[2] D. Fajardo-Delgado, J. A. Fernández-Zepeda and A. G. Bourgeois, "Randomized self-stabilizing leader election in preference-based anonymous trees," 2010 IEEE International Symposium on Parallel & Distributed Processing, Atlanta, GA, 2010, pp. 1-8.

[3] O. Jubran and O. Theel, "Recurrence in Self-Stabilization," 2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS), Montreal, QC, 2015, pp. 58-67.

[4] L. Blin, F. Boubekeur and S. Dubois, "A Self-Stabilizing Memory Efficient Algorithm for the Minimum Diameter Spanning Tree under an Omnipotent Daemon," 2015 IEEE International Parallel and Distributed Processing Symposium, Hyderabad, 2015, pp. 1065-1074.

[5] A. Mansouri and M. S. Bouhlel, "An efficient self-stabilizing vertex coloring algorithm," 2016 SAI Computing Conference (SAI), London, 2016, pp. 655-660.

[6] S. Devismes, S. Tixeuil and M. Yamashita, "Weak vs. Self vs. Probabilistic Stabilization," 2008 The 28th International Conference on Distributed Computing Systems, Beijing, 2008, pp. 681-688.

[7] T. Herman, "Probabilistic self-stabilization," Information Processing Letters, vol. 35, no. 2, pp. 63-67, 1990.

[8] M. Kwiatkowska, G. Norman and D. Parker, "Probabilistic Verification of Herman's Self-Stabilisation Algorithm," Formal Aspects of Computing, vol. 24, no. 4, pp. 661-670, 2012.

[9] S. Dolev, Self-Stabilization, MIT Press, Cambridge, MA, 2000, pp. 5-56.

[10] R. W. Buskens and R. P. Bianchini, "Self-stabilizing mutual exclusion in the presence of faulty nodes," Twenty-Fifth International Symposium on Fault-Tolerant Computing, Pasadena, CA, USA, 1995, pp. 144-153.

[11] M. Mizuno, M. Nesterenko and H. Kakugawa, "Lock-based self-stabilizing distributed mutual exclusion algorithms," Proceedings of 16th International Conference on Distributed Computing Systems, 1996, pp. 708-716.

[12] O. Jubran and O. Theel, "Exploiting Synchronicity for Immediate Feedback in Self-Stabilizing PIF Algorithms," 2014 IEEE 20th Pacific Rim International Symposium on Dependable Computing, Singapore, 2014, pp. 106-115.

[13] G. Bian and A. Abate, "On the relationship between bisimulation and trace equivalence in an approximate probabilistic context," 20th International Conference on Foundations of Software Science and Computation Structures, vol. 10203 of LNCS, 2017, pp. 321-337.

[14] M. Demirbas and A. Arora, "Specification-Based Design of Self-Stabilization," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 1, pp. 263-270, Jan. 2016.

[15] A. Abate, "Approximation Metrics Based on Probabilistic Bisimulations for General State-Space Markov Processes: A Survey," Electronic Notes in Theoretical Computer Science, vol. 297, 2013, pp. 3-25.

[16] D. Bein, A. K. Datta, M. H. Karaata and S. Zaman, "An optimal snap-stabilizing multi-wave algorithm," 25th IEEE International Conference on Distributed Computing Systems Workshops, 2005, pp. 35-41.

[17] P. Kissmann and J. Hoffmann, "BDD ordering heuristics for classical planning," Journal of Artificial Intelligence Research, vol. 51, pp. 779-804, 2014.

[18] M. Alidoost Nia, "An automated PRISM code generator for PIF self-stabilizing algorithm," 2017, https://github.com/alidoostnia/pif/.

[19] C. Baier and J. P. Katoen, Principles of Model Checking, MIT Press, Cambridge, MA, 2008, pp. 19-89.

[20] N. Fallahi, B. Bonakdarpour and S. Tixeuil, "Rigorous Performance Evaluation of Self-Stabilization Using Probabilistic Model Checking," IEEE 32nd International Symposium on Reliable Distributed Systems, Braga, 2013, pp. 153-162.

[21] Y. Zhao and K. Y. Rozier, "Probabilistic model checking for comparative analysis of automated air traffic control systems," 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, 2014, pp. 690-695.

[22] M. U. Sardar, N. Afaq, K. A. Hoque, T. T. Johnson, and O. Hasan, "Probabilistic Formal Verification of the SATS Concept of Operation," 8th NASA Formal Methods Symposium, MN, USA, 2016, pp. 191-205.