Probabilistic Analysis of Self-Stabilizing Systems:
A Case Study on a Mutual Exclusion Algorithm
Mehran Alidoost Nia and Fathiyeh Faghih
DRTS Research Lab, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
{alidoostnia, f.faghih}@ut.ac.ir
Abstract: The heterogeneity of cyber-physical systems (CPS) and the diverse situations they may face, along with environmental hazards, raise the need for self-stabilization. The uncertain nature of CPS necessitates a probabilistic view for analyzing the system stabilization-time, which is a highly critical metric in distributed and time-sensitive applications. Calculating the worst-case expected stabilization-time and its possible improvements helps achieve safer designs of CPS applications. In this paper, a mutual exclusion algorithm based on the PIF (Propagation of Information with Feedback) self-stabilizing algorithm in a synchronous environment is selected as a case study. Using probabilistic analysis, we present a set of guidelines for utilizing this algorithm in time-sensitive applications. We have also utilized an approximation method for improving the scalability of our probabilistic analysis and conducted a set of experiments to show how this analysis can be used in the design of topologies with the goal of achieving an optimal worst-case expected stabilization-time. Our results show that using this approach, we can significantly improve the worst-case expected stabilization-time.
Keywords: cyber-physical systems, probabilistic formal analysis, self-stabilizing algorithms, propagation of information with feedback, real-time systems.
I. INTRODUCTION
Cyber-Physical Systems (CPS) cover various time-sensitive
applications, ranging from medical systems to wireless sensor
networks (WSN). Environmental hazards play a decisive role in
terms of the reliability and performance of the system. In such an unstable environment, transient faults that cause temporary disorder in the system's global state should be considered to avoid system failure in safety-critical and time-sensitive applications.
Self-stabilization refers to a special type of fault-tolerance in
distributed systems, first introduced by Dijkstra [1]. In a self-
stabilizing system, if the system goes outside its set of legitimate
states (LS) because of a set of transient faults or bad
initialization, it is guaranteed to recover back to its LS in a finite
number of steps. Self-stabilization has been studied extensively, and several self-stabilizing algorithms have been proposed for different problems used in applications such as networking and robotics. Among those, we can mention leader election [2], mutual exclusion [3], spanning tree [4], and three-coloring [5].
Convergence is the key property of self-stabilizing systems.
However, quantitative metrics such as recovery time (or namely
stabilization-time) can be as critical, when self-stabilization is
used to design a system for real-time and safety-critical tasks.
Worst-case stabilization-time of a self-stabilizing algorithm is
defined as the largest number of steps it takes to recover to its
set of legitimate states starting from any arbitrary initial state.
Self-stabilization is proved to be impossible for a number of
problems, including token circulation and leader election in
anonymous networks. In order to deal with the impossibility
results, a number of variants of self-stabilization have been
introduced, including weak-stabilization [6] and probabilistic-
stabilization [6]. In a probabilistic self-stabilization, the system
recovers to its set of legitimate states with a probability close to
one. Herman in [7] proposed a probabilistic algorithm for token
ring in an anonymous ring. A precise evaluation of the
stabilization-time for this probabilistic algorithm is known to be
a very difficult task. Kwiatkowska et al. in [8] used a
probabilistic verification method to evaluate the worst-case
expected stabilization-time. Using the PRISM model checker,
they did a number of experiments on the rings with different
sizes and used the analysis results for improving the worst-case
expected stabilization-time of the algorithm.
Mutual exclusion is a well-known problem in distributed
computing, which has been extensively studied in the self-
stabilization community as well. Dijkstra was the first to propose
three solutions for mutual exclusion in a ring in his seminal
paper on self-stabilization. After that a number of self-stabilizing
algorithms have been proposed for mutual exclusion in different
topologies and settings [9-11]. Jubran and Theel in [12]
proposed an algorithm for mutual exclusion in a tree with the
goal of having a better response time in a synchronous setting,
where the request to enter the critical section happens rarely. For
example, consider a set of sensors that are used to detect rare
events such as fire or flood. Whenever a sensor detects
something unusual, it should be able to get all resources in a
mutually exclusive manner. Therefore, it requests to enter the
critical section, and the expectation is to grant its request in a
small amount of time. The authors have done a theoretical
analysis on the response time as well as the stabilization-time (in
case of transient faults). We argue that we could do a better
analysis of the stabilization-time if we consider the probability
of a request in each node. In order to do the probabilistic analysis, we have used an approach similar to the one introduced in [7] and computed the worst-case expected stabilization-time. Note that
this algorithm has not been initially designed as a probabilistic
self-stabilizing protocol. However, considering the probabilistic
nature of nodes, we have first translated it to a probabilistic
algorithm, and then used the PRISM tool to analyze the worst-
case expected stabilization-time for different probabilities of
request in the nodes of different topologies. We have used the
analysis results to give a number of suggestions to the designers
who want to use this protocol in real-time applications.
State space explosion is a well-known problem in the model-
checking community. Probabilistic model checking suffers from
the same limitation as well. In order to deal with this limitation,
we use an approximation method [13] to reduce the state space
and then do the probabilistic analysis. Our results show that this
method can extensively improve the scalability of our analysis.
We argue that this approach can be used in the analysis of probabilistic self-stabilizing algorithms, such as the one presented in [14], to improve their scalability.
The rest of the paper is structured as follows. In the next
section, technical background of the work is discussed. In
Section 3, we have a brief review on the SSUPS self-stabilizing
algorithm. Section 4 is dedicated to the analysis of basic tree
structures and the resulting guidelines. In Section 5, we propose
a method to deal with the complexity issue in large-scale
systems. In Section 6, we review related work. Finally, in Section 7, we give the concluding remarks.
II. BACKGROUND
In this section, we give a brief overview of Discrete-Time Markov Chains (DTMC), which are the basis for modeling and analyzing self-stabilizing systems in this research work [15].
Definition 1. A Discrete-Time Markov Chain (DTMC) over a set of atomic propositions $AP$ is defined as a tuple $M = (S, S_0, P, L)$, where $S$ is a (finite) set of states, $S_0 \subseteq S$ is a set of possible initial states, $P : S \times S \to [0, 1]$ is a probabilistic transition relation, in which $\sum_{s' \in S} P(s, s') = 1$ for all $s \in S$, and $L : S \to 2^{AP}$ maps each state to a subset of atomic propositions.
Properties to be verified over DTMCs are written in logics such as PCTL. A key element of these logics is the probabilistic operator $P$. A formula $P_{\sim p}[\psi]$, where $\sim \in \{<, \leq, \geq, >\}$, $p$ is a probability ($p \in [0, 1]$), and $\psi$ is a path formula, asserts that the probability of $\psi$ being true satisfies the bound $\sim p$. More formally, the syntax of PCTL is defined as follows, where $U$ denotes the Until operator, $X$ indicates the Next operator, $\sim \in \{<, \leq, \geq, >\}$, and $a \in AP$:

$\Phi ::= true \mid a \mid \Phi \wedge \Phi \mid \neg\Phi \mid P_{\sim p}[\psi]$   (1)

$\psi ::= X\ \Phi \mid \Phi\ U\ \Phi$   (2)
A distributed system is defined over a set of variables $V$, and is composed of a set of processes $\Pi$. A state of the system is a valuation of all variables. The set of all possible states of the system is called the state space, represented by $S$. In a self-stabilizing distributed system, the set of legitimate states is a subset of the state space, in which the system is in a valid configuration ($LS \subseteq S$). A self-stabilizing system can be represented as a DTMC, where the set of atomic propositions is $AP = \{L\}$, in which $L$ stands for legitimate states.
Definition 4. A distributed system is self-stabilizing if and
only if the following two conditions hold:
A. Convergence: Starting from any arbitrary state, the system
will eventually converge to its set of legitimate states LS, with
probability 1.
B. Closure: For each $s \in LS$, if a state $s'$ exists with $P(s, s') > 0$, then it follows that $s' \in LS$. Note that $P(s, s')$ denotes the probability of the transition from $s$ to $s'$.
In order to measure the expected stabilization time over a
DTMC, a reward rate is given to each transition.
Definition 5. The expected reward of a path $\pi = s_0 s_1 \ldots s_n$ over a DTMC $M$ is denoted by $E(\pi)$, and computed by (3), where $\rho(s_i, s_{i+1})$ represents the reward rate assigned to the transition $(s_i, s_{i+1})$, which is also called the step size of the $i$-th step:

$E(\pi) = \sum_{i=0}^{n-1} P(s_i, s_{i+1}) \cdot \rho(s_i, s_{i+1})$   (3)
In a self-stabilizing system, we are interested in computing the maximum expected reward (MER), or worst-case expected stabilization-time. Considering $Paths$ to be the set of paths starting from any arbitrary initial state and ending in $LS$, the MER is computed according to (4). In this paper, we use MER and worst-case expected stabilization-time interchangeably.

$MER = \max_{\pi \in Paths} E(\pi)$   (4)

We map the notion of expected reward to expected time, as the reward rate is adjusted by the timing parameters of the system. We then derive the maximum expected time using the MER formula.
Fig. 1. A simple example of a self-stabilizing system modeled by DTMC.
As an example, consider the self-stabilizing system modeled as a DTMC in Figure 1. In this example, the set of states and the set of legitimate states $LS$ (the states labeled by $L$) are shown in the figure. For the sake of simplicity, we consider the reward rate of every transition to be equal to 1, namely $\rho = 1$. Considering all paths starting from outside the legitimate states to a state in $LS$, we must compare 13 values, computed by (3). For instance, the expected value of the path $\pi_1 = s_0 s_3 s_5 s_8$ is computed by (3) over its three transitions. Applying (4), the MER equals 2.7, which corresponds to the paths $\pi_2 = s_0 s_3 s_6 s_9$ and $\pi_4 = s_1 s_3 s_6 s_9$.
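To make this concrete in the PRISM input language, the following minimal sketch (our own two-state toy chain, not the system of Figure 1) shows how a DTMC, a unit reward per step, and the corresponding reward query are written:

dtmc

module toy
  // s = 0 : illegitimate, s = 1 : legitimate (hypothetical two-state chain)
  s : [0..1] init 0;
  [step] s = 0 -> 0.9 : (s' = 1) + 0.1 : (s' = 0);
  [step] s = 1 -> (s' = 1);
endmodule

rewards "time"
  [step] true : 1;   // one time unit per synchronous step
endrewards

// queried in the properties file:
//   R{"time"}=? [ F s = 1 ]

For this toy chain, the query returns the expected number of steps until the legitimate state s = 1 is reached from the initial state.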
III. PIF SELF-STABILIZING ALGORITHM
Propagation of Information with Feedback (PIF) algorithm is
usually designed for the tree topologies. It starts by sending a
broadcast message from the initiator (root), and all processes
except for the terminal ones (leaves) participate in the broadcast
by sending the message to their descendants. Once the message reaches a leaf node, it acknowledges the initiator by sending feedback to its ancestors. The broadcast finishes once the root
gets feedback from all its children. There are different self-
stabilizing PIF algorithms proposed in the literature [16]. In this
paper, we focus on one of the PIF algorithms proposed for a
synchronous network, called SSUPS [12]. It injects synchronicity into PIF and follows Dijkstra's 4-state machine [1].
The main application of the algorithm is to assign mutually
exclusive privileges to the processes in a tree to access critical
sections.
A. Topology
The nodes are structured in a tree, where each node has a set
of local variables <xi, upi, li, ai, pi>. xi and upi are the state
variables (similar to the variables in Dijkstra's 4-state machine),
li is a pointer to one of the process's neighbors (or to no process), ai indicates the activation status (whether the process requests a privilege or not), and pi determines the privilege received by the process. By default, the up variable is true for the root and false for the leaf
nodes. Each node can read its own, as well as its neighbors' local
variables, including its parent and set of children, and can
change its own variables according to the guarded commands in
its local algorithm.
B. Algorithm
The system uses four types of tokens including search token,
positive/negative feedback token, execute token, and complete
token that are respectively listed in (5)-(9). Note that ch indicates
a child of the node p, θp refers to the parent of node p, and Cp
indicates the set of children of node p. Equations (5)-(9) define the search, positive and negative feedback, execute, and complete token predicates over the variables x, up, l, a, and p; their exact formulations are given in [12].
The algorithm is divided into two different cycles. In the first
cycle, the algorithm searches for active nodes using the first two
tokens (Equations (5)-(7)). The second cycle includes sending
the execution token and receiving the completion signal by the
root which is accomplished by the other two tokens (Equations
(8)-(9)). The root initializes the entire feedback loop and it has
two states including id (idle) and rq (request). Once it sends a
request signal to its children, they should propagate the same
request to their descendants. Intermediate nodes have three
states including id (idle), rq (request), and rp (reply). In the first
cycle, they behave just like the root and issue requests to their
children. But in the second cycle, when feedback is being sent, they behave like a leaf node and issue replies to their ancestors.
Leaves are non-root processes with a single neighbor.
The algorithm consists of three sub-algorithms associated
with the root, intermediates and leaves that are listed as
Algorithms 1-3. We have used notation similar to the input language of PRISM. In the algorithms, indices 1, 2, and 3 stand
for root, intermediate, and leaf nodes respectively. The primary
condition to receive an execute token by a process is that it must
be active and referred by a causality chain of other nodes from
the root to itself. The chain is formed by the l variables.
The variable l is essential for detecting an execution path. It works like a pointer that makes a reference to other nodes. A set of such references constructs an execution (active) path. If there is no execution path in the tree, the root takes the execute token.
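As a small illustration (a hypothetical valuation of ours, not a formula from the paper), in a three-node line where the root is process 1, the intermediate is process 2, and the leaf is process 3, such a chain of l references ending at an active leaf can be expressed as a PRISM formula:

// l1 = 2: the root refers to process 2; l2 = 3: process 2 refers to process 3;
// a3: the leaf is active, so the references form an execution path ending at process 3
formula exec_path_to_3 = (l1 = 2) & (l2 = 3) & a3;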
Algorithm 1. Formal implementation of root process using PRISM style.
Algorithm 2. Formal implementation of intermediates using PRISM style.
Algorithm 3. Formal implementation of leaf process using PRISM style.
C. Probabilistic Modification of the Algorithm
In order to do a probabilistic analysis of the algorithm, we first needed to find its nondeterministic parameter. Investigating Algorithms 1-3, we can see that the activation variable of the processes is a nondeterministic variable, which cannot be controlled by the algorithm. The variable "a" reflects whether a process is active or not. We can use it to transform the algorithm into a probabilistic one, since activation changes are independent of the algorithm. Note that a process is active when it has a request to enter its critical section. Activation/deactivation of a process directly affects the existence of an execution path.
The original algorithm is modified using composition of commands. These commands are composed as follows, where each ri is a probabilistic variable and "+" is the choice operator in the PRISM syntax.
module process1   // root process (Algorithm 1)
  x1 : bool;
  l1 : [-1..2];
  a1 : bool;
  p1 : bool;
  ...   // guarded commands of the root process
endmodule

module process2   // intermediate process (Algorithm 2)
  x2 : bool;
  up2 : bool;
  l2 : [-1..3];
  a2 : bool;
  p2 : bool;
  ...   // guarded commands of the intermediate process
endmodule

module process3   // leaf process (Algorithm 3)
  x3 : bool;
  l3 : [-1..3];
  a3 : bool;
  p3 : bool;
  ...   // guarded commands of the leaf process
endmodule
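For readers unfamiliar with the notation, the following schematic model (a simplified sketch of ours, not one of the authors' exact commands) illustrates the PRISM style used in Algorithms 1-3: a guard over the variables the process can read on the left of the arrow, and an update of the process's own variables on the right.

dtmc

// schematic illustration of the guarded-command style (ours, not the authors' command)
module leaf_sketch
  x2 : bool init true;   // stands in for the parent's state variable
  x3 : bool init false;  // the leaf's own state variable
  a3 : bool init true;   // activation flag of the leaf
  p3 : bool init false;  // privilege flag of the leaf
  // when its state variable differs from the parent's copy and it is active,
  // the leaf copies the parent's value and takes the privilege
  [] (x2 != x3) & a3 -> (x3' = x2) & (p3' = true);
  [] !((x2 != x3) & a3) -> true;   // idle otherwise
endmodule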
Original commands in Algorithm 1 (root), which share the guard (!x1) & (x1 = x2 & !up2), are composed into a single probabilistic command of the form:
(!x1) & (x1 = x2 & !up2) -> r0 : (...) + (1-r0) : (...);
The original commands in Algorithm 2 (intermediate), with guard (x1 != x2) & !x2, are composed analogously:
(x1 != x2) & !x2 -> r1 : (...) + (1-r1) : (...);
and the original commands in Algorithm 3 (leaf) are composed into a command of the form:
(...) -> r2 : (...) + (1-r2) : (...);
where, in each composed command, the two probabilistic branches differ in the value assigned to the activation variable a.
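The composition pattern itself can be sketched as follows (a minimal model of ours with a single activation variable and a hypothetical probability value; the authors' actual commands also update the x, l, and p variables):

dtmc

const double r = 0.3;   // hypothetical deactivation probability

module node
  a : bool init false;   // activation variable of the process
  // the two original deterministic commands would be
  //   [tick] true -> (a' = true);
  //   [tick] true -> (a' = false);
  // and are composed into one probabilistic command:
  [tick] true -> (1-r) : (a' = true) + r : (a' = false);
endmodule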
To make sure that the modified representation of the
algorithm satisfies the self-stabilization conditions, we have
checked the closure and convergence properties written in PCTL
logic in the syntax of PRISM in (10) and (11). Briefly speaking,
the convergence property checks whether starting from any
arbitrary initial state, the system reaches a legitimate state with
probability 1, and the closure property checks whether the
system stays in LS, after it reaches there. In order to write these
properties, we have used the formula stable, which formulates
the legitimate states conditions (LS conditions). The LS
conditions are taken from [12], and presented in Algorithm 4.
Note that the LS conditions depend on factors such as existence
of execution paths, and hence, they depend on the tree structure.
The formulation in Algorithm 4 corresponds to a simple tree
structure including a root process, an intermediate and a leaf
node.
In order to find the maximum expected stabilization-time of
the algorithm in PRISM, each transition is assigned a reward
denoted by R, which for simplicity is considered to be 1 in this
work. Using (12), PRISM finds the MER value of the system, i.e., the maximum over all states of the expected cumulated reward until a stable state is reached.
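A sketch of such a reward query in PRISM (again our own formulation; the exact form of (12) in the paper may differ) assigns a reward of 1 to every transition and asks for the worst case over all states:

// in the model file (assuming the commands are unlabelled):
rewards "steps"
  [] true : 1;   // reward 1 on every transition
endrewards

// in the properties file:
//   filter(max, R{"steps"}=? [ F "stable" ])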
Algorithm 4. LS conditions for a simple tree of 3 nodes in a line, in PRISM
IV. PROBABILISTIC ANALYSIS & EXPERIMENTAL RESULTS
In this section, we discuss our results on probabilistic analysis
of the algorithm, where we investigate the effect of the
activation probability of each processing node on the maximum
expected stabilization-time. In the following, the index of each
variable corresponds to the process index. As shown in Figure
2, we have selected a set of basic topologies for probabilistic
analysis. For the sake of space, we cannot include all tree
topologies consisting of three or four processes.
To model the activation behavior of the root, we have used the probabilistic variable r0: in each analysis, the root is considered to be active with probability (1-r0), and deactivated with probability r0. Similarly, the deactivation probabilities of the other processes are denoted by r1, r2, ..., rn. The effect of these probabilistic variables is shown in a set of probability-MER charts in Figure 3, where the MER denotes the worst-case expected stabilization-time of the system.
In the following, we present our findings in the form of a set of observations and analytical results. The former refers to observations taken from the experimental results on a specific tree structure, while the latter includes what we have observed across the results of all our experiments on the limited-size tree structures.
Fig. 2. Basic PIF tree structures used for probabilistic analysis.
Figure 3 depicts the effect of the activation probability of
different nodes on the worst-case expected stabilization-time in
the tree structures shown in Figure 2. For the sake of better presentation of the results, we have placed the probability on the y-axis and the MER on the x-axis.
Figure 3-(a) shows the relationship between r0 and the MER
in T1 (Figure 2). As can be easily observed, the MER decreases,
as the deactivation probability of the root increases in this tree
structure.
In Figure 3-(b), we show our results on studying the worst-
case expected stabilization-time with different values of r1
(deactivation probability of node P1) in T1. As can be seen, the MER has a direct relationship with r1. We have also studied the
MER with different probabilities of r2 and r3. Interestingly, we
have observed that r2 and r3 have negligible effect on the worst-
case expected stabilization-time. The other interesting result is
that as we repeated this experiment for different values of r1, the
effect of r2 and r3 on the MER decreased as r1 increased.
Figure 3-(c) depicts the relationship between r0 and the
MER in T2. We have magnified the chart, so we can easily
observe some peaks around probabilities 0.45 and 0.5. The best
stabilization-time is achieved with deactivation probability 0.4.
In Figure 3-(d), we illustrate the relationship between MER
and other probabilistic variables including r1, r2, r3 and r4 in
T2. As we can see, the MER values corresponding to the deactivation probabilities of the middle branch (r2 and r4) have a symmetric relationship compared to the values in the other branches (r1 and r3).
formula f1 = (up2 & x2 & l2 = -1 & a3);
formula f2 = (!up2 & x2 & l1 = id2) | ((!up2 & x2 & l1 = id1) & (x3 & l2 = id3)) | ((!up2 & x2 & l2 = -1 & !a2) & (x3 & l3 = -1 & !a3));
formula f3 = (x1) & ((x1 & l1 = -1) | (x3 & up3 & l3 = -1)) & ((x2 & l1 = id2) | (x3 & l2 = id3 & x2));
formula f4 = (!x1) & ((a2 & !a3 & l1 = id2) | (a3 & l2 = id3 & l1 = id2));
formula stable = f1 | f2 | f3 | f4;
Fig. 3. Probability-MER (Worst-Case Expected Stabilization-Time) graph generated by different tree structures.
As an important observation, we found that increasing the
deactivation probability of the nodes in longer paths (having
intermediate nodes) increases the system MER. We can see this
trend in r2 and r4 corresponding to nodes P2 and P4 in T2.
T3 has a linear and simple structure. Figure 3-(e) illustrates
the relationship between r0 and the MER of the system. As an
observation, we can see that the MER has an inverse relationship with the deactivation probability of the root. Also, Figure 3-(f) indicates the relationship between the deactivation probability of the first intermediate (r1) and the MER. We have also observed that in linear paths, the deactivation probability of the first intermediate (r1) has a higher impact on the MER compared to the other intermediates (e.g., r2). Overall, in this structure, the optimal MER is achieved when the deactivation probability of the intermediates is increased.
Figure 3-(g) shows the relationship between the MER and
the deactivation probability of the root (r0) in T4, which has a
balanced topology. As we can observe, the best MER is achieved
by the deactivation probability r0=0.5. Note that for the values r0<0.5, the MER was very large, and hence, we removed this range for the sake of presentation. Figure 3-(h) shows the relationship between the deactivation probability of the other nodes (r1, r2, r3, r4) and the MER in T4. It shows that the optimal MER is achieved when the activation probability of the intermediate nodes is kept very small. In our experiment, we set r2 to 1 and decreased the value of r1. As we can observe, this led to an increase in the MER of the system. If we repeat the same experiment by keeping r1=1
and change r2, we will get similar results. We did this
experiment in a manner that the leaves have a constant
probability of deactivation. The effect of leaves’ probability (r3
and r4) is negligible.
Figure 3-(i) depicts the relationship between r0 and the MER
in T5, which shows that optimal worst-case expected
stabilization-time is achieved when the probability of r0 is equal
to 0.5. Figure 3-(j) illustrates the relationship between (r1, r2,
r3, r4) and the MER in the same tree. We can observe that the
increase in activation probability of nodes in longer branch of
T5 (P2, P3, P4) leads to better MER. Note that the left branch does
not include any intermediate, and it effects on this experiment as
we saw earlier. So, we keep the probability of this branch
constant when experimenting other probabilities (r2, r3, r4).
Studying the analysis results on a set of tree structure leads
to the following analytical facts.
Analysis 1. Existence of at least one active intermediate node
improves the stabilization-time of the system. We can highlight
this achievement through T1 and Figure 3-(b).
Analysis 2. In the design process, we can place the nodes with
highest probability of being deactivated in longer paths. We may
refer to the structure T2 and experiment shown in Figure 3-(d).
Analysis 3. Activation/deactivation of the first intermediate
node has the most impact on the MER of the system, if the tree
has linear structure. As an example, we can refer to T3 and the
analysis result depicted in Figure 3-(f).
Analysis 4. If the tree has symmetric paths, the optimal
probability distribution among active branches will be
symmetric to achieve the best MER. We can verify this fact by
looking at r1 and r2 in Figure 3-(h).
Table 1 shows the state space of each tree, which ranges from thousands to billions of states. We have used the MTBDD engine [17], which enables us to analyze billions of states. To do the experiments, we have developed an automatic code generator for different tree structures, which is available at [18]. The generated code can be used directly in the PRISM model checker, which helps automate the process of probabilistic analysis.
TABLE I. STATE-SPACE SIZES OF THE PIF TREES.

Structure    # of States      # of Transitions
T1           5120000          17280000
T2           509607936        2436562944
T3           10240000         29120000
T4           1019215872       4013162496
T5           1019215872       4299816960
V. DISCUSSION
In this section, we first discuss the results of our analysis. Then, we extend the analytical findings to large-scale systems.
A. Discussion and Comparison
So far, we have analyzed the SSUPS algorithm under
different tree topologies with limited size. The results show the
effect of activation probability of each process on the MER. In
this section, we discuss the significance of our analysis results
by demonstrating the amount of decrease we can achieve on the
MER if we use these findings in the design of the trees that use
this algorithm.
Figure 4 shows the best and the worst MER we could
measure for each tree structure in Figure 2. We achieved the best
MER by utilizing the results of our probabilistic analysis, as
discussed in the previous section. This chart shows how much a
good design, using probabilistic analysis, can reduce the MER,
compared to the MER resulting from a superficial design.
Fig. 4. Min and Max MER measured for each tree structure (see Figure 2)
TABLE II. MER IN DIFFERENT SCENARIOS RESULTED FROM PIF TREES.

Structure    Min MER    Max MER    # of dist. processes
T1           9.49       37.38      4
T2           9.37       23.75      5
T3           7.47       18.5       4
T4           10.28      16.3       5
T5           9.54       14.99      5
By looking at Table 2, the difference between the minimum and maximum MER is more perceptible. It may be expected that the MER should increase with the number of processes. However, our results on T1 (with 4 processes) and T2 (with 5 processes) show that this is not always the case: in this algorithm, the number of processes is not the only parameter affecting the worst-case stabilization-time.
We argue that the result of probabilistic analysis of a self-
stabilizing algorithm can be used in practice. In the case of this
paper, the result can be easily applied by having a good guess
about the activation probabilities of different nodes. In general,
if the probability of the valuation of a system variable is studied,
one way to utilize the result of the probabilistic analysis is to use
state encoding. In this approach, the probability of a specific
value of the variable is decreased/increased purposefully [20].
B. Extending the Analysis
The main limitation of our analysis results is that they are limited to small-sized trees, due to the well-known problem in model checking called state space explosion [19]. To extend our results and increase the scalability of our analysis, we use an approximation technique called ε-approximate probabilistic bisimulation [13]. It reduces the state space of the DTMC using a bisimulation quotient computed under ε, an approximation level on the transition probability function. On one hand, this approach helps reduce the state space of the system and facilitates the analysis procedure. On the other hand, it preserves performance metrics of the DTMC, such as the MER, under the bisimulation reduction. However, this reduction has a cost: as a consequence of this approach, we merge approximately bisimilar states that have some differences in their transition probabilities. According to the ε-approximation rule, these differences must not exceed ε, so the results are acceptable under the approximation level ε. In effect, we obtain a realistic estimate of the MER of the system at large scale.
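As a rough sketch of the underlying condition (our paraphrase of the standard notion, not the exact formulation of [13]), two equally-labeled states $s$ and $t$ are related by an $\varepsilon$-approximate probabilistic bisimulation $R$ if, for every equivalence class $C$ of $R$,

$\left|\, \sum_{s' \in C} P(s, s') \;-\; \sum_{t' \in C} P(t, t') \,\right| \;\leq\; \varepsilon .$

Merging the states of each class then yields the reduced chain on which the MER is estimated.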
Enhancing the scalability of our analysis can help us to use
it in practice. As an example, consider the scenario, where this
algorithm is implemented in a WSN. If we want to add a new
sensor to our network, we can use our probabilistic analysis to
find a good position for the new node, with the goal of achieving
a small MER. To do that, we can first implement the large-scale model as a DTMC, and then approximate the model by ε-approximate probabilistic bisimulation, add the node to the tree,
and measure the MER of the new system. We can repeat this
experiment for different possible positions, and choose the
position which leads to the best MER.
Fig. 5. A binary tree topology for examination of PIF algorithm.
As an example, we use a binary tree topology including 15
nodes as indicated in Figure 5. After approximation, the model
is verified by the PRISM probabilistic model checker. Depending on the activation/deactivation probabilities, we obtain minimum and maximum values for the MER, as shown in Figures 6 and 7.
In our experiment, we first add a node to the left-most child of the tree (p8) and compute the difference between the MER of the current tree and that of the new one, which includes 16 nodes. Again, we repeat the experiment by putting the new node under p10, p12, and p14, respectively. According to the harmonic activation/deactivation probability assignment rule (Analysis 2) and under the condition of having at least one active intermediate node (Analysis 1), we expect to get the best stabilization-time by putting the new node under p8. Figure 6 verifies this fact in comparison with the other positions.
In another experiment, we add two nodes to the binary tree. Here, we are interested in finding the optimal topology of the tree, in which the worst-case expected stabilization-time is as small as possible. This experiment is more complicated than the previous one because of the number of choices that can affect the topology of the PIF tree: one can either put both nodes under one node of the tree or distribute them under two different nodes (e.g., under p8 and p10). According to Analysis 2 and Analysis 4, we expect to obtain the optimal topology by putting the nodes under the left-most child (p8). In this case, we should determine whether to add the second node under p8 or not. To do that, we add the second node under p8 and compute the difference of the MERs. Again, we repeat this computation by putting the second node under p10, p12, and p14, respectively. Figure 7 illustrates this experiment in the different positions. The result shows that adding both nodes under p8 decreases the stabilization-time of the system. This is consistent with Analysis 4, since a balanced tree topology has the optimal structure in terms of stabilization-time.
Fig. 6. The effect of adding the first node on stabilization-time in PIF tree.
Fig. 7. The effect of adding the second node on stabilization-time in PIF tree.
Note that forming an optimal tree topology is not always
viable in the real world. In the design process, we may face
physical constraints that pose some limitations on the position
of the nodes. In this case, our analysis helps the designer to find
a near-optimal tree topology, which leads to a better MER.
VI. RELATED WORK
Self-stabilization was first introduced by Dijkstra in [1],
where he proposed three solutions for token ring. His idea was
to design local algorithms for distributed nodes in such a way
that the system can tolerate any transient fault that may change
the state of the system. Later, researchers looked into designing
self-stabilizing algorithms for well-known problems in
distributed computing, such as spanning tree or three-coloring. Self-stabilization properties, including convergence to the set of legitimate states and staying there (in the absence of faults), are very important. However, when these algorithms are used in real-time applications, their quantitative metrics, such as recovery time, become equally important. Different metrics for recovery time, including worst-case and average-case recovery time, have
been introduced [7]. Self-stabilization is proven to be impossible
for a number of problems, such as token ring in anonymous
rings. One of the solutions proposed for these cases is
probabilistic algorithms [6]. In these algorithms, a process has
more than one choice for action in its local algorithm. Later,
Kwiatkowska et al. studied the effect of choosing different
probabilities for non-deterministic actions on the worst-case
expected recovery-time [8]. Probabilistic formal analysis has
also been studied in other applications, among which we can
mention the probabilistic analysis of Air Traffic Control (ATC)
[21] or SATS (Small Aircraft Transportation System) [22]. In
such applications, synchronicity is significant due to modelling
of concurrency in aircrafts movements. To model these systems,
on one hand we need to implement synchronicity, and on the
other hand, we require to reflect uncertainties existing in real-
world as probabilistic variables. Probabilistic analysis has
helped to analyze timing metrics in the presence of random
variables, which is worst-case expected time in this case [22].
VII. CONCLUSION
Self-stabilization is a solution to guarantee system recovery
in case of any transient fault. In real-time applications, the
system must be able to give a quick feedback, and hence, the
quantitative metrics, specifically stabilization-time is an
important factor. In this paper, we studied a self-stabilizing
mutual exclusion algorithm, which is designed for safety-critical applications. We argue that by considering the probability of a request to enter the critical section in different parts of the tree, and calculating the system's worst-case expected stabilization-time, we can give good suggestions to the network designers. We
have analyzed different topologies of tree structures and
proposed a set of guidelines, including a set of observations and four analytical results, that help the designers to improve the
stabilization-time of the system. The guidelines can be used in
practice by giving hints on how to organize the nodes in large-
scale tree structures, so that the best MER is achieved. We have
also utilized an approximate method to improve the scalability
of our probabilistic analysis. As for the future work, we plan to
work on the scalability of the analysis with the goal of proposing
a way for parametric analysis of algorithms, which is
independent of the number of the nodes in the tree.
REFERENCES
[1] E. W. Dijkstra, “Self-stabilizing systems in spite of distributed control”,
Communications of the ACM, Vol. 17, Issue 11, pp. 643-644,
1974.
[2] D. Fajardo-Delgado, J. A. Fernández-Zepeda and A. G. Bourgeois,
"Randomized self-stabilizing leader election in preference-based
anonymous trees," 2010 IEEE International Symposium on Parallel &
Distributed Processing, Atlanta, GA, 2010, pp. 1-8.
[3] O. Jubran and O. Theel, "Recurrence in Self-Stabilization," 2015 IEEE
34th Symposium on Reliable Distributed Systems (SRDS), Montreal, QC,
2015, pp. 58-67.
[4] L. Blin, F. Boubekeur and S. Dubois, "A Self-Stabilizing Memory
Efficient Algorithm for the Minimum Diameter Spanning Tree under an
Omnipotent Daemon," 2015 IEEE International Parallel and Distributed
Processing Symposium, Hyderabad, 2015, pp. 1065-1074.
[5] A. Mansouri and M. S. Bouhlel, "An efficient self-stabilizing vertex
coloring algorithm," 2016 SAI Computing Conference (SAI), London,
2016, pp. 655-660.
[6] S. Devismes, S. Tixeuil and M. Yamashita, "Weak vs. Self vs.
Probabilistic Stabilization," 2008 The 28th International Conference on
Distributed Computing Systems, Beijing, 2008, pp. 681-688.
[7] T. Herman, Probabilistic self-stabilization, In Information Processing
Letters, Vol. 35, Issue 2, pp. 63-67, 1990.
[8] M. Kwiatkowska, G. Norman and D. Parker, "Probabilistic Verification of Herman's Self-Stabilisation Algorithm," Formal Aspects of Computing, Vol. 24, Issue 4, pp. 661-670, 2012.
[9] S. Dolev, Self-Stabilization, MIT Press, Cambridge, MA, 2000, pp. 5-56.
[10] R. W. Buskens and R. P. Bianchini, "Self-stabilizing mutual exclusion in
the presence of faulty nodes," Twenty-Fifth International Symposium on
Fault-Tolerant Computing, Pasadena, CA, USA, 1995, pp. 144-153.
[11] M. Mizuno, M. Nesterenko and H. Kakugawa, "Lock-based self-
stabilizing distributed mutual exclusion algorithms," Proceedings of 16th
International Conference on Distributed Computing Systems, 1996, pp.
708-716.
[12] O. Jubran and O. Theel, "Exploiting Synchronicity for Immediate
Feedback in Self-Stabilizing PIF Algorithms," 2014 IEEE 20th Pacific
Rim International Symposium on Dependable Computing, Singapore,
2014, pp. 106-115.
[13] G. Bian and A. Abate, “On the relationship between bisimulation and
trace equivalence in an approximate probabilistic context,” 20th
International Conference on Foundations of Software Science and
Computation Structures, Vol. 10203 of LNCS, 2017, pp. 321-337.
[14] M. Demirbas and A. Arora, "Specification-Based Design of Self-
Stabilization," in IEEE Transactions on Parallel and Distributed Systems,
vol. 27, no. 1, pp. 263-270, Jan. 1 2016.
[15] A. Abate, “Approximation Metrics Based on Probabilistic Bisimulations
for General State-Space Markov Processes: A Survey,” In Electronic
Notes in Theoretical Computer Science, Vol. 297, 2013, pp. 3-25.
[16] D. Bein, A. K. Datta, M. H. Karaata and S. Zaman, "An optimal snap-
stabilizing multi-wave algorithm," 25th IEEE International Conference on
Distributed Computing Systems Workshops, 2005, pp. 35-41.
[17] P. Kissmann and J. Hoffmann, "BDD ordering heuristics for classical planning," Journal of Artificial Intelligence Research, Vol. 51, pp. 779-804, 2014.
[18] M. Alidoost Nia, “An automated PRISM code generator for PIF self-
stabilizing algorithm”, 2017, https://github.com/alidoostnia/pif/.
[19] C. Baier and J. P. Katoen, Principles of model checking, MIT Press,
Cambridge, MA, 2008, pp. 19-89.
[20] N. Fallahi, B. Bonakdarpour and S. Tixeuil, "Rigorous Performance
Evaluation of Self-Stabilization Using Probabilistic Model Checking,"
IEEE 32nd International Symposium on Reliable Distributed Systems,
Braga, 2013, pp. 153-162.
[21] Y. Zhao and K. Y. Rozier, "Probabilistic model checking for comparative
analysis of automated air traffic control systems," 2014 IEEE/ACM
International Conference on Computer-Aided Design (ICCAD), San Jose,
CA, 2014, pp. 690-695.
[22] M. U. Sardar, N. Afaq, K. A. Hoque, T. T. Johnson, and O. Hasan,
“Probabilistic Formal Verification of the SATS Concept of Operation,”
8th NASA Formal Methods Symposium, MN, USA, 2016, pp. 191-205.
Article
Research in system stabilization has traditionally relied on the availability of a complete system implementation. As such, it would appear that the scalability and reusability of stabilization is limited in practice. To redress this perception, in this paper, we show for the first time that system stabilization may be designed knowing only the system specification but not the system implementation. We refer to stabilization designed thus as specification-based design of stabilization and identify “local everywhere specifications” and “convergence refinements” as being amenable to the specification-based design of stabilization. Using our approach, we present the design of Dijkstra’s 4-state stabilizing token-ring system starting from an abstract fault-intolerant token-ring system. We also present an illustration of automated design of specification-based stabilization on a 3-state token-ring system.