
BRANCHING BISIMULATION FOR PROBABILISTIC

SYSTEMS: CHARACTERISTICS AND DECIDABILITY

Suzana Andova∗ and Tim A.C. Willemse†‡

∗Department of Computer Science, Twente University

P.O. Box 217, 7500 AE Enschede, The Netherlands

†Nijmegen Institute for Computing and Information Sciences (NIII), Radboud University Nijmegen

P.O. Box 9010, 6500 GL Nijmegen, The Netherlands

‡Department of Mathematics and Computer Science, Eindhoven University of Technology

P.O. Box 513, 5600 MB Eindhoven, The Netherlands

suzana@cs.utwente.nl, timw@cs.ru.nl

Abstract

We address the concept of abstraction in the setting of probabilistic reactive systems, and study its for-

mal underpinnings for the strictly alternating model of Hansson. In particular, we define the notion of

branching bisimilarity and study its properties by studying two other equivalence relations, viz. coloured

trace equivalence and branching bisimilarity using maximal probabilities. We show that both alternatives

coincide with branching bisimilarity. The alternative characterisations have their own merits and focus

on different aspects of branching bisimilarity. Coloured trace equivalence can be understood without

knowledge of probability theory and is independent of the notion of a scheduler. Branching bisimilarity,

rephrased in terms of maximal probabilities, gives rise to an algorithm of polynomial complexity for deciding the equivalence. Together they give a better understanding of branching bisimilarity. Furthermore,

we show that the notions of branching bisimilarity in the alternating model of Hansson and in the non-

alternating model of Segala differ: branching bisimilarity in the latter setting turns out to discriminate

between systems that are intuitively branching bisimilar.

1 Introduction

One of the hallmarks of process theory is the notion of abstraction. Abstractions allow one to reason about

systems in which details, unimportant to the purposes at hand, have been hidden. It is an invaluable tool

when dealing with complex systems. Research in process theory has made great strides in coping with

abstraction in areas that focus on functional behaviours of systems. However, when it comes to theories

focusing on functional behaviours and extra-functional behaviours such as probabilistic behaviour, we

suddenly find that many issues are still unresolved.

This paper addresses abstraction in the setting of systems that have both non-deterministic and proba-

bilistic traits, hereafter referred to as probabilistic systems. The model that we use throughout this paper

to describe such systems is that of graphs that adhere to the strictly alternating regime as studied by Hans-

son [14], rather than the non-alternating model [19, 20] as proposed by Segala et al. In particular, we study

the notion of branching bisimilarity for this model. The need for this particular equivalence relation is

already convincingly argued by e.g. Van Glabbeek and Weijland in [13], and by Groote and Vaandrager

in [12]. Recall that branching bisimilarity for probabilistic systems has been defined earlier for the non-

alternating model by Segala and Lynch [20] and a variation on that notion was defined by Stoelinga [21].

However, we stress that the differences in the alternating model and the non-alternating model lead to in-

compatibilities of the notions of branching bisimilarity in both settings. In fact, these differences are a key

motivation for our investigation: while our notion of branching bisimulation satisfies the properties com-

monly attributed to it, the existing notions turn out to be too strict in their current phrasing (as we explain

in detail in section 7, page 7), and discriminate between systems that are intuitively branching bisimilar.


Van Glabbeek and Weijland [13] showed that a key property of branching bisimilarity is that it pre-

serves the branching structure of processes, i.e. it preserves computations together with the potentials in

all intermediate states of a system that are passed through, even when unobservable events are involved.

Roughly speaking, the potentials are the options the system has to branch and behave. This property sets

branching bisimilarity apart from weak bisimilarity, which does not have this property. They illustrated this

property by defining two new equivalences, called concrete coloured trace equivalence (in a setting without

abstraction) and coloured trace equivalence (in a setting with abstraction), which both use colours to code

for the potentials. Subsequently, they showed that strong bisimilarity and concrete coloured trace equiva-

lence coincide, proving that colours can indeed be used to code for the potentials of a system. Next, they

showed that also branching bisimilarity and coloured trace equivalence coincide, and both are strictly finer

than weak bisimilarity. This proved that branching bisimilarity indeed preserves the branching structure of

the system.

Although our setting is considerably more complex than the non-probabilistic setting, the key concept

of preservation of potentials should still hold. We show that this is indeed the case by defining probabilistic

counterparts of concrete coloured trace equivalence and coloured trace equivalence, and show that these

coincide with strong bisimilarity and branching bisimilarity, respectively. A major advantage of (concrete)

coloured trace equivalence is that it can be understood without knowledge of probability theory and without

appealing to schedulers.

Another property of branching bisimilarity (one that is due to the alternating model, and which can also

be found for weak bisimilarity [18]), is the preservation of maximal probabilities. We show that branching

bisimilarity can be rephrased in terms of such maximal probabilities, thus yielding another alternative

definition of branching bisimulation. Apart from the more appetising phrasing that this yields, this result is

also at the basis of the complexity results for deciding branching bisimilarity. We also provide the algorithm

for deciding branching bisimilarity.

Both alternative phrasings of branching bisimulation have their own merits and focus on orthogonal

aspects. We emphasise that together, these are instrumental in understanding branching bisimulation and

its properties for probabilistic systems.

This paper is outlined as follows. In section 2, we introduce the semantic model we use in the remain-

der of this paper, together with the notions of strong bisimulation and branching bisimulation. In section 3,

we prove that branching bisimulation can be rephrased in terms of maximal probabilities, and we discuss

the decidability of branching bisimulation in detail. Section 4 formalises the notions of colours and blends.

Then, in sections 5 and 6 we define concrete coloured trace equivalence and coloured trace equivalence

and we show that these two equivalence relations coincide with strong bisimilarity and branching bisimi-

larity, respectively. In section 7 we give an overview of related work, which in turn provides the motivation

for conducting this research in the first place. Section 8 summarises the results of this paper and addresses

issues for further research.

Acknowledgements. Thanks are due to Jos Baeten, Christel Baier, Holger Hermanns, Joost-Pieter Katoen,

Ana Sokolova and Frits Vaandrager for fruitful discussions and useful comments on the topics addressed

in this paper.

2 Semantic Model

We use graphs¹ to model probabilistic systems. The graphs we consider follow the strictly alternating

regime of Hansson [14]. They can be used to describe systems that have both non-deterministic and prob-

abilistic characteristics.

Graphs consist of two types of nodes: probabilistic nodes and non-deterministic nodes. These nodes are

connected by two types of directed edges, called probabilistic transitions and non-deterministic transitions.

The latter are labelled with actions from a set of action labels, representing atomic activities of a system

or with the unobservable event, which is denoted τ and which is not part of the set of action labels of any

graph. A graph, not containing τ-transitions is referred to as a concrete graph. The probabilistic transitions

¹The model we use is also known as Labelled Concurrent Markov Chains. We use the term graph to stay in line with [13].


model the probabilistic behaviour of a system. We assume the existence of a special node nil, which is not

part of the set of nodes of any graph. This node is used as a terminal node for all graphs.

Definition 2.1. A graph is a 7-tuple ⟨N, P, s, Act, →, ⇝, pr⟩, where

• N is a non-empty finite set of non-deterministic nodes. We write Nnil for the set N ∪ {nil}.

• P is a non-empty finite set of probabilistic nodes. We write Pnil for the set P ∪ {nil}.

• s ∈ P is the initial node, also called root.

• Act is a finite set of action labels. We abbreviate the set Act ∪ {τ} with Actτ.

• → ⊆ N × Actτ × Pnil is the non-deterministic transition relation. We require that for all n ∈ N, there is at least one (n,a,p) ∈ → for some a ∈ Actτ and p ∈ Pnil.

• ⇝ ⊆ P × N is a probabilistic transition relation.

• pr: ⇝ → (0,1] is a total function for which Σ_{n∈N} pr(p,n) = 1 for all p ∈ P.

We write n −a→ p rather than (n,a,p) ∈ → and p ⇝ n rather than (p,n) ∈ ⇝. The set of all graphs is denoted G. In the remainder of this paper, x, y, ... range over G. We write Nx, Px, sx, etc. for the components of the graph x, and use Sx to denote the union Px ∪ Nx. We write Snil,x for the set Sx ∪ {nil}. When x is the only graph under consideration, or when no confusion can arise, we drop the subscripts altogether.

As a derived notion, we introduce the cumulative probability µ: Snil × 2^Snil → [0,1], which yields the total probability of reaching a set of nodes via probabilistic transitions: µ(p,M) =def Σ_{n∈M∩N} pr(p,n) if p ∈ P, and 0 otherwise.
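The structures of definition 2.1 translate directly into code. The following Python sketch encodes a graph and the cumulative probability µ; all names (Graph, mu, NIL) are our own, not the paper's.

```python
NIL = "nil"  # the special terminal node, not part of N or P

class Graph:
    """A graph <N, P, s, Act, ->, ~~>, pr> in the sense of definition 2.1."""
    def __init__(self, n_nodes, p_nodes, root):
        self.N = set(n_nodes)   # non-deterministic nodes
        self.P = set(p_nodes)   # probabilistic nodes
        self.s = root           # initial node, s in P
        self.trans = set()      # -> : triples (n, a, p), p may be NIL
        self.pr = {}            # pr : (p, n) -> (0, 1], total on ~~>

    def add_nd(self, n, a, p):
        self.trans.add((n, a, p))

    def add_prob(self, p, n, prob):
        self.pr[(p, n)] = prob

def mu(g, p, M):
    """Cumulative probability mu(p, M): the summed probability of the
    one-step probabilistic transitions from p into M; 0 when p is not
    a probabilistic node."""
    if p not in g.P:
        return 0.0
    return sum(prob for (q, n), prob in g.pr.items() if q == p and n in M)
```

A complete implementation would additionally check the well-formedness condition that the probabilities leaving each p ∈ P sum to 1.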

There are several variations on the graph model that we use throughout this paper. In [18], a more lib-

eral version is considered, in which the alternation between probabilistic transitions and non-deterministic

transitions is not as strict as in our model: in between two probabilistic transitions, one or more non-

deterministic transitions may be specified. Other variations allow for non-deterministic nodes as starting

nodes. From a theoretical point of view, these variations do not add to the expressive power of the model,

and the theory outlined in this paper easily transfers to those models.

2.1 Strong Bisimulation

Equivalence relations can be seen as a characterisation of the discriminating power of specific observers.

Strong bisimilarity [17] is known to capture the capabilities of one of the most powerful observers that still

has some appealing properties. It compares the stepwise behaviour of nodes in graphs and relates nodes

when this behaviour is found to be identical.

Definition 2.2. Let x and y be graphs, let N = Nx ∪ Ny and let P = Px ∪ Py. A relation R ⊆ Nnil² ∪ P² is a strong bisimulation relation when for all nodes s and t for which sRt holds, we have

1. if s ∈ N and t ∈ N and s −a→ s′ then there is some t′, such that t −a→ t′ and s′Rt′ holds.

2. if s ∈ P and t ∈ P then µ(s,M) = µ(t,M) holds for all M ∈ (Nnil ∪ P)/R.

We say that x and y are strongly bisimilar, denoted x ↔ y, iff there is a strong bisimulation relation R such that sx R sy.

A corollary of requirement 2 in the definition of strong bisimilarity is that all probabilistic nodes that can be related by some strong bisimulation relation share the same cumulative probability of reaching another equivalence class. This justifies the overloading of the notation µ for cumulative probability to denote the probability of reaching a set of nodes from an entire equivalence class rather than from a single node. For a strong bisimulation relation R, we define µ([s]R, M) =def µ(s,M) for arbitrary s ∈ P and arbitrary M ∈ (Nnil ∪ P)/R.

Proposition 2.3. ↔ is an equivalence relation on G.
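Since both conditions of definition 2.2 refer only to the current partition of nodes into equivalence classes, strong bisimilarity can be computed by partition refinement. The sketch below uses our own encoding and names; to compare two graphs x and y one would run it on their disjoint union and check that the two roots end up in the same block. It refines the initial split {N ∪ {nil}, P} until both conditions stabilise.

```python
def signature(s, N, trans, pr, partition):
    """One-step behaviour of node s relative to the current partition."""
    block_of = {u: i for i, block in enumerate(partition) for u in block}
    if s in N:
        # condition 1: the set of (action, target block) pairs
        return ("nd", frozenset((a, block_of[p])
                                for (n, a, p) in trans if n == s))
    # condition 2 (and nil): cumulative probability per target block
    sig = {}
    for (q, n), prob in pr.items():
        if q == s:
            sig[block_of[n]] = sig.get(block_of[n], 0.0) + prob
    return ("pr", frozenset(sig.items()))

def strong_bisim_partition(N, P, trans, pr):
    """Coarsest partition respecting definition 2.2."""
    partition = [set(N) | {"nil"}, set(P)]
    while True:
        new = []
        for block in partition:
            groups = {}
            for s in block:
                groups.setdefault(signature(s, N, trans, pr, partition),
                                  set()).add(s)
            new.extend(groups.values())
        if len(new) == len(partition):  # no block was split: fixpoint
            return new
        partition = new
```

The loop terminates because each round strictly increases the number of blocks until a fixpoint is reached, and the number of blocks is bounded by the number of nodes.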


2.2 Paths, Probabilities and Schedulers

A decomposition of a graph into a set of so-named computation trees is necessary for further quantitative

analysis of the graph: rather than conducting the analysis on the graph itself, the computation trees are

analysed.

The decomposition requires all non-determinism in the graph to be resolved. This is typically achieved

by employing a scheduler (also known as adversary or policy). A scheduler resolves the non-determinism

by selecting at most one of possibly many non-deterministic transitions in each non-deterministic node. A

computation tree is then obtained from the graph by resolving a non-deterministic choice according to the

scheduler and keeping probabilistic information for the relevant nodes. Dependent on the type of scheduler,

this choice is based on e.g. some history, randomisation or local information.

We subsequently formalise the notion of schedulers. Let x be a graph. A path starting in a node s0 ∈ Snil is an alternating finite sequence c ≡ s0 l1 ... ln sn, or an alternating infinite sequence c ≡ s0 l1 s1 ..., of nodes and labels, where for all i ≥ 1, si ∈ Snil and li ∈ Actτ ∪ (0,1], and

1. for all nodes sj ∈ N (j ≥ 0), we require sj −lj+1→ sj+1;

2. for all nodes sj ∈ P (j ≥ 0), we require sj ⇝ sj+1 and lj+1 = pr(sj, sj+1).

Paths always consist of at least one node (its starting node). For a path c starting in s0, we write first(c) = s0 for the initial node of c and, if c is a finite path, we write last(c) for the last node of c. The set of all nodes occurring in c is denoted nodes(c). We denote the trace of c by trace(c), which is the sequence of action labels from the set Actτ that occur in c. The concatenation of two paths is again a path: given a path c ≡ s0 l1 ... ln sn (for n ≥ 0) and a path c′ with last(c) = first(c′), we denote their concatenation by c ◦ c′ and it is defined as the path s0 l1 ... ln c′. If c ≡ (s0 l1 s1) ◦ c′ we write rest(c) = c′.

The set of all paths starting in s0 is denoted Path(s0) and the set of finite paths starting in s0 is denoted Pathf(s0). A path c is a maximal path iff c is a finite path with last(c) = nil or c is an infinite path. The set of maximal paths starting in s0 is denoted Pathm(s0).

Definition 2.4. A scheduler of paths starting in a node s0 is a partial function σ: Pathf(s0) ⇀ (→ ∪ {⊥}) (where ⊥ represents "halt"). If, for some c ∈ Pathf(s0), σ(c) is defined we require that the following two conditions are met:

1. if last(c) ∈ N, then σ(c) = ⊥ or σ(c) = last(c) −a→ t for some a and t;

2. if last(c) ∈ Pnil, then σ(c) = ⊥.

Moreover, we impose the following sanity restrictions on σ: for all c ∈ Pathm(s0) ∩ Pathf(s0), we have σ(c) = ⊥, and for all c ∈ Pathf(s0) with last(c) ∈ N, we require that σ(c) is defined. We denote the set of all schedulers of a node s0 by Sched(s0). When defining schedulers, we will often leave the extra definitions that are required to meet these sanity restrictions implicit and focus on the remaining rules.

Remark that the second condition in the definition of a scheduler expresses that a finite path c ending in a

probabilistic node can only be scheduled (if scheduled at all) to ⊥. In case such a path is not scheduled,

then σ is defined for all extensions of this path by a probabilistic transition. This is also illustrated in

example 2.7 at the end of this section.

For most practical purposes, we are not interested in all paths of a graph, but only in those paths that are scheduled by a given scheduler. Let σ ∈ Sched(s0) be a scheduler of a node s0 in a graph x. We write SPath(s0,σ) for the set of all finite and infinite paths c ≡ s0 l1 s1 ... where for each si ∈ N we have σ(s0 l1 s1 ... si) = si −li+1→ si+1. The set of maximal scheduled paths starting in s0 that is induced by σ is denoted SPathm(s0,σ) and contains all infinite scheduled paths and all finite scheduled paths c for which σ(c) = ⊥.

Note that our sanity restrictions on schedulers turn finite maximal paths into finite maximal scheduled paths (since the former are necessarily scheduled to ⊥). This is required for a proper definition of the probability space and a probability measure.

Several types of schedulers are defined in the literature, such as randomised schedulers, determinate

schedulers and history-dependent schedulers. For the exposition of the theory, we do not fix a specific type of schedulers, but in section 3.3 we show that a particular type of scheduler, so-called simple schedulers,

are sufficiently powerful for our purposes.

Definition 2.5. Let s0 ∈ Snil be a node and let σ ∈ Sched(s0) be a scheduler. We say that σ is a simple scheduler if for all c, c′ ∈ Pathf(s0) with last(c) = last(c′), σ(c) = σ(c′).

Obviously, for a graph x the set of all schedulers that can be defined for a given node s0 may be infinite,

while the set of all simple schedulers for that graph is finite. This fact will be used in section 3.3 where an

algorithm for deciding branching bisimulation on graphs is given.
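Under this reading of definitions 2.4 and 2.5, a simple scheduler is fully determined by one choice per non-deterministic node: either ⊥ (halt) or one outgoing transition. Their number is therefore the product, over all n ∈ N, of (out-degree of n) + 1, hence finite. A sketch, with illustrative names of our own:

```python
from itertools import product

def simple_schedulers(trans, nd_nodes):
    """Enumerate all simple schedulers as maps node -> choice, where a
    choice is None (standing for the halt symbol) or an outgoing
    transition (a, p)."""
    options = {n: [None] + sorted((a, p) for (m, a, p) in trans if m == n)
               for n in nd_nodes}
    nodes = sorted(options)
    # one choice per non-deterministic node; Cartesian product of options
    for choice in product(*(options[n] for n in nodes)):
        yield dict(zip(nodes, choice))
```

For a node with two outgoing transitions and a node with one, this yields (2+1) · (1+1) = 6 simple schedulers.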

Definition 2.6. A probabilistic tree is a 7-tuple ⟨N, P, s, Act, →, ⇝, pr⟩, where

• N is a non-empty countable set of non-deterministic nodes.

• P is a non-empty countable set of probabilistic nodes.

• →: N × Actτ → Pnil is the non-deterministic transition function.

• s ∈ P, Act, ⇝ and pr are defined along the lines of definition 2.1.

Graphs and probabilistic trees differ with respect to the non-deterministic branching degree that is allowed.

While graphs have finite non-deterministic branching degree, probabilistic trees have branching degree 1.

In other words, all non-deterministic transitions are uniquely determined by a pair consisting of a non-

deterministic node and an action label. Furthermore, the set of nodes of a graph is necessarily finite,

while probabilistic trees can have infinitely many nodes. It is well-known that probabilistic trees can be

used to represent fully probabilistic systems (see e.g. [1, 4]).

Every scheduler σ ∈ Sched(s0) for a graph x defines a probabilistic tree CTx(s0,σ) whose nodes are

finite scheduled paths in x. The probabilistic and non-deterministic transitions of CTx(s0,σ) are uniquely

defined by the transition relations of x and σ in the obvious way. The probabilistic tree CTx(s0,σ) is called

a computation tree starting in s0and induced by σ. When no confusion can arise we omit the index x. The

probabilistic transition relation ⇝ of x is used to define a probability on a finite path in CTx(s0,σ). These

probabilities are then employed to define a probability measure for the probability space associated to σ.

We proceed with the formal definitions. Let c ≡ s0 l1 ... ln sn be a finite path. Then, the probability of c, denoted P(c), is defined as:

1. P(c) = Π_{li∈(0,1]} li if at least one li ∈ (0,1] for 1 ≤ i ≤ n;

2. P(c) = 1 otherwise.
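The two clauses amount to multiplying the probabilistic labels of the path, with the empty product giving 1. A sketch, using our own encoding of paths as alternating [node, label, node, ...] lists with numeric labels for probabilities and strings for actions:

```python
def path_probability(path):
    """P(c): the product of the labels drawn from (0,1]; 1 when the
    path contains no probabilistic label (clause 2)."""
    labels = path[1::2]                # every second element is a label
    result = 1.0
    for l in labels:
        if not isinstance(l, str):     # numeric label = probability
            result *= l
    return result
```

For the path p 1 k a q 1/2 n of example 2.7 below, this yields 1 · 1/2 = 1/2.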

Let c be a finite scheduled path. Then, the basic cylinder of c, induced by σ, is given by

c↑ = {c′ ∈ SPathm(s0,σ) | c is a prefix of c′}    (1)

The probability measure of c↑, denoted by P(c↑), is defined as P(c↑) = P(c). The probability space (Ωσ, Fσ, Pσ) induced by σ ∈ Sched(s0) is defined as follows²:

1. Ωσ = SPathm(s0,σ);

2. Fσ is the smallest sigma-algebra on SPathm(s0,σ) that contains all basic cylinders c↑ for c a finite scheduled σ-path;

3. Pσ is the probability measure on Fσ that is completely determined by P(·).

²Note that we here overload the notation P.

Let CT(s0,σ) be a computation tree for graph x. Recall that every node in CT(s0,σ) is a finite path in x starting in s0. We say that a node t of x appears (or, it has an appearance) in CT(s0,σ) if there is a node c in CT(s0,σ) such that last(c) = t. In case we are also interested in the node c of CT(s0,σ) that gives rise to an appearance of a node t of x in CT(s0,σ), we say that t is due to node c. In general, there may be more nodes c, c′ in CT(s0,σ) to which t is due. To distinguish between these, we sometimes reason about the occurrence t_c when we mean that t is due to the node c in CT(s0,σ). Note that from the context, it is always clear whether we mean the node c in the computation tree or the node t in the graph when we reason about a particular occurrence t_c. We say that t_c and t_c′ are different occurrences of t in CT(s,σ) iff c ≠ c′.

Let t_c be an occurrence of t due to node c in CT(s,σ). Note that by definition, we have c ∈ Pathf(s) with last(c) = t. The scheduler σ that is used to obtain the computation tree CT(s,σ) is said to induce a scheduler (σ−c) ∈ Sched(t). This scheduler is defined as follows:

(σ−c)(c′) = σ(c ◦ c′)  for all paths c′ ∈ Pathf(t)    (2)

Clearly, when we consider the path consisting of a single node s, we obtain (σ−s)(c′) = σ(c′) for all c′. This induced scheduler (σ−c) agrees with the original scheduler σ ∈ Sched(s), but its "starting" node is shifted towards some other node, and therefore, it only defines a computation tree that starts in last(c). This means that the computation tree CT(last(c),(σ−c)) yields a subtree of the computation tree CT(s,σ). Finally, we define the depth of an occurrence t_c, which is given by the depth of the node c in the computation tree.

The notions that we have introduced thus far are illustrated in example 2.7.

Example 2.7. Consider the graph of figure 1 with initial node p. The graph is not fully probabilistic because of the non-deterministic node n that has two outgoing non-deterministic transitions. It is also not a concrete graph because one of the transitions is labelled with τ.

Figure 1: A graph with an unobservable self-loop (p ⇝1 k; k −a→ q; q ⇝1/2 n and q ⇝1/2 m; n −τ→ q and n −b→ nil; m −c→ nil).

The set of paths starting in p is as follows:

Path(p) = {p, p 1 k} ∪ {p 1 k a q (1/2 n τ q)^i | i ≥ 0}
        ∪ {p 1 k a q (1/2 n τ q)^i 1/2 n | i ≥ 0}
        ∪ {p 1 k a q (1/2 n τ q)^i 1/2 m | i ≥ 0}
        ∪ {p 1 k a q (1/2 n τ q)^i 1/2 n b nil | i ≥ 0}
        ∪ {p 1 k a q (1/2 n τ q)^i 1/2 m c nil | i ≥ 0}
        ∪ {p 1 k a q (1/2 n τ q)^ω}

Among this set, the only infinite path is p 1 k a q (1/2 n τ q)^ω; all remaining paths are finite. The maximal paths are the infinite path and all paths that end in nil:

Pathm(p) = {p 1 k a q (1/2 n τ q)^i 1/2 n b nil | i ≥ 0}
         ∪ {p 1 k a q (1/2 n τ q)^i 1/2 m c nil | i ≥ 0}
         ∪ {p 1 k a q (1/2 n τ q)^ω}


To illustrate the effect of a particular scheduler on the set of paths of a graph we consider the following scheduler:

σ1(p 1 k) = k −a→ q
σ1(c) is undefined for any other finite path c

Note that we have left some parts of the definition of σ1 implicit: finite maximal paths and paths ending in non-deterministic nodes are not (correctly) covered by σ1. By our convention (see definition 2.4), σ1 assigns ⊥ to those paths when they are not explicitly defined. We find that SPath(p,σ1) = {p, p 1 k, p 1 k a q, p 1 k a q 1/2 n, p 1 k a q 1/2 m}, and its subset of maximal scheduled paths is SPathm(p,σ1) = {p 1 k a q 1/2 n, p 1 k a q 1/2 m}. Remark that the scheduler is undefined for p 1 k a q. This does not mean, however, that the scheduling stops at this point. On the contrary, it is defined for all extensions of the path p 1 k a q, which are obtained using one of the specified probabilistic transitions, in this case for the paths p 1 k a q 1/2 n and p 1 k a q 1/2 m.

The second scheduler we consider is slightly more involved, and we use it to illustrate the probability of sets of paths. Let σ2 be defined as follows:

σ2(p 1 k) = k −a→ q
σ2(p 1 k a q 1/2 n) = n −b→ nil
σ2(p 1 k a q (1/2 n τ q)^10 1/2 n) = n −b→ nil
σ2(c) is undefined for the remaining finite paths c

We find that SPath(p,σ2) = {p, p 1 k, p 1 k a q, p 1 k a q 1/2 n, p 1 k a q 1/2 m, p 1 k a q 1/2 n b nil}, and its subset of maximal scheduled paths is SPathm(p,σ2) = {p 1 k a q 1/2 n b nil, p 1 k a q 1/2 m}. The probability P, for various paths, is as follows: P((p)↑) = P((p 1 k)↑) = P((p 1 k a q)↑) = 1, while P((p 1 k a q 1/2 n)↑) = 1/2, P((p 1 k a q 1/2 n b nil)↑) = 1/2 and P((p 1 k a q 1/2 m)↑) = 1/2. Note that even though σ2 is defined for a path such as p 1 k a q (1/2 n τ q)^10 1/2 n, this finite path is not a node in CT(p,σ2), since it is not a scheduled path by σ2.

The last scheduler that we consider is defined as follows:

σ3(p 1 k) = k −a→ q
σ3(c) = n −τ→ q    if last(c) = n
σ3(c) = m −c→ nil    if c is such that last(c) = m
σ3(c) is undefined for any other finite path c

Note that σ3 is a simple scheduler, as it always schedules a non-deterministic transition on the basis of the last non-deterministic node of the path. The set of scheduled paths is as follows:

SPath(p,σ3) = {p, p 1 k} ∪ {p 1 k a q (1/2 n τ q)^i | i ≥ 0}
            ∪ {p 1 k a q (1/2 n τ q)^i 1/2 n | i ≥ 0}
            ∪ {p 1 k a q (1/2 n τ q)^i 1/2 m | i ≥ 0}
            ∪ {p 1 k a q (1/2 n τ q)^i 1/2 m c nil | i ≥ 0}
            ∪ {p 1 k a q (1/2 n τ q)^ω}

and SPathm(p,σ3) = {p 1 k a q (1/2 n τ q)^i 1/2 m c nil | i ≥ 0} ∪ {p 1 k a q (1/2 n τ q)^ω}. For every c ∈ SPathf(p,σ3) we have c↑ = SPathm(p,σ3) and P(SPathm(p,σ3)) = 1. Furthermore, the node n appears in the computation tree CT(p,σ3). It has several different occurrences. For instance, consider the nodes c1 ≡ p 1 k a q 1/2 n τ q 1/2 n and c2 ≡ p 1 k a q 1/2 n τ q 1/2 n τ q 1/2 n in CT(p,σ3). Then we say that the occurrences n_c1 and n_c2 are different occurrences of n in CT(p,σ3).

Note that CT(p,σ1) and CT(p,σ2) are finite computation trees while CT(p,σ3) is infinite.

2.3 Branching Bisimulation

Strong bisimilarity is most appropriate when considering concrete graphs, but the equivalence is too fine

in a setting with abstraction. This is because it treats the unobservable event τ as if it were any other


observable event. While abstraction is of utmost importance in the analysis of probabilistic systems, it is

also one of the harder notions to grasp. This is because the unobservable event (represented by the action

τ) plays an almost diabolical role: while the τ itself may not be visible, its effect might be noted by the

disabling or enabling of some observable events. For instance, while the inspection of a coin that has been

put in a coffee-machine may be unobservable, it manifests itself through a (consistent) rejection of that

particular coin. This illustrates that we cannot bluntly remove all τ actions from a graph: only the ones that

do not manifest themselves may be removed. We call such τ actions inert.

The equivalence relation we define in this section is called branching bisimilarity. It is strictly in be-

tween strong bisimilarity and weak bisimilarity (for the latter see e.g. [18]). Branching bisimilarity enjoys

several pleasing properties. Unlike strong bisimilarity, it treats the inert τ actions as unobservable. Further,

in contrast to weak bisimilarity, it preserves the non-deterministic branching structure of graphs. This is

due to the fact that it differentiates between τ actions that are truly inert and τ actions that are not really

inert.

We briefly repeat one of the central ideas behind branching bisimulation from the non-probabilistic setting (see e.g. [3, 13]). The crucial point in that setting is that a node t can be related to a node s by a branching bisimulation relation only whenever all (observable) transitions s −a→ s′ from node s can be matched by transitions t −τ→ ··· −τ→ t′′ −a→ t′ from node t such that t′ can again be related to s′ by the branching bisimulation relation. Unlike in e.g. weak bisimulation or delay bisimulation, it is required that this sequence of τ transitions traverses through nodes that all can be related to s (see figure 2). In our setting, the sequences of transitions readily translate to paths.

Figure 2: Branching bisimulation in a non-probabilistic setting.

Before we turn to the definition of branching bisimulation, we fix some shorthand notation to ease notational burden and to capture the ideas depicted by figure 2 in a formal framework. Let c be an arbitrary finite path. The relation "path c satisfies path-predicate φ", denoted c sat φ, is defined as follows for the following path-predicates:

1. c sat s =⇒M s′ iff first(c) = s, last(c) = s′, trace(c) = τ∗ and nodes(c) ⊆ M.

2. c sat s =⇒M · −a→ s′′ iff ∃c′: c ≡ c′ a s′′ and c′ sat s =⇒M last(c′).

3. c sat s =⇒M · ⇝ s′′ iff ∃c′ ∃l ∈ (0,1]: c ≡ c′ l s′′ and c′ sat s =⇒M last(c′).

Note that by requiring that c is a finite path, we have last(c) = s′′ in the last two path-predicates. Moreover, we also find that last(c′) −a→ s′′ (resp. pr(last(c′), s′′) = l). The intuition behind the path-predicates is that a finite number of nodes from the set M may be visited, provided that this does not require the execution of an observable action (unless, as is stated for the second path-predicate, it is the last action and s′′ ∈ M).

Proposition 2.8. Let s, s′ ∈ Snil, a ∈ Actτ and l ∈ (0,1]. Let M ⊆ Snil be such that s ∈ M. Then

1. s sat s =⇒M s.

2. (s a s′) sat s =⇒M · −a→ s′.

3. (s l s′) sat s =⇒M · ⇝ s′.
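The first path-predicate is directly checkable on a concrete finite path. A sketch in the same alternating-list encoding used earlier (our own encoding; the string "tau" stands for τ):

```python
TAU = "tau"  # the unobservable action

def sat_silent(path, s, s_prime, M):
    """c sat s =>_M s': first(c) = s, last(c) = s', trace(c) lies in
    tau*, and nodes(c) is contained in M."""
    nodes = path[0::2]
    labels = path[1::2]
    actions = [l for l in labels if isinstance(l, str)]  # skip probabilities
    return bool(path[0] == s and path[-1] == s_prime
                and all(a == TAU for a in actions)
                and set(nodes) <= set(M))
```

The other two predicates reduce to this check on the prefix c′, plus a test on the final action or probabilistic step.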


Let σ be a scheduler, and let M, M′ be sets of nodes. Let Bσ(s =a⇒M M′) be the set of all maximal σ-scheduled paths that start in s and silently (i.e. using τ actions) traverse through a set of nodes M and reach a node in M′ by executing a given a action (a ∈ Actτ). More concretely, let Bσ(s =a⇒M M′) be defined as follows:

Bσ(s =a⇒M M′) = {c ∈ SPath(s,σ) | σ(c) = ⊥ and either
    c sat s =⇒M · −a→ s′, s′ ∈ M′, or
    c sat s =⇒M · ⇝ s′, s′ ∈ M′, a = τ, or
    c ≡ s, a = τ, M = M′}    (3)

When a = τ, we generally write Bσ(s =⇒M M′) instead of Bσ(s =τ⇒M M′). Next, we overload the function µ to denote the normalised cumulative probability. Given two disjoint, non-empty sets of nodes M and M′ and a node p ∈ M, the function µM(p,M′) is used to denote the probability of entering M′ from p (in one step) weighted by the probability of remaining in M. Formally, we have

µM(p,M′) = µ(p,M′) / (1 − µ(p,M))    if p ∈ P and µ(p,M) ≠ 1,
µM(p,M′) = 0    otherwise.    (4)

Definition 2.9. Let x and y be graphs. Let N = Nx ∪ Ny, P = Px ∪ Py, S = Sx ∪ Sy and Snil = S ∪ {nil}. Let R be an equivalence relation on Snil. R is a branching bisimulation relation when for all nodes s and t for which sRt holds, we have

1. if s ∈ N and s −a→ s′, then there is a scheduler σ, such that P(Bσ(t =a⇒[t]R [s′]R)) = 1.

2. if s ∈ P then for some scheduler σ, µ[s]R(s,M) = P(Bσ(t =⇒[t]R M)) for all M ∈ Snil/R \ {[s]R}.

We say that x and y are branching bisimilar, denoted x ↔b y, iff there is a branching bisimulation relation R on Snil, such that sx R sy.

In words, branching bisimilarity requires all non-deterministic transitions (i.e. also the inert τ transitions) emanating from a node in an equivalence class to be schedulable from all nodes related to that node, with probability 1. We say that all nodes in the same equivalence class have the same potentials. The second condition requires that a single scheduler for a node can be used to simulate the normalised cumulative probability of a related probabilistic node. This particular scheduler can be employed to find a "silent" path (i.e. a path with unobservable actions only) through a set of nodes that are related to the originating node before it leaves this class of nodes and reaches another equivalence class. This last step is done either via the execution of another τ action or by a probabilistic transition.
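The normalised cumulative probability of equation (4) can be computed directly from pr. A sketch in our own dict-based encoding; as a sanity check, a node that stays in its own class with probability 1/5 and enters each of two other classes with probability 2/5 gets normalised probability 1/2 for each, matching example 2.10 below.

```python
def mu_norm(pr, P, p, M, M_prime):
    """mu_M(p, M') = mu(p, M') / (1 - mu(p, M)) when p in P and
    mu(p, M) != 1, and 0 otherwise (equation (4))."""
    def mu(q, S):
        # cumulative probability of section 2: 0 for non-probabilistic q
        if q not in P:
            return 0.0
        return sum(prob for (r, n), prob in pr.items() if r == q and n in S)
    stay = mu(p, M)          # probability of remaining in M
    if p in P and stay != 1:
        return mu(p, M_prime) / (1 - stay)
    return 0.0
```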

Example 2.10. Consider the two graphs of figure 3. We find that the two graphs are branching bisimilar. For instance, the non-deterministic node k and the probabilistic node p′ are in the same equivalence class. This can be seen as follows. Suppose that R is the branching bisimulation relation. We have µ[p′]R(p′,[n′]R) = µ[p′]R(p′,[m′]R) = 1/2. To mimic these probabilities, we define a scheduler σ ∈ Sched(k) as follows:

σ(k (τ p 1/5 k)^i) = k −τ→ p    for all i ≥ 0
σ(k τ p (1/5 k τ p)^i 2/5 n) = ⊥    for all i ≥ 0
σ(k τ p (1/5 k τ p)^i 2/5 m) = ⊥    for all i ≥ 0
σ(c′) is undefined for any other finite path c′

Using this scheduler, we find that P(Bσ(k =⇒[k]R [n]R)) = P(Bσ(k =⇒[k]R [m]R)) = 1/2. To see that node p′ is in the same equivalence class as node k, we must show the existence of a scheduler that mimics the non-deterministic τ-transition of node k with probability 1. This boils down to preventing p′ from leaving its own class, which is achieved by the scheduler η ∈ Sched(p′), defined as η(p′) = ⊥. Nota Bene: in section 8, we use the same example to illustrate that using branching bisimulation in the non-alternating setting, the two graphs are not branching bisimilar. The crux turns out to be the non-deterministic node k.


[Figure: two graphs, one over the nodes k, n, m and the probabilistic node p, the other over the nodes n′, m′ and the probabilistic node p′, with transitions labelled a, b, c and τ, and with probabilities 1/2, 1/5, 1/3, 2/3 and 2/5.]

Figure 3: Two branching bisimilar graphs.

3 Branching Bisimilarity: Maximal Probabilities and Decidability

Finding a branching bisimulation relation between two graphs can be quite hard. The culprit is the fact that

in both conditions of the branching bisimulation relation definition, a quantification over an infinite set of

schedulers appears. From this set, a scheduler must be picked that meets the conditions of the bisimulation

relation. Moreover, this feat must be repeated for all nodes of the two graphs, making the entire process of

checking for branching bisimulation rather cumbersome and even problematic to automate.

As we will show in this section, the above problems are not insurmountable. For instance, Philippou et

al. [18] showed that weak bisimilarity can be rephrased in terms of maximal probabilities. Since branching

bisimulation and weak bisimulation are closely related, this raises the question whether also branching

bisimulation might be rephrased in terms of maximal probabilities. In section 3.2 we give an affirmative

answer to this question. This result allows us to narrow down the choice of schedulers to those schedulers

that induce maximal probabilities.

This result is at the basis of a decision procedure for branching bisimulation. Instead of the infinite set

of schedulers that must be checked in definition 2.9, we can now narrow down the search criterion to those

schedulers that induce a maximal probability.

We first introduce some auxiliary notation in section 3.1. Some of this notation will only be used in

the main proofs in section 3.2, in which we show that definition 2.9 can be rephrased in terms of maximal

probabilities. Then, in section 3.3 we provide results for deciding branching bisimulation, together with

the algorithm for doing so.

3.1 Preliminaries

For the remainder of this section, we fix a graph x. Let a ∈ Act_τ and M, M′ ⊆ Snil, and let s ∈ Snil. In section 2.3 (equation 3), we introduced the notation Bσ(s =a⇒_{M′} M) for a set of scheduled paths that silently traverse through M′ before executing action a and reaching M. The probability of this set of paths depends heavily on the scheduler σ. Given that the set of probabilities is ordered, we can search for the maximal probability among this set by selecting an appropriate scheduler. We introduce the following notation:

    Pmax(s =a⇒_{M′} M) def= max_{σ ∈ Sched(s)} P(Bσ(s =a⇒_{M′} M))    (5)

If a = τ we omit a and simply write Pmax(s =⇒_{M′} M). Note that even though the maximal probability is a unique number, this does not mean that there is necessarily a single scheduler that induces this maximal probability.
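Definition (5) can be made concrete on a small example. The following is an illustrative sketch, not taken from the paper: it approximates a maximal probability of the form Pmax(s =⇒_{M′} M) by value iteration over a toy alternating graph. The dictionary encoding, the node names and the iteration bound are all assumptions of this example (the paper itself computes these probabilities via linear optimisation in section 3.3).

```python
# Illustrative sketch (not the paper's construction): approximating a
# maximal probability Pmax(s =>_{M'} M) by value iteration on a toy
# alternating graph. The encoding below is an assumption of this example.

# Non-deterministic nodes: list of (action, successor) pairs.
# Probabilistic nodes: dict successor -> probability.
ndet = {"k": [("tau", "p")]}
prob = {"p": {"k": 0.2, "n": 0.4, "m": 0.4}}

def pmax_silent(start, traverse, target, iterations=200):
    """Maximal probability of reaching `target` from `start` while
    silently traversing nodes of `traverse` only (schedulers resolve
    the non-deterministic tau-choices)."""
    nodes = set(ndet) | set(prob) | set(target) | set(traverse)
    for succs in ndet.values():
        nodes |= {t for _, t in succs}
    for dist in prob.values():
        nodes |= set(dist)
    x = {v: (1.0 if v in target else 0.0) for v in nodes}
    for _ in range(iterations):
        for v in nodes:
            if v in target or v not in traverse:
                continue  # absorbing: inside the target, or outside M'
            if v in ndet:  # scheduler picks the best tau-successor
                x[v] = max((x[t] for a, t in ndet[v] if a == "tau"),
                           default=0.0)
            else:          # probabilistic node: weighted average
                x[v] = sum(pi * x[t] for t, pi in prob[v].items())
    return x[start]

# From k the scheduler can only move to p; p returns to k with
# probability 1/5, so the maximal probability of reaching {n} while
# traversing {k, p} is (2/5) / (1 - 1/5) = 1/2.
print(pmax_silent("k", {"k", "p"}, {"n"}))
```

Note that even this toy computation illustrates the remark above: any scheduler that keeps choosing the τ-transition of k attains the maximum, so the maximising scheduler is not unique.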


The following series of propositions is useful in understanding the interplay between maximal probabilities, branching bisimulation and (operations on) schedulers.

Proposition 3.1. Let R be a branching bisimulation relation on Snil. Let a ∈ Act_τ be an action and let M ∈ Snil/R be an equivalence class. For all nodes s ∈ Snil with Pmax(s =a⇒_{[s]R} M) = 0 we find that for every t ∈ [s]R, Pmax(t =a⇒_{[t]R} M) = 0.

Proof. The result follows directly from the definition of branching bisimulation. Namely, the existence of a node t ∈ [s]R for which Pmax(t =a⇒_{[t]R} M) > 0 is in immediate conflict with the assumptions Pmax(s =a⇒_{[s]R} M) = 0 and sRt. □

Now let us assume that R is an equivalence relation on Snil. Let a ∈ Act_τ and M ∈ Snil/R. Let σ ∈ Sched(s) be a scheduler such that Pmax(s =a⇒_{[s]R} M) = P(Bσ(s =a⇒_{[s]R} M)). Let t be a node in x such that sRt and let t_c be an occurrence in the computation tree CT(s,σ).

Proposition 3.2. Let (σ−c) ∈ Sched(t) be the scheduler induced by σ, as defined in section 2.2. Then Pmax(t =a⇒_{[t]R} M) = P(B(σ−c)(t =a⇒_{[t]R} M)).

Proof. We prove this by contradiction. Assume that Pmax(t =a⇒_{[t]R} M) > P(B(σ−c)(t =a⇒_{[t]R} M)). This implies Pmax(s =a⇒_{[s]R} M) > P(Bσ(s =a⇒_{[s]R} M)), which contradicts our assumption that σ induces maximal probabilities. Therefore we find that Pmax(t =a⇒_{[t]R} M) = P(B(σ−c)(t =a⇒_{[t]R} M)), which finishes the proof. □

Vice versa, assume that η ∈ Sched(t) is such that Pmax(t =a⇒_{[t]R} M) = P(Bη(t =a⇒_{[t]R} M)), i.e. η induces the maximal probability of reaching M from t. Let σ be as defined above, and let σ+ ∈ Sched(s) be the scheduler defined as follows:

    σ+(c) = σ(c)     if c is such that t ∉ nodes(c)
    σ+(c) = η(c2)    if there is a c1, such that c ≡ c1 ◦ c2 with t ∉ nodes(c1) and first(c2) = t.

Proposition 3.3. We find Pmax(s =a⇒_{[s]R} M) = P(Bσ+(s =a⇒_{[s]R} M)) = P(Bσ(s =a⇒_{[s]R} M)). □

The remaining shorthand notations and propositions are used mainly in the proofs that appear in the next two sections. As such, they can be skipped on a first reading of this paper.

Definition 3.4. Let s, t ∈ Snil be arbitrary nodes, and let σ ∈ Sched(s) be a scheduler starting in node s. Let M, M′ ⊆ Snil be subsets of Snil. We introduce the following two shorthands:

    Bσ(s =a⇒_{M−t} M′) def= {c ∈ Bσ(s =a⇒_M M′) | c ≡ s or c ≡ (s l s′) ◦ c′ for some l ∈ Act_τ ∪ (0,1], s′ ∈ Snil and path c′ satisfying t ∉ nodes(c′)}

and

    Bσ(s =a⇒_{M+t} M′) def= Bσ(s =a⇒_M M′) \ Bσ(s =a⇒_{M−t} M′)

In words, Bσ(s =a⇒_{M−t} M′) denotes the subset of Bσ(s =a⇒_M M′) containing all paths that do not pass through t; if s = t then such a path starts in t but never returns to node t again. The complement of this set is given by the subset Bσ(s =a⇒_{M+t} M′), which contains all paths that do pass through t at least once after leaving the root node. When s = t, it denotes the set of paths that start in t and return to t at least once more.

Proposition 3.5. P(Bσ(s =a⇒_M M′)) = P(Bσ(s =a⇒_{M−t} M′)) + P(Bσ(s =a⇒_{M+t} M′)).

Proof. Standard result from probability theory. □

Proposition 3.6. If for some σ ∈ Sched(s), we find Pmax(s =a⇒_{[s]R} M) = P(Bσ(s =a⇒_{[s]R} M)) > 0, then there is an occurrence s_c in CT(s,σ), satisfying P(B(σ−c)(s =a⇒_{[s]R−s} M)) > 0.

Proof. By assuming that for every occurrence s_c in CT(s,σ), P(B(σ−c)(s =a⇒_{[s]R−s} M)) = 0, we obtain that each path starting in the root s contains countably infinitely many different occurrences s_{c_i} and therefore never reaches M. Then, Bσ(s =a⇒_{[s]R} M) = ∅ and therefore, Pmax(s =a⇒_{[s]R} M) = 0. □

Corollary 3.7. Let σ be as defined in proposition 3.6. Then for any occurrence s_c in CT(s,σ), we find B(σ−c)(s =a⇒_{[s]R} M) ≠ ∅ and therefore, P(B(σ−c)(s =a⇒_{[s]R} M)) > 0. □

3.2 Branching bisimulation using Maximal Probabilities

Using the concept of maximal probabilities as outlined in the previous section, we show that the definition

of branching bisimulation can be rewritten to an equivalent definition in which we employ the notion of

maximal probabilities. This is stated by the following theorem.

Theorem 3.8. Let x and y be two graphs, and denote the set of their nodes by S. Let Snil = S ∪ {nil}. Then, a relation R on Snil is a branching bisimulation relation iff the following two conditions are met for all nodes s, t ∈ Snil satisfying sRt:

1. Pmax(s =a⇒_{[s]R} M) = Pmax(t =a⇒_{[t]R} M) for all a ∈ Act and M ∈ Snil/R.

2. Pmax(s =⇒_{[s]R} M) = Pmax(t =⇒_{[t]R} M) for all M ∈ Snil/R \ {[s]R}.

The remainder of this section is devoted to proving the above theorem. The two directions of the proof will be discussed separately. For the implication, we prove that branching bisimilar nodes have the same maximal probabilities of executing actions or reaching other equivalence classes. Due to the different ways in which branching bisimulation treats the unobservable event τ and observable actions a ∈ Act, we split the proof of our claim for these two classes of events. In lemma 3.9 we deal with the τ-transitions and in lemma 3.11 we prove the claim for actions a ∈ Act. Lemma 3.12 states that by requiring equal maximal probabilities we also obtain a branching bisimulation relation.

Fix two graphs x and y, and denote the set of their probabilistic nodes by P and the set of their non-deterministic nodes by N. We write S = P ∪ N and Snil = S ∪ {nil}.

Lemma 3.9. Let R be a branching bisimulation on Snil and let C ∈ Snil/R.

i. If s, t ∈ C, then Pmax(s =⇒_C M) = Pmax(t =⇒_C M), for all M ≠ C.

ii. If s ∈ P ∩ C and µ(s,C) ≠ 1, then for all M ∈ Snil/R with M ≠ C, Pmax(s =⇒_C M) = µC(s,M).

Proof. We first show that by employing claim (i.), the second claim follows straightforwardly. We then proceed to prove claim (i.).

(ii.) We distinguish two cases. Suppose µ(s,C) = 0. In this case, the claim follows immediately. Now, suppose that µ(s,C) ≠ 0; then using claim (i.) we find:

    Pmax(s =⇒_C M) = µ(s,M) + Σ_{s′∈C} pr(s,s′) · Pmax(s′ =⇒_C M)
                   = µ(s,M) + Pmax(s =⇒_C M) · Σ_{s′∈C} pr(s,s′)
                   = µ(s,M) + Pmax(s =⇒_C M) · µ(s,C)

from which we obtain: Pmax(s =⇒_C M) = µ(s,M)/(1 − µ(s,C)) = µC(s,M), which together with the above line of reasoning finishes the proof of claim (ii.). We next focus on the proof of claim (i.).

(i.) Consider an arbitrary equivalence class C ∈ Snil/R. We observe that claim (i.) follows immediately when |C| = 1, so the interesting case is when |C| > 1. So, assume that |C| > 1.

We first focus on the non-deterministic nodes in class C: assume that n ∈ C ∩ N. If Pmax(n =⇒_C M) ≠ 0, then either n −τ→ p for some p ∈ M or n −τ→ p for some p ∈ C ∩ P. In the first case, Pmax(n =⇒_C M) = 1 and therefore (by definition of branching bisimulation) for all other t ∈ C, Pmax(t =⇒_C M) = 1, and the

result follows. In the second case, Pmax(n =⇒_C M) = max{ Pmax(p =⇒_C M) | n −τ→ p and pRn }. In other words, for some p_n ∈ C ∩ P such that n −τ→ p_n, we find:

    Pmax(n =⇒_C M) = Pmax(p_n =⇒_C M).    (6)

Consequently, it suffices to investigate probabilistic nodes only. Let us assume that there is a node s in C with the highest maximal probability to reach M among the nodes in C, and that there is a node t ∈ C with a strictly smaller maximal probability to reach M. We will show that the assumption that such a node t exists leads to a contradiction. Formally, let us assume that

    ∃s ∈ C ∩ P : ∀s′ ∈ C : Pmax(s =⇒_C M) ≥ Pmax(s′ =⇒_C M)
                 ∧ ∃t ∈ C : Pmax(s =⇒_C M) > Pmax(t =⇒_C M)    (7)

Depending on the probability of s to leave the class C in one transition (i.e. the value of µ(s,C)) we distinguish three cases. We show that each case leads to a contradiction of assumption (7).

(= 0:) Assume that µ(s,C) = 0. From the definition of branching bisimulation we immediately obtain Pmax(s =⇒_C M) = µ(s,M), since every transition emanating from s leaves the class C. Since t ∈ C, again by the definition of branching bisimulation there is a scheduler σt ∈ Sched(t) such that P(Bσt(t =⇒_C M)) = µ(s,M), from which we obtain

    Pmax(t =⇒_C M) ≥ Pmax(s =⇒_C M).

But this leads to an immediate violation of assumption (7).

(≠ 0:) Assume that 0 < µ(s,C) < 1. Then:

    Pmax(s =⇒_C M) = µ(s,M) + Σ_{s′∈C, pr(s,s′)>0} pr(s,s′) · Pmax(s′ =⇒_C M)
                   ≤ µ(s,M) + Σ_{s′∈C} pr(s,s′) · Pmax(s =⇒_C M)
                   = µ(s,M) + µ(s,C) · Pmax(s =⇒_C M).

Hence, Pmax(s =⇒_C M) ≤ µ(s,M)/(1 − µ(s,C)) = µC(s,M). But by the definition of branching bisimulation and the fact that sRt we find that there is a scheduler σt ∈ Sched(t) such that P(Bσt(t =⇒_C M)) = µC(s,M). Hence, we obtain

    Pmax(t =⇒_C M) ≥ Pmax(s =⇒_C M).

This is again in contradiction with assumption (7).

(= 1:) Assume that µ(s,C) = 1. Then Pmax(s =⇒_C M) = Σ_{n: s⇝n} pr(s,n) · Pmax(n =⇒_C M). Now, assume that there is a node s′ ∈ C such that s ⇝ s′ and Pmax(s =⇒_C M) > Pmax(s′ =⇒_C M). Together with assumption (7) and using the fact that Σ_{n: s⇝n} pr(s,n) = µ(s,C) = 1, we immediately arrive at a contradiction:

    Pmax(s =⇒_C M) = Σ_{n: s⇝n} pr(s,n) · Pmax(n =⇒_C M)
                   < Σ_{n: s⇝n} pr(s,n) · Pmax(s =⇒_C M)
                   = Pmax(s =⇒_C M)

Therefore, we have that for all n ∈ C such that s ⇝ n:

    Pmax(s =⇒_C M) = Pmax(n =⇒_C M).    (8)

We continue by assuming that σ ∈ Sched(s) is a scheduler that yields Pmax(s =⇒_C M). We now consider the computation tree CT(s,σ). We will first show that all nodes from C that appear in


CT(s,σ) have the same maximal probability as s to (silently) reach M. Clearly, it is possible that not all nodes from C appear in CT(s,σ). In order to prove the claim for those nodes, we show that at least one probabilistic node s′ that appears in CT(s,σ) satisfies µ(s′,C) ≠ 1. Then the result follows from the previous two cases that we considered.

First we observe that for every node p that appears in CT(s,σ) the scheduler σ induces at least one scheduler σ_p ∈ Sched(p) satisfying:

    Pmax(p =⇒_C M) = P(Bσ_p(p =⇒_C M)).    (9)

Second, let p_n be a probabilistic node in CT(s,σ) with depth 3, that is, s ⇝ n and n −τ→ p_n for some n. (Note that this means that for a path s l n, the scheduler σ schedules transition n −τ→ p_n, that is, σ(s l n) = n −τ→ p_n.) We identify the node p_n in the graph with the node s l n τ p_n in the computation tree CT(s,σ). Clearly, P(Bσ_n(n =⇒_C M)) = P(Bσ_{p_n}(p_n =⇒_C M)). Moreover, from (9) it follows that Pmax(n =⇒_C M) = P(Bσ_n(n =⇒_C M)) and Pmax(p_n =⇒_C M) = P(Bσ_{p_n}(p_n =⇒_C M)). According to equation (8), we find that Pmax(s =⇒_C M) = Pmax(n =⇒_C M). From this, we can conclude that Pmax(s =⇒_C M) = P(Bσ_{p_n}(p_n =⇒_C M)) = Pmax(p_n =⇒_C M). Using induction on the depth of the probabilistic node in CT(s,σ), the claim that for every node s′ that appears in CT(s,σ), Pmax(s′ =⇒_C M) = Pmax(s =⇒_C M) can be shown to hold.

For the nodes that do not appear in CT(s,σ), we proceed as follows: if we assume that for every node s′ that appears in CT(s,σ), µ(s′,C) = 1 holds, then we obtain that Pmax(s =⇒_C M) = 0, which contradicts (7). Therefore, there is a node p that appears in CT(s,σ) with µ(p,C) < 1 and for which Pmax(p =⇒_C M) = Pmax(s =⇒_C M), as proven above. The conclusion follows from the previous analysis.

Summarising, we conclude that assumption (7) leads to a contradiction. Hence, we have proven that the following claim holds:

    ∀s, s′ ∈ C : Pmax(s =⇒_C M) = Pmax(s′ =⇒_C M).    (10)

□

Lemma 3.9 shows that all nodes in one equivalence class have the same maximal probability to reach another class M via a set of τ-paths. Moreover, this maximal probability equals the normalised cumulative probability of reaching that class. Henceforth, we use the notations Pmax(s =⇒_{[s]R} M) and µ_{[s]R}([s]R, M) interchangeably for all s ∈ Snil.

Proposition 3.10. Let R be a branching bisimulation relation on Snil and let C ∈ Snil/R. If there is a node n ∈ C such that n −a→ p for a ≠ τ, then µC(C,M) = 0 for all equivalence classes M ∈ Snil/R satisfying M ≠ C.

Proof. This follows from the definition of branching bisimulation. (Hint: we can conclude that for each q ∈ P ∩ C, if q ⇝ n′ then n′ ∈ C.) □

From this proposition, we immediately find the following lemma, which together with lemma 3.9 wraps up the proof for the implication part of theorem 3.8.

Lemma 3.11. Let R be a branching bisimulation relation on Snil. Let a ∈ Act be an action. If sRt, then Pmax(s =a⇒_{[s]R} M) = Pmax(t =a⇒_{[t]R} M).

Proof. From proposition 3.10 it follows that for every node s, all actions a ∈ Act and equivalence classes M, Pmax(s =a⇒_{[s]R} M) = 0 or Pmax(s =a⇒_{[s]R} M) = 1. The claim then follows immediately. □

Next, we focus on the proof of the converse implication of theorem 3.8. We repeat this part of the theorem as lemma 3.12.


Lemma 3.12. Let R be an equivalence relation on Snil. Then R is a branching bisimulation relation if for all nodes s, t ∈ Snil for which sRt holds, the following two conditions are met:

i. Pmax(s =a⇒_{[s]R} M) = Pmax(t =a⇒_{[t]R} M) for all a ∈ Act and M ∈ Snil/R.

ii. Pmax(s =⇒_{[s]R} M) = Pmax(t =⇒_{[t]R} M) for all M ∈ Snil/R \ {[s]R}.

Proof. Assume that R satisfies the conditions of the above lemma. We need to prove that R satisfies the two conditions of definition 2.9.

a. Let sRt, s ∈ N and s −a→ s′. Hence, Pmax(s =a⇒_{[s]R} [s′]R) = 1. From the first condition (i.) it follows that Pmax(t =a⇒_{[t]R} [s′]R) = 1 as well, which (directly) implies the correctness of the first condition of definition 2.9.

b. Let s ∈ P and sRt. We distinguish two cases:

b.1 Assume that µ(s,[s]R) = 1. Then µ(s,M) = 0 for all M ∈ Snil/R \ {[s]R}. We define a scheduler σ ∈ Sched(t) as σ(t) = ⊥. Then P(Bσ(t =⇒_{[t]R} M)) = 0 and therefore, µ_{[s]R}(s,M) = P(Bσ(t =⇒_{[t]R} M)) for all M ∈ Snil/R \ {[s]R}.

b.2 Assume that µ(s,[s]R) < 1 and M ∈ Snil/R \ {[s]R}. Then

    Pmax(s =⇒_{[s]R} M) = µ(s,M) + µ(s,[s]R) · Pmax(s =⇒_{[s]R} M)

from which we derive

    Pmax(s =⇒_{[s]R} M) = µ_{[s]R}(s,M).    (11)

Subcase b.2.1: Assume that there is an n ∈ [s]R ∩ N such that n −τ→ M. Since Pmax(n =⇒_{[n]R} M) = 1, it follows that Pmax(s =⇒_{[s]R} M) = 1 as well. Moreover, from (11) we have that µ_{[s]R}(s,M) = 1. Therefore, µ_{[s]R}(s,M′) = 0 for M′ ≠ M and M′ ≠ [s]R, since µ_{[s]R}(s,·) is a probability mass function over the set Snil \ [s]R (this probability mass function describes a discrete random variable representing the node reached in one probabilistic transition from s, under the condition that the class [s]R is left). Hence, using (11) we obtain Pmax(s =⇒_{[s]R} M′) = 0 for every M′ ∈ Snil/R \ {[s]R, M}. Then sRt implies Pmax(t =⇒_{[t]R} M) = 1 and Pmax(t =⇒_{[t]R} M′) = 0. We take a scheduler η ∈ Sched(t) such that P(Bη(t =⇒_{[t]R} M)) = Pmax(t =⇒_{[t]R} M); for this scheduler, P(Bη(t =⇒_{[t]R} M′)) = 0 for M′ ≠ M and M′ ≠ [s]R, since this holds for any scheduler in Sched(t). Thus η is the required scheduler in the second condition of definition 2.9.

Subcase b.2.2: We next analyse the case in which for all n ∈ [s]R ∩ N, n −τ→ s′ implies nRs′. We aim to show that all maximal probabilities Pmax(t =⇒_{[t]R} M), for M ∈ Snil/R \ {[s]R}, can be obtained by a single scheduler η ∈ Sched(t). This, together with (11), brings the proof to an end.

First, we sketch the approach we take. Given two arbitrary equivalence classes M1 (M1 ≠ [t]R) and M2 (M2 ≠ [t]R), we will show that if one scheduler induces the maximal probability Pmax(t =⇒_{[t]R} M1), then the same scheduler induces the maximal probability Pmax(t =⇒_{[t]R} M2). This procedure can be extended and generalised over all equivalence classes Mi ∈ Snil/R \ {[s]R} for which Pmax(t =⇒_{[t]R} Mi) > 0.

So, assume that M1, M2 ∈ Snil/R \ {[t]R} with Pmax(t =⇒_{[t]R} M1) > 0 and Pmax(t =⇒_{[t]R} M2) > 0. Let η1, η2 ∈ Sched(t) with P(Bη1(t =⇒_{[t]R} M1)) = Pmax(t =⇒_{[t]R} M1) and P(Bη2(t =⇒_{[t]R} M2)) = Pmax(t =⇒_{[t]R} M2). Now let us consider the computation trees CT(t,η1) and CT(t,η2). Note that both computation trees have t as root. Assume that schedulers η1 and η2 schedule the same transitions up to a node with depth k. In addition, we assume that all nodes with depth k for which η1 and η2 schedule differently are ordered by <k. Suppose that n_k1 is the least node (by the ordering) with depth k for which η1(c) ≠ η2(c),


where n_k1 occurs in both CT(t,η1) and CT(t,η2) due to a node c in both computation trees (this node c is a path c ∈ SPath(t,η1) ∩ SPath(t,η2) and hence a node in both computation trees, because we have assumed that both schedulers η1 and η2 schedule in the same way for all prefixes of the path c). Clearly, last(c) = n_k1. Moreover, n_k1 is a non-deterministic node, as η1 and η2 cannot schedule a probabilistic node differently! Let us assume η1(c) = n_k1 −τ→ p1 and η2(c) = n_k1 −τ→ p2. From our assumption we have that p1, p2 ∈ [t]R. Therefore,

    Pmax(p1 =⇒_{[t]R} M2) = Pmax(p2 =⇒_{[t]R} M2) = Pmax(t =⇒_{[t]R} M2).

Thus, there is a scheduler η(1)2 ∈ Sched(p1) for which we have P(Bη(1)2(p1 =⇒_{[t]R} M2)) = Pmax(p1 =⇒_{[t]R} M2). Now we have two schedulers: (η1−cτp1), which we denote by η(1)1, and η(1)2, both in Sched(p1). For these schedulers, we find Pmax(p1 =⇒_{[t]R} M1) = P(Bη(1)1(p1 =⇒_{[t]R} M1)) and Pmax(p1 =⇒_{[t]R} M2) = P(Bη(1)2(p1 =⇒_{[t]R} M2)), and η(1)1(c) = η(1)2(c) = n_k1 −τ→ p1 (note that we have preferred the transition scheduled by η1 over the transition scheduled by η2). Moreover, p1 is "closer" to M1 and M2 than the root t of CT(t,η1) and CT(t,η2), in the sense that all paths that start at t and reach M1 or M2 in CT(t,η1) or CT(t,η2), respectively, are finite. Note that the set of infinite paths that are part of the computation trees has probability measure 0, and hence, we do not need to consider those. The procedure continues by comparing the schedulers η(1)1 and η(1)2 in the same way we have done with η1 and η2. Remark that node p1 cannot be processed further until all non-deterministic nodes with depth k have been investigated for η1 and η2.

If transitions scheduled by η(m)1, m ≥ 1 (which are all basically induced by the scheduler η1), are always chosen over transitions by any other scheduler in consideration, we obtain that η1 induces the maximal probability Pmax(t =⇒_{[t]R} M2) as well. □

Proof (Theorem 3.8). Follows immediately from lemmata 3.9, 3.11 and 3.12. □

As a result of theorem 3.8, we find the following corollary, which states that branching bisimilarity is an

equivalence relation on the set of all graphs.

Corollary 3.13. ↔b is an equivalence relation on G. □

3.3 Deciding branching bisimulation

In this section, we build on the result obtained in the previous section. More concretely, we show that the alternative characterisation of branching bisimulation in terms of maximal probabilities is at the basis of a decision procedure for branching bisimulation. In line with the results obtained by Philippou et al. [18], we show that it suffices to consider a finite subset of all possible schedulers (given a graph). Whereas in [18] so-called determinate schedulers are introduced and used, we turn our attention to an even smaller class of schedulers, viz. the class of simple schedulers (see also section 2.2). Remark that the computation tree under a simple scheduler can always be represented by a fully-probabilistic graph, even when the computation tree itself may be of infinite size. This fact can be used to show that deciding branching bisimulation amounts to solving a linear optimisation problem.

We proceed as follows. First, the main theorem of this section is stated and proved, showing that among

the schedulers that induce maximal probabilities, there is always at least one simple scheduler.

Theorem 3.14. Let x be a graph. We denote the set of its nodes by S and Snil = S ∪ {nil}. Let R be a branching bisimulation relation on Snil. Let s ∈ Snil, a ∈ Act_τ and M ∈ Snil/R. Then there is a simple scheduler σ′ such that Pmax(s =a⇒_{[s]R} M) = P(Bσ′(s =a⇒_{[s]R} M)), when a ≠ τ or M ≠ [s]R.


Proof. We show that from a given scheduler σ with Pmax(s =a⇒_{[s]R} M) = P(Bσ(s =a⇒_{[s]R} M)), a scheduler σ′ can be derived such that Pmax(s =a⇒_{[s]R} M) = P(Bσ′(s =a⇒_{[s]R} M)) and which schedules the same transition for all paths that end in a node t (for any t). The case Pmax(s =a⇒_{[s]R} M) = 0 is trivial, so we assume that Pmax(s =a⇒_{[s]R} M) > 0.

Let R be a branching bisimulation relation on Snil. Assume that scheduler σ ∈ Sched(s) is such that Pmax(s =a⇒_{[s]R} M) = P(Bσ(s =a⇒_{[s]R} M)). Assume that node t appears in CT(s,σ). We distinguish between the case where t has finitely many occurrences and the case where t has infinitely many occurrences in the computation tree.

1. Suppose t has finitely many occurrences in CT(s,σ): then there is an occurrence t_c (i.e. t is due to c) in CT(s,σ) such that the appearance of t in the subtree of CT(s,σ) with root c is only due to c. In terms of B sets, B(σ−c)(t =a⇒_{[t]R} M) = B(σ−c)(t =a⇒_{[t]R−t} M), where (σ−c) is the scheduler in Sched(t) induced by σ, as described in section 2.2. Clearly, CT(t,(σ−c)) does not have any occurrence of t except its root. Now, we can define a scheduler σ′ ∈ Sched(s) that schedules the same transitions to all paths that end at t:

    σ′(c) = σ(c)           if t ∉ nodes(c)
    σ′(c) = (σ−c)(c′′)     if c ≡ c′ ◦ c′′ where first(c′′) = t and t ∉ nodes(rest(c′′))    (12)

Note that c′ may be t, in which case c′ ◦ c′′ = c′′.

2. Suppose that t has infinitely many occurrences in CT(s,σ). If there is a subtree of CT(s,σ) with root in some occurrence of t which does not contain any other occurrences of t, then we proceed in the same way as in the previous case.

Now assume that there is no such subtree of CT(s,σ). This means that for every occurrence of t in CT(s,σ) there is a path in CT(s,σ) starting in this occurrence of t that passes infinitely many times through t. Note that this does not mean that all paths starting in this occurrence of t have to pass infinitely many times through t. On the contrary, according to proposition 3.6 there is an occurrence t_c in CT(s,σ), such that P(B(σ−c)(t =a⇒_{[t]R−t} M)) > 0.

Now we focus our attention on the tree CT(t,(σ−c)). For short, let us denote (σ−c) by η. Let us enumerate all (different) occurrences t_ci, i > 0, in this tree, where t is due to c_i in CT(t,η). We define a function t-depth which to every occurrence t_ci in CT(t,η) assigns the number of times c_i passes through t, including the ending in t and excluding the starting in t (i.e. the root of CT(t,η)). Without loss of generality we can assume that CT(t,(η−c_i)) = CT(t,(η−c_j)) if t-depth(t_i) = t-depth(t_j). Thus we start with the computation tree CT(t,η). Then

    P(Bη(t =a⇒_{[t]R} M)) = P(Bη(t =a⇒_{[t]R−t} M)) + P(Bη(t =a⇒_{[t]R+t} M))
                          = P(Bη(t =a⇒_{[t]R−t} M)) + P(c_i) · P(B(η−c_i)(t =a⇒_{[t]R} M))

where (η−c_i) is the scheduler in Sched(t) induced by η, as described in section 2.2. Moreover, P(B(η−c_i)(t =a⇒_{[t]R} M)) = Pmax(t =a⇒_{[t]R} M) = P(Bη(t =a⇒_{[t]R} M)) = P(Bσ(s =a⇒_{[s]R} M)). Recall that c_i denotes the unique scheduled path from the root t to the occurrence t_ci with t-depth 1 in CT(t,η). (According to our assumption there is only one such occurrence; otherwise the second summand would be Σ_{t-depth(c_i)=1} P(c_i) · P(B(η−c_i)(t =a⇒_{[t]R} M)).) Let us denote α = P(B(σ−c)(t =a⇒_{[t]R−t} M)) and β = P(c_i). Note that β ≠ 1 since α > 0. Now we easily obtain that P(B(σ−c)(t =a⇒_{[t]R} M)) = α/(1 − β).

We proceed by defining a scheduler σ′ ∈ Sched(t) (which can easily be extended to a scheduler starting in s) that schedules the same transitions to all paths that end in t:

    σ′(c ◦ t ◦ c′) def= (σ−c)(t ◦ c′),    where t ∉ rest(c′)    (13)

Then

    P(Bσ′(t =a⇒_{[t]R} M)) = P(Bσ′(t =a⇒_{[t]R−t} M)) + P(Bσ′(t =a⇒_{[t]R+t} M))
                           = P(B(σ−c)(t =a⇒_{[t]R−t} M)) + P(c_i) · P(Bσ′(t =a⇒_{[t]R} M))

from which P(Bσ′(t =a⇒_{[t]R} M)) = α + β · P(Bσ′(t =a⇒_{[t]R} M)) and finally,

    P(Bσ′(t =a⇒_{[t]R} M)) = α / (1 − β).

With this we have shown that Pmax(t =a⇒_{[t]R} M) = P(Bσ′(t =a⇒_{[t]R} M)). □

As we already mentioned, the above result holds the key to a polynomial-time algorithm for deciding branching bisimulation.

The algorithm for deciding branching bisimulation is similar to the algorithm for deciding weak bisimulation described in [18]. Since the reader can find many details in that paper, we do not elaborate on those details here.

The technique employed by the algorithm is the well-known partitioning technique (which is also used in algorithms for deciding other bisimulation relations [12]). Starting from the trivial partition {Snil}, a sequence of partitions of Snil is generated, each finer than the previous one. The procedure is repeated until a partition that corresponds to branching bisimilarity is obtained. A partition is refined by means of a splitter.

Definition 3.15. Let Π be a partition of Snil. The tuple (C,a,M), where C ∈ Π, a ∈ Act_τ and M ∈ Π, is a splitter of Π if there are s, s′ ∈ C for which Pmax(s =a⇒_C M) ≠ Pmax(s′ =a⇒_C M), where a ≠ τ or M ≠ C.

In other words, a splitter (C,a,M) is found if the partition does not correspond to a branching bisimulation. Namely, the class C contains two nodes that do not have the same maximal probability to reach the class M by executing a. Thus, C should be split into at least two classes in the next partition. The main algorithm is given below. It calls several procedures that we explain afterwards.

Input:    ⟨N, P, s, Act, →, ⇝, pr⟩
Output:   Snil/↔b
Steps:
    Π := {Snil};
    (C,a,M) := FindSplit(Π);
    while C ≠ ∅ do
        Π := Refine(Π,C,a,M);
        (C,a,M) := FindSplit(Π)
    od;
    return Π

The procedure FindSplit(Π) finds and returns a splitter (C,a,M) of partition Π if one exists, and returns (∅,ε,∅) otherwise.

Input:    Π
Output:   (C,a,M)
Steps:
    for a ∈ Act_τ do
        for C ∈ Π do
            for M ∈ Π do
                for s,s′ ∈ C do
                    maxs := FindMax(s,a,M); maxs′ := FindMax(s′,a,M);
                    if maxs ≠ maxs′ return (C,a,M)
                od
            od
        od
    od;
    return (∅,ε,∅)

The function FindMax(s,a,M) computes the maximal probability to reach M from s by executing a. To this end, a variable Xτ_s is associated to each node s, and if a ≠ τ a variable Xa_s as well. The variables are bound by the following system of equations:

    Xa_s = 1                            if s ∈ Sn and s −a→ M
    Xa_s = Σ_{s π⇝ t} π · Xa_t          if s ∈ Sp and t ∈ [s]Π
    Xa_s = max{ Xa_t | s −τ→ t }        if s ∈ Sn and t ∈ [s]Π
    Xa_s = 0                            otherwise

    Xτ_s = 1                            if s ∈ M
    Xτ_s = Σ_{s π⇝ t} π · Xτ_t          if s ∈ Sp, s ∉ M and t ∈ [s]Π
    Xτ_s = max{ Xτ_t | s −τ→ t }        if s ∈ Sn, s ∉ M and t ∈ [s]Π
    Xτ_s = 0                            otherwise

As explained in [18], a solution of a system in this form can be found by solving a linear optimisation problem. Namely, for each equation of the form X = max{Xi | i ∈ I} a set of inequalities X ≥ Xi is introduced, and the optimisation problem then reduces to finding the minimum of the function Σ_{s∈S} (Xτ_s + Xa_s). This problem can be solved in polynomial time in the number of variables involved.
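To make the structure of the procedure concrete, here is a compressed, assumption-laden sketch in Python: it combines FindSplit and the refinement loop on a toy graph, and it stands in for the linear-optimisation FindMax with plain value iteration. The graph encoding, the action set and all node names are invented for this illustration; only the overall partition-refinement shape follows the algorithm above.

```python
# Sketch of the partition-refinement loop above on a toy graph.
# Value iteration stands in for the LP-based FindMax; the encoding
# and the example graph are assumptions of this illustration.
ndet = {"k": [("tau", "p")], "n": [("a", "done")],
        "m": [("b", "done")], "done": []}
prob = {"p": {"k": 0.2, "n": 0.4, "m": 0.4}}
NODES = ["k", "p", "n", "m", "done"]
ACTIONS = ["tau", "a", "b"]

def findmax(s, act, C, M, iterations=200):
    """Approximate Pmax(s =act=>_C M): tau-steps inside C, then
    (for act != tau) one act-step into M."""
    x = {v: 0.0 for v in NODES}
    if act == "tau":
        for v in M:
            x[v] = 1.0
    for _ in range(iterations):
        for v in NODES:
            if v not in C or (act == "tau" and v in M):
                continue
            if v in prob:
                x[v] = sum(pi * x[t] for t, pi in prob[v].items())
            else:
                opts = [x[t] for b, t in ndet[v] if b == "tau"]
                if act != "tau" and any(b == act and t in M
                                        for b, t in ndet[v]):
                    opts.append(1.0)
                x[v] = max(opts, default=0.0)
    return round(x[s], 9)

def refine_once(partition):
    for act in ACTIONS:
        for C in partition:
            for M in partition:
                if act == "tau" and M == C:
                    continue  # (C, tau, C) is never a splitter
                sig = {s: findmax(s, act, C, M) for s in C}
                if len(set(sig.values())) > 1:  # splitter found
                    groups = {}
                    for s, v in sig.items():
                        groups.setdefault(v, set()).add(s)
                    return ([D for D in partition if D != C]
                            + [frozenset(g) for g in groups.values()])
    return partition

partition = [frozenset(NODES)]
while True:
    refined = refine_once(partition)
    if refined == partition:
        break
    partition = refined
print(sorted(sorted(C) for C in partition))
```

On this toy graph, the loop splits off {n} and {m} (they enable different observable actions) and, plausibly in line with theorem 3.8, keeps k together with its probabilistic successor p, since both have the same maximal probability of silently reaching every other class.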

4 Colours and Blends

The focus of the previous section was on the understanding of the interplay between probabilities and

functionality, i.e. we looked at the notion of branching bisimulation from a rather quantitative point of view.

In the remaining sections of this paper, we investigate branching bisimulation from a different perspective,

more focused on the qualitative aspects of branching bisimulation.

We claimed, in our introduction (and repeated this in section 2, page 9), that one of the pleasing prop-

erties of branching bisimulation is that it preserves the potentials of a node, thereby preserving the non-

deterministic branching structure of the system. In the sections that will follow, we add weight to this

claim: we show how we can employ colours to code for the potentials and prove that the observation of the

colours of a node can be used to distinguish between inert transitions and non-inert transitions.

Before we commence, we provide the mathematical underpinnings and notations to facilitate mathematical reasoning about colours. Let C be a sufficiently large, but finite set of unique colours. A raw blend is a mix of colours in a particular ratio, i.e. a raw blend b is a bag of pairs (c,π) ∈ C × (0,1], with the sanity condition Σ_{(c,π)∈b} π = 1. The set of all raw blends is denoted Br, i.e. Br is a set of bags. In short, raw blends are built from fractions of colours that together add up to 1. Raw blends are necessarily represented by bags rather than sets, since we want to consider blends in which the same quantity of a colour appears more than once (e.g. for a colour c ∈ C, we want to allow the raw blend {(c,1/2),(c,1/2)}). Note that we use ordinary set notation for bags, as from the context it is always clear whether we are dealing with bags or sets.

The function probe : Br × C → [0,1], defined as b probe c def= Σ_{(c,π)∈b} π, yields the "weight" a colour c has in the raw blend b. To test whether a colour actually occurs in a blend, we introduce the predicate b↓c, which holds iff b probe c > 0. Thus, for a raw blend b and a colour c, the predicate b↓c is true iff the colour c occurs with a positive weight in blend b.

In the remainder of this paper, we use a subset of raw blends, simply called blends. A raw blend b is a blend iff for all colours c, b↓c implies (c, b probe c) ∈ b. In other words, a colour occurs only once in a blend. Alternatively, a blend can be seen as a partial function with domain C and co-domain (0,1], thus representing a distribution of colours. Let B be the set of blends. We have B ⊂ Br. Raw blends can be turned into blends using the operator ⊔ : Br → B. For a raw blend b, the blend ⊔(b) is given by the set

    ⊔(b) def= {(c, b probe c) | for all c satisfying b↓c}

For reasons of convenience, we freely interpret a blend consisting of a single element as a colour (i.e. we write b ∈ C iff |b| = 1), and a colour is interpreted as a blend (i.e. we think of the colour c as the blend {(c,1)}).

19

Page 20
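The definitions above can be made concrete in a few lines of Python. This is our own illustrative encoding (the names is_raw_blend, probe, occurs and flatten are ours, with occurs standing in for the predicate b ▷ c and flatten for the operator ♭), modelling bags with collections.Counter and exact weights with fractions.Fraction:

```python
from collections import Counter
from fractions import Fraction

# A raw blend: a bag (multiset) of (colour, weight) pairs whose weights sum
# to 1. We model bags with Counter; keys are (colour, weight) pairs.

def is_raw_blend(b: Counter) -> bool:
    """Sanity condition: all weights lie in (0,1] and sum to 1."""
    total = sum(w * n for (_, w), n in b.items())
    return all(0 < w <= 1 for (_, w) in b) and total == 1

def probe(b: Counter, c) -> Fraction:
    """Weight of colour c in raw blend b (written 'b probe c' in the text)."""
    return sum((w * n for (col, w), n in b.items() if col == c), Fraction(0))

def occurs(b: Counter, c) -> bool:
    """The predicate b ▷ c: colour c occurs with positive weight in b."""
    return probe(b, c) > 0

def flatten(b: Counter) -> Counter:
    """The operator ♭: merge duplicate colours, yielding a blend."""
    colours = {col for (col, _) in b}
    return Counter({(col, probe(b, col)): 1 for col in colours})

# The raw blend {(c,1/2),(c,1/2)} flattens to the blend {(c,1)}.
half = Fraction(1, 2)
raw = Counter({("c", half): 2})
assert is_raw_blend(raw)
assert flatten(raw) == Counter({("c", Fraction(1)): 1})
```

The multiplicity 2 in the Counter captures the bag nature of raw blends: the pair (c,1/2) occurs twice, which a plain set could not express.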

5 Concrete Coloured Traces

Information that can be obtained from any (reactive) system is trace information. By this, we mean a

sequence of actions that are observed during execution of the system.

Definition 5.1. A concrete trace, starting in a node s of a graph x, is a finite sequence of actions a1 a2 ... an (ai ∈ Actτ) for which there exists a finite path c with first(c) = s and trace(c) = a1 a2 ... an.

Note that both the probabilistic information and the non-deterministic branching structure are lost in such

traces. Hence, it may come as no surprise that an equivalence that is based on the comparison of the sets

of concrete traces of two systems is necessarily coarser (i.e. less discriminative) than strong bisimilarity.

We show that we can use colours and blends to recapture this information, and obtain a “decorated trace

equivalence” (in the sense of e.g. [2, 8, 15]) that coincides with strong bisimilarity. The colours can be used

to encode the potentials of the system in a node, while the blends can be used to encode the probabilistic

information. Graphs that are endowed with a colouring of their nodes are referred to as coloured graphs.

Definition 5.2. A coloured graph is a tuple ⟨x,γ⟩, where x is a graph and γ is a labelling function assigning blends or colours to the nodes of x.

We next consider “decorated traces” of a coloured graph. We assume that we can observe the colours and

blends of the nodes (but not the probabilistic and the non-deterministic branching structure of the graph).

In other words, by executing the system, we can observe sequences of blends, colours and actions. We

refer to such decorated traces as concrete coloured traces.

Definition 5.3. Let ⟨x,γ⟩ be a coloured graph. A concrete coloured trace, starting in a node s ∈ Snil, is a sequence of one of the following forms:

1. γ(s) is a concrete coloured trace for s = nil.

2. b′0 a1 b1 b′1 ... am+1 bm+1 when s ∈ N and there is at least one path c ≡ n0 a1 p1 ... nm am+1 pm+1 with trace(c) = a1 ... am+1, first(c) = s = n0 and, for all 1 ≤ i ≤ m+1, γ(pi) = bi and γ(ni−1) = b′i−1.

3. γ(s) u when s ∈ P, s ⇝ n for some n ∈ N and u is a concrete coloured trace starting in n.

In our coloured graphs, we use colours as an indication for the potentials of a node. This suggests that we should distinguish between informative colourings and non-informative colourings. We make the following observations:

1. in the non-probabilistic case colours suffice (see e.g. [3, 13]) to code for the potentials of a node.

2. for each node p ∈ P, the cumulative probability µ(p,M) can be seen as a function that assigns a value to each class M of a partition of the set of nodes. This roughly corresponds to the notion of a blend.

This leads us to consider a subset of coloured graphs in which non-deterministic nodes are labelled with

colours and probabilistic nodes are labelled with a blend that codes for the probability distributions over

successor nodes.

Definition 5.4. A properly coloured graph is a coloured graph ⟨x,γ⟩ where γ satisfies:

1. all nodes n ∈ Nnil are labelled with a colour γ(n) ∈ C.

2. all nodes p ∈ P are labelled with the blend ♭({(γ(n), pr(p,n)) | p ⇝ n}).

We say that the colouring of a coloured graph is proper to indicate that we are in fact dealing with a properly coloured graph.
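The blend required of a probabilistic node by clause 2 can be computed directly from its outgoing probability distribution; the following is a minimal sketch (the function name proper_blend and the dict-based blend representation are our own):

```python
from fractions import Fraction

def proper_blend(successors):
    """Blend demanded by definition 5.4 for a probabilistic node p:
    flatten the raw blend {(colour(n), pr(p,n)) | p ~> n} by merging
    duplicate colours, i.e. summing the probabilities per colour.
    `successors` is a list of (colour, probability) pairs."""
    blend = {}
    for colour, prob in successors:
        blend[colour] = blend.get(colour, Fraction(0)) + prob
    return blend

# Two successors with the same colour are merged into a single entry:
b = proper_blend([("blue", Fraction(1, 2)),
                  ("blue", Fraction(1, 4)),
                  ("yellow", Fraction(1, 4))])
assert b == {"blue": Fraction(3, 4), "yellow": Fraction(1, 4)}
```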


The assumption that we can use colours to code for the potentials in a graph is not immediately vindi-

cated. For instance, assigning the same colour to nodes from which different actions are possible conflicts

with the idea that colours code for the potentials of a node. To rule out such situations, we distinguish

between colourings that respect our assumption and those that violate our assumption. Colourings that re-

spect our assumption are referred to as consistent. Formally, given a set of graphs, we say that the colouring

of their nodes is consistent iff non-deterministic nodes have the same colour and probabilistic nodes have

the same blend only if they have the same concrete coloured trace sets.

Example 5.5. The graph x = ⟨{n},{p},p,{a}, n −a→ p, p ⇝ n, pr(p,n) = 1⟩, depicted in figure 4, can have many consistent colourings. For instance, the colouring γ that assigns the colour blue to all nodes is consistent and proper. The colouring γ that assigns the colour blue to node n and the "blend" yellow to p is consistent but not proper. Generalising, a coloured graph that is coloured using a trivial colouring, i.e. a colouring that assigns different colours to each node, is consistently coloured (but almost never properly coloured). The graph y = ⟨{n,m},{p,q},p,{a,b}, {n −a→ q, m −b→ p}, {p ⇝ n, q ⇝ m}, {pr(p,n) = 1, pr(q,m) = 1}⟩, depicted in figure 5, has a non-proper and non-consistent colouring γ, assigning blue to all nodes. The same graph also admits a proper and consistent colouring γ. For instance, take γ such that it assigns blue to nodes p and n, and yellow to nodes q and m.

Figure 4: Graph x.

Figure 5: Graph y.

Definition 5.6. Graphs x and y are concrete coloured trace equivalent, notation x ≡cc y, iff for some consistent, proper colouring γ, ⟨x,γ⟩ and ⟨y,γ⟩ have the same concrete coloured traces, or, equivalently, their root nodes have the same colour or blend.

Concrete coloured trace equivalence is an equivalence relation on graphs. In fact, we next establish a firm relation between concrete coloured trace equivalence and strong bisimilarity. First, we show that concrete coloured trace equivalence is at least as discriminating as strong bisimilarity, i.e. graphs that are strongly bisimilar are also concrete coloured trace equivalent.

Lemma 5.7. For all x and y, x ↔ y implies x ≡cc y.

Proof. Let x and y be graphs. We denote the union of their nodes by S, the union of their non-deterministic nodes by N and the union of their probabilistic nodes by P. We denote the union of S and the special termination node nil by Snil.

Assume x ↔ y. Let R be the largest strong bisimulation relation on Snil that only relates probabilistic nodes to probabilistic nodes and non-deterministic nodes to non-deterministic nodes; nil is related to itself. Let Γ: Snil/R → B be a total, injective mapping with the following two characteristics:

1. Γ(M) ∈ C when M ⊆ Nnil,

2. Γ(M) = ♭({(Γ(M′), µ(M,M′)) | µ(M,M′) ≠ 0}) when M ⊆ P.

This mapping is well-defined. Now, consider the coloured graphs x and y that are obtained by colouring all nodes with the colour of their equivalence classes. Formally, we define the coloured graphs ⟨x,γ⟩ and ⟨y,γ⟩, where γ is defined as γ(s) = Γ([s]R). By definition of Γ, γ yields properly coloured graphs. By construction, the root nodes of ⟨x,γ⟩ and ⟨y,γ⟩ have the same colour. Hence, it suffices to show that γ is a consistent colouring. We distinguish two cases.

1. First, we show that non-deterministic nodes that have the same colour also have the same sets of concrete coloured traces. Let n0, n1 ∈ N be two arbitrary nodes with γ(n0) = γ(n1). Then, by definition of γ and injectivity of Γ, we know that n0 R n1. Let t ≡ b′0 a1 b1 b′1 ... am bm be a concrete coloured trace starting in n0. Since b′0 a1 b1 is a concrete coloured subtrace of t, we know there is a p0 ∈ P with γ(p0) = b1 such that we have n0 −a1→ p0. By strong bisimilarity, we then also have n1 −a1→ p1 for some p1 with p0 R p1. Thus, γ(p0) = γ(p1), and b′0 a1 b1 is also a concrete coloured subtrace that starts in n1. Hence, it remains to show that when probabilistic nodes have the same colour, they also have the same sets of concrete coloured traces.

2. Next, we show that two probabilistic nodes with the same blend (or colour) also have the same sets of concrete coloured traces. Let p0, p1 ∈ P be arbitrary nodes with γ(p0) = γ(p1). Then, by definition of γ and injectivity of Γ, we know that p0 R p1. Let b0 b′0 a1 b1 b′1 ... am bm be a concrete coloured trace starting in p0. Since b0 ▷ b′0 (which follows from the definition of Γ), we know that there is a node n0 with γ(n0) = b′0, such that µ(p0,[n0]R) > 0. Since p0 R p1, it then follows that µ(p0,[n0]R) = µ(p1,[n0]R), and hence b0 b′0 is also a concrete coloured subtrace that starts in p1. By case 1, we then also know that b0 b′0 a1 b1 is a concrete coloured trace starting in p1. Repeating the above arguments m times, we find that also b0 b′0 a1 b1 b′1 ... am bm is a concrete coloured trace starting in p1.

Hence, we can conclude that the colouring γ is both proper and consistent. This means that we have x ≡cc y. □

Second, we show that strong bisimilarity is at least as discriminating as concrete coloured trace equivalence, i.e. graphs that are concrete coloured trace equivalent are also strongly bisimilar.

Lemma 5.8. For all graphs x and y, x ≡cc y implies x ↔ y.

Proof. Let x and y be graphs. We denote the union of their nodes by S, the union of their non-deterministic nodes by N and the union of their probabilistic nodes by P. We write Snil for S ∪ {nil}.

Let γ be a consistent colouring of the nodes Snil, such that the graphs ⟨x,γ⟩ and ⟨y,γ⟩ are properly coloured graphs, and assume that γ(sx) = γ(sy). Define the relation R on Snil as sRt iff γ(s) = γ(t), for all (s,t) ∈ N² and all (s,t) ∈ P². By definition, we have sx R sy. Thus, it suffices to show that R is a strong bisimulation relation. We proceed by showing that R satisfies the two conditions of definition 2.2.

1. Let n0, n1 ∈ Nnil such that n0 R n1. Assume that n0 −a→ p0 for some p0. Then we know that γ(n0) a γ(p0) is a concrete coloured trace starting in n0. Since γ is consistent, we know that γ(n0) a γ(p0) is also a concrete coloured trace starting in n1. In turn, this means that there is a node p1 with γ(p1) = γ(p0), such that n1 −a→ p1. Hence, by definition of R, also p0 R p1.

2. Let p0 ∈ Px and p1 ∈ Py such that p0 R p1. Let M ∈ Snil/R. Suppose that M ⊆ P. Then we immediately obtain µ(p0,M) = µ(p1,M) = 0. Thus, without loss of generality, we assume that M ⊆ Nnil. The properness of γ implies γ(M) ∈ C. Because p0 R p1, we also have γ(p0) = γ(p1), from which we immediately find that γ(p0) probe γ(M) = γ(p1) probe γ(M). Since γ is proper, we have γ(p0) probe γ(M) = ♭({(γ(n), pr(p0,n)) | p0 ⇝ n}) probe γ(M) = µ(p0,M). Likewise, we derive γ(p1) probe γ(M) = µ(p1,M). Thus, we have µ(p0,M) = µ(p1,M).

Hence, R is a strong bisimulation relation. □

The following theorem, stating that strong bisimilarity and concrete coloured trace equivalence are equally discriminating, is an immediate consequence of lemmata 5.7 and 5.8. This means that strong bisimilarity and concrete coloured trace equivalence both preserve potentials and probabilistic information. Moreover, this also proves that colours and blends can be used to code for the potentials of a system, a result that we can reuse in the setting with abstraction.

Theorem 5.9. For all graphs x and y, x ↔ y iff x ≡cc y.


6 Coloured Traces

In the previous section, we showed that colours and blends can fill in the missing information in concrete

traces, allowing us to define a trace-based equivalence that coincides with strong bisimulation. A natural

question is whether this feat can be repeated in a setting with abstraction. The results in this section answer

this question positively.

We start by making the following observation, which is crucial for our further reasoning: abstraction obscures the strict separation between probabilistic nodes and non-deterministic nodes. This is because unobservable events allow us to move between the two without notice.

Consider again the notion of coloured graphs of section 5 and the concrete coloured traces in such

graphs. To facilitate the comparison of potentials of non-deterministic nodes and probabilistic nodes, we

consider a variation on the concrete coloured traces of section 5, which we call pre-coloured traces.

Definition 6.1. Let ⟨x,γ⟩ be a coloured graph. A concrete coloured trace, starting in a node s ∈ Pnil, is also a pre-coloured trace starting in s. If t is a concrete coloured trace starting in a node s ∈ N, then γ(s) t is also a pre-coloured trace starting in s.

Note that a pre-coloured trace starting in a non-deterministic node n always starts with two occurrences

of the colour (or blend) of node n. This puts us in the position to compare decorated traces starting in

probabilistic nodes with those starting in non-deterministic nodes.

Pre-coloured traces still contain τ actions, which are intended to be unobservable. As we already argued in section 2.3, we cannot bluntly remove all τ actions from such pre-coloured traces without affecting the potentials (and thereby the behaviours) of a system. Intuitively, the idea of using colours (or blends) to code for these potentials indicates that by removing only those τ actions in a pre-coloured trace that are in between nodes with the same colour (or blend), we leave the potentials of the system unaffected. Pre-coloured traces from which these inert τ actions have been removed are called coloured traces.

Definition 6.2. A coloured trace starting in a node s is a finite sequence b0 b′0 a1 ... am bm, not ending with a subsequence b τ b⁴, that is obtained from a pre-coloured trace starting in node s in which all subsequences of the form b (b τ b)+ and (b τ b)+ b have been replaced with b. When m = 0, we require that b0 is the colour of the node nil.
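The replacement of inert τ steps can be pictured as a simple left-to-right scan over a pre-coloured trace. The following sketch is our own illustration (traces are lists alternating colours/blends and actions, with the string 'tau' standing for τ); it collapses every subsequence b τ b in one pass, a simplification that does not separately enforce the trailing-b τ b condition of the definition:

```python
def coloured_trace(pre):
    """Collapse inert tau steps in a pre-coloured trace: every subsequence
    b tau b (a tau between two identical colours/blends) is replaced by b,
    in the spirit of definition 6.2. `pre` alternates colours and actions."""
    out = [pre[0]]
    i = 1
    while i + 1 < len(pre):
        action, nxt = pre[i], pre[i + 1]
        if action == 'tau' and nxt == out[-1]:
            i += 2  # inert tau: drop it together with the repeated colour
        else:
            out.extend([action, nxt])
            i += 2
    return out

# b (b tau b)+ collapses to b; non-inert steps survive untouched:
assert coloured_trace(['b', 'tau', 'b', 'tau', 'b', 'a', 'c']) == ['b', 'a', 'c']
assert coloured_trace(['b', 'tau', 'c', 'a', 'b']) == ['b', 'tau', 'c', 'a', 'b']
```

Note that a τ between two differently coloured nodes is kept: such a step changes the potentials and is therefore observable through the colouring.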

Example 6.3. Consider the two coloured graphs of figure 6 (the colours and blends in these graphs are indicated by distinct patterns). Note that, as we saw in section 2.3, these two graphs are in fact branching bisimilar. The set C1 of concrete coloured traces of the right graph consists of the decorated traces over the actions a, b and c and the pattern-colours and pattern-blends of figure 6. From this set, we obtain the set C2 of pre-coloured traces of the right graph. Similarly, we can obtain an (infinite) set C3 of pre-coloured traces for the left graph, whose traces contain arbitrarily many repeated subsequences of the form b τ b, caused by the τ steps between identically coloured nodes in the left graph.

Finally, we can derive the sets of coloured traces from C2 and C3. Since there are no sequences of the form b (b τ b)+ or (b τ b)+ b in the set C2 of pre-coloured traces, the set C2 is also a set of coloured traces. Replacing the sequences of the form (b τ b)+ by b in the set of pre-coloured traces C3, we also obtain the set C2 of coloured traces. Thus, the sets of coloured traces that we can obtain from C3 and from C2 are the same. Note that the pre-coloured traces of the form b (b τ b)^i, for i > 0, in C3 cannot be reduced to the single blend b, as this blend is different from the colour of node nil. Therefore, this entire subset of pre-coloured traces does not contribute to any of the coloured traces.

4 Remark that the condition that a coloured trace does not end with the subsequence b τ b is required to ensure that the coloured trace does not end with a potentially inert τ step. If the τ step is not inert, then there must also be an extension of the coloured trace in which it appears.

Figure 6: Two coloured graphs.

As we observed earlier, the strict distinction between probabilistic nodes and non-deterministic nodes is

obscured. We suggested that this might happen when we can move silently from a non-deterministic node

to a probabilistic node. Now, recall the definition of a proper colouring of section 5. It requires that non-

deterministic nodes are coloured with colours and probabilistic nodes can be coloured with blends. Since

branching bisimulation allows us to move between both types of nodes without notice, we can no longer

assume that this strict colouring regime is sufficient for our purposes. This means that the definition of

a proper colouring, given in section 5 is too strict: the requirement that all non-deterministic nodes are

labelled with real colours must be weakened to cope with unobservable transitions.

Definition 6.4. A properly coloured graph is a coloured graph ⟨x,γ⟩ where γ satisfies:

1. a node n ∈ Nnil is labelled with a blend γ(n) ∉ C only if

(a) n −τ→ p for some p, and

(b) for all a ∈ Actτ and p ∈ Pnil, n −a→ p implies a = τ and γ(n) = γ(p).

2. all nodes p ∈ P are labelled with the blend ♭({(c, pr(p,n) · (γ(n) probe c)) | p ⇝ n and γ(n) ▷ c}).

We say that the colouring of a coloured graph is proper to indicate that we are dealing with a properly coloured graph. Next, we overload the notion of consistency as defined in section 5 as follows. For a set of coloured graphs, we say that the colouring that is used to colour the nodes of the graphs is consistent whenever two nodes have the same colour (or blend) only when they have the same coloured trace sets.
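Clause 2 now mixes the successor blends themselves, rather than plain colours as in definition 5.4. A sketch of the required computation (the name mixed_blend is ours; blends are dicts from colours to weights, and a plain colour is the singleton blend {colour: 1}):

```python
from fractions import Fraction

def mixed_blend(successors):
    """Blend demanded by definition 6.4 for a probabilistic node p: each
    successor n contributes pr(p,n) * (gamma(n) probe c) to every colour c
    occurring in gamma(n). `successors` is a list of (blend, probability)
    pairs."""
    out = {}
    for blend, prob in successors:
        for colour, weight in blend.items():
            out[colour] = out.get(colour, Fraction(0)) + prob * weight
    return out

# A successor carrying a proper blend is mixed in proportionally:
b = mixed_blend([({"blue": Fraction(1)}, Fraction(1, 2)),
                 ({"blue": Fraction(1, 2), "yellow": Fraction(1, 2)},
                  Fraction(1, 2))])
assert b == {"blue": Fraction(3, 4), "yellow": Fraction(1, 4)}
```

When every successor blend is a singleton colour, this computation degenerates to the one of definition 5.4, which is why the new notion of properness weakens the old one.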

Definition 6.5. Graphs x and y are coloured trace equivalent, notation x ≡c y iff for some consistent,

proper colouring γ, ?x,γ? and ?y,γ? have the same coloured traces, or, equivalently, their root nodes have

the same blend or colour.

Example 6.6. The graphs in figure 6 are coloured trace equivalent. In example 6.3, we showed that their sets of coloured traces match. Moreover, it is easy to see that the graphs are consistently and properly coloured if we assume that the patterned blend in figure 6 equals ♭({(c1, 1/2), (c2, 1/2)}) for two of the pattern-colours c1 and c2 of figure 6, and that the patterns used in figure 6 are indeed colours in C.


Coloured trace equivalence is an equivalence relation on graphs. The following theorem states that two graphs are branching bisimilar if and only if they are coloured trace equivalent. First, we prove several propositions and two lemmata that together form the basis for this theorem.

For the remainder of this section, we consider two arbitrary graphs x and y. We denote the union of their nodes by S, the union of their non-deterministic nodes by N and the union of their probabilistic nodes by P. We write Snil for S ∪ {nil}. When we assume x ↔b y, we take R to be the largest branching bisimulation relation relating (the nodes of) x and y. Let Γ: Snil/R → B be an injective, total function satisfying:

1. Γ(M) ∈ C when for all classes M′ ≠ M, we have µM(M,M′) = 0.

2. Γ(M) = ♭({(c, w · (Γ(M′) probe c)) | for all M′ ≠ M with Γ(M′) ▷ c and w = µM(M,M′) > 0}).

We refer to Γ as an equivalence class colour-coding for the branching bisimilar graphs x and y.

Proposition 6.7. The equivalence class colour-coding function Γ is well-defined.

Proof. Showing that each equivalence class colour-coding function Γ is well-defined requires showing that its recursive definition has a unique solution. We make the following observations.

1. There is at least one equivalence class to which we can assign a colour.⁵

2. Each equivalence class to which a blend is assigned depends on a finite number of other classes that are either assigned blends or colours. Given that there are only finitely many classes, we can represent Γ by a dependency matrix. This matrix can be interpreted as a Markov Chain with absorbing states [16] (which correspond to the colours that have been assigned). The absorption probability for ending up in a particular absorbing state then corresponds to the weight a particular colour has in the blend.

We formalise the above observations in some detail. Let N be an equivalence class to which we want to assign a blend. We construct a Markov Chain MCN for the equivalence class N as follows:

1. for each equivalence class M ∈ Snil/R, there is a state sM in the Markov Chain MCN. The state sN is the initial state.

2. a state sM is an absorbing state when its corresponding class M satisfies µM(M,M′) = 0 for all classes M′ ≠ M.

3. a state sM is a transient state when its corresponding class M satisfies µM(M,M′) > 0 for some class M′ ≠ M.

4. for each transient state sM, we have a transition sM → sM′ iff there is a probabilistic node p ∈ M, such that p ⇝ n for some node n ∈ M′. The probability assigned to this transition is µM(M,M′).

5. every state not reachable from sN is removed, together with its outgoing transitions.

Note that when sM is a transient state in MCN, then MCM is a "sub-chain" of MCN. Moreover, if in MCN the state sN is reachable from the state sM, then the Markov Chains MCN and MCM are the same. The Markov Chain MCN is a finite Markov Chain with absorbing states. Finding the absorption probabilities for such a Markov Chain boils down to solving a system of linear equations. As already observed, these absorption probabilities are exactly the weights a colour has in a blend. In [16] it is shown that these absorption probabilities can always be computed for finite Markov Chains. All absorption probabilities together make up the blends. □

5 The reason for this is as follows: if there is no node n ∈ N with n −a→ for some a ∈ Act, then the entire graph is branching bisimilar to nil. Yet, if there is an n ∈ N with n −a→ p (for some p ∈ Pnil and a ∈ Act), then we have µ[n]R([n]R,[p]R) = 0 (when [n]R ≠ [p]R) or µ[n]R([n]R,[p]R) = 1 (when [n]R = [p]R). In both cases, the equivalence class colour-coding will assign a colour to the class [n]R.
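The computation invoked in the proof above — absorption probabilities of a finite Markov chain with absorbing states, obtained by solving a linear system — can be carried out with exact arithmetic. The following is a minimal sketch (function name and encoding are ours; the chain is assumed to reach absorption from the start state, as the chains MCN constructed above do):

```python
from fractions import Fraction

def absorption_probabilities(P, absorbing, start):
    """Probability of eventual absorption in each absorbing state, starting
    from `start`. P maps each state to a dict of successor probabilities
    (Fractions). Solves, per absorbing state a, the linear system
    h(s) = sum_t P[s][t] * h(t) over the transient states, with h(a) = 1
    and h(b) = 0 for every other absorbing state b."""
    transient = [s for s in P if s not in absorbing]
    idx = {s: i for i, s in enumerate(transient)}
    result = {}
    for a in absorbing:
        n = len(transient)
        A = [[Fraction(0)] * n for _ in range(n)]
        rhs = [Fraction(0)] * n
        for s in transient:
            i = idx[s]
            A[i][i] += 1                      # build (I - Q) x = r
            for t, p in P[s].items():
                if t in idx:
                    A[i][idx[t]] -= p
                elif t == a:
                    rhs[i] += p
        for col in range(n):                  # exact Gaussian elimination
            piv = next(r for r in range(col, n) if A[r][col] != 0)
            A[col], A[piv] = A[piv], A[col]
            rhs[col], rhs[piv] = rhs[piv], rhs[col]
            inv, rhs[col] = A[col][col], rhs[col] / A[col][col]
            A[col] = [v / inv for v in A[col]]
            for r in range(n):
                if r != col and A[r][col] != 0:
                    f = A[r][col]
                    A[r] = [v - f * w for v, w in zip(A[r], A[col])]
                    rhs[r] -= f * rhs[col]
        result[a] = rhs[idx[start]] if start in idx else Fraction(start == a)
    return result

# A state looping with probability 1/3 is absorbed in 'a' and 'b' evenly:
third = Fraction(1, 3)
P = {"s": {"s": third, "a": third, "b": third}, "a": {}, "b": {}}
probs = absorption_probabilities(P, {"a", "b"}, "s")
assert probs == {"a": Fraction(1, 2), "b": Fraction(1, 2)}
```

The absorption probabilities returned for the chain MCN are exactly the weights of the colours in the blend Γ(N).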


Define the colouring γ: Snil → B as γ(s) = Γ(M) iff s ∈ M. Henceforth, we refer to this colouring γ as an equivalence class colouring.

Proposition 6.8. The equivalence class colouring γ induces properly coloured graphs ⟨x,γ⟩ and ⟨y,γ⟩.

Proof. To show that γ is a proper colouring, we proceed as follows.

1. Let n ∈ Nnil be a non-deterministic node or nil, and suppose γ(n) ∉ C.

(a) By definition of γ, we find that there is a class M′ ≠ [n]R, for which µ[n]R([n]R,M′) > 0. Since n ∈ Nnil, this can only be the case when n −τ→ p for some p ∈ [n]R.

(b) Let a ∈ Actτ and p ∈ Pnil, and assume n −a→ p. Let M ≠ [n]R be a class for which µ[n]R([n]R,M) > 0. By definition of γ and Γ, at least two such classes exist, hence also µ[n]R([n]R,M) < 1. This means that there must be a node p′ ∈ [n]R, such that 1 > pr(p′,M) > 0. Since R is a branching bisimulation relation, this means that n −a→ p can be mimicked by p′ with probability 1. But this is only possible by a scheduler σ that schedules σ(p′) = ⊥, and when a = τ and p ∈ [n]R, i.e. γ(n) = γ(p).

2. Let p ∈ P, and let (c,π) ∈ γ(p) be a part of the blend or colour of p. This means that π > 0. By definition of the equivalence class colour-coding Γ and the operator ♭, we find the following relation between c, π and p:

π = Σ_{M ≠ [p]R} µ[p]R([p]R,M) · (Γ(M) probe c)    (14)

The formula µ[p]R([p]R,M) represents the maximal probability of reaching M via nodes in [p]R using silent transitions only (see also section 3, where we established a correspondence between the normalised cumulative probability and the schedulers inducing maximal probabilities). It can be defined in terms of the maximal probability of reaching M from all nodes n that can be reached from p in one step. We use this observation to rewrite equation (14) to:

π = Σ_{M ≠ [p]R} ( Σ_{p ⇝ n, [n]R = [p]R} pr(p,n) · µ[n]R([n]R,M) · (Γ(M) probe c) + Σ_{p ⇝ n, [n]R = M} pr(p,n) · (Γ(M) probe c) )    (15)

Reordering the quantification over M, and substituting M for [n]R in the second part of equation (15), we find the following equivalence:

π = Σ_{p ⇝ n, [n]R = [p]R} pr(p,n) · Σ_{M ≠ [n]R} µ[n]R([n]R,M) · (Γ(M) probe c) + Σ_{p ⇝ n, [n]R ≠ [p]R} pr(p,n) · (Γ([n]R) probe c)

From this, we immediately find that

π = Σ_{p ⇝ n, [n]R = [p]R} pr(p,n) · (γ(n) probe c) + Σ_{p ⇝ n, [n]R ≠ [p]R} pr(p,n) · (γ(n) probe c)    (16)

Simplifying equation (16), we find π = Σ_{p ⇝ n} pr(p,n) · (γ(n) probe c), which means that the probabilistic node p is coloured properly. □

Proposition 6.9. All equivalence class colourings γ for branching bisimilar graphs x and y are consistent.

Proof. We must show that for all nodes s,t ∈ Snil satisfying γ(s) = γ(t), the sets of coloured traces of s and t are the same.

So, let s,t ∈ Snil be arbitrary nodes, and suppose γ(s) = γ(t). Let η ≡ b0 b′0 a1 ... am bm be a coloured trace starting in s. This coloured trace must come from a pre-coloured trace

ζ ≡ b0 (b0 τ b0)^k0 (b′0 τ b′0)^l0 b′0 a1 ... am bm (bm τ bm)^lm

At this point, we must distinguish between the case when s ∈ P and s ∉ P. We only investigate the former; the latter case can be treated similarly. Assume s ∈ P. Then the pre-coloured trace ζ must come from a path c for which we have

c sat s0 =⇒[s0]R s′0 ⇝ s″0 =⇒[s″0]R s‴0 −a1→ s1 ... s‴m−1 −am→ sm

for some si, s′i, s″i and s‴i satisfying γ(si) = γ(s′i) = bi and γ(s″i) = γ(s‴i) = b′i for all i ≤ m. Since γ(s) = γ(t), we find that sRt. By repeatedly applying the definition of branching bisimulation, we also find that there must be a path c′ for which we have either:

c′ sat t0 =⇒[t0]R t′0 ⇝ t″0 =⇒[t″0]R t‴0 −a1→ t1 ... t‴m−1 −am→ tm

or (only when b0 = b′0):

c′ sat t0 =⇒[t0]R t‴0 −a1→ t1 =⇒[t1]R t′1 ⇝ t″1 ... t‴m−1 −am→ tm

for some ti, t′i, t″i and t‴i satisfying si R ti R t′i and s″i R t″i R t‴i. This means that also ζ is a pre-coloured trace starting in t, and thus η is a coloured trace starting in t. □

Lemma 6.10. For all graphs x and y, x ↔b y implies x ≡c y.

Proof. Let x and y be graphs. Assume that x ↔b y. Then, using propositions 6.7, 6.8 and 6.9, we find that there is a proper and consistent colouring γ for the graphs x and y, such that ⟨x,γ⟩ and ⟨y,γ⟩ have the same sets of coloured traces. □

Let x be a graph. Let M ⊆ M′ ⊆ Snil be two non-empty sets of nodes. We define the distance function |s|^{M′}_M, which yields the minimal number of steps (probabilistic transitions and non-deterministic τ-transitions) that is required to reach a node in M via nodes in M′, starting in s ∈ M′:

|s|^{M′}_M = 0 if s ∈ M
|s|^{M′}_M = ∞ if s ∉ M′
|s|^{M′}_M = 1 + min_{s ⇝ n} |n|^{M′}_M if s ∈ (P ∩ M′) \ M    (17)
|s|^{M′}_M = 1 + min_{s −τ→ p} |p|^{M′}_M if s ∈ (N ∩ M′) \ M

Note that we take as a convention that when there is no s ⇝ n (and, analogously, when there is no s −τ→ p), then min_{s ⇝ n} |n|^{M′}_M yields ∞.
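Because every step contributes a cost of 1, the recursion (17) is an ordinary shortest-path problem and can be computed by breadth-first search. A sketch (our naming; in the alternating model each node has either probabilistic successors or τ-successors, so a single successor function suffices):

```python
from collections import deque

def distance(s, M, Mp, succ):
    """The distance |s| of equation (17): the minimal number of steps
    (probabilistic transitions and non-deterministic tau-transitions)
    needed to reach a node in M from s, travelling through nodes of M'
    (here Mp) only; M is assumed to be a subset of Mp. succ(n) returns
    the successors of n. Returns float('inf') when no path exists."""
    if s not in Mp:
        return float('inf')
    seen, queue = {s}, deque([(s, 0)])
    while queue:
        node, d = queue.popleft()
        if node in M:
            return d
        for nxt in succ(node):
            if nxt in Mp and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float('inf')

# A two-step tau-chain 1 -> 2 -> 3 towards M = {3}:
edges = {1: [2], 2: [3], 3: []}
assert distance(1, {3}, {1, 2, 3}, lambda n: edges[n]) == 2
assert distance(1, {3}, {1, 3}, lambda n: edges[n]) == float('inf')
```

The second assertion illustrates the ∞ case: once node 2 is excluded from M′, no path through M′ remains.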

Lemma 6.11. For all graphs x and y, x ≡c y implies x ↔b y.

Proof. Assume x ≡c y. Then there is a consistent and proper colouring γ of x and y, such that x and y have the same set of coloured traces. We show that there is a branching bisimulation relation R such that sx R sy. Define R as sRt iff γ(s) = γ(t). By definition of coloured trace equivalence, we have sx R sy. It thus suffices to show that R satisfies the requirements for a branching bisimulation relation.

1. Let s ∈ N be a non-deterministic node and assume that for some t ∈ Snil, we have sRt. Suppose s −a→ s′ for some a and s′. We distinguish two cases.

(a) Suppose a = τ and s′ ∈ [s]R. It suffices to show that there is a scheduler σ, such that P(Bσ(t =⇒[t]R [t]R)) = 1. This is readily achieved by the scheduler σ(t) = ⊥.

(b) Suppose a ≠ τ or s′ ∉ [s]R. It suffices to show that there is a scheduler σ, such that P(Bσ(t =a⇒[t]R [s′]R)) = 1.

Let M be the set of nodes {t″ | t″Rt, t″ −a→ t′, t′Rs′}. Obviously, M ⊆ [s]R. Since γ is proper, we find that γ(s) ∈ C (this follows immediately from a ≠ τ or s′ ∉ [s]R). Next, let σ ∈ Sched(t) be a scheduler that satisfies the following conditions:

σ(c) = last(c) −a→ t′ if γ(last(c)) = γ(t) and γ(t′) = γ(s′)
σ(c) = last(c) −τ→ u if γ(last(c)) = γ(t) and γ(u) = γ(t) and for all t″ with last(c) −τ→ t″ we require |t″|^{[s]R}_M ≥ |u|^{[s]R}_M and γ(t″) ≠ γ(s′)
σ(c) = ⊥ if γ(last(c)) ≠ γ(t)

Using the consistency of γ, we know that there is a pre-coloured trace γ(s) (γ(s) τ γ(s))^k a γ(s′) starting in t, and, hence, there is a path from t through [t]R to a node in M. Therefore, P(Bσ(t =a⇒[t]R [s′]R)) > 0. It suffices to prove that also P(Bσ(t =a⇒[t]R [s′]R)) = 1. But this follows immediately from the fact that all probabilistic nodes p in paths c ∈ Bσ(t =a⇒[t]R [s′]R) are coloured with γ(s) ∈ C. The properness of γ ensures that we then stay in the class [s]R (= [t]R) with probability 1 before executing a and entering a class with colour γ(s′).

2. Let s ∈ P be a probabilistic node and assume that for some t ∈ Snil, we have sRt. We distinguish three cases.

(a) Suppose µ[s]R(s,M) = 0 for all M ≠ [s]R. It suffices to show that there is a scheduler σ ∈ Sched(t), such that P(Bσ(t =⇒[t]R M)) = µ[s]R(s,M) for all M ≠ [s]R. This is readily achieved by the scheduler σ(t) = ⊥.

(b) Suppose there is a unique class M ≠ [s]R, such that µ[s]R(s,M) = 1. Since γ is a proper colouring, this implies that γ(s) = γ(s′) for all nodes s′ ∈ M. But this cannot be, since it implies that s′Rs, which contradicts M ≠ [s]R. Hence, there cannot be a class M ≠ [s]R for which µ[s]R(s,M) = 1.

(c) Suppose there is a class M ≠ [s]R, such that 1 > µ[s]R(s,M) > 0. Let N ⊆ [t]R be the set of nodes from which there is a probabilistic transition leaving class [t]R in one step, i.e. N = {t″ | t″ ⇝ t′, t″Rt, t′ ∉ [t]R}. Let |u|^{[t]R}_N again denote the minimal distance from node u to a node in N. Let σ ∈ Sched(t) be a scheduler that satisfies the following conditions:

σ(c) = last(c) −τ→ u if γ(last(c)) = γ(t) and γ(u) = γ(t) and for all t″ such that last(c) −τ→ t″ we require |t″|^{[t]R}_N ≥ |u|^{[t]R}_N
σ(c) = ⊥ if γ(last(c)) ≠ γ(t)

Note that there is in general not a single scheduler that is determined by the above conditions, as there may at some point be more than one node that has a minimal distance from a node in N. However, each such σ is such that µ[s]R(s,M) = P(Bσ(t =⇒[t]R M)) for all M ≠ [s]R, due to the properness of the colouring γ.

Hence, relation R satisfies the conditions for branching bisimulation, and we have sx R sy. Thus, we find x ↔b y. □

Theorem 6.12. For all x and y, x ↔b y iff x ≡c y.

Proof. Follows immediately from lemma 6.10 and lemma 6.11. □

7 Related Work

The literature reports on two approaches for modelling reactive probabilistic systems. The first approach

is the model of probabilistic (simple) automata (often called the non-alternating model), which was intro-

duced in [20, 19]. The second approach, based on the Concurrent Markov Chains of [22], is that of the


alternating model, which was introduced in [14] by Hansson. The theory outlined in this paper is based on

this latter model.

One might argue that the differences between both models are fairly insignificant, and, up to a certain

point, this is true: as shown in [7], the two models do not differ up to strong bisimulation. However, when

we consider equivalence relations that are sensitive to internal activities, this picture suddenly changes. For

instance, in [7], Segala and Bandini show that weak bisimilarity for the alternating model (defined in [18])

and weak bisimilarity for the non-alternating model (as defined in [19, 20]) are incomparable. We briefly

review the relevant literature and place our contribution and motivation in perspective.

Alternating model vs. non-alternating model.

the notion of branching bisimulation in the non-alternating setting, as defined by Segala and Lynch [19, 20]

we find that their notion is more restrictive. This is illustrated by the following example. Consider the two

graphs of figure 3 (see section 2.3, page 10). In contrast with our notion of branching bisimulation, we find

that these two graphs are not related by branching bisimulation in the non-alternating model. The reason is

obvious: k appears as a node in the “non-alternating” counterpart of the left graph and it cannot be related

to any node in the “non-alternating” counterpart of the right graph. A variation of branching bisimulation

called delay branching bisimulation, which is defined by Stoelinga [21], exhibits the same phenomenon.

In this paper, we showed that our definition of branching bisimilarity satisfies the properties originally

attributed to it (by following the approach as laid out by Van Glabbeek and Weijland [13] in the non-

probabilistic case, see section 5 and section 6). We therefore believe that the definition of branching

bisimulation in the non-alternating setting may be incomplete and further research is required to solve this

issue.

Note that the so-named combined version of branching bisimulation in [20] relates processes that are not

related by our branching bisimulation (but still not the ones from figure 3). This means that our branching

bisimulation and the combined version of branching bisimulation are incomparable. Further investigations

along the lines of [7] are needed to fully explore all differences. This, however, is beyond the scope of this

paper.


Branching bisimilarity vs. weak bisimilarity. When we compare our definition of branching bisimilarity
with weak bisimilarity as defined by Philippou et al. [18], we find that branching bisimilarity is strictly

finer (although there is a big overlap in systems that are both, and for some classes of systems such as

fully probabilistic systems [5], it is known that both equivalences coincide). This is due to the fact that

branching bisimilarity preserves the (non-deterministic) branching structure of a system, whereas weak

bisimilarity does not, which is also the case in the non-probabilistic setting. Note that in [11] a logic in the

PCTL∗ style is defined and the soundness and completeness properties of the equivalence relation induced

by the logic are proven with respect to the weak bisimulation of [18]. Having in mind the results in the

non-probabilistic setting, saying that CTL∗ without the next operator corresponds to branching bisimulation

(e.g. [10]), the result in [11] may suggest that weak and branching bisimulation for the alternating model

(for systems with both probabilistic and non-deterministic behaviour) do coincide. However, this is not

the case: the soundness and completeness results in [11] are due to the “non-standard” semantics given

to the PCTL∗-like operators. Namely, the path formulas are not interpreted on paths but on behaviours:
informally, a behaviour is the observable part of one path. Clearly, with this interpretation the logic cannot
detect the change of the potentials, which is the essential point that distinguishes weak and

branching bisimulation.

Below, we give two examples to illustrate the differences between weak and branching bisimulation.

The first example shows that two non-probabilistic systems, encoded as graphs, are weak bisimilar but not

branching bisimilar (other examples of this can also be found in Van Glabbeek and Weijland [13]).


Example 7.1. Consider the two graphs of Fig. 7. These graphs encode two non-probabilistic systems (i.e.

only the trivial probability 1 appears). Using the definition of weak bisimulation [18], one can easily check

that both graphs are weak bisimilar. However, the graphs are not branching bisimilar, or, equivalently, there

is no proper consistent colouring of the two graphs such that both have the same set of coloured traces.



Figure 7: Two weak bisimilar graphs (representing non-probabilistic systems) that are not branching bisimilar.
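The distinction in this example can also be checked mechanically. The sketch below implements naive greatest-fixpoint checks for (non-probabilistic) weak and branching bisimilarity and applies them to the classic pair a.(c + τ.b) + a.b versus a.(c + τ.b). The encoding of the two graphs is an assumption (the figure itself is not reproduced here), as are all names in the code; the verdicts agree with the example: the systems are weak bisimilar but not branching bisimilar.

```python
from itertools import product

TAU = "tau"

def tau_closure(lts, s):
    """States reachable from s via zero or more tau-steps."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, v in lts.get(u, []):
            if a == TAU and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bisimilar(lts, s0, t0, branching):
    """Naive greatest-fixpoint check for weak (branching=False) or
    branching (branching=True) bisimilarity on a finite LTS."""
    states = set(lts)
    R = set(product(states, states))

    def matches(s, t):
        # every move of s must be matched by t, modulo R
        for a, s1 in lts.get(s, []):
            if a == TAU and branching and (s1, t) in R:
                continue  # branching: a tau-step matched by stuttering
            cands = []
            for t1 in tau_closure(lts, t):
                if branching and (s, t1) not in R:
                    continue  # branching: intermediate state must relate to s
                for b, t2 in lts.get(t1, []):
                    if b != a:
                        continue
                    if branching:
                        cands.append(t2)
                    else:
                        cands.extend(tau_closure(lts, t2))  # weak: trailing taus
            if a == TAU and not branching:
                cands.extend(tau_closure(lts, t))  # tau matched by zero or more taus
            if not any((s1, t2) in R for t2 in cands):
                return False
        return True

    changed = True
    while changed:
        changed = False
        for s, t in list(R):
            if (s, t) in R and not (matches(s, t) and matches(t, s)):
                R.discard((s, t))
                R.discard((t, s))
                changed = True
    return (s0, t0) in R

# Hypothetical encoding of two graphs in the spirit of Fig. 7:
# P = a.(c + tau.b) + a.b   and   Q = a.(c + tau.b)
lts = {
    "P": [("a", "C"), ("a", "B")],
    "Q": [("a", "C")],
    "C": [("c", "0"), (TAU, "B")],
    "B": [("b", "0")],
    "0": [],
}
```

The τ-step out of state C silently discards the option c; weak bisimilarity tolerates this because it places no condition on intermediate states, whereas the branching condition (s, t1) ∈ R detects the change of potentials.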

The next example shows that weak bisimilarity and branching bisimilarity do not only differ for non-

probabilistic systems, but that they also differ for real probabilistic systems.

Example 7.2. Consider the two graphs of Fig. 8. Using the definition of weak bisimulation [18], it easily

follows that the two graphs are weak bisimilar. Based on our definition of branching bisimulation, we find

that the two graphs are not equivalent. This is because in the right graph, after executing an action a, there

is still a possibility of executing action c, unlike in the right branch of the left graph.


Figure 8: Two weak bisimilar graphs (representing probabilistic systems) that are not branching bisimilar.

Decidability. Finally, we find that no extensive study on the decidability and complexity of branching
bisimulation has been conducted. To date, no algorithm for deciding branching bisimilarity (in the

non-alternating model) has been defined, whereas our notion can be decided in polynomial time (see sec-

tion 3.3). Deciding weak bisimilarity in the alternating setting can be achieved in polynomial time [18],

whereas the best known algorithm for deciding weak bisimilarity in the non-alternating model as defined

in [20] is exponential [9]. Only a finer variant of weak bisimulation (for the non-alternating model), called

weak delay bisimulation [21, 6], is decidable in polynomial time [6].



8 Summary

We defined the notion of branching bisimulation for the strictly alternating model of Hansson [14] for

probabilistic systems. We showed that it preserves the non-deterministic branching structure of a system

by defining an alternative equivalence, called coloured trace equivalence, that clearly satisfies this prop-

erty, and we subsequently showed that the two equivalences coincide. Furthermore, we showed that the

branching bisimulation conditions can be rephrased to conditions that use schedulers which induce maxi-

mal probabilities.
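The role of schedulers that induce maximal probabilities can be made concrete: for finite-state systems, the maximal probability under all schedulers of reaching a set of states is computable by standard value iteration (or linear programming). The sketch below is a generic illustration of this standard technique for an MDP-style encoding; it is not the decision procedure of section 3.3, and the function and state names are assumptions.

```python
def max_reach(mdp, targets, iters=1000):
    """Value iteration for the maximal probability, over all schedulers,
    of eventually reaching a state in `targets`.
    mdp maps each state to a list of distributions, a distribution being
    a list of (probability, successor) pairs; an empty list marks a
    state without outgoing transitions."""
    V = {s: 1.0 if s in targets else 0.0 for s in mdp}
    for _ in range(iters):
        V = {s: 1.0 if s in targets
             else max((sum(p * V[t] for p, t in d) for d in dists), default=0.0)
             for s, dists in mdp.items()}
    return V

# Toy example: in s0 the scheduler picks the distribution that
# maximises the chance of reaching the goal state g.
mdp = {
    "s0": [[(1/3, "g"), (2/3, "x")], [(2/3, "g"), (1/3, "x")]],
    "g": [],
    "x": [],
}
V = max_reach(mdp, {"g"})
```

Here the optimal scheduler selects the second distribution in s0, so the maximal reachability probability is 2/3; iterating a fixed number of times suffices for this acyclic example, while cyclic models need convergence up to a tolerance.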

The alternative characterisations (in terms of colours and in terms of maximal probabilities) each have

their own merits. Coloured trace equivalence is easily understood without knowledge of probability theory,

schedulers, etcetera. It moreover clearly illustrates the fundamental property of branching bisimulation: the
preservation of potentials and computations (see section 6). The result that it suffices to use
schedulers which induce maximal probabilities is at the basis of the decision procedure with polynomial

time complexity that we give in section 3.3.

We find that the two alternative characterisations add to the understanding of branching bisimulation in

the alternating model, and to the correctness of the definition. Moreover, we find that they can also be used to

validate the existing notion of branching bisimulation in another setting, i.e. the non-alternating model. A

brief comparison of both notions already indicates that there are fundamental differences between the two

(see section 7). These differences provide compelling evidence that the notion of branching bisimulation

in the non-alternating model may not live up to its name in its current phrasing: we find that processes that

are intuitively branching bisimilar (and can be proven to be branching bisimilar in our setting) cannot be

related in the non-alternating setting. However, more research (possibly along the lines of [7]) is required

to compare the two notions in more detail. This is beyond the scope of this paper.

We pose two open problems. The first open problem is whether coloured trace equivalence gives rise to

a different type of algorithm for deciding branching bisimilarity than the ones that are based on schedulers.

The second open problem is to find an answer to whether the branching bisimulation relation of [20] admits

a characterisation in terms of an equivalence based on colours. Apart from these problems, we are in the

process of giving a complete and sound axiomatisation of branching bisimulation for the basic operators.

References

[1] S. Andova, J.C.M. Baeten, Abstraction in probabilistic process algebra, Proc. Tools and Algorithms

for the Construction and Analysis of Systems, 7th International Conference, TACAS 2001. T. Mar-

garia, Wang Yi, eds., LNCS 2031 Springer Verlag, pp. 204-219, 2001.

[2] J.C.M. Baeten, J.A. Bergstra, J.W. Klop, Ready-trace semantics for concrete process algebra
with the priority operator, The Computer Journal 30(6), pp. 498–506, 1987.

[3] J.C.M. Baeten, W.P. Weijland, Process algebra, Cambridge University Press, 1990.

[4] C. Baier, On algorithmic verification methods for probabilistic systems, Habilitation thesis, Univer-

sity of Mannheim, 1998.

[5] C. Baier, H. Hermanns, Weak bisimulation for fully probabilistic processes, Proc. CAV’97, O. Grum-

berg, ed., LNCS 1254, pp. 119–130, 1997.

[6] C. Baier, M. Stoelinga, Norm function for probabilistic bisimulations with delays, Proc. FOS-

SACS’00, J. Tiuryn, ed., LNCS 1784, pp. 1–16, Berlin, Germany, 2000.

[7] E. Bandini, R. Segala, Axiomatizations for probabilistic bisimulation, Proc. ICALP’01, F. Orejas,

P.G. Spirakis, J. van Leeuwen, eds., Crete, Greece, LNCS 2076, Springer Verlag, pp. 370–381, 2001.

[8] B. Bloom, S. Istrail, A.R. Meyer, Bisimulation can't be traced, JACM 42(1), pp. 232–268,

1995.


[9] S. Cattani, R. Segala, Decision algorithms for probabilistic bisimulation, Proc. CONCUR'02,
L. Brim, P. Jančar, M. Křetínský, A. Kučera, eds., Brno, Czech Republic, LNCS 2421, Springer Verlag,
pp. 371–385, 2002.

[10] R. De Nicola, F. Vaandrager, Three logics for branching bisimulation, Journal of ACM Vol 42, No

2, pp. 458–487, 1995.

[11] J. Desharnais, V. Gupta, R. Jagadeesan, P. Panangaden, Weak bisimulation is sound and complete for

PCTL∗, Proc. CONCUR'02, L. Brim, P. Jančar, M. Křetínský, A. Kučera, eds., Brno, Czech Republic,

LNCS 2421, Springer Verlag, pp. 355–370, 2002.

[12] J.F. Groote, F. Vaandrager, An efficient algorithm for branching bisimulation and stuttering equivalence,
Proceedings 17th ICALP, Warwick, M.S. Paterson, ed., LNCS 443, Springer-Verlag, pp. 626–638, 1990.

[13] R.J. van Glabbeek, W.P. Weijland, Branching time and abstraction in bisimulation semantics, Jour-

nal of ACM, Vol. 43(3), pp.555–600, 1996.

[14] H. Hansson, Time and probability in formal design of distributed systems, Ph.D. thesis, DoCS 91/27,

University of Uppsala, 1991.

[15] C.A.R. Hoare, Communicating sequential processes, Prentice Hall, London, 1985.

[16] V.G. Kulkarni, Modeling and analysis of stochastic systems, Chapman & Hall, 1996.

[17] K.G. Larsen, A. Skou, Bisimulation through probabilistic testing, Information and Computation, 94,

pp.1–28, 1991.

[18] A. Philippou, I. Lee, O. Sokolsky, Weak bisimulation for probabilistic systems, Proc. CONCUR’00,

C. Palamidessi, ed., LNCS 1877, Springer Verlag, University Park, PA, USA, pp. 334–349, 2000.

[19] R. Segala, Modeling and verification of randomized distributed real-time systems, Ph.D. thesis, Mas-

sachusetts Institute of Technology, 1995.

[20] R. Segala, N.A. Lynch, Probabilistic simulations for probabilistic processes, Nordic Journal of
Computing, 2(2):250–273, 1995.

[21] M. Stoelinga, Alea jacta est: Verification of probabilistic, real-time and parametric systems, Ph.D.

thesis, Katholieke Universiteit Nijmegen, The Netherlands, 2002.

[22] M.Y. Vardi, Automatic verification of probabilistic concurrent finite state programs, Proc. 26th
Symp. on Foundations of Computer Science, IEEE Comp. Soc. Press, pp. 327–338, 1985.
