
BRANCHING BISIMULATION FOR PROBABILISTIC SYSTEMS: CHARACTERISTICS AND DECIDABILITY

Suzana Andova∗ and Tim A.C. Willemse†‡

∗Department of Computer Science, Twente University
P.O. Box 217, 7500 AE Enschede, The Netherlands

†Nijmegen Institute for Computing and Information Sciences (NIII), Radboud University Nijmegen
P.O. Box 9010, 6500 GL Nijmegen, The Netherlands

‡Department of Mathematics and Computer Science, Eindhoven University of Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands

suzana@cs.utwente.nl, timw@cs.ru.nl

Abstract

We address the concept of abstraction in the setting of probabilistic reactive systems, and study its formal underpinnings for the strictly alternating model of Hansson. In particular, we define the notion of branching bisimilarity and study its properties by studying two other equivalence relations, viz. coloured trace equivalence and branching bisimilarity using maximal probabilities. We show that both alternatives coincide with branching bisimilarity. The alternative characterisations have their own merits and focus on different aspects of branching bisimilarity. Coloured trace equivalence can be understood without knowledge of probability theory and is independent of the notion of a scheduler. Branching bisimilarity, rephrased in terms of maximal probabilities, gives rise to an algorithm of polynomial complexity for deciding the equivalence. Together they give a better understanding of branching bisimilarity. Furthermore, we show that the notions of branching bisimilarity in the alternating model of Hansson and in the non-alternating model of Segala differ: branching bisimilarity in the latter setting turns out to discriminate between systems that are intuitively branching bisimilar.

1 Introduction

One of the hallmarks of process theory is the notion of abstraction. Abstractions allow one to reason about systems in which details, unimportant to the purposes at hand, have been hidden. It is an invaluable tool when dealing with complex systems. Research in process theory has made great strides in coping with abstraction in areas that focus on functional behaviours of systems. However, when it comes to theories focusing on functional behaviours and extra-functional behaviours such as probabilistic behaviour, we suddenly find that many issues are still unresolved.

This paper addresses abstraction in the setting of systems that have both non-deterministic and probabilistic traits, hereafter referred to as probabilistic systems. The model that we use throughout this paper to describe such systems is that of graphs that adhere to the strictly alternating regime as studied by Hansson [14], rather than the non-alternating model [19, 20] as proposed by Segala et al. In particular, we study the notion of branching bisimilarity for this model. The need for this particular equivalence relation is already convincingly argued by e.g. Van Glabbeek and Weijland in [13], and by Groote and Vaandrager in [12]. Recall that branching bisimilarity for probabilistic systems has been defined earlier for the non-alternating model by Segala and Lynch [20] and a variation on that notion was defined by Stoelinga [21]. However, we stress that the differences in the alternating model and the non-alternating model lead to incompatibilities of the notions of branching bisimilarity in both settings. In fact, these differences are a key motivation for our investigation: while our notion of branching bisimulation satisfies the properties commonly attributed to it, the existing notions turn out to be too strict in their current phrasing (as we explain in detail in section 7), and discriminate between systems that are intuitively branching bisimilar.



Van Glabbeek and Weijland [13] showed that a key property of branching bisimilarity is that it preserves the branching structure of processes, i.e. it preserves computations together with the potentials in all intermediate states of a system that are passed through, even when unobservable events are involved. Roughly speaking, the potentials are the options the system has to branch and behave. This property sets branching bisimilarity apart from weak bisimilarity, which does not have this property. They illustrated this property by defining two new equivalences, called concrete coloured trace equivalence (in a setting without abstraction) and coloured trace equivalence (in a setting with abstraction), which both use colours to code for the potentials. Subsequently, they showed that strong bisimilarity and concrete coloured trace equivalence coincide, proving that colours can indeed be used to code for the potentials of a system. Next, they showed that also branching bisimilarity and coloured trace equivalence coincide, and both are strictly finer than weak bisimilarity. This proved that branching bisimilarity indeed preserves the branching structure of the system.

Although our setting is considerably more complex than the non-probabilistic setting, the key concept of preservation of potentials should still hold. We show that this is indeed the case by defining probabilistic counterparts of concrete coloured trace equivalence and coloured trace equivalence, and show that these coincide with strong bisimilarity and branching bisimilarity, respectively. A major advantage of (concrete) coloured trace equivalence is that it can be understood without knowledge of probability theory and without appealing to schedulers.

Another property of branching bisimilarity (one that is due to the alternating model, and which can also be found for weak bisimilarity [18]), is the preservation of maximal probabilities. We show that branching bisimilarity can be rephrased in terms of such maximal probabilities, thus yielding another alternative definition of branching bisimulation. Apart from the more appetising phrasing that this yields, this result is also at the basis of the complexity results for deciding branching bisimilarity. We also provide the algorithm for deciding branching bisimilarity.

Both alternative phrasings of branching bisimulation have their own merits and focus on orthogonal aspects. We emphasise that together, these are instrumental in understanding branching bisimulation and its properties for probabilistic systems.

This paper is outlined as follows. In section 2, we introduce the semantic model we use in the remainder of this paper, together with the notions of strong bisimulation and branching bisimulation. In section 3, we prove that branching bisimulation can be rephrased in terms of maximal probabilities, and we discuss the decidability of branching bisimulation in detail. Section 4 formalises the notions of colours and blends. Then, in sections 5 and 6 we define concrete coloured trace equivalence and coloured trace equivalence and we show that these two equivalence relations coincide with strong bisimilarity and branching bisimilarity, respectively. In section 7 we give an overview of related work, which in turn provides the motivation for conducting this research in the first place. Section 8 summarises the results of this paper and addresses issues for further research.

Acknowledgements. Thanks are due to Jos Baeten, Christel Baier, Holger Hermanns, Joost-Pieter Katoen, Ana Sokolova and Frits Vaandrager for fruitful discussions and useful comments on the topics addressed in this paper.

2 Semantic Model

We use graphs¹ to model probabilistic systems. The graphs we consider follow the strictly alternating regime of Hansson [14]. They can be used to describe systems that have both non-deterministic and probabilistic characteristics.

Graphs consist of two types of nodes: probabilistic nodes and non-deterministic nodes. These nodes are connected by two types of directed edges, called probabilistic transitions and non-deterministic transitions. The latter are labelled with actions from a set of action labels, representing atomic activities of a system, or with the unobservable event, which is denoted τ and which is not part of the set of action labels of any graph. A graph not containing τ-transitions is referred to as a concrete graph. The probabilistic transitions model the probabilistic behaviour of a system. We assume the existence of a special node nil, which is not part of the set of nodes of any graph. This node is used as a terminal node for all graphs.

¹The model we use is also known as Labelled Concurrent Markov Chains. We use the term graph to stay in line with [13].

Definition 2.1. A graph is a 7-tuple ⟨N, P, s, Act, →, ⇝, pr⟩, where

• N is a non-empty finite set of non-deterministic nodes. We write Nnil for the set N ∪ {nil}.

• P is a non-empty finite set of probabilistic nodes. We write Pnil for the set P ∪ {nil}.

• s ∈ P is the initial node, also called root.

• Act is a finite set of action labels. We abbreviate the set Act ∪ {τ} with Actτ.

• → ⊆ N × Actτ × Pnil is the non-deterministic transition relation. We require that for all n ∈ N, there is at least one (n,a,p) ∈ → for some a ∈ Actτ and p ∈ Pnil.

• ⇝ ⊆ P × N is a probabilistic transition relation.

• pr: ⇝ → (0,1] is a total function for which Σ_{n∈N} pr(p,n) = 1 for all p ∈ P.

We write n -a-> p rather than (n,a,p) ∈ →, and p ⇝ n rather than (p,n) ∈ ⇝. The set of all graphs is denoted G. In the remainder of this paper, x, y, ... range over G. We write Nx, Px, sx, etc. for the components of the graph x, and use Sx to denote the union Px ∪ Nx. We write Snil,x for the set Sx ∪ {nil}. When x is the only graph under consideration, or when no confusion can arise, we drop the subscripts altogether.

As a derived notion, we introduce the cumulative probability µ: Snil × 2^Snil → [0,1], which yields the total probability of reaching a set of nodes via probabilistic transitions: µ(p,M) is defined as Σ_{n∈M∩N} pr(p,n) if p ∈ P, and 0 otherwise.
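To make the definition concrete, the following Python sketch encodes a small strictly alternating graph and the cumulative probability µ. All node names, transitions and probabilities are invented for illustration; the representation (sets of tuples, a dictionary for pr) is one possible encoding, not a fixed API.

```python
NIL = "nil"

N = {"n1", "n2"}                      # non-deterministic nodes
P = {"p0", "p1"}                      # probabilistic nodes
s = "p0"                              # root, s in P
TAU = "tau"

# --> is a subset of N x Act_tau x P_nil
nd_trans = {("n1", "a", "p1"), ("n2", "b", NIL), ("n1", TAU, NIL)}

# pr: probabilistic transitions with their probabilities; at each p in P
# the outgoing probabilities must sum to exactly 1.
pr = {("p0", "n1"): 0.4, ("p0", "n2"): 0.6, ("p1", "n2"): 1.0}

def check_graph():
    # every non-deterministic node has at least one outgoing transition
    assert all(any(m == n for (m, _, _) in nd_trans) for n in N)
    # probabilities at each probabilistic node sum to 1
    for p in P:
        total = sum(q for (p2, _), q in pr.items() if p2 == p)
        assert abs(total - 1.0) < 1e-9, p

def mu(p, M):
    """Cumulative probability of reaching the node set M from p."""
    if p not in P:
        return 0.0
    return sum(q for (p2, n), q in pr.items() if p2 == p and n in M)

check_graph()
print(mu("p0", {"n1"}))        # 0.4
print(mu("p0", {"n1", "n2"}))  # 1.0
```

The helper check_graph enforces the well-formedness requirements of the definition; mu returns 0 for nodes outside P, matching the "0 otherwise" clause.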

There are several variations on the graph model that we use throughout this paper. In [18], a more liberal version is considered, in which the alternation between probabilistic transitions and non-deterministic transitions is not as strict as in our model: in between two probabilistic transitions, one or more non-deterministic transitions may be specified. Other variations allow for non-deterministic nodes as starting nodes. From a theoretical point of view, these variations do not add to the expressive power of the model, and the theory outlined in this paper easily transfers to those models.

2.1 Strong Bisimulation

Equivalence relations can be seen as a characterisation of the discriminating power of specific observers. Strong bisimilarity [17] is known to capture the capabilities of one of the most powerful observers that still has some appealing properties. It compares the stepwise behaviour of nodes in graphs and relates nodes when this behaviour is found to be identical.

Definition 2.2. Let x and y be graphs, let N = Nx ∪ Ny and let P = Px ∪ Py. A relation R ⊆ Nnil² ∪ P² is a strong bisimulation relation when for all nodes s and t for which sRt holds, we have

1. if s ∈ N and t ∈ N and s -a-> s', then there is some t' such that t -a-> t' and s'Rt' holds.

2. if s ∈ P and t ∈ P, then µ(s,M) = µ(t,M) holds for all M ∈ (Nnil ∪ P)/R.

We say that x and y are strongly bisimilar, denoted x ↔ y, iff there is a strong bisimulation relation R such that sx R sy.

A corollary of requirement 2 in the definition of strong bisimilarity is that all probabilistic nodes that can be related by some strong bisimulation relation share the same cumulative probability of reaching another equivalence class. This justifies the overloading of the notation µ for cumulative probability to denote the probability of reaching a set of nodes from an entire equivalence class rather than from a single node. For a strong bisimulation relation R, we define µ([s]R, M) = µ(s,M) for arbitrary s ∈ P and arbitrary M ∈ (Nnil ∪ P)/R.

Proposition 2.3. ↔ is an equivalence relation on G. □

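The two transfer conditions of definition 2.2 lend themselves to a direct mechanical check. The sketch below tests whether a candidate partition of the nodes induces a strong bisimulation; the toy graph and all names are invented for illustration, and this is a checker for a given partition, not the decision procedure discussed in section 3.

```python
NIL = "nil"
N = {"n1", "n2"}
P = {"p1", "p2"}
nd_trans = {("n1", "a", NIL), ("n2", "a", NIL)}
pr = {("p1", "n1"): 0.5, ("p1", "n2"): 0.5, ("p2", "n1"): 1.0}

def mu(p, M):
    # cumulative probability of reaching M from p (0 for non-P nodes)
    return sum(q for (p2, n), q in pr.items() if p2 == p and n in M)

def is_strong_bisimulation(partition):
    classes = [frozenset(B) for B in partition]
    for B in classes:
        for s in B:
            for t in B:
                if s in N and t in N:
                    # condition 1: every a-step of s is matched by t
                    # into the same equivalence class
                    for (m, a, s1) in nd_trans:
                        if m != s:
                            continue
                        matched = any(
                            m2 == t and a2 == a and
                            any(s1 in C and t1 in C for C in classes)
                            for (m2, a2, t1) in nd_trans
                        )
                        if not matched:
                            return False
                if s in P and t in P:
                    # condition 2: equal cumulative probability per class
                    if any(abs(mu(s, C) - mu(t, C)) > 1e-9 for C in classes):
                        return False
    return True

# p1 and p2 reach the class {n1, n2} with total probability 1 each:
print(is_strong_bisimulation([{"n1", "n2"}, {"p1", "p2"}, {NIL}]))   # True
# splitting n1 from n2 exposes different probabilities (0.5 vs 1.0):
print(is_strong_bisimulation([{"n1"}, {"n2"}, {"p1", "p2"}, {NIL}])) # False
```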


2.2 Paths, Probabilities and Schedulers

A decomposition of a graph into a set of so-named computation trees is necessary for further quantitative analysis of the graph: rather than conducting the analysis on the graph itself, the computation trees are analysed.

The decomposition requires all non-determinism in the graph to be resolved. This is typically achieved by employing a scheduler (also known as adversary or policy). A scheduler resolves the non-determinism by selecting at most one of possibly many non-deterministic transitions in each non-deterministic node. A computation tree is then obtained from the graph by resolving a non-deterministic choice according to the scheduler and keeping probabilistic information for the relevant nodes. Dependent on the type of scheduler, this choice is based on e.g. some history, randomisation or local information.

We subsequently formalise the notion of schedulers. Let x be a graph. A path starting in a node s0 ∈ Snil is an alternating finite sequence c ≡ s0 l1 ... ln sn, or an alternating infinite sequence c ≡ s0 l1 s1 ..., of nodes and labels, where for all i ≥ 1, si ∈ Snil and li ∈ Actτ ∪ (0,1], and

1. for all nodes sj ∈ N (j ≥ 0), we require sj -lj+1-> sj+1;

2. for all nodes sj ∈ P (j ≥ 0), we require sj ⇝ sj+1 and lj+1 = pr(sj, sj+1).

Paths always consist of at least one node (its starting node). For a path c starting in s0, we write first(c) = s0 for the initial node of c and, if c is a finite path, we write last(c) for the last node of c. The set of all nodes occurring in c is denoted nodes(c). We denote the trace of c by trace(c), which is the sequence of action labels from the set Actτ that occur in c. The concatenation of two paths is again a path: given a path c ≡ s0 l1 ... ln sn (for n ≥ 0) and a path c' with last(c) = first(c'), we denote their concatenation by c ◦ c', and it is defined as the path s0 l1 ... ln c'. If c ≡ (s0 l1 s1) ◦ c' we write rest(c) = c'.

The set of all paths starting in s0 is denoted Path(s0) and the set of finite paths starting in s0 is denoted Pathf(s0). A path c is a maximal path iff c is a finite path with last(c) = nil or c is an infinite path. The set of maximal paths starting in s0 is denoted Pathm(s0).
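The alternation conditions on paths can be phrased as a small validity check. The sketch below represents a path as the alternating list [s0, l1, s1, ...] and checks conditions 1 and 2 against an invented toy graph; trace(c) collects the action labels. Representing probabilistic labels as floats and actions as strings is an assumption of this encoding.

```python
NIL = "nil"
N = {"n1"}
P = {"p0", "p1"}
nd_trans = {("n1", "a", "p1"), ("n1", "a", NIL)}
pr = {("p0", "n1"): 1.0, ("p1", "n1"): 1.0}

def is_path(c):
    """c is [s0, l1, s1, l2, s2, ...]; check conditions 1 and 2."""
    nodes, labels = c[0::2], c[1::2]
    for j, sj in enumerate(nodes[:-1]):
        lj1, sj1 = labels[j], nodes[j + 1]
        if sj in N:
            if (sj, lj1, sj1) not in nd_trans:   # condition 1
                return False
        elif sj in P:
            if pr.get((sj, sj1)) != lj1:         # condition 2
                return False
        else:                                    # nil has no successors
            return False
    return True

def trace(c):
    # the action labels (elements of Act_tau) occurring in c
    return [l for l in c[1::2] if isinstance(l, str)]

c = ["p0", 1.0, "n1", "a", "p1", 1.0, "n1", "a", NIL]
print(is_path(c))   # True
print(trace(c))     # ['a', 'a']
```

Since last(c) = nil, the example path is also maximal in the sense defined above.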

Definition 2.4. A scheduler of paths starting in a node s0 is a partial function σ: Pathf(s0) ⇀ (→ ∪ {⊥}) (where ⊥ represents "halt"). If, for some c ∈ Pathf(s0), σ(c) is defined, we require that the following two conditions are met:

1. if last(c) ∈ N, then σ(c) = ⊥ or σ(c) = last(c) -a-> t for some a and t;

2. if last(c) ∈ Pnil, then σ(c) = ⊥.

Moreover, we impose the following sanity restrictions on σ: for all c ∈ Pathm(s0) ∩ Pathf(s0), we have σ(c) = ⊥, and for all c ∈ Pathf(s0) with last(c) ∈ N, we require that σ(c) is defined. We denote the set of all schedulers of a node s0 by Sched(s0). When defining schedulers, we will often leave the extra definitions that are required to meet these sanity restrictions implicit and focus on the remaining rules.

Remark that the second condition in the definition of a scheduler expresses that a finite path c ending in a probabilistic node can only be scheduled (if scheduled at all) to ⊥. In case such a path is not scheduled, then σ is defined for all extensions of this path by a probabilistic transition. This is also illustrated in example 2.7 at the end of this section.
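As an illustration, a scheduler in the sense of definition 2.4 can be written as a function on finite paths. The sketch below is a hypothetical history-dependent scheduler for an invented one-node toy graph: it halts on finite maximal paths, selects a transition for paths ending in a non-deterministic node, and leaves paths ending in a probabilistic node unscheduled (modelled here as None).

```python
NIL = "nil"
N = {"n1"}
nd_trans = {("n1", "a", "p1"), ("n1", "b", NIL)}
BOTTOM = "halt"   # stands for the symbol "bottom" of the definition

def sigma(c):
    last = c[-1]
    if last == NIL:                  # finite maximal path: must halt
        return BOTTOM
    if last in N:
        # history-dependent choice, based on the number of steps so far
        steps = len(c) // 2
        if steps > 2:
            return BOTTOM
        action = "a" if steps <= 1 else "b"
        target = {t for (m, a, t) in nd_trans if m == last and a == action}
        return (last, action, target.pop())
    return None                      # probabilistic node: leave unscheduled

print(sigma(["p0", 1.0, "n1"]))            # ('n1', 'a', 'p1')
print(sigma(["p0", 1.0, "n1", "b", NIL]))  # 'halt'
```

Returning None for paths ending in a probabilistic node corresponds to leaving σ undefined there, which the definition permits; if such a path were scheduled at all, the only allowed value would be ⊥.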

For most practical purposes, we are not interested in all paths of a graph, but only in those paths that are scheduled by a given scheduler. Let σ ∈ Sched(s0) be a scheduler of a node s0 in a graph x. We write SPath(s0,σ) for the set of all finite and infinite paths c ≡ s0 l1 s1 ... where for each si ∈ N we have σ(s0 l1 s1 ... si) = si -li+1-> si+1. The set of maximal scheduled paths starting in s0 that is induced by σ is denoted SPathm(s0,σ) and contains all infinite scheduled paths and all finite scheduled paths c for which σ(c) = ⊥.

Note that our sanity restrictions on schedulers turn finite maximal paths into finite maximal scheduled paths (since the former are necessarily scheduled to ⊥). This is required for a proper definition of the probability space and a probability measure.
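For a finite acyclic toy graph, the set SPathm(s0,σ) can be enumerated directly: resolve the non-determinism with the scheduler and keep every probabilistic branch. The graph and scheduler below are invented for illustration; the scheduler happens to be simple, so its choice is stored per node rather than per path.

```python
NIL = "nil"
N = {"n1", "n2"}
P = {"p0"}
pr = {("p0", "n1"): 0.3, ("p0", "n2"): 0.7}
nd_trans = {("n1", "a", NIL), ("n2", "b", NIL)}

# a simple scheduler: the choice depends only on last(c)
choice = {"n1": ("n1", "a", NIL), "n2": ("n2", "b", NIL)}

def spath_m(s0):
    paths, stack = [], [[s0]]
    while stack:
        c = stack.pop()
        last = c[-1]
        if last == NIL:
            paths.append(c)                  # maximal: halted at nil
        elif last in P:
            for (p, n), q in pr.items():     # keep all probabilistic branches
                if p == last:
                    stack.append(c + [q, n])
        else:
            _, a, t = choice[last]           # resolve the non-determinism
            stack.append(c + [a, t])
    return paths

for c in spath_m("p0"):
    print(c)
```

The enumeration terminates only because the toy graph is acyclic; in general SPathm(s0,σ) contains infinite paths and cannot be listed, which is why the probability space below is built from cylinders instead.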

Several types of schedulers are defined in the literature, such as randomised schedulers, determinate schedulers and history-dependent schedulers. For the exhibition of the theory, we do not fix a specific type of scheduler, but in section 3.3 we show that a particular type of scheduler, the so-called simple scheduler, is sufficiently powerful for our purposes.

Definition 2.5. Let s0 ∈ Snil be a node and let σ ∈ Sched(s0) be a scheduler. We say that σ is a simple scheduler if for all c, c' ∈ Pathf(s0) with last(c) = last(c'), σ(c) = σ(c').

Obviously, for a graph x the set of all schedulers that can be defined for a given node s0 may be infinite, while the set of all simple schedulers for that graph is finite. This fact will be used in section 3.3, where an algorithm for deciding branching bisimulation on graphs is given.
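The finiteness claim can be made concrete by counting: a simple scheduler is fixed by one choice per non-deterministic node, either an outgoing transition or halting, so there are Π_n (|out(n)| + 1) of them. A sketch with an invented toy graph:

```python
from itertools import product

NIL = "nil"
N = {"n1", "n2"}
nd_trans = {("n1", "a", "p1"), ("n1", "b", "p2"), ("n2", "a", NIL)}

def simple_schedulers():
    per_node = []
    for n in sorted(N):
        outgoing = [t for t in nd_trans if t[0] == n]
        # each node picks one outgoing transition or halts
        per_node.append([(n, pick) for pick in outgoing + ["halt"]])
    # one combination of per-node choices = one simple scheduler
    return [dict(combo) for combo in product(*per_node)]

# n1 has 2 transitions (+halt), n2 has 1 (+halt): 3 * 2 schedulers
print(len(simple_schedulers()))  # 6
```

An unrestricted scheduler, by contrast, may choose differently on each of the infinitely many finite paths leading to the same node, which is what makes the general scheduler set infinite.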

Definition 2.6. A probabilistic tree is a 7-tuple ⟨N, P, s, Act, →, ⇝, pr⟩, where

• N is a non-empty countable set of non-deterministic nodes.

• P is a non-empty countable set of probabilistic nodes.

• →: N × Actτ → Pnil is the non-deterministic transition function.

• s ∈ P, Act, ⇝ and pr are defined along the lines of definition 2.1.

Graphs and probabilistic trees differ with respect to the non-deterministic branching degree that is allowed. While graphs have finite non-deterministic branching degree, probabilistic trees have branching degree 1. In other words, all non-deterministic transitions are uniquely determined by a pair consisting of a non-deterministic node and an action label. Furthermore, the set of nodes of a graph is necessarily finite, while probabilistic trees can have infinitely many nodes. It is well-known that probabilistic trees can be used to represent fully probabilistic systems (see e.g. [1, 4]).

Every scheduler σ ∈ Sched(s0) for a graph x defines a probabilistic tree CTx(s0,σ) whose nodes are finite scheduled paths in x. The probabilistic and non-deterministic transitions of CTx(s0,σ) are uniquely defined by the transition relations of x and σ in the obvious way. The probabilistic tree CTx(s0,σ) is called a computation tree starting in s0 and induced by σ. When no confusion can arise, we omit the index x. The probabilistic transition relation ⇝ of x is used to define a probability on a finite path in CTx(s0,σ). These probabilities are then employed to define a probability measure for the probability space associated to σ. We proceed with the formal definitions. Let c ≡ s0 l1 ... ln sn be a finite path. Then, the probability of c, denoted P(c), is defined as:

1. P(c) = Π_{li ∈ (0,1]} li if at least one li ∈ (0,1] for 1 ≤ i ≤ n;

2. P(c) = 1 otherwise.
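Both clauses can be folded into a single product, the empty product being 1. A sketch, with paths as alternating lists as before and probabilistic labels represented as floats (an assumption of this encoding; the example path is invented):

```python
def prob(c):
    """Probability of a finite path [s0, l1, s1, ...]: the product of its
    probabilistic labels (clause 1), or 1 if there are none (clause 2)."""
    result = 1.0
    for l in c[1::2]:
        if isinstance(l, float):     # only labels drawn from (0, 1] count
            result *= l
    return result

print(prob(["p0", 0.5, "n1", "a", "p1", 0.2, "n2"]))  # 0.1
print(prob(["n1", "a", "nil"]))                        # 1.0
```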

Let c be a finite scheduled path. Then, the basic cylinder of c, induced by σ, is given by

c↑ = {c' ∈ SPathm(s0,σ) | c is a prefix of c'}    (1)

The probability measure of c↑, denoted by P(c↑), is defined as P(c↑) = P(c). The probability space (Ωσ, Fσ, Pσ) induced by σ ∈ Sched(s0) is defined as follows²:

1. Ωσ = SPathm(s0,σ)

2. Fσ is the smallest sigma-algebra on SPathm(s0,σ) that contains all basic cylinders c↑ for c a finite scheduled σ-path

²Note that we here overload the notation P.