All content in this area was uploaded by Gerrit Grossmann on Jul 08, 2019

Reducing Spreading Processes on Networks to

Markov Population Models

Gerrit Großmann (✉)¹ [0000-0002-4933-447X] and Luca Bortolussi¹,² [0000-0001-8874-4001]

¹Saarland University, 66123 Saarbrücken, Germany

gerrit.grossmann@uni-saarland.de

²University of Trieste, Trieste, Italy

lbortolussi@units.it

Abstract. Stochastic processes on complex networks, where each node

is in one of several compartments, and neighboring nodes interact with

each other, can be used to describe a variety of real-world spreading phe-

nomena. However, computational analysis of such processes is hindered

by the enormous size of their underlying state space.

In this work, we demonstrate that lumping can be used to reduce any

epidemic model to a Markov Population Model (MPM). To this end, we

propose a novel lumping scheme based on a partitioning of the nodes.

By imposing diﬀerent types of counting abstractions, we obtain coarse-

grained Markov models with a natural MPM representation that approx-

imate the original systems. This makes it possible to transfer the rich pool

of approximation techniques developed for MPMs to the computational

analysis of complex networks’ dynamics.

We present numerical examples to investigate the relationship between

the accuracy of the MPMs, the size of the lumped state space, and the

type of counting abstraction.

Keywords: Epidemic Modeling · Markov Population Model · Lumping · Model Reduction · Spreading Process · SIS Model · Complex Networks

1 Introduction

Computational modeling and analysis of dynamic processes on networked sys-

tems is a widespread and thriving research area. In particular, much effort has

been put into the study of spreading phenomena [2,37,16,27]. Arguably, the most

common formalism for spreading processes is the so-called Susceptible-Infected-

Susceptible (SIS) model with its variations [27,37,38].

In the SIS model, each node is either infected (I) or susceptible (S). Infected

nodes propagate their infection to neighboring susceptible nodes and become

susceptible again after a random waiting time. Naturally, one can extend the

number of possible node states (or compartments) of a node. For instance, the

SIR model introduces an additional recovered state in which nodes are immune

to the infection.


SIS-type models are remarkable because—despite their simplicity—they al-

low the emergence of complex macroscopic phenomena guided by the topological

properties of the network. There exists a wide variety of scenarios which can be

described using the SIS-type formalism. For instance, the SIS model has been

successfully used to study the spread of many diﬀerent pathogens like inﬂuenza

[25], dengue fever [39], and SARS [35]. Likewise, SIS-type models have been shown to

be extremely useful for analyzing and predicting the spread of opinions [48,28],

rumors [52,51], and memes [50] in online social networks. Other areas of applica-

tions include the modeling of neural activity [15], the spread of computer viruses

[11] as well as blackouts in ﬁnancial institutions [33].

The semantics of SIS-type processes can be described using a continuous-

time Markov chain (CTMC) [27,46] (cf. Section 3 for details). Each possible

assignment of nodes to the two node states S and I constitutes an individual

state in the CTMC (here referred to as network state to avoid confusion³).

Hence, the CTMC state space grows exponentially with the number of nodes,

which renders the numerical solution of the CTMC infeasible for most realistic

contact networks.

This work investigates an aggregation scheme that lumps similar network

states together and thereby reduces the size of the state space. More precisely, we

first partition the nodes of the contact network. Afterwards, we impose a count-

ing abstraction on each partition. We only lump two network states together

when their corresponding counting abstractions coincide on each partition.

As we will see, the counting abstraction induces a natural representation of

the lumped CTMC as a Markov Population Model (MPM). In an MPM, the

CTMC states are vectors which, for diﬀerent types of species, count the number

of entities of each species. The dynamics can elegantly be represented as species

interactions. More importantly, a very rich pool of approximation techniques

has been developed on the basis of MPMs, which can now be applied to the

lumped model. These include eﬃcient simulation techniques [7,1], dynamic state

space truncation [23,32], moment-closure approximations [43,19], linear noise

approximation [45,18], and hybrid approaches [4,42].

The remainder of this work is organized as follows: Section 2 briefly reviews

related work, and Section 3 formalizes SIS-type models and their CTMC seman-

tics. Our lumping scheme is developed in Section 4. In Section 5, we show that

the lumped CTMCs have a natural MPM representation. Numerical results are

presented in Section 6, and Section 7 concludes the paper and identifies open

research problems.

2 Related Work

The general idea behind lumping is to reduce the complexity of a system by ag-

gregating (i.e., lumping) individual components of the system together. Lumping

is a popular model reduction technique which has been used to reduce the number of equations in a system of ODEs and the number of states in a Markov chain, in particular in the context of biochemical reaction networks [30,6,49,8]. Generally speaking, one can distinguish between exact and approximate lumping [30,6].

³In the following, we will use the terms CTMC state and network state interchangeably.

Most work on the lumpability of epidemic models has been done in the con-

text of exact lumping [27,41,47]. The general idea is typically to reduce the state

space by identifying symmetries in the CTMC which themselves can be found

using symmetries (i.e., automorphisms) in the contact network. Those methods,

however, are limited in scope because these symmetries are infeasible to ﬁnd

in real-world networks and the state space reduction is not suﬃcient to make

realistic models small enough to be solvable.

This work proposes an approximate lumping scheme. Approximate lump-

ing has been shown to be useful when applied to mean-ﬁeld approximation

approaches of epidemic models like the degree-based mean-ﬁeld and pair ap-

proximation equations [29], as well as the approximate master equation [20,14].

However, mean-ﬁeld equations are essentially inﬂexible as they do not take topo-

logical properties into account or make unrealistic independence assumptions

between neighboring nodes.

Moreover, [26] proposed using local symmetries in the contact network in-

stead of automorphisms to construct a lumped Markov chain. This scheme seems

promising, in particular on larger graphs where automorphisms often do not even

exist. However, the limitations for real-world networks due to a limited amount

of state space reduction and high computational costs seem to persist.

Conceptually similar to this work is also the uniﬁed mean-ﬁeld framework

(UMFF) proposed by Devriendt et al. in [10]. Devriendt et al. also partition the

nodes of the contact network but directly derive a mean-ﬁeld equation from it. In

contrast, this work focuses on the analysis of the lumped CTMC and its relation

to MPMs. Moreover, we investigate diﬀerent types of counting abstractions,

not only node based ones. The relationship between population dynamics and

networks has also been investigated with regard to Markovian agents [3].

3 Spreading Processes

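Let $G = (\mathcal{N}, \mathcal{E})$ be an undirected graph without self-loops. At each time point $t \in \mathbb{R}_{\geq 0}$, each node occupies one of $m$ different node states, denoted by $\mathcal{S} = \{s_1, s_2, \ldots, s_m\}$ (typically, $\mathcal{S} = \{\mathrm{S}, \mathrm{I}\}$). Consequently, the network state is given by a labeling $x : \mathcal{N} \to \mathcal{S}$. We use

$$\mathcal{X} = \{ x \mid x : \mathcal{N} \to \mathcal{S} \}$$

to denote all possible labelings. $\mathcal{X}$ is also the state space of the underlying CTMC. As each of the $|\mathcal{N}|$ nodes occupies one of $m$ states, we find that $|\mathcal{X}| = |\mathcal{S}|^{|\mathcal{N}|}$.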
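To make the size of the state space concrete, the following minimal sketch enumerates all labelings of a small node set. The function name and the dictionary encoding of a network state are illustrative assumptions, not part of the original work.

```python
from itertools import product

def all_network_states(nodes, node_states):
    """Enumerate every labeling x: nodes -> node_states as a dict."""
    return [dict(zip(nodes, labels))
            for labels in product(node_states, repeat=len(nodes))]

nodes = [0, 1, 2, 3]
states = all_network_states(nodes, ["S", "I"])
print(len(states))  # equals len(node_states) ** len(nodes)
```

Already for a few dozen nodes this list cannot be materialized, which is the motivation for lumping.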

A set of stochastic rules determines the particular way in which nodes change

their corresponding node states. Whether a rule can be applied to a node depends

on the state of the node and of its immediate neighborhood.


Fig. 1: The CTMC induced by the SIS model (S: blue, I: magenta, filled) on a toy graph. Only a subset of the CTMC state space (11 out of $2^6 = 64$ network states) is shown.

The neighborhood of a node is modeled as a vector $\mathbf{m} \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$, where $\mathbf{m}[s]$ denotes the number of neighbors in state $s \in \mathcal{S}$ (we assume an implicit enumeration of states). Thus, the degree (number of neighbors, denoted by $k$) of a node is equal to the sum over its associated neighborhood vector, that is, $k = \sum_{s \in \mathcal{S}} \mathbf{m}[s]$. The set of possible neighborhood vectors is denoted as

$$\mathcal{M} = \Big\{ \mathbf{m} \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|} \;\Big|\; \sum_{s \in \mathcal{S}} \mathbf{m}[s] \leq k_{\max} \Big\},$$

where $k_{\max}$ denotes the maximal degree in a given network.

Each rule is a triplet $s_1 \xrightarrow{f} s_2$ ($s_1, s_2 \in \mathcal{S}$, $s_1 \neq s_2$), which can be applied to each node in state $s_1$. When the rule “fires”, it transforms the node from $s_1$ into $s_2$. The rate at which a rule “fires” is specified by the rate function $f : \mathcal{M} \to \mathbb{R}_{\geq 0}$ and depends on the node’s neighborhood vector. The time delay until the rule is applied to the network state is drawn from an exponential distribution with rate $f(\mathbf{m})$. Hence, higher rates correspond to shorter waiting times. For the sake of simplicity and without loss of generality, we assume that for each pair of states $s_1, s_2$ there exists at most one rule that transforms $s_1$ to $s_2$.

In the well-known SIS model, infected nodes propagate their infection to sus-

ceptible neighbors. Thus, the rate at which a susceptible node becomes infected

is proportional to its number of infected neighbors:

$$\mathrm{S} \xrightarrow{f} \mathrm{I} \quad \text{with} \quad f(\mathbf{m}) = \lambda \cdot \mathbf{m}[\mathrm{I}],$$

where $\lambda \in \mathbb{R}_{\geq 0}$ is a rule-specific rate constant (called infection rate) and $\mathbf{m}[\mathrm{I}]$ denotes the number of infected neighbors. Furthermore, a recovery rule transforms infected nodes back to being susceptible:

$$\mathrm{I} \xrightarrow{f} \mathrm{S} \quad \text{with} \quad f(\mathbf{m}) = \mu,$$

where $\mu \in \mathbb{R}_{\geq 0}$ is a rule-specific rate constant called recovery rate.

A variation of the SIS model is the SI model where no curing rule exists and

all nodes (that are reachable from an infected node) will eventually end up being

infected. Intuitively, each rule tries to “ﬁre” at each position n∈ N where it can

be applied. The rule and node that have the shortest waiting time “win” and

the rule is applied there. This process is repeated until some stopping criterion

is fulﬁlled.
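The race between rules described above can be sketched as a small stochastic simulation. Instead of drawing one waiting time per node and taking the minimum, this sketch uses the standard equivalent formulation: draw the time to the next event from the total rate and pick the firing node with probability proportional to its rate. The function name, the adjacency-dict encoding, and the stopping criterion (a time horizon) are assumptions made for illustration.

```python
import random

def simulate_sis(adj, x, lam, mu, t_end, rng):
    """One stochastic trajectory of the SIS rules.

    adj: dict node -> list of neighbors (undirected, no self-loops)
    x:   dict node -> "S" or "I" (initial network state, not modified)
    Returns the network state at time t_end (or at absorption).
    """
    x, t = dict(x), 0.0
    while True:
        # Rate of the single rule applicable to each node in its current state.
        rates = {n: mu if x[n] == "I"
                 else lam * sum(x[v] == "I" for v in adj[n])
                 for n in adj}
        total = sum(rates.values())
        if total == 0.0:
            return x                      # no rule can fire any more
        t += rng.expovariate(total)       # time until the first rule fires
        if t > t_end:
            return x
        # Pick the firing node with probability proportional to its rate.
        n = rng.choices(list(rates), weights=list(rates.values()))[0]
        x[n] = "S" if x[n] == "I" else "I"

adj = {0: [1], 1: [0, 2], 2: [1]}         # path graph 0 - 1 - 2
final = simulate_sis(adj, {0: "I", 1: "S", 2: "S"}, lam=1.0, mu=1.3,
                     t_end=5.0, rng=random.Random(42))
print(final)
```

Setting `mu=0.0` gives the SI variant, in which every node reachable from an infected node eventually becomes infected.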

3.1 CTMC Semantics

Formally, the semantics of the SIS-type processes can be given in terms of

continuous-time Markov Chains (CTMCs). The state space is the set of possible

network states $\mathcal{X}$. The CTMC has a transition from state $x$ to $x'$ ($x, x' \in \mathcal{X}$, $x \neq x'$) if there exists a node $n \in \mathcal{N}$ and a rule $s_1 \xrightarrow{f} s_2$ such that the application of the rule to $n$ transforms the network state from $x$ to $x'$. The rate of the transition is exactly the rate $f(\mathbf{m})$ of the rule when applied to $n$. We use $q(x, x') \in \mathbb{R}_{\geq 0}$ to denote the transition rate between two network states. Fig. 1

illustrates the CTMC corresponding to an SIS process on a small toy network.
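As an illustration of these semantics, the sketch below builds the transition rates $q(x, x')$ of the SIS model explicitly for a tiny graph. Network states are encoded as tuples of "S"/"I" indexed by node; this encoding and the function name are assumptions of the sketch, which is only feasible for very small graphs.

```python
from itertools import product

def sis_rates(adj, lam, mu):
    """Return {(x, x2): rate} for all SIS transitions with positive rate.

    adj: dict node -> list of neighbors; nodes are assumed to be 0..n-1.
    """
    n = len(adj)
    q = {}
    for x in product("SI", repeat=n):
        for node in range(n):
            if x[node] == "I":
                rate, new = mu, "S"       # recovery rule
            else:                         # infection rule
                rate, new = lam * sum(x[v] == "I" for v in adj[node]), "I"
            if rate > 0:
                x2 = x[:node] + (new,) + x[node + 1:]
                q[(x, x2)] = rate
    return q

adj = {0: [1], 1: [0, 2], 2: [1]}         # path graph 0 - 1 - 2
q = sis_rates(adj, lam=1.0, mu=1.3)
```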

Explicitly computing the evolution of the probability of x∈ X over time

with an ODE solver, using numerical integration, is only possible for very small

contact networks, since the state space grows exponentially with the number of

nodes. Alternative approaches include sampling the CTMC, which can be done

reasonably eﬃciently even for comparably large networks [21,9,44] but is subject

to statistical inaccuracies and is mostly used to estimate global properties.

4 Approximate Lumping

Our lumping scheme is composed of three basic ingredients:

Node Partitioning: The partitioning over the nodes $\mathcal{N}$ that is explicitly provided.

Counting Pattern: The type of features we are counting, i.e., nodes or edges.

Implicit State Space Partitioning: The CTMC state space is implicitly partitioned by counting the nodes or edges on each node partition.

We will start our presentation discussing the partitioning of the state space,

then showing how to obtain it from a given node partitioning and counting

pattern. To this end, we use $\mathcal{Y}$ to denote the new lumped state space and assume that there is a surjective⁴ lumping function

$$L : \mathcal{X} \to \mathcal{Y}$$

that defines which network states will be lumped together. Note that the lumped state space is the image of the lumping function and that all network states $x \in \mathcal{X}$ which are mapped to the same $y \in \mathcal{Y}$ will be aggregated.

⁴If $L$ is not surjective, we consider only the image of $L$ to be the lumped state space.

Later in this section, we will discuss concrete realizations of $L$. In particular, we will construct $L$ based on a node partitioning and a counting abstraction of our choice. Next, we define the transition rates $q(y, y')$ (where $y, y' \in \mathcal{Y}$, $y \neq y'$) between the states of the lumped Markov chain:

$$q(y, y') = \frac{1}{|L^{-1}(y)|} \sum_{x \in L^{-1}(y)} \sum_{x' \in L^{-1}(y')} q(x, x'). \tag{1}$$

This is simply the mean rate at which an original state $x \in L^{-1}(y)$ moves to some $x' \in L^{-1}(y')$. Technically, Eq. (1) corresponds to the following lumping assumption: we assume that at each point in time all network states belonging to a lumped state $y$ are equally likely.
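Eq. (1) can be sketched generically for any lumping function. The code below assumes the original rates are given as a dictionary and that every network state appears in at least one transition; both the data layout and the names are illustrative assumptions.

```python
from collections import defaultdict

def lump_rates(q, lump):
    """Apply Eq. (1): average, over the network states in each lumped state y,
    the total rate into each other lumped state y2.

    q:    {(x, x2): rate} of the original CTMC
    lump: function mapping a network state to a hashable lumped state;
          every network state is assumed to occur in some key of q
    """
    block_size = defaultdict(int)
    for x in {s for pair in q for s in pair}:
        block_size[lump(x)] += 1
    lumped = defaultdict(float)
    for (x, x2), rate in q.items():
        y, y2 = lump(x), lump(x2)
        if y != y2:                        # transitions inside a block vanish
            lumped[(y, y2)] += rate / block_size[y]
    return dict(lumped)

# Toy chain: states "a" and "b" are lumped together, "c" stays alone.
toy_q = {("a", "c"): 1.0, ("b", "c"): 2.0, ("a", "b"): 5.0}

def toy_lump(x):
    return 0 if x in ("a", "b") else 1

print(lump_rates(toy_q, toy_lump))
```

In the toy chain, the lumped rate from block 0 to block 1 is the average of the two outgoing rates 1.0 and 2.0.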

4.1 Partition-Based Lumping

Next, we construct the lumping function $L$. Because we want to make our lumping aware of the contact network’s topology, we assume a given partitioning $\mathcal{P}$ over the nodes $\mathcal{N}$ of the contact network. That is, $\mathcal{P} \subset 2^{\mathcal{N}}$ and $\bigcup_{P \in \mathcal{P}} P = \mathcal{N}$, and all $P \in \mathcal{P}$ are disjoint and non-empty. Based on the node partitioning, we

can now impose diﬀerent kinds of counting abstractions on the network state.

This work considers two types: counting nodes and counting edges. The counting

abstractions are visualized in Fig. 3. A full example of how a lumped CTMC of

an SI model is constructed using the node-based counting abstraction is given

in Fig. 2.

Node-Based Counting Abstraction We count the number of nodes in each state and partition. Thus, for a given network state $x \in \mathcal{X}$, we use $y(s, P)$ to denote the number of nodes in state $s \in \mathcal{S}$ in partition $P \in \mathcal{P}$. The lumping function $L$ projects $x$ to the corresponding counting abstraction. Formally:

$$\mathcal{Y} = \{ y \mid y : \mathcal{S} \times \mathcal{P} \to \mathbb{Z}_{\geq 0} \}, \qquad L(x) = y \quad \text{with} \quad y(s, P) = |\{ n \in \mathcal{N} \mid x(n) = s,\ n \in P \}| .$$
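A minimal sketch of the node-based lumping function follows. The lumped state is returned as a frozenset of non-zero counts so that it can serve as a dictionary key; this representation, like the function name, is an assumption of the sketch.

```python
from collections import Counter

def node_based_lump(x, partition_of):
    """Counts y(s, P) for network state x.

    x:            dict node -> node state
    partition_of: dict node -> partition label
    Returns a frozenset of ((s, P), count) pairs; zero counts are omitted.
    """
    counts = Counter((x[n], partition_of[n]) for n in x)
    return frozenset(counts.items())

partition_of = {0: "P1", 1: "P1", 2: "P2", 3: "P2"}
y = node_based_lump({0: "I", 1: "S", 2: "S", 3: "S"}, partition_of)
print(dict(y))
```

Two network states are lumped together exactly when this function returns the same value for both.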

Edge-Based Counting Abstraction Again, we assume that a network state $x$ and a node partitioning $\mathcal{P}$ are given. Now we count the edges, that is, for each pair of states $s, s' \in \mathcal{S}$ and each pair of partitions $P, P' \in \mathcal{P}$, we count $y(s, P, s', P')$, which is the number of edges $(n, n') \in \mathcal{E}$ where $x(n) = s$, $n \in P$, $x(n') = s'$, $n' \in P'$. Note that this includes cases where $P = P'$ and $s = s'$. However, only counting the edges does not determine how many nodes there are in each state (see Fig. 3 for an example).


Fig. 2: Illustration of the lumping process. (a): Model. A basic SI process where infected nodes (magenta, filled) infect susceptible neighbors (blue) with infection rate $\lambda = 1$. The contact graph is divided into two partitions. (b): The underlying CTMC with $2^4 = 16$ states. The graph partition induces the edge-based and node-based lumping. The edge-based lumping refines the node-based lumping and generates one partition more (vertical line in the central partition). (c): The lumped CTMC using node-based counting abstraction with only 9 states. The rates are the averaged rates from the full CTMC.


Fig. 3: (a) By adding the dummy node, the edge-based abstraction is able to differentiate the two graphs. Adding the dummy node ensures that the nodes in each state are counted in the edge-based abstraction. (b) Left: A partitioned network (Zachary’s Karate Club graph from [12]) (S: blue, I: magenta, filled). The network is partitioned into $P_1$ (○-nodes) and $P_2$ (□-nodes). Right: The corresponding counting abstractions.

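In order to still have this information encoded in each lumped state, we slightly modify the network structure by adding a new dummy node $n_\star$ and connecting each node to it. The dummy node has a dummy state denoted by $\star$ which never changes, and it can be assigned to a new dummy partition $P_\star$. Formally,

$$\mathcal{N} := \mathcal{N} \cup \{n_\star\}, \quad \mathcal{S} := \mathcal{S} \cup \{\star\}, \quad L(n_\star) = \star, \quad \mathcal{P} := \mathcal{P} \cup \{P_\star\},$$

$$\mathcal{E} := \mathcal{E} \cup \{ (n, n_\star) \mid n \in \mathcal{N},\ n \neq n_\star \} .$$

Note that the rate function $f$ ignores the dummy node. The lumped representation is then given as:

$$\mathcal{Y} = \{ y \mid y : \mathcal{S} \times \mathcal{P} \times \mathcal{S} \times \mathcal{P} \to \mathbb{Z}_{\geq 0} \}, \qquad L(x) = y$$

$$\text{with} \quad y(s, P, s', P') = |\{ (n, n') \in \mathcal{E} \mid x(n) = s,\ n \in P,\ x(n') = s',\ n' \in P' \}| .$$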
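The edge-based abstraction with the dummy node can be sketched as follows. As a simplifying assumption, each undirected edge is counted once under a sorted pair of (state, partition) endpoints, and states and partition labels are assumed to be strings so that they can be sorted. The marker `"*"` for the dummy node, state, and partition is a hypothetical choice.

```python
from collections import Counter

DUMMY = "*"  # hypothetical marker for the dummy node, state, and partition

def edge_based_lump(x, partition_of, edges):
    """Count edges by the (state, partition) pairs of their endpoints.

    x:            dict node -> node state (str)
    partition_of: dict node -> partition label (str)
    edges:        iterable of undirected edges (n, n2) between real nodes
    """
    x = {**x, DUMMY: DUMMY}
    partition_of = {**partition_of, DUMMY: DUMMY}
    # Connect every real node to the dummy node.
    all_edges = list(edges) + [(n, DUMMY) for n in x if n != DUMMY]
    counts = Counter()
    for n, n2 in all_edges:
        ends = sorted([(x[n], partition_of[n]), (x[n2], partition_of[n2])])
        counts[tuple(ends)] += 1          # unordered: one count per edge
    return frozenset(counts.items())

part = {0: "P", 1: "P", 2: "P"}
path_edges = [(0, 1), (1, 2)]             # path graph 0 - 1 - 2
print(dict(edge_based_lump({0: "I", 1: "S", 2: "S"}, part, path_edges)))
```

On the path graph, an infection at an end node and an infection at the middle node yield different edge counts, although the node-based abstraction with a single partition cannot tell them apart.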

Example Fig. 2 illustrates how a given partitioning and the node-based count-

ing approach induce a lumped CTMC. The partitions induced by the edge-based

counting abstraction are also shown. In this example, the edge-based lumping

aggregates only isomorphic network states.

4.2 Graph Partitioning

Broadly speaking, we have three options to partition the nodes: based on local features (e.g., node degree), based on global features (e.g., communities in the graph), or randomly. As a baseline, we use a random node partitioning. To this end, we fix the number of partitions and randomly assign each node to a partition while enforcing that all partitions have, as far as possible, the same number of elements.

Moreover, we investigate a degree-based partitioning, where we define the distance between two nodes $n, n'$ as their relative degree difference (similar to [29]):

$$d_k(n, n') = \frac{|k_n - k_{n'}|}{\max(k_n, k_{n'})} .$$

We can then use any reasonable clustering algorithm and build partitions (i.e.,

clusters) with the distance function. In this work, we focus on bottom-up hier-

archical clustering as it provides the most principled way of precisely controlling

the number of partitions. Note that, for the sake of simplicity (in particular, to

avoid inﬁnite distances), we only consider contact networks where each node is

reachable from every other node. We break ties arbitrarily.
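The degree-based partitioning can be sketched with a naive bottom-up clustering. The text does not state which linkage criterion is used, so average linkage is an assumption here, as are the function names; a library implementation of hierarchical clustering would be preferable for larger graphs.

```python
def degree_distance(k1, k2):
    """Relative degree difference d_k; degrees are assumed to be >= 1."""
    return abs(k1 - k2) / max(k1, k2)

def degree_partition(degree, num_partitions):
    """Naive bottom-up clustering with average linkage (an assumption) on d_k.

    degree: dict node -> degree. Returns a list of sets of nodes.
    """
    clusters = [{n} for n in degree]

    def linkage(a, b):
        return sum(degree_distance(degree[u], degree[v])
                   for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > num_partitions:
        # Merge the closest pair of clusters; ties go to the first pair found.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

degree = {0: 1, 1: 1, 2: 5, 3: 6}
print(degree_partition(degree, 2))
```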

To get a clustering considering global features, we use a spectral embedding of the contact network. Specifically, we use the spectral_layout function from the NetworkX Python package [22] with three dimensions and perform hierarchical clustering on the embedding. In future research, it would be interesting to compute node distances based on more sophisticated graph embeddings such as the ones proposed in [17]. Note that in the border cases $|\mathcal{P}| = 1$ and $|\mathcal{P}| = |\mathcal{N}|$ all methods yield the same partitioning.

5 Markov Population Models

Markov Population Models (MPMs) are a special form of CTMCs where each

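CTMC state is a population vector over a set of species. We use $\mathcal{Z}$ to denote the finite set of species (again, with an implicit enumeration) and $\mathbf{y} \in \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$ to denote the population vector. Hence, $\mathbf{y}[z]$ identifies the number of entities of species $z$. The stochastic dynamics of MPMs is typically expressed as a set of reactions $\mathcal{R}$. Each reaction $(\alpha, \mathbf{b}) \in \mathcal{R}$ is comprised of a propensity function $\alpha : \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|} \to \mathbb{R}_{\geq 0}$ and a change vector $\mathbf{b} \in \mathbb{Z}^{|\mathcal{Z}|}$. When reaction $(\alpha, \mathbf{b})$ is applied, the system moves from state $\mathbf{y}$ to state $\mathbf{y} + \mathbf{b}$. The corresponding rate is given by the propensity function. Therefore, we can rewrite the transition matrix of the CTMC as⁵:

$$q(\mathbf{y}, \mathbf{y}') = \begin{cases} \alpha(\mathbf{y}) & \text{if } \exists (\alpha, \mathbf{b}) \in \mathcal{R},\ \mathbf{y}' = \mathbf{y} + \mathbf{b} \\ 0 & \text{otherwise.} \end{cases}$$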
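The reaction-based view of an MPM can be sketched in a few lines. The example reactions describe a hypothetical well-mixed SIS population model with species order (S, I); they are not the lumped model derived in this work and only illustrate how propensities and change vectors define the outgoing transitions of a state.

```python
def mpm_rates(y, reactions):
    """Outgoing transitions of population vector y (a tuple of ints).

    reactions: list of (propensity, change), where propensity maps a
               population vector to a rate and change is a tuple of ints.
    Returns {y2: rate} for all successors with positive rate.
    """
    out = {}
    for propensity, change in reactions:
        rate = propensity(y)
        if rate > 0:
            y2 = tuple(a + b for a, b in zip(y, change))
            out[y2] = out.get(y2, 0.0) + rate   # merge equal change vectors
    return out

# Hypothetical well-mixed SIS model, species order (S, I).
lam, mu = 1.0, 1.3
reactions = [
    (lambda y: lam * y[0] * y[1], (-1, +1)),    # infection
    (lambda y: mu * y[1],         (+1, -1)),    # recovery
]
print(mpm_rates((3, 2), reactions))
```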

Next, we show that our counting abstractions have a natural interpretation

as MPMs.

⁵Without loss of generality, we assume that different reactions have different change vectors. If this is not the case, we can merge reactions with the same update by summing their corresponding rate functions.

5.1 Node-Based Abstraction

First, we define the set of species $\mathcal{Z}$. Conceptually, species are node states which are aware of their partition:

$$\mathcal{Z} = \{ (s, P) \mid s \in \mathcal{S},\ P \in \mathcal{P} \} .$$

Again, we assume an implicit enumeration of $\mathcal{Z}$. We use $z.s$ and $z.P$ to denote the components of a given species $z$.

We can now represent the lumped CTMC state as a single population vector $\mathbf{y} \in \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$, where $\mathbf{y}[z]$ is the number of nodes belonging to species $z$ (i.e., which are in state $z.s$ and partition $z.P$). The image of the lumping function $L$, i.e., the lumped state space $\mathcal{Y}$, is now a subset of non-negative integer vectors: $\mathcal{Y} \subset \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$.

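Next, we express the dynamics by a set of reactions. For each rule $r = s_1 \xrightarrow{f} s_2$ and each partition $P \in \mathcal{P}$, we define a reaction $(\alpha_{r,P}, \mathbf{b}_{r,P})$ with propensity function:

$$\alpha_{r,P} : \mathcal{Y} \to \mathbb{R}_{\geq 0}, \qquad \alpha_{r,P}(\mathbf{y}) = \frac{1}{|L^{-1}(\mathbf{y})|} \sum_{x \in L^{-1}(\mathbf{y})} \sum_{n \in P} f(\mathbf{m}_{x,n}) \, \mathbb{1}_{x(n) = s_1},$$

where $\mathbf{m}_{x,n}$ denotes the neighborhood vector of $n$ in network state $x$. Note that this is just the instantiation of Eq. (1) to the MPM framework.

The change vector $\mathbf{b}_{r,P} \in \mathbb{Z}^{|\mathcal{Z}|}$ is defined element-wise as:

$$\mathbf{b}_{r,P}[z] = \begin{cases} 1 & \text{if } z.s = s_2,\ P = z.P \\ -1 & \text{if } z.s = s_1,\ P = z.P \\ 0 & \text{otherwise.} \end{cases}$$

Note that $s_1, s_2$ refer to the current rule and $z.s$ to the entry of $\mathbf{b}_{r,P}$.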
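The element-wise definition of the change vector translates directly into code. In this sketch, the enumeration of the species is passed explicitly as a list of (state, partition) pairs; the function name and this encoding are illustrative assumptions.

```python
def node_based_change_vector(species, s1, s2, P):
    """Change vector b_{r,P} for rule s1 -> s2 applied in partition P.

    species: list of (state, partition) pairs fixing the enumeration of Z.
    """
    return tuple(+1 if (s, p) == (s2, P)
                 else -1 if (s, p) == (s1, P)
                 else 0
                 for s, p in species)

species = [("S", "P1"), ("I", "P1"), ("S", "P2"), ("I", "P2")]
# Infection rule S -> I firing in partition P2.
print(node_based_change_vector(species, "S", "I", "P2"))
```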

5.2 Edge-Based Counting Abstraction

We start by defining a species neighborhood. The species neighborhood of a node $n$ is a vector $\mathbf{v} \in \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$, where $\mathbf{v}[z]$ denotes the number of neighbors of species $z$. We define $\mathcal{V}_n$ to be the set of possible species neighborhoods for a node $n$, given a fixed contact network and partitioning. Note that we still assume that a dummy node is used to encode the number of states in each partition.

Assuming an arbitrary ordering of pairs of states and partitions, we define

$$\mathcal{Z} = \{ (s_{\text{source}}, P_{\text{source}}, s_{\text{target}}, P_{\text{target}}) \mid s_{\text{source}}, s_{\text{target}} \in \mathcal{S},\ P_{\text{source}}, P_{\text{target}} \in \mathcal{P},\ (s_{\text{source}}, P_{\text{source}}) \leq (s_{\text{target}}, P_{\text{target}}) \} .$$

Let us define $\mathcal{V}_P$ to be the set of partition neighborhoods all nodes in $P$ can have:

$$\mathcal{V}_P = \bigcup_{n \in P} \mathcal{V}_n .$$

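For each rule $r = s_1 \xrightarrow{f} s_2$, each partition $P \in \mathcal{P}$, and each $\mathbf{v} \in \mathcal{V}_P$, we define a propensity function $\alpha_{r,P,\mathbf{v}}$ with:

$$\alpha_{r,P,\mathbf{v}} : \mathcal{Y} \to \mathbb{R}_{\geq 0}, \qquad \alpha_{r,P,\mathbf{v}}(\mathbf{y}) = \frac{1}{|L^{-1}(\mathbf{y})|} \sum_{x \in L^{-1}(\mathbf{y})} \sum_{n \in P} f(\mathbf{m}_{x,n}) \, \mathbb{1}_{x(n) = s_1,\ V(n) = \mathbf{v}} .$$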

Note that the propensity does not actually depend on $\mathbf{v}$; it is simply defined individually for each $\mathbf{v}$. The reason for this is that the change vector depends on a node’s species neighborhood. To see this, consider a species $z = (s_{\text{source}}, P_{\text{source}}, s_{\text{target}}, P_{\text{target}})$, corresponding to edges connecting a node in state $s_{\text{source}}$ and partition $P_{\text{source}}$ to a node in state $s_{\text{target}}$ and partition $P_{\text{target}}$. There are two scenarios in which the corresponding counting variable has to change: (a) when the node changing state due to an application of rule $r$ is the source node, and (b) when it is the target node. Consider case (a); we need to know how many edges are connecting the updated node (which was in state $s_1$ and partition $P$) to a node in state $s_{\text{target}}$ and partition $P_{\text{target}}$. This information is stored in the vector $\mathbf{v}$, specifically in position $\mathbf{v}[s_{\text{target}}, P_{\text{target}}]$. The case in which the updated node is the target one is treated symmetrically. This gives rise to the following definition:

$$\mathbf{b}_{r,P,\mathbf{v}}[z] = \begin{cases}
\mathbf{v}[z.s_{\text{target}}, z.P_{\text{target}}] & \text{if } s_2 = z.s_{\text{source}},\ P = z.P_{\text{source}} \\
-\mathbf{v}[z.s_{\text{target}}, z.P_{\text{target}}] & \text{if } s_1 = z.s_{\text{source}},\ P = z.P_{\text{source}} \\
\mathbf{v}[z.s_{\text{source}}, z.P_{\text{source}}] & \text{if } s_2 = z.s_{\text{target}},\ P = z.P_{\text{target}} \\
-\mathbf{v}[z.s_{\text{source}}, z.P_{\text{source}}] & \text{if } s_1 = z.s_{\text{target}},\ P = z.P_{\text{target}} \\
0 & \text{otherwise.}
\end{cases}$$

The ﬁrst two lines of the deﬁnition handle cases in which the node changing

state is the source node, while the following two lines deal with the case in which

the node changing state appears as target.

Fig. 4 illustrates how a lumped network state is inﬂuenced by the application

of an infection rule.

5.3 Direct Construction of the MPM

Approximating the solution of an SIS-type process on a contact network by lumping the CTMC first already reduces the computational costs by many orders of magnitude. However, this scheme is still only applicable when it is possible to construct the full CTMC in the first place. Recall that the number of network states is exponential in the number of nodes of the contact network, that is, $|\mathcal{X}| = |\mathcal{S}|^{|\mathcal{N}|}$.

However, in recent years, substantial eﬀort was dedicated to the analysis of

very small networks [47,24,31,34,36]. One reason is that when the size of a network increases, the (macro-scale) dynamics becomes more deterministic because stochastic effects tend to cancel out. For small contact networks, however, methods which capture the full stochastic dynamics of the system, and not only the mean behavior, are of particular importance.

Fig. 4: Example of how the neighborhood $\mathbf{v}$ influences the update in the edge-based counting abstraction on an example graph. Here, all nodes belong to the same partition (thus, node states and species are conceptually the same) and the node states are ordered [S, I, $\star$]. The population vector $\mathbf{y}$ is given in matrix form for the ease of presentation.

A substantial advantage of the reduction to MPM is the possibility of con-

structing the lumped CTMC without building the full CTMC ﬁrst. In particular,

this can be done exactly for the node counting abstraction. On the other hand, for

the edge counting we need to introduce an extra approximation in the deﬁnition

of the rate function, roughly speaking introducing an approximate probability

distribution over neighboring vectors, as knowing how many nodes have a spe-

cific neighboring vector requires full knowledge of the original CTMC. We

present full details of such direct construction in Appendix 8.

5.4 Complexity of the MPM

The size of the lumped MPM is critical for our method, as it determines which

solution techniques are computationally tractable and provides guidelines on

how many partitions to choose. There are two notions of size to consider: (a) the

number of population variables and (b) the number of states of the underlying

CTMC. While the latter governs the applicability of numerical solutions for

CTMCs, the former controls the complexity of a large number of approximate

techniques for MPMs, like mean ﬁeld or moment closure.

Node-based abstraction. In this abstraction, the population vector is of length $|\mathcal{S}| \cdot |\mathcal{P}|$, i.e., there is a variable for each node state and each partition.

Note that the sum of the population variables for each partition $P$ is $|P|$, the number of nodes in the partition. This allows us to count easily the number of states of the CTMC of the population model: for each partition, we need to subdivide $|P|$ different nodes into $|\mathcal{S}|$ different classes, which can be done in $\binom{|P| + |\mathcal{S}| - 1}{|\mathcal{S}| - 1}$ ways, giving a number of CTMC states exponential in the number $|\mathcal{S}|$ of node states and $|\mathcal{P}|$ of partitions, but polynomial in the number of nodes:

$$|\mathcal{Y}| = \prod_{P \in \mathcal{P}} \binom{|P| + |\mathcal{S}| - 1}{|\mathcal{S}| - 1} .$$
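The counting formula for the node-based abstraction is easy to evaluate. The sketch below assumes the partitioning is summarized by a list of partition sizes; the function name is illustrative.

```python
from math import comb, prod

def num_lumped_states_node_based(partition_sizes, num_node_states):
    """Product over partitions of C(|P| + |S| - 1, |S| - 1)."""
    s = num_node_states
    return prod(comb(size + s - 1, s - 1) for size in partition_sizes)

# Two partitions of two nodes each and two node states, as in Fig. 2.
print(num_lumped_states_node_based([2, 2], 2))
```

For the setting of Fig. 2, this reproduces the 9 lumped states mentioned in the caption, compared to $2^4 = 16$ network states.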

Edge-based abstraction. The number of population variables, in this case, is one for each edge connecting two different partitions, plus those counting the number of nodes in each partition and each node state, due to the presence of the dummy state. In total, we have $\frac{q(q-1)}{2} + q$ population variables, with $q = |\mathcal{S}| \cdot |\mathcal{P}|$.

In order to count the number of states of the CTMC in this abstraction, we start by observing that the sum of all variables for a given pair of partitions $P', P''$ is the number of edges connecting such partitions in the graph. We use $\epsilon(P', P'')$ to denote the number of edges between $P', P''$ (resp. the number of edges inside $P'$ if $P' = P''$). Thus,

$$|\mathcal{Y}| \leq \prod_{\substack{P', P'' \in \mathcal{P}^2 \\ P' \leq P''}} \binom{\epsilon(P', P'') + |\mathcal{S}|^2 - 1}{|\mathcal{S}|^2 - 1} \cdot \prod_{P \in \mathcal{P}} \binom{|P| + |\mathcal{S}| - 1}{|\mathcal{S}| - 1} .$$

This is an over-approximation, because not all combinations are consistent with

the graph topology. For example, a high number of infected nodes in a partition

might not be consistent with a small number of I−I-edges inside the partition.

Note that this upper bound is also exponential in $|\mathcal{S}|$ and $|\mathcal{P}|$ but still polynomial

in the number of nodes $N$, unlike the original network model, whose

state space is exponential in $N$.

The exponential dependency on the number of species (i.e., dimensions of

the population vector) makes the explicit construction of the lumped state space

viable only for very small networks with a small number of node states. However,

this is typically the case for spreading models like SIS or SIR. Yet, also the

number of partitions has to be kept small, particularly in realistic models. We

expect that the partitioning is especially useful for networks showing a small

number of large-scale homogeneous structures, as happens in many real-world

networks [12].

An alternative strategy for analysis is to derive mean-ﬁeld [5] or moment clo-

sure equations [40] for MPMs, which can be done without explicitly constructing

the lumped (and the original) state space. These are sets of ordinary differential

equations (ODEs) describing the evolution of (moments of) the population vari-

ables. We refer the reader to [10] for a similar approach regarding the node-based

abstraction.

6 Numerical Results

In this section, we compare the numerical solution of the original model (referred to as baseline model) with different lumped MPMs. The goal of this comparison is to provide evidence supporting the claim that the lumping preserves the dynamics of the original system, with an accuracy increasing with the resolution of the MPM. We perform the comparison by numerically solving the original and the lumped system, thus comparing the probability of each state at each point in time. In practical applications of our method, exact transient or steady-state solutions may not be feasible, but in this case we can still rely on approximation methods for MPMs [5,40]. Determining which of those techniques performs best in this context is a direction of future exploration.

Fig. 5: Trade-off between accuracy and state space size for the node-based (blue) and edge-based (magenta, filled) counting abstraction. Results are shown for node partitions based on the degree (l.), spectral embedding (c.), and random partitioning (r.). The accuracy is measured as the mean (△) and maximal (▽) difference between the original and lumped solution over all time points.

A limit of the comparison based on numerical solution of the CTMC is that the state space of the original model has $|\mathcal{S}|^{|\mathcal{N}|}$ states, which limits the size of the contact network strongly⁶.

Let $P(X(t) = x)$ denote the probability that the baseline CTMC occupies network state $x \in \mathcal{X}$ at time $t \geq 0$. Furthermore, let $P(Y(t) = y)$ for $t \geq 0$ and $y \in \mathcal{Y}$ denote the same probability for a lumped MPM (corresponding to a specific partitioning and counting abstraction). To measure their difference, we first approximate the probability distribution of the original model using the lumped solution, invoking the lumping assumption which states that all network states which are lumped together have the same probability mass. We use $P_L$ to denote the lifted probability distribution over the original state space given a lumped solution. Formally,

$$P_L\big(Y(t) = x\big) = \frac{P\big(Y(t) = y\big)}{|L^{-1}(y)|} \quad \text{where } y \text{ is s.t. } L(x) = y.$$

We measure the difference between the baseline and a lumped solution at a specific time point by summing up the difference in probability mass of each state, then take the maximum error in time:

$$d(P, P_L) = \max_t \sum_{x \in \mathcal{X}} \Big| P_L\big(Y(t) = x\big) - P\big(X(t) = x\big) \Big| .$$
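The lifting step and the error measure can be sketched as follows. The sketch assumes that both solutions are available on a common grid of time points as lists of dictionaries (one per time point); this data layout and the function names are assumptions, and the snippet is independent of the published code.

```python
def lift(p_lumped, lump, network_states):
    """Spread each lumped probability uniformly over its network states."""
    size = {}
    for x in network_states:
        size[lump(x)] = size.get(lump(x), 0) + 1
    return {x: p_lumped[lump(x)] / size[lump(x)] for x in network_states}

def lumping_error(p_base, p_lumped, lump, network_states):
    """Maximum over time points of sum_x |P_L(Y(t) = x) - P(X(t) = x)|.

    p_base:   list of dicts network state -> probability (one per time point)
    p_lumped: list of dicts lumped state -> probability (same time points)
    """
    errors = []
    for pb, py in zip(p_base, p_lumped):
        pl = lift(py, lump, network_states)
        errors.append(sum(abs(pl[x] - pb[x]) for x in network_states))
    return max(errors)

# Toy example with one time point: "a" and "b" are lumped, "c" is alone.
toy_states = ["a", "b", "c"]

def toy_lump(x):
    return 0 if x in ("a", "b") else 1

p_base = [{"a": 0.5, "b": 0.0, "c": 0.5}]
p_lumped = [{0: 0.5, 1: 0.5}]
print(lumping_error(p_base, p_lumped, toy_lump, toy_states))
```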

In our experiments, we used a small toy network with 13 nodes and 2 states
($2^{13} = 8192$ network states). We generated a synthetic contact network following
the Erdős–Rényi graph model with a connection probability of 0.5. We use an SIS
model with an infection rate of $\lambda = 1.0$ and a recovery rate of $\mu = 1.3$. Initially,
we assign an equal amount of probability mass to all network states.
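To make the size of the baseline model concrete, the following sketch enumerates the SIS CTMC for such a toy network. It is an illustration under assumptions, not the authors' implementation: the helper names, the bitmask encoding of network states, the dictionary-based transition structure, and the seed are all hypothetical, and the random graph is drawn with Python's `random` module rather than a graph library.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, seed=0):
    """Undirected G(n, p) graph as a list of node pairs (hypothetical helper)."""
    rng = random.Random(seed)
    return [(a, b) for a, b in combinations(range(n), 2) if rng.random() < p]


def sis_transitions(n, edges, lam, mu):
    """Map each network state to {successor state: rate}.

    A state is an integer bitmask; bit i set means node i is infected.
    """
    neighbours = [[] for _ in range(n)]
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    Q = {}
    for x in range(2 ** n):
        out = {}
        for i in range(n):
            if (x >> i) & 1:
                # Infected node recovers at constant rate mu.
                out[x & ~(1 << i)] = mu
            else:
                # Susceptible node: rate lam per infected neighbour.
                k = sum((x >> j) & 1 for j in neighbours[i])
                if k:
                    out[x | (1 << i)] = lam * k
        Q[x] = out
    return Q


n = 13
edges = erdos_renyi_edges(n, 0.5, seed=1)
Q = sis_transitions(n, edges, lam=1.0, mu=1.3)
p0 = [1 / (2 ** n)] * (2 ** n)  # equal mass on all 8192 network states
```

The all-susceptible state has no outgoing transitions, and every additional node doubles the number of states, which is why exact comparison is limited to very small networks.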

⁶ Code is available at github.com/gerritgr/Reducing-Spreading-Processes

Fig. 5 shows the relationship between the error of the lumped MPM, the

type of counting abstraction and the method used for node partitioning. We

also report the mean diﬀerence together with the maximal diﬀerence over time.

From our results, we conclude that the edge-based counting abstraction yields

a signiﬁcantly better trade-oﬀ between state space size and accuracy. However,

it generates larger MPM models than the node-based abstraction when adding

a new partition. We also ﬁnd that spectral and degree-based partitioning yield

similar results for the same number of CTMC states and that random partition-

ing performed noticeably worse, for both edge-based and node-based counting

abstractions.

7 Conclusions and Future Work

This work took first steps toward unifying the analysis of stochastic
spreading processes on networks with that of Markov population models. Since
the MPMs obtained in this way can become very large in terms of species, it is
important to be able to control the trade-off between state space size and accuracy.

However, there are still many open research problems ahead. Most evidently,

it remains to be determined which of the many techniques developed for the

analysis of MPMs (e.g. linear noise, moment closure) work best on our proposed

epidemic-type MPMs and how they scale with increasing size of the contact

network. We also expect that these reduction methods can provide a good starting

point for deriving advanced mean-field equations, similar to the ones in [10].

Moreover, the literature is rich in moment-closure-based approximation

techniques for MPMs, which can now be utilized [43,19]. We also plan to

investigate the relationship between lumped mean-ﬁeld equations [20,29] and

coarse-grained counting abstractions further.

Future work can additionally explore counting abstractions of different types,

for instance, a neighborhood-based abstraction like the one proposed by

Gleeson [13,14].

Finally, we expect that there are many more possibilities of partitioning

the contact network that remain to be investigated and which might have a

signiﬁcant impact on the ﬁnal accuracy of the abstraction.

Acknowledgements This research has been partially funded by the German

Research Council (DFG) as part of the Collaborative Research Center “Methods

and Tools for Understanding and Controlling Privacy”. We thank Verena Wolf

for helpful discussions and provision of expertise.

References

1. G. E. Allen and C. Dytham. An eﬃcient method for stochastic simulation of

biological populations in continuous time. Biosystems, 98(1):37–42, 2009.

2. A.-L. Barabási. Network science. Cambridge University Press, 2016.

3. A. Bobbio, D. Cerotti, M. Gribaudo, M. Iacono, and D. Manini. Markovian agent

models: a dynamic population of interdependent Markovian agents. In Seminal

Contributions to Modelling and Simulation, pages 185–203. Springer, 2016.

4. L. Bortolussi. Hybrid behaviour of Markov population models. Information and

Computation, 247:37–86, 2016.

5. L. Bortolussi, J. Hillston, D. Latella, and M. Massink. Continuous approximation

of collective system behaviour: A tutorial. Performance Evaluation, 70(5):317–349,

2013.

6. P. Buchholz. Exact and ordinary lumpability in finite Markov chains. Journal of

applied probability, 31(1):59–75, 1994.

7. Y. Cao, D. T. Gillespie, and L. R. Petzold. Eﬃcient step size selection for the

tau-leaping simulation method. The Journal of chemical physics, 124(4).

8. L. Cardelli, M. Tribastone, M. Tschaikowski, and A. Vandin. Erode: a tool for

the evaluation and reduction of ordinary diﬀerential equations. In International

Conference on Tools and Algorithms for the Construction and Analysis of Systems,

pages 310–328. Springer, 2017.

9. W. Cota and S. C. Ferreira. Optimized Gillespie algorithms for the simulation of

Markovian epidemic processes on large and heterogeneous networks. Computer

Physics Communications, 219:303–312, 2017.

10. K. Devriendt and P. Van Mieghem. Uniﬁed mean-ﬁeld framework for susceptible-

infected-susceptible epidemics on networks, based on graph partitioning and the

isoperimetric inequality. Physical Review E, 96(5):052314, 2017.

11. C. Gan, X. Yang, W. Liu, Q. Zhu, and X. Zhang. Propagation of computer virus

under human intervention: a dynamical model. Discrete Dynamics in Nature and

Society, 2012, 2012.

12. M. Girvan and M. E. Newman. Community structure in social and biological

networks. Proceedings of the national academy of sciences, 99(12):7821–7826, 2002.

13. J. P. Gleeson. High-accuracy approximation of binary-state dynamics on networks.

Physical Review Letters, 107(6):068701, 2011.

14. J. P. Gleeson. Binary-state dynamics on complex networks: Pair approximation

and beyond. Physical Review X, 3(2):021004, 2013.

15. A. Goltsev, F. De Abreu, S. Dorogovtsev, and J. Mendes. Stochastic cellular

automata model of neural networks. Physical Review E, 81(6):061921, 2010.

16. J. Goutsias and G. Jenkinson. Markovian dynamics on complex reaction networks.

Physics Reports, 529(2):199–264, 2013.

17. P. Goyal and E. Ferrara. Graph embedding techniques, applications, and perfor-

mance: A survey. Knowledge-Based Systems, 151:78–94, 2018.

18. R. Grima. An eﬀective rate equation approach to reaction kinetics in small volumes:

Theory and application to biochemical reactions in nonequilibrium steady-state

conditions. The Journal of Chemical Physics, 133(3):035101, 2010.

19. R. Grima. A study of the accuracy of moment-closure approximations for stochastic

chemical kinetics. The Journal of chemical physics, 136(15):04B616, 2012.

20. G. Großmann, C. Kyriakopoulos, L. Bortolussi, and V. Wolf. Lumping the ap-

proximate master equation for multistate processes on complex networks. In QEST,

pages 157–172. Springer, 2018.

21. G. Großmann and V. Wolf. Rejection-based simulation of stochastic spreading

processes on complex networks. arXiv preprint arXiv:1812.10845, 2018.

22. A. Hagberg, P. Swart, and D. Schult. Exploring network structure, dynamics,

and function using NetworkX. Technical report, Los Alamos National Lab. (LANL),

Los Alamos, NM (United States), 2008.

23. T. A. Henzinger, M. Mateescu, and V. Wolf. Sliding window abstraction for infinite

Markov chains. In International Conference on Computer Aided Verification, pages

337–352. Springer, 2009.

24. P. Holme. Shadows of the susceptible-infectious-susceptible immortality transition

in small networks. Physical Review E, 92(1):012804, 2015.

25. M. J. Keeling and P. Rohani. Modeling infectious diseases in humans and animals.

Princeton University Press, 2011.

26. W. R. KhudaBukhsh, A. Auddy, Y. Disser, and H. Koeppl. Approximate lumpability

for Markovian agent-based models using local symmetries. arXiv:1804.00910.

27. I. Z. Kiss, J. C. Miller, and P. L. Simon. Mathematics of epidemics on networks:

from exact to approximate models. Forthcoming in Springer TAM series, 2016.

28. M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and

H. A. Makse. Identiﬁcation of inﬂuential spreaders in complex networks. Nature

physics, 6(11):888, 2010.

29. C. Kyriakopoulos, G. Grossmann, V. Wolf, and L. Bortolussi. Lumping of degree-

based mean-ﬁeld and pair-approximation equations for multistate contact pro-

cesses. Physical Review E, 97(1):012301, 2018.

30. G. Li and H. Rabitz. A general analysis of approximate lumping in chemical

kinetics. Chemical engineering science, 45(4):977–1002, 1990.

31. M. López-García. Stochastic descriptors in an SIR epidemic model for heterogeneous

individuals in small networks. Mathematical biosciences, 271:42–61, 2016.

32. M. Mateescu, V. Wolf, F. Didier, and T. Henzinger. Fast adaptive uniformisation

of the chemical master equation. IET systems biology, 4(6):441–452, 2010.

33. R. M. May and N. Arinaminpathy. Systemic risk: the dynamics of model banking

systems. Journal of the Royal Society Interface, 7(46):823–838, 2009.

34. M. Moslonka-Lefebvre, M. Pautasso, and M. J. Jeger. Disease spread in small-

size directed networks: epidemic threshold, correlation between links to and from

nodes, and clustering. Journal of theoretical biology, 260(3):402–411, 2009.

35. T. W. Ng, G. Turinici, and A. Danchin. A double epidemic model for the SARS

propagation. BMC Infectious Diseases, 3(1):19, 2003.

36. M. Pautasso, M. Moslonka-Lefebvre, and M. J. Jeger. The number of links to

and from the starting node as a predictor of epidemic size in small-size directed

networks. Ecological Complexity, 7(4):424–432, 2010.

37. M. Porter and J. Gleeson. Dynamical systems on networks: A tutorial, volume 4.

Springer, 2016.

38. H. S. Rodrigues. Application of SIR epidemiological model: new trends.

arXiv:1611.02565, 2016.

39. H. S. Rodrigues, M. T. T. Monteiro, and D. F. Torres. Dynamics of dengue epi-

demics when using optimal control. Mathematical and Computer Modelling, 52(9-

10):1667–1673, 2010.

40. D. Schnoerr, G. Sanguinetti, and R. Grima. Approximation and inference meth-

ods for stochastic biochemical kinetics - a tutorial review. Journal of Physics A,

51:169501, 2018.

41. P. L. Simon, M. Taylor, and I. Z. Kiss. Exact epidemic models on graphs using

graph-automorphism driven lumping. Journal of mathematical biology, 62(4):479–

508, 2011.

42. A. Singh and J. P. Hespanha. Stochastic hybrid systems for studying biochemical

processes. Royal Society A, 368(1930):4995–5011, 2010.

43. M. Soltani, C. A. Vargas-Garcia, and A. Singh. Conditional moment closure

schemes for studying stochastic dynamics of genetic circuits. IEEE transactions

on biomedical circuits and systems, 9(4):518–526, 2015.

44. G. St-Onge, J.-G. Young, L. Hébert-Dufresne, and L. J. Dubé. Efficient sampling

of spreading processes on complex networks using a composition and rejection

algorithm. Computer Physics Communications, 2019.

45. N. G. Van Kampen. Stochastic processes in physics and chemistry, volume 1.

Elsevier, 1992.

46. P. Van Mieghem, J. Omic, and R. Kooij. Virus spread in networks. IEEE/ACM

Transactions on Networking, 17(1):1–14, 2009.

47. J. A. Ward and J. Evans. A general model of dynamics on networks with graph

automorphism lumping. In International Workshop on Complex Networks and

their Applications, pages 445–456. Springer, 2018.

48. D. J. Watts and P. S. Dodds. Inﬂuentials, networks, and public opinion formation.

Journal of consumer research, 34(4):441–458, 2007.

49. J. Wei and J. C. Kuo. Lumping analysis in monomolecular reaction systems.

analysis of the exactly lumpable system. Industrial & Engineering chemistry fun-

damentals, 8(1):114–123, 1969.

50. X. Wei, N. C. Valler, B. A. Prakash, I. Neamtiu, M. Faloutsos, and C. Faloutsos.

Competing memes propagation on networks: A network science perspective. IEEE

Journal on Selected Areas in Communications, 31(6):1049–1060, 2013.

51. L. Zhao, H. Cui, X. Qiu, X. Wang, and J. Wang. SIR rumor spreading model in

the new media age. Physica A, 392(4):995–1003, 2013.

52. L. Zhao, J. Wang, Y. Chen, Q. Wang, J. Cheng, and H. Cui. SIHR rumor spreading

model in social networks. Physica A, 391(7):2444–2453, 2012.

Appendix

8 Direct Construction of MPMs

Here, we propose a way of directly deriving the lumped MPMs from the contact

network without building the original CTMC ﬁrst. We start with the node-based

counting abstraction.

8.1 Node-Based Abstraction with General Rate Functions

Our general strategy is to iterate over the nodes in the contact network and
to compute the mean rate attributed to each node over all $x \in \mathcal{X}$. To this end,
we consider the possible states of each node together with all possible species
neighborhoods. The probability of a node $n$ being in state $s$ and having species
neighborhood $v$ is denoted by $\Pr\big(X(n) = s, V(n) = v\big)$.

For a specific rule $r = s_1 \xrightarrow{f} s_2$ and partition $P$, we can then describe $\alpha_{r,P}$
as:

$$\alpha_{r,P}(y) = \sum_{n \in P} \sum_{v \in \mathcal{V}_n} f(m_v) \Pr\big(X(n) = s_1, V(n) = v\big),$$

where $m_v$ is the neighborhood vector induced by $v$, which we obtain by grouping
all partitions together (i.e., summing the entries of $v$ over the partitions for
each state). Note that it is not computationally necessary to actually
iterate over all nodes in the partition. Instead, we can group all nodes with
the same partition neighborhood together, that is, all nodes $n', n'' \in P$ with
$\mathcal{V}_{n'} = \mathcal{V}_{n''}$, as the probability only depends on $v$.

Computing the probability is the interesting part. We start by establishing
that

$$\Pr\big(X(n) = s_1, V(n) = v\big) = \Pr\big(X(n) = s_1\big) \cdot \Pr\big(V(n) = v \mid X(n) = s_1\big).$$

The first term in the product can be obtained by simply dividing the number
of $s_1$-nodes in $P$ by the total number of nodes in $P$:

$$\Pr\big(X(n) = s_1\big) = \frac{y[s_1, P]}{|P|} \quad \text{where } n \in P.$$

The latter probability can be computed for each partition independently. This
is because we know the number of nodes in each state in each partition. We also
know that in partition $P$ the current node $n$ is already in state $s_1$, which we
have to take into account. First, we define $y_P \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$ to be the projection of
$y$ onto $P$. Thus, each entry is defined by:

$$y_P[s] = y[s, P].$$

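Likewise, we define $v_P \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$, such that $v_P[s] = v[s, P]$. We also define
$V(n)_P \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$ to be the number of neighbors of node $n$ in partition $P$ for each
state. Finally, we define $y_P^{s_1^-}$ to be the same vector as $y_P$ except that the entry
corresponding to state $s_1$ is decreased by one (and truncated at zero). We can
now rewrite the probability as:

$$\begin{aligned}
\Pr\big(V(n) = v \mid X(n) = s_1\big)
&= \prod_{P' \in \mathcal{P}} \Pr\big(V(n)_{P'} = v_{P'} \mid X(n) = s_1\big) \\
&= \Pr\big(V(n)_P = v_P \mid X(n) = s_1\big) \prod_{P' \in \mathcal{P} \setminus \{P\}} \Pr\big(V(n)_{P'} = v_{P'} \mid X(n) = s_1\big) \\
&= p_h\big(v_P;\, y_P^{s_1^-}\big) \cdot \prod_{P' \in \mathcal{P} \setminus \{P\}} p_h\big(v_{P'};\, y_{P'}\big).
\end{aligned}$$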

We use $p_h(k; K)$ to denote the probability mass function of the multivariate
hypergeometric distribution, where $k$ and $K$ denote two vectors of non-negative
integers of the same length. That is, if $K$ denotes the number of nodes in each
state in a partition (resp., the number of marbles of each color in an urn),
then $p_h(k; K)$ denotes the probability of drawing exactly $k[s]$ nodes (resp.
marbles) of each state (resp. color).
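The construction in this subsection can be traced for a single node with the sketch below. It is an illustration under assumptions rather than the authors' implementation: the names `mv_hypergeom_pmf`, `draws`, and `node_rate`, the dictionary layout of `y`, and the use of integer state indices are all hypothetical. The probability mass function is the standard product of binomial coefficients divided by the number of ways to draw without replacement.

```python
from itertools import product
from math import comb


def mv_hypergeom_pmf(k, K):
    """Probability of drawing exactly k[i] items of each type i, without
    replacement, from an urn that holds K[i] items of type i."""
    if any(ki < 0 or ki > Ki for ki, Ki in zip(k, K)):
        return 0.0
    ways = 1
    for ki, Ki in zip(k, K):
        ways *= comb(Ki, ki)
    return ways / comb(sum(K), sum(k))


def draws(total, urn):
    """All count vectors k with sum(k) == total and 0 <= k[i] <= urn[i]."""
    ranges = [range(min(total, Ki) + 1) for Ki in urn]
    return [k for k in product(*ranges) if sum(k) == total]


def node_rate(y, part, degree_into, s1, f):
    """Mean rate one node of partition `part` contributes to a rule with
    source state s1 and rate function f.

    y[P] is the tuple of per-state node counts in partition P, and
    degree_into[P] is the node's number of neighbours in partition P
    (assumed data layout for this sketch).
    """
    n_states = len(y[part])
    p_state = y[part][s1] / sum(y[part])
    if p_state == 0:
        return 0.0
    urns = {}
    for P, counts in y.items():
        urn = list(counts)
        if P == part:
            urn[s1] -= 1  # the node itself is not among its own neighbours
        urns[P] = tuple(urn)
    parts = list(y)
    expected = 0.0
    # Enumerate every species neighbourhood v, one draw per partition.
    for choice in product(*(draws(degree_into[P], urns[P]) for P in parts)):
        prob = 1.0
        for P, k in zip(parts, choice):
            prob *= mv_hypergeom_pmf(k, urns[P])
        # Neighbourhood vector m_v: sum over partitions for each state.
        m = [sum(k[s] for k in choice) for s in range(n_states)]
        expected += f(m) * prob
    return p_state * expected


S, I = 0, 1
y = {"A": (2, 1)}  # one partition with two susceptible and one infected node
rate = node_rate(y, "A", {"A": 2}, S, lambda m: 1.0 * m[I])
```

In this example the partition is a triangle, so each susceptible node certainly sees the single infected node, and the per-node infection rate equals the probability of being susceptible, 2/3.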

8.2 Reaction Networks and Linear Models

A special case of MPMs are biochemical reaction networks, where the species

represent diﬀerent types of molecules. The change vectors and corresponding

propensity functions can elegantly be expressed as monomolecular (A→B) and

bimolecular (A+B→C+D) reaction rules (A,B,C,D∈ Z).

Reduction to Biochemical Reaction Networks Most classical models in

computational epidemiology are solely comprised of node-based rules (like the

curing rule) and edge-based rules (like the infecting propagation rule). We call

these linear models. Node-based rules, also referred to as spontaneous or inde-

pendent rules, have a constant rate function, i.e., f(m) = µ. Edge-based rules,

also referred to as contact rules, are linear in exactly one dimension, i.e., they

have the form f(m) = λm[s].

Linear models are special because not the whole neighborhood is important
for the rate of a rule but only the expected number of neighbors in a certain state.
This makes the rules very similar to monomolecular and bimolecular reaction
rules in MPMs. In fact, we can model the whole dynamics as a set of reactions
over the species $\mathcal{Z}$.

Chemical reaction networks are a special case of Markov population models.
In a chemical reaction network, the state space is given by population vectors
over species, and molecular reactions have the form $A \xrightarrow{a} C$ or $A + B \xrightarrow{b} C + D$,
where $A, B, C, D$ denote species and $a, b \in \mathbb{R}_{\geq 0}$ are reaction rate constants.

For each node-based rule $s_1 \xrightarrow{\mu} s_2$, we construct the reactions

$$(s_1, P) \xrightarrow{\mu} (s_2, P) \quad \forall P \in \mathcal{P}.$$

For each edge-based rule $s_1 \xrightarrow{f} s_2$ with $f(m) = \lambda m[s']$, we construct the reactions

$$(s_1, P) + (s', P') \xrightarrow{\lambda w_{P,P'}} (s_2, P) + (s', P') \quad \forall P, P' \in \mathcal{P},$$

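where $w_{P,P'}$ denotes the mean number of edges of a random node in $P$ with
nodes in $P'$, that is:⁷

$$w_{P,P'} = \begin{cases}
\dfrac{(P, P)}{|P|} \cdot \dfrac{1}{|P| - 1} & \text{if } P = P' \\[6pt]
\dfrac{(P, P')}{|P|} \cdot \dfrac{1}{|P'|} & \text{otherwise}
\end{cases}$$

with

$$(P, P') = \big|\{(n_1, n_2) \in \mathcal{E} \mid n_1 \in P,\ n_2 \in P'\}\big|. \tag{2}$$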
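The reaction rules for a linear model can be generated mechanically from the contact network and the partitioning. The sketch below does this for an SIS model and is meant as an illustration only: the function names, the tuple-based reaction format, and the species labels `"S"` and `"I"` are hypothetical. It assumes that the network is given as undirected node pairs and that each edge is counted once, following footnote 7, and it adds a guard for single-node partitions that the formula above does not address.

```python
def pair_count(edges, P, Q):
    """Number of undirected edges with one endpoint in P and the other in Q."""
    return sum(1 for a, b in edges
               if (a in P and b in Q) or (a in Q and b in P))


def weight(edges, P, Q):
    """Rate weight w_{P,Q} following the case distinction above."""
    if P == Q:
        if len(P) < 2:
            return 0.0  # a single node has no partner in its own partition
        return pair_count(edges, P, P) / len(P) / (len(P) - 1)
    return pair_count(edges, P, Q) / len(P) / len(Q)


def sis_reactions(edges, partitions, lam, mu):
    """List of (reactants, products, rate constant) triples for SIS."""
    reactions = []
    for name, P in partitions.items():
        # Node-based rule I -> S at rate mu, one reaction per partition.
        reactions.append(((("I", name),), (("S", name),), mu))
        # Edge-based rule S -> I with f(m) = lam * m[I], one per partition pair.
        for other, Q in partitions.items():
            rate = lam * weight(edges, P, Q)
            if rate > 0:
                reactions.append(((("S", name), ("I", other)),
                                  (("I", name), ("I", other)), rate))
    return reactions


edges = [(0, 1), (1, 2)]  # path graph 0 - 1 - 2
partitions = {"A": frozenset({0, 1}), "B": frozenset({2})}
reactions = sis_reactions(edges, partitions, lam=1.0, mu=1.3)
```

Partition pairs without connecting edges produce no reaction, so the number of bimolecular reactions grows with the number of connected partition pairs rather than with the number of nodes.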

8.3 Edge-Based Counting Abstraction

For each rule $r = s_1 \xrightarrow{f} s_2$, each partition $P \in \mathcal{P}$, and each $v \in \mathcal{V}_P$, we
define a propensity function $\alpha_{r,P,v}$ with:

$$\alpha_{r,P,v}(y) = \sum_{n \in P} f(m_v) \Pr\big(X(n) = s_1, V(n) = v\big).$$

Again, we use

$$\Pr\big(X(n) = s_1, V(n) = v\big) = \Pr\big(X(n) = s_1\big) \cdot \Pr\big(V(n) = v \mid X(n) = s_1\big)$$

to compute this probability, where we can compute $\Pr\big(X(n) = s_1\big)$ exactly as before.

Since we now have information about the edges, we can derive the probability
of neighborhoods more precisely. In fact, we can directly construct the set of
candidate neighbors from $y$. To this end, we define a vector $y_{s,P,P'} \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$, where
entry $y_{s,P,P'}[s']$ specifies the number of neighbors of a random node in state $s$
and partition $P$ which lie in partition $P'$ and occupy state $s'$. Formally:

$$y_{s,P,P'}[s'] = \begin{cases}
y[s, P, s', P'] & \text{if } (s, P) \leq (s', P') \\
y[s', P', s, P] & \text{otherwise.}
\end{cases}$$

This gives rise to the final approximation of the probability of neighborhood
species:

$$\Pr\big(V(n) = v \mid X(n) = s_1\big) \approx \prod_{P' \in \mathcal{P}} p_h\big(v_{P'};\, y_{s_1,P,P'}\big) \quad (\text{where } n \in P).$$
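The lookup with a canonical species order and the resulting product of hypergeometric terms can be sketched as follows. The code is illustrative and rests on several assumptions: the edge-based state `y` is stored as a dictionary keyed by an ordered pair of `(state, partition)` species, the order on species is Python's lexicographic tuple order, and the function names are hypothetical.

```python
from math import comb


def mv_hypergeom_pmf(k, K):
    """Multivariate hypergeometric probability of drawing k from urn K."""
    if any(ki < 0 or ki > Ki for ki, Ki in zip(k, K)):
        return 0.0
    ways = 1
    for ki, Ki in zip(k, K):
        ways *= comb(Ki, ki)
    return ways / comb(sum(K), sum(k))


def edge_urn(y, s, P, Pp, states):
    """Vector y_{s,P,P'}: for each state s2, the count stored for the species
    pair (s, P), (s2, P'), looked up under its canonically ordered key."""
    urn = []
    for s2 in states:
        a, b = (s, P), (s2, Pp)
        key = (a, b) if a <= b else (b, a)
        urn.append(y.get(key, 0))
    return tuple(urn)


def neighbourhood_prob(y, s1, P, v, states, partitions):
    """Approximate Pr(V(n) = v | X(n) = s1) for a node n in partition P;
    v[P'] holds the per-state neighbour counts in partition P'."""
    prob = 1.0
    for Pp in partitions:
        prob *= mv_hypergeom_pmf(v[Pp], edge_urn(y, s1, P, Pp, states))
    return prob


states = ("I", "S")
partitions = ("A",)
# Three I-S pairs and two S-S pairs inside the single partition A.
y = {(("I", "A"), ("S", "A")): 3, (("S", "A"), ("S", "A")): 2}
urn = edge_urn(y, "S", "A", "A", states)
p = neighbourhood_prob(y, "S", "A", {"A": (1, 1)}, states, partitions)
```

Storing each unordered species pair under a single key avoids double bookkeeping, and the ordering test in `edge_urn` mirrors the case distinction in the definition of $y_{s,P,P'}$.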

⁷ Note that, despite the tuple notation, we only count edges once.