Reducing Spreading Processes on Networks to
Markov Population Models
Gerrit Großmann(✉)¹ [0000-0002-4933-447X] and Luca Bortolussi¹,² [0000-0001-8874-4001]
¹ Saarland University, 66123 Saarbrücken, Germany
gerrit.grossmann@uni-saarland.de
² University of Trieste, Trieste, Italy
lbortolussi@units.it
Abstract. Stochastic processes on complex networks, where each node
is in one of several compartments, and neighboring nodes interact with
each other, can be used to describe a variety of real-world spreading phe-
nomena. However, computational analysis of such processes is hindered
by the enormous size of their underlying state space.
In this work, we demonstrate that lumping can be used to reduce any
epidemic model to a Markov Population Model (MPM). To this end, we
propose a novel lumping scheme based on a partitioning of the nodes.
By imposing different types of counting abstractions, we obtain coarse-
grained Markov models with a natural MPM representation that approx-
imate the original systems. This makes it possible to transfer the rich pool
of approximation techniques developed for MPMs to the computational
analysis of complex networks’ dynamics.
We present numerical examples to investigate the relationship between
the accuracy of the MPMs, the size of the lumped state space, and the
type of counting abstraction.
Keywords: Epidemic Modeling · Markov Population Model · Lumping · Model Reduction · Spreading Process · SIS Model · Complex Networks
1 Introduction
Computational modeling and analysis of dynamic processes on networked systems is a widespread and thriving research area. In particular, much effort has
been put into the study of spreading phenomena [2,37,16,27]. Arguably, the most
common formalism for spreading processes is the so-called Susceptible-Infected-
Susceptible (SIS) model with its variations [27,37,38].
In the SIS model, each node is either infected (I) or susceptible (S). Infected
nodes propagate their infection to neighboring susceptible nodes and become
susceptible again after a random waiting time. Naturally, one can extend the
number of possible node states (or compartments) of a node. For instance, the
SIR model introduces an additional recovered state in which nodes are immune
to the infection.
SIS-type models are remarkable because—despite their simplicity—they al-
low the emergence of complex macroscopic phenomena guided by the topological
properties of the network. There exists a wide variety of scenarios which can be
described using the SIS-type formalism. For instance, the SIS model has been
successfully used to study the spread of many different pathogens like influenza
[25], dengue fever [39], and SARS [35]. Likewise, SIS-type models have been shown to be extremely useful for analyzing and predicting the spread of opinions [48,28],
rumors [52,51], and memes [50] in online social networks. Other areas of applica-
tions include the modeling of neural activity [15], the spread of computer viruses
[11] as well as blackouts in financial institutions [33].
The semantics of SIS-type processes can be described using a continuous-time Markov chain (CTMC) [27,46] (cf. Section 3 for details). Each possible assignment of nodes to the two node states S and I constitutes an individual state in the CTMC (here referred to as network state to avoid confusion³). Hence, the CTMC state space grows exponentially with the number of nodes, which renders the numerical solution of the CTMC infeasible for most realistic contact networks.
This work investigates an aggregation scheme that lumps similar network states together and thereby reduces the size of the state space. More precisely, we first partition the nodes of the contact network and then impose a counting abstraction on each partition. We only lump two network states together when their corresponding counting abstractions coincide on each partition.
As we will see, the counting abstraction induces a natural representation of
the lumped CTMC as a Markov Population Model (MPM). In an MPM, the
CTMC states are vectors which, for different types of species, count the number
of entities of each species. The dynamics can elegantly be represented as species
interactions. More importantly, a very rich pool of approximation techniques
has been developed on the basis of MPMs, which can now be applied to the
lumped model. These include efficient simulation techniques [7,1], dynamic state
space truncation [23,32], moment-closure approximations [43,19], linear noise
approximation [45,18], and hybrid approaches [4,42].
The remainder of this work is organized as follows: Section 2 briefly reviews related work, and Section 3 formalizes SIS-type models and their CTMC semantics. Our lumping scheme is developed in Section 4. In Section 5, we show that the lumped CTMCs have a natural MPM representation. Numerical results are presented in Section 6, and some conclusions in Section 7 complete the paper and identify open research problems.
2 Related Work
The general idea behind lumping is to reduce the complexity of a system by ag-
gregating (i.e., lumping) individual components of the system together. Lumping
is a popular model reduction technique which has been used to reduce the num-
ber of equations in a system of ODEs and the number of states in a Markov
³ In the following, we will use the terms CTMC state and network state interchangeably.
chain, in particular in the context of biochemical reaction networks [30,6,49,8].
Generally speaking, one can distinguish between exact and approximate lumping
[30,6].
Most work on the lumpability of epidemic models has been done in the con-
text of exact lumping [27,41,47]. The general idea is typically to reduce the state
space by identifying symmetries in the CTMC which themselves can be found
using symmetries (i.e., automorphisms) in the contact network. Those methods,
however, are limited in scope because these symmetries are infeasible to find
in real-world networks and the state space reduction is not sufficient to make
realistic models small enough to be solvable.
This work proposes an approximate lumping scheme. Approximate lump-
ing has been shown to be useful when applied to mean-field approximation
approaches of epidemic models like the degree-based mean-field and pair ap-
proximation equations [29], as well as the approximate master equation [20,14].
However, mean-field equations are inherently limited: they either ignore most topological properties of the contact network or make unrealistic independence assumptions between neighboring nodes.
Moreover, [26] proposed using local symmetries in the contact network in-
stead of automorphisms to construct a lumped Markov chain. This scheme seems promising, in particular on larger graphs, where automorphisms often do not even exist. However, the limitations for real-world networks, namely a limited amount of state space reduction and high computational costs, seem to persist.
Conceptually similar to this work is also the unified mean-field framework
(UMFF) proposed by Devriendt et al. in [10]. Devriendt et al. also partition the
nodes of the contact network but directly derive a mean-field equation from it. In
contrast, this work focuses on the analysis of the lumped CTMC and its relation
to MPMs. Moreover, we investigate different types of counting abstractions, not only node-based ones. The relationship between population dynamics and
networks has also been investigated with regard to Markovian agents [3].
3 Spreading Processes
Let $G = (\mathcal{N}, \mathcal{E})$ be an undirected graph without self-loops. At each time point $t \in \mathbb{R}_{\geq 0}$, each node occupies one of $m$ different node states, denoted by $\mathcal{S} = \{s_1, s_2, \dots, s_m\}$ (typically, $\mathcal{S} = \{\text{S}, \text{I}\}$). Consequently, the network state is given by a labeling $x : \mathcal{N} \to \mathcal{S}$. We use
$$\mathcal{X} = \{x \mid x : \mathcal{N} \to \mathcal{S}\}$$
to denote the set of all possible labelings. $\mathcal{X}$ is also the state space of the underlying CTMC. As each of the $|\mathcal{N}|$ nodes occupies one of $m$ states, we find that $|\mathcal{X}| = |\mathcal{S}|^{|\mathcal{N}|}$.
A set of stochastic rules determines the particular way in which nodes change
their corresponding node states. Whether a rule can be applied to a node depends
on the state of the node and of its immediate neighborhood.
Fig. 1: The CTMC induced by the SIS model (S: blue, I: magenta, filled) on a toy graph. Only a subset of the CTMC state space (11 out of $2^6 = 64$ network states) is shown.
The neighborhood of a node is modeled as a vector $\mathbf{m} \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$, where $\mathbf{m}[s]$ denotes the number of neighbors in state $s \in \mathcal{S}$ (we assume an implicit enumeration of states). Thus, the degree (number of neighbors, denoted by $k$) of a node is equal to the sum over its associated neighborhood vector, that is, $k = \sum_{s \in \mathcal{S}} \mathbf{m}[s]$. The set of possible neighborhood vectors is denoted as
$$\mathcal{M} = \Big\{ \mathbf{m} \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|} \;\Big|\; \sum_{s \in \mathcal{S}} \mathbf{m}[s] \leq k_{\max} \Big\},$$
where $k_{\max}$ denotes the maximal degree in a given network.
Each rule is a triplet $s_1 \xrightarrow{f} s_2$ ($s_1, s_2 \in \mathcal{S}$, $s_1 \neq s_2$), which can be applied to each node in state $s_1$. When the rule "fires", it transforms the node from $s_1$ into $s_2$. The rate at which a rule "fires" is specified by the rate function $f : \mathcal{M} \to \mathbb{R}_{\geq 0}$ and depends on the node's neighborhood vector. The time delay until the rule is applied to the network state is drawn from an exponential distribution with rate $f(\mathbf{m})$. Hence, higher rates correspond to shorter waiting times. For the sake of simplicity and without loss of generality, we assume that for each pair of states $s_1, s_2$ there exists at most one rule that transforms $s_1$ to $s_2$.
In the well-known SIS model, infected nodes propagate their infection to susceptible neighbors. Thus, the rate at which a susceptible node becomes infected is proportional to its number of infected neighbors:
$$\text{S} \xrightarrow{f} \text{I} \quad \text{with} \quad f(\mathbf{m}) = \lambda \cdot \mathbf{m}[\text{I}],$$
where $\lambda \in \mathbb{R}_{\geq 0}$ is a rule-specific rate constant (called infection rate) and $\mathbf{m}[\text{I}]$ denotes the number of infected neighbors. Furthermore, a recovery rule transforms
infected nodes back to being susceptible:
$$\text{I} \xrightarrow{f} \text{S} \quad \text{with} \quad f(\mathbf{m}) = \mu,$$
where $\mu \in \mathbb{R}_{\geq 0}$ is a rule-specific rate constant called recovery rate.
A variation of the SIS model is the SI model, where no curing rule exists and all nodes (that are reachable from an infected node) will eventually end up being infected. Intuitively, each rule tries to "fire" at each position $n \in \mathcal{N}$ where it can be applied. The rule and node that have the shortest waiting time "win" and the rule is applied there. This process is repeated until some stopping criterion is fulfilled.
3.1 CTMC Semantics
Formally, the semantics of SIS-type processes can be given in terms of continuous-time Markov chains (CTMCs). The state space is the set of possible network states $\mathcal{X}$. The CTMC has a transition from state $x$ to $x'$ ($x, x' \in \mathcal{X}$, $x \neq x'$) if there exists a node $n \in \mathcal{N}$ and a rule $s_1 \xrightarrow{f} s_2$ such that the application of the rule to $n$ transforms the network state from $x$ to $x'$. The rate of the transition is exactly the rate $f(\mathbf{m})$ of the rule when applied to $n$. We use $q(x, x') \in \mathbb{R}_{\geq 0}$ to denote the transition rate between two network states. Fig. 1 illustrates the CTMC corresponding to an SIS process on a small toy network.
Explicitly computing the evolution of the probability of $x \in \mathcal{X}$ over time with an ODE solver, using numerical integration, is only possible for very small contact networks, since the state space grows exponentially with the number of nodes. Alternative approaches include sampling the CTMC, which can be done reasonably efficiently even for comparably large networks [21,9,44] but is subject to statistical inaccuracies and is mostly used to estimate global properties.
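For illustration, the CTMC semantics above translates directly into a statistical simulation. The following sketch is a naive direct (Gillespie-style) sampler written with NetworkX; the function name and parameters are our own choices, it re-computes all transition rates in every step, and far more efficient rejection-based samplers are discussed in [21,9,44].

```python
import random
import networkx as nx

def simulate_sis(graph, lam, mu, initially_infected, t_max):
    """Naive direct (Gillespie) simulation of an SIS process on a contact
    network: infection fires at rate lam * (#infected neighbors),
    recovery fires at rate mu."""
    state = {n: "I" if n in initially_infected else "S" for n in graph.nodes()}
    t, trajectory = 0.0, [(0.0, sum(s == "I" for s in state.values()))]
    while t < t_max:
        # Enumerate all possible rule applications as (node, new state, rate).
        events = []
        for n, s in state.items():
            if s == "I":
                events.append((n, "S", mu))
            else:
                infected_nbs = sum(state[nb] == "I" for nb in graph.neighbors(n))
                if infected_nbs > 0:
                    events.append((n, "I", lam * infected_nbs))
        total_rate = sum(rate for *_, rate in events)
        if total_rate == 0.0:            # absorbing network state reached
            break
        t += random.expovariate(total_rate)      # exponential waiting time
        r = random.uniform(0.0, total_rate)      # pick an event proportional to its rate
        for n, new_state, rate in events:
            r -= rate
            if r <= 0.0:
                state[n] = new_state
                break
        trajectory.append((t, sum(s == "I" for s in state.values())))
    return trajectory

# Example usage with the toy setting of Section 6 (13 nodes, Erdos-Renyi, p = 0.5).
g = nx.erdos_renyi_graph(13, 0.5, seed=1)
print(simulate_sis(g, lam=1.0, mu=1.3, initially_infected={0}, t_max=5.0)[-1])
```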
4 Approximate Lumping
Our lumping scheme is composed of three basic ingredients:
Node Partitioning: A partitioning of the nodes $\mathcal{N}$, which is explicitly provided.
Counting Pattern: The type of features we are counting, i.e., nodes or edges.
Implicit State Space Partitioning: The CTMC state space is implicitly partitioned by counting the nodes or edges in each node partition.
We will start our presentation by discussing the partitioning of the state space and then show how to obtain it from a given node partitioning and counting pattern. To this end, we use $\mathcal{Y}$ to denote the new lumped state space and assume that there is a surjective⁴ lumping function
$$L : \mathcal{X} \to \mathcal{Y}$$
that defines which network states will be lumped together. Note that the lumped state space is the image of the lumping function and that all network states $x \in \mathcal{X}$ which are mapped to the same $y \in \mathcal{Y}$ will be aggregated.
⁴ If $L$ is not surjective, we consider only the image of $L$ to be the lumped state space.
Later in this section, we will discuss concrete realizations of $L$. In particular, we will construct $L$ based on a node partitioning and a counting abstraction of our choice. Next, we define the transition rates $q(y, y')$ (where $y, y' \in \mathcal{Y}$, $y \neq y'$) between the states of the lumped Markov chain:
$$q(y, y') = \frac{1}{|L^{-1}(y)|} \sum_{x \in L^{-1}(y)} \sum_{x' \in L^{-1}(y')} q(x, x') . \qquad (1)$$
This is simply the mean rate at which an original state $x \in L^{-1}(y)$ goes to some $x' \in L^{-1}(y')$. Technically, Eq. (1) corresponds to the following lumping assumption: we assume that at each point in time all network states belonging to a lumped state $y$ are equally likely.
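For very small systems, Eq. (1) can be evaluated directly from the original chain. The following minimal sketch (names and signature are ours) assumes that the lumping function returns hashable values, e.g., tuples of counts, and that a rate function for the original CTMC is available; it averages the outgoing rates over each preimage $L^{-1}(y)$.

```python
from collections import defaultdict

def lump_rates(states, q, lump):
    """Aggregate transition rates according to Eq. (1).

    states -- iterable over all network states x
    q      -- function q(x, x2) returning the original rate (0 if no transition)
    lump   -- lumping function L mapping a network state to a hashable lumped state
    """
    preimage = defaultdict(list)                 # y -> L^{-1}(y)
    for x in states:
        preimage[lump(x)].append(x)
    q_lumped = defaultdict(float)
    for y, members in preimage.items():
        for y2, members2 in preimage.items():
            if y2 == y:
                continue
            total = sum(q(x, x2) for x in members for x2 in members2)
            if total > 0.0:
                # mean rate over the |L^{-1}(y)| source states
                q_lumped[(y, y2)] = total / len(members)
    return dict(q_lumped)
```

Enumerating all pairs of lumped states requires the full original state space, so this sketch only makes the definition concrete; Section 5.3 and the appendix describe how the lumped model can be built without the original CTMC.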
4.1 Partition-Based Lumping
Next, we construct the lumping function $L$. Because we want to make our lumping aware of the contact network's topology, we assume a given partitioning $\mathcal{P}$ over the nodes $\mathcal{N}$ of the contact network. That is, $\mathcal{P} \subset 2^{\mathcal{N}}$, $\bigcup_{P \in \mathcal{P}} P = \mathcal{N}$, and all $P \in \mathcal{P}$ are disjoint and non-empty. Based on the node partitioning, we can now impose different kinds of counting abstractions on the network state. This work considers two types: counting nodes and counting edges. The counting abstractions are visualized in Fig. 3. A full example of how a lumped CTMC of an SI model is constructed using the node-based counting abstraction is given in Fig. 2.
Node-Based Counting Abstraction We count the number of nodes in each state and partition. Thus, for a given network state $x \in \mathcal{X}$, we use $y(s, P)$ to denote the number of nodes in state $s \in \mathcal{S}$ in partition $P \in \mathcal{P}$. The lumping function $L$ projects $x$ to the corresponding counting abstraction. Formally:
$$\mathcal{Y} = \{ y \mid y : \mathcal{S} \times \mathcal{P} \to \mathbb{Z}_{\geq 0} \}, \qquad L(x) = y \quad \text{with} \quad y(s, P) = |\{ n \in \mathcal{N} \mid x(n) = s,\; n \in P \}| .$$
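As a concrete illustration (the routine and its signature are ours, not part of the paper), the node-based lumping function maps a labeling $x$ to a tuple of per-(state, partition) counts; returning a tuple makes the result hashable, so it can serve directly as the `lump` argument of the sketch after Eq. (1).

```python
def node_counting_lump(x, partitions, states=("S", "I")):
    """Node-based counting abstraction: for every (state, partition) pair,
    count the nodes of that state inside that partition.

    x          -- dict mapping node -> state
    partitions -- list of node sets forming the partitioning P
    """
    return tuple(
        sum(1 for n in block if x[n] == s)
        for s in states
        for block in partitions
    )

# Two labelings that agree on all counts are lumped together:
parts = [{0, 1}, {2, 3}]
assert node_counting_lump({0: "I", 1: "S", 2: "S", 3: "S"}, parts) == \
       node_counting_lump({0: "S", 1: "I", 2: "S", 3: "S"}, parts)
```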
Edge-Based Counting Abstraction Again, we assume that a network state $x$ and a node partitioning $\mathcal{P}$ are given. Now we count the edges, that is, for each pair of states $s, s' \in \mathcal{S}$ and each pair of partitions $P, P' \in \mathcal{P}$, we count $y(s, P, s', P')$, which is the number of edges $(n, n') \in \mathcal{E}$ where $x(n) = s$, $n \in P$, $x(n') = s'$, $n' \in P'$. Note that this includes cases where $P = P'$ and $s = s'$. However, only counting the edges does not determine how many nodes there are in each state (see Fig. 3 for an example).
Fig. 2: Illustration of the lumping process. (a) Model: a basic SI process where infected nodes (magenta, filled) infect susceptible neighbors (blue) with infection rate $\lambda = 1$. The contact graph is divided into two partitions. (b) The underlying CTMC with $2^4 = 16$ states. The graph partition induces the edge-based and node-based lumping. The edge-based lumping refines the node-based lumping and generates one additional partition (vertical line in the central partition). (c) The lumped CTMC using the node-based counting abstraction with only 9 states. The rates are the averaged rates from the full CTMC.
Fig. 3: (a) By adding the dummy node, the edge-based abstraction is able to differentiate the two graphs; the dummy node ensures that the number of nodes in each state is counted in the edge-based abstraction. (b) Left: a partitioned network (Zachary's Karate Club graph from [12]) (S: blue, I: magenta, filled), partitioned into two groups P1 and P2 (indicated by different node shapes). Right: the corresponding counting abstractions.
In order to still have this information encoded in each lumped state, we slightly modify the network structure by adding a new dummy node $n_\star$ and connecting each node to it. The dummy node has a dummy state, denoted by $\star$, which never changes, and it is assigned to a new dummy partition $P_\star$. Formally,
$$\mathcal{N} := \mathcal{N} \cup \{n_\star\}, \quad \mathcal{S} := \mathcal{S} \cup \{\star\}, \quad x(n_\star) := \star, \quad \mathcal{P} := \mathcal{P} \cup \{P_\star\}, \quad \mathcal{E} := \mathcal{E} \cup \{ (n, n_\star) \mid n \in \mathcal{N},\; n \neq n_\star \} .$$
Note that the rate function $f$ ignores the dummy node. The lumped representation is then given as:
$$\mathcal{Y} = \{ y \mid y : \mathcal{S} \times \mathcal{P} \times \mathcal{S} \times \mathcal{P} \to \mathbb{Z}_{\geq 0} \}, \qquad L(x) = y \quad \text{with} \quad y(s, P, s', P') = |\{ (n, n') \in \mathcal{E} \mid x(n) = s,\; n \in P,\; x(n') = s',\; n' \in P' \}| .$$
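The edge-based lumping function can be sketched in the same spirit; the following routine (names ours, a NetworkX-style graph object is assumed) counts edges per unordered pair of species and adds one dummy edge per node so that the node counts stay encoded, as described above.

```python
def edge_counting_lump(x, graph, partitions, dummy_state="*"):
    """Edge-based counting abstraction: count, for every unordered pair of
    species (state, partition index), the edges connecting them. A dummy
    neighbor attached to every node additionally encodes the node counts.

    x          -- dict mapping node -> state
    partitions -- list of node sets forming the partitioning P
    """
    part_of = {n: i for i, block in enumerate(partitions) for n in block}
    species = lambda n: (x[n], part_of[n])
    dummy = (dummy_state, len(partitions))   # dummy species in its own partition
    counts = {}

    def bump(a, b):
        key = tuple(sorted((a, b)))          # edges are undirected
        counts[key] = counts.get(key, 0) + 1

    for n, n2 in graph.edges():
        bump(species(n), species(n2))
    for n in graph.nodes():                  # one dummy edge per node
        bump(species(n), dummy)
    return tuple(sorted(counts.items()))
```

Since the result is again a hashable tuple, it can be plugged into the rate-aggregation sketch after Eq. (1) in the same way as the node-based variant.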
Example Fig. 2 illustrates how a given partitioning and the node-based counting approach induce a lumped CTMC. The partitions induced by the edge-based counting abstraction are also shown. In this example, the edge-based lumping aggregates only isomorphic network states.
4.2 Graph Partitioning
Broadly speaking, we have three options: partition the nodes based on local features (e.g., their degree), based on global features (e.g., communities in the graph), or randomly. As a baseline, we use a random node partitioning. To this end, we fix the number of partitions and randomly assign each node to a partition while enforcing that all partitions have, as far as possible, the same number of elements.
Moreover, we investigate a degree-based partitioning, where we define the distance between two nodes $n, n'$ as their relative degree difference (similar to [29]):
$$d_k(n, n') = \frac{|k_n - k_{n'}|}{\max(k_n, k_{n'})} .$$
We can then use any reasonable clustering algorithm and build partitions (i.e.,
clusters) with the distance function. In this work, we focus on bottom-up hier-
archical clustering as it provides the most principled way of precisely controlling
the number of partitions. Note that, for the sake of simplicity (in particular, to
avoid infinite distances), we only consider contact networks where each node is
reachable from every other node. We break ties arbitrarily.
To get a clustering that considers global features, we use a spectral embedding of the contact network. Specifically, we use the spectral_layout function from the NetworkX Python package [22] with three dimensions and perform hierarchical clustering on the embedding. In future research, it would be interesting to compute node distances based on more sophisticated graph embeddings such as the ones proposed in [17]. Note that in the border cases $|\mathcal{P}| = 1$ and $|\mathcal{P}| = |\mathcal{N}|$ all methods yield the same partitioning.
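The sketch below shows one way such a degree-based partitioning could be realized with SciPy's agglomerative clustering; the helper's name and the choice of average linkage are our assumptions (the paper only fixes bottom-up hierarchical clustering), and a connected graph is assumed so that all degrees are positive.

```python
import numpy as np
import networkx as nx
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def degree_based_partition(graph, num_partitions):
    """Bottom-up hierarchical clustering on the relative degree difference
    d_k(n, n') = |k_n - k_n'| / max(k_n, k_n')."""
    nodes = list(graph.nodes())
    deg = np.array([graph.degree(n) for n in nodes], dtype=float)
    # Pairwise relative degree differences (symmetric, zero diagonal).
    dist = np.abs(deg[:, None] - deg[None, :]) / np.maximum(deg[:, None], deg[None, :])
    labels = fcluster(
        linkage(squareform(dist, checks=False), method="average"),
        t=num_partitions, criterion="maxclust",
    )
    blocks = [set() for _ in range(num_partitions)]
    for node, label in zip(nodes, labels):
        blocks[label - 1].add(node)
    return [b for b in blocks if b]          # drop empty blocks, if any

print(degree_based_partition(nx.karate_club_graph(), 3))
```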
5 Markov Population Models
Markov Population Models (MPMs) are a special form of CTMCs where each CTMC state is a population vector over a set of species. We use $\mathcal{Z}$ to denote the finite set of species (again, with an implicit enumeration) and $\mathbf{y} \in \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$ to denote the population vector. Hence, $\mathbf{y}[z]$ identifies the number of entities of species $z$. The stochastic dynamics of an MPM is typically expressed as a set of reactions $\mathcal{R}$; each reaction $(\alpha, \mathbf{b}) \in \mathcal{R}$ is comprised of a propensity function $\alpha : \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|} \to \mathbb{R}_{\geq 0}$ and a change vector $\mathbf{b} \in \mathbb{Z}^{|\mathcal{Z}|}$. When reaction $(\alpha, \mathbf{b})$ is applied, the system moves from state $\mathbf{y}$ to state $\mathbf{y} + \mathbf{b}$. The corresponding rate is given by the propensity function. Therefore, we can rewrite the transition matrix of the CTMC as⁵:
$$q(\mathbf{y}, \mathbf{y}') = \begin{cases} \alpha(\mathbf{y}) & \text{if } (\alpha, \mathbf{b}) \in \mathcal{R},\; \mathbf{y}' = \mathbf{y} + \mathbf{b} \\ 0 & \text{otherwise} \end{cases} .$$
Next, we show that our counting abstractions have a natural interpretation as MPMs.
⁵ Without loss of generality, we assume that different reactions have different change vectors. If this is not the case, we can merge reactions with the same update by summing their corresponding rate functions.
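To make the reaction formalism concrete, the sketch below represents an MPM as a list of (propensity, change vector) pairs and enumerates the outgoing CTMC transitions of a state; the one-species birth-death example at the end is purely illustrative and not taken from the paper.

```python
def mpm_transitions(y, reactions):
    """Outgoing transitions of MPM state y: for every reaction (alpha, b)
    with alpha(y) > 0, the chain jumps to y + b at rate alpha(y)."""
    out = []
    for alpha, b in reactions:
        rate = alpha(y)
        if rate > 0.0:
            successor = tuple(yi + bi for yi, bi in zip(y, b))
            out.append((successor, rate))
    return out

# Illustrative birth-death MPM (birth rate 2.0, per-entity death rate 0.5).
reactions = [(lambda y: 2.0, (1,)),
             (lambda y: 0.5 * y[0], (-1,))]
print(mpm_transitions((3,), reactions))   # [((4,), 2.0), ((2,), 1.5)]
```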
5.1 Node-Based Abstraction
First, we define the set of species $\mathcal{Z}$. Conceptually, species are node states which are aware of their partition:
$$\mathcal{Z} = \{ (s, P) \mid s \in \mathcal{S},\; P \in \mathcal{P} \} .$$
Again, we assume an implicit enumeration of $\mathcal{Z}$. We use $z.s$ and $z.P$ to denote the components of a given species $z$.
We can now represent the lumped CTMC state as a single population vector $\mathbf{y} \in \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$, where $\mathbf{y}[z]$ is the number of nodes belonging to species $z$ (i.e., which are in state $z.s$ and partition $z.P$). The image of the lumping function $L$, i.e., the lumped state space $\mathcal{Y}$, is now a subset of non-negative integer vectors: $\mathcal{Y} \subseteq \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$.
Next, we express the dynamics by a set of reactions. For each rule $r = s_1 \xrightarrow{f} s_2$ and each partition $P \in \mathcal{P}$, we define a reaction $(\alpha_{r,P}, \mathbf{b}_{r,P})$ with propensity function
$$\alpha_{r,P} : \mathcal{Y} \to \mathbb{R}_{\geq 0}, \qquad \alpha_{r,P}(\mathbf{y}) = \frac{1}{|L^{-1}(\mathbf{y})|} \sum_{x \in L^{-1}(\mathbf{y})} \sum_{n \in P} f(\mathbf{m}_{x,n}) \, \mathbb{1}_{x(n) = s_1},$$
where $\mathbf{m}_{x,n}$ denotes the neighborhood vector of $n$ in network state $x$. Note that this is just the instantiation of Equation (1) to the MPM framework.
The change vector $\mathbf{b}_{r,P} \in \mathbb{Z}^{|\mathcal{Z}|}$ is defined element-wise as:
$$\mathbf{b}_{r,P}[z] = \begin{cases} +1 & \text{if } z.s = s_2,\; z.P = P \\ -1 & \text{if } z.s = s_1,\; z.P = P \\ 0 & \text{otherwise} \end{cases} .$$
Note that $s_1, s_2$ refer to the current rule and $z.s$ to the entry of $\mathbf{b}_{r,P}$.
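A minimal sketch of the change vector $\mathbf{b}_{r,P}$ is given below; the flat indexing of species $(s, P)$ is an arbitrary convention of ours, not fixed by the paper.

```python
def node_change_vector(rule, partition_index, states, num_partitions):
    """Change vector b_{r,P} of the node-based abstraction: applying rule
    s1 -> s2 inside partition P moves one node from species (s1, P) to
    species (s2, P) and leaves all other species untouched."""
    s1, s2 = rule
    b = [0] * (len(states) * num_partitions)
    b[states.index(s1) * num_partitions + partition_index] -= 1
    b[states.index(s2) * num_partitions + partition_index] += 1
    return b

# SIS infection rule applied in partition 0 of a 2-state, 2-partition model.
print(node_change_vector(("S", "I"), 0, ["S", "I"], 2))   # [-1, 0, 1, 0]
```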
5.2 Edge-Based Counting Abstraction
We start by defining a species neighborhood. The species neighborhood of a node $n$ is a vector $\mathbf{v} \in \mathbb{Z}_{\geq 0}^{|\mathcal{Z}|}$, where $\mathbf{v}[z]$ denotes the number of neighbors of species $z$. We define $\mathcal{V}_n$ to be the set of possible species neighborhoods for a node $n$, given a fixed contact network and partitioning. Note that we still assume that a dummy node is used to encode the number of nodes in each state and partition.
Assuming an arbitrary ordering of pairs of states and partitions, we define
$$\mathcal{Z} = \big\{ (s_{\mathrm{source}}, P_{\mathrm{source}}, s_{\mathrm{target}}, P_{\mathrm{target}}) \mid s_{\mathrm{source}}, s_{\mathrm{target}} \in \mathcal{S},\; P_{\mathrm{source}}, P_{\mathrm{target}} \in \mathcal{P},\; (s_{\mathrm{source}}, P_{\mathrm{source}}) \leq (s_{\mathrm{target}}, P_{\mathrm{target}}) \big\} .$$
Let us define $\mathcal{V}_P$ to be the set of species neighborhoods all nodes in $P$ can have:
$$\mathcal{V}_P = \bigcup_{n \in P} \mathcal{V}_n .$$
For each rule $r = s_1 \xrightarrow{f} s_2$, each partition $P \in \mathcal{P}$, and each $\mathbf{v} \in \mathcal{V}_P$, we define a propensity function $\alpha_{r,P,\mathbf{v}}$ with:
$$\alpha_{r,P,\mathbf{v}} : \mathcal{Y} \to \mathbb{R}_{\geq 0}, \qquad \alpha_{r,P,\mathbf{v}}(\mathbf{y}) = \frac{1}{|L^{-1}(\mathbf{y})|} \sum_{x \in L^{-1}(\mathbf{y})} \sum_{n \in P} f(\mathbf{m}_{x,n}) \, \mathbb{1}_{x(n) = s_1,\; V(n) = \mathbf{v}} .$$
Note that the propensity does not actually depend on $\mathbf{v}$; it is simply defined individually for each $\mathbf{v}$. The reason for this is that the change vector depends on a node's species neighborhood. To see this, consider a species $z = (s_{\mathrm{source}}, P_{\mathrm{source}}, s_{\mathrm{target}}, P_{\mathrm{target}})$, corresponding to edges connecting a node in state $s_{\mathrm{source}}$ and partition $P_{\mathrm{source}}$ to a node in state $s_{\mathrm{target}}$ and partition $P_{\mathrm{target}}$. There are two scenarios in which the corresponding counting variable has to change: (a) when the node changing state due to an application of rule $r$ is the source node, and (b) when it is the target node. Consider case (a); we need to know how many edges are connecting the updated node (which was in state $s_1$ and partition $P$) to a node in state $s_{\mathrm{target}}$ and partition $P_{\mathrm{target}}$. This information is stored in the vector $\mathbf{v}$, specifically in position $\mathbf{v}[s_{\mathrm{target}}, P_{\mathrm{target}}]$. The case in which the updated node is the target one is treated symmetrically. This gives rise to the following definition:
$$\mathbf{b}_{r,P,\mathbf{v}}[z] = \begin{cases} +\mathbf{v}[z.s_{\mathrm{target}}, z.P_{\mathrm{target}}] & \text{if } s_2 = z.s_{\mathrm{source}},\; P = z.P_{\mathrm{source}} \\ -\mathbf{v}[z.s_{\mathrm{target}}, z.P_{\mathrm{target}}] & \text{if } s_1 = z.s_{\mathrm{source}},\; P = z.P_{\mathrm{source}} \\ +\mathbf{v}[z.s_{\mathrm{source}}, z.P_{\mathrm{source}}] & \text{if } s_2 = z.s_{\mathrm{target}},\; P = z.P_{\mathrm{target}} \\ -\mathbf{v}[z.s_{\mathrm{source}}, z.P_{\mathrm{source}}] & \text{if } s_1 = z.s_{\mathrm{target}},\; P = z.P_{\mathrm{target}} \\ 0 & \text{otherwise} \end{cases} .$$
The first two lines of the definition handle the case in which the node changing state is the source node, while the following two lines deal with the case in which the node changing state appears as the target.
Fig. 4 illustrates how a lumped network state is influenced by the application
of an infection rule.
5.3 Direct Construction of the MPM
Approximating the solution of an SIS-type process on a contact network by first lumping the CTMC already reduces the computational costs by many orders of magnitude. However, this scheme is still only applicable when it is possible to construct the full CTMC in the first place. Recall that the number of network states is exponential in the number of nodes of the contact network, that is, $|\mathcal{X}| = |\mathcal{S}|^{|\mathcal{N}|}$.
However, in recent years, substantial effort was dedicated to the analysis of very small networks [47,24,31,34,36]. One reason is that when the size of a network increases, the (macro-scale) dynamics becomes more deterministic because
Fig. 4: Example of how the neighborhood $\mathbf{v}$ influences the update in the edge-based counting abstraction on an example graph. Here, all nodes belong to the same partition (thus, node states and species are conceptually the same) and the node states are ordered $[\text{S}, \text{I}, \star]$. The population vector $\mathbf{y}$ is given in matrix form for ease of presentation.
stochastic effects tend to cancel out. For small contact networks, however, meth-
ods which capture the full stochastic dynamics of the system, and not only the
mean behavior, are of particular importance.
A substantial advantage of the reduction to MPMs is the possibility of constructing the lumped CTMC without building the full CTMC first. In particular, this can be done exactly for the node-based counting abstraction. For the edge-based counting abstraction, on the other hand, we need to introduce an extra approximation in the definition of the rate function: roughly speaking, we introduce an approximate probability distribution over neighborhood vectors, since knowing how many nodes have a specific neighborhood vector would require full knowledge of the original CTMC. We present the details of this direct construction in Appendix 8.
5.4 Complexity of the MPM
The size of the lumped MPM is critical for our method, as it determines which
solution techniques are computationally tractable and provides guidelines on
how many partitions to choose. There are two notions of size to consider: (a) the
number of population variables and (b) the number of states of the underlying
CTMC. While the latter governs the applicability of numerical solutions for
CTMCs, the former controls the complexity of a large number of approximate
techniques for MPMs, like mean field or moment closure.
Node-based abstraction. In this abstraction, the population vector is of length $|\mathcal{S}| \cdot |\mathcal{P}|$, i.e., there is a variable for each node state and each partition.
Note that the sum of the population variables for each partition $P$ is $|P|$, the number of nodes in the partition. This allows us to easily count the number of states of the CTMC of the population model: for each partition, we need to subdivide the $|P|$ nodes into $|\mathcal{S}|$ different classes, which can be done in $\binom{|P| + |\mathcal{S}| - 1}{|\mathcal{S}| - 1}$ ways, giving a number of CTMC states exponential in the number $|\mathcal{S}|$ of node states and $|\mathcal{P}|$ of partitions, but polynomial in the number of nodes:
$$|\mathcal{Y}| = \prod_{P \in \mathcal{P}} \binom{|P| + |\mathcal{S}| - 1}{|\mathcal{S}| - 1} .$$
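This "stars and bars" count is easy to evaluate; the small helper below (a sketch, names ours) reproduces the product formula.

```python
from math import comb

def node_lumped_size(partition_sizes, num_states):
    """|Y| for the node-based abstraction: one multiset coefficient
    C(|P| + |S| - 1, |S| - 1) per partition, multiplied over all partitions."""
    size = 1
    for p in partition_sizes:
        size *= comb(p + num_states - 1, num_states - 1)
    return size

# 13 nodes split into partitions of size 7 and 6, two node states (SIS):
print(node_lumped_size([7, 6], 2))   # 8 * 7 = 56 lumped states, instead of 2**13 = 8192
```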
Edge-based abstraction. The number of population variables in this case is one for each unordered pair of distinct species (i.e., each type of edge connecting two different state-partition combinations), plus those counting the number of nodes in each partition and each node state, due to the presence of the dummy state. In total, we have $\frac{q(q-1)}{2} + q$ population variables, with $q = |\mathcal{S}| \cdot |\mathcal{P}|$.
In order to count the number of states of the CTMC in this abstraction, we start by observing that the sum of all variables for a given pair of partitions $P', P''$ is the number of edges connecting these partitions in the graph. We use $e(P', P'')$ to denote the number of edges between $P'$ and $P''$ (resp. the number of edges inside $P'$ if $P' = P''$). Thus,
$$|\mathcal{Y}| \leq \prod_{\substack{P', P'' \in \mathcal{P} \\ P' \leq P''}} \binom{e(P', P'') + |\mathcal{S}|^2 - 1}{|\mathcal{S}|^2 - 1} \cdot \prod_{P \in \mathcal{P}} \binom{|P| + |\mathcal{S}| - 1}{|\mathcal{S}| - 1} .$$
This is an over-approximation, because not all combinations are consistent with the graph topology. For example, a high number of infected nodes in a partition might not be consistent with a small number of II-edges inside the partition. Note that this upper bound is also exponential in $|\mathcal{S}|$ and $|\mathcal{P}|$ but still polynomial in the number of nodes $|\mathcal{N}|$, in contrast to the original network model, whose state space is exponential in $|\mathcal{N}|$.
The exponential dependency on the number of species (i.e., dimensions of
the population vector) makes the explicit construction of the lumped state space
viable only for very small networks with a small number of node states. However,
this is typically the case for spreading models like SIS or SIR. Moreover, the number of partitions also has to be kept small, particularly for realistic models. We expect the partitioning to be especially useful for networks exhibiting a small number of large-scale homogeneous structures, as is the case in many real-world networks [12].
An alternative strategy for analysis is to derive mean-field [5] or moment clo-
sure equations [40] for MPMs, which can be done without explicitly constructing
the lumped (and the original) state space. These are sets of ordinary differential equations (ODEs) describing the evolution of (moments of) the population variables. We refer the reader to [10] for a similar approach regarding the node-based abstraction.
6 Numerical Results
In this section, we compare the numerical solution of the original model (referred to as the baseline model) with different lumped MPMs. The goal of this comparison is to provide evidence supporting the claim that the lumping preserves the dynamics of the original system, with an accuracy increasing with the resolution of the MPM. We perform the comparison by numerically solving the ground-truth and the lumped system, thus comparing the probability of each state at each point in time. In practical applications of our method, exact transient or steady-state solutions may not be feasible, but in this case we can still rely on approximation methods for MPMs [5,40]. Determining which of those techniques performs best in this context is a direction of future exploration.

Fig. 5: Trade-off between accuracy and state space size for the node-based (blue) and edge-based (magenta, filled) counting abstraction. Results are shown for node partitions based on the degree (left), spectral embedding (center), and random partitioning (right). The accuracy is measured as the mean and the maximal difference between the original and lumped solution over all time points.
A limit of the comparison based on the numerical solution of the CTMC is that the state space of the original model has $|\mathcal{S}|^{|\mathcal{N}|}$ states, which strongly limits the size of the contact network⁶.
Let $P(X(t) = x)$ denote the probability that the baseline CTMC occupies network state $x \in \mathcal{X}$ at time $t \geq 0$. Furthermore, let $P(Y(t) = y)$ for $t \geq 0$ and $y \in \mathcal{Y}$ denote the same probability for a lumped MPM (corresponding to a specific partitioning and counting abstraction). To measure their difference, we first approximate the probability distribution of the original model using the lumped solution, invoking the lumping assumption which states that all network states which are lumped together have the same probability mass. We use $P_L$ to denote the lifted probability distribution over the original state space given a lumped solution. Formally,
$$P_L\big(Y(t) = x\big) = \frac{P\big(Y(t) = y\big)}{|L^{-1}(y)|} \qquad \text{where } y \text{ is s.t. } L(x) = y .$$
We measure the difference between the baseline and a lumped solution at a specific time point by summing up the difference in probability mass of each state, and then take the maximum error over time:
$$d(P, P_L) = \max_t \sum_{x \in \mathcal{X}} \Big| P_L\big(Y(t) = x\big) - P\big(X(t) = x\big) \Big| .$$
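At a single time point, this lifted distance can be computed as sketched below (function and argument names are ours; dict-based probability vectors are assumed); the maximum over time is then taken over the grid of solution time points.

```python
def lifted_distance(p_full, p_lumped, lump, preimage_size):
    """Distance at one time point: sum over all network states x of
    |P_L(Y(t) = x) - P(X(t) = x)|, where the lumped probability mass of
    y = L(x) is spread uniformly over the |L^{-1}(y)| original states.

    p_full        -- dict network state -> probability (baseline solution)
    p_lumped      -- dict lumped state  -> probability (lumped solution)
    lump          -- lumping function L
    preimage_size -- dict lumped state  -> |L^{-1}(y)|
    """
    return sum(
        abs(p_lumped[lump(x)] / preimage_size[lump(x)] - prob)
        for x, prob in p_full.items()
    )
```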
In our experiments, we used a small toy network with 13 nodes and 2 states ($2^{13} = 8192$ network states). We generated a synthetic contact network following the Erdős–Rényi graph model with a connection probability of 0.5. We use an SIS model with an infection rate of $\lambda = 1.0$ and a recovery rate of $\mu = 1.3$. Initially, we assign an equal amount of probability mass to all network states.
⁶ Code is available at github.com/gerritgr/Reducing-Spreading-Processes
Fig. 5 shows the relationship between the error of the lumped MPM, the
type of counting abstraction and the method used for node partitioning. We
also report the mean difference together with the maximal difference over time.
From our results, we conclude that the edge-based counting abstraction yields
a significantly better trade-off between state space size and accuracy. However,
it generates larger MPM models than the node-based abstraction when adding
a new partition. We also find that spectral and degree-based partitioning yield
similar results for the same number of CTMC states and that random partition-
ing performed noticeably worse, for both edge-based and node-based counting
abstractions.
7 Conclusions and Future Work
This work developed first steps towards a unification of the analysis of stochastic spreading processes on networks and Markov population models. Since the resulting MPM can become very large in terms of species, it is important to be able to control the trade-off between state space size and accuracy.
However, there are still many open research problems ahead. Most evidently,
it remains to be determined which of the many techniques developed for the
analysis of MPMs (e.g. linear noise, moment closure) work best on our proposed
epidemic-type MPMs and how they scale with increasing size of the contact
network. We expect also that these reduction methods can provide a good start-
ing point for deriving advanced mean-field equations, similar to ones in [10].
Moreover, the literature is very rich in moment-closure-based approximation techniques for MPMs, which can now be utilized [43,19]. We also plan to further investigate the relationship between lumped mean-field equations [20,29] and coarse-grained counting abstractions.
Future work can additionally explore counting abstractions of different types, for instance, a neighborhood-based abstraction like the one proposed by James P. Gleeson in [13,14].
Finally, we expect that there are many more possibilities of partitioning
the contact network that remain to be investigated and which might have a
significant impact on the final accuracy of the abstraction.
Acknowledgements This research has been partially funded by the German
Research Council (DFG) as part of the Collaborative Research Center “Methods
and Tools for Understanding and Controlling Privacy”. We thank Verena Wolf
for helpful discussions and provision of expertise.
References
1. G. E. Allen and C. Dytham. An efficient method for stochastic simulation of
biological populations in continuous time. Biosystems, 98(1):37–42, 2009.
2. A.-L. Barabási. Network science. Cambridge University Press, 2016.
3. A. Bobbio, D. Cerotti, M. Gribaudo, M. Iacono, and D. Manini. Markovian agent
models: a dynamic population of interdependent markovian agents. In Seminal
Contributions to Modelling and Simulation, pages 185–203. Springer, 2016.
4. L. Bortolussi. Hybrid behaviour of Markov population models. Information and
Computation, 247:37–86, 2016.
5. L. Bortolussi, J. Hillston, D. Latella, and M. Massink. Continuous approximation
of collective system behaviour: A tutorial. Performance Evaluation, 70(5):317–349,
2013.
6. P. Buchholz. Exact and ordinary lumpability in finite markov chains. Journal of
applied probability, 31(1):59–75, 1994.
7. Y. Cao, D. T. Gillespie, and L. R. Petzold. Efficient step size selection for the
tau-leaping simulation method. The Journal of chemical physics, 124(4).
8. L. Cardelli, M. Tribastone, M. Tschaikowski, and A. Vandin. Erode: a tool for
the evaluation and reduction of ordinary differential equations. In International
Conference on Tools and Algorithms for the Construction and Analysis of Systems,
pages 310–328. Springer, 2017.
9. W. Cota and S. C. Ferreira. Optimized gillespie algorithms for the simulation of
markovian epidemic processes on large and heterogeneous networks. Computer
Physics Communications, 219:303–312, 2017.
10. K. Devriendt and P. Van Mieghem. Unified mean-field framework for susceptible-
infected-susceptible epidemics on networks, based on graph partitioning and the
isoperimetric inequality. Physical Review E, 96(5):052314, 2017.
11. C. Gan, X. Yang, W. Liu, Q. Zhu, and X. Zhang. Propagation of computer virus
under human intervention: a dynamical model. Discrete Dynamics in Nature and
Society, 2012, 2012.
12. M. Girvan and M. E. Newman. Community structure in social and biological
networks. Proceedings of the national academy of sciences, 99(12):7821–7826, 2002.
13. J. P. Gleeson. High-accuracy approximation of binary-state dynamics on networks.
Physical Review Letters, 107(6):068701, 2011.
14. J. P. Gleeson. Binary-state dynamics on complex networks: Pair approximation
and beyond. Physical Review X, 3(2):021004, 2013.
15. A. Goltsev, F. De Abreu, S. Dorogovtsev, and J. Mendes. Stochastic cellular
automata model of neural networks. Physical Review E, 81(6):061921, 2010.
16. J. Goutsias and G. Jenkinson. Markovian dynamics on complex reaction networks.
Physics Reports, 529(2):199–264, 2013.
17. P. Goyal and E. Ferrara. Graph embedding techniques, applications, and perfor-
mance: A survey. Knowledge-Based Systems, 151:78–94, 2018.
18. R. Grima. An effective rate equation approach to reaction kinetics in small volumes:
Theory and application to biochemical reactions in nonequilibrium steady-state
conditions. The Journal of Chemical Physics, 133(3):035101, 2010.
19. R. Grima. A study of the accuracy of moment-closure approximations for stochastic
chemical kinetics. The Journal of chemical physics, 136(15):04B616, 2012.
20. G. Großmann, C. Kyriakopoulos, L. Bortolussi, and V. Wolf. Lumping the ap-
proximate master equation for multistate processes on complex networks. In Qest,
pages 157–172. Springer, 2018.
21. G. Großmann and V. Wolf. Rejection-based simulation of stochastic spreading
processes on complex networks. arXiv preprint arXiv:1812.10845, 2018.
22. A. Hagberg, P. Swart, and D. S Chult. Exploring network structure, dynamics,
and function using networkx. Technical report, Los Alamos National Lab.(LANL),
Los Alamos, NM (United States), 2008.
23. T. A. Henzinger, M. Mateescu, and V. Wolf. Sliding window abstraction for infinite
markov chains. In International Conference on Computer Aided Verification, pages
337–352. Springer, 2009.
24. P. Holme. Shadows of the susceptible-infectious-susceptible immortality transition
in small networks. Physical Review E, 92(1):012804, 2015.
25. M. J. Keeling and P. Rohani. Modeling infectious diseases in humans and animals.
Princeton University Press, 2011.
26. W. R. KhudaBukhsh, A. Auddy, Y. Disser, and H. Koeppl. Approximate lumpabil-
ity for markovian agent-based models using local symmetries. arXiv:1804.00910.
27. I. Z. Kiss, J. C. Miller, and P. L. Simon. Mathematics of epidemics on networks:
from exact to approximate models. Forthcoming in Springer TAM series, 2016.
28. M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and
H. A. Makse. Identification of influential spreaders in complex networks. Nature
physics, 6(11):888, 2010.
29. C. Kyriakopoulos, G. Grossmann, V. Wolf, and L. Bortolussi. Lumping of degree-
based mean-field and pair-approximation equations for multistate contact pro-
cesses. Physical Review E, 97(1):012301, 2018.
30. G. Li and H. Rabitz. A general analysis of approximate lumping in chemical
kinetics. Chemical engineering science, 45(4):977–1002, 1990.
31. M. López-García. Stochastic descriptors in an sir epidemic model for heterogeneous
individuals in small networks. Mathematical biosciences, 271:42–61, 2016.
32. M. Mateescu, V. Wolf, F. Didier, and T. Henzinger. Fast adaptive uniformisation
of the chemical master equation. IET systems biology, 4(6):441–452, 2010.
33. R. M. May and N. Arinaminpathy. Systemic risk: the dynamics of model banking
systems. Journal of the Royal Society Interface, 7(46):823–838, 2009.
34. M. Moslonka-Lefebvre, M. Pautasso, and M. J. Jeger. Disease spread in small-
size directed networks: epidemic threshold, correlation between links to and from
nodes, and clustering. Journal of theoretical biology, 260(3):402–411, 2009.
35. T. W. Ng, G. Turinici, and A. Danchin. A double epidemic model for the sars
propagation. BMC Infectious Diseases, 3(1):19, 2003.
36. M. Pautasso, M. Moslonka-Lefebvre, and M. J. Jeger. The number of links to
and from the starting node as a predictor of epidemic size in small-size directed
networks. Ecological Complexity, 7(4):424–432, 2010.
37. M. Porter and J. Gleeson. Dynamical systems on networks: A tutorial, volume 4.
Springer, 2016.
38. H. S. Rodrigues. Application of sir epidemiological model: new trends.
arXiv:1611.02565, 2016.
39. H. S. Rodrigues, M. T. T. Monteiro, and D. F. Torres. Dynamics of dengue epi-
demics when using optimal control. Mathematical and Computer Modelling, 52(9-
10):1667–1673, 2010.
40. D. Schnoerr, G. Sanguinetti, and R. Grima. Approximation and inference meth-
ods for stochastic biochemical kinetics - a tutorial review. Journal of Physics A,
51:169501, 2018.
41. P. L. Simon, M. Taylor, and I. Z. Kiss. Exact epidemic models on graphs using
graph-automorphism driven lumping. Journal of mathematical biology, 62(4):479–
508, 2011.
42. A. Singh and J. P. Hespanha. Stochastic hybrid systems for studying biochemical
processes. Royal Society A, 368(1930):4995–5011, 2010.
43. M. Soltani, C. A. Vargas-Garcia, and A. Singh. Conditional moment closure
schemes for studying stochastic dynamics of genetic circuits. IEEE transactions
on biomedical circuits and systems, 9(4):518–526, 2015.
44. G. St-Onge, J.-G. Young, L. Hébert-Dufresne, and L. J. Dubé. Efficient sampling
of spreading processes on complex networks using a composition and rejection
algorithm. Computer Physics Communications, 2019.
45. N. G. Van Kampen. Stochastic processes in physics and chemistry, volume 1.
Elsevier, 1992.
46. P. Van Mieghem, J. Omic, and R. Kooij. Virus spread in networks. IEEE/ACM
Transactions on Networking, 17(1):1–14, 2009.
47. J. A. Ward and J. Evans. A general model of dynamics on networks with graph
automorphism lumping. In International Workshop on Complex Networks and
their Applications, pages 445–456. Springer, 2018.
48. D. J. Watts and P. S. Dodds. Influentials, networks, and public opinion formation.
Journal of consumer research, 34(4):441–458, 2007.
49. J. Wei and J. C. Kuo. Lumping analysis in monomolecular reaction systems.
analysis of the exactly lumpable system. Industrial & Engineering chemistry fun-
damentals, 8(1):114–123, 1969.
50. X. Wei, N. C. Valler, B. A. Prakash, I. Neamtiu, M. Faloutsos, and C. Faloutsos.
Competing memes propagation on networks: A network science perspective. IEEE
Journal on Selected Areas in Communications, 31(6):1049–1060, 2013.
51. L. Zhao, H. Cui, X. Qiu, X. Wang, and J. Wang. Sir rumor spreading model in
the new media age. Physica A, 392(4):995–1003, 2013.
52. L. Zhao, J. Wang, Y. Chen, Q. Wang, J. Cheng, and H. Cui. Sihr rumor spreading
model in social networks. Physica A, 391(7):2444–2453, 2012.
Appendix
8 Direct Construction of MPMs
Here, we propose a way of directly deriving the lumped MPMs from the contact network without building the original CTMC first. We start with the node-based counting abstraction.
8.1 Node-Based Abstraction with General Rate Functions
Our general strategy is to iterate over the nodes in the contact network and to compute the mean rate attributed to that node over all $x \in \mathcal{X}$. Therefore, we consider the possible states of each node together with all possible species neighborhoods. The probability of a node $n$ being in state $s$ and having species neighborhood $\mathbf{v}$ is denoted as $\Pr\big(X(n) = s, V(n) = \mathbf{v}\big)$.
For a specific rule $r = s_1 \xrightarrow{f} s_2$ and partition $P$, we can then describe $\alpha_{r,P}$ as:
$$\alpha_{r,P}(\mathbf{y}) = \sum_{n \in P} \sum_{\mathbf{v} \in \mathcal{V}_n} f(\mathbf{m}_{\mathbf{v}}) \Pr\big(X(n) = s_1, V(n) = \mathbf{v}\big),$$
where $\mathbf{m}_{\mathbf{v}}$ is the neighborhood vector induced by $\mathbf{v}$, which we obtain by grouping all partitions together. Note that it is not computationally necessary to actually iterate over all nodes in the partition. Instead, we can group all nodes with the same partition neighborhood together, that is, all nodes $n', n'' \in P$ with $\mathcal{V}_{n'} = \mathcal{V}_{n''}$, as the probability only depends on $\mathbf{v}$.
Computing the probability is the interesting part; we start by establishing that
$$\Pr\big(X(n) = s_1, V(n) = \mathbf{v}\big) = \Pr\big(X(n) = s_1\big) \cdot \Pr\big(V(n) = \mathbf{v} \mid X(n) = s_1\big) .$$
The first term in the product can be described by simply dividing the number of $s_1$-nodes in $P$ by the total number of nodes in $P$:
$$\Pr\big(X(n) = s_1\big) = \frac{\mathbf{y}[s_1, P]}{|P|} \qquad \text{where } n \in P .$$
The latter probability can be computed for each partition independently. This is because we know the number of nodes in each state in each partition. We also know that in partition $P$ the current node $n$ is already in state $s_1$, which we have to take into account. First, we define $\mathbf{y}_P \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$ to be the projection from $\mathbf{y}$ to $P$. Thus, each entry is defined by
$$\mathbf{y}_P[s] = \mathbf{y}[s, P] .$$
Likewise, we define $\mathbf{v}_P \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$ such that $\mathbf{v}_P[s] = \mathbf{v}[s, P]$. We also define $V(n)_P \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$ to be the number of neighbors of node $n$ in partition $P$ for each state. Finally, we define $\mathbf{y}^{s_1}_P$ to be the same vector as $\mathbf{y}_P$ except that the entry corresponding to state $s_1$ is decreased by one (and truncated at zero). We can now rewrite the probability as:
$$\begin{aligned}
\Pr\big(V(n) = \mathbf{v} \mid X(n) = s_1\big) &= \prod_{P' \in \mathcal{P}} \Pr\big(V(n)_{P'} = \mathbf{v}_{P'} \mid X(n) = s_1\big) \\
&= \Pr\big(V(n)_P = \mathbf{v}_P \mid X(n) = s_1\big) \prod_{P' \in \mathcal{P} \setminus \{P\}} \Pr\big(V(n)_{P'} = \mathbf{v}_{P'} \mid X(n) = s_1\big) \\
&= p_h\big(\mathbf{v}_P; \mathbf{y}^{s_1}_P\big) \cdot \prod_{P' \in \mathcal{P} \setminus \{P\}} p_h\big(\mathbf{v}_{P'}; \mathbf{y}_{P'}\big) .
\end{aligned}$$
We use $p_h(\mathbf{k}; \mathbf{K})$ to denote the probability mass function of the multivariate hypergeometric distribution, where $\mathbf{k}, \mathbf{K}$ denote vectors over non-negative integers of the same length. That is, if $\mathbf{K}$ denotes the number of nodes in each state in a partition (resp., the number of marbles in an urn with different colors), then $p_h(\mathbf{k}; \mathbf{K})$ denotes the probability of drawing exactly $\mathbf{k}[s]$ nodes (resp. marbles) of each state (resp. color).
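A small sketch of $p_h(\mathbf{k}; \mathbf{K})$ is given below; the helper name is ours, and scipy.stats also ships a multivariate_hypergeom distribution that could be used instead.

```python
from math import comb

def p_h(k, K):
    """Multivariate hypergeometric pmf: probability of drawing exactly k[s]
    nodes of each state when sum(k) neighbors are chosen without replacement
    from an urn containing K[s] nodes of state s."""
    if any(ki > Ki for ki, Ki in zip(k, K)):
        return 0.0
    numerator = 1
    for ki, Ki in zip(k, K):
        numerator *= comb(Ki, ki)
    return numerator / comb(sum(K), sum(k))

print(p_h([1, 2], [3, 4]))   # 3 * 6 / 35 = 0.5142857...
```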
8.2 Reaction Networks and Linear Models
A special case of MPMs are biochemical reaction networks, where the species represent different types of molecules. The change vectors and corresponding propensity functions can elegantly be expressed as monomolecular ($A \to B$) and bimolecular ($A + B \to C + D$) reaction rules ($A, B, C, D \in \mathcal{Z}$).
Reduction to Biochemical Reaction Networks Most classical models in computational epidemiology are solely comprised of node-based rules (like the curing rule) and edge-based rules (like the infection propagation rule). We call these linear models. Node-based rules, also referred to as spontaneous or independent rules, have a constant rate function, i.e., $f(\mathbf{m}) = \mu$. Edge-based rules, also referred to as contact rules, are linear in exactly one dimension, i.e., they have the form $f(\mathbf{m}) = \lambda \mathbf{m}[s]$.
Linear models are special because the rate of a rule does not depend on the whole neighborhood but only on the expected number of neighbors in a certain state. This makes the rules very similar to monomolecular and bimolecular reaction rates in MPMs. In fact, we can model the whole dynamics as a set of reactions over the species $\mathcal{Z}$.
Chemical reaction networks are a special case of Markov population models. In a chemical reaction network, the state space is given by population vectors over species, and molecular reactions have the form $A \xrightarrow{a} C$ or $A + B \xrightarrow{b} C + D$, where $A, B, C, D$ denote species and $a, b \in \mathbb{R}_{\geq 0}$ are reaction rate constants.
For each node-based rule $s_1 \xrightarrow{\mu} s_2$, we construct the reactions
$$(s_1, P) \xrightarrow{\mu} (s_2, P) \qquad \forall P \in \mathcal{P} .$$
For each edge-based rule $s_1 \xrightarrow{f} s_2$ with $f(\mathbf{m}) = \lambda \mathbf{m}[s']$, we construct the reactions
$$(s_1, P) + (s', P') \xrightarrow{\lambda w_{P,P'}} (s_2, P) + (s', P') \qquad \forall P, P' \in \mathcal{P},$$
where $w_{P,P'}$ denotes the mean number of edges of a random node in $P$ with nodes in $P'$, that is⁷:
$$w_{P,P'} = \begin{cases} \dfrac{e(P, P)}{|P|} \cdot \dfrac{1}{|P| - 1} & \text{if } P = P' \\[2mm] \dfrac{e(P, P')}{|P|} \cdot \dfrac{1}{|P'|} & \text{otherwise} \end{cases}$$
with
$$e(P, P') = |\{ (n_1, n_2) \in \mathcal{E} \mid n_1 \in P,\; n_2 \in P' \}| . \qquad (2)$$
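The quantities $e(P, P')$ and $w_{P,P'}$ can be read off the contact graph directly; the following sketch (names ours, a NetworkX-style graph assumed) illustrates this and presumes $|P| > 1$ in the within-partition case.

```python
def edge_count(graph, block_a, block_b):
    """e(P, P') of Eq. (2): edges with one endpoint in block_a and the
    other in block_b, each edge counted once."""
    return sum(
        1 for n, n2 in graph.edges()
        if (n in block_a and n2 in block_b) or (n in block_b and n2 in block_a)
    )

def mean_edge_weight(graph, block_a, block_b):
    """w_{P,P'}: mean number of edges between a random node of P and a
    random node of P' (assumes |P| > 1 when P = P')."""
    e = edge_count(graph, block_a, block_b)
    if block_a == block_b:
        return e / (len(block_a) * (len(block_a) - 1))
    return e / (len(block_a) * len(block_b))
```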
8.3 Edge-Based Counting Abstraction
For each rule $r = s_1 \xrightarrow{f} s_2$, each partition $P \in \mathcal{P}$, and each $\mathbf{v} \in \mathcal{V}_P$, we define a propensity function $\alpha_{r,P,\mathbf{v}}$ with:
$$\alpha_{r,P,\mathbf{v}}(\mathbf{y}) = \sum_{n \in P} f(\mathbf{m}_{\mathbf{v}}) \Pr\big(X(n) = s_1, V(n) = \mathbf{v}\big) .$$
Again, we use
$$\Pr\big(X(n) = s_1, V(n) = \mathbf{v}\big) = \Pr\big(X(n) = s_1\big) \cdot \Pr\big(V(n) = \mathbf{v} \mid X(n) = s_1\big)$$
to compute this probability, where we can solve $\Pr\big(X(n) = s_1\big)$ exactly as before.
Since we now have information about the edges, we can derive the probability of neighborhoods more precisely. In fact, we can directly construct the set of candidate neighbors from $\mathbf{y}$. Therefore, we define a vector $\mathbf{y}_{s,P,P'} \in \mathbb{Z}_{\geq 0}^{|\mathcal{S}|}$, where entry $\mathbf{y}_{s,P,P'}[s']$ specifies the number of neighbors of a random node in state $s$ and partition $P$ which lie in partition $P'$ and occupy state $s'$. Formally:
$$\mathbf{y}_{s,P,P'}[s'] = \begin{cases} \mathbf{y}[s, P, s', P'] & \text{if } (s, P) \leq (s', P') \\ \mathbf{y}[s', P', s, P] & \text{otherwise} \end{cases} .$$
This gives rise to the final approximation of the probability of neighborhood species:
$$\Pr\big(V(n) = \mathbf{v} \mid X(n) = s_1\big) \approx \prod_{P' \in \mathcal{P}} p_h\big(\mathbf{v}_{P'}; \mathbf{y}_{s_1,P,P'}\big) \qquad (\text{where } n \in P) .$$
⁷ Note that, despite the tuple notation, we only count edges once.
Article
Full-text available
Efficient stochastic simulation algorithms are of paramount importance to the study of spreading phenomena on complex networks. Using insights and analytical results from network science, we discuss how the structure of contacts affects the efficiency of current algorithms. We show that algorithms believed to require O(logN) or even O(1) operations per update – where N is the number of nodes – display instead a polynomial scaling for networks that are either dense or sparse and heterogeneous. This significantly affects the required computation time for simulations on large networks. To circumvent the issue, we propose a node-based method combined with a composition and rejection algorithm, a sampling scheme that has an average-case complexity of O[log(logN)] per update for general networks. This systematic approach is first set-up for Markovian dynamics, but can also be adapted to a number of non-Markovian processes and can enhance considerably the study of a wide range of dynamics on networks.
Article
Full-text available
Complex networks play an important role in human society and in nature. Stochastic multistate processes provide a powerful framework to model a variety of emerging phenomena such as the dynamics of an epidemic or the spreading of information on complex networks. In recent years, mean-field type approximations gained widespread attention as a tool to analyze and understand complex network dynamics. They reduce the model's complexity by assuming that all nodes with a similar local structure behave identically. Among these methods the approximate master equation (AME) provides the most accurate description of complex networks' dynamics by considering the whole neighborhood of a node. The size of a typical network though renders the numerical solution of multistate AME infeasible. Here, we propose an efficient approach for the numerical solution of the AME that exploits similarities between the differential equations of structurally similar groups of nodes. We cluster a large number of similar equations together and solve only a single lumped equation per cluster. Our method allows the application of the AME to real-world networks, while preserving its accuracy in computing estimates of global network properties, such as the fraction of nodes in a state at a given time.
Article
Full-text available
We study a Markovian agent-based model (MABM) in this paper. Each agent is endowed with a local state that changes over time as the agent interacts with its neighbours. The neighbourhood structure is given by a graph. In a recent paper [Simon et al. 2011], the authors used the automorphisms of the underlying graph to generate a lumpable partition of the joint state space ensuring Markovianness of the lumped process for binary dynamics. However, many large random graphs tend to become asymmetric rendering the automorphism-based lumping approach ineffective as a tool of model reduction. In order to mitigate this problem, we propose a lumping method based on a notion of local symmetry, which compares only local neighbourhoods of vertices. Since local symmetry only ensures approximate lumpability, we quantify the approximation error by means of Kullback-Leibler divergence rate between the original Markov chain and a lifted Markov chain. We prove the approximation error decreases monotonically. The connections to fibrations of graphs are also discussed.
Article
Full-text available
Contact processes form a large and highly interesting class of dynamic processes on networks, including epidemic and information spreading. While devising stochastic models of such processes is relatively easy, analyzing them is very challenging from a computational point of view, particularly for large networks appearing in real applications. One strategy to reduce the complexity of their analysis is to rely on approximations, often in terms of a set of differential equations capturing the evolution of a random node, distinguishing nodes with different topological contexts (i.e., different degrees of different neighborhoods), like degree-based mean field (DBMF), approximate master equation (AME), or pair approximation (PA). The number of differential equations so obtained is typically proportional to the maximum degree kmax of the network, which is much smaller than the size of the master equation of the underlying stochastic model, yet numerically solving these equations can still be problematic for large kmax. In this paper, we extend AME and PA, which has been proposed only for the binary state case, to a multi-state setting and provide an aggregation procedure that clusters together nodes having similar degrees, treating those in the same cluster as indistinguishable, thus reducing the number of equations while preserving an accurate description of global observables of interest. We also provide an automatic way to build such equations and to identify a small number of degree clusters that give accurate results. The method is tested on several case studies, where it shows a high level of compression and a reduction of computational time of several orders of magnitude for large networks, with minimal loss in accuracy.
Article
Full-text available
Graphs, such as social networks, word co-occurrence networks, and communication networks, occur naturally in various real-world applications. Analyzing them yields insight into the structure of society, language, and different patterns of communication. Many approaches have been proposed to perform the analysis. Recently, methods which use the representation of graph nodes in vector space have gained traction from the research community. In this survey, we provide a comprehensive and structured analysis of various graph embedding techniques proposed in the literature. We first introduce the embedding task and its challenges such as scalability, choice of dimensionality, and features to be preserved, and their possible solutions. We then present three categories of approaches based on factorization methods, random walks, and deep learning, with examples of representative algorithms in each category and analysis of their performance on various tasks. We evaluate these state-of-the-art methods on a few common datasets and compare their performance against one another and versus non-embedding based models. Our analysis concludes by suggesting some potential applications and future directions. We finally present the open-source Python library, named GEM (Graph Embedding Methods), we developed that provides all presented algorithms within a unified interface, to foster and facilitate research on the topic.
Chapter
Stochastic processes can model many emerging phenomena on networks, like the spread of computer viruses, rumors, or infectious diseases. Understanding the dynamics of such stochastic spreading processes is therefore of fundamental interest. In this work we consider the wide-spread compartment model where each node is in one of several states (or compartments). Nodes change their state randomly after an exponentially distributed waiting time and according to a given set of rules. For networks of realistic size, even the generation of only a single stochastic trajectory of a spreading process is computationally very expensive.
Conference Paper
In this paper we introduce a general Markov chain model of dynamical processes on networks. In this model, nodes in the network can adopt a finite number of states and transitions can occur that involve multiple nodes changing state at once. The rules that govern transitions only depend on measures related to the state and structure of the network and not on the particular nodes involved. We prove that symmetries of the network can be used to lump equivalent states in state-space. We illustrate how several examples of well-known dynamical processes on networks correspond to particular cases of our general model. This work connects a wide range of models specified in terms of node-based dynamical rules to their exact continuous-time Markov chain formulation.
Article
We propose an approximation framework that unifies and generalizes a number of existing mean-field approximation methods for the susceptible-infected-susceptible (SIS) epidemic model on complex networks. We derive the framework, which we call the unified mean-field framework (UMFF), as a set of approximations of the exact Markovian SIS equations. Our main novelty is that we describe the mean-field approximations from the perspective of the isoperimetric problem, which results in bounds on the UMFF approximation error. These new bounds provide insight in the accuracy of existing mean-field methods, such as the N-intertwined mean-field approximation and heterogeneous mean-field method, which are contained by UMFF. Additionally, the isoperimetric inequality relates the UMFF approximation accuracy to the regularity notions of Szemerédi's regularity lemma.
Chapter
In chemical reaction networks (CRNs) with stochastic semantics based on continuous-time Markov chains (CTMCs), the typically large populations of species cause combinatorially large state spaces. This makes the analysis very difficult in practice and represents the major bottleneck for the applicability of minimization techniques based, for instance, on lumpability. In this paper we present syntactic Markovian bisimulation (SMB), a notion of bisimulation developed in the Larsen-Skou style of probabilistic bisimulation, defined over the structure of a CRN rather than over its underlying CTMC. SMB identifies a lumpable partition of the CTMC state space a priori, in the sense that it is an equivalence relation over species implying that two CTMC states are lumpable when they are invariant with respect to the total population of species within the same equivalence class. We develop an efficient partition-refinement algorithm which computes the largest SMB of a CRN in polynomial time in the number of species and reactions. We also provide an algorithm for obtaining a quotient network from an SMB that induces the lumped CTMC directly, thus avoiding the generation of the state space of the original CRN altogether. In practice, we show that SMB allows significant reductions in a number of models from the literature. Finally, we study SMB with respect to the deterministic semantics of CRNs based on ordinary differential equations (ODEs), where each equation gives the time-course evolution of the concentration of a species. SMB implies forward CRN bisimulation, a recently developed behavioral notion of equivalence for the ODE semantics, in an analogous sense: it yields a smaller ODE system that keeps track of the sums of the solutions for equivalent species.