
CHAPTER THIRTEEN

Deterministic and Stochastic Models

of Genetic Regulatory Networks

Ilya Shmulevich and John D. Aitchison

Contents

1. Introduction

2. Boolean Networks

2.1. Attractors as cell types and cellular functional states

3. Differential Equation Models

3.1. Accurate description of cellular growth and division and

prediction of mutant phenotypes

4. Probabilistic Boolean Networks

4.1. Steady-state analysis and stability under stochastic

fluctuations

5. Stochastic Differential Equation Models

5.1. The influence of noise on system behavior

References


Abstract

Traditionally, molecular biology research has tended to reduce biological pathways

to composite units studied as isolated parts of the cellular system. With the

advent of high throughput methodologies that can capture thousands of data

points, and powerful computational approaches, the reality of studying cellular

processes at a systems level is upon us. As these approaches yield massive

datasets, systems level analyses have drawn upon other fields such as engi-

neering and mathematics, adapting computational and statistical approaches to

decipher relationships between molecules. Guided by high quality datasets and

analyses, one can begin the process of predictive modeling. The findings from

such approaches are often surprising and beyond normal intuition. We discuss

four classes of dynamical systems used to model genetic regulatory networks.

The discussion is divided into continuous and discrete models, as well as

deterministic and stochastic model classes. For each combination of these

categories, a model is presented and discussed in the context of the yeast cell

cycle, illustrating how different types of questions can be addressed by different

model classes.

Methods in Enzymology, Volume 467

ISSN 0076-6879, DOI: 10.1016/S0076-6879(09)67013-0

© 2009 Elsevier Inc.

All rights reserved.

Institute for Systems Biology, Seattle, Washington, USA

Author’s personal copy

1. Introduction

Modern molecular biology technologies and the proliferation of

Web-based resources containing information on various aspects of biomo-

lecular networks in living cells have made it possible to mathematically

model dynamical systems of molecular interactions that control various

cellular functions and processes. Such models can then be used to predict

the behavior of the system in response to different perturbations or stimuli

and ultimately for developing rational control strategies intended to drive

the cellular system toward a desired state or away from an undesired state

that may be associated with disease. To this end, various dynamical models

have been studied, most commonly in the context of genetic regulatory

networks, for a variety of biological systems. Although there are a number

of natural ways to categorize and classify dynamical models of genetic

networks, this chapter presents a model class with an accompanying exam-

ple in each combination of deterministic versus stochastic and continuous

versus discrete model categories. The example used in each of the model

classes is that of the yeast cell cycle, as this system has been extensively

studied from a variety of different perspectives and with different model

classes. It is not the intention of this chapter to go into an in-depth

investigation of the cell cycle, but rather to use it as a running example to

illustrate the kinds of questions that can be addressed by the different model

classes considered.

A deterministic model of a genetic regulatory network may involve a

number of different mechanisms that capture the collective behavior of the

elements constituting the network. The models can differ in numerous

ways, such as in the nature of the physical elements that are represented in

the model (i.e., genes, proteins, and other factors); the resolution or scale at

which the behavior of the network elements is captured (e.g., are genes

discretized, such as being either on or off, or do they take on continuous

values?); and how the network elements interact (e.g., interactions can

either be present or absent or they may have a quantitative nature). The

common aspect of deterministic models is the inherent lack of randomness

or stochasticity in the model. This chapter presents Boolean networks and

systems of differential equations as examples of discrete and continuous

deterministic models of genetic networks, respectively.

Stochastic models of genetic regulatory networks differ from their

deterministic counterparts by incorporating randomness or uncertainty.

Most deterministic models can be generalized such that one associates

probabilities with particular components or aspects of the model. Thus,

stochastic models can also be categorized into discrete and continuous

categories. The stochastic or probabilistic components in such models can


either be associated with model structure, so that the interactions or rules of

interaction are described by probability distributions, or with the incorporation

of noise terms that capture intrinsic biological stochasticity or measurement

uncertainty. Probabilistic Boolean networks (PBNs) and stochastic

differential equations are presented as examples of discrete and continuous

stochastic models of genetic networks, respectively.

2. Boolean Networks

Boolean networks are a class of discrete dynamical systems that can be

characterized by the interactions over a set of Boolean variables. Random

Boolean networks (RBNs), which are ensembles of random network structures,

were first introduced by Kauffman (1969a,b) as a simple model class

for studying dynamical properties of gene regulatory networks at a time

when the structure of such networks was largely unknown. The idea behind

such an approach is to define an ensemble of Boolean networks such that it

fulfills certain known features of biological networks and then study random

instances of these networks to learn more about general properties of such

networks (Kauffman, 1974, 1993, 2004). Boolean network modeling of

genetic networks was further developed by Thomas (1973) and others.

The ensemble approach has been extraordinarily successful in shedding

light on fundamental principles of complex living systems at all scales of

organization, including adaptability and evolvability, robustness, coordina-

tion of complex behaviors, storage of information, and the relationships

between the structure of such complex systems and their dynamical behav-

ior. The reader is referred to several excellent review articles that cover the

ensemble properties of Boolean networks (Aldana et al., 2002; Drossel,

2007). However, our focus here is on Boolean network models that can

be used to capture the behavior of a specific gene regulatory network.

Consider a directed graph where the vertices represent genes and the

directed edges represent the actions of genes, or rather their products, on

other genes. For example, directed edges from genes A and B into gene C

indicate that A and B jointly act on C. The specific mechanism of action is not

represented in the graph structure itself, so an additional representation is

necessary. One of the simplest representational frameworks assumes that

genes are binary-valued entities, meaning that they can be in one of two

possible states of activity (e.g., ON or OFF) at any given point in time, and

that they act on each other by means of rules represented by Boolean functions.

For example, gene C may be determined by the output of a Boolean function

whose inputs are A and B. The underlying directed graph merely represents the

input–output relationships. We now present this idea more formally.


A Boolean network is defined by a set of nodes (genes) $\{x_1, \ldots, x_n\}$ and a
list of Boolean functions $\{f_1, f_2, \ldots, f_n\}$. Each gene $x_i \in \{0, 1\}$ ($i = 1, \ldots, n$)
is a binary variable whose value at time $t + 1$ is completely determined by
the values of genes $x_{j_1}, x_{j_2}, \ldots, x_{j_{k_i}}$ at time $t$ by means of a Boolean function
$f_i : \{0, 1\}^{k_i} \to \{0, 1\}$. That is, there are $k_i$ regulatory genes assigned to gene
$x_i$ that determine the “wiring” of that gene. Thus, one can write

$$x_i(t + 1) = f_i\big(x_{j_1}(t), x_{j_2}(t), \ldots, x_{j_{k_i}}(t)\big). \tag{13.1}$$

In an RBN, the functions $f_i$ are selected randomly, as are the genes that
are used as their inputs. This is the basis of the ensemble approach mentioned above.
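The random construction just described can be sketched in a few lines of Python. This sketch is not taken from the chapter. It assumes that every gene receives exactly `k` distinct regulators (a fixed in-degree, which is only one possible ensemble), that truth-table outputs are drawn uniformly from {0, 1}, and that the first regulator is the most significant bit of the truth-table index. The names `random_boolean_network` and `step` are hypothetical.

```python
import random

def random_boolean_network(n, k, seed=0):
    """Draw one network from a simple ensemble: n genes, k regulators per gene.

    Assumptions of this sketch: regulators are k distinct genes chosen
    uniformly, and each truth-table output is 0 or 1 with equal probability.
    """
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update of all genes according to Eq. (13.1)."""
    nxt = []
    for regulators, table in zip(inputs, tables):
        idx = 0
        for j in regulators:          # first regulator = most significant bit
            idx = 2 * idx + state[j]
        nxt.append(table[idx])
    return tuple(nxt)

inputs, tables = random_boolean_network(n=5, k=2, seed=1)
state = (1, 0, 0, 1, 0)
trajectory = [state]
for _ in range(4):
    state = step(state, inputs, tables)
    trajectory.append(state)
print(trajectory)
```

Repeating the draw with different seeds gives different members of the ensemble, which is how ensemble properties would be estimated in practice.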

Each $x_i$ represents the state (expression) of gene $i$, where $x_i = 1$ represents

the fact that gene $i$ is expressed and $x_i = 0$ means it is not expressed.

Such a seemingly crude simplification of gene expression has ample justifi-

cation in the experimental literature (Bornholdt, 2008). Indeed, consider

the fact that many organisms exhibit an amazing determinism of gene

activity under specific experimental contexts or conditions, such as Escher-

ichia coli under temperature change (Richmond et al., 1999). The determin-

ism is apparent despite the prevalent molecular stochasticity and

experimental noise inherent to measurement technologies such as micro-

arrays. Furthermore, accurate mathematical models of gene regulation that

capture kinetic level details of molecular reactions frequently operate with

expressed molecular concentrations spanning several orders of magnitude,

either in a saturation regime or in a regime of insignificantly small concen-

trations, with rapid switch-like transitions between such regimes (Davidich

and Bornholdt, 2008a). Further, even higher organisms, which are neces-

sarily more complex in terms of genetic regulation and heterogeneity,

exhibit remarkable consistency when gene expression is quantized into

two levels; for example, different subtypes of human tumors can be reliably

discriminated in the binary domain (Shmulevich and Zhang, 2002).

In a Boolean network, a given gene transforms its inputs (regulatory

factors that bind to it) into an output, which is the state or expression of the

gene itself at the next time-point. All genes are assumed to update synchro-

nously in accordance with the functions assigned to them and this process is

then repeated. It is clear that the dynamics of a synchronous Boolean

network are completely determined by Eq. (13.1). The artificial synchrony

simplifies computation while preserving the qualitative, generic properties

of global network dynamics. Synchronous updating has been applied in

most analytical studies so far, as it is the only one that yields deterministic

state transitions. Although the introduction of asynchronous updating,

which typically involves a random update schedule, renders the system

stochastic, asynchronous updating is not per se biologically more realistic

and has to be motivated carefully in every case not to fall victim to artifacts

(Chaves et al., 2005). Additionally, recent research indicates that some


molecular control networks are so robustly designed that timing is not a

critical factor (Braunewell and Bornholdt, 2006), that time ordering in the

emergence of cell-fate patterns is not an artifact of synchronous updating in

the Boolean model (Alvarez-Buylla et al., 2008), and that simplified syn-

chronous models are able to reliably reproduce the sequence of states in

biological systems. Nonetheless, PBNs, presented in Section 4, are able to

model asynchronous updating as well as other stochastic generalizations of

Boolean networks.

Let us start with a simple example to illustrate the dynamics of Boolean
networks and present the key idea of attractors. Consider a Boolean network
consisting of five genes $\{x_1, \ldots, x_5\}$ with the corresponding Boolean
functions given by the truth tables shown in Table 13.1. Note that
$x_4(t + 1) = f_4(x_4(t))$ is a function of only one variable and is an example
of autoregulation. The maximum connectivity (i.e., maximal number of regulators)
$K = \max_i k_i$ is equal to 3 in this case.

The dynamics of this Boolean network are shown in Fig. 13.1. Since there
are five genes, there are $2^5 = 32$ possible states that the network can be in.
Each state is represented by a circle and the arrows between states show the
transitions of the network according to the functions in Table 13.1. It is easy
to see that because of the inherent deterministic directionality in Boolean
networks as well as only a finite number of possible states, certain states will be
revisited infinitely often if, depending on the initial starting state, the network
happens to transition into them. Such states are called attractors and the states
that lead into them, including the attractors themselves, comprise their basins
of attraction. For example, in Fig. 13.1, the state (00000) is an attractor and
together with the seven other (transient) states that eventually lead into it
comprise its basin of attraction.

Table 13.1 Truth tables of the functions in a Boolean network with five genes

        f1    f2    f3    f4    f5
        0     0     0     0     0
        1     1     1     1     0
        1     1     1     –     0
        1     0     0     –     0
        0     0     1     –     0
        1     1     1     –     0
        1     1     0     –     0
        1     1     1     –     1
  j1    5     3     3     4     5
  j2    2     5     1     –     4
  j3    4     4     5     –     1

The indices j1, j2, and j3 indicate the input connections for each of the functions.

The attractors represent the fixed points of the dynamical system, thus

capturing the system’s long-term behavior. The attractors are always cyclical

and may consist of more than one state. Starting from any state on an

attractor, the number of transitions necessary for the system to return to it

is called the cycle length. For example, the attractor (00000) has cycle length 1

while the states (11010) and (11110) comprise an attractor of length 2.
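The attractors and basins of this five-gene example can be enumerated directly, since there are only 32 states. The following Python sketch does so from the truth tables in Table 13.1. One convention is assumed here rather than stated in the text: row $r$ of a truth table corresponds to the inputs $(x_{j_1}, x_{j_2}, x_{j_3})$ read as a binary number with $x_{j_1}$ as the most significant bit. The function names `step` and `attractor_of` are hypothetical.

```python
from itertools import product

# Truth tables read column by column from Table 13.1.
# Assumption: row r corresponds to the inputs (x_j1, x_j2, x_j3) read as a
# binary number with x_j1 as the most significant bit.
TABLES = {
    1: [0, 1, 1, 1, 0, 1, 1, 1],
    2: [0, 1, 1, 0, 0, 1, 1, 1],
    3: [0, 1, 1, 0, 1, 1, 0, 1],
    4: [0, 1],
    5: [0, 0, 0, 0, 0, 0, 0, 1],
}
# Input genes (j1, j2, j3) of each function; gene 4 regulates only itself.
INPUTS = {1: (5, 2, 4), 2: (3, 5, 4), 3: (3, 1, 5), 4: (4,), 5: (5, 4, 1)}

def step(state):
    """Synchronously update all five genes; state is a tuple (x1, ..., x5)."""
    nxt = []
    for gene in range(1, 6):
        idx = 0
        for j in INPUTS[gene]:
            idx = 2 * idx + state[j - 1]
        nxt.append(TABLES[gene][idx])
    return tuple(nxt)

def attractor_of(state):
    """Follow the trajectory until a state repeats and return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])

# Group all 32 states by the attractor they flow into (basins of attraction).
basins = {}
for s in product((0, 1), repeat=5):
    basins.setdefault(attractor_of(s), []).append(s)

for attractor, members in basins.items():
    cycle = sorted("".join(map(str, a)) for a in attractor)
    print(f"attractor {cycle}: cycle length {len(attractor)}, basin size {len(members)}")
```

Under the stated ordering assumption, the state 00000 maps to itself with a basin of eight states, and the states 11010 and 11110 map to each other, in agreement with the description above.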

Real genetic regulatory networks are highly stable in the presence of

perturbations, since the cell must be able to maintain homeostasis in metabolism

or its developmental program in the face of such external perturbations

and a variety of stimuli. Within the Boolean network formalism, this means

that when a minimal number of genes transiently change value (say, by means

of some external stimulus), the system typically transitions into states that

reside in the same basin of attraction and the network eventually ‘‘flows’’

back to the same attractor. Generally speaking, large basins of attraction

correspond to higher stability. Such stability of networks in living organisms

allows the cells to maintain their functional state within their environment.

Although in developmental biology, epigenetic, heritable changes in cell
determination have been well established, it is now becoming evident that
the same type of mechanisms may also be responsible in carcinogenesis and
that gene expression patterns can be inherited without the need for mutational
changes in DNA (MacLeod, 1996). In the Boolean network framework,
this can be explained by so-called hysteresis; that is, a change in the
system’s state caused by a stimulus that does not change back when the
stimulus is withdrawn (Huang, 1999). Thus, if the change of some particular
gene does in fact cause a transition to a different attractor, the network will
often remain in the new attractor even if that gene is switched off. Thus, the
structure of the state space of a Boolean network, in which every state in a
basin of attraction is associated with the corresponding attractor to which
the system will ultimately flow, represents a type of associative memory.

Figure 13.1 The state-transition diagram for the Boolean network defined in
Table 13.1 (Shmulevich et al., 2002c).

2.1. Attractors as cell types and cellular functional states

Real gene regulatory networks exhibit spontaneous emergence of ordered

collective behavior of gene activity, captured by the attractors. Indeed,

recent findings provide experimental evidence for the existence of attractors

in real regulatory networks (Chang et al., 2008; Huang and Ingber, 2000;

Huang et al., 2005). At the same time, many studies have shown (e.g., Wolf

and Eeckman, 1998) that dynamical system behavior and stability of equili-

bria can be largely determined from regulatory element organization. This

suggests that there must exist certain generic features of regulatory networks

that are responsible for their inherent robustness and stability. Since in

multicellular organisms, the cellular ‘‘fate’’ is determined by which genes

and proteins are expressed, the attractors in the Boolean networks should

correspond to cell types, an idea originally due to Kauffman (2004). This

interpretation is quite reasonable if cell types are characterized by stable

recurrent patterns of gene expression (Jacob and Monod, 1961).

Another interpretation of attractors in Boolean networks is that they

correspond to cellular states, such as proliferation (cell cycle), apoptosis (pro-

grammed cell death), and differentiation (execution of tissue-specific tasks)

(Huang, 1999). Such an interpretation can provide new insights into cellular

homeostasis and cancer progression, the latter being characterized by an

imbalance between these cellular states. For instance, an occurrence of a

structural mutation can result in a reduction of the probability of the

network entering the apoptosis attractor(s), making the cells less likely to

undergo apoptosis and more prone to uncontrolled growth. Similarly, an

enlargement of the basins of attraction for the proliferation attractor

would hyperstabilize it, resulting in hyperproliferation, typical of tumor-

igenesis. Such an interpretation need not be at odds with the interpretation

that attractors represent cellular types. To the contrary, these views are

complementary to each other, since for a given cell type, different cellular

functional states must exist and be determined by the collective behavior of

gene activity. Thus, one cell type can comprise several ‘‘neighboring’’

attractors each corresponding to different cellular functional states.

Biological networks can often be modeled as logical circuits from well-

known local interaction data in a straightforward way. This is clearly one of

the advantages of the Boolean network approach. Though logical models

may sometimes appear obvious and simplistic, compared to detailed kinetic

models of biomolecular reactions, they may help to understand the key

dynamic properties of a regulatory process. Further, a Boolean network model

can be formulated as a coarse-grained limit of the more detailed differential


equations model for a system (Davidich and Bornholdt, 2008a), discussed in

Section 3. They may also lead the experimentalist to ask new questions and

to test them first in silico.

Let us consider a Boolean network model of the cell cycle control
network in the budding yeast Saccharomyces cerevisiae proposed in Li et al.
(2004). The core regulatory network involving activations and inhibitions
among cyclins, transcription factors, and checkpoints, such as cell size,
consists of 11 binary variables. The Boolean functions, Eq. (13.1), assigned
to each variable are chosen from the subclass of threshold Boolean functions
(Muroga, 1971), which sum up their inputs with weights and if the sum
exceeds a threshold, then the output of the function is equal to 1, else it is
equal to 0. This is equivalent to a perceptron and represents a hyperplane
that cuts the Boolean hypercube into two halves, zeros on one side, and
ones on the other. The model, shown in Fig. 1 in Li et al. (2004), also has
self-degradation loops such that nodes that are not negatively regulated by
others are degraded at the next time point. The dynamics of the model are
described by

$$x_i(t + 1) = \begin{cases} 1, & \sum_{j=1}^{n} a_{ij} x_j(t) > 0 \\ 0, & \sum_{j=1}^{n} a_{ij} x_j(t) < 0 \\ x_i(t), & \sum_{j=1}^{n} a_{ij} x_j(t) = 0 \end{cases} \tag{13.2}$$

and the weights $a_{ij}$ were all set to 1 or $-1$, depending on activation or
inhibition, respectively (Li et al., 2004).

Since there are 11 nodes in the network, there are $2^{11} = 2048$ states in total and
all the state transitions can be computed directly through Eq. (13.2). One of
the seven attractors is the most stable and attracts approximately
86% of all states. This stable (fixed point) attractor, in which the molecules
Cdh1 and Sic1 are equal to 1 and all others (Cln3, MBF, SBF, Cln1/2, Swi5,
Cdc20, Clb5/6, Clb1/2, Mcm1) are equal to 0, represents the biological
G1 stationary state (G1 is one of the four phases of the cell cycle process, in which
the cell grows and can commit to division), guaranteeing cellular stability in
this state. It is further demonstrated in Li et al. (2004) that the dynamic state
trajectories starting from each of the states in the basin of attraction of the
G1 stationary state converge rapidly onto an attracting state trajectory that is
highly stable, ensuring that starting from any point in the cell cycle process,
the system does not deviate from this trajectory. It is also shown, by
comparison with random networks, that the highly stable attractor is
unlikely to arise by chance (Li et al., 2004). Additionally, the results were
fairly insensitive to the values of the weights, justifying the choice of
magnitude 1 for both the activating and the inhibiting weights. Other similar
studies have been carried out with the cell cycle of
the fission yeast Schizosaccharomyces pombe (Davidich and Bornholdt, 2008b)
and the mammalian cell cycle (Fauré et al., 2006). Recently, a new, more
accurate Boolean network model, which can incorporate time delays, has
been proposed as a model of the budding yeast cell cycle (Irons, 2009).
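The threshold update rule of Eq. (13.2) is simple to express in code. The Python sketch below applies it to a small three-node network whose wiring is hypothetical and chosen only for illustration. It is not the 11-node yeast cell cycle network of Li et al. (2004), and it omits the self-degradation loops of that model. The name `threshold_step` and the weight matrix `W` are assumptions of this sketch.

```python
def threshold_step(state, weights):
    """One synchronous update under Eq. (13.2).

    weights[i][j] is a_ij, the influence of node j on node i
    (+1 activation, -1 inhibition, 0 no interaction).
    """
    nxt = []
    for i, row in enumerate(weights):
        total = sum(a * x for a, x in zip(row, state))
        if total > 0:
            nxt.append(1)
        elif total < 0:
            nxt.append(0)
        else:
            nxt.append(state[i])   # zero net input: keep the current value
    return tuple(nxt)

# Hypothetical wiring (illustration only):
# node 2 inhibits node 0; node 0 activates and node 2 inhibits node 1;
# node 1 activates node 2.
W = [
    [0, 0, -1],
    [1, 0, -1],
    [0, 1, 0],
]

state = (1, 0, 0)
trajectory = [state]
for _ in range(5):
    state = threshold_step(state, W)
    trajectory.append(state)
print(trajectory)
```

Starting from (1, 0, 0), this toy network passes through (1, 1, 0), (1, 1, 1), (0, 1, 1), and settles in the fixed point (0, 0, 1). The same function could be applied to a larger weight matrix if the full wiring were supplied.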

3. Differential Equation Models

A model of a genetic network based on a system of differential
equations expresses the rates of change of an element, such as a gene
product, in terms of the levels of other elements of the network and possibly
external inputs. In general, a nonlinear time-dependent differential equation
has the form

$$\dot{x} = f(x, u, t), \tag{13.3}$$

where $x$ is a state vector denoting the values of the physical variables in the
system, $\dot{x} = dx/dt$ is the elementwise derivative of $x$, $u$ is a vector of
external inputs, and $t$ is time.

If the functional dependency specified by $f$ does
not depend explicitly on time, then the system is said to be time-invariant. If $f$ is linear
and time-invariant, then it can be expressed as

$$\dot{x} = Ax + Bu, \tag{13.4}$$

where $A$ and $B$ are constant matrices (Weaver et al., 1999).

When $\dot{x} = 0$, the variables no longer change with time and thus define
the steady state of the system, which is analogous to a fixed point attractor in
a Boolean network. Consider the simple case of a gene product $x$ (a scalar)
whose rate of synthesis is proportional, with kinetic constant $k_1$, to the
abundance of another protein $a$ that is sufficiently abundant such that the
overall concentration of $a$ is not significantly changed by the reaction.
However, $x$ is also subject to degradation, the rate of which is proportional,
with constant $k_2$, to the concentration of $x$ itself. This situation can be
expressed as

$$\dot{x} = k_1 a - k_2 x \quad \text{with } a, x > 0. \tag{13.5}$$

Let us analyze the behavior of this simple system. If initially $x = 0$, then
the decay term is also 0 and $\dot{x} = k_1 a$. However, as $x$ is produced, the decay
term $k_2 x$ will also increase thereby decreasing the rate $\dot{x}$ toward 0 and
stabilizing $x$ at some steady-state value $\bar{x}$. It is easy to determine this value,
since setting $\dot{x} = 0$ and solving for $x$ yields

$$\bar{x} = \frac{k_1 a}{k_2}. \tag{13.6}$$

This behavior is shown in Fig. 13.2, where $x$ starts off at $x = 0$ and
approaches the value in Eq. (13.6). The exact form of the kinetics is

$$x(t) = \frac{k_1 a}{k_2}\left(1 - e^{-k_2 t}\right). \tag{13.7}$$

Similarly, the derivative $\dot{x}$, also shown in Fig. 13.2, starts off at the initial
value of $k_1 a$ and thereafter tends toward zero.

Now suppose that $a$ is suddenly removed after the steady-state value $\bar{x}$ is
reached. Since $a = 0$, we have $\dot{x} = -k_2 x$ and since the initial condition is
$x = k_1 a / k_2$, $\dot{x} = -k_1 a$ initially. The solution of this equation is

$$x(t) = \frac{k_1 a}{k_2} e^{-k_2 t} \tag{13.8}$$

and it can be seen that it will eventually approach zero.
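The synthesis and decay kinetics above can be checked numerically. The Python sketch below integrates Eq. (13.5) with a plain forward Euler scheme and compares the result with the closed-form solution in Eq. (13.7), using the parameter values of Fig. 13.2 ($k_1 = 2$, $k_2 = 1$, $a = 1$). The Euler scheme, the step size, and the function names are choices made for this illustration and are not prescribed by the chapter.

```python
import math

def simulate_euler(k1, k2, a, x0, t_end, dt=1e-3):
    """Integrate dx/dt = k1*a - k2*x from x(0) = x0 with forward Euler."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * (k1 * a - k2 * x)
    return x

def x_exact(k1, k2, a, t):
    """Closed-form solution of Eq. (13.7) for x(0) = 0."""
    return (k1 * a / k2) * (1.0 - math.exp(-k2 * t))

k1, k2, a = 2.0, 1.0, 1.0          # parameter values used in Fig. 13.2
steady_state = k1 * a / k2         # Eq. (13.6)

x_num = simulate_euler(k1, k2, a, x0=0.0, t_end=5.0)
x_ref = x_exact(k1, k2, a, 5.0)
print(f"steady state: {steady_state}")
print(f"x(5) numeric: {x_num:.5f}, closed form: {x_ref:.5f}")

# Removing a after the steady state is reached gives pure decay, Eq. (13.8).
x_decay = simulate_euler(k1, k2, 0.0, x0=steady_state, t_end=5.0)
print(f"x 5 time units after removing a: {x_decay:.5f}")
```

A smaller step size or a higher-order integrator would reduce the discretization error; for stiff or large systems a dedicated ODE solver would be the usual choice.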

Figure 13.2 The behavior of the solution to $\dot{x} = k_1 a - k_2 x$, $x(0) = 0$, where $k_1 = 2$,
$k_2 = 1$, and $a = 1$. As can be seen, the gene product $x$, shown with a solid plot, tends
toward its steady-state value given in Eq. (13.6). The time derivative $\dot{x}$, which starts at
initial value of $k_1 a$ and tends toward 0, is shown with a dashed plot.

This example describes a linear relationship between $a$ and $\dot{x}$. However,
most gene interactions are highly nonlinear. When the regulator is below
some critical value, it has very little effect on the regulated gene. When it is
above the critical value, it has virtually full effect that cannot be significantly
amplified by increased concentrations of the regulator. This nonlinear
behavior is typically described by sigmoid functions, which can be either
monotonically increasing or decreasing. A common form is the so-called
Hill functions given by

$$F^{+}(x, \theta) = \frac{x^n}{\theta^n + x^n}, \qquad F^{-}(x, \theta) = \frac{\theta^n}{\theta^n + x^n} = 1 - F^{+}(x, \theta). \tag{13.9}$$

The function $F^{+}(x, 1)$ is illustrated in Fig. 13.3 for $n = 1, 2, 5, 10, 20,
50,$ and 100. It can be seen that it approaches an ideal step function with
increasing $n$, thus approximating a Boolean switch. In fact, the parameter $\theta$
essentially plays the role of the threshold value. Glass (1975) used step
functions in place of sigmoidal functions in differential equation models,
resulting in so-called piecewise linear differential equations. Glass and
Kauffman (1973) also showed that many systems exhibit the same qualitative
behavior for a wide range of sigmoidal steepnesses, parameterized by $n$.
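The switch-like behavior of the Hill functions in Eq. (13.9) can be seen by evaluating them just below and just above the threshold. The Python sketch below does this for the exponents listed for Fig. 13.3, with the threshold parameter (called `theta` in the code) set to 1. The evaluation points 0.8 and 1.2 are arbitrary choices for this illustration.

```python
def hill_plus(x, theta, n):
    """Increasing Hill function F+(x, theta) from Eq. (13.9)."""
    return x ** n / (theta ** n + x ** n)

def hill_minus(x, theta, n):
    """Decreasing Hill function F-(x, theta) = 1 - F+(x, theta)."""
    return theta ** n / (theta ** n + x ** n)

theta = 1.0
for n in (1, 2, 5, 10, 20, 50, 100):
    below = hill_plus(0.8, theta, n)   # regulator slightly below threshold
    above = hill_plus(1.2, theta, n)   # regulator slightly above threshold
    print(f"n = {n:3d}: F+(0.8) = {below:.6f}, F+(1.2) = {above:.6f}")
```

As $n$ grows, the value below the threshold tends to 0 and the value above it tends to 1, while the function always equals 1/2 exactly at the threshold.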

Given that gene regulation is nonlinear, the differential equation models
can incorporate the Hill functions into their synthesis and decay terms.
There are many available computer tools for simulating and analyzing such
dynamical systems using a variety of methods and algorithms (Lambert,
1991), including DBsolve (Goryanin et al., 1999), GEPASI (Mendes,
1993), and Dizzy (Ramsey et al., 2005). Additionally, there are toolboxes
available for MATLAB® that can be used for modeling, simulating, and
analyzing biological systems with ordinary differential equations (Schmidt
and Jirstrand, 2006). MathWorks’ SimBiology® toolbox
(http://www.mathworks.com/products/simbiology) also provides a graphical user
interface for constructing models and entering reactions, parameters, and kinetic
laws, which can be simulated deterministically or stochastically. A useful
review of nonlinear ordinary differential equation modeling of the cell cycle
is available in Sible and Tyson (2007).

Figure 13.3 The function $F^{+}(x, \theta)$ for $\theta = 1$ and $n = 1, 2, 5, 10, 20, 50,$ and 100.
As $n$ gets large, $F^{+}(x, \theta)$ approaches an ideal step function and thus functions as a
Boolean switch.

3.1. Accurate description of cellular growth and division and

prediction of mutant phenotypes

Let us return to the regulatory network controlling the cell cycle in budding

yeast. If the goal of the modeling is to predict detailed quantitative

phenomena, such as cell cycle duration in parent and daughter cells, the

length of the different phases of the cell cycle, or ratios between certain

regulatory proteins, then logical models such as Boolean networks are not

appropriate, and systems of ordinary differential equations with detailed

kinetic parameters must be used. Chen et al. (2004) constructed such a

detailed model of the cell cycle regulatory network containing 36 equations

with 148 constants, in addition to algebraic equations (available in Table 1

in that paper, with Table 2 containing parameter values). The model

incorporates protein concentrations, cell mass, DNA mass, the state of the

emerging bud, and of the mitotic spindle.

After manual fitting of some of the parameters, the dynamics generated

by the model were able to accurately describe the growth and division of

wild-type cells. Remarkably, the model also conformed to the phenotypes

of more than 100 mutant strains, in terms of experimentally observed

properties such as size at bud emergence or at onset of DNA synthesis,

viability, or growth rate, relative to these properties in the wild type.

It should be pointed out that parameter estimation of the model must be

approached with care. First, the objective function, for example mean-

squared error between the model predictions and the experimental data,

may have multiple local optima in the parameter space. Thus, an apparently

good model fit may nonetheless contain unrealistic sets of parameters that

will ultimately fail to generalize. For example, as was found in Chen et al.

(2004), changing parameters to ‘‘rescue’’ a model with respect to a mutant

(i.e., make it agree with experimental observations) often has unintended

and unanticipated effects on other mutants. Second, model selection

must be carefully considered, since a model that is overly complex, meaning

that it has many degrees of freedom, is likely to “overfit” the data and

thereby sacrifice predictive accuracy. In other words, the model may

appear to predict very well when tested against data on which it was trained,


but when tested against data under new conditions, the model will predict

very poorly. There are powerful tools, such as minimum description length,

and indeed, entire frameworks based on algorithmic information theory and

Bayesian inference, devoted to these fundamental issues (Rissanen, 2007).

4. Probabilistic Boolean Networks

PBNs are probabilistic or stochastic generalizations of Boolean net-

works. Essentially, the deterministic dynamics are replaced by probabilistic

dynamics, which can be framed within the mature and well-established

theory of Markov chains, for which many analytical and numerical tools

have been developed. Recall that Markov chains are stochastic processes

having the property that future states depend only on the present state, and

not on the past states. The transitions from one state to another (possibly

itself) are specified by state transition probabilities. Boolean networks are

special cases of PBNs in which state transition probabilities are either 1 or 0,

depending on whether Eq. (13.1) is satisfied for all $i = 1, \ldots, n$. The probabilistic

nature of this model class affords flexibility and power in terms of

making inferences from data, which necessarily contain uncertainty, as well

as in terms of understanding the dynamical behavior of biological networks,

particularly in relation to their structure.
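To make the special case concrete, the Python sketch below builds the state transition matrix of a deterministic Boolean network and confirms that every row contains a single 1. The two-gene network used here ($x_1' = x_2$, $x_2' = x_1 \wedge x_2$) is hypothetical and chosen only to keep the matrix small.

```python
from itertools import product

def step(state):
    """Hypothetical two-gene network: x1' = x2, x2' = x1 AND x2."""
    x1, x2 = state
    return (x2, x1 & x2)

states = list(product((0, 1), repeat=2))          # (0,0), (0,1), (1,0), (1,1)
index = {s: i for i, s in enumerate(states)}

# P[i][j] is the probability of moving from state i to state j.
# For a deterministic network each row has exactly one entry equal to 1.
P = [[0] * len(states) for _ in states]
for s in states:
    P[index[s]][index[step(s)]] = 1

for s, row in zip(states, P):
    print(s, row)
```

In a PBN the rows would instead hold general probabilities summing to 1, so that a state can have several possible successors.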

Once the state transition probabilities for a Markov chain corresponding

to a PBN are determined, it becomes possible to study the steady-state

(long-run) behavior of the stochastic system. This long-run behavior is

analogous to attractors in Boolean networks or fixed points in systems of

differential equations. Kim et al. (2002) investigated the Markov chain

corresponding to a small network based on microarray data observations

of human melanoma samples. The steady-state behavior (distribution) of the

constructed Markov chain was then compared to the initial observations. If

the Markov chain is ergodic, meaning that it is possible to reach any state

from any other state after an arbitrary number of steps, then the steady-state

probability of a state corresponds to the fraction of the time that the system will spend

in that state.
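The link between steady-state probabilities and time spent in each state can be checked numerically. The sketch below is only an illustration: the 3-state transition matrix `P` is invented (it is not the melanoma network of Kim et al.), the steady-state distribution is approximated by repeated multiplication, and the occupancy fractions come from one simulated trajectory with a fixed seed.

```python
import random

# Hypothetical ergodic 3-state chain; each row sums to 1.
P = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
]

def stationary(P, iters=1000):
    """Approximate the steady-state distribution by iterating pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def occupancy(P, steps, seed=0):
    """Fraction of time a simulated trajectory spends in each state."""
    rng = random.Random(seed)
    state, counts = 0, [0] * len(P)
    for _ in range(steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        counts[state] += 1
    return [c / steps for c in counts]

pi = stationary(P)
freq = occupancy(P, 100_000)
print("steady state:", [round(v, 4) for v in pi])
print("occupancy:   ", [round(v, 4) for v in freq])
```

For this particular matrix the steady-state distribution is (9/14, 2/7, 1/14), and the simulated occupancy fractions approach these values as the trajectory grows longer.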

The remarkable finding was that only a small number of all possible states

had significant steady-state probabilities and most of those states with high

probability were observed in the data. Furthermore, it was found that more

than 85% of those states with high steady-state probability that were not

observed in the data were very close to the observed data in terms of

Hamming distance, which is equal to the number of genes that ‘‘disagree’’

in their binary values. Based on the transition rules inferred from the data,

the model produced localized stability, meaning that the system tended to

flow back to the states with high steady-state probability mass if placed in


their vicinity. Thus, the stochastic dynamics of the Markov chain were able

to mimic biological regulation. It should be noted that Markov chains are

commonly used to model gene expression dynamics using so-called

dynamic Bayesian networks (Murphy and Mian, 1999; Yu et al., 2004;

Zou and Conzen, 2005). Indeed, PBNs and dynamic Bayesian networks are

able to represent the same joint probability distribution over their common

variables (i.e., genes) (Lähdesmäki et al., 2006).
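The Hamming distance used in the comparison with the observed data is simple to compute. The sketch below uses made-up binary states (not the melanoma data) and a hypothetical helper, `distance_to_observed`, that returns how far a high-probability state lies from the nearest observed state.

```python
def hamming(a, b):
    """Number of genes whose binary values differ between two states."""
    if len(a) != len(b):
        raise ValueError("states must have the same number of genes")
    return sum(x != y for x, y in zip(a, b))

def distance_to_observed(state, observed):
    """Smallest Hamming distance from a state to any observed state."""
    return min(hamming(state, o) for o in observed)

observed = [(1, 0, 1, 1, 0), (0, 0, 1, 0, 0)]   # made-up observations
candidate = (1, 1, 1, 1, 0)                     # made-up high-probability state
print(distance_to_observed(candidate, observed))
```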

Except in very restricted circumstances, gene expression data refute the

determinism inherent to the Boolean network model, there typically being

a number of possible successor states to any given state. Consequently, if one

continues to assume the state at time t + 1 is independent of the state values

prior to time t, then, as stated above, the network dynamics are described by

a Markov chain whose state transition matrix reflects the observed stochas-

ticity. In terms of gene regulation, this stochasticity can be interpreted to

mean that several regulator gene sets are associated with each gene and at

any time point one of these ‘‘predictor’’ sets, along with a corresponding

Boolean function, is randomly chosen to provide the value of the gene as a

function of the values within the chosen predictor set. It is this reasoning

that motivated the original definition of a PBN in which the definition of a

Boolean network was adapted in such a way that, for each gene, at each time

point, a Boolean function (and predictor gene set) is randomly chosen to

determine the network transition (Shmulevich et al., 2002a,c).
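This random choice of predictors can be sketched in a few lines. The example below assumes, for illustration only, that the choice is made independently for each gene; the three genes, their candidate functions, and the selection probabilities in `PREDICTORS` are all invented. The helper `successors` enumerates the possible next states of a given state, which shows directly why a state may have several successors.

```python
import random
from itertools import product

# Hypothetical PBN: each gene has candidate predictor functions paired
# with selection probabilities (values invented for illustration).
PREDICTORS = [
    [(lambda x: x[1] & x[2], 0.7), (lambda x: x[1] | x[2], 0.3)],  # gene 1
    [(lambda x: x[0], 1.0)],                                       # gene 2
    [(lambda x: 1 - x[0], 0.6), (lambda x: x[1], 0.4)],            # gene 3
]

def pbn_step(state, rng):
    """Draw one predictor per gene and apply all of them synchronously."""
    new_state = []
    for candidates in PREDICTORS:
        funcs = [f for f, _ in candidates]
        probs = [p for _, p in candidates]
        chosen = rng.choices(funcs, weights=probs)[0]
        new_state.append(chosen(state))
    return tuple(new_state)

def successors(state):
    """Possible next states of `state` with their transition probabilities."""
    out = {}
    for combo in product(*PREDICTORS):
        nxt = tuple(f(state) for f, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        out[nxt] = out.get(nxt, 0.0) + prob
    return out

print(successors((1, 0, 1)))
print(pbn_step((1, 0, 1), random.Random(0)))
```

The dictionary returned by `successors` is one row of the Markov chain's state transition matrix; computing it for every state would give the full matrix.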

Rather than simply randomly assigning Boolean functions at each time

point, one can take the perspective that the data come from distinct sources,

each representing a ‘‘context’’ of the cell. From this perspective, the data

derive from a family of deterministic networks and, in principle, the data

could be separated into separate samples according to the contexts from

which they have been derived. Given the context, the overall network

would function as a Boolean network, its transition matrix reflecting deter-

minism (i.e., each row contains one 1, in the column that corresponds to the

successor state, and the rest are 0s). If defined in this manner, a PBN is a

collection of Boolean networks in which a constituent network governs

gene activity for a random period of time before another randomly chosen

constituent network takes over, possibly in response to some random event,

such as an external stimulus or the action of a (latent) regulator that is outside

the scope of the network. Since the latter is not part of the model, network

switching is random. This model defines a ‘‘context-sensitive’’ PBN (Brun

et al., 2005; Shmulevich et al., 2002c). The probabilistic nature of the

constituent choice reflects the fact that the system is open, not closed,

the idea being that changes between the constituent networks result from

the genes responding to latent variables external to the model network.

We now formally define PBNs. Although we retain the terminology

‘‘Boolean’’ in the definition, this does not refer to the binary quantization

assumed in standard Boolean networks, but rather to the logical character of


the gene predictor functions. In the case of PBNs, quantization is assumed to be finite, but not necessarily binary. However, we restrict ourselves to the binary domain here for simplicity. Formally, a PBN consists of a sequence $V = \{x_i\}_{i=1}^{n}$ of $n$ nodes, where $x_i \in \{0, 1\}$, and a sequence $\{f_l\}_{l=1}^{m}$ of vector-valued functions, defining constituent networks. In the framework of gene regulation, each element $x_i$ represents the expression value of a gene. Each vector-valued function $f_l = (f_l^{(1)}, f_l^{(2)}, \ldots, f_l^{(n)})$ determines a constituent network, or context, of the PBN. The function $f_l^{(i)}: \{0,1\}^n \to \{0,1\}$ is a predictor of gene $i$, whenever network $l$ is selected. At each updating epoch, a decision is made whether to switch the constituent network. This decision depends on a binary random variable $\xi$: if $\xi = 0$, then the current context is maintained; if $\xi = 1$, then a constituent network is randomly selected from among all constituent networks according to the selection probability distribution

$$\{c_l\}_{l=1}^{m}, \qquad \sum_{l=1}^{m} c_l = 1. \tag{13.10}$$

The switching probability $q = P(\xi = 1)$ is a system parameter. If the current network is maintained, then the PBN behaves like a fixed network and synchronously updates the values of all the genes according to the current context. Note that, even if $\xi = 1$, a different constituent network is not necessarily selected because the “new” network is selected from among all contexts. In other words, the decision to switch is not equivalent to the decision to change the current network. If a switch is called for ($\xi = 1$), then, after selecting the predictor function $f_l$, the values of genes are updated accordingly; that is, according to the network determined by $f_l$. If $q < 1$, the PBN is said to be context-sensitive; if $q = 1$, the PBN is said to be instantaneously random, which corresponds to the original definition in Shmulevich et al. (2002a).
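The switching mechanism just defined can be sketched as a short simulation. Everything specific in the example is an assumption made for illustration: the two constituent networks `f1` and `f2`, the selection probabilities in `C`, the switching probability `Q`, and the choice to draw the initial context from the same selection distribution.

```python
import random

# Two hypothetical constituent networks (contexts) on 3 genes; each maps
# the current state to the next state. Rules are invented for illustration.
def f1(x):
    return (x[1] & x[2], x[0], 1 - x[0])

def f2(x):
    return (x[1] | x[2], x[0], x[1])

NETWORKS = [f1, f2]
C = [0.6, 0.4]   # selection probabilities c_l; they sum to 1
Q = 0.1          # switching probability q

def simulate(x0, steps, q=Q, seed=0):
    """Return a list of (context index, state) pairs for `steps` epochs."""
    rng = random.Random(seed)
    # Assumption: the initial context is drawn from the selection distribution.
    context = rng.choices(range(len(NETWORKS)), weights=C)[0]
    x, trajectory = x0, []
    for _ in range(steps):
        if rng.random() < q:
            # A switch is called for: redraw from ALL contexts, so the
            # same network may be selected again.
            context = rng.choices(range(len(NETWORKS)), weights=C)[0]
        x = NETWORKS[context](x)   # synchronous update under this context
        trajectory.append((context, x))
    return trajectory

for context, state in simulate((1, 0, 1), 10):
    print(context, state)
```

Calling `simulate` with `q=1.0` redraws the context at every epoch, which corresponds to the instantaneously random case, whereas `q=0.0` reduces the model to a single fixed Boolean network.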

Whereas a network switch corresponds to a change in a latent variable

causing a structural change in the functions governing the network, a

random perturbation corresponds to a transient value change that leaves

the network wiring unchanged, as in the case of activation or inactivation

owing to external stimuli such as stress conditions, small molecule inhibitors, etc. In a PBN with perturbation, there is a small probability $p$ that a gene may change its value at each epoch. Perturbation is characterized by a random perturbation vector $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n)$, $\gamma_i \in \{0, 1\}$, with $P(\gamma_i = 1) = p$, the perturbation probability; that is, each $\gamma_i$ is a Bernoulli($p$) random variable. If $x(t)$ is the current state of the network, and $\gamma(t + 1) = 0$, then the next state of the network is given by $x(t + 1) = f_l(x(t))$, as in Eq. (13.1); otherwise, $x(t + 1) = x(t) \oplus \gamma(t + 1)$, where $\oplus$ is componentwise exclusive OR. The probability of no perturbation, in which
