All content in this area was uploaded by Siyuan Wu on Mar 31, 2023
Mathematical modelling and inference of
genetic regulation for
cell fate determination in hematopoiesis
A thesis submitted in partial fulfilment of
the requirements for the degree of
Doctor of Philosophy
by
Siyuan Wu
Supervisor: A/Prof. Tianhai Tian
Associate Supervisor: Dr. Tiangang Cui
School of Mathematics
Monash University
Australia
February 2022
Copyright notice
© Siyuan Wu (2022)
I certify that I have made all reasonable efforts to secure copyright permissions for
third-party content included in this thesis and have not knowingly added copyright content
to my work without the owner’s permission.
Dedicated to my family
especially
my mother Rong He and father Yong Wu
and
my wife Wei Zhang
Abstract
This thesis focuses on the development of mathematical methods to explore and analyze the dynamical mechanisms of genetic regulation related to cell fate determination in hematopoiesis. Although substantial research has already been conducted, the detailed regulatory mechanisms remain uncertain. The mathematical modelling and inference of genetic regulatory networks are therefore of particular importance. The objective of this thesis is to design two mathematical frameworks. The first framework is developed to understand the nonlinear dynamics of gene expression. The second framework is designed to achieve the multistability property of a system by embedding two bistable systems. This thesis consists of three parts.
Firstly, we focus only on the genetic regulatory networks of protein monomers. We
use a Forward Search Algorithm and ordinary differential equations to analyze the genetic
regulation of the network and the dynamical properties. By using genetic regulation in
hematopoiesis as a testing system, we provide an effective framework for studying regulatory
mechanisms.
Secondly, we extend our methods to analyze the regulatory roles of protein heterodimers and/or synergistic effects in the determination of cell fate during hematopoiesis. We propose a new algorithm, the Extended Forward Search Algorithm, to infer the structure of networks with both linear terms (namely protein monomers) and nonlinear terms (namely protein heterodimers and/or synergistic effects). The Taylor expansion method is used to simplify the nonlinear ordinary differential equations that describe the dynamical properties of genetic regulatory networks. The proposed approach gives accurate simulation results for the published data on hematopoiesis.
Finally, in the third part, we find that hematopoiesis can be treated as a system with multiple stable states. To better understand the problem of cell fate determination in hematopoiesis, we develop a novel framework to obtain a multistable system by embedding systems with bistable characteristics. To ensure the multistability of the embedding system, we derive the conditions for the existence of all possible equilibrium states of the bistable systems and the embedding system, and the conditions for the stability of each equilibrium state. Using the GATA1-GATA2-PU.1 module as a testing system, our method with stochastic simulation successfully reproduces recent experimental results.
In summary, research results in this thesis demonstrate that the proposed modelling
and inference methods are powerful tools to study genetic regulatory networks and other
complex systems.
Keywords: mathematical modelling, stochastic modelling, differential equation, genetic regulatory networks, hematopoiesis, cell fate determination, multistability, network inference
Declaration
This thesis is an original work of my research and contains no material which has been
accepted for the award of any other degree or diploma at any university or equivalent in-
stitution and that, to the best of my knowledge and belief, this thesis contains no material
previously published or written by another person, except where due reference is made in
the text of the thesis.
Name: Siyuan Wu
Date: February 25, 2022
List of publications
• Wu S., Cui T. and Tian T. 2018. Mathematical modelling of genetic network for regulating the fate determination of hematopoietic stem cells. Proceedings of 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2018), 2167-2173, IEEE Press.
• Wu S., Cui T., Zhang X. and Tian T. 2020. A non-linear reverse-engineering method for inferring genetic regulatory networks. PeerJ 8:e9065.
• Wu S., Zhou T. and Tian T. 2022. A robust method for designing multistable systems by embedding bistable subsystems. npj Systems Biology and Applications (Accepted).
“Wir müssen wissen. Wir werden wissen.” (We must know. We will know.)
— David Hilbert
Acknowledgements
Throughout the writing of this thesis I have received a great deal of support and assistance.
I would like to express my innermost gratitude to my supervisors, Associate Professor Tianhai Tian and Dr. Tiangang Cui, whose expertise was invaluable in formulating the research questions and methodology. Thank you also for your constant guidance and patient support throughout my PhD. Without your inspiration and encouragement, this thesis could not have been accomplished. Your insightful feedback pushed me to sharpen my thinking and brought my work to a higher level.
I would also like to acknowledge the members of my PhD Committee, Associate Professor Jonathan Keith, Associate Professor Tim Garoni and Dr. Mark Flegg. Your kind and timely suggestions and valuable feedback have kept me on the right track.
I also thank my collaborators, Professor Tianshou Zhou from Sun Yat-sen University and Professor Xinan Zhang from Central China Normal University, for your efforts in preparing our manuscripts.
I would also like to thank the graduate research coordinator, Associate Professor Heiko Dietrich, and the school postgraduate administrator, Mr. John Chan, for your assistance in many aspects during my PhD at Monash University.
I am very grateful for the friendship and support of my friends. Many thanks go to my brilliant colleagues at the School of Mathematics, especially my good friend Ziwen Zhong. In addition, it is a pleasure to thank my wonderful friends from the WeChat group 2021 Get Rich for the wonderful times we shared, especially the countless board-game nights and dinners. I also want to thank my close friends Peichun Yang and Ke Zhang for always having faith in me and telling me “I know you can do it”. I feel fortunate to have met you in Melbourne; you made my time in Australia memorable. Moreover, I would like to express my special thanks to my best friend, Shiyi Wang. Thank you for standing by me during the most difficult times, and for the many unforgettable chats and discussions about games, research, career and life. Your friendship means more to me than you can ever know.
Last but not least, my family deserves endless gratitude: my mother, Rong He, for teaching me how to fear and love life, and my father, Yong Wu, for teaching me how to be strong when faced with challenges. I leave the last word to my wife, Wei Zhang: thank you for your everlasting love, including but not limited to the constant encouragement, spiritual companionship and fabulous food. Your twelve years of companionship have been an essential foundation for achieving my dream.
A big thank you to each and every one of you!
Contents
Copyright notice
Abstract
Acknowledgements
1 Introduction
1.1 Thesis outline
2 Background
2.1 Introduction to genetic regulation
2.2 Introduction to hematopoiesis
2.3 Mathematical models for genetic regulation
2.3.1 Ordinary differential equation models
2.3.1.1 Linear model
2.3.1.2 Michaelis-Menten formalism
2.3.1.3 Hill equation
2.3.1.4 Shea-Ackers formalism
2.3.2 Stochastic differential equation models
2.4 Inference methods for genetic regulation
2.4.1 Static model
2.4.2 Dynamic model
2.5 Numerical methods for parameter estimation
2.5.1 Genetic algorithm
2.5.2 Approximate Bayesian computation
2.5.3 Robustness analysis
2.6 Numerical methods for simulation of mathematical models
2.6.1 Numerical simulation of ODE models
2.6.1.1 Euler’s method
2.6.1.2 Runge-Kutta methods
2.6.2 Numerical simulation of SDE models
2.6.2.1 Euler-Maruyama method
2.6.2.2 Milstein method
2.7 Review of mathematical modelling in hematopoiesis
3 Forward search algorithm for inferring genetic regulatory networks
3.1 Experimental data
3.1.1 Database background
3.1.2 Selection of candidate genes
3.2 Methods
3.2.1 Top-down approach: static model
3.2.2 Bottom-up approach: dynamic model
3.2.3 Parameter inference
3.2.4 Robustness analysis
3.3 Results
3.3.1 Inference of regulatory network
3.3.2 Inference of dynamic model
3.3.3 Reduction of network model - edge deletion
3.4 Summary
4 Extended forward search algorithm for inferring genetic regulatory networks
4.1 Methods
4.1.1 Top-down approach: static model
4.1.2 Bottom-up approach: dynamic model
4.1.3 Robustness analysis
4.2 Results
4.2.1 Inference of regulatory network
4.2.2 Inference of dynamic model
4.2.3 Reduction of network model - edge deletion
4.3 Summary
5 A robust method for designing multistable systems by embedding bistable subsystems
5.1 Principle of embeddedness
5.1.1 Embedding method for designing multistable models
5.1.2 Effectiveness of embedding method
5.2 Model development for embedding method
5.2.1 Model development with bistability properties
5.2.2 Perturbation analysis of bistable models
5.2.3 Model development for tristability properties
5.3 Application in hematopoiesis
5.3.1 Bistable models for GATA1-PU.1 and GATA-switching modules
5.3.2 Tristable model of the GATA1-GATA2-PU.1 network
5.3.3 Stochastic model for realizing heterogeneity
5.4 Summary
6 Conclusions and open questions
6.1 Conclusion
6.2 Limitations of study and open questions
Bibliography
List of Figures
2.1 Description of the central dogma of molecular biology: Black solid arrows represent general information transfer; the black dashed arrow represents the special information transfer from RNA to DNA via specific reverse transcriptase; cyan dashed arrows indicate that the protein binds to a specified gene site to regulate the expression of DNA and RNA, rather than the information flow stated in the central dogma. Created with BioRender.com.
2.2 Diagram of the developmental process of hematopoietic stem cells. Created with BioRender.com.
2.3 Illustrative diagram of GATA-switching. Expression levels of GATA1 and GATA2 during the GATA-switching process. Created with BioRender.com.
2.4 The network structure of GATA1-GATA2-PU.1. ‘→’ and ‘⊣’ denote activating and inhibiting regulations, respectively.
3.1 The genetic regulatory network of eleven genes, predicted by FSA, that are related to the fate determination of HSCs. Regulatory network for HSCs choosing the megakaryocyte-erythroid lineage. The network is visualized by Cytoscape software.
3.2 The genetic regulatory network of eleven genes, predicted by FSA, that are related to the fate determination of HSCs. Regulatory network for HSCs choosing the granulocyte-macrophage lineage. The network is visualized by Cytoscape software.
3.3 Simulation result of the regulatory network with eleven genes for erythroid differentiation. Red dashed line: microarray data; blue solid line: simulation of the regulatory network.
3.4 Simulation result of the regulatory network with eleven genes for neutrophil differentiation. Red dashed line: microarray data; blue solid line: simulation of the regulatory network.
4.1 Inferred regulatory network for the differentiation of erythrocytes by EFSA. The genetic regulatory network predicted by EFSA with 11 genes and 44 NLTs (11 isolated terms excluded), which is related to the fate determination of the erythrocyte pathway: regulatory network for hematopoietic stem cells differentiating into megakaryocyte-erythroid progenitors. The network is visualized by Cytoscape software.
4.2 Inferred regulatory network for the differentiation of neutrophils by EFSA. The genetic regulatory network predicted by EFSA with 11 genes and 38 NLTs (17 isolated terms excluded), which is related to the fate determination of the neutrophil pathway: regulatory network for hematopoietic stem cells differentiating into granulocyte-macrophage progenitors. The network is visualized by Cytoscape software.
4.3 Simulation results and experimental data of the regulatory network for erythrocyte differentiation. Red solid line: experimental microarray data; blue star dashed line: simulation of the regulatory network.
4.4 Simulation results and experimental data of the regulatory network for neutrophil differentiation. Red solid line: experimental microarray data; blue star dashed line: simulation of the regulatory network.
4.5 Predicted genetic regulatory network of the erythrocyte pathway after edge deletion. The genetic regulatory network predicted by the Extended Forward Search Algorithm with 11 genes and 41 non-linear terms (NLTs) (14 isolated NLTs excluded) after the edge deletion test, which is related to the fate determination of the erythrocyte pathway: regulatory network for hematopoietic stem cells differentiating into megakaryocyte-erythroid progenitors. The network is visualized by Cytoscape software.
4.6 Predicted genetic regulatory network of the neutrophil pathway after edge deletion. The genetic regulatory network predicted by the Extended Forward Search Algorithm with 11 genes and 38 non-linear terms (NLTs) (17 isolated NLTs excluded) after the edge deletion test, which is related to the fate determination of the neutrophil pathway: regulatory network for hematopoietic stem cells differentiating into granulocyte-macrophage progenitors. The network is visualized by Cytoscape software.
5.1 Methodology for developing multistable models by embedding two sub-systems with bistability.
(A) Brief flowchart of the hematopoietic hierarchy, created with BioRender.com. HSCs, hematopoietic stem cells; MPPs, multipotent progenitors; MEPs, megakaryocyte-erythroid progenitors; GMPs, granulocyte-macrophage progenitors.
(B) The principle of embeddedness: the Z-U module is the first bistable sub-system. Once this module crosses the saddle point from state Z to state U, it enters the X-Y sub-system, which has two stable steady states X and Y, and reaches either state X or state Y via the imaginary state U.
(C,D) The structure of two double-negative feedback loops with positive autoregulation, which are the mechanisms for the bistable sub-systems in HSCs.
(E) The structure of the regulatory network after embedding. The X-Y sub-system is embedded into state U. (‘→’ and ‘⊣’ denote activating and inhibiting regulations, respectively.)
5.2 Realization of tristability by embedding two bistable sub-systems.
(A) The phase plane of the toggle switch sub-system (5.6) with bistability (a and b: stable steady states; c: saddle state).
(B) The 3D phase portrait of the embedded system (5.8) with tristability (three red points: stable steady states; two black points: saddle states).
5.3 Realization of tristability by embedding two bistable sub-systems in hematopoiesis.
(A) Phase plane of the GATA1-PU.1 module showing the bistable property of the proposed model, where a and b are stable steady states; c, d and e are saddle states.
(B) Simulations of GATA-switching of model (5.93). Upper panel: an unsuccessful switch with a small value of $k_0^*$, because the displacement of GATA2 is not enough for cells to leave the HSC state (Z state); lower panel: a successful switch with sufficient displacement of GATA2, using a large value of $k_0^*$. Cells leave the HSC state and enter the U state.
(C) The 3D phase portrait of the modified embedding model (5.94) with $k^* = 0$. Four red points are stable steady states, while the three black points are saddle states.
5.4 The 3D phase portrait of the embedded system. Based on the experimental data, the proposed model successfully realizes the tristability property, with the same parameter values presented in Table 5.3 and Table 5.4. Red points: stable steady states; black points: saddle states.
5.5 The network structure of GATA1-GATA2-PU.1. ‘→’ and ‘⊣’ denote activating and inhibiting regulations, respectively.
5.6 Stochastic simulations showing four stable states that correspond to the four experimentally observed states.
(A) Simulation of unsuccessful GATA switching that makes the cell stay in the HSC state, which is the G2H state.
(B) Simulation of unsuccessful GATA switching in which the cell enters the state with low expression of all three genes, which is the LES CMP state.
(C) Simulation of successful switching that leads to the GMP state with high expression levels of PU.1, which is the P1H state.
(D) Simulation of successful switching that leads to the MEP state with high expression levels of GATA1, which is the G1H state.
5.7 Distributions of different cell types derived from stochastic simulations.
(A) Frequencies of cells with successful switching for each set of parameters ($k_0^*$, $\psi$).
(B) Ratios of GMP cells to MEP cells among the cells with successful switching in (A) for each set of parameters ($k_0^*$, $\psi$).
(C) Parameter sets ($k_0^*$, $\psi$) that generate stochastic simulations with four steady states as shown in Figure 5.6 (yellow part) or with two or three states (blue part).
(D) Violin plots of the natural-log-normalised (expression level per cell + 1) distributions for three genes in different cell states derived from stochastic simulations with parameters $k_0^* = 0.52$ and $\psi = 0.0005$.
List of Tables
2.1 Example of Shea-Ackers formalism
3.1 Information of the 30 candidate genes for differentiation of hematopoietic stem cells. The 30 genes in “This chapter” are the combination of the genes in two published studies.
3.2 Literature information for the 11 genes selected in this study. These genes are selected from Table 3.1 based on their relationship with the three genes GATA1, GATA2 and PU.1.
3.3 Edge deletion test for erythroid differentiation. OES represents the network without any deletion. (RA: robustness property in the mean; RSTD: robustness property in standard deviation.)
3.4 Edge deletion test for erythroid differentiation. OES represents the network without any deletion. (RA: robustness property in the mean; RSTD: robustness property in standard deviation.)
3.5 Edge deletion test for neutrophil differentiation. OES represents the network without any deletion. (RA: robustness property in the mean; RSTD: robustness property in standard deviation.)
4.1 Edge deletion test for erythrocyte differentiation. RR: removed regulation; SE: simulation error, defined by (4.9); RA: robust average, defined by (2.52); RSTD: robust standard deviation, defined by (2.53).
4.2 Edge deletion test for neutrophil differentiation. RR: removed regulation; SE: simulation error, defined by (4.9); RA: robust average, defined by (2.52); RSTD: robust standard deviation, defined by (2.53).
5.1 Three types of the bistable model whose stable steady states are located at different positions. Type 1: both stable states are on the axes; Types 2 and 3: one of the stable states is on an axis but the other is off the axis.
5.2 Perturbation analysis with strength $\varepsilon = 1.8$. Type 1 is the Type 1 case in Table 5.1. Perturbed cases 1 and 2 are obtained from Type 1 by perturbing the model parameters. In these two cases, one of the stable states is on an axis but the other is off the axis.
5.3 Estimated model parameter values for module X-Y.
5.4 Estimated model parameter values for module Z-U.
5.5 Estimated additional model parameter values for the modified model.
5.6 Distances between four stable states and three saddle points shown in the phase portrait of Figure 5.3C - Related to Results.
5.7 The expression variations in stochastic simulations around the four stable states of the corresponding deterministic model - Related to Results. The deterministic solutions (GATA1, PU.1, GATA2) for the G1H, P1H, G2H and LE3G states are (51.7224, 2.9587, 0.0459), (0.2486, 91.5298, 0.0216), (0.0288, 0.0038, 41.8227) and (2.3364, 0.7414, 8.6664), respectively (also shown in Figure 5.3C). The minimal/maximal expression levels of each gene are obtained from 20000 stochastic simulations for each state.
1
Introduction
Biological systems are highly ordered and complex, with the cell serving as the basic unit. Cellular activities are controlled by complex regulatory mechanisms. Researchers frequently describe activity within a cell in terms of data (e.g., gene expression data), chemical reaction equations, or networks (e.g., gene regulatory networks/biochemical reaction networks). Regulatory networks provide a comprehensive examination of the molecular mechanisms underlying biological activities [116]. With advances in omics technologies, large amounts of gene expression data, including time-series gene expression data, can be obtained by high-throughput technologies under different experimental conditions [8,115]. However, how to harness such experimental data to uncover the mechanisms underlying biological systems remains a challenge, especially for the mechanism of cell fate determination. Recently, a large number
of network inference algorithms have been designed to deduce the regulation of molecular components inside biological systems [83,101,115], and a number of mathematical models have been proposed to describe the dynamical properties of regulatory networks [28,78,83,103,146,152,158,159]. Chapter 2 introduces the necessary biological background and describes some classical computational models and methods for studying genetic regulatory networks. One of the key challenges in most inference methods is the large number of unknown parameters compared with the relatively small amount of data. Chapters 3 and 4 provide a general approach to address these issues in predicting the dynamical mechanism of genetic regulatory networks in hematopoiesis.
Another issue is that the majority of current mathematical models consider only the function of each gene as a monomer (represented by $x_i$) or a homodimer (represented by $x_i^2$); the function of heterodimers (represented by $x_i x_j$) has not been considered. Investigations of the effect of possible protein heterodimers and/or synergistic effects in genetic regulation are lacking. The main reason is that the number of unknown parameters in such a model is of order $n^3$ for a network with $n$ genes. It would be difficult to estimate such a large number of model parameters, and a large amount of experimental data would be needed to determine them. More importantly, it is very challenging to infer the model parameters owing to the complexity of the parameter space and the high computational cost. Because of the large number of parameters, a linear ODE model would have the fewest unknown parameters among the models covering all possible relationships between genes and protein heterodimers and/or synergistic effects. However, as a linear model is limited to describing linear relationships, it is not appropriate for studying systems with complex nonlinear dynamics. Chapter 4 addresses these issues by proposing an inference method for reconstructing genetic regulatory networks in hematopoiesis with genes and protein heterodimers and/or synergistic effects.
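To make the growth in model size concrete, the Python sketch below counts the unknown regulatory coefficients under an assumed model form in which each gene's equation is a weighted sum of candidate terms, with one unknown weight per term. The function name `count_regulatory_params` and this model form are illustrative assumptions, not the models developed in this thesis; basal production and degradation rates are ignored. The network sizes 11 and 30 echo the gene counts used in Chapter 3.

```python
from math import comb


def count_regulatory_params(n: int, homodimers: bool = False,
                            heterodimers: bool = False) -> int:
    """Count regulatory coefficients in a hypothetical n-gene ODE model.

    Assumed (illustrative) form: the rate of change of each gene x_i is a
    weighted sum of candidate regulatory terms, one unknown weight per term.
    Basal production and degradation rates are not counted.
    """
    terms_per_gene = n                # monomer terms x_j
    if homodimers:
        terms_per_gene += n           # homodimer terms x_j**2
    if heterodimers:
        terms_per_gene += comb(n, 2)  # heterodimer terms x_j * x_k with j < k
    return n * terms_per_gene         # one equation per gene


for n in (3, 11, 30):
    linear = count_regulatory_params(n)
    with_homo = count_regulatory_params(n, homodimers=True)
    with_hetero = count_regulatory_params(n, homodimers=True, heterodimers=True)
    print(n, linear, with_homo, with_hetero)
# 3 9 18 27
# 11 121 242 847
# 30 900 1800 14850
```

Under this assumed form, the linear model grows as $n^2$, while adding the $n(n-1)/2$ heterodimer terms per equation makes the total grow roughly as $n^3/2$, consistent with the order-$n^3$ estimate above.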
Multistability is the characteristic of a system that exhibits two or more mutually exclusive stable states. This phenomenon has been observed in many different disciplines of
science, including genetic regulatory networks [37,75,106,117], cell signalling pathways
[4,48,132], metabolic networks [24], ecosystems [9,79], neuroscience [58], laser systems
[44,68], and quantum systems [67]. When external and/or internal conditions change,
the system may switch from one stable state to another either randomly by perturbations
or in a desired way according to control strategies. In recent years mathematical models
with multistability have been developed for theoretical analysis and computer simulations,
which shed light on the mechanisms that generate multistability and control the transition
between stable states [7,38,63,108].
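The classic genetic toggle switch, in which two genes repress each other, is a standard minimal example of bistability, the simplest case of multistability. The Python sketch below is illustrative only: the symmetric model, the parameter values `alpha = 10` and Hill coefficient 2, and the function names `toggle_rhs` and `simulate` are assumptions chosen for this example, not the models developed in this thesis. It integrates the model with the forward Euler method from two initial conditions and shows that they settle into different stable states.

```python
def toggle_rhs(u: float, v: float, alpha: float = 10.0, hill: int = 2):
    """Right-hand side of a symmetric two-gene mutual-repression (toggle switch) model."""
    du = alpha / (1.0 + v**hill) - u  # u is produced unless repressed by v, and decays linearly
    dv = alpha / (1.0 + u**hill) - v  # v is produced unless repressed by u, and decays linearly
    return du, dv


def simulate(u0: float, v0: float, dt: float = 0.01, t_end: float = 50.0):
    """Integrate the toggle switch with the forward Euler method and return the final state."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        du, dv = toggle_rhs(u, v)
        u, v = u + dt * du, v + dt * dv
    return u, v


# Two initial conditions on opposite sides of the separatrix u = v
for u0, v0 in [(2.0, 0.5), (0.5, 2.0)]:
    u, v = simulate(u0, v0)
    print(f"start ({u0}, {v0}) -> steady state ({u:.2f}, {v:.2f})")
```

Because the model is symmetric, the diagonal u = v separates the two basins of attraction: the first run settles near (9.9, 0.1), where gene u dominates, and the second near (0.1, 9.9). A random perturbation large enough to push the state across the diagonal would switch the system from one stable state to the other.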
Mathematical modelling is a powerful tool for exploring the regulatory mechanisms of multistability that control the transitions between different cell types [3,29,103]. A number of mathematical models have been proposed to describe the underlying mechanisms of multistability in biological systems [13,21,52,95,119,136]. Although these attempts have realized multistability under different assumptions, it remains a challenge to develop mathematical models that realize tristability using both realistic regulatory mechanisms and experimental data. On the other hand, substantial research has been conducted to develop mathematical models that realize bistability [37,39,71,84,107,118,135]. The question is therefore whether we can develop mathematical models with tristability, or higher-order multistability, by using bistable models. Chapter 5 addresses this problem by proposing a novel and robust method to develop multistable models by embedding bistable models, and uses this methodology to analyze the problem of cell fate determination in hematopoiesis.
1.1 Thesis outline
Motivated by these issues, the principal aim of this thesis is to use mathematical and computational methods to analyze the underlying mechanisms of genetic regulatory networks involved in cell fate determination in hematopoiesis. A brief description of each chapter of this thesis is presented below.
• Chapter 2: This chapter provides the biological and mathematical background needed for this thesis.
• Chapter 3: This chapter presents a general approach for predicting the dynamical mechanism of genetic regulatory networks in hematopoiesis by combining a top-down approach (for reducing the complexity of the network structure) and a bottom-up approach (for deriving the detailed dynamic properties) [153].
• Chapter 4: This chapter presents a general approach with a new inference algorithm, the Extended Forward Search Algorithm, and a new mathematical model to infer genetic regulatory networks of genes and protein heterodimers and/or synergistic effects [154].
• Chapter 5: This chapter provides a new framework for achieving multistability from bistable systems. The methodology is also used to discuss cell fate determination in hematopoiesis [155].
• Chapter 6: This chapter summarizes the studies in this thesis and outlines some interesting topics for future research.
2
Background
The objective of this chapter is to provide the biological and mathematical background knowledge needed for the subsequent chapters.
2.1 Introduction to genetic regulation
In 1953, with the introduction of the double helix structure of DNA by James D. Watson and Francis Crick, genes were further recognized as DNA segments that contain genetic information [150]. It has been shown that there are one or two DNA molecules per chromosome, each carrying multiple genes, and each gene contains a large number of deoxyribonucleotides. Deoxyribonucleotides are therefore the material basis of DNA. Deoxyribonucleotides contain four bases (adenine, guanine, cytosine and thymine) that determine biodiversity: different genes contain different genetic information because of their different base sequences. The genetic information stored in DNA represents the genotype. The most fundamental level at which the genotype gives rise to an individual’s phenotype is gene expression. In genetics, gene expression is the process by which information from a gene is used to synthesize a functional gene product, such as a protein or non-coding RNA, which in turn alters the phenotype [80]. Although every cell in an organism contains all the genetic information, gene expression varies across cell types. The specificity of gene expression supports the basic construction and performance of life. Gene expression is regulated by genetic regulatory networks at several stages of protein synthesis, including DNA transcription, RNA processing, RNA translation
and post-translational modification of a protein [28]. Genetic regulatory networks are collections of molecular regulators that interact with each other and with other substances in
the cell to control the timing, location and levels of gene expression of mRNAs and proteins
within the cell, which in turn determines the cell’s structure and function [148]. Conse-
quently, uncovering the mechanisms of gene expression and regulation is essential to the
understanding of basic intracellular processes and is also critical for understanding the onset
and development of various diseases and their treatment processes [53,72,87,89,121]. Al-
though the process of gene expression is complex, its underlying principles can be described
by the central dogma of molecular biology.
The central dogma of molecular biology is one of the most significant milestones in biology. It describes the flow of genetic information within a biological system. The original dogma was first stated by Francis Crick in 1958 [26]. The dogma states that the genetic information contained in a protein-coding gene is expressed through transcription and translation. RNA polymerases use one of the two DNA strands as a template to form a complementary RNA strand, which carries the genetic information contained in the DNA to the ribosome in the form of mRNA, where it directs the synthesis of proteins [80]. That is, genetic information within cells flows along the path DNA → RNA → Protein. In 1970, Howard Temin and David Baltimore discovered, in two RNA tumour viruses (Rous sarcoma virus and murine leukemia virus), a reverse transcriptase that catalyzes the synthesis of DNA using RNA as a template [6,130]. The discovery of this reverse transcriptase revealed that genetic information can flow not only from DNA to RNA but also from RNA to DNA, which further developed and refined the central dogma of molecular biology. The updated central dogma, including the path RNA → DNA, was restated by Francis Crick in the same year [25]. Genetic information flow is therefore a two-way transfer between DNA and RNA, whereas information is transmitted only one way, from nucleic acids to proteins, as shown in Figure 2.1.
By studying gene regulatory systems in depth, researchers have uncovered a number of complex molecular regulatory mechanisms that are grounded in the central dogma. For example, in eukaryotic cells, a sequence of DNA is transcribed into a pre-mRNA by RNA polymerases with the aid of transcription factors (TFs). The pre-mRNA must be processed into mRNA by the following steps:
1. Adding a 5’ cap to the beginning of the pre-mRNA.
2. Adding a 3’ poly-A tail to the end of the pre-mRNA.
3. Removing the introns (non-coding sequences) from the pre-mRNA.
4. Joining the exons (protein-coding sequences) together.
Figure 2.1: Description of the central dogma of molecular biology. Black solid arrows represent general information transfer; the black dashed arrow represents the special information transfer from RNA to DNA by a specific reverse transcriptase; cyan dashed arrows indicate that proteins bind to specified gene sites to regulate the expression of DNA and RNA, rather than an information flow stated in the central dogma. Created with BioRender.com.
Then, the mRNA is exported from the nucleus to the cytoplasm for translation. When translation is initiated, ribosomes move along the mRNA from the 5’ end to the 3’ end. At the same time, amino acids are transported to the ribosomes by transfer RNAs and paired with mRNA codons to synthesize the corresponding polypeptide. After being modified, the polypeptide folds into its characteristic functional 3D structure to become a mature protein. The detailed mechanisms of these chemical reactions are beyond the scope of this thesis and will not be discussed further.
2.2 Introduction to hematopoiesis
Hematopoiesis is the highly complex process of blood cell production that controls the proliferation, differentiation and maturation of hematopoietic stem cells (HSCs) [97]. In adult mammals, all blood cell types arise from HSCs that reside mainly in the bone marrow [12,113]. HSCs are capable of self-renewal and are multipotent, with the ability to differentiate into multipotent progenitors (MPPs). MPPs then differentiate into two main lineages of blood cells, namely the myeloid lineage, which starts at common myeloid progenitors (CMPs), and the lymphoid lineage, which starts at common lymphoid progenitors (CLPs). In addition, the myeloid lineage has two distinct progenitors, namely megakaryocyte-erythroid progenitors (MEPs) and granulocyte-macrophage progenitors (GMPs). MEPs can differentiate into megakaryocytes and erythrocytes, and GMPs can give rise to mast cells, macrophages and granulocytes. Lymphoid lineage cells include T lymphocytes (T-cells), B lymphocytes (B-cells) and natural killer cells (NK-cells) [104]. Figure 2.2 outlines the developmental process from HSCs to the different blood cells.

Figure 2.2: Diagram of the developmental process of hematopoietic stem cells. Created with BioRender.com.
Blood is one of the most regenerative tissues in the organism, and new blood cells are constantly replenished into the organism's metabolic cycle [113]. Owing to this regenerative capacity, the hematopoietic system has been successfully used in regenerative medicine for more than three decades. At present, many malignant blood disorders, such as leukemia, are treated primarily through HSC transplantation. Transplanting healthy HSCs can help patients quickly restore a healthy hematopoietic system by taking advantage of its high regenerative power. To maintain the proper function of the hematopoietic system, an organism must generate appropriate numbers of specific blood cells at the appropriate time and location. Hence, HSCs must continually determine their cell fates correctly, such as when to initiate proliferation, self-renew or differentiate, which lineage to develop into, and when to undergo apoptosis. While each cell has the potential to select different fates, the output of distinct mature cell types is balanced and regulated at the population level. If the normal mechanisms of cell fate determination of HSCs are disrupted, the distribution of the different cell types produced becomes skewed [47], which can lead to a variety of blood disorders, such as leukemia. Thus, it is imperative to unravel the regulatory mechanisms of cell fate determination in HSCs.
At the molecular level, a number of TFs act as key regulators that control the cell fate determination of HSCs and operate within organized regulatory programs that can be modelled as regulatory networks [2,16,47]. For example, Mef2c is abundantly expressed in HSCs and CLPs, whereas its expression declines when CMPs differentiate into GMPs and MEPs [32]. Studies have reported that Mef2c is a direct target gene of Scl/Tal1 and regulates megakaryopoiesis and B-cell homeostasis [43]. In addition, Scl/Tal1 is essential for hematopoiesis and for the maturation stages of both megakaryopoiesis and erythropoiesis [32]. The Runx family is another collection of key transcription factors. Runx1 is primarily involved in the differentiation and self-renewal of HSCs, while Runx2 and Runx3 are important in their maintenance [15]. The GATA family also plays a significant role in the development and differentiation of HSCs. In particular, GATA1/2/3 are required for HSC proliferation and differentiation into erythrocytes and megakaryocytes [143]. Gata3 is abundantly expressed in CMPs, and it is a master regulator of T-cell differentiation while inhibiting differentiation into B-cells [138].
In this thesis, we mainly focus on the fate determination of HSCs in the myeloid lineage, namely the choice between the MEP lineage and the GMP lineage. The genetic complex GATA1-GATA2-PU.1 is a very important module for the cell fate determination of HSCs between erythrocyte and granulocyte differentiation [41,76,77]. GATA2 is mainly expressed in HSCs and MPPs and regulates hematopoiesis. In HSCs, the initial expression level is high for GATA2 but low for GATA1 and PU.1, since GATA2 controls the production and proliferation of HSCs through autoregulation [77]. Erythroid commitment of progenitors decreases the expression of GATA2, while the expression of GATA1 increases with erythroid differentiation. Experimental studies suggested that GATA1 directly represses GATA2 transcription and that GATA2 and GATA1 sequentially bind the same cis-elements (as shown in Figure 2.3). This transition process is referred to as GATA-switching [56,124]. Therefore, GATA-switching is an essential driver of hematopoiesis [14]. Moreover, GATA1 and PU.1 form a double negative feedback module, in which each gene inhibits the expression of the other [41]. HSCs are more likely to choose the MEP lineage when GATA1 is highly expressed, or conversely the GMP lineage when PU.1 is highly expressed. Recently, it has been elucidated that the fate determination of HSCs between the MEP and GMP lineages is defined not only by the ratio of GATA1 to PU.1 [51], but also by third parties in the regulation. For example, FOG-1 is a significant third party that regulates the GATA1-PU.1 module [17,85]. Erythropoietin receptor (EpoR) signalling also plays an essential role in regulating the GATA1-PU.1 module [166]. Although the regulatory mechanisms of the GATA1-GATA2-PU.1 module in hematopoiesis are relatively well studied, the connections of this triad with other significant genes, as well as the roles of these genes in hematopoiesis, remain mostly unclear [76].
Figure 2.3: Illustrative diagram of GATA-Switching. Expression levels of GATA1
and GATA2 during GATA-switching process. Created with BioRender.com.
As mentioned above, the cell fate determination of HSCs is governed by genetic regulatory networks. Although the regulatory mechanisms have been studied for over a century, many challenging questions remain regarding cell fate determination in hematopoiesis [105]. One reason is that the experimental validation of functional relationships between regulator and target genes does not readily scale to a system-wide approach [47]. Therefore, mathematical modelling and inference methods have become widely used to study the mechanisms of genetic regulation. Using mathematical models to investigate regulatory networks provides a novel perspective on gene regulation mechanisms. This is important for researchers to comprehensively analyze the mechanisms of cell fate determination in the hematopoietic system and to better understand the pathogenesis of malignant blood disorders, prevention methods, and the development of new treatments.
2.3 Mathematical models for genetic regulation
Mathematical modelling is an important method for studying detailed regulatory mechanisms. Mathematical models can describe the functioning of biological systems more systematically and quantitatively than the data alone. Since the 20th century, a large number of mathematical models have been used to quantitatively analyze biological systems [28,50,103,120]. These mathematical formalisms are developed based on known genetic regulatory mechanisms [28]. In this section, I give a brief introduction to the most common quantitative approaches for describing the regulatory mechanisms that determine stem cell fates.
2.3.1 Ordinary differential equation models
Ordinary differential equations (ODEs) are the most widely used approach for analyzing genetic regulatory mechanisms. ODEs describe the production rate of an RNA or a protein as a function of the concentrations of other reactants in the transcriptional process, with the general mathematical form [28]

$$
\frac{dX_i}{dt} = f_i(\mathbf{X}), \tag{2.1}
$$

where $\mathbf{X}$ is a vector of reactant concentrations with non-negative real-valued components.
Next, I will introduce some classical ODE models in analysis of regulatory mechanisms.
2.3.1.1 Linear model
Linear modelling is the most direct method for analyzing how the production rate of a protein or RNA is regulated by different reactants [28]:

$$
\frac{dX_i}{dt} = \sum_{j=1}^{n} a_{ij} X_j, \tag{2.2}
$$

where $a_{ij}$ is the regulation parameter from reactant $X_j$ to $X_i$. If $a_{ij}$ is positive (negative), the concentration of reactant $X_j$ up-regulates (down-regulates) the production rate of $X_i$.
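Because (2.2) is linear, it is easy to simulate directly. The following minimal sketch evaluates the right-hand side of (2.2) and integrates it with the forward Euler scheme; the integrator, the step size and the two-gene matrix `A` are illustrative assumptions rather than part of the original model.

```python
from typing import List

Matrix = List[List[float]]


def linear_rhs(a: Matrix, x: List[float]) -> List[float]:
    """Right-hand side of the linear model (2.2): dX_i/dt = sum_j a_ij * X_j."""
    n = len(x)
    return [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]


def euler_linear(a: Matrix, x0: List[float], dt: float, steps: int) -> List[float]:
    """Forward-Euler integration of (2.2) starting from x0."""
    x = list(x0)
    for _ in range(steps):
        dx = linear_rhs(a, x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical two-gene network: X_2 up-regulates X_1 (a_12 > 0),
# X_1 down-regulates X_2 (a_21 < 0), and both decay (a_ii < 0).
A = [[-1.0, 0.5],
     [-0.5, -1.0]]
print(linear_rhs(A, [1.0, 2.0]))  # [0.0, -2.5]
print(euler_linear(A, [1.0, 2.0], dt=0.01, steps=2000))
```

With these negative diagonal entries the system is stable, so the trajectory decays towards zero; a positive self-regulation term large enough to make an eigenvalue of `A` positive would instead produce unbounded growth, which is one limitation of purely linear models.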
2.3.1.2 Michaelis-Menten formalism
In 1913, Leonor Michaelis and Maud Menten proposed a mathematical model to describe the kinetics of an enzymatic reaction mechanism [91]. This formalism is based on two important principles.
Theorem 2.3.1 (Conservation of mass). For any system closed to all transfers of matter and energy, the mass of the system must remain constant over time, because matter can neither be added to nor removed from the system. Therefore, the quantity of mass is conserved over time.

Theorem 2.3.2 (Law of mass action). The rate of a chemical reaction is directly proportional to the product of the activities or concentrations of the reactants.
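Let us first consider the reaction equation for transcriptional activation,

$$
\mathrm{TF} + \mathrm{DNA} \underset{k_b}{\overset{k_f}{\rightleftharpoons}} \text{TF-DNA} \xrightarrow{k_{cat}} \mathrm{P} + \text{TF-DNA}, \tag{2.3}
$$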
which describes the process in which a TF binds to DNA to form the complex TF-DNA. For simplicity, we neglect the mRNA production process. This complex then activates the production of the protein P. Moreover, TF binding is a reversible process, and we assume that the binding process is fast enough to be at equilibrium. In this equation, $k_f$ is the forward reaction rate, $k_b$ is the backward reaction rate and $k_{cat}$ is the catalytic rate. Before we move forward, note that we use the notation $[\cdot]$ to represent the concentration of a reactant. Based on Theorem 2.3.1, we have the following relationship for the concentration of DNA:
$$
[\mathrm{DNA}]_{total} = [\mathrm{DNA}]_{free} + [\mathrm{DNA}]_{combined}, \tag{2.4}
$$

where $[\mathrm{DNA}]_{free}$ and $[\mathrm{DNA}]_{combined}$ are the concentration of unbound DNA and the concentration of the complex TF-DNA, respectively. That is,

$$
[\mathrm{DNA}]_{total} = [\mathrm{DNA}] + [\text{TF-DNA}]. \tag{2.5}
$$
More specifically, for free DNA we have

rate of change of the concentration of free DNA
= rate at which free DNA is released by the backward reaction of TF-DNA
− rate at which free DNA is consumed by the forward reaction to TF-DNA.

Then, based on Theorem 2.3.2, we have the following differential equation for the rate of change of the concentration of free DNA (which plays the role of the enzyme) with respect to time $t$:

$$
\frac{d[\mathrm{DNA}]}{dt} = k_b[\text{TF-DNA}] - k_f[\mathrm{TF}][\mathrm{DNA}]. \tag{2.6}
$$
Applying the same reasoning to the other reactants gives the following three equations. Since the concentration of the complex TF-DNA is the same before and after transcription, TF-DNA is not consumed in the second (right) reaction.

$$
\frac{d[\mathrm{TF}]}{dt} = k_b[\text{TF-DNA}] - k_f[\mathrm{TF}][\mathrm{DNA}], \tag{2.7}
$$

$$
\frac{d[\text{TF-DNA}]}{dt} = k_f[\mathrm{TF}][\mathrm{DNA}] - k_b[\text{TF-DNA}], \tag{2.8}
$$

$$
\frac{d[\mathrm{P}]}{dt} = k_{cat}[\text{TF-DNA}]. \tag{2.9}
$$
Assume that the reversible binding reaction reaches chemical equilibrium, in which both reactants and products are present at concentrations that have no further tendency to change, so that the forward and backward rates balance:

$$
k_f[\mathrm{TF}][\mathrm{DNA}] = k_b[\text{TF-DNA}]. \tag{2.10}
$$
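Substituting (2.5) into (2.10), we have

$$
k_f[\mathrm{TF}]\left([\mathrm{DNA}]_{total} - [\text{TF-DNA}]\right) = k_b[\text{TF-DNA}]. \tag{2.11}
$$

Rearranging this equation gives

$$
\begin{aligned}
[\text{TF-DNA}] &= \frac{k_f[\mathrm{DNA}]_{total}[\mathrm{TF}]}{k_b + k_f[\mathrm{TF}]} && (2.12)\\
&= \frac{[\mathrm{DNA}]_{total}[\mathrm{TF}]}{\frac{k_b}{k_f} + [\mathrm{TF}]} && (2.13)\\
&= \frac{[\mathrm{DNA}]_{total}[\mathrm{TF}]}{K_d + [\mathrm{TF}]}, && (2.14)
\end{aligned}
$$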
where $K_d = k_b/k_f$ is the dissociation constant of the complex TF-DNA. Finally, substituting (2.14) into (2.9) gives the rate at which the product P is produced by the enzymatic reaction:

$$
\begin{aligned}
\frac{d[\mathrm{P}]}{dt} &= k_{cat}\frac{[\mathrm{DNA}]_{total}[\mathrm{TF}]}{K_d + [\mathrm{TF}]} && (2.15)\\
&= V_{max}\frac{[\mathrm{TF}]}{K_d + [\mathrm{TF}]}, && (2.16)
\end{aligned}
$$

where $V_{max} = k_{cat}[\mathrm{DNA}]_{total}$ is the maximum production rate of P. This is an increasing function of the TF concentration that saturates at $V_{max}$.
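The equilibrium step (2.10) to (2.14) can be checked numerically. The sketch below integrates the binding equations (2.6) to (2.8) with forward Euler until they settle, then compares the simulated complex concentration with (2.14). The Euler scheme and the values $k_f = 2$, $k_b = 1$ and the initial concentrations are hypothetical choices for illustration.

```python
def simulate_binding(tf0, dna0, kf, kb, dt=1e-3, steps=20_000):
    """Forward-Euler integration of the binding equations (2.6)-(2.8).

    Starts with no complex; returns free TF, free DNA and TF-DNA complex.
    """
    tf, dna, cplx = tf0, dna0, 0.0
    for _ in range(steps):
        flux = kf * tf * dna - kb * cplx  # net binding flux (law of mass action)
        tf -= dt * flux      # (2.7)
        dna -= dt * flux     # (2.6)
        cplx += dt * flux    # (2.8)
    return tf, dna, cplx


def complex_at_equilibrium(tf, dna_total, kd):
    """Equilibrium complex concentration, equation (2.14)."""
    return dna_total * tf / (kd + tf)


def activation_rate(tf, v_max, kd):
    """Michaelis-Menten activation rate, equation (2.16)."""
    return v_max * tf / (kd + tf)


kf, kb = 2.0, 1.0                        # hypothetical rate constants
tf, dna, cplx = simulate_binding(tf0=1.0, dna0=1.0, kf=kf, kb=kb)
dna_total = dna + cplx                   # conserved, cf. (2.5)
print(round(cplx, 6), round(complex_at_equilibrium(tf, dna_total, kb / kf), 6))  # 0.5 0.5
```

Note that (2.14) uses the free TF concentration at equilibrium, not the initial one; this is why the sketch passes the simulated `tf` rather than `tf0`.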
Next, we consider transcriptional repression, in which the binding of a TF to DNA blocks the production of protein P. In this case, we need to consider the two reactions separately:

$$
\mathrm{TF} + \mathrm{DNA} \underset{k_b}{\overset{k_f}{\rightleftharpoons}} \text{TF-DNA}, \tag{2.17}
$$

$$
\mathrm{DNA} \xrightarrow{k_{cat}} \mathrm{P} + \mathrm{DNA}. \tag{2.18}
$$

Following the same idea as above, based on Theorem 2.3.1 and Theorem 2.3.2, we know that

$$
[\mathrm{DNA}]_{total} = [\mathrm{DNA}] + [\text{TF-DNA}], \tag{2.19}
$$

$$
\frac{d[\mathrm{P}]}{dt} = k_{cat}[\mathrm{DNA}], \tag{2.20}
$$

$$
k_f[\mathrm{TF}][\mathrm{DNA}] = k_b[\text{TF-DNA}]. \tag{2.21}
$$
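Then, substituting (2.19) into (2.21) and rearranging, we have

$$
[\text{TF-DNA}] = \frac{k_f[\mathrm{DNA}]_{total}[\mathrm{TF}]}{k_b + k_f[\mathrm{TF}]}. \tag{2.22}
$$

Next, substituting (2.21), written as $[\text{TF-DNA}] = \frac{k_f}{k_b}[\mathrm{TF}][\mathrm{DNA}]$, into (2.22) gives

$$
\frac{k_f}{k_b}[\mathrm{TF}][\mathrm{DNA}] = \frac{k_f[\mathrm{DNA}]_{total}[\mathrm{TF}]}{k_b + k_f[\mathrm{TF}]} \tag{2.23}
$$

$$
\Longrightarrow [\mathrm{DNA}] = \frac{\frac{k_b}{k_f}[\mathrm{DNA}]_{total}}{\frac{k_b}{k_f} + [\mathrm{TF}]} = \frac{K_d[\mathrm{DNA}]_{total}}{K_d + [\mathrm{TF}]}. \tag{2.24}
$$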
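Finally, substituting (2.24) into (2.20), the rate equation for the production of the protein is given by

$$
\begin{aligned}
\frac{d[\mathrm{P}]}{dt} &= k_{cat}\frac{K_d[\mathrm{DNA}]_{total}}{K_d + [\mathrm{TF}]} && (2.25)\\
&= V_{max}\frac{K_d}{K_d + [\mathrm{TF}]}. && (2.26)
\end{aligned}
$$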
Thus, we have derived the production rate of a protein for transcriptional activation and repression, respectively. However, proteins also degrade with a finite half-life, so a degradation term must be included. Based on these results, we have the following key points:

1. If the production of a protein is up-regulated by a TF, the concentration of the protein is given by

$$
\frac{d[\mathrm{P}]}{dt} = V_{max}\frac{[\mathrm{TF}]}{K_d + [\mathrm{TF}]} - k^*[\mathrm{P}], \tag{2.27}
$$
where $k^*$ is the degradation rate of protein P.

2. If the production of a protein is down-regulated by a TF, the concentration of the protein is given by

$$
\frac{d[\mathrm{P}]}{dt} = V_{max}\frac{K_d}{K_d + [\mathrm{TF}]} - k^*[\mathrm{P}]. \tag{2.28}
$$
3. We can also add logic operators into the Michaelis-Menten formalism with multiple different TFs [103].

• If the production of a protein is down-regulated by transcription factor A AND up-regulated by transcription factor B, we can develop the following model

$$
\frac{d[\mathrm{P}]}{dt} = \frac{V_A K_A}{K_A + [\mathrm{TF}_A]}\cdot\frac{V_B[\mathrm{TF}_B]}{K_B + [\mathrm{TF}_B]} - k^*[\mathrm{P}]. \tag{2.29}
$$

• If the production of a protein is down-regulated by transcription factor A OR up-regulated by transcription factor B, we can develop the following model

$$
\frac{d[\mathrm{P}]}{dt} = \frac{V_A K_A}{K_A + [\mathrm{TF}_A]} + \frac{V_B[\mathrm{TF}_B]}{K_B + [\mathrm{TF}_B]} - k^*[\mathrm{P}]. \tag{2.30}
$$
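The regulation terms in (2.27) to (2.30) translate directly into functions. The sketch below is one possible encoding; the function names and parameter values are hypothetical, and `km_a`, `km_b` stand for $K_A$, $K_B$ (not the backward rate $k_b$). With the TF levels held fixed, the steady state of P is the production term divided by $k^*$.

```python
def mm_activation(tf, v, km):
    """Up-regulation term V [TF] / (K + [TF]), cf. (2.27)."""
    return v * tf / (km + tf)


def mm_repression(tf, v, km):
    """Down-regulation term V K / (K + [TF]), cf. (2.28)."""
    return v * km / (km + tf)


def dp_dt_and(p, tf_a, tf_b, v_a, km_a, v_b, km_b, k_deg):
    """AND logic (2.29): A represses and B activates, combined by a product."""
    return mm_repression(tf_a, v_a, km_a) * mm_activation(tf_b, v_b, km_b) - k_deg * p


def dp_dt_or(p, tf_a, tf_b, v_a, km_a, v_b, km_b, k_deg):
    """OR logic (2.30): A represses or B activates, combined by a sum."""
    return mm_repression(tf_a, v_a, km_a) + mm_activation(tf_b, v_b, km_b) - k_deg * p


# Hypothetical parameters and fixed TF levels.
params = dict(tf_a=0.0, tf_b=1.0, v_a=1.0, km_a=1.0, v_b=2.0, km_b=1.0, k_deg=0.5)
print(dp_dt_and(0.0, **params), dp_dt_or(0.0, **params))  # 1.0 2.0
# Steady state of the AND model: production / k_deg = 1.0 / 0.5 = 2.0.
print(dp_dt_and(2.0, **params))  # 0.0
```

The product in (2.29) means that either factor alone can switch production off, whereas the sum in (2.30) keeps production on as long as one input permits it.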
2.3.1.3 Hill equation
In 1910, Archibald Hill proposed a classical nonlinear equation to describe the sigmoidal oxygen binding curve of haemoglobin [50]. Since then, the Hill equation has been applied to explore the mechanisms of a wide range of genetic regulatory networks and biological systems. For example, the genetic toggle switch was modelled using Hill equations [42]. In addition, the Hill equation has been employed to formalize the mechanisms of cell fate determination [52,69,156], and Hill equations with high cooperativity were used to realize tristability [52]. Recently, the Hill equation was also used to discover a regulatory network of 52 genes with uniform activation and repression strengths [73].
To better understand the concept of the Hill equation, we need the following definition.

Definition 2.3.1 (Cooperative binding). The binding of a ligand to a molecule is often enhanced if other ligands are already present on the same molecule.
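For example, consider the chemical reaction in gene transcription for activation,

$$
\mathrm{DNA} + n\,\mathrm{TF} \underset{k_b}{\overset{k_f}{\rightleftharpoons}} \text{TF}_n\text{-DNA} \xrightarrow{k_{cat}} \mathrm{P} + \text{TF}_n\text{-DNA}, \tag{2.31}
$$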
which describes the process in which $n$ identical TFs bind simultaneously to DNA. Cooperative binding is still a reversible process. In this equation, $n$ is the number of TFs in the chemical reaction. The same idea applies to transcriptional repression with cooperative binding. The derivation of Hill functions is very similar to that of the Michaelis-Menten formalism; the only difference is that we replace $[\mathrm{TF}]$ with $[\mathrm{TF}]^n$ during the algebraic operations. We can therefore summarise the following key points for Hill functions:
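1. If the production of a protein is up-regulated by a TF, the concentration of the protein is given by

$$
\begin{aligned}
\frac{d[\mathrm{P}]}{dt} &= V_{max}\frac{[\mathrm{TF}]^n}{K_d + [\mathrm{TF}]^n} - k^*[\mathrm{P}] && (2.32)\\
&= V_{max}\frac{[\mathrm{TF}]^n}{K^n + [\mathrm{TF}]^n} - k^*[\mathrm{P}], && (2.33)
\end{aligned}
$$

where $n$ is called the Hill coefficient, which measures the degree of cooperativity. Writing $K_d = K^n$, we say that $K$ is the dissociation constant with Hill coefficient $n$; it is also the TF concentration at which the production term reaches half of $V_{max}$.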
2. If the production of a protein is down-regulated by a TF, the concentration of the protein is given by

$$
\frac{d[\mathrm{P}]}{dt} = V_{max}\frac{K^n}{K^n + [\mathrm{TF}]^n} - k^*[\mathrm{P}]. \tag{2.34}
$$
3. We can also add logic operators into the Hill equation with multiple different TFs [103].

• If the production of a protein is down-regulated by transcription factor A AND up-regulated by transcription factor B, we can develop the following model

$$
\frac{d[\mathrm{P}]}{dt} = \frac{V_A K_A^{n_A}}{K_A^{n_A} + [\mathrm{TF}_A]^{n_A}}\cdot\frac{V_B[\mathrm{TF}_B]^{n_B}}{K_B^{n_B} + [\mathrm{TF}_B]^{n_B}} - k^*[\mathrm{P}]. \tag{2.35}
$$

• If the production of a protein is down-regulated by transcription factor A OR up-regulated by transcription factor B, we can develop the following model

$$
\frac{d[\mathrm{P}]}{dt} = \frac{V_A K_A^{n_A}}{K_A^{n_A} + [\mathrm{TF}_A]^{n_A}} + \frac{V_B[\mathrm{TF}_B]^{n_B}}{K_B^{n_B} + [\mathrm{TF}_B]^{n_B}} - k^*[\mathrm{P}]. \tag{2.36}
$$
It is clear that the Michaelis-Menten formalism is a special case of the Hill function with Hill coefficient $n = 1$. That is, the Michaelis-Menten formalism does not consider any cooperativity in chemical reactions.
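The effect of the Hill coefficient is easiest to see by evaluating (2.33) and (2.34) directly. In this minimal sketch (function names and values are illustrative), every curve passes through $V_{max}/2$ at $[\mathrm{TF}] = K$, while a larger $n$ makes the response steeper, and $n = 1$ recovers the Michaelis-Menten term of (2.27).

```python
def hill_activation(tf, v_max, k, n):
    """Activating Hill term V_max [TF]^n / (K^n + [TF]^n), cf. (2.33)."""
    return v_max * tf**n / (k**n + tf**n)


def hill_repression(tf, v_max, k, n):
    """Repressing Hill term V_max K^n / (K^n + [TF]^n), cf. (2.34)."""
    return v_max * k**n / (k**n + tf**n)


# At [TF] = K = 1 the rate is V_max / 2 for every n; at [TF] = 2K it grows with n.
for n in (1, 2, 4):
    print(n, hill_activation(1.0, 1.0, 1.0, n), round(hill_activation(2.0, 1.0, 1.0, n), 4))
# 1 0.5 0.6667
# 2 0.5 0.8
# 4 0.5 0.9412
```

This switch-like behaviour at large $n$ is what makes Hill functions a common choice for modelling bistable modules such as the toggle switch mentioned above.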
2.3.1.4 Shea-Ackers formalism
Another widely used approach for studying the mechanisms of genetic regulatory networks is the Shea-Ackers formalism [120]. It models gene transcription from a thermodynamic point of view. For example, Tian used the Shea-Ackers formalism to realize the mechanisms of GATA-switching and designed an effective algorithm to realize tristability [136]. The structure of this formalism is similar to the Michaelis-Menten formalism. To understand this formalism, we need the following biological terminology.
Definition 2.3.2 (Promoter). In genetics, a promoter is a DNA sequence to which RNA polymerase binds to initiate transcription of a single RNA from the DNA downstream of it.

Definition 2.3.3 (RNA polymerase (RNAP)). RNAP is an enzyme that binds to a promoter for basal transcription.

Definition 2.3.4 (Basal transcription). Basal transcription is defined as the level of transcript produced by RNAP in the absence of regulation by TFs.
Considering both TF and RNAP, there are four possible binding situations: (1) nothing bound, (2) TF bound, (3) RNAP bound and (4) TF and RNAP both bound. In the first two situations transcription is “off”, and in the last two situations transcription is “on”. The Shea-Ackers formalism describes the transcription rate by the ratio of the statistical weight of all “on” states ($Z_{on}$) to the total statistical weight of all “off” and “on” states ($Z_{total} = Z_{off} + Z_{on}$). Consider the following example, in which the production rate of protein P is regulated by a TF and RNAP. The four scenarios are presented in Table 2.1. In the columns “TF binding site” and “RNAP binding site”, 0 means unbound and 1 means bound. In the column “Status”, 0 means transcription is inactive and 1 means transcription is active. Case 1 is the reference state, in which nothing is bound to the DNA; the binding constant for the reference state is always 1. Case 2 corresponds to basal transcription, since only RNAP is bound to the DNA. The rate (statistical weight) of each state follows from the law of mass action (Theorem 2.3.2), and normalizing by the total weight reflects conservation of the total DNA (Theorem 2.3.1). We use the (partial) sums of these rates to obtain the statistical weights of the “off” and “on” states.
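In this case, we have the following ODE model to describe the production rate of protein P:

$$
\frac{d[\mathrm{P}]}{dt} = \frac{Z_{on}}{Z_{off} + Z_{on}} = \frac{\beta_1[\mathrm{RNAP}] + \beta_3[\mathrm{TF}][\mathrm{RNAP}]}{1 + \beta_1[\mathrm{RNAP}] + \beta_2[\mathrm{TF}] + \beta_3[\mathrm{TF}][\mathrm{RNAP}]}. \tag{2.37}
$$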
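|        | TF binding site | RNAP binding site | Status | Binding constant | Rate |
|--------|-----------------|-------------------|--------|------------------|------|
| Case 1 | 0 | 0 | 0 | 1 | 1 |
| Case 2 | 0 | 1 | 1 | $\beta_1$ | $\beta_1[\mathrm{RNAP}]$ |
| Case 3 | 1 | 0 | 0 | $\beta_2$ | $\beta_2[\mathrm{TF}]$ |
| Case 4 | 1 | 1 | 1 | $\beta_3$ | $\beta_3[\mathrm{TF}][\mathrm{RNAP}]$ |

Table 2.1: Example of Shea-Ackers formalism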
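Table 2.1 can be turned into (2.37) mechanically: sum the weights of the active states and divide by the sum of all weights. The following sketch does exactly that; the function name and the numerical values of $\beta_1, \beta_2, \beta_3$ are hypothetical. Expanding (2.37) shows that the TF has no effect when $\beta_3 = \beta_1\beta_2$ and acts as an activator when $\beta_3 > \beta_1\beta_2$, which the example illustrates.

```python
def shea_ackers_rate(tf, rnap, beta1, beta2, beta3):
    """Transcription rate Z_on / Z_total built from Table 2.1, i.e. equation (2.37).

    Each state is keyed by (TF bound, RNAP bound) and maps to (status, weight).
    """
    states = {
        (0, 0): (0, 1.0),                  # Case 1: reference state
        (0, 1): (1, beta1 * rnap),         # Case 2: basal transcription
        (1, 0): (0, beta2 * tf),           # Case 3: TF only, inactive
        (1, 1): (1, beta3 * tf * rnap),    # Case 4: TF and RNAP, active
    }
    z_on = sum(w for status, w in states.values() if status == 1)
    z_total = sum(w for _, w in states.values())
    return z_on / z_total


# Hypothetical constants with beta3 > beta1 * beta2, so the TF activates transcription.
print(shea_ackers_rate(tf=0.0, rnap=1.0, beta1=1.0, beta2=1.0, beta3=4.0))  # 0.5
print(shea_ackers_rate(tf=1.0, rnap=1.0, beta1=1.0, beta2=1.0, beta3=4.0))  # 5/7
```

Keeping the states in a table-like dictionary makes it straightforward to add further binding sites: each new site doubles the number of states, and only the weights and the status flags need to be specified.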
2.3.2 Stochastic differential equation models
There are many different sources of variation in gene expression data. In particular, the chemical reactions in gene transcription are intrinsically stochastic, leading to variation in the production of proteins [35]. However, deterministic models cannot describe the noise in gene transcription. Stochastic models were therefore proposed to investigate the function of noise in regulating cell fate determination. Numerical results also suggested that fluctuations in protein numbers may lead stem cells to different developmental pathways. Developing stochastic models to understand noise in gene expression data therefore has significant implications for understanding regulatory mechanisms. A stochastic differential equation (SDE) model is constructed from an ODE model by adding a noise term. Suppose the general ODE describing the dynamics of gene transcription is given by

$$
\frac{d[\mathrm{P}]}{dt} = f(x) - k^*[\mathrm{P}], \tag{2.38}
$$

where $x$ is a vector of TFs and $f(x)$ represents the regulation by these TFs. The noise term in an SDE is based on the Wiener process.
Definition 2.3.5 (Wiener process). A Wiener process $\{W(t)\}$ is a stochastic process with the following properties:

1. (Independence of increments) $W(t) - W(s)$, for $t > s$, is independent of the past $W(u)$, $0 \le u \le s$.
2. (Normal increments) $W(t) - W(s)$ has a normal distribution with mean $0$ and variance $t - s$.
3. (Continuity of paths) $W(t)$, $t \ge 0$, is a continuous function of $t$.
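Properties 1 and 2 suggest a direct way to sample a Wiener path on a grid: add independent $N(0, \Delta t)$ increments. The sketch below does this with Python's standard `random` module. The convention $W(0) = 0$ (standard, but not stated in the definition above), the grid size and the seed are assumptions for illustration. The empirical mean and variance of $W(1)$ should be close to 0 and 1, as property 2 implies.

```python
import math
import random


def wiener_path(t_end, n_steps, rng):
    """Sample W on a uniform grid of [0, t_end] from independent normal increments.

    Assumes the usual convention W(0) = 0.
    """
    dt = t_end / n_steps
    w = [0.0]
    for _ in range(n_steps):
        # Property 2: W(t + dt) - W(t) ~ N(0, dt); gauss() takes the standard deviation.
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


rng = random.Random(42)
endpoints = [wiener_path(1.0, 50, rng)[-1] for _ in range(5000)]
mean = sum(endpoints) / len(endpoints)
var = sum((w - mean) ** 2 for w in endpoints) / (len(endpoints) - 1)
print(round(mean, 3), round(var, 3))  # close to 0 and 1, respectively
```

Note that the standard deviation of each increment is $\sqrt{\Delta t}$, not $\Delta t$; confusing the two is a common source of wrongly scaled noise in SDE simulations.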
Based on this definition, we have the following SDEs with a noise strength parameter $\mu$ [133]:

• SDE model with additive noise

$$
d[\mathrm{P}] = \left(f(x) - k^*[\mathrm{P}]\right)dt + \mu\, dW(t). \tag{2.39}
$$

This model only represents the stochasticity in gene expression data explicitly, through a noise term that does not depend on the state of the system. However, stochasticity may also occur during the process of protein degradation. Thus, the following two models are proposed.

• SDE model with multiplicative noise in protein degradation

$$
d[\mathrm{P}] = \left(f(x) - k^*[\mathrm{P}]\right)dt + \mu k^*[\mathrm{P}]\, dW(t). \tag{2.40}
$$

• SDE model with Langevin noise in protein degradation

$$
d[\mathrm{P}] = \left(f(x) - k^*[\mathrm{P}]\right)dt + \mu\sqrt{k^*[\mathrm{P}]}\, dW(t). \tag{2.41}
$$
These three SDE models do not take stochasticity in the transcriptional process into account. To address this issue, we also have two SDE models that consider stochasticity in gene transcription, as follows.

• SDE model with multiplicative noise in the process of gene transcription

$$
d[\mathrm{P}] = \left(f(x) - k^*[\mathrm{P}]\right)dt + \mu\hat{f}(x)\, dW(t). \tag{2.42}
$$

• SDE model with Langevin noise in the process of gene transcription

$$
d[\mathrm{P}] = \left(f(x) - k^*[\mathrm{P}]\right)dt + \mu\sqrt{\hat{f}(x)}\, dW(t). \tag{2.43}
$$

Moreover, any one of the degradation noise terms can be combined with any one of the transcriptional noise terms to form an SDE model, such as

• SDE model with Langevin noise in the processes of gene transcription and protein degradation

$$
d[\mathrm{P}] = \left(f(x) - k^*[\mathrm{P}]\right)dt + \mu_1\sqrt{\hat{f}(x)}\, dW_1(t) + \mu_2\sqrt{k^*[\mathrm{P}]}\, dW_2(t), \tag{2.44}
$$

where $W_1(t)$ and $W_2(t)$ are two independent Wiener processes.
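SDEs such as (2.39) to (2.41) are usually simulated with the Euler-Maruyama scheme, which adds $g([\mathrm{P}])\,\Delta W$ to each Euler step. The sketch below is a minimal version under stated assumptions: the TF levels are fixed so that $f(x)$ is a constant `f_x`, the parameter values are hypothetical, and the Langevin term is clipped at zero to avoid the square root of a negative value. With $\mu = 0$ it reduces to forward Euler for (2.38) and converges to $f(x)/k^*$.

```python
import math
import random


def euler_maruyama(f_x, k_deg, p0, mu, t_end, n_steps, noise, rng):
    """Euler-Maruyama scheme for d[P] = (f(x) - k*[P]) dt + g([P]) dW(t).

    f_x is held constant (TF levels assumed fixed); `noise` selects the
    diffusion term g from (2.39), (2.40) or (2.41).
    """
    dt = t_end / n_steps
    p = p0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))           # Wiener increment ~ N(0, dt)
        drift = f_x - k_deg * p
        if noise == "additive":                      # (2.39)
            g = mu
        elif noise == "multiplicative":              # (2.40)
            g = mu * k_deg * p
        elif noise == "langevin":                    # (2.41), clipped at zero
            g = mu * math.sqrt(max(k_deg * p, 0.0))
        else:
            raise ValueError(f"unknown noise type: {noise}")
        p = p + drift * dt + g * dw
    return p


rng = random.Random(0)
# With mu = 0 the scheme is forward Euler for (2.38): [P] -> f(x)/k* = 2.
print(round(euler_maruyama(2.0, 1.0, 0.0, 0.0, 20.0, 2000, "additive", rng), 6))  # 2.0
samples = [euler_maruyama(2.0, 1.0, 2.0, 0.5, 10.0, 500, "langevin", rng) for _ in range(400)]
print(sum(samples) / len(samples))  # fluctuates around the deterministic steady state 2
```

Because the drift is linear in $[\mathrm{P}]$ and each increment has zero mean, the ensemble mean follows the deterministic model; the noise terms change only the spread of the trajectories around it.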
2.4 Inference methods for genetic regulation
Genome-wide gene expression and kinase activity measurements have become feasible due to significant advances in high-throughput technologies that produce microarray gene expression data, DNA/RNA sequencing data and single-cell sequencing data. Given the abundance of available experimental data, researchers hope to gain new insights into unknown regulatory mechanisms from experimental observations. Therefore, finding an efficient inference method based on experimental data has become a major challenge in computational biology and bioinformatics. Many inference methods have been proposed for inferring genetic regulatory networks; interested readers are referred to [20,83,98,110,123,129] for more details. In this section, I only introduce the two types of methods for inferring genetic regulatory networks that will be used in this thesis.
2.4.1 Static model
An inference method that identifies the structure of regulatory networks from experimental data is termed a static model. The probabilistic graphical model is a useful static model for predicting the regulatory network between the different components of a system. A gene network is represented by a graph $G$ with a set of nodes (genes) $K$ and a set of edges $E$. The nodes of the graph are modelled as random variables, and the edges represent the interactions between them [99]. There are two popular graphical models used for regulatory network inference.
One type of probabilistic graphical model is the Gaussian graphical model (GGM), which provides a simple and effective method to characterize the regulatory relationships between genes. The GGM is based on calculating the conditional dependencies among genes from gene expression data. If an $n$-dimensional continuous random vector $\mathbf{X} \in \mathbb{R}^n$ ($n$ genes) has the probability density function

$$
f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})\right), \tag{2.45}
$$

we say that the random vector $\mathbf{X}$ follows a multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with mean parameter $\boldsymbol{\mu} \in \mathbb{R}^n$ and positive definite covariance matrix $\boldsymbol{\Sigma}$ [140].
Given an undirected graph $G$ with a set of nodes (genes) $K$ and a set of edges $E$, the partial correlation between genes $i$ and $j$ is estimated in order to search for the best independence graph [147,149]. The edge connecting two genes is omitted from the graph if they are conditionally independent given all other genes [64]. That is, an $n$-dimensional continuous random vector $\mathbf{X}$ ($n$ genes) with a multivariate normal distribution is said to satisfy the Gaussian graphical model with graph $G$ if $\mathbf{X}$ follows a multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with

$$
(\boldsymbol{\Sigma}^{-1})_{i,j} = 0 \quad \text{for all } (i, j) \notin E. \tag{2.46}
$$
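Condition (2.46) can be illustrated with a three-gene chain in which genes 1 and 3 interact only through gene 2. The sketch below, a minimal stdlib-only example, inverts the covariance matrix to obtain the precision matrix $\boldsymbol{\Omega} = \boldsymbol{\Sigma}^{-1}$ and computes the standard partial correlations $\rho_{ij} = -\Omega_{ij}/\sqrt{\Omega_{ii}\Omega_{jj}}$. The Gauss-Jordan helper `invert` and the example precision matrix are assumptions for illustration. The marginal covariance between genes 1 and 3 is non-zero, yet their partial correlation vanishes, so no edge is drawn between them.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(sigma):
    """rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), where Omega = Sigma^{-1}."""
    omega = invert(sigma)
    n = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical chain X1 - X2 - X3: the precision matrix is zero at (1, 3).
omega_true = [[1.0, -0.4, 0.0],
              [-0.4, 1.0, -0.4],
              [0.0, -0.4, 1.0]]
sigma = invert(omega_true)
pc = partial_correlations(sigma)
print(round(sigma[0][2], 3), round(pc[0][1], 6))  # non-zero marginal covariance, rho_12 = 0.4
print(abs(pc[0][2]) < 1e-9)                       # X1 and X3 conditionally independent: True
```

In practice $\boldsymbol{\Sigma}$ is estimated from a limited number of samples, so estimated partial correlations are rarely exactly zero; this is why a statistical test, such as the deviance test in the FSA below, is needed to decide which edges to keep.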
Therefore, the graph $G$ is described by the sparsity pattern of $\boldsymbol{\Sigma}^{-1}$. Based on this theory, the GGM with the Forward Search Algorithm (FSA) was proposed to infer the network structure [147]. Here, I give a brief description of the FSA, which is used in Chapter 3, as follows.
Forward Search Algorithm (FSA)
1. Let $X = (X_1, \ldots, X_N)$ be a vector with $N$ elements, and let $n$ be the number of genes. An initial empty graph $G$ is represented by the $n$-dimensional identity matrix.
2. An iterative maximum likelihood estimation algorithm [30] is used to compute the covariance matrix, $\mathrm{Cov}(G)$, of the initial graph $G$, iterating until the maximum deviance difference is almost unchanged. In our study, we set this threshold value to 0.0001.
3. An edge $E_i$ is added to the initial graph, and a new covariance matrix, denoted $\mathrm{Cov}(E_i)$, is computed by iterative maximum likelihood estimation as in step 2. The significance of the added edge $E_i$ is tested by the deviance difference, which approximately follows a chi-square distribution with one degree of freedom. The p-value of the chi-square test is used as the model selection criterion.
4. After all edges have been tested, all p-values are ranked in descending order. If the smallest p-value of an added edge $E_i$ is smaller than a predefined cutoff p-value (e.g., 0.05), the edge is added to the graph $G$, and the algorithm returns to step 2. Steps 2 to 4 are repeated until the smallest p-value of a candidate edge is larger than the predefined cutoff p-value.
5. Based on the record of added edges from step 4, the algorithm outputs the sparse matrix of the graphical model and the partial correlation coefficient matrix.
6. Finally, nodes represent genes, and edges describe the regulatory relationships between pairs of nodes. The sparse matrix elucidates which regulatory relationships exist between genes, and the partial correlation coefficient matrix indicates whether the regulation between two components is positive or negative.
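Steps 3 and 4 can be sketched for the first iteration only, where the current graph is still empty. In that case the deviance difference for adding edge $(i, j)$ has the closed form $-N\log(1 - r_{ij}^2)$, with $r_{ij}$ the sample correlation and $N$ the number of samples, and the chi-square(1) p-value is $\operatorname{erfc}(\sqrt{d/2})$. Later iterations require the iterative maximum likelihood fit of step 2 [30], which this sketch omits; it is an illustration of the edge test, not the FSA implementation used in Chapter 3. The data set and the cutoff are hypothetical.

```python
import math
from itertools import combinations


def mle_covariance(data):
    """Maximum likelihood covariance (divisor N) of a samples-by-genes data list."""
    n_samples, n_genes = len(data), len(data[0])
    means = [sum(row[g] for row in data) / n_samples for g in range(n_genes)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / n_samples
             for b in range(n_genes)] for a in range(n_genes)]


def first_step_edge_tests(data):
    """Deviance test for adding each single edge to the empty graph.

    Returns ((i, j), deviance, p_value) tuples sorted by p-value, smallest first.
    """
    s = mle_covariance(data)
    n_samples = len(data)
    results = []
    for i, j in combinations(range(len(s)), 2):
        r2 = s[i][j] ** 2 / (s[i][i] * s[j][j])
        deviance = -n_samples * math.log1p(-r2)        # undefined if r2 == 1
        p_value = math.erfc(math.sqrt(deviance / 2.0))  # chi-square(1) upper tail
        results.append(((i, j), deviance, p_value))
    return sorted(results, key=lambda item: item[2])


# Hypothetical data: 40 samples of 3 genes; gene 1 tracks gene 0,
# gene 2 is uncorrelated with both.
data = [[k, k + (1 if k % 2 else -1), 1 if k % 4 in (0, 3) else -1] for k in range(40)]
cutoff = 0.05
for edge, dev, p in first_step_edge_tests(data):
    print(edge, round(dev, 2), "add" if p < cutoff else "reject")
```

Only the edge $(0, 1)$ passes the cutoff here; in the full FSA it would be added to $G$, the covariance would be refitted under the new graph, and the remaining edges would be re-tested against that fit.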
2.4.2 Dynamic model
An inference method that examines the mechanisms through the dynamical properties of interactions between network components is termed a dynamic model. For a network with $N$ genes, the following dynamic model comprising all network components is used to capture the dynamical behavior of genetic regulation:

$$
\frac{dX_i}{dt} = \frac{\sum_{j=1}^{N} a_{ij}X_j^{n_j}}{1 + \sum_{j=1}^{N} b_{ij}X_j^{n_j}} - k_iX_i, \quad i = 1, \dots, N, \tag{2.47}
$$

where $X_i$ denotes the expression level of the $i$-th gene and the coefficient $a_{ij}$ denotes the regulation from the $j$-th gene to the $i$-th gene. When the corresponding coefficient $b_{ij} > 0$, the regulation is positive if $a_{ij} > 0$ and negative if $a_{ij} = 0$. In addition, the $i$-th gene is assumed to autoregulate itself if $a_{ii} > 0$ (with $b_{ii} > 0$). Moreover, if $a_{ij} = b_{ij} = 0$, there is no regulation from the $j$-th gene to the $i$-th gene. The coefficients $n_j$ and $k_i$ are the Hill coefficient of gene $j$ and the degradation rate of gene $i$, respectively.
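The right-hand side of (2.47) is what a parameter estimation routine evaluates repeatedly, so it helps to see it as a function. The sketch below is a direct transcription of (2.47); the two-gene toggle-like parameter values (self-activation through $a_{ii} > 0$, mutual repression through $a_{ij} = 0$ and $b_{ij} > 0$) are hypothetical and chosen only to illustrate the sign conventions described above.

```python
from typing import List, Sequence


def dynamic_rhs(x: Sequence[float], a: Sequence[Sequence[float]],
                b: Sequence[Sequence[float]], n: Sequence[float],
                k: Sequence[float]) -> List[float]:
    """Right-hand side of the dynamic model (2.47) for N genes."""
    size = len(x)
    powered = [x[j] ** n[j] for j in range(size)]          # X_j^{n_j}
    rhs = []
    for i in range(size):
        numerator = sum(a[i][j] * powered[j] for j in range(size))
        denominator = 1.0 + sum(b[i][j] * powered[j] for j in range(size))
        rhs.append(numerator / denominator - k[i] * x[i])
    return rhs


# Hypothetical two-gene example: each gene activates itself (a_ii > 0)
# and represses the other (a_ij = 0, b_ij > 0).
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 1.0], [1.0, 1.0]]
n = [2.0, 2.0]
k = [1.0, 1.0]
print(dynamic_rhs([1.0, 1.0], a, b, n, k))
# Raising gene 1 lowers the production of gene 0 (negative regulation):
print(dynamic_rhs([1.0, 0.0], a, b, n, k)[0], dynamic_rhs([1.0, 2.0], a, b, n, k)[0])
```

In an inference setting the entries of `a`, `b`, `n` and `k` are the unknowns, and this function would be passed to an ODE solver whose output is compared with the expression data.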
As the dynamic model has many unknown parameters, this inference method is generally formulated as the problem of estimating unknown parameters from experimental observations. Based on the estimated results, we can then infer the types and strengths of the unknown regulatory mechanisms.
2.5 Numerical methods for parameter estimation
In the previous sections, I have introduced several models: mathematical models that study dynamical properties based on known regulatory mechanisms, and dynamic models that make predictions about unknown mechanisms based on experimental data. Parameter estimation for these models is an essential part of studying genetic regulation in detail. In this section, I outline several existing numerical methods for parameter estimation.
2.5.1 Genetic algorithm
Genetic algorithms are an effective stochastic search method for finding the unknown parameters of a mathematical model when the search space has a complex error landscape. Given the gene expression data and the mathematical model, the algorithm begins by generating a population of initial parameter values. Each candidate parameter set is called an individual, and the whole population is called a generation. The algorithm then calculates the fitness value of each individual in the current generation. Highly fit individuals have a higher probability of being selected for the next stage than less fit individuals [22]. Based on the fitness values, the algorithm next creates new individuals by modifying (recombining and mutating) the selected ones, which forms the population of the next generation. This process is repeated until a predefined number of generations has been computed. In Chapter 3 and Chapter 4, we employed the following MATLAB functions to implement the genetic algorithm for estimating the unknown dynamic model parameters: crtbp to create initial binary populations, reins to perform fitness-based reinsertion, select to provide a convenient interface to the selection routines, recombine to apply crossover operators, and mut to perform binary and integer mutations. Detailed information on these functions and their alternatives can be found in [22].
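The generation loop described above can be sketched in Python as follows. This is a simplified real-coded variant with binary tournament selection, blend crossover, Gaussian mutation and elitism; it is an illustrative assumption and does not reproduce the binary-encoded MATLAB toolbox routines (crtbp, select, recombine, mut, reins) used in this thesis. The objective `squared_error` and its bounds are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_algorithm(error: Callable[[Sequence[float]], float],
                      bounds: Sequence[Tuple[float, float]],
                      pop_size: int = 40,
                      generations: int = 100,
                      mutation_rate: float = 0.1,
                      seed: int = 0) -> List[float]:
    """Minimise `error` inside `bounds`; a smaller error means a fitter individual."""
    rng = random.Random(seed)
    # Initial generation: individuals drawn uniformly inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        errs = [error(ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda idx: errs[idx])

        def pick() -> List[float]:
            # Binary tournament: fitter individuals are more likely to be selected
            i, j = rng.sample(range(pop_size), 2)
            return pop[i] if errs[i] <= errs[j] else pop[j]

        # Elitism: the two best individuals survive unchanged
        children = [list(pop[ranked[0]]), list(pop[ranked[1]])]
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            w = rng.random()  # blend (arithmetic) crossover weight
            child = [w * u + (1.0 - w) * v for u, v in zip(p1, p2)]
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # Gaussian mutation, clipped to bounds
                    child[d] = min(max(child[d] + rng.gauss(0.0, 0.1 * (hi - lo)), lo), hi)
            children.append(child)
        pop = children
    return min(pop, key=error)


def squared_error(p: Sequence[float]) -> float:
    # Hypothetical objective: squared distance to "true" parameters (1.5, -0.5)
    return (p[0] - 1.5) ** 2 + (p[1] + 0.5) ** 2


best = genetic_algorithm(squared_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best)
```

In a real fit, `squared_error` would be replaced by the discrepancy between simulated and observed gene expression data, which is usually far more expensive to evaluate than this toy function.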
2.5.2 Approximate Bayesian computation
The likelihood function measures how probable the observed data are under a particular statistical model and parameter values; it therefore quantifies the evidence supporting particular parameter values and underpins model selection in statistical inference. For simple models, the likelihood function can often be obtained analytically. For more complex models, however, an analytical likelihood is difficult or impossible to obtain, and its numerical evaluation may be prohibitively expensive. Approximate Bayesian computation (ABC) is a computational methodology based on Bayesian statistics that can be used to estimate the posterior distributions of model parameters. It is particularly useful for simulation-based models because it does not require the likelihood function to be evaluated. The algorithm begins with a set of candidate model parameters, $\hat{\theta}$, generated by Monte Carlo sampling from a user-defined prior probability distribution $\pi(\theta)$. The algorithm then uses these candidate parameters and the mathematical model to simulate gene expression data. Next, a distance function $\rho(\mathbf{X}, \mathbf{X}^*)$, such as the absolute deviation or the root mean squared error, is used to measure the similarity between the experimental data $\mathbf{X}$ and the simulated data $\mathbf{X}^*$. Finally, if the distance is less than a predefined cutoff value $\varepsilon_0$, the candidate parameter set is kept. Otherwise, it is discarded, and the parameters are resampled and the steps repeated until $\rho(\mathbf{X}, \mathbf{X}^*) < \varepsilon_0$.
In Chapter 5, the ABC rejection-sampling algorithm is used to estimate the mathematical model parameters [10,139]. The pseudo-code of this algorithm is given in Algorithm 1.
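A minimal Python sketch of the ABC rejection sampler follows. The toy exponential-decay model, the uniform prior on [0, 2], the root-mean-squared-error distance and the cutoff are illustrative assumptions; in practice the simulator would be the ODE or SDE model under study. Initialising the distance to infinity guarantees that at least one candidate is simulated for every accepted sample.

```python
import math
import random
from typing import Callable, List, Sequence


def abc_rejection(data: Sequence[float],
                  simulate: Callable[[float], List[float]],
                  sample_prior: Callable[[random.Random], float],
                  eps: float,
                  n_accept: int,
                  seed: int = 0) -> List[float]:
    """Return n_accept parameter values whose simulations lie within eps of data."""
    rng = random.Random(seed)

    def rmse(x: Sequence[float], y: Sequence[float]) -> float:
        # Root mean squared error used as the distance rho(X, X*)
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)) / len(x))

    accepted = []
    for _ in range(n_accept):
        dist = math.inf
        while dist > eps:
            theta = sample_prior(rng)      # candidate drawn from the prior pi(theta)
            simulated = simulate(theta)    # simulated data X*
            dist = rmse(data, simulated)
        accepted.append(theta)
    return accepted


def decay_model(k: float) -> List[float]:
    """Hypothetical simulator: x(t) = 10 exp(-k t) at t = 0, 1, ..., 5."""
    return [10.0 * math.exp(-k * t) for t in range(6)]


observed = decay_model(0.5)  # synthetic "experimental" data with k = 0.5
posterior = abc_rejection(observed, decay_model,
                          sample_prior=lambda r: r.uniform(0.0, 2.0),
                          eps=0.1, n_accept=50)
print(sum(posterior) / len(posterior))
```

The cutoff trades accuracy against cost: a smaller `eps` gives a sharper approximate posterior but lowers the acceptance rate, so the inner loop can become very long.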
Algorithm 1: ABC rejection-sampling algorithm
Data: Gene expression data $\mathbf{X}$
Result: Estimated model parameters $\theta_1, \dots, \theta_N$
Set the prior distribution $\pi(\theta)$
Set the cutoff value $\varepsilon_0$
Define the distance function $\rho(\mathbf{X}, \mathbf{X}^*)$
for $1 \le i \le N$ do
    Set $\rho(\mathbf{X}, \mathbf{X}^*) = \infty$
    while $\rho(\mathbf{X}, \mathbf{X}^*) > \varepsilon_0$ do
        Sample the candidate parameters $\hat{\theta}$ from the prior distribution $\pi(\theta)$
        Solve the mathematical model with $\hat{\theta}$ and generate simulated data $\mathbf{X}^*$
        Calculate the distance $\rho(\mathbf{X}, \mathbf{X}^*)$
    end
    Store $\theta_i \leftarrow \hat{\theta}$
end
2.5.3 Robustness analysis
In biological systems, robustness is the capacity of a system to operate appropriately in the presence of both internal and external uncertainty [27,147]. If a model with the estimated parameters is not robust, a perturbation to the parameters might lead to substantial variations in the model output. Robustness has therefore become increasingly popular in recent years as a critical criterion for selecting the ideal network structure or model parameters from estimated results [5,86]. The robustness property of a mathematical model with respect to a set of perturbations $P$ is defined as the average of an evaluation function $D^{S}_{a,P}$ of the system over all perturbations $p \in P$. Here we take the evaluation function to be $\sum_{i,t} x_{it}(p)$, where $x_{it}(p)$ is the expression level of gene $i$ at time point $t$ under a specific perturbation $p \in P$. Thus, the average gene expression level over all perturbed model parameters is given by
$$R^{M}_{a,P} = \sum_{i,t} \int_{p \in P} \mathbb{P}(p)\, x_{it}(p)\, dp, \qquad (2.48)$$
where $\mathbb{P}(p)$ is the probability of the perturbation $p$. In addition, the impact of perturbations on the system performance is evaluated by
$$R^{N}_{a,P} = \sum_{i,t} \int_{p \in P} \mathbb{P}(p)\, \bigl(x_{it} - x_{it}(p)\bigr)^2\, dp. \qquad (2.49)$$
The average value $R^{M}_{a,P}$ in (2.48) should be close to $\sum_{i,t} x_{it}$, the sum of the simulated gene expression levels obtained from the unperturbed rate constants. In a perturbation test, the inferred parameter $k_i$ is treated as the unperturbed value, and the perturbed parameter $\tilde{k}_i$ is generated by
$$\tilde{k}_i = k_i \times (1 + \mu \times \varepsilon), \qquad (2.50)$$
where $\varepsilon$ is a sample generated from either a normal distribution or a uniform distribution, and $\mu$ is a control parameter that determines the perturbation strength.
For a set of estimated parameters, we first obtain $N$ sets of perturbed model parameters using (2.50) and then use these perturbed parameter sets to obtain $N$ corresponding simulations. We use $x^{(k)}_{it}(p)$ and $x^{(k)}_{it}$ to denote the simulated value of variable $x_i$ $(1 \le i \le m)$ at time point $t$ $(1 \le t \le M)$ obtained with the $k$-th perturbed and the unperturbed model parameters, respectively. We then define
$$E^{(k)} = \sqrt{\sum_{i=1}^{m} \sum_{t=1}^{M} \left(x^{(k)}_{it}(p) - x^{(k)}_{it}\right)^2} \qquad (2.51)$$
as the measure of the robustness property of the model with the $k$-th perturbed parameter set. We further define the robust average for the given parameter set as
$$RA = \frac{1}{N} \sum_{k=1}^{N} E^{(k)}, \qquad (2.52)$$
and the robust standard deviation as
$$RSTD = \sqrt{\frac{1}{N-1} \sum_{k=1}^{N} \left(E^{(k)} - RA\right)^2} \qquad (2.53)$$
over $N$ perturbation tests. Smaller values of $RA$ and $RSTD$ indicate that the model with the given parameter set is more robust. In Chapter 3 and Chapter 4, we use the robustness measures (2.52) and (2.53) to select the optimal model parameter sets from the estimated candidates.
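The perturbation test of Eqs. (2.50)–(2.53) can be sketched in Python as follows. Assumptions for illustration only: each parameter receives its own independent $\varepsilon$, the "uniform" option draws $\varepsilon$ from $[-1, 1]$, and `decay_model` is a hypothetical two-gene model $x_i(t) = e^{-k_i t}$ standing in for the inferred ODE model.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def robustness(simulate: Callable[[Sequence[float]], List[List[float]]],
               params: Sequence[float],
               mu: float,
               n_tests: int,
               distribution: str = "normal",
               seed: int = 0) -> Tuple[float, float]:
    """Return (RA, RSTD) of Eqs. (2.52)-(2.53) for one estimated parameter set."""
    rng = random.Random(seed)
    baseline = simulate(params)  # unperturbed trajectories x_it
    errors = []
    for _ in range(n_tests):
        # Eq. (2.50): k~ = k (1 + mu * eps), independent eps per parameter (assumption)
        perturbed = []
        for value in params:
            if distribution == "normal":
                eps = rng.gauss(0.0, 1.0)
            else:
                eps = rng.uniform(-1.0, 1.0)
            perturbed.append(value * (1.0 + mu * eps))
        trial = simulate(perturbed)
        # Eq. (2.51): Euclidean distance over all genes i and time points t
        e_k = math.sqrt(sum((trial[i][t] - baseline[i][t]) ** 2
                            for i in range(len(baseline))
                            for t in range(len(baseline[i]))))
        errors.append(e_k)
    ra = sum(errors) / n_tests                                              # Eq. (2.52)
    rstd = math.sqrt(sum((e - ra) ** 2 for e in errors) / (n_tests - 1))  # Eq. (2.53)
    return ra, rstd


def decay_model(k: Sequence[float]) -> List[List[float]]:
    """Hypothetical model: x_i(t) = exp(-k_i t) at t = 0, 1, ..., 5."""
    return [[math.exp(-ki * t) for t in range(6)] for ki in k]


print(robustness(decay_model, [0.5, 1.0], mu=0.05, n_tests=100))
```

When several candidate parameter sets are compared, the one with the smallest $RA$ (and, as a tie-breaker, the smallest $RSTD$) would be preferred, mirroring the selection rule stated above.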
2.6 Numerical methods for simulation of mathematical models
Complex nonlinear systems, such as genetic regulatory systems, are usually described by complicated mathematical models. For most nonlinear deterministic or stochastic models, finding analytical solutions is either time-consuming or impossible. In such cases, appropriate numerical methods can be used to obtain approximate solutions within an acceptable error range, which allows us to simulate and analyze the mechanisms of the described system. In this section, I introduce some classical methods for simulating ordinary differential equations and stochastic differential equations, respectively.
2.6.1 Numerical simulation of ODE models
This subsection, based on the textbooks by Chapra [19] and Quarteroni et al. [111], is devoted to solving ODEs of the form
$$\frac{dy}{dt} = f(t, y). \qquad (2.54)$$
Recall from first principles that the rate of change (slope) of a function is defined as
$$\frac{dy}{dt} = \lim_{\Delta t \to 0} \frac{\Delta y}{\Delta t} = \lim_{\Delta t \to 0} \frac{y(t_{i+1}) - y(t_i)}{t_{i+1} - t_i}, \qquad (2.55)$$
where $\Delta y = y(t_{i+1}) - y(t_i)$ and $\Delta t = t_{i+1} - t_i$ are the differences in $y$ and in time $t$ over a finite interval, $y(t_i)$ is the value of $y$ at time $t_i$, and $y(t_{i+1})$ is the value of $y$ one step later, at time $t_{i+1}$. If $\Delta t$ is small enough, the slope can be approximated by these finite differences, that is,
$$\frac{dy}{dt} \approx \frac{\Delta y}{\Delta t} = \frac{y(t_{i+1}) - y(t_i)}{t_{i+1} - t_i}. \qquad (2.56)$$
Rearranging (2.56) gives
$$y(t_{i+1}) = y(t_i) + \frac{dy}{dt}\,\Delta t \qquad (2.57)$$
$$= y(t_i) + \Phi\,\Delta t, \qquad (2.58)$$
where the slope estimate $\Phi$ for $dy/dt$ over the interval is called the increment function. From this general form, we can determine the future value $y(t_{i+1})$ once we know the previous value $y(t_i)$, the time step $\Delta t$ and the increment function $\Phi$. Such one-step methods, which determine a future value from the value at a single previous time point, are referred to generally as Runge-Kutta methods. Euler's method is the simplest Runge-Kutta method; Leonhard Euler described it in a book published in 1768 [36].
2.6.1.1 Euler’s method
Euler's method is a first-order numerical method for solving ODEs with a given initial value. Let the increment function $\Phi$ be given by the ODE evaluated at $t_i$ and $y(t_i)$, that is,
$$\Phi = f(t_i, y(t_i)). \qquad (2.59)$$
Then we obtain the following formula for Euler's method:
$$y(t_{i+1}) = y(t_i) + f(t_i, y(t_i))\,\Delta t. \qquad (2.60)$$
Euler's method uses the slope and the old value at the beginning of the time interval, $t_i$, to find the new value $y(t_{i+1})$ at the end of the time interval, $t_{i+1}$. The pseudocode of Euler's method is given below.
Algorithm 2: Euler's method
Given the increment function $\Phi = dy/dt = f(t, y)$
Given the initial value $y(t_0)$, the finite time span $[t_0, t_T]$ and the time step $dt$
Calculate the number of steps $n = (t_T - t_0)/dt$
Set a result vector $Y$ with $n + 1$ components and let $Y(1) = y(t_0)$
Set $t_{\text{old}} = t_0$ and $y_{\text{old}} = Y(1) = y(t_0)$
for $2 \le i \le n + 1$ do
    $y_{\text{new}} = y_{\text{old}} + dt \cdot f(t_{\text{old}}, y_{\text{old}})$
    $t_{\text{old}} = t_{\text{old}} + dt$
    Store $Y(i) \leftarrow y_{\text{new}}$
    $y_{\text{old}} = y_{\text{new}}$
end
Display $Y$ as the trajectory of the solution over the time span $[t_0, t_T]$
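Algorithm 2 translates almost line by line into the following Python sketch. The function name `euler` is illustrative, and the step count is obtained by rounding, which assumes that the time span is an integer multiple of the step size.

```python
from typing import Callable, List


def euler(f: Callable[[float, float], float], y0: float,
          t0: float, t_end: float, dt: float) -> List[float]:
    """Explicit Euler method (Algorithm 2); returns y at t0, t0 + dt, ..., t_end."""
    n = round((t_end - t0) / dt)  # number of steps
    t, y = t0, y0
    trajectory = [y0]
    for _ in range(n):
        y = y + dt * f(t, y)  # Eq. (2.60)
        t = t + dt
        trajectory.append(y)
    return trajectory


# Example: dy/dt = -y, y(0) = 1 on [0, 1] with dt = 0.1
ys = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 0.1)
print(len(ys), ys[-1])
```

For this test problem each step multiplies $y$ by $1 - \Delta t = 0.9$, so the end value is $0.9^{10}$ rather than the exact $e^{-1}$.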
There are two types of error in the numerical solution obtained by Euler's method. The first is round-off error, which results from the limited number of significant figures that a computer can retain. The second is the global truncation error (GTE), which consists of the local truncation error (LTE) and the propagated truncation error. The LTE is the error produced by a single step of the approximation, that is, the difference between the numerical solution $y(t_{i+1})$ and the exact solution at time $t_{i+1}$, assuming that $y(t_i)$ is exact. The propagated truncation error is caused by the approximations made during the preceding steps. The GTE is therefore the cumulative effect of the local truncation errors committed at each step over the whole time span. To find the LTE, we need the exact solution at time $t_{i+1}$. Consider the Taylor expansion of $y$ around time $t_i$:
$$y(t_{i+1}) = y(t_i) + \Delta t\, y'(t_i) + \frac{(\Delta t)^2}{2}\, y''(t_i) + O((\Delta t)^3). \qquad (2.61)$$
The numerical solution given by Euler's method is $y(t_i) + \Delta t\, y'(t_i)$. Thus,
$$\text{LTE} = \text{exact solution} - \text{numerical solution} \qquad (2.62)$$
$$= y(t_{i+1}) - y(t_i) - \Delta t\, y'(t_i) \qquad (2.63)$$
$$= \frac{(\Delta t)^2}{2}\, y''(t_i) + O((\Delta t)^3) \qquad (2.64)$$
$$= O((\Delta t)^2). \qquad (2.65)$$
Hence the LTE is proportional to $(\Delta t)^2$. In addition, the number of steps over the time span $[t_0, t_T]$ is $(t_T - t_0)/\Delta t$, which is proportional to $1/\Delta t$. Accumulating this many local errors of size $O((\Delta t)^2)$ gives a GTE proportional to $\Delta t$, that is, GTE $= O(\Delta t)$. This result tells us that the LTE and GTE of Euler's method can be reduced by decreasing the time step $\Delta t$. Moreover, because the leading term of the LTE involves $y''$, Euler's method gives the exact value of $y(t_{i+1})$ whenever the solution of the differential equation is linear ($y'' = 0$).
Since Euler's method is exact whenever the solution is a first-order polynomial (a linear function), it is referred to as a first-order method. In general, an $n$th-order method gives the exact solution if the solution is a polynomial of degree $n$, and its LTE and GTE are $O((\Delta t)^{n+1})$ and $O((\Delta t)^n)$, respectively. In this thesis, we use Euler's method to solve all differential equation models.
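The first-order behaviour of the GTE can be checked numerically. The self-contained sketch below integrates the hypothetical test problem $dy/dt = -y$, $y(0) = 1$ (exact solution $e^{-t}$) to $t = 1$ with successively halved steps. If GTE $\approx C (\Delta t)^p$, consecutive errors give the observed order $p = \log_2(e_k / e_{k+1})$, which should be close to 1 for Euler's method.

```python
import math


def euler_end(f, y0, t0, t_end, n_steps):
    """Euler's method with n_steps equal steps; returns only y(t_end)."""
    dt = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y += dt * f(t, y)
        t += dt
    return y


exact = math.exp(-1.0)
errors = [abs(euler_end(lambda t, y: -y, 1.0, 0.0, 1.0, n) - exact)
          for n in (10, 20, 40, 80)]
# Observed order p from consecutive errors: GTE ~ C dt^p  =>  p = log2(e_k / e_{k+1})
orders = [math.log2(errors[k] / errors[k + 1]) for k in range(len(errors) - 1)]
print(errors)
print(orders)
```

Halving $\Delta t$ roughly halves the final error, as the $O(\Delta t)$ estimate predicts.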
2.6.1.2 Runge-Kutta methods
A key assumption behind Euler's method is that the derivative at the time point $t_i$, $y'(t_i)$, remains unchanged over the entire one-step time interval $[t_i, t_{i+1}]$. Thus, in addition to decreasing $\Delta t$, we can also improve Euler's method by using better estimates of the derivative over the time interval. These methods belong to a family of iterative methods called Runge-Kutta methods.
The general form of the Runge-Kutta methods is
$$y(t_{i+1}) = y(t_i) + \Phi\,\Delta t \qquad (2.66)$$
with the increment function
$$\Phi = a_1 k_1 + a_2 k_2 + \cdots + a_n k_n, \qquad (2.67)$$
where the coefficients $a_1, \dots, a_n$ are constants and the slope estimates $k_1, \dots, k_n$ are
$$k_1 = f(t_i, y(t_i)), \qquad (2.68)$$
$$k_2 = f(t_i + p_2 \Delta t,\; y(t_i) + q_{2,1} k_1 \Delta t), \qquad (2.69)$$
$$k_3 = f(t_i + p_3 \Delta t,\; y(t_i) + (q_{3,1} k_1 + q_{3,2} k_2)\Delta t), \qquad (2.70)$$
$$\vdots$$
$$k_n = f(t_i + p_n \Delta t,\; y(t_i) + (q_{n,1} k_1 + q_{n,2} k_2 + \cdots + q_{n,n-1} k_{n-1})\Delta t), \qquad (2.71)$$
where the coefficients $p$ and $q$ are constants. All coefficients $a$, $p$ and $q$ are determined by equating Eq. (2.66) with the corresponding $n$th-order Taylor series expansion. The slope $k_1$ is obtained from the ODE evaluated at $t_i$ and $y(t_i)$, and the remaining slopes are determined by the recurrence relationships above. Below I introduce the Runge-Kutta methods most commonly used by researchers; their detailed derivations are beyond the scope of this thesis and are not discussed here.
1. First-order Runge-Kutta methods: When $n = 1$ in the increment function, Eq. (2.66) becomes
$$y(t_{i+1}) = y(t_i) + a_1 f(t_i, y(t_i))\,\Delta t. \qquad (2.72)$$
Setting $a_1 = 1$ gives the formula for Euler's method. The LTE and GTE of the first-order Runge-Kutta methods are $O((\Delta t)^2)$ and $O(\Delta t)$, respectively. As discussed in the previous subsection, the first-order method gives the exact solution if the solution is linear.
2. Second-order Runge-Kutta methods: When $n = 2$ in the increment function, Eq. (2.66) becomes
$$y(t_{i+1}) = y(t_i) + (a_1 k_1 + a_2 k_2)\Delta t, \qquad (2.73)$$
where
$$k_1 = f(t_i, y(t_i)), \qquad (2.74)$$
$$k_2 = f(t_i + p_2 \Delta t,\; y(t_i) + q_{2,1} k_1 \Delta t). \qquad (2.75)$$
By equating Eq. (2.73) with the corresponding second-order Taylor expansion, we obtain the following methods for different combinations of coefficients.
• When $a_1 = a_2 = \frac{1}{2}$ and $p_2 = q_{2,1} = 1$, we have
$$y(t_{i+1}) = y(t_i) + \left(\tfrac{1}{2} k_1 + \tfrac{1}{2} k_2\right)\Delta t, \qquad (2.76)$$
where
$$k_1 = f(t_i, y(t_i)), \qquad (2.77)$$
$$k_2 = f(t_i + \Delta t,\; y(t_i) + k_1 \Delta t). \qquad (2.78)$$
This method is known as Heun's method. Since $k_1$ and $k_2$ are the derivatives evaluated at the beginning and at the (Euler-predicted) end of the time interval $[t_i, t_{i+1}]$, respectively, the key idea of Heun's method is to take the increment function as the average of the start and end slopes across the whole interval. The pseudocode of Heun's method is given below.
Algorithm 3: Heun's method
Given the function $f(t, y)$
Given the initial value $y(t_0)$, the finite time span $[t_0, t_T]$ and the time step $dt$
Calculate the number of steps $n = (t_T - t_0)/dt$
Set a result vector $Y$ with $n + 1$ components and let $Y(1) = y(t_0)$
Set $t_{\text{old}} = t_0$ and $y_{\text{old}} = Y(1) = y(t_0)$
for $2 \le i \le n + 1$ do
    $k_1 = f(t_{\text{old}}, y_{\text{old}})$
    $k_2 = f(t_{\text{old}} + dt,\ y_{\text{old}} + k_1 \cdot dt)$
    $y_{\text{new}} = y_{\text{old}} + \frac{dt}{2} \cdot (k_1 + k_2)$
    $t_{\text{old}} = t_{\text{old}} + dt$
    Store $Y(i) \leftarrow y_{\text{new}}$
    $y_{\text{old}} = y_{\text{new}}$
end
Display $Y$ as the trajectory of the solution over the time span $[t_0, t_T]$
• When $a_1 = 0$, $a_2 = 1$ and $p_2 = q_{2,1} = \frac{1}{2}$, we have
$$y(t_{i+1}) = y(t_i) + k_2 \Delta t, \qquad (2.79)$$
where
$$k_1 = f(t_i, y(t_i)), \qquad (2.80)$$
$$k_2 = f\!\left(t_i + \frac{\Delta t}{2},\; y(t_i) + k_1 \frac{\Delta t}{2}\right). \qquad (2.81)$$
This method is known as the Midpoint method. Since $k_2$ is the derivative evaluated at the midpoint of the time interval $[t_i, t_{i+1}]$, the key idea of the Midpoint method is to take the increment function as the midpoint slope across the whole interval.
The LTE and GTE of the second-order Runge-Kutta methods are $O((\Delta t)^3)$ and $O((\Delta t)^2)$, respectively. Therefore, a second-order method gives the exact solution if the solution is quadratic. The pseudocode of the Midpoint method is given below.
Algorithm 4: The Midpoint method
Given the function $f(t, y)$
Given the initial value $y(t_0)$, the finite time span $[t_0, t_T]$ and the time step $dt$
Calculate the number of steps $n = (t_T - t_0)/dt$
Set a result vector $Y$ with $n + 1$ components and let $Y(1) = y(t_0)$
Set $t_{\text{old}} = t_0$ and $y_{\text{old}} = Y(1) = y(t_0)$
for $2 \le i \le n + 1$ do
    $k_1 = f(t_{\text{old}}, y_{\text{old}})$
    $k_2 = f(t_{\text{old}} + \frac{dt}{2},\ y_{\text{old}} + k_1 \cdot \frac{dt}{2})$
    $y_{\text{new}} = y_{\text{old}} + dt \cdot k_2$
    $t_{\text{old}} = t_{\text{old}} + dt$
    Store $Y(i) \leftarrow y_{\text{new}}$
    $y_{\text{old}} = y_{\text{new}}$
end
Display $Y$ as the trajectory of the solution over the time span $[t_0, t_T]$
3. Fourth-order Runge-Kutta methods: When $n = 4$ in the increment function, Eq. (2.66) becomes
$$y(t_{i+1}) = y(t_i) + (a_1 k_1 + a_2 k_2 + a_3 k_3 + a_4 k_4)\Delta t, \qquad (2.82)$$
where
$$k_1 = f(t_i, y(t_i)), \qquad (2.83)$$
$$k_2 = f(t_i + p_2 \Delta t,\; y(t_i) + q_{2,1} k_1 \Delta t), \qquad (2.84)$$
$$k_3 = f(t_i + p_3 \Delta t,\; y(t_i) + (q_{3,1} k_1 + q_{3,2} k_2)\Delta t), \qquad (2.85)$$
$$k_4 = f(t_i + p_4 \Delta t,\; y(t_i) + (q_{4,1} k_1 + q_{4,2} k_2 + q_{4,3} k_3)\Delta t). \qquad (2.86)$$
The most commonly used method in the Runge-Kutta family is the classical fourth-order Runge-Kutta method with $a_1 = a_4 = \frac{1}{6}$, $a_2 = a_3 = \frac{1}{3}$, $p_2 = p_3 = q_{2,1} = q_{3,2} = \frac{1}{2}$, $q_{3,1} = q_{4,1} = q_{4,2} = 0$ and $p_4 = q_{4,3} = 1$. That is,
$$y(t_{i+1}) = y(t_i) + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4)\Delta t, \qquad (2.87)$$
where
$$k_1 = f(t_i, y(t_i)), \qquad (2.88)$$
$$k_2 = f\!\left(t_i + \tfrac{1}{2}\Delta t,\; y(t_i) + \tfrac{1}{2} k_1 \Delta t\right), \qquad (2.89)$$
$$k_3 = f\!\left(t_i + \tfrac{1}{2}\Delta t,\; y(t_i) + \tfrac{1}{2} k_2 \Delta t\right), \qquad (2.90)$$
$$k_4 = f(t_i + \Delta t,\; y(t_i) + k_3 \Delta t). \qquad (2.91)$$
The key idea of the classical fourth-order Runge-Kutta method is similar to that of Heun's method: the increment function is taken as an improved weighted average of several slope estimates across the whole time interval. In addition, the LTE and GTE of the fourth-order Runge-Kutta methods are $O((\Delta t)^5)$ and $O((\Delta t)^4)$, respectively. Therefore, a fourth-order method gives the exact solution if the solution is a polynomial of degree four. The pseudocode of the classical fourth-order Runge-Kutta method is given below.
Algorithm 5: The classical fourth-order Runge-Kutta method
Given the function $f(t, y)$
Given the initial value $y(t_0)$, the finite time span $[t_0, t_T]$ and the time step $dt$
Calculate the number of steps $n = (t_T - t_0)/dt$
Set a result vector $Y$ with $n + 1$ components and let $Y(1) = y(t_0)$
Set $t_{\text{old}} = t_0$ and $y_{\text{old}} = Y(1) = y(t_0)$
for $2 \le i \le n + 1$ do
    $k_1 = f(t_{\text{old}}, y_{\text{old}})$
    $k_2 = f(t_{\text{old}} + \frac{dt}{2},\ y_{\text{old}} + k_1 \cdot \frac{dt}{2})$
    $k_3 = f(t_{\text{old}} + \frac{dt}{2},\ y_{\text{old}} + k_2 \cdot \frac{dt}{2})$
    $k_4 = f(t_{\text{old}} + dt,\ y_{\text{old}} + k_3 \cdot dt)$
    $y_{\text{new}} = y_{\text{old}} + \frac{dt}{6} \cdot (k_1 + 2k_2 + 2k_3 + k_4)$
    $t_{\text{old}} = t_{\text{old}} + dt$
    Store $Y(i) \leftarrow y_{\text{new}}$
    $y_{\text{old}} = y_{\text{new}}$
end
Display $Y$ as the trajectory of the solution over the time span $[t_0, t_T]$
Other Runge-Kutta methods can be derived by choosing a different order $n$. They are similar in concept with respect to how the coefficients are determined, and they follow the pattern discussed above for both the LTE and the GTE. Higher-order Runge-Kutta methods are generally more accurate for a given step size, but they require more function evaluations per step, which reduces computational efficiency. Therefore, when solving ODEs, we should balance accuracy against computational efficiency and choose the most appropriate method, rather than simply pursuing more complex and advanced methods.
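Algorithms 3–5 differ only in how the increment function is formed, so a Python sketch can share one driver loop. The function names and the test problem $dy/dt = -y$ are illustrative assumptions. For this linear problem both second-order methods reduce to the same per-step factor $1 - \Delta t + (\Delta t)^2/2$, while the classical fourth-order method is far more accurate at the same step size.

```python
import math
from typing import Callable, List

Rhs = Callable[[float, float], float]
Step = Callable[[Rhs, float, float, float], float]


def heun_step(f: Rhs, t: float, y: float, dt: float) -> float:
    """One step of Heun's method, Eqs. (2.76)-(2.78)."""
    k1 = f(t, y)
    k2 = f(t + dt, y + k1 * dt)          # slope at the Euler-predicted end point
    return y + dt / 2 * (k1 + k2)


def midpoint_step(f: Rhs, t: float, y: float, dt: float) -> float:
    """One step of the Midpoint method, Eqs. (2.79)-(2.81)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + k1 * dt / 2)  # slope at the midpoint
    return y + dt * k2


def rk4_step(f: Rhs, t: float, y: float, dt: float) -> float:
    """One step of the classical fourth-order method, Eqs. (2.87)-(2.91)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + k1 * dt / 2)
    k3 = f(t + dt / 2, y + k2 * dt / 2)
    k4 = f(t + dt, y + k3 * dt)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step: Step, f: Rhs, y0: float, t0: float, t_end: float,
              dt: float) -> List[float]:
    """Driver loop shared by Algorithms 3-5; returns y at t0, t0 + dt, ..., t_end."""
    n = round((t_end - t0) / dt)
    t, y = t0, y0
    trajectory = [y0]
    for _ in range(n):
        y = step(f, t, y, dt)
        t += dt
        trajectory.append(y)
    return trajectory


def decay(t: float, y: float) -> float:
    return -y  # test problem dy/dt = -y with exact solution e^{-t}


for name, step in [("Heun", heun_step), ("Midpoint", midpoint_step), ("RK4", rk4_step)]:
    y_end = integrate(step, decay, 1.0, 0.0, 1.0, 0.1)[-1]
    print(name, abs(y_end - math.exp(-1.0)))
```

Note that the fourth-order step costs four evaluations of `f` against two for the second-order steps, which is the accuracy-versus-efficiency trade-off described above.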
2.6.2 Numerical simulation of SDE models
This subsection is devoted to solving the Itô SDE driven by a one-dimensional Wiener process, of the form
$$dX(t) = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dW(t) \qquad (2.92)$$
for $0 \le t \le T$ with initial condition $X(0) = X_0 < \infty$. In this SDE, the functions $\mu(X(t), t)$ and $\sigma(X(t), t)$ are called the drift and the diffusion coefficient, respectively; $X(t)$ is the solution process and $W(t)$ is a Wiener process. If, for all $t \ge 0$, $\int_0^t |\mu(X(s), s)|\,ds < \infty$ and $\int_0^t \sigma^2(X(s), s)\,ds < \infty$, then the solution of the Itô SDE is given by
$$X(t) = X_0 + \int_0^t \mu(X(s), s)\,ds + \int_0^t \sigma(X(s), s)\,dW(s). \qquad (2.93)$$
To find an approximate solution of the Itô SDE, it is useful to discretize the time interval $[0, T]$ into $n$ equal subintervals of width $\Delta t = T/n > 0$:
$$0 = t_0 < t_1 < t_2 < t_3 < \cdots < t_{n-1} < t_n = T. \qquad (2.94)$$
Two methods are introduced below to solve this SDE, namely the Euler-Maruyama method and the Milstein method. I do not give a comprehensive account of their theoretical foundations here; rigorous descriptions of the underlying theory can be found in Kloeden and Platen's textbook [61] and Higham's paper [49], on which the methods presented in this subsection are also based.
2.6.2.1 Euler-Maruyama method
The simplest numerical method for solving the Itô SDE is the Euler-Maruyama (EM) method. The EM method, also referred to as the explicit Euler method, is an extension of Euler's method for ODEs. Because the time interval is discretized, the SDE can be solved by evaluating $X(t)$ recursively at each time point once the initial condition is given. That is,
$$X(t_{j+1}) = X(t_j) + \int_{t_j}^{t_{j+1}} \mu(X(s), s)\,ds + \int_{t_j}^{t_{j+1}} \sigma(X(s), s)\,dW(s). \qquad (2.95)$$
The EM method approximates the two integrals as follows:
$$\int_{t_j}^{t_{j+1}} \mu(X(s), s)\,ds \approx \mu(X(t_j), t_j)(t_{j+1} - t_j) = \mu(X(t_j), t_j)\,\Delta t, \qquad (2.96)$$
$$\int_{t_j}^{t_{j+1}} \sigma(X(s), s)\,dW(s) \approx \sigma(X(t_j), t_j)\bigl(W(t_{j+1}) - W(t_j)\bigr) = \sigma(X(t_j), t_j)\,\Delta W(t_j). \qquad (2.97)$$
Then, for $0 \le j \le n-1$, the EM method takes the form
$$X(t_{j+1}) = X(t_j) + \mu(X(t_j), t_j)\,\Delta t + \sigma(X(t_j), t_j)\,\Delta W(t_j). \qquad (2.98)$$
Note that if $\sigma(X(t_j), t_j) = 0$ and the initial value $X_0$ is deterministic, the EM method reduces to Euler's method for solving ODEs. Since the increment of the Wiener process, $\Delta W(t_j) = W(t_{j+1}) - W(t_j)$, is normally distributed with mean 0 and variance $\Delta t$, the standard Gaussian pseudorandom number generator randn in MATLAB can be used to generate the independent increments $\Delta W(t_j)$ at each time point by setting $\Delta W(t_j) = \sqrt{\Delta t} \cdot \text{randn}$. The pseudocode of the EM method is given below.
Algorithm 6: The Euler-Maruyama method
Set an arbitrary random seed value $R$ for repeatable experiments, randn('seed', R)
Given the functions $\mu(x, t)$ and $\sigma(x, t)$
Given the initial value $x_0$, the finite time span $[0, T]$ and the time step $dt$
Calculate the number of steps $n = T/dt$
Set a result vector $X$ with $n + 1$ components and let $X(1) = x_0$
Set $t_{\text{old}} = 0$ and $x_{\text{old}} = X(1) = x_0$
for $2 \le i \le n + 1$ do
    $x_{\text{new}} = x_{\text{old}} + \mu(x_{\text{old}}, t_{\text{old}}) \cdot dt + \sigma(x_{\text{old}}, t_{\text{old}}) \cdot \sqrt{dt} \cdot \text{randn}$
    $t_{\text{old}} = t_{\text{old}} + dt$
    Store $X(i) \leftarrow x_{\text{new}}$
    $x_{\text{old}} = x_{\text{new}}$
end
Display $X$ as the trajectory of the solution over the time span $[0, T]$
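A Python sketch of Algorithm 6 follows. Here a seeded `random.Random` instance plays the role of the seeded MATLAB generator; the function name and the geometric Brownian motion example $dX = 2X\,dt + X\,dW$ are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional

Coef = Callable[[float, float], float]


def euler_maruyama(mu: Coef, sigma: Coef, x0: float, T: float, dt: float,
                   seed: Optional[int] = None) -> List[float]:
    """Euler-Maruyama scheme, Eq. (2.98), for dX = mu(X, t) dt + sigma(X, t) dW."""
    rng = random.Random(seed)  # seeded generator for reproducible paths
    n = round(T / dt)
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n):
        dW = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Wiener increment ~ N(0, dt)
        x = x + mu(x, t) * dt + sigma(x, t) * dW
        t += dt
        path.append(x)
    return path


# Hypothetical example: geometric Brownian motion dX = 2X dt + X dW, X(0) = 1
path = euler_maruyama(lambda x, t: 2.0 * x, lambda x, t: x, 1.0, 1.0, 2 ** -8, seed=42)
print(path[-1])
```

Setting `sigma` to zero recovers the deterministic Euler iteration, which gives a quick sanity check of an implementation.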
There are also two other methods based on a similar idea; they differ only in the time points used to evaluate the functions $\mu$ and $\sigma$:
1. The semi-implicit Euler method:
$$X(t_{j+1}) = X(t_j) + \mu(X(t_{j+1}), t_{j+1})\,\Delta t + \sigma(X(t_j), t_j)\,\Delta W(t_j). \qquad (2.99)$$
2. The implicit Euler method:
$$X(t_{j+1}) = X(t_j) + \mu(X(t_{j+1}), t_{j+1})\,\Delta t + \sigma(X(t_{j+1}), t_{j+1})\,\Delta W(t_j). \qquad (2.100)$$
In the EM and semi-implicit Euler methods, the left-endpoint value at $t_j$ is used to evaluate the integral $\int_{t_j}^{t_{j+1}} \sigma(X(s), s)\,dW(s)$, consistent with the definition of the Itô integral. Therefore, the numerical solutions of these two methods converge to the exact solution of the Itô SDE. In contrast, the implicit Euler method uses the right-endpoint value at $t_{j+1}$ to evaluate this integral, which corresponds to the definition of the backward stochastic integral. Therefore, its numerical solution converges to the exact solution of the right-endpoint SDE rather than the Itô SDE [134]. Although EM is the simplest method for solving SDEs, it is only strongly convergent with order 0.5 when the drift and diffusion functions $\mu(X(t_j), t_j)$ and $\sigma(X(t_j), t_j)$ satisfy appropriate conditions (see [61] for details). This implies that to decrease the approximation error by a factor of 10, the time step must be made 100 times smaller, so the computation time becomes roughly 100 times greater.
2.6.2.2 Milstein method
The Milstein method is a numerical scheme for solving SDEs that is strongly convergent with order 1 when the drift and diffusion functions $\mu(X(t_j), t_j)$ and $\sigma(X(t_j), t_j)$ satisfy appropriate conditions. This means that to decrease the approximation error by a factor of 10, we only need to make the time step 10 times smaller. The Milstein method is obtained by truncating the stochastic Taylor series, which gives
$$X(t_{j+1}) = X(t_j) + \mu(X(t_j), t_j)\,\Delta t + \sigma(X(t_j), t_j)\,\Delta W(t_j) + \frac{1}{2}\,\sigma(X(t_j), t_j)\,\sigma'(X(t_j), t_j)\left((\Delta W(t_j))^2 - \Delta t\right), \qquad (2.101)$$
where $\sigma'$ denotes the derivative of $\sigma$ with respect to $X$. The pseudocode of the Milstein method is implemented in a similar way to that of the EM method, as shown below.
Algorithm 7: Milstein method
Set an arbitrary random seed value $R$ for repeatable experiments, randn('seed', R)
Given the functions $\mu(x, t)$ and $\sigma(x, t)$
Find the derivative of $\sigma(x, t)$ with respect to $x$, $\sigma'(x, t)$
Given the initial value $x_0$, the finite time span $[0, T]$ and the time step $dt$
Calculate the number of steps $n = T/dt$
Set a result vector $X$ with $n + 1$ components and let $X(1) = x_0$
Set $t_{\text{old}} = 0$ and $x_{\text{old}} = X(1) = x_0$
for $2 \le i \le n + 1$ do
    $dw = \sqrt{dt} \cdot \text{randn}$
    $x_{\text{new}} = x_{\text{old}} + \mu(x_{\text{old}}, t_{\text{old}}) \cdot dt + \sigma(x_{\text{old}}, t_{\text{old}}) \cdot dw + \frac{1}{2}\,\sigma(x_{\text{old}}, t_{\text{old}})\,\sigma'(x_{\text{old}}, t_{\text{old}})\left((dw)^2 - dt\right)$
    $t_{\text{old}} = t_{\text{old}} + dt$
    Store $X(i) \leftarrow x_{\text{new}}$
    $x_{\text{old}} = x_{\text{new}}$
end
Display $X$ as the trajectory of the solution over the time span $[0, T]$
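The following Python sketch implements Algorithm 7 and, as an illustrative check, compares a Milstein path with the exact solution of geometric Brownian motion $dX = \lambda X\,dt + s X\,dW$, namely $X(T) = X_0 \exp\bigl((\lambda - s^2/2)T + s W(T)\bigr)$, driven by the same Wiener increments. The function name, the parameter values $\lambda = s = 0.5$ and the choice to return the increments are assumptions made for this example.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Coef = Callable[[float, float], float]


def milstein(mu: Coef, sigma: Coef, dsigma: Coef, x0: float, T: float,
             dt: float, seed: Optional[int] = None) -> Tuple[List[float], List[float]]:
    """Milstein scheme, Eq. (2.101); dsigma is the derivative of sigma with respect to x.

    Returns the path and the Wiener increments used, so that the path can be
    compared with an exact solution driven by the same Brownian motion.
    """
    rng = random.Random(seed)
    n = round(T / dt)
    t, x = 0.0, x0
    path, increments = [x0], []
    for _ in range(n):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # The right-hand side is evaluated entirely at the old state x
        x = (x + mu(x, t) * dt + sigma(x, t) * dw
             + 0.5 * sigma(x, t) * dsigma(x, t) * (dw ** 2 - dt))
        t += dt
        path.append(x)
        increments.append(dw)
    return path, increments


lam, s = 0.5, 0.5
path, dws = milstein(lambda x, t: lam * x, lambda x, t: s * x, lambda x, t: s,
                     1.0, 1.0, 1e-3, seed=7)
exact = math.exp((lam - s ** 2 / 2) * 1.0 + s * sum(dws))  # W(T) = sum of increments
print(path[-1], exact)
```

For additive noise ($\sigma' = 0$) the correction term vanishes and the scheme coincides with the EM method.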
2.7 Review of mathematical modelling in hematopoiesis
As discussed in Section 2.2, the genetic module GATA1-GATA2-PU.1 plays a significant role in the lineage specification of HSCs. Several mathematical models have therefore been proposed to investigate the interactions within this module. This section presents a brief review of existing models for studying genetic regulation in hematopoiesis.
The first mathematical model of the regulatory mechanism of GATA1 and PU.1 was proposed in the form of the Shea–Ackers formalism [114]. The assumptions of the model are based on the experimental observations outlined below:
1. Both GATA1 and PU.1 positively regulate their own transcription and activate each other's transcription in the form of GATA1 and PU.1 homodimers.
2. The formation of GATA1/PU.1 heterodimers inhibits both GATA1 and PU.1 expression.
Moreover, some regulatory effects, such as post-transcriptional regulation, time lags and interactive regulation with other TFs, are ignored in the model for simplicity. Let $x$ and $y$ be the molecular concentrations of GATA1 and PU.1, respectively. The model is proposed in dimensionless form as
$$\frac{dx}{dt} = \frac{s x^2 + u k_u y^2}{1 + x^2 + k_u y^2 + k_r x y} - x, \qquad (2.102)$$
$$\frac{dy}{dt} = \frac{s y^2 + u k_u x^2}{1 + k_u x^2 + y^2 + k_r x y} - y. \qquad (2.103)$$
Detailed descriptions of this model can be found in [114]. As the first mathematical model of the GATA1-PU.1 module, it sheds new light on the mechanisms underlying HSC differentiation. When the strength of each regulation type varies (i.e., when the parameters vary), the system may have a different number of stable steady states. The model also indicates that if a gene is suddenly over-expressed at a stable steady state, the entire system can be shifted from one stable state to another.
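The multistability described above can be explored numerically with a short Python sketch of Eqs. (2.102)–(2.103), integrated with Euler's method. The parameter values $s = 3$, $u = 0.1$, $k_u = 1$, $k_r = 5$ are hypothetical choices for illustration and are not taken from [114]; with these values, a GATA1-high and a PU.1-high initial condition are expected to settle in different stable states.

```python
from typing import Dict, Tuple


def gata1_pu1_rhs(x: float, y: float, s: float, u: float,
                  ku: float, kr: float) -> Tuple[float, float]:
    """Right-hand side of Eqs. (2.102)-(2.103); x = GATA1, y = PU.1 (dimensionless)."""
    dx = (s * x**2 + u * ku * y**2) / (1 + x**2 + ku * y**2 + kr * x * y) - x
    dy = (s * y**2 + u * ku * x**2) / (1 + ku * x**2 + y**2 + kr * x * y) - y
    return dx, dy


def simulate(x0: float, y0: float, params: Dict[str, float],
             T: float = 50.0, dt: float = 0.01) -> Tuple[float, float]:
    """Integrate with Euler's method and return the state at time T."""
    x, y = x0, y0
    for _ in range(round(T / dt)):
        dx, dy = gata1_pu1_rhs(x, y, **params)
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Illustrative (hypothetical) parameter values, not taken from [114]
params = {"s": 3.0, "u": 0.1, "ku": 1.0, "kr": 5.0}
print(simulate(3.0, 0.1, params))  # starts with high GATA1
print(simulate(0.1, 3.0, params))  # starts with high PU.1
```

Scanning `params` over a grid and counting the distinct end states is one simple way to see how the number of stable states changes with the regulation strengths.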
Hill functions have also been used to study the cell fate determination of HSCs. Based on the double-negative feedback loop between GATA1 and PU.1 with positive autoregulation, Huang et al. proposed the following model:
$$\frac{dx}{dt} = a_1 \frac{x^n}{\theta_1^n + x^n} + b_1 \frac{\theta_2^n}{\theta_2^n + y^n} - k_1 x, \qquad (2.104)$$
$$\frac{dy}{dt} = a_2 \frac{y^n}{\theta_3^n + y^n} + b_2 \frac{\theta_4^n}{\theta_4^n + x^n} - k_2 y, \qquad (2.105)$$
where $x$ and $y$ represent the expression levels of GATA1 and PU.1, respectively; $a_{1,2}$, $b_{1,2}$ and $\theta_{1,2,3,4}$ are non-negative parameters and $n$ is the Hill coefficient. In addition, $k_1$ and $k_2$ are the degradation rates of GATA1 and PU.1, respectively. This model suggests that, for some parameter values, there are two stable states and one unstable state. The unstable state lies on the boundary between the basins of attraction of the two stable states and can be seen as a progenitor cell capable of choosing either the MEP lineage or the GMP lineage. This binary state choice also greatly inspired my later work in Chapter 5. While a binary choice model is easy to design, is it possible to construct a higher-order multistable model by embedding multiple bistable systems, so that a larger system can be described more accurately? In Chapter 5, I will introduce a novel framework for constructing multistable systems.
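As an illustration of the binary choice in Eqs. (2.104)–(2.105), the Python sketch below integrates the model with Euler's method from a GATA1-high and a PU.1-high initial condition. The parameter values ($a_1 = a_2 = b_1 = b_2 = 1$, $\theta_{1,2,3,4} = 0.5$, $n = 4$, $k_1 = k_2 = 1$) are assumptions chosen for this sketch and are not necessarily those used by Huang et al.

```python
from typing import Tuple


def huang_rhs(x: float, y: float, a: Tuple[float, float], b: Tuple[float, float],
              theta: Tuple[float, float, float, float], n: float,
              k: Tuple[float, float]) -> Tuple[float, float]:
    """Eqs. (2.104)-(2.105); x = GATA1, y = PU.1."""
    t1, t2, t3, t4 = theta
    dx = a[0] * x**n / (t1**n + x**n) + b[0] * t2**n / (t2**n + y**n) - k[0] * x
    dy = a[1] * y**n / (t3**n + y**n) + b[1] * t4**n / (t4**n + x**n) - k[1] * y
    return dx, dy


def run_huang(x0: float, y0: float, T: float = 50.0,
              dt: float = 0.01) -> Tuple[float, float]:
    """Euler integration with illustrative (hypothetical) parameter values."""
    params = dict(a=(1.0, 1.0), b=(1.0, 1.0), theta=(0.5, 0.5, 0.5, 0.5),
                  n=4.0, k=(1.0, 1.0))
    x, y = x0, y0
    for _ in range(round(T / dt)):
        dx, dy = huang_rhs(x, y, **params)
        x, y = x + dt * dx, y + dt * dy
    return x, y


print(run_huang(2.0, 0.1))  # GATA1-high initial condition
print(run_huang(0.1, 2.0))  # PU.1-high initial condition
```

Starting points on either side of the boundary between the basins of attraction end in opposite states, which is the behaviour interpreted above as a lineage decision.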
Figure 2.4: The network structure of GATA1-GATA2-PU.1. '→' and '⊣' denote the activating and inhibiting regulations, respectively.
Tian and Smith-Miles proposed a mathematical model for the GATA1-GATA2-PU.1 module based on the Shea–Ackers formalism [136]. This was the first stochastic model to study the mechanism of GATA-switching and the function of noise in the cell fate determination of HSCs. The model is based on the regulatory mechanism shown in Figure 2.4; the detailed assumptions can be found in [136]. Let $x$, $y$ and $z$ be the molecular concentrations of GATA1, GATA2 and PU.1, respectively. To describe the mechanism of GATA-switching, Tian introduced an additional rate constant $k_2^*$ over a time interval $[t_1, t_2]$ for the displacement rate of GATA2 proteins during GATA-switching. As GATA2 proteins are displaced, the concentration of GATA1 proteins around the binding sites increases in proportion to $k_2^*$. Hence, the term $\mu k_2^* y$ describes the increase of GATA1 during GATA-switching, where $\mu$ is a control parameter that adjusts the availability of GATA1 proteins around chromatin sites. The mathematical model is then proposed as follows:
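$$\frac{dx}{dt} = \frac{a_1 x + a_2 y}{a_3 + a_4 x + a_5 y + a_6 z + a_7 x y} - k_1 x + \mu k_2^* y, \qquad (2.106)$$
$$\frac{dy}{dt} = \frac{b_1 y}{b_2 + b_3 x + b_4 y + b_5 z + b_6 y z} - k_2 y - k_2^* y, \qquad (2.107)$$
$$\frac{dz}{dt} = \frac{c_1 z}{c_2 + c_3 x + c_4 y + c_5 z + c_6 x z + c_6 y z} - k_3 z. \qquad (2.108)$$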
The simulation results show that the model successfully realizes tristability if the model parameters satisfy certain necessary conditions. However, the deterministic model cannot describe the heterogeneity in the cell fate determination of HSCs. Therefore, Tian also introduced the following stochastic model based on the Poisson τ-leap method [135]:
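$$x(t+\tau) = x(t) + \mathcal{P}\!\left[\frac{a_1 x + a_2 y}{a_3 + a_4 x + a_5 y + a_6 z + a_7 x y}\,\tau\right] - \mathcal{P}[k_1 x \tau] + \mathcal{P}[\mu k_2^* y \tau], \qquad (2.109)$$
$$y(t+\tau) = y(t) + \mathcal{P}\!\left[\frac{b_1 y}{b_2 + b_3 x + b_4 y + b_5 z + b_6 y z}\,\tau\right] - \mathcal{P}[k_2 y \tau] - \mathcal{P}[k_2^* y \tau], \qquad (2.110)$$
$$z(t+\tau) = z(t) + \mathcal{P}\!\left[\frac{c_1 z}{c_2 + c_3 x + c_4 y + c_5 z + c_6 x z + c_6 y z}\,\tau\right] - \mathcal{P}[k_3 z \tau], \qquad (2.111)$$
where $\mathcal{P}[\lambda]$ denotes a Poisson random number with mean $\lambda$.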
In this stochastic model, the molecular concentrations of the three genes are treated as numbers of molecular copies during the differentiation process. The model revealed a range of cell proportions resulting from a variety of differentiation pathways. This was also the first time a discrete stochastic model had been used to simulate the cell fate determination of HSCs with a multimodal distribution, and it gives testable predictions about the mechanisms behind the realization of distinct differentiation lineages.
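To show how a Poisson τ-leap update such as Eqs. (2.109)–(2.111) is carried out, the Python sketch below applies the same kind of update to a hypothetical one-gene birth–death model (production at constant rate $c$, degradation at rate $kx$). It is not Tian's GATA1-GATA2-PU.1 model. The Python standard library has no Poisson sampler, so Knuth's multiplication method is used, which is adequate for moderate means; clipping negative copy numbers at zero is a simple safeguard chosen for this sketch.

```python
import math
import random
from typing import List


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate means (lam well below 700)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def tau_leap_birth_death(c: float, k: float, x0: int, tau: float,
                         n_steps: int, seed: int = 0) -> List[int]:
    """x(t + tau) = x(t) + P[c tau] - P[k x tau] for a hypothetical birth-death model."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        births = poisson(c * tau, rng)      # production events in [t, t + tau)
        deaths = poisson(k * x * tau, rng)  # degradation events in [t, t + tau)
        x = max(x + births - deaths, 0)     # guard against negative copy numbers
        path.append(x)
    return path


path = tau_leap_birth_death(c=50.0, k=1.0, x0=0, tau=0.1, n_steps=20000, seed=3)
tail = path[1000:]  # discard the initial transient
print(sum(tail) / len(tail))
```

The long-run average should fluctuate around the deterministic steady state $c/k = 50$, while individual trajectories show the copy-number noise that a deterministic model cannot capture.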
In summary, mathematical modelling is a powerful tool for describing the dynamics of hematopoiesis and for exploring the regulatory mechanisms that control transitions between different cell types. Beyond the models described above, many others have been used to study the regulatory mechanisms of hematopoiesis [33,92,96]. In addition, bifurcation theory is an efficient method for exploring the mechanisms of the GATA1-PU.1 module [13]. The mechanisms by which stem/progenitor cells leave stable steady states and commit to a specific lineage have also been revealed with the assistance of mathematical models [95]. At the single-cell level, mathematical modelling and inference methods have also helped to reconstruct the genetic regulatory network in hematopoiesis from experimental data [47,102]. Moreover, mathematical models have been used to study the dynamical properties of blood diseases such as periodic hematological disorders and leukemia [82,127].
3
Forward search algorithm for inferring genetic
regulatory networks
The objective of this chapter is to study a framework for inferring the detailed dynamical mechanisms of genetic regulatory networks. Inference of genetic networks is an important task for exploring and predicting the regulatory mechanisms inside the cell. Although a number of algorithms have been designed to reverse-engineer regulatory networks effectively, it remains challenging to incorporate nonlinearity into dynamic models. Recently, Wang and colleagues proposed probabilistic graphical models for microarray data analysis [145]. Building on this method, in this chapter we introduce a novel framework for inferring genetic networks with nonlinearity to address this challenge [153]. A new dynamic model based on ordinary differential equations with an exponential function is introduced to capture the nonlinearity. Using hematopoietic stem cell fate determination as a test problem, this work constructs two networks, for erythroid and granulocyte differentiation respectively, each involving 11 genes. Numerical results suggest that our new framework provides accurate realizations of the system states. This work provides new ideas for inferring regulatory networks effectively and exploring novel regulatory mechanisms.
In this chapter, we first introduce the experimental data and the selected candidate genes. We then reconstruct the genetic regulatory networks in the fate determination of HSCs using the Forward Search Algorithm as the top-down approach, and introduce the nonlinear dynamic model of genetic regulation as the bottom-up approach. To reduce the number of unknown parameters, we combine the top-down and bottom-up approaches. Finally, we present numerical results.
3.1 Experimental data
3.1.1 Database background
In this work, we used the sub-series GSE49987 of the published microarray dataset GSE49991 [88] as the experimental data. This dataset contains expression profiles collected from experiments using the FDCPmix cell line, generated with the probe name version of the Agilent Whole Mouse Genome Microarray 4×44K [88]. It provides microarray gene expression profiles of hematopoietic stem cells (HSCs) differentiating into erythrocytes and neutrophils. To convert all microarray probe IDs to gene names, we pre-processed this dataset using Ensembl BioMart and GO Enrichment Analysis [131]. A previous study curated the regulatory network of 18 core genes during hematopoiesis [93], and the same research team later studied the regulatory interactions of 26 core genes during hematopoiesis [94]. These two studies contain 30 distinct genes in total, so in our work we considered the 30 genes listed in Table 3.1. There are three replicate experiments for each developmental process, each of which contains the expression levels of the 30 genes, from HSCs to differentiated cells, at 30 time points spanning one week. The observation time points start at the HSC/progenitor stage (1 point), followed by every 2 hours over the first day (12 points), every 3 hours over the second day (8 points), every 4 hours over the third day (6 points), every 24 hours until the fifth day (2 points), and the seventh day (1 point). In this study, we used the average of the three replicates as the experimental data at each time point.
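The sampling schedule and replicate averaging can be written out explicitly. The sketch below reads "every 2 hours over the first day" as hours 2 to 24, and similarly for the later days; this reading is an assumption, chosen because it reproduces the stated total of 30 time points. The `replicates` array is a placeholder standing in for the GSE49987 measurements, and `sampling_times_hours` is a hypothetical helper name.

```python
import numpy as np


def sampling_times_hours():
    """Observation times in hours under one reading of the schedule above."""
    t = [0]                          # HSC/progenitor stage: 1 point
    t += list(range(2, 25, 2))       # day 1, every 2 h: 12 points
    t += list(range(27, 49, 3))      # day 2, every 3 h: 8 points
    t += list(range(52, 73, 4))      # day 3, every 4 h: 6 points
    t += [96, 120]                   # every 24 h until day 5: 2 points
    t += [168]                       # day 7: 1 point
    return np.array(t)


times = sampling_times_hours()
# Placeholder data with shape (replicates, genes, time points).
replicates = np.ones((3, 30, times.size))
averaged = replicates.mean(axis=0)   # (genes, time points), averaged over replicates
```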
3.1.2 Selection of candidate genes
Based on our research experience [147], it is challenging to study a dynamic network with 30 genes. Thus, we conducted an extensive literature review to select a smaller number of important genes based on their relationships with the three genes GATA1, GATA2 and PU.1. These candidate genes should either be essential for the cell-fate choice in hematopoiesis or interact significantly with these three genes. For example, Scl/Tal1 interacts with GATA1, Eto2/Cbfa2t3 and Ldb1 [45], and is a regulator of the differentiation of hematopoietic stem cells (HSCs) [109,112,122,164]. In addition, Eto2/Cbfa2t3 regulates the differentiation of HSCs by repressing the expression of its target gene Scl/Tal1 [45]. Moreover, Ldb1 is a significant TF for the differentiation of the erythroid lineage [125]. According to ChIP-Seq analysis, Ldb1 is necessary for HSC maintenance, since it binds to the majority of enhancer elements in hematopoiesis [74].
We also included a number of genes with potential regulatory relationships with the three genes GATA1, GATA2 and PU.1. For example, a previous study indicated that there might be regulatory interactions between GATA2 and Gfi1 that are not yet understood [93]. Gfi1 is an important TF in the regulation of HSC differentiation [66,141]. Gfi1 is required for the differentiation of common lymphoid progenitors (CLPs) and common myeloid progenitors (CMPs) from HSCs and is expressed in the majority of HSCs, CLPs and CMPs. Like Gfi1, Runx1 is also expressed in most HSCs and progenitor cells, and Gfi1 and/or Runx1 remain continually expressed in most cells that differentiate into the granulocyte lineage [100]. Lmo2 is a master regulator of hematopoiesis [54]; however, its specific regulatory role is still unclear. Experimental studies suggested that knockdown of Lmo2 does not affect the expression of GATA1 and Scl/Tal1 [54], whereas overexpression of Lmo2 inhibited erythroid differentiation [144]. In addition, Ets1 is a suppressor of erythrocyte differentiation: it is downregulated during erythrocyte differentiation, and it binds to and activates the GATA2 promoter [81]. The last candidate gene is Notch1, which inhibits the differentiation of the granulocyte lineage by maintaining the expression of GATA2. It also promotes the differentiation of HSCs into CLPs [65,128]. Therefore, in this study we considered regulatory networks with the following 11 genes: GATA1, GATA2, PU.1/Sfpi1, Runx1, Eto2/Cbfa2t3, Ets1, Notch1, Scl/Tal1, Ldb1, Gfi1 and Lmo2. The references for these 11 genes are listed in Table 3.2.

Reference paper | Number of genes | Gene names
Moignard et al., 2013, Figure 1 | 18 | GATA1, GATA2, PU.1, Gfi1, Gfi1b, Hhex, Ldb1, Lmo2, Lyl1, Meis1, Mitf, Nfe2, Runx1, Tal1, Etv6, Erg, Cbfa2t3
Moignard et al., 2015, Figure 3 | 26 | Meis1, Mitf, Etv2, Fli1, Tal1, GATA1, Hoxb4, Lyl1, Notch1, Sox7, PU.1, Ets1, Erg, Nfe2, Cbfa2t3, Lmo2, Myb, Hoxb2, Sox17, Gfi1, Gfi1b, Hhex, Tbx3, Tbx20, FoxH1, Ikaros
This chapter | 30 | GATA1, GATA2, PU.1, Gfi1, Gfi1b, Hhex, Ldb1, Lmo2, Lyl1, Meis1, Mitf, Nfe2, Runx1, Tal1, Etv6, Erg, Cbfa2t3, Etv2, Fli1, Hoxb4, Notch1, Sox7, Ets1, Hoxb2, Sox17, Tbx3, Tbx20, FoxH1, Ikaros, Myb
Table 3.1: Information on the 30 candidate genes for the differentiation of hematopoietic stem cells. The 30 genes in "This chapter" are the union of the genes in the two published studies.
Gene | References
GATA1 | Friedman, 2007; Liew et al., 2006; Ling et al., 2004
GATA2 | Friedman, 2007; Liew et al., 2006; Ling et al., 2004
PU.1 | Friedman, 2007; Liew et al., 2006; Ling et al., 2004
Runx1 | North et al., 2004
Cbfa2t3 | Goardon et al., 2006
Ets1 | Lulli et al., 2006
Notch1 | Kumano et al., 2001; Stier et al., 2002
Tal1 | Goardon et al., 2006; Shivdasani et al., 1995; Zhang et al., 2005; Porcher et al., 1996; Real et al., 2012
Ldb1 | Soler et al., 2010; Li et al., 2011
Gfi1 | North et al., 2004; van der Meer et al., 2010; Lancrin et al., 2012
Lmo2 | Inouea et al., 2013; Visvader et al., 1997
Table 3.2: Literature information for the 11 genes selected in this study. These genes were selected from Table 3.1 based on their relationships with the three genes GATA1, GATA2 and PU.1.
3.2 Methods
3.2.1 Top-down approach: static model
In this study, we use the Gaussian graphical model with a forward search algorithm (FSA) [146] as the static model for inferring the structure of regulatory networks. FSA predicts gene-gene interactions from time-series microarray data. However, since the number of possible heterodimers is much larger than the number of genes, we cannot derive the probabilistic graphical model by applying this algorithm directly. We will present an extended algorithm for constructing regulatory networks with a large number of putative heterodimers in the next chapter.
3.2.2 Bottom-up approach: dynamic model
In this work, we introduce a dynamic model that accounts for nonlinearity in genetic regulation as the bottom-up approach for studying the detailed regulatory mechanism. For a gene network with $n$ genes, the expression level of the $i$-th gene at time $t$ is denoted by $x_i(t)$. The dynamics of the network are described by the general model
$$\dot{\mathbf{x}} = \frac{d\mathbf{x}}{dt} = F(t,\mathbf{x}), \qquad (3.1)$$
where $\mathbf{x} = (x_1, \ldots, x_n)$ is the vector of expression levels of the $n$ genes, namely the eleven genes discussed above. A number of dynamic models have been proposed to describe regulatory relationships, such as the linear model
$$\frac{dx_i}{dt} = \sum_{j} a_{ij} x_j. \qquad (3.2)$$
Although a linear model is more efficient than a nonlinear model, a nonlinear model captures more information. In our study, we assume that nonlinear regulatory relationships exist in the proposed network. Thus, we propose the following dynamic model:
$$\frac{dx_i}{dt} = F_i(t,\mathbf{x}) = \frac{b_i}{c_i + d_i \exp\left\{\sum_{j=1, j\neq i}^{n} \alpha_{ij} x_j\right\}} - k_i x_i, \qquad (3.3)$$
where $x_i$ represents the expression level of a single gene and $k_i$ is the degradation rate of $x_i$. In addition, $b_i$, $c_i$ and $d_i$ are arbitrary constants of the model, estimated from data together with the other parameters. To avoid confusion between degradation and self-regulation, we assume that $\alpha_{ii} = 0$ for each gene and do not consider self-regulation.
In this model, the values of the coefficients represent the putative regulations. If $\alpha_{ij}$ is positive (negative, or zero), gene $x_j$ may activate (inhibit, or have no effect on) the expression of gene $x_i$. The network studied in this work contains 11 genes; thus, system (3.3) consists of 11 ordinary differential equations in total.
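Model (3.3) can be simulated with a standard ODE solver. The sketch below uses SciPy's `solve_ivp`; the helper name `make_rhs`, the sparse weight matrix and all parameter values are randomly generated placeholders, not estimates from the microarray data.

```python
import numpy as np
from scipy.integrate import solve_ivp


def make_rhs(alpha, b, c, d, k):
    """Right-hand side of model (3.3), vectorized over all genes."""
    alpha_off = alpha - np.diag(np.diag(alpha))  # enforce alpha_ii = 0 (no self-regulation)

    def rhs(t, x):
        s = alpha_off @ x                        # s_i = sum_{j != i} alpha_ij x_j
        return b / (c + d * np.exp(s)) - k * x   # synthesis term minus linear degradation

    return rhs


# Hypothetical 11-gene example with a sparse, randomly generated weight matrix.
rng = np.random.default_rng(0)
n_genes = 11
alpha = rng.uniform(-1.0, 1.0, (n_genes, n_genes)) * (rng.random((n_genes, n_genes)) < 0.4)
b = rng.uniform(0.0, 1.0, n_genes)
c = rng.uniform(0.5, 1.0, n_genes)
d = rng.uniform(0.0, 1.0, n_genes)
k = rng.uniform(0.2, 1.0, n_genes)
x0 = rng.uniform(0.0, 1.0, n_genes)

t_eval = np.linspace(0.0, 20.0, 50)
sol = solve_ivp(make_rhs(alpha, b, c, d, k), (0.0, 20.0), x0, t_eval=t_eval)
# sol.y has shape (n_genes, len(t_eval)): one simulated trajectory per gene.
```

In a fitting loop, `t_eval` would be set to the experimental observation times so that the simulated values can be compared point by point with the data.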
3.2.3 Parameter inference
In this study, we used a MATLAB toolbox [22] to infer the unknown ODE model parameters based on the inferred network structures. In the simple genetic algorithm, we use 500 generations and 100 individuals per generation for each estimate of the model parameters. There are five different types of parameters ($\alpha_{ij}, b_i, c_i, d_i, k_i$) in our dynamic model. We assume that the initial rate constants follow the uniform distribution on $[0, W_{\max}]$. Based on numerical tests, we set $W_{\max}$ for the parameters ($\alpha_{ij}, b_i, c_i, d_i, k_i$) to (1, 1, 1, 1, 1), respectively. For each parameter, we first select an initial value of $W_{\max}$ to infer the model parameters. If certain estimates are very close to $W_{\max}$, the value of $W_{\max}$ is increased; if the estimated values are substantially smaller than $W_{\max}$, it is decreased. The final estimates of the rate constants differ for different random seeds in the algorithm (i.e., for different initial estimates of the rate constants). For each dynamic model, we infer 200 sets of model parameters and select the ten sets with the smallest errors for further analysis.
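The bound-adjustment rule for $W_{\max}$ and the selection of the best runs can be sketched as below. The thresholds `near`, `far` and the scaling `factor` are hypothetical, since the text only says "very close to" and "substantially smaller than" $W_{\max}$; the genetic algorithm itself is left to the MATLAB toolbox [22], and the function names are illustrative.

```python
import numpy as np


def adjust_wmax(estimates, wmax, near=0.95, far=0.2, factor=2.0):
    """Return an updated upper bound for the uniform prior U[0, wmax].

    near, far and factor are hypothetical thresholds for "very close to"
    and "substantially smaller than" wmax.
    """
    largest = float(np.max(np.abs(estimates)))
    if largest >= near * wmax:      # estimates pile up at the bound: widen it
        return wmax * factor
    if largest <= far * wmax:       # estimates far below the bound: narrow it
        return wmax / factor
    return wmax


def select_top(param_sets, errors, k=10):
    """Keep the k parameter sets with the smallest errors, best first."""
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(errors)[:k]
    return [param_sets[i] for i in order], errors[order]


# e.g. 200 runs with different random seeds (placeholder errors), keep the ten best
runs = [f"run{i}" for i in range(200)]
errs = np.arange(200, 0, -1, dtype=float)
best_runs, best_errs = select_top(runs, errs)
```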
The error of an estimate was measured by the squared L2-norm between the simulated expression levels and the original microarray data. The total error is calculated by
$$E = \sum_{i=1}^{N} \sum_{j=1}^{M} \left(x_i(t_j) - x^{*}_{ij}\right)^2, \qquad (3.4)$$
where $N$ is the number of genes, and $x_i(t_j)$ and $x^{*}_{ij}$ are the simulated and experimentally measured expression levels of gene $i$ at time point $t_j$ ($j = 1, 2, \ldots, M$), respectively.
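Equation (3.4) is a sum of squared residuals over genes and time points. A minimal NumPy version, assuming (as an illustration) that simulated and measured values are stored as arrays of shape (genes, time points) and using the hypothetical name `total_error`, is:

```python
import numpy as np


def total_error(simulated, observed):
    """Error E in (3.4): sum over genes i and times j of (x_i(t_j) - x*_ij)^2."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if simulated.shape != observed.shape:
        raise ValueError("simulated and observed must have the same (genes, times) shape")
    return float(np.sum((simulated - observed) ** 2))


example_error = total_error([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 5.0]])
```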
3.2.4 Robustness analysis
As described in Chapter 2, we next used the robustness property of the model to select among the parameter sets inferred by the genetic algorithm. The perturbation sample $\varepsilon$ in (2.50) is assumed to be drawn from the standard Gaussian distribution $N(0,1)$. We tested various values of $\mu$; to ensure that the perturbation has a sufficient impact on the simulation, we use $\mu = 0.1$ in this robustness analysis.
For each of the top ten parameter sets determined in the previous subsection, we first obtained $N = 5000$ sets of perturbed model parameters using (2.50) and then used them to obtain 5000 corresponding simulations. Then, based on the robustness properties (2.52) and (2.53), we determine the optimal model parameter set among the estimated candidates.
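The perturbation experiment can be sketched as follows. Because (2.50), (2.52) and (2.53) are defined in Chapter 2 rather than here, the sketch assumes a multiplicative perturbation $\theta(1+\mu\varepsilon)$ with $\varepsilon \sim N(0,1)$ and summarizes robustness by the mean and standard deviation of the simulation error over the perturbed sets; both forms, and the name `robustness`, are assumptions to be checked against Chapter 2.

```python
import numpy as np


def robustness(theta, error_fn, mu=0.1, n_samples=5000, seed=None):
    """Mean and standard deviation of error_fn over perturbed parameter sets.

    ASSUMPTION: perturbed parameters are theta * (1 + mu * eps), eps ~ N(0, 1),
    drawn independently for every parameter and every sample.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    eps = rng.standard_normal((n_samples, theta.size))
    perturbed = theta * (1.0 + mu * eps)
    errors = np.array([error_fn(q) for q in perturbed])
    return errors.mean(), errors.std()


# Toy check: the "error" is the first parameter itself, so the mean should be
# close to 1 and the standard deviation close to mu = 0.1.
mean_err, std_err = robustness([1.0], lambda q: q[0], seed=0)
```

In practice `error_fn` would simulate the ODE model with the perturbed parameters and return the error (3.4).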
3.3 Results
3.3.1 Inference of regulatory network
To improve the precision of the dynamic model, we first use the published MATLAB package FSA [145] to predict the regulatory network in the fate determination of HSCs. The cutoff p-value controls the number of edges in this network. Since there are eleven genes in our regulatory network, we set the p-value so that there are no more than 25 edges. Because the graph is undirected, each gene then has on average about five regulatory connections ($2 \times 25/11 \approx 4.5$), which is a reasonable value for sparse genetic regulation. To obtain 25 edges among these eleven genes, we set the cutoff p-value to 0.1 for HSCs choosing the megakaryocyte-erythroid lineage (MEL) and to 0.05 for HSCs choosing the granulocyte-macrophage lineage (GML).
Using the FSA algorithm, we obtain an undirected graph among the single genes, which gives a sparse network structure. This network consists of 25 undirected edges between different genes. The predicted regulatory networks for the two lineages are presented in Figure 3.1 and Figure 3.2.
3.3.2 Inference of dynamic model
After successfully implementing the top-down approach, we obtain a predicted regulatory network of 11 genes. Next, we derive the detailed dynamics of this network, starting with the estimation of the unknown parameters in our dynamic model. When a fully connected network in (3.3) is considered, there are $N^2 + 4N = 165$ unknown parameters in our model. After the sparse approximation by FSA, the number of unknown parameters is reduced to $50 + 44 = 94$ (50 directed edges, 11 degradation rates and 33 constants $b_i, c_i, d_i$). Based on the partial correlation coefficients obtained by FSA and the normalized microarray data, we apply the genetic algorithm to estimate these 94 unknown parameters. The genetic algorithm is run with different random seeds, which leads to different initial samples and hence different parameter estimates.
We obtain the ten sets of estimated model parameters with the smallest simulation errors for the erythroid and neutrophil networks, respectively, and then analyze the robustness property of these ten sets. Numerical results suggest that the optimal estimate for erythroid differentiation is the set with estimation error 0.6003, whose robustness mean and standard deviation are 272.4482 and 99.7386, respectively. The optimal estimate for neutrophil differentiation is the set with estimation error 0.7328, robustness mean 266.9088 and robustness standard deviation 120.1733.
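The parameter counts quoted above can be checked directly, with $N = 11$ genes, the 25 undirected FSA edges counted as 50 directed regulations, and three constants $b_i, c_i, d_i$ plus one degradation rate $k_i$ per gene:

$$N^2 + 4N = 121 + 44 = 165, \qquad 2 \times 25 + (N + 3N) = 50 + (11 + 33) = 94.$$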
Figure 3.3 and Figure 3.4 show the simulation results with the optimal parameter set for the expression levels of four genes (GATA1, GATA2, PU.1 and Ets1) for the two cell fate choices. During erythroid differentiation, GATA1 activity climbs steadily in both the microarray data and our simulation; during neutrophil differentiation, however, it fluctuates in the microarray data, and our estimated GATA1 expression level remains almost unchanged. For GATA2, the measured expression level increases immediately and then remains steady in both differentiation processes, with a slight further increase during neutrophil differentiation; our simulations indicate that the expression level of GATA2 increases gradually for both cell fate choices. The expression level of PU.1 displays opposite trends in the two differentiation processes. Similarly, the expression level of Ets1 rises with fluctuations during erythroid differentiation and declines with fluctuations during neutrophil differentiation. In summary, the simulation results broadly fit the trends in the gene expression levels.
Figure 3.1: Genetic regulatory network of eleven genes related to the fate determination of HSCs, predicted by FSA, for HSCs choosing the megakaryocyte-erythroid lineage. The network is visualized with Cytoscape.
Figure 3.2: Genetic regulatory network of eleven genes related to the fate determination of HSCs, predicted by FSA, for HSCs choosing the granulocyte-macrophage lineage. The network is visualized with Cytoscape.
Figure 3.3: Simulation results of the regulatory network with eleven genes for erythroid differentiation. Red dashed line: microarray data; blue solid line: simulation of the regulatory network.
3.3.3 Reduction of network model - edge deletion
The networks predicted by the FSA algorithm are undirected networks among the eleven core genes. There are 25 undirected edges in each of the two predicted networks (Figure 3.1 and Figure 3.2). Next, we simplify the network structure by removing certain regulations. In this regulation deletion test, we evaluated the 25 mutual regulatory interactions in erythroid and neutrophil differentiation, respectively, to identify potentially insignificant regulations that could be removed from the predicted network. We first delete each of the 25 edges in turn, which gives 25 reduced systems; based on their simulations, we delete the edge whose removal generates the smallest estimation error among the 25 systems. We then repeat the test on the remaining 24 edges and delete one more edge by the same criterion. The test is repeated until there is a substantial increase in the estimation error and/or a decrease in the robustness property.
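The deletion procedure is a greedy backward elimination. The sketch below uses only the estimation error, with a hypothetical relative tolerance `tol`, as the stopping rule; the procedure described above also monitors robustness, which could be folded into `evaluate`. The function name and the toy error function are illustrative and do not reproduce Tables 3.3 to 3.5.

```python
def greedy_edge_deletion(edges, evaluate, tol=0.05):
    """Repeatedly delete the edge whose removal gives the smallest error.

    evaluate(edge_list) returns the estimation error of the model restricted
    to edge_list. Stops when the best deletion would raise the error by more
    than a relative tolerance tol (hypothetical stopping rule).
    """
    current = list(edges)
    current_err = evaluate(current)
    removed = []
    while current:
        trials = [(evaluate([f for f in current if f != e]), e) for e in current]
        best_err, best_edge = min(trials, key=lambda t: t[0])
        if best_err > current_err * (1 + tol):
            break
        current.remove(best_edge)
        removed.append(best_edge)
        current_err = best_err
    return current, removed


# Toy error: dropping edge "e1" hurts the fit badly; other deletions help slightly.
def toy_error(subset):
    err = 0.60 - 0.01 * (3 - len(subset))
    if "e1" not in subset:
        err += 0.5
    return err


kept, dropped = greedy_edge_deletion(["e1", "e2", "e3"], toy_error)
```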
Figure 3.4: Simulation results of the regulatory network with eleven genes for neutrophil differentiation. Red dashed line: microarray data; blue solid line: simulation of the regulatory network.
Table 3.3 suggests that the (GATA1, Gfi1) edge is significant in this regulatory network, since deleting it makes both the simulation error and the robustness property worse than those of the optimal estimation set. However, when we delete the (Notch1, Gfi1) edge, the estimation error is lower than that of the optimal estimation set, and the robustness property of the reduced system is better than that of the optimal model. For the (GATA2, Gfi1) edge, although the estimation error is lower than the optimal estimation error, the robustness property is slightly worse. Table 3.4 indicates that deleting both the (Notch1, Gfi1) and (GATA2, Gfi1) edges leads to poor robustness. This suggests that variations in the parameters related to these edges in the optimal model have a strong influence on the simulation error. Thus, for erythroid differentiation, either the (Notch1, Gfi1) or the (GATA2, Gfi1) edge can be removed on its own, but deleting both edges is not recommended for the erythroid regulatory network. Interestingly, both removable edges involve Gfi1. Similarly, Table 3.5 indicates that Gfi1 is a core regulator in neutrophil differentiation, because removing the edges connected with Gfi1 gives a worse robustness property than before. These numerical results suggest that edge deletion is not recommended for this network.
3.4 Summary
In this study, we proposed a new dynamic model for inferring the genetic regulatory network among eleven genes in the fate determination of HSCs. To improve the accuracy and efficiency of the dynamic model, we first applied FSA to infer the network topology. Using different cutoff p-values for FSA, we derived the regulatory networks of eleven genes for MEL (p-value = 0.09) and GML (p-value = 0.05). Combining the partial correlation coefficients, the sparse network structure and the microarray data, we estimated and tested our dynamic model with different random seeds. We then simulated the model with the optimal estimation error and robustness property. Finally, we tested whether regulatory edges could be removed from the predicted network, according to the simulation error and robustness property. Numerical results indicate that the proposed method and model provide accurate predictions of the regulatory network among genes. The proposed methods can be applied to other complex networks.
Table 3.3: Edge deletion test for erythroid differentiation. OES denotes the network without any deletion (RA: robustness property in the mean; RSTD: robustness property in the standard deviation).
Deleted edge | Estimation error | RA | RSTD
OES | 0.6003 | 272.4482 | 99.7386
GATA1 ↔ Gfi1 | 0.6221 | 256.9349 | 111.5465
Notch1 ↔ Gfi1 | 0.5820 | 270.4757 | 95.2828
GATA2 ↔ Gfi1 | 0.5603 | 279.2123 | 97.9913
Table 3.4: Edge deletion test for erythroid differentiation (deletion of one and of two edges). OES denotes the network without any deletion (RA: robustness property in the mean; RSTD: robustness property in the standard deviation).
Deleted edge(s) | Estimation error | RA | RSTD
OES | 0.6003 | 272.4482 | 99.7386
Notch1 ↔ Gfi1 | 0.5820 | 270.4757 | 95.2828
Notch1 ↔ Gfi1 and GATA2 ↔ Gfi1 | 0.5504 | 295.4578 | 101.2557
Table 3.5: Edge deletion test for neutrophil differentiation. OES denotes the network without any deletion (RA: robustness property in the mean; RSTD: robustness property in the standard deviation).
Deleted edge | Estimation error | RA | RSTD
OES | 0.7328 | 266.9088 | 120.1733
GATA2 ↔ Gfi1 | 0.7725 | 284.0381 | 100.9540
PU.1 ↔ Gfi1 | 0.8339 | 295.4406 | 105.4037
GATA1 ↔ Gfi1 | 0.8487 | 295.7750 | 105.0168
Tal1 ↔ Gfi1 | 0.8026 | 281.7749 | 110.3516
Runx1 ↔ Gfi1 | 0.8798 | 295.8294 | 107.7127
Ldb1 ↔ Gfi1 | 0.8391 | 298.1686 | 110.4269
Notch1 ↔ Gfi1 | 0.7205 | 282.2148 | 103.5167
4
Extended forward search algorithm for
inferring genetic regulatory networks
The objective of this chapter is to extend the method developed in Chapter 3 to study protein heterodimers and/or synergistic effects in genetic regulatory networks. As introduced in Chapter 2, hematopoiesis is a highly complex developmental process that produces various types of blood cells [16,97]. Although substantial progress has been made in understanding hematopoiesis [18,21,52,103,137], the detailed regulatory mechanisms of protein monomers, heterodimers and/or synergistic effects in the fate determination of HSCs remain unclear. In this chapter, we introduce a novel approach to infer these detailed regulatory mechanisms. This work aims to develop a framework that is able to realize nonlinear gene expression dynamics accurately. In particular, we investigate the effect of possible protein heterodimers and/or synergistic effects in genetic regulation. The approach combines the Extended Forward Search Algorithm, which infers the network structure (top-down approach), with a nonlinear dynamic model that infers dynamical properties (bottom-up approach). Based on published experimental data, we study two regulatory networks of 11 genes that regulate the erythrocyte and neutrophil differentiation pathways. The proposed algorithm is first applied to predict the network topologies among 11 genes and 55 nonlinear terms, which may represent heterodimers and/or synergistic effects. The unknown model parameters are then estimated by fitting simulations to the expression data of the two differentiation pathways. In addition, an edge deletion test is conducted to remove possibly insignificant regulations from the inferred networks. Furthermore, the robustness property of the dynamic model is employed as an additional criterion to choose better network reconstructions. Our simulation results successfully reproduce the experimental data for the two differentiation pathways, which suggests that the proposed approach is an effective method for inferring the topological structure and dynamical properties of genetic regulation. In this chapter, based on the candidate genes selected in Chapter 3, we develop a dynamic model and an inference algorithm, the Extended Forward Search Algorithm, to reconstruct the network structure, and then present numerical results.
4.1 Methods
4.1.1 Top-down approach: static model
To reduce the number of unknown parameters in the dynamic model, we used Gaussian graphical models as the static model to infer the topological structure of gene regulatory networks. In this work, a system is assumed to include genes $\{G_1, \ldots, G_m\}$, with expression level $x_{ij}$ for gene $G_i$ at time point $j$. Compared with existing methods that study networks of genes only, this work studies gene networks that include not only genes in the form of monomers $\{G_1, \ldots, G_m\}$, represented by the linear terms in the model, but also protein heterodimers and/or synergistic effects $\{G_k\text{-}G_l\}$ ($k, l = 1, \ldots, m$), represented by the non-linear terms (NLTs) in the model. There are two reasons for using the NLTs $\{G_k\text{-}G_l\}$. Firstly, the product of two variables can represent the synergistic effect of the two genes. Secondly, if the NLT represents a protein heterodimer, we assume that the binding and dissociation reactions of the heterodimer $\{G_k\text{-}G_l\}$ reach equilibrium quickly. The level of the heterodimer $\{G_k\text{-}G_l\}$ can then be written as $C_{kl} \times G_k \times G_l$, where $C_{kl}$ is the equilibrium constant, which can be treated as a coefficient in our dynamic model. In both cases, we only need to consider the product of the expression levels of the two genes, namely $y_{klj} = x_{kj}x_{lj}$, as the level of NLT $\{G_k\text{-}G_l\}$ at time $t_j$ in the algorithm. Since the number of possible regulations from NLTs to genes is much larger than that among genes (726 vs 110), the regulations from NLTs to genes would dominate the whole genetic regulatory system with high probability. However, the regulations among genes should be the core mechanisms, rather than the regulations from NLTs to genes. To avoid the dominance of NLT regulations, we assume that the number of regulations from NLTs to genes does not exceed that among genes.
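The NLT levels $y_{klj} = x_{kj}x_{lj}$ can be generated from the gene expression matrix in a few lines. The sketch below assumes one NLT per unordered gene pair $k<l$, which gives the 55 NLTs for 11 genes mentioned at the start of this chapter, and uses the hypothetical helper name `nonlinear_terms`.

```python
import numpy as np
from itertools import combinations


def nonlinear_terms(X, names):
    """Build NLT levels y_klj = x_kj * x_lj for every gene pair k < l.

    X has shape (m genes, T time points). Returns Y with shape
    (m * (m - 1) / 2, T) and labels such as "GATA1-PU.1".
    """
    pairs = list(combinations(range(X.shape[0]), 2))
    Y = np.array([X[k] * X[l] for k, l in pairs])
    labels = [f"{names[k]}-{names[l]}" for k, l in pairs]
    return Y, labels


X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y_demo, labels_demo = nonlinear_terms(X_demo, ["a", "b", "c"])
```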
Building on the GGM [145,147], we propose a new algorithm, the Extended Forward Search Algorithm (EFSA), to infer the topological structure of regulatory networks that include both genes and NLTs. Let $\mathbf{X} = (x_1, x_2, \ldots, x_N)$ be a vector that consists of $m$ genes and $n$ NLTs ($N = m + n$). Three matrices are constructed: an $m \times m$ covariance matrix $\mathbf{A}$ of the $m$ genes, an $m \times n$ covariance matrix $\mathbf{B}$ between the $m$ genes and the $n$ NLTs, and an $n \times n$ covariance matrix $\mathbf{C}$ of the $n$ NLTs. The $N$-dimensional matrix $\mathbf{M}$ is defined by
$$\mathbf{M} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}' & \mathbf{C} \end{bmatrix}, \qquad (4.1)$$
where $\mathbf{B}'$ is the transpose of $\mathbf{B}$. An initial empty graph $\mathbf{G}$ is built from the $N$-dimensional identity matrix. This initial graph consists of four matrices $\mathbf{G}_1$, $\mathbf{G}_2$, $\mathbf{G}_3$ and $\mathbf{G}_4$, which have the same dimensions as $\mathbf{A}$, $\mathbf{B}$, $\mathbf{B}'$ and $\mathbf{C}$, respectively, namely
$$\mathbf{G} = \begin{bmatrix} \mathbf{G}_1 & \mathbf{G}_2 \\ \mathbf{G}_3 & \mathbf{G}_4 \end{bmatrix}, \qquad (4.2)$$
where $\mathbf{G}_1$ and $\mathbf{G}_4$ are identity matrices of dimensions $m$ and $n$, respectively, and $\mathbf{G}_2$ and $\mathbf{G}_3$ are $m \times n$ and $n \times m$ zero matrices, respectively.
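A minimal NumPy sketch of how the block matrix $\mathbf{M}$ in (4.1) and the empty graph $\mathbf{G}$ in (4.2) might be assembled is given below. It assumes, as an illustration, that covariances are computed across time points with genes and NLTs stored as rows; `build_M_and_G` is a hypothetical helper name and the data are random placeholders.

```python
import numpy as np


def build_M_and_G(X_genes, Y_nlts):
    """Return M = [[A, B], [B', C]] from (4.1), its blocks, and the empty graph G from (4.2)."""
    m, n = X_genes.shape[0], Y_nlts.shape[0]
    M = np.cov(np.vstack([X_genes, Y_nlts]))   # rows are variables, columns are time points
    A, B, C = M[:m, :m], M[:m, m:], M[m:, m:]
    G = np.block([[np.eye(m), np.zeros((m, n))],
                  [np.zeros((n, m)), np.eye(n)]])  # G1 = I_m, G2 = 0, G3 = 0, G4 = I_n
    return M, (A, B, C), G


rng = np.random.default_rng(0)
Xg = rng.standard_normal((3, 10))   # placeholder: 3 genes, 10 time points
Yn = rng.standard_normal((2, 10))   # placeholder: 2 NLTs, 10 time points
M, (A, B, C), G = build_M_and_G(Xg, Yn)
```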
The proposed algorithm is given below [154].
Extended Forward Search Algorithm (EFSA)
1. Let $\mathbf{X} = (x_1, x_2, \ldots, x_N)$ be a vector with $N$ elements, where the $N$ components consist of $m$ genes and $n$ NLTs. An initial empty graph $\mathbf{G}$ is built from the $N$-dimensional identity matrix, as defined by (4.2).
2. Substitute all covariance values from the diagonal positions of sub-matrix $\mathbf{A}$ into the corresponding positions of sub-matrix $\mathbf{G}_1$, and then, based on the updated $\mathbf{G}_1$, use the Iterative Maximum Likelihood Estimates Algorithm (IMLEA) to compute the new covariance matrix [30].
3. Add an undirected edge $E^1_{ij}$ ($(i,j) \in [1,m]^2$) into $\mathbf{G}_1$, namely copy the symmetric covariance value between the $i$th and $j$th genes from positions $\mathbf{A}(i,j)$ and $\mathbf{A}(j,i)$ into positions $\mathbf{G}_1(i,j)$ and $\mathbf{G}_1(j,i)$, respectively. Then compute a new covariance matrix by the IMLEA. Based on the deviance difference between the new covariance matrix and the one before the addition, test the significance of the added edge $E^1_{ij}$ using the chi-square distribution with one degree of freedom. The p-value of this chi-square test is used in the next step as the edge selection criterion. Record the p-value of the tested edge and remove the edge from $\mathbf{G}_1$.
4. Add a new undirected edge into $\mathbf{G}_1$ and repeat the computation in Step 3. After all possible undirected edges have been tested, sort the tested edges in ascending order of their p-values. If the smallest p-value is lower than the predefined cutoff value, add the edge with the smallest p-value into the sub-graph $\mathbf{G}_1$ permanently.
5. Go back to Step 3 and add a second edge to the updated sub-graph $\mathbf{G}_1$. Repeat the computations in Steps 3 and 4 until the smallest p-value of an added edge is larger than the cutoff p-value.
6. Based on the last updated undirected graph $\mathbf{G}_1$, the graph orientation rules are applied to transform the undirected graph into a directed acyclic graph (DAG) [90]. The inferred DAG with $m_1$ directed edges, denoted $\mathbf{A}_s$, represents the predicted regulatory network among the $m$ genes.
7. Test the possible edges between the $m$ genes and the $n$ NLTs. Based on the latest matrix $\mathbf{G}$, add an undirected edge $E^2_{ij}$ between the $i$th gene and the $j$th NLT; that is, copy the symmetric covariance value between the $i$th gene and the $j$th NLT from positions $\mathbf{B}(i,j)$ and $\mathbf{B}'(j,i)$ into positions $\mathbf{G}_2(i,j)$ and $\mathbf{G}_3(j,i)$, respectively. Then compute a new covariance matrix by the IMLEA [30]. Based on the deviance difference between the new covariance matrix and the one before the addition, test the significance of the added edge $E^2_{ij}$ using the chi-square distribution with one degree of freedom. The p-value of the chi-square test is used as the edge selection criterion. Record the p-value of the tested edge $E^2_{ij}$ and remove it from $\mathbf{G}$.
8. Repeat the computation in Step 7 for the regulations between genes and NLTs. The last updated sub-graph $\mathbf{G}_3$ with $n_1$ edges, denoted $\mathbf{B}'_s$, is the predicted directed regulatory network from the $n$ NLTs to the $m$ genes. Since we only consider regulations among genes and those from NLTs to genes, the resulting matrix is
$$\mathbf{G}_s = \begin{bmatrix} \mathbf{A}_s \\ \mathbf{B}'_s \end{bmatrix}. \qquad (4.3)$$
The output network includes $m_1$ directed edges among the $m$ genes and $n_1$ directed edges from the $n$ NLTs to the $m$ genes.
Note that in our previous work we initially applied the GGM directly to the whole matrix $\mathbf{M}$ [147]. However, since the number of NLTs is much larger than that of genes, numerical results showed that the majority of selected edges connected NLTs, while few edges connecting genes were selected. This result is not appropriate, because the regulations between genes should be the primary mechanisms of the network. We then conducted another test in which we did not consider regulations between NLTs, by replacing matrix $\mathbf{C}$ with an $n \times n$ identity matrix $\mathbf{I}_n$. The matrix $\mathbf{M}$ then becomes
$$\mathbf{M}_1 = \begin{bmatrix} \mathbf{A}_s & \mathbf{B} \\ \mathbf{B}' & \mathbf{I}_n \end{bmatrix}. \qquad (4.4)$$
However, when we applied the GGM directly to $\mathbf{M}_1$, a singularity problem arose during the computation of the IMLEA. To achieve our aim and keep the algorithm stable, we proposed EFSA, which is executed in two stages: the first stage selects regulations between genes, and the second stage finds regulations from NLTs to genes. EFSA can be used to predict gene-gene interactions and the effects of NLTs on genes from time-course experimental data.
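The forward search in Steps 2 to 5, which EFSA re-uses with a restricted candidate set in Steps 7 and 8, can be sketched in Python as follows. This is a hedged illustration, not the FSA/EFSA code of [145,154]: the IMLEA of [30] is replaced by the well-known modified-regression algorithm for fitting a Gaussian graphical model with a fixed edge set (Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Algorithm 17.1); time points are treated as independent samples; the deviance drop is computed as $n(\log|\hat\Sigma_{\text{old}}| - \log|\hat\Sigma_{\text{new}}|)$; and the names `ggm_covariance` and `forward_search` are hypothetical. The orientation step (Step 6) is not implemented.

```python
import numpy as np
from scipy.stats import chi2


def ggm_covariance(S, adj, tol=1e-10, max_iter=1000):
    """MLE covariance of a Gaussian graphical model with a fixed edge set.

    Stand-in for the IMLEA (modified-regression algorithm, ESL Alg. 17.1).
    S: p x p sample covariance; adj: symmetric boolean adjacency, False diagonal.
    """
    p = S.shape[0]
    W = S.copy()
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.delete(np.arange(p), j)
            nbrs = rest[adj[j, rest]]            # current neighbours of variable j
            beta = np.zeros(p)
            if nbrs.size:
                beta[nbrs] = np.linalg.solve(W[np.ix_(nbrs, nbrs)], S[nbrs, j])
            w12 = W[np.ix_(rest, rest)] @ beta[rest]
            W[rest, j] = w12                     # update row and column j of the fit
            W[j, rest] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    return W


def forward_search(X, candidates, adj=None, cutoff=0.05, max_edges=None):
    """Greedily add the candidate edge with the largest deviance drop.

    X: (n_samples, p) data. Each tested edge gets a chi-square(1) p-value;
    the largest deviance drop has the smallest p-value. Stops at p >= cutoff.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)       # maximum-likelihood sample covariance
    adj = np.zeros((p, p), dtype=bool) if adj is None else adj.copy()
    logdet = np.linalg.slogdet(ggm_covariance(S, adj))[1]
    remaining = [tuple(e) for e in candidates if not adj[tuple(e)]]
    selected = []
    while remaining and (max_edges is None or len(selected) < max_edges):
        best = None
        for i, j in remaining:
            adj[i, j] = adj[j, i] = True         # tentatively add the edge
            new_logdet = np.linalg.slogdet(ggm_covariance(S, adj))[1]
            adj[i, j] = adj[j, i] = False        # and remove it again
            dev = n * (logdet - new_logdet)
            if best is None or dev > best[0]:
                best = (dev, (i, j), new_logdet)
        dev, (i, j), new_logdet = best
        pval = chi2.sf(dev, df=1)
        if pval >= cutoff:
            break
        adj[i, j] = adj[j, i] = True             # keep the best edge permanently
        remaining.remove((i, j))
        selected.append(((i, j), pval))
        logdet = new_logdet
    return adj, selected


# Toy data: variables 0 and 1 are strongly coupled; 2 and 3 are independent noise.
rng = np.random.default_rng(0)
z = rng.standard_normal(200)
X = np.column_stack([z, z + 0.3 * rng.standard_normal(200),
                     rng.standard_normal(200), rng.standard_normal(200)])
gene_pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
adj, selected = forward_search(X, gene_pairs, cutoff=0.01)
```

For a second, EFSA-like stage, one would call `forward_search` again with `candidates` restricted to (gene, NLT) index pairs, pass the stage-one adjacency as `adj`, and set `max_edges` to the number of gene-gene edges, mirroring the constraint that NLT regulations do not outnumber gene regulations. With 30 time points and 66 variables the sample covariance is singular, so that stage relies on the fitted graph staying sparse.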
4.1.2 Bottom-up approach: dynamic model
For a regulatory network with $m$ genes, the expression level of the $i$-th gene at time $t$ is denoted as $x_i(t)$. We used the following ordinary differential equation (ODE) model to describe the dynamics of the network [28],
$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(t,\mathbf{x}), \qquad (4.5)$$
where $\mathbf{x} = (x_1, \ldots, x_m)$ is a vector representing the expression levels of the $m$ genes. A number of dynamic formalisms have been proposed to describe the dynamical interactions between different genes in the network, such as models with linear functions [28],
$$F_i(t,\mathbf{x}) = \sum_{j=1,\, j\neq i}^{m} a_{ij}x_j - k_i x_i \qquad (4.6)$$
or models with non-linear functions [103],
$$F_i(t,\mathbf{x}) = \frac{\sum_{j=1}^{m} a_{ij}x_j}{1 + \sum_{j=1}^{m} b_{ij}x_j} - k_i x_i. \qquad (4.7)$$
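To illustrate where pairwise product terms can come from, the non-linear function (4.7) can be expanded to second order. The expansion point is an assumption made here for illustration: we take $\mathbf{x}=\mathbf{0}$ with $|s_b|<1$, and write $s_a = \sum_j a_{ij}x_j$ and $s_b = \sum_j b_{ij}x_j$.

$$
\begin{aligned}
\frac{s_a}{1+s_b} &= s_a\left(1 - s_b + s_b^2 - \cdots\right) \approx s_a - s_a s_b,\\
F_i(t,\mathbf{x}) &\approx \sum_{j} a_{ij}x_j - \sum_{j<k}\left(a_{ij}b_{ik} + a_{ik}b_{ij}\right)x_jx_k - \sum_{j} a_{ij}b_{ij}x_j^2 - k_ix_i.
\end{aligned}
$$

Under this reading, the linear terms and the cross products $x_jx_k$ ($j<k$) have the same shape as the terms in the truncated model below, while the squared terms $x_j^2$ correspond to what assumption 4 excludes; in the proposed model the coefficients are estimated from data rather than tied to $a_{ij}$ and $b_{ij}$.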
The advantage of model (4.5) with the linear functions (4.6) is that it has far fewer unknown parameters than the non-linear functions (4.7). However, the non-linear model describes non-linear dynamics more precisely. We therefore proposed a method that combines the additive terms of the linear model with the advantages of the non-linear model: we applied a second-order truncated Taylor series to approximate the non-linear function (4.7), where a truncated Taylor series approximates a function by a polynomial [126]. Thus, we proposed the ODE model (4.5) with the following functions [154]
$$F_i(t,\mathbf{x}) = \sum_{j=1,\, j\neq i}^{m} \alpha_{ij}x_j + \sum_{1\le j<k\le m} \beta_{ijk}\, x_j x_k - k_i x_i, \qquad (4.8)$$
where $k_i$ is the degradation rate of $x_i$. The proposed model (4.5) with the non-linear function (4.8) is based on the following assumptions:
1. The regulations from different genes to a particular gene are additive. Similarly, the regulations from non-linear terms (NLTs) to a particular gene are also additive.
2. The regulation from gene $j$ to gene $i$ is represented by $\alpha_{ij}x_j$, where $\alpha_{ij}$ is the coefficient of regulation strength.
3. The regulation from NLT $x_jx_k$ to gene $i$ is represented by $\beta_{ijk}x_jx_k$, where $\beta_{ijk}$ combines the regulation strength and the equilibrium constant $C_{ij}$, as discussed in the subsection Top-down Approach.
4. Auto-regulation is not considered, namely $\alpha_{ii} = 0$, to avoid confusion between the auto-regulation term $\alpha_{ii}x_i$ and the degradation term $k_ix_i$. Note that auto-regulation may be addressed using a model with the non-linear function (4.7). In addition, we only consider the effect of NLTs $x_jx_k$ for $j \neq k$, since the expression levels of $x_j$ may be highly correlated with those of $x_j^2$. Therefore, we assume that $\beta_{ijj} = 0$.
5. If the value of $\alpha_{ij}$ is positive (negative or zero), gene $x_j$ activates (represses or does not regulate) the expression of gene $x_i$. A similar assumption applies to the value of $\beta_{ijk}$.
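The right-hand side of (4.8) under these assumptions can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in the study (which relied on MATLAB); the function name `rhs_48`, the 0-based indexing and the dictionary storage of $\beta_{ijk}$ are hypothetical choices.

```python
from itertools import combinations


def rhs_48(x, alpha, beta, deg):
    """Evaluate F_i(t, x) of model (4.8) for every gene i (0-based indices).

    x     : list of m expression levels
    alpha : m x m nested list, alpha[i][j] = strength of gene j acting on gene i;
            the diagonal is ignored because auto-regulation is excluded (alpha_ii = 0)
    beta  : dict {(i, j, k): value} with j < k, strength of the NLT x_j * x_k acting
            on gene i; absent keys mean "no edge", so beta_ijj = 0 automatically
    deg   : list of m degradation rates k_i
    """
    m = len(x)
    dxdt = []
    for i in range(m):
        # Additive gene-to-gene regulation, skipping j == i.
        linear = sum(alpha[i][j] * x[j] for j in range(m) if j != i)
        # Additive NLT-to-gene regulation over unordered pairs j < k.
        pairwise = sum(beta.get((i, j, k), 0.0) * x[j] * x[k]
                       for j, k in combinations(range(m), 2))
        dxdt.append(linear + pairwise - deg[i] * x[i])
    return dxdt
```

After network inference, only the selected edges would carry non-zero entries, so `beta` would hold at most $n_1$ keys and `alpha` at most $m_1$ non-zero off-diagonal entries.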
We emphasize that the method proposed in this work is substantially different from our previous work [147]. The first difference is that the proposed non-linear model (4.8) differs from the non-linear model in [147]. This new model can not only study regulations from genes to genes, as in our previously proposed model [147], but also investigate the effects of heterodimers and/or synergistic effects in genetic regulation. This new model also leads to the second difference from our previous top-down approach: the proposed Extended Forward Search Algorithm (EFSA) not only includes the probabilistic graphical model of our previous work [147] but also predicts possible regulations from NLTs to genes. In addition, in this work we first infer a medium-sized network using EFSA and then reduce the network size by removing regulations (see the Results section), rather than first inferring a core network and then adding regulations to it as in our previous approach [147].
When considering the fully connected graph among $m$ genes and $n$ non-linear terms (NLTs), we have an ordinary differential equation (ODE) system with $m$ differential equations. The total number of unknown coefficients is $m(m+n)$. After applying the Extended Forward Search Algorithm (EFSA), we obtain an inferred regulatory network that contains only $m_1$ edges among genes and $n_1$ edges from NLTs to genes. Thus, the numbers of coefficients $\alpha_{ij}$ and $\beta_{ijk}$ are reduced from $m(m-1)$ to $m_1$ and from $mn$ to $n_1$, respectively. It is easier to estimate the parameters of the inferred network than those of the fully connected network.
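As an arithmetic check of these counts, the sketch below computes the number of unknown parameters before and after EFSA. It assumes $n = m(m-1)/2$ NLTs (one per gene pair $j<k$), which gives the 55 NLTs for 11 genes used in the Results section; the function name is hypothetical.

```python
def count_parameters(m, m1=None, n1=None):
    """Count unknown parameters of model (4.8).

    n = m(m - 1) / 2 NLTs are assumed (one per unordered gene pair j < k).
    Fully connected: m(m - 1) alphas + m * n betas + m degradation rates = m(m + n).
    After EFSA (if m1 and n1 are given): m1 alphas + n1 betas + m degradation rates.
    """
    n = m * (m - 1) // 2
    full = m * (m - 1) + m * n + m
    reduced = None if m1 is None or n1 is None else m1 + n1 + m
    return full, reduced
```

For $m = 11$ this gives 726 parameters for the fully connected model, and 103 and 91 parameters for the edge counts reported later for the erythroid (46 + 46 edges) and neutrophil (40 + 40 edges) networks.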
In this work, we used the Genetic Algorithm in a MATLAB toolbox to estimate the parameters of the proposed dynamic model [22]. To ensure the accuracy of estimates, we set the number of generations to 1000 and the number of individuals in each generation to 300. For the parameter vector $(\alpha_{ij}, \beta_{ijk}, k_i)$, we used the uniform distribution over the interval $[W_{\min}, W_{\max}]$ to generate the initial estimates. Here $W_{\min}$ and $W_{\max}$ are the minimal and maximal values, respectively, for sampling the parameters. The values of $W_{\min}$ and $W_{\max}$ are adjusted by computation. For example, if the majority of estimated parameters are close to $W_{\min}$, we further decrease the value of $W_{\min}$. However, if the majority of estimated values are well above $W_{\min}$, we increase the value of $W_{\min}$ accordingly. The same consideration applies to $W_{\max}$. In this study, for the erythroid lineage pathway, numerical results suggest that the values of $W_{\min}$ and $W_{\max}$ for $(\alpha_{ij}, \beta_{ijk}, k_i)$ are $(-3,-3,0)$ and $(3,3,1)$, respectively. For the neutrophil lineage pathway, the corresponding values are $(-2.5,-2.5,0)$ and $(2.5,2.5,1)$, respectively. Each run of the algorithm starts from a random seed that generates an initial set of model parameters and yields one set of estimated parameters. For each model, we used 200 different random seeds, which led to 200 different sets of estimated model parameters. Denote by $x_i(t_j)$ and $x_i^*(t_j)$ the observation data and the numerical simulation at time point $t_j$ for $j = 1, 2, \ldots, M$, respectively. The simulation error is calculated by
$$E = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{M}\left(x_i(t_j) - x_i^*(t_j)\right)^2}. \qquad (4.9)$$
We selected the ten sets with the smallest estimation errors out of the 200 estimates for further analysis and comparison.
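The error (4.9) and the selection of the best ten of the 200 Genetic Algorithm runs can be sketched as follows. The `simulate` callable, which maps a parameter set to simulated trajectories, is a hypothetical placeholder for solving model (4.5); this illustrates the ranking step and is not the MATLAB code used in the study.

```python
import math


def simulation_error(observed, simulated):
    """Simulation error E of (4.9): Euclidean norm of residuals over genes and times.

    observed, simulated: m x M nested lists, entry [i][j] = x_i(t_j) or x*_i(t_j).
    """
    return math.sqrt(sum((o - s) ** 2
                         for obs_row, sim_row in zip(observed, simulated)
                         for o, s in zip(obs_row, sim_row)))


def top_candidates(candidates, observed, simulate, n_best=10):
    """Rank candidate parameter sets by E and keep the n_best with the smallest error.

    simulate: hypothetical callable mapping a parameter set to an m x M trajectory.
    Returns a list of (error, parameter_set) pairs in increasing order of error.
    """
    scored = [(simulation_error(observed, simulate(p)), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0])  # stable sort keeps input order on ties
    return scored[:n_best]
```

In the study, `candidates` would correspond to the 200 estimated parameter sets and `n_best` to 10.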
4.1.3 Robustness analysis
As described in Chapter 2, we next used the robustness property of the model to select among the parameter sets inferred by the Genetic Algorithm. The perturbation sample $\varepsilon$ in (2.50) is assumed to be drawn from a standard Gaussian distribution $N(0,1)$. The value of parameter $\mu$ determines the variation of simulations. Numerical results suggest that when $\mu$ is small, perturbation has little effect on the system dynamics, and it is difficult to distinguish the robustness properties of models with different parameter sets. However, if $\mu$ is large, perturbation makes the model output substantially different, and it becomes difficult to measure the robustness property. To obtain an appropriate level of variation for robustness analysis, we used $\mu = 0.4$ in this study.
For each of the top ten parameter sets determined in the previous subsection, we first obtained $N = 5000$ sets of perturbed model parameters using (2.50) and then used these parameter sets to obtain 5000 corresponding simulations. Then, based on the robustness measures (2.52) and (2.53), we determined the optimal model parameter sets among the estimated candidates.
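A Monte Carlo sketch of this perturbation step is given below. The perturbation rule (2.50) and the measures (2.52) and (2.53) are defined in Chapter 2 and are not reproduced here, so the multiplicative form $\tilde{\theta} = \theta(1+\mu\varepsilon)$ and the Euclidean trajectory deviation used below are assumptions for illustration only; `simulate` is a hypothetical callable returning a flat list of simulated values.

```python
import random
import statistics


def perturb(theta, mu, rng):
    # ASSUMPTION: multiplicative perturbation theta_k * (1 + mu * eps_k), eps_k ~ N(0, 1);
    # substitute the actual perturbation rule (2.50) from Chapter 2.
    return [t * (1.0 + mu * rng.gauss(0.0, 1.0)) for t in theta]


def robustness_summary(simulate, theta, mu=0.4, n_samples=5000, seed=0):
    """Mean and standard deviation of a hypothetical deviation measure between the
    nominal simulation and simulations with perturbed parameters (n_samples >= 2)."""
    rng = random.Random(seed)
    nominal = simulate(theta)
    deviations = []
    for _ in range(n_samples):
        perturbed = simulate(perturb(theta, mu, rng))
        # ASSUMPTION: Euclidean distance stands in for the measures (2.52) and (2.53).
        dev = sum((a - b) ** 2 for a, b in zip(nominal, perturbed)) ** 0.5
        deviations.append(dev)
    return statistics.mean(deviations), statistics.stdev(deviations)
```

Smaller summary values indicate that the simulated behaviour changes less under parameter perturbation, which is the sense in which the study prefers more robust parameter sets.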
4.2 Results
4.2.1 Inference of regulatory network
To reduce the complexity of regulatory networks, we first used the Extended Forward Search Algorithm (EFSA) to predict the topological structure of genetic networks. The algorithm controls the number of edges through a predefined cut-off value, which plays the role of the significance level in statistics. If the threshold is too low, we may miss some significant regulations. However, if the threshold is too high, insignificant regulations are likely to be selected. This work considers networks including 11 genes and 55 non-linear terms (NLTs), i.e. all $11 \times 10/2 = 55$ pairwise products $x_jx_k$ with $j<k$. For the sub-network of the 11 genes only, i.e. matrix $\mathbf{A}_s$ in (4.3), we set the threshold to 0.1 for both the erythroid and the neutrophil regulatory networks to ensure statistical significance. This threshold value (0.1) balances selecting too many insignificant regulations against choosing too few candidate regulations. We then obtained 46 and 40 directed edges for the erythroid and neutrophil regulatory networks, respectively.
For the regulations from NLTs to genes, i.e. matrix $\mathbf{B}'_s$ in (4.3), the size of $\mathbf{B}'_s$ is much larger than that of $\mathbf{A}_s$. To avoid dominance of the regulations from NLTs to genes, we also set the cut-off value to 0.1 for the two networks, but retained only the first 46 and 40 directed edges from NLTs to genes for erythrocyte and neutrophil differentiation, respectively, if more edges were selected at this cut-off. We still applied the threshold 0.1 here because the number of edges satisfying it is much larger than the required number (i.e. 46 for the erythroid and 40 for the neutrophil regulatory network). Since the edges are selected and ranked by their significance, we can simply select the top 46 and 40 edges for the erythroid and neutrophil pathways, respectively, without further numerical tests.
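The cut-off and top-K rule described above can be sketched as follows. The p-value of a chi-square statistic with one degree of freedom uses the identity $P(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$, which holds because $\chi^2_1$ is the square of a standard normal variable. How EFSA computes each test statistic is not reproduced here, so `edge_stats` is a hypothetical input, and the strict inequality $p < 0.1$ is an assumption.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a non-negative chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def select_edges(edge_stats, cutoff=0.1, max_edges=None):
    """Keep edges with p-value below `cutoff`, ranked from most to least significant.

    edge_stats: dict mapping an edge label to its chi-square test statistic
    max_edges : optional cap on the number of edges returned, e.g. 46 (erythroid)
                or 40 (neutrophil) for NLT-to-gene edges
    """
    pvals = {edge: chi2_1df_pvalue(s) for edge, s in edge_stats.items()}
    kept = sorted((e for e, p in pvals.items() if p < cutoff), key=lambda e: pvals[e])
    return kept if max_edges is None else kept[:max_edges]
```

With `max_edges` set, the function returns the most significant edges among those passing the cut-off, mirroring the rule of taking the top 46 or 40 NLT-to-gene edges.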
Figure 4.1 and Figure 4.2 present the inferred regulatory networks for erythroid and neutrophil differentiation, respectively. Note that there are 11 and 17 isolated NLTs in the erythroid and neutrophil networks, respectively, since our algorithm selected no significant edges from these NLTs. All these isolated NLTs are listed in the "Isolated NLTs Table". Moreover, all arrows in these figures represent only the direction of regulations, rather than their types (i.e. positive or negative regulation); we study the detailed regulatory mechanisms in the next subsection. We found that, in all cases, the target gene of a protein heterodimer is a component of that heterodimer. A possible explanation is that the expression level of a heterodimer is the product of the expression levels of the two corresponding genes (namely $x_ix_j$ for genes $i$ and $j$ with expression levels $x_i$ and $x_j$, respectively). Thus, the expression data of the NLTs $\{x_ix_j\}$ may be highly correlated with those of the component genes $\{x_i\}$ and $\{x_j\}$.

[Figure 4.1 graphic: directed network of the 11 genes and the connected NLTs. Color Reference Table: regulation among genes; regulation from NLTs to Gfi1, Ets1, Gata1, Tal1, Cbfa2t3, Runx1, Notch1, PU.1 and Lmo2. Isolated NLTs Table: Gata1−Runx1, Gata1−Notch1, Gata2−Ldb1, Runx1−Lmo2, Runx1−Ldb1, Tal1−Ldb1, Cbfa2t3−Lmo2, Cbfa2t3−Ldb1, Tal1−Cbfa2t3, Lmo2−Ldb1, Gfi1−Ets1.]
Figure 4.1: Inferred regulatory network for erythrocyte differentiation by EFSA. The genetic regulatory network predicted by EFSA with 11 genes and 44 NLTs (11 isolated terms excluded), related to fate determination in the erythrocyte pathway: the regulatory network for the differentiation of hematopoietic stem cells into megakaryocyte-erythroid progenitors. The network is visualized by Cytoscape software.
[Figure 4.2 graphic: directed network of the 11 genes and the connected NLTs. Color Reference Table: regulation among genes; regulation from NLTs to Gfi1, Ets1, Gata1, PU.1, Lmo2, Notch1 and Ldb1. Isolated NLTs Table: Gata1−Gata2, Gata1−Lmo2, Gata1−Ldb1, Gata2−Runx1, Gata2−Tal1, Gata2−Cbfa2t3, Runx1−Cbfa2t3, Runx1−Lmo2, Runx1−Ldb1, PU.1−Lmo2, Tal1−Cbfa2t3, Gata2−Lmo2, Tal1−Ldb1, Lmo2−Ldb1, Gata2−Ldb1, Runx1−Tal1, Gfi1−Ets1.]
Figure 4.2: Inferred regulatory network for neutrophil differentiation by EFSA. The genetic regulatory network predicted by EFSA with 11 genes and 38 NLTs (17 isolated terms excluded), related to fate determination in the neutrophil pathway: the regulatory network for the differentiation of hematopoietic stem cells into granulocyte-macrophage progenitors. The network is visualized by Cytoscape software.
4.2.2 Inference of dynamic model
After constructing the regulatory networks in the previous subsection, we next study the detailed dynamics of the genetic networks in the fate determination of hematopoietic stem cells (HSCs) using our proposed dynamic model. The major step is to infer the values of the unknown parameters in model (4.8). A fully connected model would have $11 \times (11 + 55) = 726$ parameters. After applying EFSA, however, the number of unknown parameters is reduced to 103 (46 directed edges between genes, 46 directed edges from non-linear terms (NLTs) to genes and 11 self-degradation rate constants) for erythrocyte differentiation and to 91 (40 directed edges between genes, 40 directed edges from NLTs to genes and 11 self-degradation rate constants) for neutrophil differentiation. We next applied the Genetic Algorithm to estimate these unknown parameters for the two networks. We used 200 different random seeds to obtain different initial values of the rate constants $(\alpha_{ij}, \beta_{ijk}, k_i)$ over the range $[W_{\min}, W_{\max}]$ defined in the Methods section, which led to 200 different sets of estimated parameters. Then, for each differentiated lineage, we chose the ten sets with the smallest estimation errors for further robustness analysis. According to the definition of estimation error (4.9), the optimal inferred network for erythrocyte differentiation in our tests has an estimation error of 0.9902. In addition, based on the robustness measures defined in Chapter 2, its robust average (2.52) and robust standard deviation (2.53) are 0.3977 and 0.1066, respectively. For neutrophil differentiation, the optimal inferred network has an estimation error of 0.8726, a robust average of 0.3983 and a robust standard deviation of 0.1275.

[Figure 4.3 graphic, panels A to D: expression levels of Gata1, PU.1, Ets1 and Tal1 against Time (0 to 160).]
Figure 4.3: Simulation results and experimental data of the regulatory network for erythrocyte differentiation. Red solid line: experimental microarray data; blue dashed line with star markers: simulation of the regulatory network.
Figure 4.3 and Figure 4.4 present the simulation results based on the optimal estimated parameters for the expression levels of four genes, namely GATA1, PU.1, Ets1 and Tal1, for erythrocyte and neutrophil differentiation, respectively. The expression level of GATA1 increases continuously in both simulated and experimental data during erythrocyte differentiation. During neutrophil differentiation, however, the experimental data of GATA1 fluctuate and then decrease slightly at the end of differentiation, which is matched by our simulation. For PU.1, both microarray and simulated data decline during erythrocyte differentiation but climb during neutrophil differentiation. Similarly, the expression level of Ets1 in the microarray data increases during erythrocyte differentiation but decreases during neutrophil differentiation, and the simulation results also fit these trends for both pathways. The experimental data of Tal1 increase with fluctuations during the first 60 hours of erythrocyte differentiation and then rise rapidly. Our simulated expression levels of Tal1 follow the same trend. Thus, our simulation results fit the trends in the expression levels of these genes closely during both developmental processes.

[Figure 4.4 graphic, panels A to D: expression levels of Gata1, PU.1, Ets1 and Tal1 against Time (0 to 160).]
Figure 4.4: Simulation results and experimental data of the regulatory network for neutrophil differentiation. Red solid line: experimental microarray data; blue dashed line with star markers: simulation of the regulatory network.
4.2.3 Reduction of network model - edge deletion
We have obtained two regulatory networks with 92 and 80 directed edges for erythroid and neutrophil differentiation, respectively. Next, we tested whether potentially insignificant edges could be deleted from our predicted regulatory networks. In the first step, we tested the deletion of regulations from non-linear terms (NLTs) to genes. In each test, we removed one edge to form a temporary system model and then examined the simulation error and robustness property of the new model. Afterwards, we permanently removed the edge whose removal caused the smallest change in simulation error and robustness property, forming an updated model. This test was repeated until both the simulation error and the robustness property of the updated model were much worse than those of the original network without any removal. In the second step, we evaluated the regulatory interactions between the 11 genes using the same method as in the first step.
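The deletion procedure can be sketched as a greedy backward-elimination loop. In the study, acceptance was judged jointly from the simulation error (4.9) and the robustness measures (2.52) and (2.53); here a single hypothetical `evaluate` score (lower is better) and a `tolerance` threshold stand in for that judgement, so this is a simplified sketch rather than the exact criterion.

```python
def greedy_edge_deletion(candidates, evaluate, tolerance):
    """Greedy backward elimination of edges.

    candidates: edges that may be removed (e.g. NLT-to-gene edges in the first step,
                gene-to-gene edges in the second step)
    evaluate  : hypothetical callable; evaluate(removed) scores the model obtained by
                deleting the edges in the frozenset `removed` (lower is better)
    tolerance : largest score increase over the original model that is still accepted
    Returns the edges in the order in which they were removed.
    """
    baseline = evaluate(frozenset())          # original model, no deletion
    removed, remaining = [], list(candidates)
    while remaining:
        # Tentatively delete each remaining edge and keep the least harmful deletion.
        trials = [(evaluate(frozenset(removed + [e])), e) for e in remaining]
        best_score, best_edge = min(trials, key=lambda t: t[0])
        if best_score - baseline > tolerance:
            break                             # any further deletion is much worse
        removed.append(best_edge)
        remaining.remove(best_edge)
    return removed
```

Running the loop once on the NLT-to-gene edges and then on the gene-to-gene edges (with the first result held fixed inside `evaluate`) follows the two-step order described above.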
For erythrocyte differentiation, Table 4.1 shows that removing three regulations from NLTs to genes reduces the estimation error (4.9) (model DEL1). We then tested the removal of gene-to-gene regulations. The final result suggests that after deleting the edges (Ldb1 → Lmo2), (Notch1 → Lmo2), (Cbfa2t3 → Lmo2) and (Runx1 → Lmo2), the estimation error (4.9) increases slightly. However, the robustness property is better than that of the DEL1 model, since the robust average (2.52) decreases. Thus, for erythroid differentiation, numerical tests recommended removing a total of seven edges from our predicted regulatory network. We stopped the deletion test after obtaining the DEL5 model, because any further deletion made both the simulation error and the robustness property of the temporary network much worse than those of the original network without removal.
Table 4.2 shows that, for neutrophil differentiation, there are no insignificant regulations from NLTs to genes, because removing any such edge increases the simulation error (4.9) substantially and/or worsens the robustness property by increasing the robust average (2.52) and robust standard deviation (2.53). For the regulations between genes, we removed four regulations, namely (GATA2 → Ldb1), (Runx1 → Cbfa2t3), (Ldb1 → Lmo2) and (Tal1 → Lmo2), and formed an updated system. The table shows that the simulation error and robustness property of the updated system are close to those of the original system without any removal. Thus, for neutrophil differentiation, numerical tests recommended removing only four edges from our predicted regulatory network. Coincidentally, we also stopped the deletion test after obtaining the DEL5 model, for the same reason as for erythrocyte differentiation.
Figure 4.5 and Figure 4.6 present the inferred regulatory networks after the edge deletion test for erythroid and neutrophil differentiation, respectively. Initially, we had 92 directed edges for the erythrocyte pathway and 80 directed edges for the neutrophil pathway. After edge deletion, seven and four directed edges have been removed from the erythrocyte and neutrophil networks, respectively, since removing these edges has little negative influence on the simulation error (4.9), robust average (2.52) and robust standard deviation (2.53). Thus, 85 and 76 directed edges remain for the erythrocyte and neutrophil pathways, respectively.
4.3 Summary
This work was designed to develop a dynamic framework that accurately captures nonlinear gene expression dynamics. In particular, we intended to investigate the effects of possible protein heterodimers and/or synergistic effects in genetic regulation. In this study, we designed the Extended Forward Search Algorithm (EFSA) to predict the topology of regulatory networks connecting genes and heterodimers. We also proposed a new dynamic model for inferring the dynamic mechanisms of regulatory networks. Using EFSA, we derived two regulatory networks of 11 genes for the erythrocyte and neutrophil differentiation pathways. Based on the predicted networks and experimental data, we estimated the parameters of our proposed dynamic model using simulation error and robustness property as criteria. By removing less important regulations according to the same criteria, we developed two gene networks that regulate the erythrocyte and neutrophil differentiation pathways. Numerical results suggested that our proposed method is capable of reconstructing genetic regulatory networks effectively and accurately.
Table 4.1: Edge deletion test for erythrocyte differentiation. RR: removed regulation; SE: simulation error, defined by (4.9); RA: robust average, defined by (2.52); RSTD: robust standard deviation, defined by (2.53).

| Model | RR | SE | RA | RSTD |
|---|---|---|---|---|
| OES | N/A | 0.9902 | 0.3977 | 0.1066 |
| DEL1 | GATA2-Notch1 → Notch1; Tal1-Gfi1 → Gfi1; Cbfa2t3-Gfi1 → Gfi1 | 0.9826 | 0.4594 | 0.1259 |
| DEL2 | Ldb1 → Lmo2 | 0.9955 | 0.3938 | 0.1124 |
| DEL3 | Notch1 → Lmo2 | 0.9861 | 0.4506 | 0.1263 |
| DEL4 | Cbfa2t3 → Lmo2 | 1.0451 | 0.3820 | 0.0962 |
| DEL5 | Runx1 → Lmo2 | 1.0298 | 0.3471 | 0.0904 |

Description of the models: OES: the original model without any deletion; DEL1: model based on OES with regulations from NLTs to genes removed; DEL2, DEL3, DEL4 and DEL5: models based on DEL1, DEL2, DEL3 and DEL4, respectively, each with one further regulation among genes removed.
Table 4.2: Edge deletion test for neutrophil differentiation. RR: removed regulation; SE: simulation error, defined by (4.9); RA: robust average, defined by (2.52); RSTD: robust standard deviation, defined by (2.53).

| Model | RR | SE | RA | RSTD |
|---|---|---|---|---|
| OES | N/A | 0.8726 | 0.3983 | 0.1275 |
| DEL1 | No suggestion | N/A | N/A | N/A |
| DEL2 | GATA2 → Ldb1 | 0.8726 | 0.3943 | 0.1273 |
| DEL3 | Runx1 → Cbfa2t3 | 0.8726 | 0.3928 | 0.1265 |
| DEL4 | Ldb1 → Lmo2 | 0.8748 | 0.4183 | 0.1333 |
| DEL5 | Tal1 → Lmo2 | 0.8809 | 0.3925 | 0.1237 |

Description of the models: OES: the original model without any deletion; DEL1: model based on OES with regulations from NLTs to genes removed; DEL2, DEL3, DEL4 and DEL5: models based on DEL1, DEL2, DEL3 and DEL4, respectively, each with one further regulation among genes removed.
[Figure 4.5 graphic: directed network of the 11 genes and the connected NLTs after edge deletion. Color Reference Table: regulation among genes; regulation from NLTs to Gfi1, Ets1, Gata1, Tal1, Cbfa2t3, Runx1, Notch1, PU.1 and Lmo2. Isolated NLTs Table: Gata1−Runx1, Gata1−Notch1, Gata2−Notch1, Gata2−Ldb1, Runx1−Lmo2, Runx1−Ldb1, Tal1−Ldb1, Tal1−Gfi1, Cbfa2t3−Lmo2, Cbfa2t3−Ldb1, Cbfa2t3−Gfi1, Tal1−Cbfa2t3, Lmo2−Ldb1, Gfi1−Ets1.]
Figure 4.5: Predicted genetic regulatory network of the erythrocyte pathway after edge deletion. The genetic regulatory network predicted by the Extended Forward Search Algorithm with 11 genes and 41 non-linear terms (NLTs) (14 isolated NLTs excluded) after the edge deletion test, related to fate determination in the erythrocyte pathway: the regulatory network for the differentiation of hematopoietic stem cells into megakaryocyte-erythroid progenitors. The network is visualized by Cytoscape software.
[Figure 4.6 graphic: directed network of the 11 genes and the connected NLTs after edge deletion. Color Reference Table: regulation among genes; regulation from NLTs to Gfi1, Ets1, Gata1, PU.1, Lmo2, Notch1 and Ldb1. Isolated NLTs Table: Gata1−Gata2, Gata1−Lmo2, Gata1−Ldb1, Gata2−Runx1, Gata2−Tal1, Gata2−Cbfa2t3, Runx1−Cbfa2t3, Runx1−Lmo2, Runx1−Ldb1, PU.1−Lmo2, Tal1−Cbfa2t3, Gata2−Lmo2, Tal1−Ldb1, Lmo2−Ldb1, Gata2−Ldb1, Runx1−Tal1, Gfi1−Ets1.]
Figure 4.6: Predicted genetic regulatory network of the neutrophil pathway after edge deletion. The genetic regulatory network predicted by the Extended Forward Search Algorithm with 11 genes and 38 non-linear terms (NLTs) (17 isolated NLTs excluded) after the edge deletion test, related to fate determination in the neutrophil pathway: the regulatory network for the differentiation of hematopoietic stem cells into granulocyte-macrophage progenitors. The network is visualized by Cytoscape software.
5
A robust method for designing multistable
systems by embedding bistable subsystems
The objective of this chapter is to introduce a novel and robust method for developing multistable mathematical models by embedding bistable models together. Using the GATA1-GATA2-PU.1 module in hematopoiesis as the test system, we first develop a tristable model based on two bistable models without any high cooperativity coefficients, and then modify the tristable model based on experimentally determined mechanisms. The modified model successfully realizes four stable steady states and accurately reflects a recent experimental observation showing four transcriptional states. In addition, we develop a stochastic model whose simulations successfully reproduce the experimental observations in single cells. These results suggest that the proposed method is a general approach for developing mathematical models that realize multistability and heterogeneity in complex systems. In this chapter, I first outline the principle for obtaining a multistable system by embedding bistable systems, and then introduce the model development for both the bistable systems and the embedding multistable system. Finally, I apply the methodology to the problem of cell fate determination in hematopoiesis.
5.1 Principle of embeddedness
5.1.1 Embedding method for designing multistable models
We propose a framework for modelling regulatory networks with multiple stable steady states based on the embedding of sub-systems with fewer stable steady states [155]. Assume that we need to study a regulatory network consisting of two regulatory modules. The first module has genes $X_i$ and is modelled by the equations
$$\frac{dX_i}{dt} = F_i(X_1, X_2, \ldots, X_n, X_{n+1}, \ldots, X_{n+N}, \boldsymbol{\Theta}_1, t) \qquad (5.1)$$
for $i = 1, 2, \ldots, n+N$, where $\boldsymbol{\Theta}_1$ includes the model parameters of $F_i$. The second module is modelled by
$$\frac{dY_j}{dt} = G_j(Y_1, Y_2, \ldots, Y_m, \boldsymbol{\Theta}_2, t) \qquad (5.2)$$
for $j = 1, 2, \ldots, m$, where $\boldsymbol{\Theta}_2$ includes the model parameters of $G_j$. In these two models, $\mathbf{F}(\mathbf{X}, \boldsymbol{\Theta}_1, t)$ and $\mathbf{G}(\mathbf{Y}, \boldsymbol{\Theta}_2, t)$ are non-linear vector fields. To develop mathematical models with more stable steady states, we propose an embedding method by assuming that $X_{n+k}$ ($k = 1, \ldots, N$) are functions of the variables $Y_1, Y_2, \ldots, Y_m$, given by
$$X_{n+k} = H_k(Y_1, Y_2, \ldots, Y_m). \qquad (5.3)$$
In this way, we obtain an embedding system
$$\frac{d\mathbf{W}}{dt} = \mathbf{F}(\mathbf{W}, \boldsymbol{\Theta}^*, t), \qquad (5.4)$$
where $\mathbf{W} = (X_1, X_2, \ldots, X_n, Y_1, Y_2, \ldots, Y_m)$ represents all genes in the system, and $\mathbf{F}$ denotes the embedding system built from the two modules with genes $X_i$ and $Y_j$ via the functions $H_k$. In addition, $\boldsymbol{\Theta}^* = \boldsymbol{\Theta}_1 \cup \boldsymbol{\Theta}_2$ is the model parameter space. The embedding system (5.4) consists of two components:
$$\begin{aligned}
\frac{dX_i}{dt} &= F_i\big(X_1, X_2, \ldots, X_n, H_k(Y_1, Y_2, \ldots, Y_m), \boldsymbol{\Theta}^*, t\big),\\
\frac{dY_j}{dt} &= R_j\big(X_1, X_2, \ldots, X_n, Y_1, Y_2, \ldots, Y_m, \boldsymbol{\Theta}^*, t\big)
\end{aligned} \qquad (5.5)$$
for $i = 1, 2, \ldots, n$, $k = 1, \ldots, N$ and $j = 1, 2, \ldots, m$. Each $X_i$ is regulated by $X_{n+k}$ ($k = 1, \ldots, N$), and, through (5.1), the $X_{n+k}$ are in turn regulated by $X_1, \ldots, X_n$. Since the $X_{n+k}$ are now functions of $Y_1, Y_2, \ldots, Y_m$, the expression of each gene $Y_j$ is also regulated by $X_i$ ($i = 1, \ldots, n$). The non-linear vector field $\mathbf{G}(\mathbf{Y}, \boldsymbol{\Theta}_2, t)$ in (5.2) is then transformed into a new non-linear vector field $\mathbf{R}(\mathbf{W}, \boldsymbol{\Theta}^*, t)$, which includes both genes $X_i$ and $Y_j$ from the two sub-systems with their corresponding regulations. Note that this is a general idea for developing mathematical models with more stable steady states. Depending on the specific formalism and properties of the sub-systems, the embedding system may give different results regarding multiple stable steady states under different conditions. In this study, we focus only on systems with the Shea-Ackers formalism [1].
5.1.2 Effectiveness of embedding method
The motivation of this work is to develop a mathematical model that realizes the tristable property of the HSC genetic regulatory network in Figure 5.1A based on experimental observations. Figure 5.1B and Figure 5.1E illustrate the embedding method that couples two bistable modules in a network. Variable $U$ in the first $Z$-$U$ module is an auxiliary node, assumed to be $U = \mu X + \delta Y$, where $\mu$ and $\delta$ are two positive parameters. When the system stays in the state with a high expression level of $Z$ and a low level of $U$, the expression levels of $X$ and $Y$ are low. However, when the system has a low expression level of $Z$ and a high level of $U$, it triggers the second module $X$-$Y$ to choose either a high level of $X$ and a low level of $Y$, or a low level of $X$ and a high level of $Y$. In this way, we realize a system with three stable states, in each of which one of the three variables (namely $Z$, $X$ or $Y$) is in the high expression state while the other two are in low expression states.
To demonstrate the effectiveness of the proposed embedding method, we use the toggle switch network as the test system [62]. This network consists of two genes that form a double negative feedback loop and is modelled by the following equations with parameter space $\boldsymbol{\Theta}_1 = \{a = 0.2, b = 4, c = 3\}$, given by

$$
\begin{aligned}
\frac{dz}{dt} &= F_1(z, u, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{1 + u^3} - z,\\
\frac{du}{dt} &= F_2(z, u, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{1 + z^3} - u.
\end{aligned}
\tag{5.6}
$$
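A quick numerical check of the bistability of (5.6) is sketched below. It integrates the toggle switch from two initial conditions with a hand-written fourth-order Runge-Kutta scheme; the helper names `toggle_rhs` and `rk4`, the step size and the time horizon are illustrative assumptions, not the procedure used to produce Figure 5.2A.

```python
# Minimal sketch (hypothetical helpers, not the thesis code): integrate the
# toggle switch (5.6) with classical RK4 and report where trajectories settle.

def toggle_rhs(z, u, a=0.2, b=4.0, c=3.0):
    """Right-hand side of (5.6) with Theta_1 = {a, b, c}."""
    dz = a + b / (1.0 + u**c) - z
    du = a + b / (1.0 + z**c) - u
    return (dz, du)


def rk4(rhs, state, dt=0.01, t_end=50.0):
    """Fixed-step fourth-order Runge-Kutta for an autonomous system."""
    s = tuple(state)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(*s)
        k2 = rhs(*[si + 0.5 * dt * ki for si, ki in zip(s, k1)])
        k3 = rhs(*[si + 0.5 * dt * ki for si, ki in zip(s, k2)])
        k4 = rhs(*[si + dt * ki for si, ki in zip(s, k3)])
        s = tuple(si + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
                  for si, d1, d2, d3, d4 in zip(s, k1, k2, k3, k4))
    return s


# The diagonal z = u is invariant and, by symmetry, separates the two basins,
# so one start below it and one above it should reach different stable states.
state_a = rk4(toggle_rhs, (3.0, 0.0))
state_b = rk4(toggle_rhs, (0.0, 3.0))
print("z-high state:", state_a)
print("u-high state:", state_b)
```

The two end points form the symmetric pair of stable steady states of (5.6), with one gene near 4.13 and the other near 0.26.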
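It is assumed that the first Z-U module follows model (5.6) and the second X-Y module satisfies the same model with the same parameter space $\boldsymbol{\Theta}_1$ but different variables $x$ and $y$, given by

$$
\begin{aligned}
\frac{dx}{dt} &= G_1(x, y, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{1 + y^3} - x,\\
\frac{dy}{dt} &= G_2(x, y, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{1 + x^3} - y.
\end{aligned}
\tag{5.7}
$$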
Now we embed these two sub-systems together using $u = H(x, y) = x + y$. Since genes $z$ and $u$ repress each other in the sub-system (5.6), and $u$ is a function of genes $x$ and $y$, gene $z$ is repressed by $x + y$ and, in turn, genes $x$ and $y$ are negatively regulated by gene $z$ in the new embedding model. Then the non-linear vector fields $G_{1,2}(x, y, \boldsymbol{\Theta}_1, t)$ are transformed to new non-linear vector fields $R_{1,2}(x, y, z, \boldsymbol{\Theta}_1, t)$, respectively, which include genes $x$, $y$ and $z$ from the two sub-systems with negative regulations from gene $z$ to genes $x$ and $y$. Therefore, the new model with three variables is given by
[Figure 5.1 graphic, panels A–E: labels "Z - U sub-system" (states Z and U separated by a saddle state), "X - Y sub-system" (states X and Y separated by a saddle state), "Embedding system" (nodes X, Y and Z), "Detailed mechanism" and "Embeddedness".]
Figure 5.1: Methodology for developing multistable models by embedding two sub-systems with bistability together.
(A) Brief flowchart of the hematopoietic hierarchy, created with BioRender.com. HSCs, hematopoietic stem cells; MPPs, multipotent progenitors; MEPs, megakaryocyte-erythroid progenitors; GMPs, granulocyte-macrophage progenitors.
(B) The principle of embeddedness: the Z-U module is the first bistable sub-system. Once this module crosses the saddle point from state $Z$ to state $U$, it enters the X-Y sub-system, which has two stable steady states $X$ and $Y$, and reaches either state $X$ or state $Y$ via the imaginary state $U$.
(C, D) The structure of two double-negative feedback loops with positive autoregulation, which are the mechanisms of the bistable sub-systems in HSCs.
(E) The structure of the regulatory network after embedding. The X-Y sub-system is embedded into state $U$. ('→' and '⊣' denote activating and inhibiting regulations, respectively.)
5.2. MODEL DEVELOPMENT FOR EMBEDDING METHOD
[Figure 5.2 graphic: (A) phase plane with axes Z and U (0–4), nullclines dz/dt = 0 and du/dt = 0, and points a, b, c; (B) 3D phase portrait with axes X, Y and Z.]
Figure 5.2: Realization of tristability by embedding two bistable sub-systems.
(A) The phase plane of the toggle switch sub-system (5.6) with bistability (a and b: stable
steady states, c: saddle state).
(B) The 3D phase portrait of the embedded system (5.8) with tristability (three red points: stable steady states; two black points: saddle states).
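$$
\begin{aligned}
\frac{dx}{dt} &= R_1(x, y, z, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{(1 + y^3)(1 + z^3)} - x,\\
\frac{dy}{dt} &= R_2(x, y, z, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{(1 + x^3)(1 + z^3)} - y,\\
\frac{dz}{dt} &= F_1(z, u = x + y, \boldsymbol{\Theta}_1, t) = 0.2 + \frac{4}{1 + (x + y)^3} - z.
\end{aligned}
\tag{5.8}
$$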
Figure 5.2A shows the phase plane of the toggle switch sub-system (5.6), which is bistable, and Figure 5.2B provides the 3D phase portrait of the embedded model (5.8), which has three stable steady states. The embedded model successfully realizes tristability, which supports our embedding method for developing mathematical models with multistability.
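The same kind of check can be sketched for the embedded model (5.8). Assuming a hypothetical RK4 helper like the one used for (5.6) and three illustrative initial conditions (one per gene starting at a high level), each trajectory should settle in a different stable steady state, with a different gene dominating.

```python
# Minimal sketch (hypothetical helpers): three trajectories of the embedded
# model (5.8), in which u = x + y.

def embedded_rhs(x, y, z):
    """Right-hand side of (5.8)."""
    dx = 0.2 + 4.0 / ((1.0 + y**3) * (1.0 + z**3)) - x
    dy = 0.2 + 4.0 / ((1.0 + x**3) * (1.0 + z**3)) - y
    dz = 0.2 + 4.0 / (1.0 + (x + y) ** 3) - z
    return (dx, dy, dz)


def rk4(rhs, state, dt=0.01, t_end=60.0):
    """Fixed-step fourth-order Runge-Kutta for an autonomous system."""
    s = tuple(state)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(*s)
        k2 = rhs(*[si + 0.5 * dt * ki for si, ki in zip(s, k1)])
        k3 = rhs(*[si + 0.5 * dt * ki for si, ki in zip(s, k2)])
        k4 = rhs(*[si + dt * ki for si, ki in zip(s, k3)])
        s = tuple(si + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
                  for si, d1, d2, d3, d4 in zip(s, k1, k2, k3, k4))
    return s


starts = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
finals = [rk4(embedded_rhs, s0) for s0 in starts]
# Index of the most highly expressed gene in each final state (0 = x, 1 = y, 2 = z).
dominant = [max(range(3), key=lambda i: s[i]) for s in finals]
for s0, s in zip(starts, finals):
    print(s0, "->", tuple(round(v, 4) for v in s))
print("dominant genes:", dominant)
```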
5.2 Model development for embedding method
5.2.1 Model development with bistability properties
We first develop a model with bistability properties for the networks in Figure 5.1C and Figure 5.1D. Suppose that the two sub-systems, namely the Z-U sub-system and the X-Y sub-system, have the same structure: a double-negative feedback loop with positive autoregulation. For the Z-U system, based on the formalism (5.1) with $\mathbf{X} = \{z, u\}$ and $\boldsymbol{\Theta}_1 = \{a_1, b_1, b_2, c_1, d_1, d_2, k_1, k_2\}$, we propose the following model to describe the dynamics, given by
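$$
\begin{aligned}
\frac{dz}{dt} &= F_1(z, u, \boldsymbol{\Theta}_1, t) = \frac{a_1 z}{1 + b_1 z}\,\frac{1}{1 + b_2 u} - k_1 z,\\
\frac{du}{dt} &= F_2(z, u, \boldsymbol{\Theta}_1, t) = \frac{c_1 u}{1 + d_1 u}\,\frac{1}{1 + d_2 z} - k_2 u.
\end{aligned}
\tag{5.9}
$$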
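Similarly, based on the formalism (5.2) with $\mathbf{Y} = \{x, y\}$ and $\boldsymbol{\Theta}_2 = \{\alpha_1, \beta_1, \beta_2, \gamma_1, \sigma_1, \sigma_2, k_3, k_4\}$, the dynamics of the X-Y sub-system is modelled by

$$
\begin{aligned}
\frac{dx}{dt} &= G_1(x, y, \boldsymbol{\Theta}_2, t) = \frac{\alpha_1 x}{1 + \beta_1 x}\,\frac{1}{1 + \beta_2 y} - k_3 x,\\
\frac{dy}{dt} &= G_2(x, y, \boldsymbol{\Theta}_2, t) = \frac{\gamma_1 y}{1 + \sigma_1 y}\,\frac{1}{1 + \sigma_2 x} - k_4 y,
\end{aligned}
\tag{5.10}
$$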
where $x$ and $y$ are the expression levels of genes $X$ and $Y$, respectively; $\alpha_1$ and $\gamma_1$ represent expression rates; $\beta_1$, $\beta_2$, $\sigma_1$ and $\sigma_2$ represent association rates of the corresponding proteins on the binding sites; and $k_3$ and $k_4$ are self-degradation rates. The model of the Z-U sub-system has the same structure but may have different values of the model parameters. To obtain bistability, we establish the following theorems for the proposed models of these two sub-systems. Since they have the same structure, we only give the theorems for the X-Y sub-system.
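Theorem 5.2.1. There are at most five non-negative equilibria for the model of the X-Y system.
1. There are three equilibria: $(0, 0)$, $(x_e, 0)$ and $(0, y_e)$, where $x_e = \frac{\alpha_1 - k_3}{k_3 \beta_1}$ and $y_e = \frac{\gamma_1 - k_4}{k_4 \sigma_1}$, if $\alpha_1 > k_3$ and $\gamma_1 > k_4$.
2. There are two other equilibria: $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$. If $-B/A > 0$, $C/A > 0$ and $B^2 - 4AC \ge 0$, then $m_{1,2} = \beta_1 x_{1,2}^*$ are positive real solutions of the equation
$$A m^2 + B m + C = 0, \tag{5.11}$$
where $m = \beta_1 x$, $A = A_1 B_1 - B_1$, $B = A_1 - B_1 - 1 + A_1 B_1 - A_1 B_2 + A_2 B_1$, $C = A_1 + A_2 - 1 - A_1 B_2$, $A_1 = \beta_2/\sigma_1$, $A_2 = \alpha_1/k_3$, $B_1 = \sigma_2/\beta_1$ and $B_2 = \gamma_1/k_4$.
3. To have positive values of $y_1^*$ and $y_2^*$, the following conditions should be satisfied:
$$x_{1,2}^* < \frac{A_2 - 1}{\beta_1} \quad\text{or}\quad x_{1,2}^* < \frac{B_2 - 1}{\sigma_2}. \tag{5.12}$$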
Proof. Suppose an equilibrium state exists. Then
$$\frac{\alpha_1 x}{1 + \beta_1 x}\,\frac{1}{1 + \beta_2 y} - k_3 x = 0, \tag{5.13}$$
$$\frac{\gamma_1 y}{1 + \sigma_1 y}\,\frac{1}{1 + \sigma_2 x} - k_4 y = 0. \tag{5.14}$$
Then, we consider the following three cases.
1. • The trivial solution: $(0, 0)$.
• When the equilibrium state is $(x_e, 0)$ with $x_e \neq 0$, (5.13) gives
$$x_e = \frac{\alpha_1 - k_3}{k_3 \beta_1}. \tag{5.15}$$
Since $\alpha_1$, $k_3$ and $\beta_1$ are positive, this equilibrium is positive if $\alpha_1 > k_3$.
• When the equilibrium state is $(0, y_e)$ with $y_e \neq 0$, (5.14) gives
$$y_e = \frac{\gamma_1 - k_4}{k_4 \sigma_1}. \tag{5.16}$$
Since $\gamma_1$, $k_4$ and $\sigma_1$ are positive, this equilibrium is positive if $\gamma_1 > k_4$.
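2. When the equilibria are $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ with all coordinates non-zero, dividing (5.13) by $x$ and (5.14) by $y$ gives
$$\frac{\alpha_1}{(1 + \beta_1 x)(1 + \beta_2 y)} = k_3, \tag{5.17}$$
$$\frac{\gamma_1}{(1 + \sigma_1 y)(1 + \sigma_2 x)} = k_4. \tag{5.18}$$
Letting $m = \beta_1 x$ and $n = \sigma_1 y$, we have
$$(1 + m)\left(1 + \frac{\beta_2}{\sigma_1} n\right) = \frac{\alpha_1}{k_3}, \tag{5.19}$$
$$(1 + n)\left(1 + \frac{\sigma_2}{\beta_1} m\right) = \frac{\gamma_1}{k_4}. \tag{5.20}$$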
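Let $A_1 = \beta_2/\sigma_1$, $A_2 = \alpha_1/k_3$, $B_1 = \sigma_2/\beta_1$ and $B_2 = \gamma_1/k_4$. Then
$$(1 + m)(1 + A_1 n) = A_2, \tag{5.21}$$
$$(1 + n)(1 + B_1 m) = B_2. \tag{5.22}$$
Solving (5.21) and (5.22) for $n$ gives
$$n = \frac{A_2 - 1 - m}{A_1 + A_1 m} = \frac{B_2 - 1 - B_1 m}{1 + B_1 m}. \tag{5.23}$$
Cross-multiplying the two expressions for $n$ and collecting powers of $m$ yields
$$(A_1 B_1 - B_1) m^2 + (A_1 - B_1 - 1 + A_1 B_1 - A_1 B_2 + A_2 B_1) m + (A_1 + A_2 - 1 - A_1 B_2) = 0.$$
Let $A = A_1 B_1 - B_1$, $B = A_1 - B_1 - 1 + A_1 B_1 - A_1 B_2 + A_2 B_1$ and $C = A_1 + A_2 - 1 - A_1 B_2$. Then we have the quadratic equation
$$A m^2 + B m + C = 0. \tag{5.24}$$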
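(a) If $\Delta = B^2 - 4AC = 0$, there is only one solution, namely $m = -B/(2A)$, which is positive if $-B/A > 0$. Then we have
$$x_1^* = x_2^* = -\frac{B}{2\beta_1 A}, \qquad y_1^* = y_2^* = \frac{A_2 - 1 - \beta_1 x_{1,2}^*}{\beta_2 (1 + \beta_1 x_{1,2}^*)} = \frac{B_2 - 1 - \sigma_2 x_{1,2}^*}{\sigma_1 (1 + \sigma_2 x_{1,2}^*)}. \tag{5.25}$$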
(b) If $\Delta > 0$, there are two distinct real solutions. If the following conditions are satisfied, there are two distinct positive real solutions $m_1$ and $m_2$:
i. $m_1 + m_2 = -B/A > 0$,
ii. $m_1 m_2 = C/A > 0$,
iii. $B^2 - 4AC > 0$.
In this case, the solutions of (5.24) are
$$m = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}, \tag{5.26}$$
and the corresponding $n$ satisfies (5.23). Substituting $x^* = m/\beta_1$ and $y^* = n/\sigma_1$, we obtain the solutions $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$:
$$x_{1,2}^* = \frac{-B \pm \sqrt{B^2 - 4AC}}{2\beta_1 A}, \tag{5.27}$$
$$y_{1,2}^* = \frac{A_2 - 1 - \beta_1 x_{1,2}^*}{\beta_2 (1 + \beta_1 x_{1,2}^*)} = \frac{B_2 - 1 - \sigma_2 x_{1,2}^*}{\sigma_1 (1 + \sigma_2 x_{1,2}^*)}. \tag{5.28}$$
3. From parts 2(a) and 2(b), if $\Delta \ge 0$, $y_{1,2}^*$ is given by
$$y_{1,2}^* = \frac{A_2 - 1 - \beta_1 x_{1,2}^*}{\beta_2 (1 + \beta_1 x_{1,2}^*)} = \frac{B_2 - 1 - \sigma_2 x_{1,2}^*}{\sigma_1 (1 + \sigma_2 x_{1,2}^*)}. \tag{5.29}$$
Since the denominators are positive, if $x_{1,2}^* < (A_2 - 1)/\beta_1$ or $x_{1,2}^* < (B_2 - 1)/\sigma_2$, the corresponding value of $y_{1,2}^*$ is positive as well.
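Theorem 5.2.1 translates directly into a small routine. The sketch below (the function name `xy_equilibria` is ours, not the thesis code) returns the non-negative equilibria of (5.10) from the closed forms (5.15), (5.16), (5.24) and (5.28). With the Type 1 parameters reported in Table 5.1 it should reproduce the three equilibria listed there, together with the origin.

```python
import math


def xy_equilibria(alpha1, beta1, beta2, gamma1, sigma1, sigma2, k3, k4):
    """Non-negative equilibria of the X-Y model (5.10), following Theorem 5.2.1."""
    eqs = [(0.0, 0.0)]
    # Part 1: equilibria on the axes, (5.15) and (5.16).
    if alpha1 > k3:
        eqs.append(((alpha1 - k3) / (k3 * beta1), 0.0))
    if gamma1 > k4:
        eqs.append((0.0, (gamma1 - k4) / (k4 * sigma1)))
    # Part 2: interior equilibria from the quadratic (5.24) in m = beta1 * x.
    A1, A2 = beta2 / sigma1, alpha1 / k3
    B1, B2 = sigma2 / beta1, gamma1 / k4
    A = A1 * B1 - B1
    B = A1 - B1 - 1 + A1 * B1 - A1 * B2 + A2 * B1
    C = A1 + A2 - 1 - A1 * B2
    disc = B * B - 4 * A * C
    if A != 0.0 and disc >= 0.0:
        roots = {(-B + sgn * math.sqrt(disc)) / (2.0 * A) for sgn in (1.0, -1.0)}
    elif A == 0.0 and B != 0.0:
        roots = {-C / B}  # degenerate case: (5.24) is linear
    else:
        roots = set()
    for m in sorted(roots):
        if m <= 0:
            continue
        # Part 3: y from (5.28); keep only positive values.
        y = (A2 - 1 - m) / (beta2 * (1 + m))
        if y > 0:
            eqs.append((m / beta1, y))
    return eqs


type1 = (4.0252, 1.1393, 8.5862, 7.5202, 2.1524, 4.6084, 0.5924, 0.6515)
for x, y in xy_equilibria(*type1):
    print(f"({x:.4f}, {y:.4f})")
```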
Moreover, to study bistability, it is necessary to establish the stability or instability of each equilibrium state. We first give the following conditions for the equilibrium states that lie on an axis.
Theorem 5.2.2. The model of the X-Y system has three equilibria: $(0, 0)$, $(x_e, 0)$ and $(0, y_e)$.
1. The equilibrium state $(0, 0)$ is unstable if $\alpha_1 > k_3$ and $\gamma_1 > k_4$.
2. The equilibrium state $(x_e, 0)$ is stable if $\frac{\gamma_1}{1 + \sigma_2 x_e} < k_4$.
3. The equilibrium state $(0, y_e)$ is stable if $\frac{\alpha_1}{1 + \beta_2 y_e} < k_3$.
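Proof. The Jacobian matrix $\mathbf{J}_{(x,y)} = [J_{ij}]_{2\times 2}$ of the X-Y system is defined by
$$J_{11} = \left.\frac{\partial \dot{x}}{\partial x}\right|_{(x_0, y_0)} = \frac{\alpha_1}{(1 + \beta_1 x_0)^2}\,\frac{1}{1 + \beta_2 y_0} - k_3, \tag{5.30}$$
$$J_{12} = \left.\frac{\partial \dot{x}}{\partial y}\right|_{(x_0, y_0)} = \frac{\alpha_1 x_0}{1 + \beta_1 x_0}\,\frac{-\beta_2}{(1 + \beta_2 y_0)^2}, \tag{5.31}$$
$$J_{21} = \left.\frac{\partial \dot{y}}{\partial x}\right|_{(x_0, y_0)} = \frac{\gamma_1 y_0}{1 + \sigma_1 y_0}\,\frac{-\sigma_2}{(1 + \sigma_2 x_0)^2}, \tag{5.32}$$
$$J_{22} = \left.\frac{\partial \dot{y}}{\partial y}\right|_{(x_0, y_0)} = \frac{\gamma_1}{(1 + \sigma_1 y_0)^2}\,\frac{1}{1 + \sigma_2 x_0} - k_4. \tag{5.33}$$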
(1). The Jacobian matrix at the equilibrium state $(0, 0)$ is
$$\mathbf{J}_{(0,0)} = \begin{pmatrix} \alpha_1 - k_3 & 0 \\ 0 & \gamma_1 - k_4 \end{pmatrix}. \tag{5.34}$$
The eigenvalues of the Jacobian matrix are $\lambda_1 = \alpha_1 - k_3$ and $\lambda_2 = \gamma_1 - k_4$. Hence the equilibrium state $(0, 0)$ is unstable if either of the following conditions is satisfied:
$$\alpha_1 > k_3, \qquad \gamma_1 > k_4. \tag{5.35}$$
Notice that these conditions are also the existence conditions for the equilibria $(x_e, 0)$ and $(0, y_e)$ established in Theorem 5.2.1. Therefore, when both conditions hold, $(0, 0)$ is unstable and the two positive equilibria $(x_e, 0)$ and $(0, y_e)$ exist.
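(2). The Jacobian matrix at the equilibrium state $(x_e, 0) = \left(\frac{\alpha_1 - k_3}{k_3 \beta_1}, 0\right)$ is
$$\mathbf{J}_{(x_e,0)} = \begin{pmatrix} \frac{\alpha_1}{(1 + \beta_1 x_e)^2} - k_3 & -\frac{\alpha_1 \beta_2 x_e}{1 + \beta_1 x_e} \\ 0 & \frac{\gamma_1}{1 + \sigma_2 x_e} - k_4 \end{pmatrix}. \tag{5.36}$$
Since $x_e \neq 0$, (5.13) gives $k_3 = \alpha_1/(1 + \beta_1 x_e)$, so the eigenvalues of the Jacobian matrix are
$$\lambda_1 = \frac{\alpha_1}{(1 + \beta_1 x_e)^2} - k_3 = \frac{\alpha_1}{(1 + \beta_1 x_e)^2} - \frac{\alpha_1}{1 + \beta_1 x_e} = -\frac{\alpha_1 \beta_1 x_e}{(1 + \beta_1 x_e)^2}, \tag{5.37}$$
$$\lambda_2 = \frac{\gamma_1}{1 + \sigma_2 x_e} - k_4. \tag{5.38}$$
Clearly $\lambda_1 < 0$. Thus, this equilibrium state is stable if
$$\frac{\gamma_1}{1 + \sigma_2 x_e} < k_4. \tag{5.39}$$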
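(3). The Jacobian matrix at the equilibrium state $(0, y_e) = \left(0, \frac{\gamma_1 - k_4}{k_4 \sigma_1}\right)$ is
$$\mathbf{J}_{(0,y_e)} = \begin{pmatrix} \frac{\alpha_1}{1 + \beta_2 y_e} - k_3 & 0 \\ -\frac{\gamma_1 \sigma_2 y_e}{1 + \sigma_1 y_e} & \frac{\gamma_1}{(1 + \sigma_1 y_e)^2} - k_4 \end{pmatrix}. \tag{5.40}$$
Since $y_e \neq 0$, (5.14) gives $k_4 = \gamma_1/(1 + \sigma_1 y_e)$, so the eigenvalues of the Jacobian matrix are
$$\lambda_1 = \frac{\alpha_1}{1 + \beta_2 y_e} - k_3, \tag{5.41}$$
$$\lambda_2 = \frac{\gamma_1}{(1 + \sigma_1 y_e)^2} - k_4 = \frac{\gamma_1}{(1 + \sigma_1 y_e)^2} - \frac{\gamma_1}{1 + \sigma_1 y_e} = -\frac{\gamma_1 \sigma_1 y_e}{(1 + \sigma_1 y_e)^2}. \tag{5.42}$$
Clearly $\lambda_2 < 0$. Thus, this equilibrium state is stable if
$$\frac{\alpha_1}{1 + \beta_2 y_e} < k_3. \tag{5.43}$$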
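Because the Jacobians (5.34), (5.36) and (5.40) are triangular, their eigenvalues are simply the diagonal entries, so Theorem 5.2.2 can be checked numerically in a few lines. The sketch below assumes $\alpha_1 > k_3$ and $\gamma_1 > k_4$, and its helper names are hypothetical. Applied to the Type 1 and Type 2 parameter sets of Table 5.1, it should agree with the stability column of that table.

```python
def axis_equilibrium_eigenvalues(alpha1, beta1, beta2, gamma1, sigma1, sigma2, k3, k4):
    """Eigenvalues at (0,0), (xe,0) and (0,ye) from (5.34)-(5.42).

    Assumes alpha1 > k3 and gamma1 > k4 so that xe, ye > 0 (Theorem 5.2.1).
    The Jacobians are triangular, so their eigenvalues are the diagonal entries.
    """
    xe = (alpha1 - k3) / (k3 * beta1)
    ye = (gamma1 - k4) / (k4 * sigma1)
    return {
        "(0,0)": (alpha1 - k3, gamma1 - k4),
        "(xe,0)": (-alpha1 * beta1 * xe / (1 + beta1 * xe) ** 2,   # (5.37), always < 0
                   gamma1 / (1 + sigma2 * xe) - k4),               # (5.38)
        "(0,ye)": (alpha1 / (1 + beta2 * ye) - k3,                 # (5.41)
                   -gamma1 * sigma1 * ye / (1 + sigma1 * ye) ** 2),  # (5.42), always < 0
    }


def stability(eigs):
    """Map each equilibrium to True if all of its eigenvalues are negative."""
    return {name: all(lam < 0 for lam in lams) for name, lams in eigs.items()}


type1 = (4.0252, 1.1393, 8.5862, 7.5202, 2.1524, 4.6084, 0.5924, 0.6515)
type2 = (8.0486, 2.0932, 0.3926, 8.4293, 1.1974, 4.5864, 1.2308, 0.3546)
print("Type 1:", stability(axis_equilibrium_eigenvalues(*type1)))
print("Type 2:", stability(axis_equilibrium_eigenvalues(*type2)))
```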
In addition, we give the following stability conditions for the equilibrium states that lie in the interior of the positive quadrant.
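Theorem 5.2.3. The positive equilibria $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ are stable if the following condition is satisfied:
$$\beta_1 \sigma_1 \eta_y \xi_x - \beta_2 \sigma_2 \theta_x \rho_y > 0, \tag{5.44}$$
where $\theta_x = 1 + \beta_1 x$, $\eta_y = 1 + \beta_2 y$, $\rho_y = 1 + \sigma_1 y$ and $\xi_x = 1 + \sigma_2 x$.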
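Proof. The Jacobian matrix at the equilibrium state $(x, y)$ is
$$\mathbf{J}_{(x,y)} = \begin{pmatrix} \frac{\alpha_1}{(1 + \beta_1 x)^2 (1 + \beta_2 y)} - k_3 & -\frac{\alpha_1 \beta_2 x}{(1 + \beta_1 x)(1 + \beta_2 y)^2} \\ -\frac{\gamma_1 \sigma_2 y}{(1 + \sigma_1 y)(1 + \sigma_2 x)^2} & \frac{\gamma_1}{(1 + \sigma_1 y)^2 (1 + \sigma_2 x)} - k_4 \end{pmatrix}. \tag{5.45}$$
When $x$ and $y$ are non-zero, we substitute (5.17) and (5.18) into the Jacobian matrix (5.45) and obtain
$$\mathbf{J}_{(x,y)} = \begin{pmatrix} \frac{\alpha_1 (1 - \theta_x)}{\theta_x^2 \eta_y} & -\frac{\alpha_1 \beta_2 x}{\theta_x \eta_y^2} \\ -\frac{\gamma_1 \sigma_2 y}{\rho_y \xi_x^2} & \frac{\gamma_1 (1 - \rho_y)}{\rho_y^2 \xi_x} \end{pmatrix}. \tag{5.46}$$
The eigenvalues of the Jacobian matrix (5.46) are
$$\lambda = \frac{-\Phi \pm \sqrt{\Phi^2 - \tau}}{2 \theta_x^2 \eta_y^2 \rho_y^2 \xi_x^2}, \tag{5.47}$$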
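where
$$\Phi = \eta_y \xi_x \left(\rho_y^2 \xi_x \alpha_1 \beta_1 x + \theta_x^2 \eta_y \gamma_1 \sigma_1 y\right), \tag{5.48}$$
$$\tau = 4 \theta_x^2 \eta_y^2 \rho_y^2 \xi_x^2 \alpha_1 \gamma_1 x y \left(\beta_1 \sigma_1 \eta_y \xi_x - \beta_2 \sigma_2 \theta_x \rho_y\right). \tag{5.49}$$
For both $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ to be stable, both eigenvalues must be negative or have negative real parts. Thus, the stability conditions of $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ are $\Phi > 0$ and $\tau > 0$: if $\Phi^2 < \tau$, the eigenvalues are complex with real part $-\Phi/(2\theta_x^2 \eta_y^2 \rho_y^2 \xi_x^2) < 0$; otherwise $\sqrt{\Phi^2 - \tau} < \Phi$ and both eigenvalues are negative. Note that $\Phi > 0$ always holds, since all factors in (5.48) are positive. However, $\tau > 0$ if and only if $\beta_1 \sigma_1 \eta_{y^*} \xi_{x^*} - \beta_2 \sigma_2 \theta_{x^*} \rho_{y^*} > 0$. Thus, if $\beta_1 \sigma_1 \eta_{y^*} \xi_{x^*} - \beta_2 \sigma_2 \theta_{x^*} \rho_{y^*} > 0$, the positive equilibria $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ are stable.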
In summary, Theorem 5.2.1 gives the existence conditions of the equilibria for our proposed
two-node systems. Theorem 5.2.2 and Theorem 5.2.3 provide necessary conditions for the
stability properties of these equilibria. According to these theorems, we can easily check
whether the two-node systems have bistability based on the generated samples of model
parameters. The proofs of these theorems are given in Supplementary Information.
5.2.2 Perturbation analysis of bistable models
We have proved that systems (5.9) and (5.10) have bistable steady states under the conditions in Theorem 5.2.2 or Theorem 5.2.3. Next we use a random search method to find model parameters for which the system has bistable steady states. We first generate a sample for each model parameter from the uniform distribution over the interval $[0, A]$ and then test whether the system with the sampled parameters satisfies the conditions in Theorem 5.2.2 or Theorem 5.2.3. If the conditions are satisfied, we solve the nonlinear equations of the system to find the steady states. We test different values of $A$ and find that the system has bistable steady states when $A = 10$. To find more types of bistable states, we test 10000 sets of parameters drawn from the uniform distribution over the interval $[0, 10]$. Table 5.1 gives three types of bistable steady states, namely Type 1: $(x_e, 0)$ and $(0, y_e)$; Type 2: $(0, y_e)$ and $(x_1^*, y_1^*)$; and Type 3: $(x_e, 0)$ and $(x_2^*, y_2^*)$. Both stable states of Type 1 are located on the coordinate axes. We add a perturbation to each estimated coefficient $c$ as $c^* = [\varepsilon \times (P - 0.5) + 1] \times c$, where $P$ is a uniformly distributed random variable over the interval $[0, 1]$ and $\varepsilon$ is the strength of the perturbation, so that each coefficient is scaled by a factor in $[1 - \varepsilon/2, 1 + \varepsilon/2]$. Table 5.2 shows that the other two types of bistability can be obtained by perturbing the coefficients of Type 1.
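A simplified version of the random search and of the perturbation rule can be sketched as follows. As an illustrative assumption, it uses the closed-form equilibria of Theorem 5.2.1 and classifies stability with a trace/determinant test on the Jacobian (5.45), instead of reproducing the exact workflow described above (sampling, condition check, then a nonlinear solve). The seeds, sample counts and helper names are arbitrary.

```python
import math
import random


def equilibria(p):
    """Non-negative equilibria of (5.10) via Theorem 5.2.1.

    p = (alpha1, beta1, beta2, gamma1, sigma1, sigma2, k3, k4).
    """
    al, b1, b2, ga, s1, s2, k3, k4 = p
    eqs = [(0.0, 0.0)]
    if al > k3:
        eqs.append(((al - k3) / (k3 * b1), 0.0))
    if ga > k4:
        eqs.append((0.0, (ga - k4) / (k4 * s1)))
    A1, A2, B1, B2 = b2 / s1, al / k3, s2 / b1, ga / k4
    A = A1 * B1 - B1
    B = A1 - B1 - 1 + A1 * B1 - A1 * B2 + A2 * B1
    C = A1 + A2 - 1 - A1 * B2
    disc = B * B - 4 * A * C
    if abs(A) > 1e-12 and disc >= 0:
        for m in sorted({(-B + sgn * math.sqrt(disc)) / (2 * A) for sgn in (1.0, -1.0)}):
            if m > 0 and A2 - 1 - m > 0:  # x > 0 and y > 0, cf. (5.12)
                eqs.append((m / b1, (A2 - 1 - m) / (b2 * (1 + m))))
    return eqs


def is_stable(p, x, y):
    """Trace/determinant test on the Jacobian (5.45)."""
    al, b1, b2, ga, s1, s2, k3, k4 = p
    j11 = al / ((1 + b1 * x) ** 2 * (1 + b2 * y)) - k3
    j12 = -al * b2 * x / ((1 + b1 * x) * (1 + b2 * y) ** 2)
    j21 = -ga * s2 * y / ((1 + s1 * y) * (1 + s2 * x) ** 2)
    j22 = ga / ((1 + s1 * y) ** 2 * (1 + s2 * x)) - k4
    return j11 + j22 < 0 and j11 * j22 - j12 * j21 > 0


def random_search(n_samples=10000, upper=10.0, seed=0):
    """Keep parameter sets drawn from U[0, upper]^8 with exactly two stable equilibria."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_samples):
        p = tuple(rng.uniform(0.0, upper) for _ in range(8))
        stable = [e for e in equilibria(p) if is_stable(p, *e)]
        if len(stable) == 2:
            hits.append((p, stable))
    return hits


def perturb(p, eps, rng):
    """c* = [eps * (P - 0.5) + 1] * c with P ~ U[0, 1], drawn independently per coefficient."""
    return tuple((eps * (rng.random() - 0.5) + 1.0) * c for c in p)


hits = random_search(n_samples=2000, seed=1)
print(len(hits), "bistable parameter sets found")
type1 = (4.0252, 1.1393, 8.5862, 7.5202, 2.1524, 4.6084, 0.5924, 0.6515)
print("perturbed Type 1:", perturb(type1, 1.8, random.Random(2)))
```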
| Type | $\alpha_1$ | $\beta_1$ | $\beta_2$ | $\gamma_1$ | $\sigma_1$ | $\sigma_2$ | $k_3$ | $k_4$ | Equilibrium state | Characteristic | Stability |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Type 1 | 4.0252 | 1.1393 | 8.5862 | 7.5202 | 2.1524 | 4.6084 | 0.5924 | 0.6515 | (0.0000, 4.8982) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (1.6517, 0.1581) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (5.0862, 0.0000) | Nodal sink | Stable |
| Type 2 | 8.0486 | 2.0932 | 0.3926 | 8.4293 | 1.1974 | 4.5864 | 1.2308 | 0.3546 | (0.0000, 19.0173) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (0.2556, 8.3039) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (1.1683, 2.2871) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (2.6463, 0.0000) | Saddle point | Unstable |
| Type 3 | 8.6817 | 3.6357 | 5.7823 | 2.7531 | 1.3007 | 0.5855 | 1.1452 | 1.4741 | (0.0000, 0.6671) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (0.3034, 0.4505) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (1.2241, 0.0676) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (1.8101, 0.0000) | Nodal sink | Stable |

Table 5.1: Three types of the bistable model whose stable steady states are located at different positions. Type 1: both stable states are on the axes; Types 2 and 3: one of the stable states is on an axis but the other is off the axes.
| Case | $\alpha_1$ | $\beta_1$ | $\beta_2$ | $\gamma_1$ | $\sigma_1$ | $\sigma_2$ | $k_3$ | $k_4$ | Equilibrium state | Characteristic | Stability |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Type 1 | 4.0252 | 1.1393 | 8.5862 | 7.5202 | 2.1524 | 4.6084 | 0.5924 | 0.6515 | (0.0000, 4.8982) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (1.6517, 0.1581) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (5.0862, 0.0000) | Nodal sink | Stable |
| Perturbed case 1 | 4.2582 | 0.3682 | 1.7541 | 11.8512 | 3.8062 | 1.4729 | 0.6912 | 0.4352 | (0.0000, 6.8918) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (2.5527, 1.2403) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (9.2824, 0.2249) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (14.0157, 0.0000) | Saddle point | Unstable |
| Perturbed case 2 | 4.9263 | 1.6689 | 9.6312 | 0.8750 | 3.1029 | 0.8549 | 0.5510 | 0.2678 | (0.0000, 0.7307) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (0.2915, 0.5206) | Nodal sink | Stable |
|  |  |  |  |  |  |  |  |  | (1.0317, 0.2372) | Saddle point | Unstable |
|  |  |  |  |  |  |  |  |  | (4.7580, 0.0000) | Nodal sink | Stable |

Table 5.2: Perturbation analysis with strength $\varepsilon = 1.8$. Type 1 is the Type 1 case in Table 5.1. Perturbed cases 1 and 2 are obtained from Type 1 by perturbing the model parameters. In these two cases, one stable state is on an axis but the other is off the axes.
5.2.3 Model development for tristability properties
Figure 5.1E shows the structure of the three-gene network formed by embedding the X-Y system into the Z-U system. For simplicity, let $u = H(x, y) = x + y$. Since genes $z$ and $u$ repress each other in the sub-system (5.9), and $u$ is a function of genes $x$ and $y$, the expressions of genes $x$ and $y$ are also negatively regulated by gene $z$ in the new embedding model. The non-linear vector fields $G_{1,2}(x, y, \boldsymbol{\Theta}_2, t)$ are then transformed into new non-linear vector fields $R_{1,2}(x, y, z, \boldsymbol{\Theta}^*, t)$, respectively, which include genes $x$, $y$ and $z$ from the two sub-systems with the negative regulations from gene $z$ to genes $x$ and $y$. Using the embedding method (5.5) and the sub-system models (5.9) and (5.10), we obtain the following model for the embedded X-Y-Z system:
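$$
\begin{aligned}
\frac{dx}{dt} &= R_1(x, y, z, \boldsymbol{\Theta}^*, t) = \frac{\alpha_1 x}{1 + \beta_1 x}\,\frac{1}{1 + \beta_2 y}\,\frac{1}{1 + d_2 z} - k_3 x,\\
\frac{dy}{dt} &= R_2(x, y, z, \boldsymbol{\Theta}^*, t) = \frac{\gamma_1 y}{1 + \sigma_1 y}\,\frac{1}{1 + \sigma_2 x}\,\frac{1}{1 + d_2 z} - k_4 y,\\
\frac{dz}{dt} &= F_1(z, u = x + y, \boldsymbol{\Theta}^*, t) = \frac{a_1 z}{1 + b_1 z}\,\frac{1}{1 + b_2 (x + y)} - k_1 z.
\end{aligned}
\tag{5.50}
$$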
To verify the tristability of model (5.50), we give the following conditions for the existence
of equilibria and necessary conditions for the stability properties of these equilibria.
Theorem 5.2.4.
1. If $(x_e, 0)$ and $(0, y_e)$ are equilibria of the X-Y sub-system and $(z_e, 0)$ is an equilibrium state of the Z-U sub-system, where $x_e = \frac{\alpha_1 - k_3}{k_3 \beta_1}$, $y_e = \frac{\gamma_1 - k_4}{k_4 \sigma_1}$ and $z_e = \frac{a_1 - k_1}{k_1 b_1}$, then $(x_e, 0, 0)$, $(0, y_e, 0)$ and $(0, 0, z_e)$ are three equilibria of the embedding X-Y-Z system.
2. If $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ are two positive equilibria of the X-Y system as stated in Theorem 5.2.1, then $(x_1^*, y_1^*, 0)$ and $(x_2^*, y_2^*, 0)$ are also equilibria of the embedding X-Y-Z system.
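Proof. Applying Theorem 5.2.1 to the Z-U system, which has the same structure, shows that the Z-U system has the equilibrium state $(z_e, 0)$, where $z_e = \frac{a_1 - k_1}{k_1 b_1}$, if $a_1 > k_1$. Moreover, the X-Y system has the equilibria $(x_e, 0)$ and $(0, y_e)$, where $x_e = \frac{\alpha_1 - k_3}{k_3 \beta_1}$ and $y_e = \frac{\gamma_1 - k_4}{k_4 \sigma_1}$, if $\alpha_1 > k_3$ and $\gamma_1 > k_4$. Now consider the X-Y-Z system. Suppose an equilibrium state exists. Then
$$\frac{\alpha_1 x}{1 + \beta_1 x}\,\frac{1}{1 + \beta_2 y}\,\frac{1}{1 + d_2 z} - k_3 x = 0, \tag{5.51}$$
$$\frac{\gamma_1 y}{1 + \sigma_1 y}\,\frac{1}{1 + \sigma_2 x}\,\frac{1}{1 + d_2 z} - k_4 y = 0, \tag{5.52}$$
$$\frac{a_1 z}{1 + b_1 z}\,\frac{1}{1 + b_2 (x + y)} - k_1 z = 0. \tag{5.53}$$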
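1. (a) When $y = z = 0$, (5.52) and (5.53) hold trivially, and (5.51) gives
$$x_e = \frac{\alpha_1 - k_3}{k_3 \beta_1}. \tag{5.54}$$
Since $\alpha_1$, $k_3$ and $\beta_1$ are positive, this equilibrium is positive if $\alpha_1 > k_3$.
(b) When $x = z = 0$, (5.51) and (5.53) hold trivially, and (5.52) gives
$$y_e = \frac{\gamma_1 - k_4}{k_4 \sigma_1}. \tag{5.55}$$
Since $\gamma_1$, $k_4$ and $\sigma_1$ are positive, this equilibrium is positive if $\gamma_1 > k_4$.
(c) When $x = y = 0$, (5.51) and (5.52) hold trivially, and (5.53) gives
$$z_e = \frac{a_1 - k_1}{k_1 b_1}. \tag{5.56}$$
Since $a_1$, $k_1$ and $b_1$ are positive, this equilibrium is positive if $a_1 > k_1$.
2. When $z = 0$, (5.53) holds trivially and the system reduces to
$$\frac{\alpha_1 x}{1 + \beta_1 x}\,\frac{1}{1 + \beta_2 y} - k_3 x = 0, \tag{5.57}$$
$$\frac{\gamma_1 y}{1 + \sigma_1 y}\,\frac{1}{1 + \sigma_2 x} - k_4 y = 0, \tag{5.58}$$
where (5.57) and (5.58) are the same as equations (5.13) and (5.14) in the proof of Theorem 5.2.1.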
Hence, in all cases, the conditions for the existence of these equilibria in the X-Y-Z system are the same as those in the two bistable sub-systems Z-U and X-Y.
Thus, the information obtained for the two-node sub-systems can be applied directly to the embedded system. For each equilibrium state located on an axis, we give the following stability conditions.
Theorem 5.2.5. Suppose that $(x_e, 0)$ and $(0, y_e)$ are both stable states of the X-Y system and $(z_e, 0)$ is a stable state of the Z-U system. Then:
1. The equilibrium state $(x_e, 0, 0)$ is stable if $\frac{a_1}{1 + b_2 x_e} < k_1$.
2. The equilibrium state $(0, y_e, 0)$ is stable if $\frac{a_1}{1 + b_2 y_e} < k_1$.
3. The equilibrium state $(0, 0, z_e)$ is stable if $\frac{\alpha_1}{1 + d_2 z_e} < k_3$ and $\frac{\gamma_1}{1 + d_2 z_e} < k_4$.
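Proof. The Jacobian matrix $\mathbf{J}_{(x,y,z)} = [J_{ij}]_{3\times 3}$ of the X-Y-Z system is defined by
$$J_{11} = \left.\frac{\partial \dot{x}}{\partial x}\right|_{(x_0, y_0, z_0)} = \frac{\alpha_1}{(1 + \beta_1 x_0)^2}\,\frac{1}{1 + \beta_2 y_0}\,\frac{1}{1 + d_2 z_0} - k_3, \tag{5.59}$$
$$J_{12} = \left.\frac{\partial \dot{x}}{\partial y}\right|_{(x_0, y_0, z_0)} = \frac{\alpha_1 x_0}{1 + \beta_1 x_0}\,\frac{-\beta_2}{(1 + \beta_2 y_0)^2}\,\frac{1}{1 + d_2 z_0}, \tag{5.60}$$
$$J_{13} = \left.\frac{\partial \dot{x}}{\partial z}\right|_{(x_0, y_0, z_0)} = \frac{\alpha_1 x_0}{1 + \beta_1 x_0}\,\frac{1}{1 + \beta_2 y_0}\,\frac{-d_2}{(1 + d_2 z_0)^2}, \tag{5.61}$$
$$J_{21} = \left.\frac{\partial \dot{y}}{\partial x}\right|_{(x_0, y_0, z_0)} = \frac{\gamma_1 y_0}{1 + \sigma_1 y_0}\,\frac{-\sigma_2}{(1 + \sigma_2 x_0)^2}\,\frac{1}{1 + d_2 z_0}, \tag{5.62}$$
$$J_{22} = \left.\frac{\partial \dot{y}}{\partial y}\right|_{(x_0, y_0, z_0)} = \frac{\gamma_1}{(1 + \sigma_1 y_0)^2}\,\frac{1}{1 + \sigma_2 x_0}\,\frac{1}{1 + d_2 z_0} - k_4, \tag{5.63}$$
$$J_{23} = \left.\frac{\partial \dot{y}}{\partial z}\right|_{(x_0, y_0, z_0)} = \frac{\gamma_1 y_0}{1 + \sigma_1 y_0}\,\frac{1}{1 + \sigma_2 x_0}\,\frac{-d_2}{(1 + d_2 z_0)^2}, \tag{5.64}$$
$$J_{31} = \left.\frac{\partial \dot{z}}{\partial x}\right|_{(x_0, y_0, z_0)} = \frac{a_1 z_0}{1 + b_1 z_0}\,\frac{-b_2}{(1 + b_2 (x_0 + y_0))^2}, \tag{5.65}$$
$$J_{32} = \left.\frac{\partial \dot{z}}{\partial y}\right|_{(x_0, y_0, z_0)} = \frac{a_1 z_0}{1 + b_1 z_0}\,\frac{-b_2}{(1 + b_2 (x_0 + y_0))^2}, \tag{5.66}$$
$$J_{33} = \left.\frac{\partial \dot{z}}{\partial z}\right|_{(x_0, y_0, z_0)} = \frac{a_1}{(1 + b_1 z_0)^2}\,\frac{1}{1 + b_2 (x_0 + y_0)} - k_1. \tag{5.67}$$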
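1. The Jacobian matrix at the equilibrium state $(x_e, 0, 0) = \left(\frac{\alpha_1 - k_3}{k_3 \beta_1}, 0, 0\right)$ is
$$\mathbf{J}_{(x_e,0,0)} = \begin{pmatrix} \frac{\alpha_1}{(1 + \beta_1 x_e)^2} - k_3 & -\frac{\alpha_1 \beta_2 x_e}{1 + \beta_1 x_e} & -\frac{\alpha_1 d_2 x_e}{1 + \beta_1 x_e} \\ 0 & \frac{\gamma_1}{1 + \sigma_2 x_e} - k_4 & 0 \\ 0 & 0 & \frac{a_1}{1 + b_2 x_e} - k_1 \end{pmatrix}. \tag{5.68}$$
Since $x_e \neq 0$, (5.51) with $y = z = 0$ gives $k_3 = \alpha_1/(1 + \beta_1 x_e)$, so the eigenvalues of the Jacobian matrix are
$$\lambda_1 = \frac{\alpha_1}{(1 + \beta_1 x_e)^2} - k_3 = \frac{\alpha_1}{(1 + \beta_1 x_e)^2} - \frac{\alpha_1}{1 + \beta_1 x_e} = -\frac{\alpha_1 \beta_1 x_e}{(1 + \beta_1 x_e)^2}, \tag{5.69}$$
$$\lambda_2 = \frac{\gamma_1}{1 + \sigma_2 x_e} - k_4, \tag{5.70}$$
$$\lambda_3 = \frac{a_1}{1 + b_2 x_e} - k_1. \tag{5.71}$$
Clearly $\lambda_1 < 0$, and $\lambda_2 < 0$ since $(x_e, 0) = \left(\frac{\alpha_1 - k_3}{k_3 \beta_1}, 0\right)$ is a stable state of the X-Y system. Thus, this equilibrium state is stable if
$$\frac{a_1}{1 + b_2 x_e} < k_1. \tag{5.72}$$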
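2. The Jacobian matrix at the equilibrium state $(0, y_e, 0) = \left(0, \frac{\gamma_1 - k_4}{k_4 \sigma_1}, 0\right)$ is
$$\mathbf{J}_{(0,y_e,0)} = \begin{pmatrix} \frac{\alpha_1}{1 + \beta_2 y_e} - k_3 & 0 & 0 \\ -\frac{\sigma_2 \gamma_1 y_e}{1 + \sigma_1 y_e} & \frac{\gamma_1}{(1 + \sigma_1 y_e)^2} - k_4 & -\frac{d_2 \gamma_1 y_e}{1 + \sigma_1 y_e} \\ 0 & 0 & \frac{a_1}{1 + b_2 y_e} - k_1 \end{pmatrix}. \tag{5.73}$$
Since $y_e \neq 0$, (5.52) with $x = z = 0$ gives $k_4 = \gamma_1/(1 + \sigma_1 y_e)$, so the eigenvalues of the Jacobian matrix are
$$\lambda_1 = \frac{\alpha_1}{1 + \beta_2 y_e} - k_3, \tag{5.74}$$
$$\lambda_2 = \frac{\gamma_1}{(1 + \sigma_1 y_e)^2} - k_4 = \frac{\gamma_1}{(1 + \sigma_1 y_e)^2} - \frac{\gamma_1}{1 + \sigma_1 y_e} = -\frac{\gamma_1 \sigma_1 y_e}{(1 + \sigma_1 y_e)^2}, \tag{5.75}$$
$$\lambda_3 = \frac{a_1}{1 + b_2 y_e} - k_1. \tag{5.76}$$
Clearly $\lambda_2 < 0$, and $\lambda_1 < 0$ since $(0, y_e) = \left(0, \frac{\gamma_1 - k_4}{k_4 \sigma_1}\right)$ is a stable state of the X-Y system. Thus, this equilibrium state is stable if
$$\frac{a_1}{1 + b_2 y_e} < k_1. \tag{5.77}$$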
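3. The Jacobian matrix at the equilibrium state $(0, 0, z_e) = \left(0, 0, \frac{a_1 - k_1}{k_1 b_1}\right)$ is
$$\mathbf{J}_{(0,0,z_e)} = \begin{pmatrix} \frac{\alpha_1}{1 + d_2 z_e} - k_3 & 0 & 0 \\ 0 & \frac{\gamma_1}{1 + d_2 z_e} - k_4 & 0 \\ -\frac{a_1 b_2 z_e}{1 + b_1 z_e} & -\frac{a_1 b_2 z_e}{1 + b_1 z_e} & \frac{a_1}{(1 + b_1 z_e)^2} - k_1 \end{pmatrix}. \tag{5.78}$$
Since $z_e \neq 0$, (5.53) with $x = y = 0$ gives $k_1 = a_1/(1 + b_1 z_e)$, so the eigenvalues of the Jacobian matrix are
$$\lambda_1 = \frac{\alpha_1}{1 + d_2 z_e} - k_3, \tag{5.79}$$
$$\lambda_2 = \frac{\gamma_1}{1 + d_2 z_e} - k_4, \tag{5.80}$$
$$\lambda_3 = \frac{a_1}{(1 + b_1 z_e)^2} - k_1 = \frac{a_1}{(1 + b_1 z_e)^2} - \frac{a_1}{1 + b_1 z_e} = -\frac{a_1 b_1 z_e}{(1 + b_1 z_e)^2}. \tag{5.81}$$
Clearly $\lambda_3 < 0$. Thus, this equilibrium state is stable if
$$\frac{\alpha_1}{1 + d_2 z_e} < k_3 \quad\text{and}\quad \frac{\gamma_1}{1 + d_2 z_e} < k_4. \tag{5.83}$$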
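Theorem 5.2.5 reduces to sign checks on the diagonal entries of (5.68), (5.73) and (5.78). The sketch below evaluates these eigenvalues for the embedded model (5.50). The X-Y parameters are the Type 1 values from Table 5.1, while the Z-U values in `zu` are hypothetical, chosen only so that all three axis states come out stable; lowering $d_2$ shows how the $(0, 0, z_e)$ state can lose stability.

```python
def axis_states_3d(xy, zu):
    """Eigenvalues of (5.50) at (xe,0,0), (0,ye,0), (0,0,ze), from (5.69)-(5.81).

    xy = (alpha1, beta1, beta2, gamma1, sigma1, sigma2, k3, k4) of the X-Y module;
    zu = (a1, b1, b2, k1, d2), the Z-U parameters that enter (5.50).
    Assumes alpha1 > k3, gamma1 > k4 and a1 > k1 so that xe, ye, ze > 0.
    """
    al, be1, be2, ga, si1, si2, k3, k4 = xy
    a1, b1, b2, k1, d2 = zu
    xe = (al - k3) / (k3 * be1)
    ye = (ga - k4) / (k4 * si1)
    ze = (a1 - k1) / (k1 * b1)
    return {
        "(xe,0,0)": (-al * be1 * xe / (1 + be1 * xe) ** 2,
                     ga / (1 + si2 * xe) - k4,
                     a1 / (1 + b2 * xe) - k1),
        "(0,ye,0)": (al / (1 + be2 * ye) - k3,
                     -ga * si1 * ye / (1 + si1 * ye) ** 2,
                     a1 / (1 + b2 * ye) - k1),
        "(0,0,ze)": (al / (1 + d2 * ze) - k3,
                     ga / (1 + d2 * ze) - k4,
                     -a1 * b1 * ze / (1 + b1 * ze) ** 2),
    }


def stable_states(eigs):
    """Names of the equilibria whose eigenvalues are all negative."""
    return sorted(name for name, lams in eigs.items() if all(lam < 0 for lam in lams))


xy = (4.0252, 1.1393, 8.5862, 7.5202, 2.1524, 4.6084, 0.5924, 0.6515)  # Type 1, Table 5.1
zu = (4.0252, 1.1393, 8.5862, 0.5924, 4.6084)                          # hypothetical Z-U values
print(stable_states(axis_states_3d(xy, zu)))
print(stable_states(axis_states_3d(xy, zu[:4] + (0.01,))))             # weak repression by z
```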
In addition, we give the following stability conditions for the equilibrium states of the form $(x^*, y^*, 0)$ with $x^*, y^* > 0$, which lie off the axes.
Theorem 5.2.6. Suppose $(x^*, y^*)$ is a stable state of the X-Y system. Then the equilibrium state $(x^*, y^*, 0)$ is also a stable state of the X-Y-Z system if
$$\frac{a_1}{1 + b_2 (x^* + y^*)} < k_1. \tag{5.84}$$
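Proof. The Jacobian matrix $\mathbf{J}_{(x,y,z)} = [J_{ij}]_{3\times 3}$ of the X-Y-Z system is defined in the proof of Theorem 5.2.5. Assume that $x^*$ and $y^*$ are both non-zero. Then the Jacobian matrix at $(x^*, y^*, 0)$ is
$$\mathbf{J}_{(x^*,y^*,0)} = \begin{pmatrix} \frac{\alpha_1}{(1 + \beta_1 x^*)^2 (1 + \beta_2 y^*)} - k_3 & -\frac{\alpha_1 \beta_2 x^*}{(1 + \beta_1 x^*)(1 + \beta_2 y^*)^2} & -\frac{\alpha_1 d_2 x^*}{(1 + \beta_1 x^*)(1 + \beta_2 y^*)} \\ -\frac{\gamma_1 \sigma_2 y^*}{(1 + \sigma_1 y^*)(1 + \sigma_2 x^*)^2} & \frac{\gamma_1}{(1 + \sigma_1 y^*)^2 (1 + \sigma_2 x^*)} - k_4 & -\frac{\gamma_1 d_2 y^*}{(1 + \sigma_1 y^*)(1 + \sigma_2 x^*)} \\ 0 & 0 & \frac{a_1}{1 + b_2 (x^* + y^*)} - k_1 \end{pmatrix}. \tag{5.85}$$
Substituting (5.17) and (5.18) into (5.85) and using the notation $\theta_x = 1 + \beta_1 x$, $\eta_y = 1 + \beta_2 y$, $\rho_y = 1 + \sigma_1 y$ and $\xi_x = 1 + \sigma_2 x$ from Theorem 5.2.3, we have
$$\mathbf{J}_{(x^*,y^*,0)} = \begin{pmatrix} \frac{\alpha_1 (1 - \theta_x)}{\theta_x^2 \eta_y} & -\frac{\alpha_1 \beta_2 x^*}{\theta_x \eta_y^2} & -\frac{\alpha_1 d_2 x^*}{\theta_x \eta_y} \\ -\frac{\gamma_1 \sigma_2 y^*}{\rho_y \xi_x^2} & \frac{\gamma_1 (1 - \rho_y)}{\rho_y^2 \xi_x} & -\frac{\gamma_1 d_2 y^*}{\rho_y \xi_x} \\ 0 & 0 & \frac{a_1}{1 + b_2 (x^* + y^*)} - k_1 \end{pmatrix}. \tag{5.86}$$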
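Since the third row of (5.86) has zeros off the diagonal, the eigenvalues of the Jacobian matrix (5.86) are
$$\lambda_1 = \frac{a_1}{1 + b_2 (x^* + y^*)} - k_1, \tag{5.87}$$
$$\lambda_{2,3} = \frac{-\Phi \pm \sqrt{\Phi^2 - \tau}}{2 \theta_x^2 \eta_y^2 \rho_y^2 \xi_x^2}, \tag{5.88}$$
where $\Phi$ and $\tau$ are defined by (5.48) and (5.49) in the proof of Theorem 5.2.3:
$$\Phi = \eta_y \xi_x \left(\rho_y^2 \xi_x \alpha_1 \beta_1 x + \theta_x^2 \eta_y \gamma_1 \sigma_1 y\right), \tag{5.89}$$
$$\tau = 4 \theta_x^2 \eta_y^2 \rho_y^2 \xi_x^2 \alpha_1 \gamma_1 x y \left(\beta_1 \sigma_1 \eta_y \xi_x - \beta_2 \sigma_2 \theta_x \rho_y\right). \tag{5.90}$$
Since $(x^*, y^*)$ is a stable state of the X-Y system and $\Phi$ is always positive, the proof of Theorem 5.2.3 shows that $\tau$ is positive. Hence the two eigenvalues $\lambda_2$ and $\lambda_3$ are negative or have negative real parts. Thus, the equilibrium state $(x^*, y^*, 0)$ is a stable state of the X-Y-Z system if
$$\frac{a_1}{1 + b_2 (x^* + y^*)} < k_1. \tag{5.91}$$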
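Similarly, Theorem 5.2.6 can be checked by combining $\lambda_1$ from (5.87) with the eigenvalues of the upper-left $2\times 2$ block of (5.86). In the sketch below, $(x^*, y^*)$ is the Type 2 nodal sink from Table 5.1 and the values of $a_1$, $b_2$ and $k_1$ are hypothetical; a small $k_1$ makes $\lambda_1$ positive and the planar state unstable in the embedded system.

```python
import cmath


def planar_state_eigenvalues(x, y, xy, a1, b2, k1):
    """Eigenvalues (5.87)-(5.88) of the Jacobian (5.86) at (x*, y*, 0) of model (5.50).

    (x, y) must be a positive equilibrium of the X-Y module with parameters
    xy = (alpha1, beta1, beta2, gamma1, sigma1, sigma2, k3, k4); k3 and k4 are
    eliminated through (5.17)-(5.18), so they are not used below.
    """
    al, be1, be2, ga, si1, si2, _k3, _k4 = xy
    lam1 = a1 / (1 + b2 * (x + y)) - k1                        # (5.87)
    theta, eta = 1 + be1 * x, 1 + be2 * y
    rho, xi = 1 + si1 * y, 1 + si2 * x
    # Upper-left 2x2 block of (5.86); the third row is (0, 0, lam1), so the
    # block's eigenvalues are the remaining eigenvalues of the full matrix.
    j11 = al * (1 - theta) / (theta**2 * eta)
    j12 = -al * be2 * x / (theta * eta**2)
    j21 = -ga * si2 * y / (rho * xi**2)
    j22 = ga * (1 - rho) / (rho**2 * xi)
    tr, det = j11 + j22, j11 * j22 - j12 * j21
    root = cmath.sqrt(tr * tr - 4 * det)
    return lam1, (tr + root) / 2, (tr - root) / 2


type2 = (8.0486, 2.0932, 0.3926, 8.4293, 1.1974, 4.5864, 1.2308, 0.3546)  # Table 5.1
lams = planar_state_eigenvalues(1.1683, 2.2871, type2, a1=4.0252, b2=8.5862, k1=0.5924)
print([complex(lam) for lam in lams])
```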
Theorem 5.2.5 and Theorem 5.2.6 describe the necessary conditions for the stability properties of the equilibria in the embedding X-Y-Z system. By applying these theorems, we can further constrain the estimated parameters obtained from the two-node systems so that the embedding system achieves tristability.
5.3. APPLICATION IN HEMATOPOIESIS
5.3 Application in hematopoiesis
5.3.1 Bistable models for GATA1-PU.1 and GATA-switching modules
For the two double-negative feedback loops with positive autoregulation in Figure 5.1C and Figure 5.1D, we develop two mathematical models, namely the Z-U module (5.9) and the X-Y module (5.10). These two models have the same structure but different model parameters. Theorem 5.2.1 shows that there are at most five non-negative equilibrium states in these models. Theorem 5.2.2 indicates that the two steady states located on the axes are stable under the given conditions. In addition, Theorem 5.2.3 gives the conditions under which the two possible steady states off the axes are stable.
We further search for stable steady states of the model with randomly sampled parameters. Table 5.1 gives three types of bistable steady states; however, we have not found any parameter samples that realize tristability. To test robustness, we conduct perturbation tests by examining the bistable property of the model with perturbed model parameters [59,60]. Computational results show that, starting from a model with two stable steady states located on the axes, we can find a perturbed bistable model in which one stable steady state is located on an axis and the other is located off the axes (see Table 5.2). These results suggest that the bistability of the developed model is robust to parameter variations.
We next use the approximate Bayesian computation (ABC) rejection algorithm [10,139] to estimate model parameters based on experimental data for erythropoiesis and granulopoiesis [151]. The data were obtained by single-molecule RNA fluorescent in situ hybridization (smFISH) on mouse stem cells derived from hematopoietic tissue, measuring the transcription dynamics of genes GATA1, GATA2 and PU.1 [151]. We first estimate the parameters of the X-Y module (5.10), which describes the regulation between genes GATA1 and PU.1. The prior distribution of each parameter is assumed to be uniform over the interval $[0, 100]$. The distance between the experimental data and the simulations is measured by
$$\rho(\mathbf{X}, \mathbf{X}^*) = \sum_{i=1}^{m} \left[\,|x_i - x_i^*| + |y_i - y_i^*|\,\right],$$
where $(x_i, y_i)$ and $(x_i^*, y_i^*)$ are the observed data and the simulated data of the model at time point $t_i$ for genes $(X, Y)$, respectively. Table 5.3 gives the estimated parameters of this module. Figure 5.3A shows the phase plane of the GATA1-PU.1 sub-system with the estimated parameters, which shows that this system is bistable.
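The ABC rejection step with the distance $\rho$ above can be sketched as follows. The toy exponential model, the prior bounds $[0, 2]$ and the tolerance `eps` are hypothetical stand-ins chosen so that the example runs quickly; in the application above, the simulator would be the X-Y module (5.10), the prior $U[0, 100]$ and the observations the smFISH data [151].

```python
import math
import random


def distance(observed, simulated):
    """rho(X, X*) = sum_i (|x_i - x_i*| + |y_i - y_i*|) over the m time points."""
    return sum(abs(xo - xs) + abs(yo - ys)
               for (xo, yo), (xs, ys) in zip(observed, simulated))


def abc_rejection(simulate, observed, n_params, bounds, eps, n_draws, seed=0):
    """ABC rejection sampling with an independent uniform prior on each parameter."""
    rng = random.Random(seed)
    lo, hi = bounds
    accepted = []
    for _ in range(n_draws):
        theta = [rng.uniform(lo, hi) for _ in range(n_params)]
        if distance(observed, simulate(theta)) <= eps:
            accepted.append(theta)
    return accepted


# Hypothetical toy model standing in for (5.10): x(t) = exp(-k t), y(t) = 1 - exp(-k t).
TIMES = range(11)


def toy_model(theta):
    k = theta[0]
    return [(math.exp(-k * t), 1.0 - math.exp(-k * t)) for t in TIMES]


data = toy_model([0.5])  # synthetic "observations" generated with k = 0.5
posterior = abc_rejection(toy_model, data, n_params=1, bounds=(0.0, 2.0),
                          eps=0.5, n_draws=5000)
k_mean = sum(th[0] for th in posterior) / len(posterior)
print(len(posterior), "accepted; posterior mean of k =", round(k_mean, 3))
```

Shrinking `eps` narrows the accepted set around the generating value at the cost of fewer accepted draws, which is the usual trade-off of the rejection variant of ABC.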
Regarding the Z-U module (5.9) that describes the regulation of GATA-switching, to be consistent with the module structure, we first assume that GATA1 and GATA2 form a double negative feedback module with autoregulation, and will modify this assumption later based on the experimentally observed mechanisms. Here the data for the auxiliary variable $U$ is the sum of GATA1 and PU.1. Table 5.4 gives the estimated parameters of the Z-U module.
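| $\alpha_1$ | $\beta_1$ | $\beta_2$ | $k_3$ | $\gamma_1$ | $\sigma_1$ | $\sigma_2$ | $k_4$ |
|---|---|---|---|---|---|---|---|
| 15.665 | 0.4263 | 0.9047 | 0.1587 | 89.4 | 1.0724 | 0.4535 | 0.752 |

Table 5.3: Estimated model parameter values for module X-Y.

| $a_1$ | $b_1$ | $b_2$ | $k_1$ | $c_1$ | $d_1$ | $d_2$ | $k_2$ |
|---|---|---|---|---|---|---|---|
| 16.5 | 0.6024 | 1.1 | 0.6090 | 4.2934 | 0.3340 | 3.6 | 0.2143 |

Table 5.4: Estimated model parameter values for module Z-U.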
An experimental study has identified GATA2 at chromatin sites in early-stage erythroblasts [14]; as erythropoiesis progresses and the expression level of GATA1 increases, GATA1 displaces GATA2 from these chromatin sites. To describe the mechanism of GATA-switching, we introduce an additional rate constant $k^*$ over a time interval $[t_1, t_2]$ for the displacement rate of GATA2 proteins during GATA-switching, given by
$$k^* = \begin{cases} k_0^*, & t \in [t_1, t_2],\\ 0, & \text{otherwise}. \end{cases} \tag{5.92}$$
As GATA2 proteins are displaced, the concentration of GATA1 proteins around the binding sites increases proportionally to $k^*$. Hence, we use the rate $\psi k^* z$ for the increase of GATA1 during GATA-switching, where $\psi$ is a control parameter that adjusts the availability of GATA1 proteins around the chromatin sites. The GATA-switching module is then modelled by
$$
\begin{aligned}
\frac{dz}{dt} &= \frac{a_1 z}{1 + b_1 z}\,\frac{1}{1 + b_2 u} - k_1 z - k^* z,\\
\frac{du}{dt} &= \frac{c_1 u}{1 + d_1 u}\,\frac{1}{1 + d_2 z} - k_2 u + \psi k^* z,
\end{aligned}
\tag{5.93}
$$
where $z$ and $u$ are the expression levels of GATA2 and GATA1, respectively. Note that with $k^* = 0$, model (5.93) reduces to (5.9) and retains the bistability of this module. Figure 5.3B gives two simulations, one of an unsuccessful switch and one of a successful switch. It is assumed that GATA-switching occurs over the interval $[t_1, t_2] = [500, 3500]$. Simulations show that an adequate displacement of GATA2 is the key to achieving GATA-switching, which requires a relatively large value of $k_0^* \le 1$.
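The switching mechanism (5.92)-(5.93) can be simulated with a time-dependent rate. The sketch below uses the Table 5.4 parameters and $[t_1, t_2] = [500, 3500]$, integrated with a non-autonomous RK4 scheme. The values of $k_0^*$ and $\psi$ are not given in the text, so the ones used here are hypothetical, and whether a particular run switches depends on them.

```python
def k_star(t, k0, t1=500.0, t2=3500.0):
    """Piecewise-constant displacement rate (5.92)."""
    return k0 if t1 <= t <= t2 else 0.0


# Z-U parameters from Table 5.4.
P = dict(a1=16.5, b1=0.6024, b2=1.1, k1=0.6090,
         c1=4.2934, d1=0.3340, d2=3.6, k2=0.2143)


def gata_rhs(t, z, u, k0, psi):
    """Right-hand side of (5.93); z = GATA2, u = GATA1."""
    ks = k_star(t, k0)
    dz = P["a1"] * z / (1 + P["b1"] * z) / (1 + P["b2"] * u) - P["k1"] * z - ks * z
    du = P["c1"] * u / (1 + P["d1"] * u) / (1 + P["d2"] * z) - P["k2"] * u + psi * ks * z
    return dz, du


def simulate(z0, u0, k0, psi, dt=0.1, t_end=5000.0):
    """Non-autonomous RK4 integration; returns the state (z, u) at t_end."""
    z, u = z0, u0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        a = gata_rhs(t, z, u, k0, psi)
        b = gata_rhs(t + dt / 2, z + dt / 2 * a[0], u + dt / 2 * a[1], k0, psi)
        c = gata_rhs(t + dt / 2, z + dt / 2 * b[0], u + dt / 2 * b[1], k0, psi)
        d = gata_rhs(t + dt, z + dt * c[0], u + dt * c[1], k0, psi)
        z += dt / 6 * (a[0] + 2 * b[0] + 2 * c[0] + d[0])
        u += dt / 6 * (a[1] + 2 * b[1] + 2 * c[1] + d[1])
    return z, u


z_e = (P["a1"] - P["k1"]) / (P["k1"] * P["b1"])    # HSC-like state (z_e, 0)
baseline = simulate(z_e, 0.0, k0=0.0, psi=0.0)       # no switching: stays at (z_e, 0)
switched = simulate(z_e, 0.0, k0=1.0, psi=1.0)       # hypothetical k0* and psi
print("baseline:", baseline)
print("with displacement:", switched)
```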
5.3.2 Tristable model of the GATA1-GATA2-PU.1 network
After successfully realizing bistability in the double-negative feedback loops with positive autoregulation, we next incorporate the GATA1-PU.1 regulatory module into the GATA-switching module to realize the tristability of HSC differentiation. We use the expression level of GATA1 in the GATA-switching module to represent the total level of GATA1 plus PU.1, and embed the two modules using model (5.50) (see Theorems 5.2.4 to 5.2.6). The model parameters take the same values as the corresponding parameters in the Z-U module or the X-Y module. Figure 5.4 gives the 3D phase portrait of the embedded system, which shows that the embedding model faithfully reproduces the three stable steady states of the two sub-modules. This also suggests that the proposed embedding method is a robust approach for developing higher-order multistable models from bistable models.
As mentioned in the previous subsection, the GATA-switching module is not a perfect double-negative feedback loop. In fact, experimental studies suggest that GATA2 moderately stimulates the expression of gene GATA1 [46] (shown in Figure 5.5). Thus, we modify model (5.50) by adding the term $d^* z$ to the first equation to represent a weak positive regulation from GATA2 to GATA1. In addition, to avoid zero basal gene expression levels, we add a constant to each equation of model (5.50). The modified model is given by
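$$
\begin{aligned}
\frac{dx}{dt} &= \alpha_0 + \frac{\alpha_1 x}{1 + \beta_1 x}\,\frac{1}{1 + \beta_2 y}\,\frac{1 + d^* z}{1 + d_2 z} - k_3 x + \psi k^* z,\\
\frac{dy}{dt} &= \gamma_0 + \frac{\gamma_1 y}{1 + \sigma_1 y}\,\frac{1}{1 + \sigma_2 x}\,\frac{1}{1 + d_2 z} - k_4 y,\\
\frac{dz}{dt} &= a_0 + \frac{a_1 z}{1 + b_1 z}\,\frac{1}{1 + b_2 (x + y)} - k_1 z - k^* z,
\end{aligned}
\tag{5.94}
$$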
where $x$, $y$ and $z$ represent the expression levels of genes GATA1, PU.1 and GATA2, respectively. The values of $\alpha_0$, $\gamma_0$, $a_0$ and $d^*$ are carefully selected so that the model simulation still matches the experimental data and the model has at least three stable steady states (see Table 5.5).

| $\alpha_0$ | $\gamma_0$ | $a_0$ | $d^*$ |
|---|---|---|---|
| 0.045 | 0.1 | 1 | 0.01 |

Table 5.5: Estimated additional model parameter values for the modified model.

Figure 5.3C gives the 3D phase portrait of system (5.94) with $k^* = 0$. Using the estimated parameters (see Table 5.3 to Table 5.5), the modified system (5.94) in fact achieves quad-stability. In three of the stable states, one of the three genes has a high expression level and the other two have low expression levels. The fourth stable state has low expression levels (2.3364, 0.7417, 8.6664) of all three genes. These are exactly the four transcriptional states that have been observed in experimental studies, namely a PU.1$^{\text{high}}$Gata1/2$^{\text{low}}$ state (P1H); a Gata1$^{\text{high}}$GATA2/PU.1$^{\text{low}}$ state (G1H); a Gata2$^{\text{high}}$GATA1/PU.1$^{\text{low}}$ state (G2H); and a state with low expression of all three genes (LES CMP) [151]. Compared with existing modelling studies, our embedding model (5.94) is the first to realize the state with low expression levels of all three genes. Note that the embedding model is based on the assumption of GATA-switching, namely the exchange of GATA1 for GATA2 at key chromatin sites, which controls the expression of genes GATA1 and GATA2. However, a low level of GATA2 at the chromatin sites does not mean that the total level of GATA2 in cells is also low. This may explain the difference between the simulated Gata1$^{\text{high}}$GATA2/PU.1$^{\text{low}}$ state (G1H), in which only GATA1 has high expression, and the experimentally observed Gata1/2$^{\text{high}}$PU.1$^{\text{low}}$ state (G1/2H), in which both GATA1 and GATA2 have high expression levels [151].
5.3.3 Stochastic model for realizing heterogeneity
Although the modified embedding model has successfully realized quad-stability, this deterministic model cannot describe the heterogeneity of cell fate commitment. Thus, the next question is whether a stochastic model can reproduce the experimental data showing different gene expression levels in single cells [151]. To answer this question, we propose a stochastic differential equation model in Itô form to describe the function of noise during cell lineage specification [155], given by
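$$
\begin{aligned}
dX(t) &= \left[\alpha_0 + \frac{\alpha_1 X(t)}{1 + \beta_1 X(t)}\,\frac{1}{1 + \beta_2 Y(t)}\,\frac{1 + d^* Z(t)}{1 + d_2 Z(t)} - k_3 X(t) + \psi k^* Z(t)\right] dt + \omega_1 \left(k_3 X(t) + \psi k^* Z(t)\right) dW_t^1,\\
dY(t) &= \left[\gamma_0 + \frac{\gamma_1 Y(t)}{1 + \sigma_1 Y(t)}\,\frac{1}{1 + \sigma_2 X(t)}\,\frac{1}{1 + d_2 Z(t)} - k_4 Y(t)\right] dt + \omega_2 k_4 Y(t)\, dW_t^2,\\
dZ(t) &= \left[a_0 + \frac{a_1 Z(t)}{1 + b_1 Z(t)}\,\frac{1}{1 + b_2 \left(X(t) + Y(t)\right)} - k_1 Z(t) - k^* Z(t)\right] dt + \omega_3 (k_1 + k^*) Z(t)\, dW_t^3,
\end{aligned}
\tag{5.95}
$$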
where W¹ₜ, W²ₜ and W³ₜ are three independent Wiener processes whose increments are Gaussian random variables ΔWₜ = W(t + Δt) − W(t) ∼ N(0, Δt), and ω₁, ω₂ and ω₃ represent noise strengths. The Itô form is chosen because, unlike the Stratonovich interpretation, it adds no noise-induced correction to the drift, so the drift of the stochastic system (5.95) coincides with the right-hand side of the corresponding deterministic system (5.94). To test the influence of GATA-switching on the transitions between different states, we introduce noise to the coefficient k∗ and consequently to the three degradation processes in the model.

[Figure 5.3 panels: (A) phase plane in X and Y with nullclines dx/dt = 0 and dy/dt = 0 and points a to e; (B) expression levels of GATA1 and GATA2 versus time (0 to 10000) in two runs; (C) 3D phase portrait in (X, Y, Z) with stable states (0.0288, 0.0038, 41.8227), (2.3364, 0.7417, 8.6664), (51.7224, 2.95867, 0.0459) and (0.2486, 91.5198, 0.0216).]
Figure 5.3: Realization of tristability by embedding two bistable sub-systems in hematopoiesis.
(A) Phase plane of the GATA1-PU.1 module showing the bistable property of the proposed model, where a and b are stable steady states and c, d and e are saddle states.
(B) Simulations of GATA-switching with model (5.93). Upper panel: an unsuccessful switching with a small value of k∗₀, because the displacement of GATA2 is not enough for cells to leave the HSC state (Z state). Lower panel: a successful switching with sufficient displacement of GATA2, using a large value of k∗₀; cells leave the HSC state and enter the U state.
(C) The 3D phase portrait of the modified embedding model (5.94) with k∗ = 0. The four red points are stable steady states, while the three black points are saddle states.

Figure 5.4: The 3D phase portrait of the embedded system. Based on the experimental data, the proposed model successfully realizes the tristability property, with the same parameter values as presented in Table 5.3 and Table 5.4. Red points: stable steady states; black points: saddle states.

[Figure 5.5 image: network diagram with nodes GATA1, PU.1 and GATA2.]
Figure 5.5: The network structure of GATA1-GATA2-PU.1. '→' and '⊣' denote activating and inhibiting regulations, respectively.
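To make the structure of (5.95) concrete, its drift and diffusion coefficients can be transcribed into a small Python function. This is a sketch under stated assumptions: the ordering (X, Y, Z) = (GATA1, PU.1, GATA2) is inferred from the ordering used in Table 5.7, the dictionary keys (`alpha0`, `k_star`, `d_star`, `omega1` and so on) are our own hypothetical labels for the symbols in (5.95), and the values of α₁, β₁, σ₁ and the other parameters from Tables 5.3 and 5.4 are not reproduced in this section, so the check below uses illustrative unit values only.

```python
import numpy as np


def sde_coefficients(state, p):
    """Drift and diagonal diffusion of the Ito SDE (5.95).

    state: (X, Y, Z); p: dict of parameters keyed by hypothetical names.
    Returns (drift, diffusion) as NumPy arrays of length 3.
    """
    X, Y, Z = state
    # Synthesis terms are products of a self-activation fraction and inhibition factors.
    f_x = ((p["alpha0"] + p["alpha1"] * X) / (1 + p["beta1"] * X)
           / (1 + p["beta2"] * Y)
           * (1 + p["d_star"] * Z) / (1 + p["d2"] * Z)
           - p["k3"] * X + p["psi"] * p["k_star"] * Z)
    f_y = ((p["gamma0"] + p["gamma1"] * Y) / (1 + p["sigma1"] * Y)
           / (1 + p["sigma2"] * X)
           / (1 + p["d2"] * Z)
           - p["k4"] * Y)
    f_z = ((p["a0"] + p["a1"] * Z) / (1 + p["b1"] * Z)
           / (1 + p["b2"] * (X + Y))
           - p["k1"] * Z - p["k_star"] * Z)
    # Noise enters through the degradation/displacement terms only.
    g_x = p["omega1"] * (p["k3"] * X + p["psi"] * p["k_star"] * Z)
    g_y = p["omega2"] * p["k4"] * Y
    g_z = p["omega3"] * (p["k1"] + p["k_star"]) * Z
    return np.array([f_x, f_y, f_z]), np.array([g_x, g_y, g_z])
```

Setting `omega1`, `omega2` and `omega3` to zero recovers a deterministic right-hand side; how k∗ is obtained from k∗₀ is defined with model (5.93) and is not modelled in this sketch.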
We use the semi-implicit Euler method to simulate the proposed model [134]. Figure 5.6 provides four stochastic simulations for four different types of cell fate commitment with model parameters k∗₀ = 0.52, ψ = 0.0005, ω₁ = 0.04 and ω₂ = ω₃ = 0.08. Figures 5.6A and 5.6B show two simulations of unsuccessful GATA switching when the displacement of GATA2 is not large enough. However, sufficient displacement of GATA2 can trigger successful GATA switching, which leads either to the GMP state with high expression levels of PU.1 (Figure 5.6C) or to the MEP state with high expression levels of GATA1 (Figure 5.6D).
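The variant of the semi-implicit Euler method is not spelled out in this section beyond the citation [134]. A common reading is a drift-implicit scheme, X_{n+1} = X_n + h[θ f(X_{n+1}) + (1 − θ) f(X_n)] + g(X_n) ΔW_n, in which the drift is treated implicitly and the diffusion explicitly. The sketch below assumes that form; the weight θ = 0.5, the fixed-point solver and the function name `semi_implicit_euler` are our assumptions, and [134] may use a different variant.

```python
import numpy as np


def semi_implicit_euler(drift, diffusion, x0, t_end, h, rng, theta=0.5,
                        tol=1e-12, max_iter=100):
    """Drift-implicit theta scheme for dX = f(X) dt + g(X) dW with diagonal noise.

    drift(x) and diffusion(x) return arrays shaped like x; dW ~ N(0, h) per component.
    The implicit equation is solved by fixed-point iteration (converges for small h).
    """
    n_steps = int(round(t_end / h))
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(h), size=x.size)  # independent Wiener increments
        known = x + (1.0 - theta) * h * drift(x) + diffusion(x) * dW
        y = known + theta * h * drift(x)  # explicit Euler step as initial guess
        for _ in range(max_iter):
            y_new = known + theta * h * drift(y)
            converged = np.max(np.abs(y_new - y)) < tol
            y = y_new
            if converged:
                break
        x = y
        path[n + 1] = x
    return path
```

The drift and diffusion of (5.95) can be passed as the two callables. With zero diffusion and linear decay f(x) = −kx, the step reduces to the trapezoidal rule, which gives a simple correctness check.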
To examine the heterogeneity of hematopoiesis under different displacement rates k∗₀ and ψ together, we generate 20000 stochastic simulations for each pair of k∗₀ and ψ values over the ranges [0.04, 1] and [0, 0.001], respectively. The ranges of k∗₀ and ψ are determined by numerical testing: if all stochastic simulations move to a single stable state for the given k∗₀ and ψ values, we adjust the lower and/or upper bound of the range so that simulations may move to different stable states. To show the boundary of the parameter space, we also keep certain sets of parameter values with which simulations move to one specific stable state. Figure 5.7A gives the proportions of simulations with successful switching among the 20000 simulations. When k∗₀ is between 0.1 and 0.2, the displacement of GATA2 is slow, which gives limited relief of the negative regulation of PU.1, while GATA1 increases gradually due to GATA-switching and the weak positive regulation from GATA2 to GATA1. Thus, nearly all cells choose the MEP state with high expression levels of GATA1. However, if k∗₀ is larger, the negative regulation from GATA2 to PU.1 is eliminated quickly, and the competition between GATA1 and PU.1 leads cells to different lineages. When k∗₀ is relatively large but ψ is relatively small, GATA1 increases slowly because of the small value of ψ in GATA-switching, whereas the negative regulation from GATA2 to PU.1 declines rapidly because of the large value of k∗₀. Thus, Figure 5.7B shows that the combination of larger k∗₀ and smaller ψ allows more cells to move to the GMP lineage with high expression levels of PU.1. If there is no winner in the competition between GATA1 and PU.1, the cell goes to the state with low expression levels of all three genes (namely LE3G). Figure 5.7C shows that, when k∗₀ is larger than 0.2, all four types of simulations shown in Figure 5.6 can occur for a given pair of k∗₀ and ψ values. We use a MATLAB package [11] to draw violin plots of the expression distributions of the three genes in three different cellular states. A violin plot combines a box plot with a kernel density plot that shows the peaks of the data. The violin plots in Figure 5.7D match the experimental observations shown in Fig. 1e of [151].
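The section does not state how the end point of each simulation is assigned to a cell state. As a hedged sketch, one simple rule is to label each end point by its nearest deterministic stable state (Euclidean distance, coordinates from Table 5.6), and then compute the quantities plotted in Figures 5.7A and 5.7B: the fraction with successful switching (G1H or P1H, following Figure 5.6) and the GMP:MEP ratio. The nearest-state rule and the function names `classify` and `summarise` are assumptions, not the thesis's procedure.

```python
import numpy as np

# Stable states (GATA1, PU.1, GATA2) of model (5.94), as reported in Table 5.6.
STABLE_STATES = {
    "G1H": (51.7224, 2.9587, 0.0459),   # MEP
    "P1H": (0.2486, 91.5198, 0.0216),   # GMP
    "G2H": (0.0288, 0.0038, 41.8227),   # HSC
    "LE3G": (2.3364, 0.7417, 8.6664),   # low expression of all three genes
}


def classify(end_points):
    """Label each simulated end point (GATA1, PU.1, GATA2) by its nearest stable state."""
    names = list(STABLE_STATES)
    centres = np.array([STABLE_STATES[name] for name in names])
    pts = np.atleast_2d(np.asarray(end_points, dtype=float))
    dist = np.linalg.norm(pts[:, None, :] - centres[None, :, :], axis=2)
    return [names[i] for i in np.argmin(dist, axis=1)]


def summarise(labels):
    """Fractions per state, switching success (G1H or P1H) and the GMP:MEP ratio."""
    n = len(labels)
    counts = {name: labels.count(name) for name in STABLE_STATES}
    gmp, mep = counts["P1H"], counts["G1H"]
    return {
        "fractions": {name: c / n for name, c in counts.items()},
        "switching_success": (gmp + mep) / n,
        "gmp_to_mep": gmp / mep if mep else float("inf"),
    }
```

In a parameter sweep, `summarise(classify(end_points))` would be evaluated once per (k∗₀, ψ) pair over its 20000 end points.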
[Figure 5.6 panels A to D: expression levels of GATA1, GATA2 and PU.1 versus time (0 to 10000).]
Figure 5.6: Stochastic simulations showing four stable states that correspond to the four different experimentally observed states.
(A) Simulation of unsuccessful GATA switching that makes the cell stay at the HSC state, which is the G2H state.
(B) Simulation of unsuccessful GATA switching but the cell enters the state with low expression of all three genes, which is the LES CMP state.
(C) Simulation of successful switching that leads to the GMP state with high expression levels of PU.1, which is the P1H state.
(D) Simulation of successful switching that leads to the MEP state with high expression levels of GATA1, which is the G1H state.
Regarding the size of the basins of attraction, we first calculate the distances between the stable states and the saddle points in Figure 5.3C, which are given in Table 5.6. The minimal distance between the G1H state and the three saddle points (38.4095) is much larger than the minimal distances of the other three stable states to the saddle points (between 10.3449 and 13.6010), which suggests that the basin of attraction of the G1H state is larger than those of the other three stable states. In addition, we examine the variability of the stable states in 20000 stochastic simulations. Table 5.7 shows that the variations of GATA1 in the G1H state are much larger than those of the other two genes in their high-expression states.
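The distances in Table 5.6 are consistent with plain Euclidean distances in (GATA1, PU.1, GATA2) space. The sketch below recomputes them from the coordinates in the table and picks the stable state farthest from its nearest saddle, which is the heuristic proxy for the largest basin used above. It is a check on the reported numbers rather than a basin computation; the names `STABLE`, `SADDLES` and `saddle_distances` are ours.

```python
import numpy as np

STABLE = {
    "G1H": (51.7224, 2.9587, 0.0459),
    "P1H": (0.2486, 91.5198, 0.0216),
    "G2H": (0.0288, 0.0038, 41.8227),
    "LE3G": (2.3364, 0.7417, 8.6664),
}
SADDLES = np.array([
    (0.3170, 0.0100, 31.4818),
    (0.6179, 78.1130, 0.0265),
    (13.3231, 2.7597, 0.9070),
])


def saddle_distances(stable=STABLE, saddles=SADDLES):
    """Euclidean distance from every stable state (rows) to every saddle (columns)."""
    names = list(stable)
    points = np.array([stable[n] for n in names])
    dist = np.linalg.norm(points[:, None, :] - saddles[None, :, :], axis=2)
    return names, dist


names, dist = saddle_distances()
min_dist = dict(zip(names, dist.min(axis=1)))
# The state farthest from its nearest saddle is the candidate for the largest basin.
widest = max(min_dist, key=min_dist.get)
```

A minimal distance only bounds the basin locally; a direct estimate would count how many initial conditions converge to each state.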
We also study the relative frequency of the LE3G state. Figure 5.8 shows that, for a fixed value of ψ, the frequency increases as k∗₀ increases. In addition, for a fixed value of k∗₀, the frequency decreases as ψ increases. The frequency is much more sensitive to variation in ψ than to variation in k∗₀. For the simulations shown in Figure 5.7D, the frequency is 0.1080 with k∗₀ = 0.52 and ψ = 0.0005. Figures 5.7D and 5.8 suggest that more cells stay at the LE3G or P1H (GMP) state if GATA2 leaves the chromatin site fast (i.e. a large k∗₀ value) and the expression of GATA1 is slow (i.e. a small ψ value). However, if the expression of GATA1 is fast (i.e. a large ψ value), more cells transit to the G1H (MEP) state and the frequency of the LE3G state is low, which is consistent with the results of a recent study [31].
5.4 Summary
Waddington’s epigenetic landscape is a famous metaphor for how gene regulation drives cell development. Marbles symbolise cells that roll downhill over a landscape of bifurcating valleys. Each new valley represents a possible cell fate, while the ridges between the valleys maintain the cell fate once it has been determined [40]. Inspired by Waddington’s epigenetic landscape model, we assume that a multistable system makes a series of binary decisions when selecting among multiple evolutionary pathways. Compared with multistable networks, it is relatively easy to develop models with bistability, and there is a rich literature on bistable networks [37,39,71,84,107,118,135]. Our proposed embedding method therefore offers a novel and effective approach for developing multistable models from well-studied models with bistable properties. In addition, using cell fate commitment in hematopoiesis as the test problem, we have successfully realized tristability in the GATA-PU.1 module by embedding two bistable modules together. More importantly, after modifying the model according to experimentally determined regulatory mechanisms, the developed model, which has no high cooperativity coefficients, successfully realizes four stable states that have been observed in a recent experimental study [151].
[Figure 5.7 panels: (A) to (C) maps over the parameter plane (k∗₀, ψ), with ψ in units of 10⁻³; (D) violin plots of expression level by type of cells for GATA2, GATA1 and PU.1.]
Figure 5.7: Distributions of different cell types derived from stochastic simulations.
(A) Frequencies of cells with successful switching for each pair of parameters (k∗₀, ψ).
(B) Ratios of GMP cells to MEP cells among the cells with successful switching in (A), for each pair of parameters (k∗₀, ψ).
(C) Parameter pairs (k∗₀, ψ) that generate stochastic simulations reaching all four steady states shown in Figure 5.6 (yellow region) or only two or three states (blue region).
(D) Violin plots of the distributions of ln(expression level per cell + 1) for three genes in different cell states, derived from stochastic simulations with parameters k∗₀ = 0.52 and ψ = 0.0005.
Stable state                     Saddle (0.3170, 0.0100, 31.4818)   Saddle (0.6179, 78.1130, 0.0265)   Saddle (13.3231, 2.7597, 0.9070)
G1H (51.7224, 2.9587, 0.0459)    60.3277                            90.8837                            38.4095
P1H (0.2486, 91.5198, 0.0216)    96.7667                            13.4119                            89.7222
G2H (0.0288, 0.0038, 41.8227)    10.3449                            88.5907                            43.1095
LE3G (2.3364, 0.7417, 8.6664)    22.9163                            77.8712                            13.6010
Table 5.6: Distances between the four stable states (rows) and the three saddle points (columns) shown in the phase portrait of Figure 5.3C - Related to Results.
[Figure 5.8 image: relative frequency of the LE3G state (0 to 0.6) versus the value of ψ (×10⁻³, 0 to 1), with one curve for each k∗₀ from 0.2 to 1.0 in steps of 0.1.]
Figure 5.8: The relative frequency of the LE3G state with different values of k∗₀ - Related to Results.
Gene    Value                    G1H (MEP) state   P1H (GMP) state   G2H (HSC) state   LE3G state
GATA1   Deterministic solution   51.7224           0.2486            0.0288            2.3364
        Min                      34.8137           0.1344            0.0243            1.5030
        Max                      77.0405           1.0789            0.0374            4.2895
PU.1    Deterministic solution   2.9587            91.5298           0.0038            0.7414
        Min                      1.6972            70.6865           0.0026            0.4675
        Max                      4.3167            105.4327          0.0057            1.2441
GATA2   Deterministic solution   0.0459            0.0216            41.8227           8.6664
        Min                      0.0265            0.0180            36.5418           4.8823
        Max                      0.0868            0.0351            47.7615           12.7441
Table 5.7: The expression variations in stochastic simulations around the four stable states of the corresponding deterministic model - Related to Results. The deterministic solutions (GATA1, PU.1, GATA2) for the G1H, P1H, G2H and LE3G states are (51.7224, 2.9587, 0.0459), (0.2486, 91.5298, 0.0216), (0.0288, 0.0038, 41.8227) and (2.3364, 0.7414, 8.6664), respectively (also shown in Figure 5.3C). The minimal/maximal expression levels of each gene are obtained from 20000 stochastic simulations for each state.
6
Conclusions and open questions
Understanding the dynamical mechanism of genetic regulatory networks in cell fate determination during hematopoiesis is crucial for biologists seeking to control cell differentiation pathways. This is essential for preventing and treating many diseases caused mainly by genetic factors, such as leukemia. This thesis contributes novel and effective methods for describing and studying the dynamical properties of genetic regulation and for accurately reproducing experimental results. This doctoral work aims to develop mathematical and computational methods to better understand the detailed mechanism of cell fate determination based on experimental data [47,88,92,151]. This chapter consists of two parts. The first part reviews and summarises the contribution made by each study. The second part identifies the limitations of each study and raises some interesting questions for future research.
6.1 Conclusion
To infer the underlying regulatory network, Chapters 3 and 4 provide a novel method that combines top-down approaches (i.e. probabilistic graphical models) with bottom-up approaches (i.e. mathematical models). We first applied the Forward Search Algorithm (FSA) in Chapter 3 and the Extended Forward Search Algorithm (EFSA) in Chapter 4 to simplify the network topology and reduce the number of unknown parameters in the mathematical model. Then a genetic algorithm was used to estimate the unknown parameters. The combination of these two approaches reduced the simulation error and also improved the robustness of the mathematical model. We then reduced the network complexity by removing edges from the network, rather than starting from a core network and adding edges to it as in Tian's previous study [147]. The change from "adding edges" to "removing edges" in these two works is mainly due to the high computational cost of the "adding edge" tests, since the number of candidate edges in the "removing edge" test is much smaller than that in the "adding edge" test. Thus, in these two works, we used the FSA and EFSA, respectively, to obtain the candidate edges and then used the dynamic model to remove unimportant edges. If the number of potential regulations derived from the probabilistic graphical model is relatively large, removing a single regulation from the potential network may not change the simulation error at all. Numerical results suggested that several regulations should be removed simultaneously in order to change the simulation error, especially in the study of Chapter 4.
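The "removing edge" test described above can be sketched as greedy backward elimination over the candidate regulations. This is a generic, hypothetical illustration, not the thesis's exact procedure: `simulation_error` stands in for refitting the dynamic model (for example by the genetic algorithm) on a reduced network and returning its error, and the acceptance rule (keep a removal while the error stays within `tol` of the full network's error) is our assumption. The `group_size` argument removes several edges together, as the text suggests when single removals do not change the error.

```python
from itertools import combinations


def remove_edges(edges, simulation_error, tol=0.0, group_size=1):
    """Greedy backward elimination of candidate regulations (hypothetical sketch).

    edges: candidate regulations, e.g. ("GATA1", "PU.1") for GATA1 -> PU.1.
    simulation_error: callable taking a frozenset of edges and returning the
        error of the model refitted on that network.
    group_size: number of edges removed together in one test.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    current = frozenset(edges)
    baseline = simulation_error(current)  # error of the full candidate network
    while len(current) >= group_size:
        trials = [current - frozenset(group)
                  for group in combinations(sorted(current), group_size)]
        errors = [simulation_error(t) for t in trials]
        best = min(range(len(trials)), key=errors.__getitem__)
        if errors[best] > baseline + tol:
            break  # every remaining removal worsens the fit too much
        current = trials[best]
    return current
```

For example, with a toy error that only penalises missing "true" regulations, the redundant candidates are pruned and the true ones survive.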
The inferred regulatory networks from our proposed methods are partially supported by experimental observations. For example, the regulation of the GATA1-GATA2-PU.1 complex in our inferred networks agrees with the experimental results [88]. The GATA1-PU.1 heterodimer plays an important role in regulating hematopoiesis [161], and it is also included in our inferred model. In addition, the Ldb1-Lmo2 dimer is significantly activated during the erythroid differentiation process [157], which is consistent with our prediction. However, not all of the predictions can be confirmed by existing experimental observations, especially the regulations from protein heterodimers and/or synergistic effects to genes. The first explanation is that the nonlinear terms in our mathematical model are introduced by a mathematical operation (i.e. the Taylor series). Some of these nonlinear terms may be needed for realizing the nonlinear dynamics accurately without being supported by biological mechanisms. Note that another inference method, called the semi-supervised method, can include the validated regulations first and then infer the regulations that have not yet been validated [83]. Secondly, our inferred regulatory network may predict some potential regulations between genes and from nonlinear terms to genes, which may be confirmed by future experimental studies. Thus, the first contribution of these two chapters is that the inferred regulations may provide testable predictions for further experimental studies exploring the detailed mechanism of hematopoiesis. Another contribution is that, in Chapter 4, we not only use EFSA to predict network structures, but also use the second-order truncated Taylor expansion as a dynamic model to study detailed regulatory mechanisms. The proposed EFSA provides a method for inferring network structures that include genes, protein heterodimers and/or synergistic effects. In addition, the truncated Taylor expansion gives our dynamic model the ability to describe a nonlinear system while having relatively few unknown parameters, as a linear model does. Therefore, it provides a new approach to modelling genetic regulatory networks. The proposed method can be applied to model other regulatory networks and biological systems as well.
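The parameter economy of a truncated second-order expansion comes from keeping only the terms that the inferred network retains. The exact form used in Chapter 4 is not reproduced in this section, so the following is one plausible parametrization, assumed for illustration: a constant, sparse linear terms, sparse quadratic terms (standing in for protein heterodimers or synergistic effects) and a linear degradation term. The function names and the separate decay term are hypothetical.

```python
import numpy as np


def quadratic_network_rhs(x, const, linear, quadratic, decay):
    """dx_i/dt = const_i + sum_j linear[i][j] x_j
                 + sum_(j,k) quadratic[i][(j, k)] x_j x_k - decay_i x_i

    `linear` and `quadratic` are sparse dicts, so only regulations kept in the
    inferred network carry parameters.
    """
    x = np.asarray(x, dtype=float)
    dx = np.asarray(const, dtype=float) - np.asarray(decay, dtype=float) * x
    for i, row in linear.items():
        for j, coeff in row.items():
            dx[i] += coeff * x[j]
    for i, row in quadratic.items():
        for (j, k), coeff in row.items():
            dx[i] += coeff * x[j] * x[k]  # heterodimer / synergistic term
    return dx


def count_parameters(n, linear, quadratic):
    """Parameters of the sparse model versus a full second-order expansion in n genes."""
    sparse = 2 * n + sum(len(r) for r in linear.values()) + sum(len(r) for r in quadratic.values())
    full = 2 * n + n * n + n * n * (n + 1) // 2
    return sparse, full
```

For ten genes a full expansion of this form would need 670 parameters, which is why pruning the network before parameter estimation matters.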
In Chapter 5, we first proposed an embeddedness principle and then used the toggle switch as a test system to examine the effectiveness of our methodology. We then selected the core driver of cell fate commitment, GATA1-GATA2-PU.1, as the second test system to study cell fate determination in hematopoiesis. Despite the assumption of a binary choice in each sub-module, the developed model is able to realize a rich variety of dynamics. Our research suggests that, depending on the properties of the bistable systems, the embedding model of two bistable modules may have more than three stable steady states. In addition, with the embedding method in Figure 5.1, the state U is not a meta-stable state but actually disappears from the system. Simulations show that, when the system leaves the high GATA2 expression state due to GATA-switching, genes GATA1 and PU.1 begin to increase their expression levels. Each stochastic simulation reaches one of the steady states with either high GATA1 levels or high PU.1 levels, or returns to the stem cell state. These simulations are consistent with the CLOUD-HSPC model, in which differentiation is a process where uncommitted cells in transitory states gradually acquire uni-lineage priming [47,70,142]. In addition, stochastic simulations demonstrate that noise plays a key role in determining different cell differentiation pathways. Therefore, our proposed model contributes a novel approach to developing mathematical models that realise multistability and heterogeneity in complex systems.
6.2 Limitations of study and open questions
As discussed in Chapters 3 and 4, this work raised a number of important issues in the study of genetic regulation. One problem is that our nonlinear model cannot fit all the expression data very well if there is noise in the data, since noise in expression data may increase the simulation error of the proposed model. This is a challenging issue in mathematical modelling when the noise level in the expression data is large, because large variations in the data may lead to incorrect inference results. In that case, stochastic modelling may be a more appropriate approach to describe the noise in gene expression data [23,34,133]. However, analytic solutions are difficult to find for high-dimensional stochastic models. Thus, the development and convergence analysis of numerical schemes for such large stochastic systems with network topological information is essential. This is also an open question for future research. In addition, the Gaussian graphical model is based on a covariance matrix, which only measures linear relationships between different components. Other approaches, such as mutual information and conditional mutual information, have been used to measure both linear and nonlinear correlations in gene expression data [162,163,165]. The question is how to develop a graphical model with nonlinear information. Moreover, this research determines the regulatory mechanisms based on numerical simulation and robustness properties. More information from experimental studies will be important to improve the accuracy of the model and make more reasonable predictions. We may also use other key criteria to select mathematical models, such as Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors [55]. Another problem is how to speed up edge deletion or addition in the selection of Gaussian graphical models, since network model selection is slow when either adding or deleting edges. Finally, there is a high computational cost in using a genetic algorithm for model parameter estimation. Exploring other efficient inference methods is necessary if we want to apply our method to large regulatory networks with more genes and their protein heterodimers and/or synergistic effects. All these issues will be interesting topics for future research.
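As an illustration of the model-selection criteria mentioned above, AIC and BIC can be computed directly from a least-squares fit. The sketch assumes independent Gaussian errors, for which both criteria reduce (up to an additive constant shared by all models fitted to the same data) to n ln(RSS/n) plus a complexity penalty. The function name and the example numbers are hypothetical; lower values indicate the preferred network.

```python
import math


def information_criteria(rss, n_obs, n_params):
    """AIC and BIC of a least-squares fit with i.i.d. Gaussian errors.

    Shared additive constants are dropped, so values are only comparable
    between models fitted to the same data.
    """
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    log_term = n_obs * math.log(rss / n_obs)
    aic = log_term + 2 * n_params
    bic = log_term + n_params * math.log(n_obs)
    return aic, bic


# Two hypothetical candidate networks fitted to the same 40 observations:
aic_a, bic_a = information_criteria(rss=12.0, n_obs=40, n_params=10)
aic_b, bic_b = information_criteria(rss=11.5, n_obs=40, n_params=14)
```

Here the larger network fits slightly better but is penalised for its four extra parameters, so both criteria prefer the smaller one; BIC penalises extra parameters more heavily than AIC once n_obs exceeds about 7.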
In Chapter 5, even though the proposed model has successfully achieved multistability, some equilibrium states have gene expression levels at zero, which differs from experimental observations. The first issue is therefore the development of mathematical models that can reproduce experimental data accurately. In addition, this work uses differential equation models to determine stable states and then employs the corresponding stochastic models to realize the role of noise in determining the fluctuations of gene expression. However, this continuous stochastic process differs from the discrete nature of the gene expression process. The challenge is how to determine conditions for realizing multistable properties in stochastic models with discrete bursting processes. Existing gene expression bursting models use fixed constants to represent the time step at which bursting occurs and the amount of transcription per burst. A more realistic scenario is that the time at which each burst occurs and the transcription size of the burst are not constants but functions of the propensity of each chemical reaction. We therefore need to move beyond the assumption that the time interval between bursts and the amount of transcription are constants, and instead model the bursting process in a functional form. However, the question is how to determine the distributions of the time steps and the burst sizes. Moreover, in this study the stable states are achieved by a model without high cooperativity (i.e. Hill coefficient n = 1). Recently, the dynamics of toggle triads with self-activation have attracted much attention [31,160]. Mathematical models with high cooperativity have been developed to achieve pentastability, including a hybrid X/Y state with high X, high Y and low Z. We tried to realise pentastability with our proposed model using high cooperativity (n = 2 or 3), but the numerical tests were not successful. Thus, high cooperativity in self-activation may be essential for realising pentastability. Lastly, as introduced in Chapter 2, HSCs have the capacity to differentiate into all blood cells, and different cell types can be considered as different equilibrium states within a system. Therefore, hematopoiesis is an ideal test system for developing mathematical models with multistable dynamics. We achieved tristability and even quad-stability by embedding two bistable systems, which produces results comparable to those obtained in biological experiments [151]. However, the hematopoietic system is complex, and modelling the entire system directly is difficult. If we can identify bistable modules within the hematopoietic system and find the connections between them, our approach could in principle construct a higher-order multistable system by embedding more bistable systems, thereby allowing us to describe the mechanism of cell fate determination in hematopoiesis. The important question is how to embed more modules with more TFs to develop mathematical models with more stable states. All these questions will be interesting topics for further research. A recent study designed a powerful approach to study stochastic transitions using the energy landscape [57]. This is also an interesting and important topic for future study.
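The proposal that burst timing should follow from reaction propensities rather than be a fixed constant can be illustrated with a Gillespie-type stochastic simulation, in which waiting times are exponential with rate equal to the current total propensity. The sketch below is hypothetical: it models a single gene, assumes geometric burst sizes (a common modelling choice, not one taken from this thesis), and leaves the burst propensity as a user-supplied function of the current copy number, which is exactly the part the open question above concerns.

```python
import numpy as np


def simulate_bursty_gene(burst_propensity, mean_burst, decay, x0, t_end, rng):
    """Stochastic simulation of bursty expression of one gene (hypothetical sketch).

    Events: a burst with propensity burst_propensity(x) that adds a geometric
    number of molecules (support 1, 2, ...; mean `mean_burst` >= 1), or the
    degradation of one molecule with propensity decay * x. Waiting times are
    exponential with rate equal to the total propensity, so burst timing
    depends on the current state rather than on a fixed time step.
    """
    t, x = 0.0, int(x0)
    times, counts = [t], [x]
    p_geom = 1.0 / mean_burst
    while True:
        a_burst = burst_propensity(x)
        a_total = a_burst + decay * x
        if a_total <= 0.0:
            break  # no reaction can fire
        t += rng.exponential(1.0 / a_total)
        if t > t_end:
            break
        if rng.random() * a_total < a_burst:
            x += int(rng.geometric(p_geom))  # burst of transcription
        else:
            x -= 1  # degradation of one molecule
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)
```

With a constant burst propensity k, mean burst size b and decay rate γ, the long-run average copy number is kb/γ; a state-dependent propensity such as `lambda x: 2.0 / (1.0 + x / 20.0)` (a hypothetical self-repression) makes burst frequency respond to the current expression level.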
Bibliography
[1] Ackers, G. K., Johnson, A. D., and Shea, M. A. (1982). Quantitative model for gene
regulation by lambda phage repressor. Proc. Natl. Acad. Sci. U.S.A., 79(4):1129–1133.
[2] Aggarwal, R., Lu, J., Pompili, V. J., and Das, H. (2012). Hematopoietic stem cells:
transcriptional regulation, ex vivo expansion and clinical application. Curr. Mol. Med.,
12(1):34–49.
[3] Ali Al-Radhawi, M., Del Vecchio, D., and Sontag, E. D. (2019). Multi-modality in gene
regulatory networks with slow promoter kinetics. PLoS Comput. Biol., 15(2):1–27.
[4] Angeli, D., Ferrell, J. E., and Sontag, E. D. (2004). Detection of multistability, bifurcations,
and hysteresis in a large class of biological positive-feedback systems. Proc. Natl.
Acad. Sci. U.S.A., 101(7):1822–1827.
[5] Apri, M., Molenaar, J., Gee, M. d., and Voorn, G. v. (2010). Efficient Estimation
of the Robustness Region of Biological Models with Oscillatory Behavior. PLoS ONE,
5(4):e9865.
[6] Baltimore, D. (1970). Viral RNA-dependent DNA Polymerase: RNA-dependent DNA
Polymerase in Virions of RNA Tumour Viruses. Nature, 226(5252):1209–1211.
[7] Banaji, M. and Pantea, C. (2018). The inheritance of nondegenerate multistationarity
in chemical reaction networks. SIAM J. Appl. Math., 78(2):1105–1130.
[8] Bar-Joseph, Z., Gitter, A., and Simon, I. (2012). Studying and modelling dynamic
biological processes using time-series gene expression data. Nat. Rev. Genet., 13(8):552–
564.
[9] Bastiaansen, R., Jaïbi, O., Deblauwe, V., Eppinga, M. B., Siteur, K., Siero, E., Mermoz,
S., Bouvet, A., Doelman, A., and Rietkerk, M. (2018). Multistability of model and real
dryland ecosystems through spatial self-organization. Proc. Natl. Acad. Sci. U.S.A.,
115(44):11256–11261.
[10] Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate bayesian
computation in population genetics. Genetics, 162(4):2025–2035.
[11] Bechtold, B. (2015). Violin Plots for Matlab.
[12] Birbrair, A. and Frenette, P. S. (2016). Niche heterogeneity in the bone marrow. Ann.
N. Y. Acad. Sci., 1370(1):82–96.
[13] Bokes, P., King, J. R., and Loose, M. (2009). A bistable genetic switch which does not
require high co-operativity at the promoter: a two-timescale model for the PU.1-GATA-1
interaction. Math. Med. Biol., 26(2):117–32.
[14] Bresnick, E. H., Lee, H.-Y., Fujiwara, T., Johnson, K. D., and Keles, S. (2010). Gata
switches as developmental drivers. J. Biol. Chem., 285(41):31087–31093.
[15] Bruijn, M. d. and Dzierzak, E. (2017). Runx transcription factors in the development
and function of the definitive hematopoietic system. Blood, 129(15):2061–2069.
[16] Cedar, H. and Bergman, Y. (2011). Epigenetics of haematopoietic cell development.
Nat. Rev. Immunol., 11(7):478–88.
[17] Chang, A. N., Cantor, A. B., Fujiwara, Y., Lodish, M. B., Droho, S., Crispino, J. D.,
and Orkin, S. H. (2002). GATA-factor dependence of the multitype zinc-finger protein
FOG-1 for its essential role in megakaryopoiesis. Proc. Natl. Acad. Sci. U.S.A.,
99(14):9237–9242.
[18] Chang, H. H., Oh, P. Y., Ingber, D. E., and Huang, S. (2006). Multistable and
multistep dynamics in neutrophil differentiation. BMC Cell Biol., 7(1):11.
[19] Chapra, S. C. (2012). Applied Numerical Methods with MATLAB for Engineers and
Scientists. McGraw-Hill.
[20] Chen, S. and Mar, J. C. (2018). Evaluating methods of inferring gene regulatory
networks highlights their lack of performance for single cell gene expression data. BMC
Bioinform., 19(1):232.
[21] Chickarmane, V., Enver, T., and Peterson, C. (2009). Computational modeling of the
hematopoietic erythroid-myeloid switch reveals insights into cooperativity, priming, and
irreversibility. PLoS Comput. Biol., 5(1):e1000268.
[22] Chipperfield, A. J., Fleming, P. J., and Fonseca, C. M. (1994). Genetic algorithm tools
for control systems engineering. In Proceedings of Adaptive Computing in Engineering
Design and Control, volume 128, page 133.
[23] Chowdhury, A. R., Chetty, M., and Evans, R. (2015). Stochastic S-system modeling
of gene regulatory network. Cogn. Neurodyn., 9(5):535–547.
[24] Craciun, G., Tang, Y., and Feinberg, M. (2006). Understanding bistability in complex
enzyme-driven reaction networks. Proc. Natl. Acad. Sci. U.S.A., 103(23):8697–8702.
[25] Crick, F. (1970). Central Dogma of Molecular Biology. Nature, 227(5258):561–563.
[26] Crick, F. H. (1958). On protein synthesis. In The Symposia of the Society for Experimental
Biology 12, pages 138–163.
[27] Csete, M. E. and Doyle, J. C. (2002). Reverse engineering of biological complexity.
Science, 295(5560):1664–1669.
[28] de Jong, H. (2002). Modeling and simulation of genetic regulatory systems: a literature
review. J. Comput. Biol., 9(1):67–103.
[29] Del Sol, A. and Jung, S. (2020). The importance of computational modeling in stem
cell research. Trends Biotechnol., 39(2):126–136.
[30] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the EM algorithm. J. R. Stat. Soc. Series. B Stat. Methodol.,
39(1):1–38.
[31] Duddu, A. S., Sahoo, S., Hati, S., Jhunjhunwala, S., and Jolly, M. K. (2020a). Multistability
in cellular differentiation enabled by a network of three mutually repressing
master regulators. J. R. Soc. Interface, 17(170):20200631.
[32] Duddu, S., Chakrabarti, R., Ghosh, A., and Shukla, P. C. (2020b). Hematopoietic
Stem Cell Transcription Factors in Cardiovascular Pathology. Front. Genet., 11:588602.
[33] Du, C., Smith-Miles, K., Lopes, L., and Tian, T. (2012). Mathematical modelling of
stem cell differentiation: the PU.1–GATA-1 interaction. J. Math. Biol., 64(3):449–468.
[34] El Samad, H., Khammash, M., Petzold, L., and Gillespie, D. (2005). Stochastic modelling
of gene regulatory networks. Int. J. Robust Nonlinear Control, 15(15):691–711.
[35] Eling, N., Morgan, M. D., and Marioni, J. C. (2019). Challenges in measuring and
understanding biological noise. Nat. Rev. Genet., 20(9):536–548.
[36] Euler, L. (1768). Institutiones calculi integralis. Lipsiae Et Berolini.
[37] Fang, X., Liu, Q., Bohrer, C., Hensel, Z., Han, W., Wang, J., and Xiao, J. (2018). Cell
fate potentials and switching kinetics uncovered in a classic bistable genetic switch. Nat.
Commun., 9(1):2787.
[38] Feliu, E., Rendall, A. D., and Wiuf, C. (2020). A proof of unlimited multistability for
phosphorylation cycles. Nonlinearity, 33(11):5629–5658.
[39] Feng, J., Kessler, D. A., Ben-Jacob, E., and Levine, H. (2014). Growth feedback as a
basis for persister bistability. Proc. Natl. Acad. Sci. U.S.A., 111(1):544–549.
[40] Ferrell, J. E. (2012). Bistability, bifurcations, and Waddington’s epigenetic landscape.
Curr. Biol., 22(11):R458–R466.
[41] Friedman, A. D. (2007). Transcriptional control of granulocyte and monocyte development.
Oncogene, 26(47):6816–6828.
[42] Gardner, T. S., Cantor, C. R., and Collins, J. J. (2000). Construction of a genetic
toggle switch in Escherichia coli. Nature, 403(6767):339–342.
[43] Gekas, C., Rhodes, K. E., Gereige, L. M., Helgadottir, H., Ferrari, R., Kurdistani,
S. K., Montecino-Rodriguez, E., Bassel-Duby, R., Olson, E., Krivtsov, A. V., Armstrong,
S., Orkin, S. H., Pellegrini, M., and Mikkola, H. K. A. (2009). Mef2C is a
lineage-restricted target of Scl/Tal1 and regulates megakaryopoiesis and B-cell homeostasis.
Blood, 113(15):3461–3471.
[44] Gelens, L., Beri, S., Sande, G. V. d., Mezosi, G., Sorel, M., Danckaert, J., and Verschaffelt,
G. (2009). Exploring multistability in semiconductor ring lasers: theory and
experiment. Phys. Rev. Lett., 102(19):193904.
[45] Goardon, N., Lambert, J. A., Rodriguez, P., Nissaire, P., Herblot, S., Thibault, P.,
Dumenil, D., Strouboulis, J., Romeo, P.-H., and Hoang, T. (2006). Eto2 coordinates
cellular proliferation and differentiation during erythropoiesis. EMBO J., 25(2):357–366.
[46] Grass, J. A., Boyer, M. E., Pal, S., Wu, J., Weiss, M. J., and Bresnick, E. H. (2003).
GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive au-
toregulation and domain-wide chromatin remodeling. Proc. Natl. Acad. Sci. U.S.A.,
100(15):8811–8816.
[47] Hamey, F. K., Nestorowa, S., Kinston, S. J., Kent, D. G., Wilson, N. K., and Göttgens,
B. (2017). Reconstructing blood stem cell regulatory network models from single-cell
molecular profiles. Proc. Natl. Acad. Sci. U.S.A., 114(23):5822–5829.
[48] Harrington, H. A., Feliu, E., Wiuf, C., and Stumpf, M. P. (2013). Cellular compart-
ments cause multistability and allow cells to process more information. Biophys. J.,
104(8):1824–1831.
[49] Higham, D. J. (2001). An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43(3):525–546.
[50] Hill, A. V. (1910). The possible effects of the aggregation of the molecules of
haemoglobin on its dissociation curves. J. Physiol., 40(Suppl):iv–vii.
[51] Hoppe, P. S., Schwarzfischer, M., Loeffler, D., Kokkaliaris, K. D., Hilsenbeck, O.,
Moritz, N., Endele, M., Filipczyk, A., Gambardella, A., Ahmed, N., Etzrodt, M., Coutu,
D. L., Rieger, M. A., Marr, C., Strasser, M. K., Schauberger, B., Burtscher, I., Ermakova,
O., Bürger, A., Lickert, H., Nerlov, C., Theis, F. J., and Schroeder, T. (2016). Early
myeloid lineage choice is not initiated by random PU.1 to GATA1 protein ratios. Nature,
535(7611):299–302.
[52] Huang, S., Guo, Y.-P., May, G., and Enver, T. (2007). Bifurcation dynamics in lineage-
commitment in bipotent progenitor cells. Dev. Biol., 305(2):695–713.
[53] Ingusci, S., Verlengia, G., Soukupova, M., Zucchini, S., and Simonato, M. (2019). Gene
Therapy Tools for Brain Diseases. Front. Pharmacol., 10:724.
[54] Inoue, A., Fujiwara, T., Okitsu, Y., Katsuoka, Y., Fukuhara, N., Onishi, Y., Ishizawa,
K., and Harigae, H. (2013). Elucidation of the role of LMO2 in human erythroid cells.
Exp. Hematol., 41(12):1062–1076.e1.
[55] Kadane, J. B. and Lazar, N. A. (2004). Methods and criteria for model selection. J.
Am. Stat. Assoc., 99(465):279–290.
[56] Kaneko, H., Shimizu, R., and Yamamoto, M. (2010). GATA factor switching during
erythroid differentiation. Curr. Opin. Hematol., 17(3):163–168.
[57] Kang, X. and Li, C. (2021). A Dimension Reduction Approach for Energy Landscape:
Identifying Intermediate States in Metabolism‐EMT Network. Adv. Sci., 8(10):2003133.
[58] Kelso, J. A. S. (2012). Multistability and metastability: Understanding dynamic co-
ordination in the brain. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 367(1591):906–918.
[59] Kitano, H. (2004). Biological robustness. Nat. Rev. Genet., 5(11):826–37.
[60] Kitano, H. (2007). Towards a theory of biological robustness. Mol. Syst. Biol., 3(1):137.
[61] Kloeden, P. E. and Platen, E. (1999). Numerical Solution of Stochastic Differential
Equations. Springer.
[62] Kobayashi, H., Kærn, M., Araki, M., Chung, K., Gardner, T. S., Cantor, C. R.,
and Collins, J. J. (2004). Programmable cells: Interfacing natural and engineered gene
networks. Proc. Natl. Acad. Sci. U.S.A., 101(22):8414–8419.
[63] Kothamachu, V. B., Feliu, E., Cardelli, L., and Soyer, O. S. (2015). Unlimited multi-
stability and Boolean logic in microbial signalling. J. R. Soc. Interface, 12(108).
[64] Krämer, N., Schäfer, J., and Boulesteix, A.-L. (2009). Regularized estimation of large-
scale gene association networks using graphical Gaussian models. BMC Bioinform.,
10(1):384.
[65] Kumano, K., Chiba, S., Shimizu, K., Yamagata, T., Hosoya, N., Saito, T., Takahashi,
T., Hamada, Y., and Hirai, H. (2001). Notch1 inhibits differentiation of hematopoietic
cells by sustaining GATA-2 expression. Blood, 98(12):3283–3289.
[66] Lancrin, C., Mazan, M., Stefanska, M., Patel, R., Lichtinger, M., Costa, G., Vargel, O.,
Wilson, N. K., Möröy, T., Bonifer, C., Göttgens, B., Kouskoff, V., and Lacaud, G. (2012).
GFI1 and GFI1B control the loss of endothelial identity of hemogenic endothelium during
hematopoietic commitment. Blood, 120(2):314–322.
[67] Landa, H., Schiró, M., and Misguich, G. (2020). Multistability of driven-dissipative
quantum spins. Phys. Rev. Lett., 124(4):043601.
[68] Larger, L., Penkovsky, B., and Maistrenko, Y. (2015). Laser chimeras as a paradigm
for multistable patterns in complex systems. Nat. Commun., 6(1):7752.
[69] Laslo, P., Spooner, C. J., Warmash, A., Lancki, D. W., Lee, H.-J., Sciammas, R.,
Gantner, B. N., Dinner, A. R., and Singh, H. (2006). Multilineage transcriptional priming
and determination of alternate hematopoietic cell fates. Cell, 126(4):755–766.
[70] Laurenti, E. and Göttgens, B. (2018). From haematopoietic stem cells to complex
differentiation landscapes. Nature, 553(7689):418–426.
[71] Lebar, T., Bezeljak, U., Golob, A., Jerala, M., Kadunc, L., Pirš, B., Stražar, M.,
Vučko, D., Zupančič, U., Benčina, M., Forstnerič, V., Gaber, R., Lonzarić, J., Majerle,
A., Oblak, A., Smole, A., and Jerala, R. (2014). A bistable genetic switch based on
designable DNA-binding domains. Nat. Commun., 5(1):5007.
[72] Lee, T. I. and Young, R. A. (2013). Transcriptional Regulation and Its Misregulation
in Disease. Cell, 152(6):1237–1251.
[73] Li, C. and Wang, J. (2013). Quantifying cell fate decisions for differentiation and
reprogramming of a human stem cell network: landscape and biological paths. PLoS
Comput. Biol., 9(8):e1003165.
[74] Li, L., Jothi, R., Cui, K., Lee, J. Y., Cohen, T., Gorivodsky, M., Tzchori, I., Zhao,
Y., Hayes, S. M., Bresnick, E. H., Zhao, K., Westphal, H., and Love, P. E. (2011).
Nuclear adaptor Ldb1 regulates a transcriptional program essential for the maintenance
of hematopoietic stem cells. Nat. Immunol., 12(2):129–136.
[75] Li, Q., Wennborg, A., Aurell, E., Dekel, E., Zou, J.-Z., Xu, Y., Huang, S., and Ernberg,
I. (2016). Dynamics inside the cancer cell attractor reveal cell heterogeneity, limits of
stability, and escape. Proc. Natl. Acad. Sci. U.S.A., 113(10):2672–2677.
[76] Liew, C. W., Rand, K. D., Simpson, R. J. Y., Yung, W. W., Mansfield, R. E., Crossley,
M., Proetorius-Ibba, M., Nerlov, C., Poulsen, F. M., and Mackay, J. P. (2006). Molecular
analysis of the interaction between the hematopoietic master transcription factors GATA-1
and PU.1. J. Biol. Chem., 281(38):28296–28306.
[77] Ling, K.-W., Ottersbach, K., van Hamburg, J. P., Oziemlak, A., Tsai, F.-Y., Orkin,
S. H., Ploemacher, R., Hendriks, R. W., and Dzierzak, E. (2004). GATA-2 plays two
functionally distinct roles during the ontogeny of hematopoietic stem cells. J. Exp. Med.,
200(7):871–882.
[78] Liu, P. and Wang, F. (2008). Inference of biochemical network models in S-system
using multi-objective optimization approach. Bioinformatics, 24(8):1085–1092.
[79] Liu, Q., Herman, P. M. J., Mooij, W. M., Huisman, J., Scheffer, M., Olff, H., and Van
de Koppel, J. (2014). Pattern formation at multiple spatial scales drives the resilience of
mussel bed ecosystems. Nat. Commun., 5(1):5234.
[80] Lodish, H., Berk, A., Kaiser, C. A., Krieger, M., Bretscher, A., Ploegh, H., Martin,
K. C., Yaffe, M. B., and Amon, A. (2021). Molecular Cell Biology. Macmillan Learning,
New York, NY.
[81] Lulli, V., Romania, P., Morsilli, O., Gabbianelli, M., Pagliuca, A., Mazzeo, S., Testa,
U., Peschle, C., and Marziali, G. (2006). Overexpression of Ets-1 in human hematopoietic
progenitor cells blocks erythroid and promotes megakaryocytic differentiation. Cell Death
Differ., 13(7):1064–74.
[82] Mackey, M. C. (2020). Periodic hematological disorders: Quintessential examples of
dynamical diseases. Chaos, 30(6):063123.
[83] Maetschke, S. R., Madhamshettiwar, P. B., Davis, M. J., and Ragan, M. A. (2013).
Supervised, semi-supervised and unsupervised inference of gene regulatory networks.
Brief. Bioinform., 15(2):195–211.
[84] Maity, I., Wagner, N., Mukherjee, R., Dev, D., Peacock-Lopez, E., Cohen-Luria, R.,
and Ashkenasy, G. (2019). A chemically fueled non-enzymatic bistable network. Nat.
Commun., 10(1):4636.
[85] Mancini, E., Sanjuan‐Pla, A., Luciani, L., Moore, S., Grover, A., Zay, A., Rasmussen,
K. D., Luc, S., Bilbao, D., O’Carroll, D., Jacobsen, S. E., and Nerlov, C. (2012). FOG‐1
and GATA‐1 act sequentially to specify definitive megakaryocytic and erythroid progenitors. EMBO J., 31(2):351–365.
[86] Masel, J. and Siegal, M. L. (2009). Robustness: mechanisms and consequences. Trends
Genet., 25(9):395–403.
[87] Matharu, N. and Ahituv, N. (2020). Modulating gene regulation to treat genetic
disorders. Nat. Rev. Drug Discov., 19(11):757–775.
[88] May, G., Soneji, S., Tipping, A. J., Teles, J., McGowan, S. J., Wu, M., Guo, Y.,
Fugazza, C., Brown, J., Karlsson, G., Pina, C., Olariu, V., Taylor, S., Tenen, D. G.,
Peterson, C., and Enver, T. (2013). Dynamic analysis of gene expression and genome-wide transcription factor binding during lineage specification of multipotent progenitors.
Cell Stem Cell, 13(6):754–768.
[89] McKnight, S. and Schibler, U. (1995). Differentiation and gene regulation. Curr. Opin.
Genet. Dev., 5(5):549–551.
[90] Meek, C. (1995). Causal inference and causal explanation with background knowledge.
Uncertainty in Artificial Intelligence, 11:403–410.
[91] Michaelis, L., Menten, M. L., Johnson, K. A., and Goody, R. S. (2011). The original Michaelis constant: translation of the 1913 Michaelis–Menten paper. Biochemistry,
50(39):8264–8269.
[92] Missal, K., Cross, M. A., and Drasdo, D. (2006). Gene network inference from incom-
plete expression data: transcriptional control of hematopoietic commitment. Bioinfor-
matics, 22(6):731–738.
[93] Moignard, V., Macaulay, I. C., Swiers, G., Buettner, F., Schütte, J., Calero-Nieto,
F. J., Kinston, S., Joshi, A., Hannah, R., Theis, F. J., Jacobsen, S. E., Bruijn, M. F. d.,
and Göttgens, B. (2013). Characterization of transcriptional networks in blood stem
and progenitor cells using high-throughput single-cell gene expression analysis. Nat. Cell
Biol., 15(4):363–372.
[94] Moignard, V., Woodhouse, S., Haghverdi, L., Lilly, A. J., Tanaka, Y., Wilkinson, A. C.,
Buettner, F., Macaulay, I. C., Jawaid, W., Diamanti, E., Nishikawa, S.-I., Piterman, N.,
Kouskoff, V., Theis, F. J., Fisher, J., and Göttgens, B. (2015). Decoding the regulatory
network of early blood development from single-cell gene expression measurements. Nat.
Biotechnol., 33(3):269–276.
[95] Mojtahedi, M., Skupin, A., Zhou, J., Castaño, I. G., Leong-Quong, R. Y. Y., Chang,
H., Trachana, K., Giuliani, A., and Huang, S. (2016). Cell fate decision as high-
dimensional critical state transition. PLoS Biol., 14(12):1–28.
[96] Narula, J., Williams, C., Tiwari, A., Marks-Bluth, J., Pimanda, J. E., and Igoshin,
O. A. (2013). Mathematical model of a gene regulatory network reconciles effects of
genetic perturbations on hematopoietic stem cell emergence. Dev. Biol., 379(2):258–269.
[97] Ng, A. P. and Alexander, W. S. (2017). Haematopoietic stem cells: past, present and
future. Cell Death Discov., 3(1):17002.
[98] Nguyen, H., Tran, D., Tran, B., Pehlivan, B., and Nguyen, T. (2021). A comprehensive
survey of regulatory network inference methods using single-cell RNA sequencing data.
Brief. Bioinform., 22(3):bbaa190.
[99] Noor, A., Serpedin, E., Nounou, M., Nounou, H., Mohamed, N., and Chouchane, L.
(2013). An overview of the statistical methods used for inferring gene regulatory networks
and protein-protein interaction networks. Adv. Bioinform., 2013:953814.
[100] North, T. E., Stacy, T., Matheny, C. J., Speck, N. A., and Bruijn, M. F. d. (2004).
Runx1 is expressed in adult mouse hematopoietic stem cells and differentiating myeloid
and lymphoid cells, but not in maturing erythroid cells. Stem Cells, 22(2):158–168.
[101] Novère, N. L. (2015). Quantitative and logic modelling of molecular and gene net-
works. Nat. Rev. Genet., 16(3):146–158.
[102] Ocone, A., Haghverdi, L., Mueller, N. S., and Theis, F. J. (2015). Reconstructing gene
regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics,
31(12):i89–i96.
[103] Olariu, V. and Peterson, C. (2019). Kinetic models of hematopoietic differentiation.
Wiley Interdiscip. Rev. Syst. Biol. Med., 11(1):e1424.
[104] Orkin, S. H. and Zon, L. I. (2008). Hematopoiesis: an evolving paradigm for stem
cell biology. Cell, 132(4):631–644.
[105] Ottersbach, K., Smith, A., Wood, A., and Göttgens, B. (2010). Ontogeny of
haematopoiesis: recent advances and open questions. Br. J. Haematol., 148(3):343–55.
[106] Ozbudak, E. M., Thattai, M., Lim, H. N., Shraiman, B. I., and Oudenaarden, A. v.
(2004). Multistability in the lactose utilization network of Escherichia coli. Nature,
427(6976):737–740.
[107] Perez-Carrasco, R., Barnes, C. P., Schaerli, Y., Isalan, M., Briscoe, J., and Page,
K. M. (2018). Combining a toggle switch and a repressilator within the AC-DC circuit
generates distinct dynamical behaviors. Cell Syst., 6(4):521–530.e3.
[108] Pisarchik, A. N. and Feudel, U. (2014). Control of multistability. Phys. Rep.,
540(4):167–218.
[109] Porcher, C., Swat, W., Rockwell, K., Fujiwara, Y., Alt, F. W., and Orkin, S. H.
(1996). The T cell leukemia oncoprotein SCL/tal-1 is essential for development of all
hematopoietic lineages. Cell, 86(1):47–57.
[110] Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A., and Murali, T. M. (2020).
Benchmarking algorithms for gene regulatory network inference from single-cell tran-
scriptomic data. Nat. Methods, 17(2):147–154.
[111] Quarteroni, A., Sacco, R., and Saleri, F. (2006). Numerical Mathematics. Springer.
[112] Real, P. J., Ligero, G., Ayllon, V., Ramos-Mejia, V., Bueno, C., Gutierrez-Aranda,
I., Navarro-Montero, O., Lako, M., and Menendez, P. (2012). SCL/TAL1 regulates
hematopoietic specification from human embryonic stem cells. Mol. Ther., 20(7):1443–
1453.
[113] Rieger, M. A. and Schroeder, T. (2012). Hematopoiesis. Cold Spring Harb. Perspect.
Biol., 4(12):a008250.
[114] Roeder, I. and Glauche, I. (2006). Towards an understanding of lineage specification
in hematopoietic stem cells: A mathematical model for the interaction of transcription
factors GATA-1 and PU.1. J. Theor. Biol., 241(4):852–865.
[115] Rung, J. and Brazma, A. (2013). Reuse of public genome-wide gene expression data.
Nat. Rev. Genet., 14(2):89–99.
[116] Saint-Antoine, M. M. and Singh, A. (2020). Network inference in systems biology:
recent developments, challenges, and applications. Curr. Opin. Biotechnol., 63:89–98.
[117] Santos-Moreno, J., Tasiudi, E., Stelling, J., and Schaerli, Y. (2020). Multistable and
dynamic CRISPRi-based synthetic circuits. Nat. Commun., 11(1):2746.
[118] Semenov, S. N., Kraft, L. J., Ainla, A., Zhao, M., Baghbanzadeh, M., Campbell,
V. E., Kang, K., Fox, J. M., and Whitesides, G. M. (2016). Autocatalytic, bistable,
oscillatory networks of biologically relevant organic reactions. Nature, 537(7622):656–
660.
[119] Semrau, S., Goldmann, J. E., Soumillon, M., Mikkelsen, T. S., Jaenisch, R., and
Oudenaarden, A. v. (2017). Dynamics of lineage commitment revealed by single-cell
transcriptomics of differentiating embryonic stem cells. Nat. Commun., 8(1):1096.
[120] Shea, M. A. and Ackers, G. K. (1985). The OR control system of bacteriophage lambda.
A physical-chemical model for gene regulation. J. Mol. Biol., 181(2):211–30.
[121] Shivdasani, R. A. (2006). MicroRNAs: regulators of gene expression and cell differentiation. Blood, 108(12):3646–3653.
[122] Shivdasani, R. A., Mayer, E. L., and Orkin, S. H. (1995). Absence of blood formation
in mice lacking the T-cell leukaemia oncoprotein Tal1/SCL. Nature, 373(6513):432–434.
[123] Skinnider, M. A., Squair, J. W., and Foster, L. J. (2019). Evaluating measures of
association for single-cell transcriptomics. Nat. Methods, 16(5):381–386.
[124] Snow, J. W., Trowbridge, J. J., Johnson, K. D., Fujiwara, T., Emambokus, N. E.,
Grass, J. A., Orkin, S. H., and Bresnick, E. H. (2011). Context-dependent function of
“GATA switch” sites in vivo. Blood, 117(18):4769–4772.
[125] Soler, E., Andrieu-Soler, C., de Boer, E., Bryne, J. C., Thongjuea, S., Stadhouders,
R., Palstra, R.-J., Stevens, M., Kockx, C., van Ijcken, W., Hou, J., Steinhoff, C., Rijkers,
E., Lenhard, B., and Grosveld, F. (2010). The genome-wide dynamics of the binding of
Ldb1 complexes during erythroid differentiation. Genes Dev., 24(3):277–289.
[126] Stewart, J. (2018). Calculus, chapter Infinite Sequences and Series. Cengage Learning.
[127] Stiehl, T. and Marciniak-Czochra, A. (2012). Mathematical Modeling of Leukemoge-
nesis and Cancer Stem Cell Dynamics. Math. Model. Nat. Phenom., 7(1):166–202.
[128] Stier, S., Cheng, T., Dombkowski, D., Carlesso, N., and Scadden, D. T. (2002).
Notch1 activation increases hematopoietic stem cell self-renewal in vivo and favors lym-
phoid over myeloid lineage outcome. Blood, 99(7):2369–2378.
[129] Stumpf, M. P. (2021). Inferring better gene regulation networks from single-cell data.
Curr. Opin. Syst. Biol., 27:100342.
[130] Temin, H. M. and Mizutani, S. (1970). Viral RNA-dependent DNA poly-
merase: RNA-dependent DNA polymerase in virions of Rous sarcoma virus. Nature,
226(5252):1211–1213.
[131] The Gene Ontology Consortium (2017). Expansion of the gene ontology knowledge-
base and resources. Nucleic Acids Res., 45(D1):D331–D338.
[132] Thomson, M. and Gunawardena, J. (2009). Unlimited multistability in multisite
phosphorylation systems. Nature, 460(7252):274–277.
[133] Tian, T. (2010). Stochastic models for inferring genetic regulation from microarray
gene expression data. Biosystems, 99(3):192–200.
[134] Tian, T. and Burrage, K. (2001). Implicit Taylor methods for stiff stochastic differential equations. Appl. Numer. Math., 38(1):167–185.
[135] Tian, T. and Burrage, K. (2006). Stochastic models for regulatory networks of the
genetic toggle switch. Proc. Natl. Acad. Sci. U.S.A., 103(22):8372–8377.
[136] Tian, T. and Smith-Miles, K. (2014a). Mathematical modeling of GATA-switching
for regulating the differentiation of hematopoietic stem cell. BMC Syst. Biol., 8(Suppl
1):S8.
[137] Tian, T. and Smith-Miles, K. (2014b). Mathematical modeling of GATA-switching for
regulating the differentiation of hematopoietic stem cell. BMC Syst. Biol., 8(Suppl 1):S8.
[138] Tindemans, I., Serafini, N., Di Santo, J. P., and Hendriks, R. W. (2014). GATA-3
Function in Innate and Adaptive Immunity. Immunity, 41(2).
[139] Turner, B. M. and Van Zandt, T. (2012). A tutorial on approximate Bayesian computation. J. Math. Psychol., 56(2):69–85.
[140] Uhler, C. (2017). Gaussian Graphical Models: An Algebraic and Geometric Perspec-
tive. arXiv.
[141] van der Meer, L. T., Jansen, J. H., and van der Reijden, B. A. (2010). Gfi1 and Gfi1b:
key regulators of hematopoiesis. Leukemia, 24(11):1834–1843.
[142] Velten, L., Haas, S. F., Raffel, S., Blaszkiewicz, S., Islam, S., Hennig, B. P., Hirche,
C., Lutz, C., Buss, E. C., Nowak, D., Boch, T., Hofmann, W.-K., Ho, A. D., Huber,
W., Trumpp, A., Essers, M. A. G., and Steinmetz, L. M. (2017). Human haematopoietic
stem cell lineage commitment is a continuous process. Nat. Cell Biol., 19(4):271–281.
[143] Viger, R. S., Guittot, S. M., Anttonen, M., Wilson, D. B., and Heikinheimo, M.
(2008). Role of the GATA Family of Transcription Factors in Endocrine Development,
Function, and Disease. Mol. Endocrinol., 22(4):781–798.
[144] Visvader, J. E., Mao, X., Fujiwara, Y., Hahm, K., and Orkin, S. H. (1997). The LIM-domain binding protein Ldb1 and its partner LMO2 act as negative regulators of erythroid differentiation. Proc. Natl. Acad. Sci. U.S.A., 94(25):13707–13712.
[145] Wang, J., Myklebost, O., and Hovig, E. (2003). MGraph: graphical models for
microarray data analysis. Bioinformatics, 19(17):2210–2211.
[146] Wang, J. and Tian, T. (2010). Quantitative model for inferring dynamic regulation
of the tumour suppressor gene P53. BMC Bioinform., 11(1):36.
[147] Wang, J., Wu, Q., Hu, X. T., and Tian, T. (2016). An integrated approach to infer
dynamic protein-gene interactions, a case study of the human P53 protein. Methods,
110:3–13.
[148] Wang, Y. (2013). Gene Regulatory Networks, pages 801–805. Springer New York,
New York, NY.
[149] Wang, Y. X. R., Li, L., Li, J. J., and Huang, H. (2021). Network modeling in biology:
statistical methods for gene and brain networks. Statist. Sci., 36(1):89–108.
[150] Watson, J. D. and Crick, F. H. C. (1953). Molecular Structure of Nucleic Acids: A
Structure for Deoxyribose Nucleic Acid. Nature, 171(4356):737–738.
[151] Wheat, J. C., Sella, Y., Willcockson, M., Skoultchi, A. I., Bergman, A., Singer, R. H.,
and Steidl, U. (2020). Single-molecule imaging of transcription dynamics in somatic stem
cells. Nature, 583(7816):431–436.
[152] Woods, M. L., Leon, M., Perez-Carrasco, R., and Barnes, C. P. (2016). A statistical
approach reveals designs for the most robust stochastic gene oscillators. ACS Synth.
Biol., 5(6):459–470.
[153] Wu, S., Cui, T., and Tian, T. (2018). Mathematical modelling of genetic network
for regulating the fate determination of hematopoietic stem cells. In Proceedings of 2018
IEEE International Conference on Bioinformatics and Biomedicine, pages 2167–2173.
[154] Wu, S., Cui, T., Zhang, X., and Tian, T. (2020). A non-linear reverse-engineering
method for inferring genetic regulatory networks. PeerJ, 8:e9065.
[155] Wu, S., Zhou, T., and Tian, T. (2022). non-linear reverse-engineering method for
inferring genetic regulatory networks. NPJ Syst. Biol. Appl.
[156] Xiong, W. and Ferrell, J. E. (2003). A positive-feedback-based bistable ‘memory
module’ that governs a cell fate decision. Nature, 426(6965):460–465.
[157] Xu, Z., Huang, S., Chang, L.-S., Agulnick, A., and Brandt, S. (2003). Identification
of a Tal1 target gene reveals a positive role for the LIM domain-binding protein Ldb1 in
erythroid gene expression and differentiation. Mol. Cell. Biol., 23(21):7585–7599.
[158] Yang, B. and Bao, W. (2019). RNDEtree: Regulatory Network With Differential
Equation Based on Flexible Neural Tree With Novel Criterion Function. IEEE Access,
7:58255–58263.
[159] Yang, B., Bao, W., Huang, D., and Yuehui, C. (2018). Inference of large-scale time-
delayed gene regulatory network with parallel MapReduce Cloud platform. Sci. Rep.,
8(1):17787.
[160] Yang, L., Sun, W., and Turcotte, M. (2021). Coexistence of Hopf-born rotation and
heteroclinic cycling in a time-delayed three-gene auto-regulated and mutually-repressed
core genetic regulation network. J. Theor. Biol., 527:110813.
[161] Zhang, P., Zhang, X., Iwama, A., Yu, C., Smith, K. A., Mueller, B. U., Narravula, S.,
Torbett, B. E., Orkin, S. H., and Tenen, D. G. (2000). PU.1 inhibits GATA-1 function
and erythroid dierentiation by blocking GATA-1 DNA binding. Blood, 96(8):2641–2648.
[162] Zhang, X., Zhao, J., Hao, J.-K., Zhao, X.-M., and Chen, L. (2015). Conditional
mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic Acids Res., 43(5):e31.
[163] Zhang, X., Zhao, X., He, K., Lu, L., Cao, Y., Liu, J., Hao, J., Liu, Z., and Chen, L.
(2012). Inferring gene regulatory networks from gene expression data by path consistency
algorithm based on conditional mutual information. Bioinformatics, 28(1):98–104.
[164] Zhang, Y., Payne, K. J., Zhu, Y., Price, M. A., Parrish, Y. K., Zielinska, E., Barsky,
L. W., and Crooks, G. M. (2005). SCL expression at critical points in human hematopoi-
etic lineage commitment. Stem Cells, 23(6):852–860.
[165] Zhao, J., Zhou, Y., Zhang, X., and Chen, L. (2016). Part mutual information for
quantifying direct associations in networks. Proc. Natl. Acad. Sci. U.S.A., 113(18):5130–
5135.
[166] Zhao, W., Kitidis, C., Fleming, M. D., Lodish, H. F., and Ghaffari, S. (2006). Erythropoietin stimulates phosphorylation and activation of GATA-1 via the PI3-kinase/AKT
signaling pathway. Blood, 107(3):907–915.