
A non-linear reverse-engineering method for inferring genetic regulatory networks

PeerJ
Siyuan Wu1, Tiangang Cui1, Xinan Zhang2 and Tianhai Tian1
1School of Mathematics, Monash University, Clayton, VIC, Australia
2School of Mathematics and Statistics, Central China Normal University, Wuhan, PR China
ABSTRACT
Hematopoiesis is a highly complex developmental process that produces various
types of blood cells. This process is regulated by different genetic networks that
control the proliferation, differentiation, and maturation of hematopoietic stem cells
(HSCs). Although substantial progress has been made in understanding
hematopoiesis, the detailed regulatory mechanisms for the fate determination of
HSCs remain unclear. In this study, we propose a novel approach to infer the
detailed regulatory mechanisms. This work is designed to develop a mathematical
framework that is able to realize nonlinear gene expression dynamics accurately.
In particular, we intended to investigate the effect of possible protein heterodimers
and/or synergistic effect in genetic regulation. This approach includes the Extended
Forward Search Algorithm to infer network structure (top-down approach) and a
non-linear mathematical model to infer dynamical property (bottom-up approach).
Based on the published experimental data, we study two regulatory networks of
11 genes for regulating the erythrocyte differentiation pathway and the neutrophil
differentiation pathway. The proposed algorithm is first applied to predict the
network topologies among 11 genes and 55 non-linear terms which may be for
heterodimers and/or synergistic effect. Then, the unknown model parameters are
estimated by fitting simulations to the expression data of two different differentiation
pathways. In addition, the edge deletion test is conducted to remove possible
insignificant regulations from the inferred networks. Furthermore, the robustness
property of the mathematical model is employed as an additional criterion to choose
better network reconstruction results. Our simulation results successfully realized
experimental data for two different differentiation pathways, which suggests that the
proposed approach is an effective method to infer the topological structure and
dynamic property of genetic regulations.
Subjects Bioinformatics, Computational Biology
Keywords Genetic regulatory network, Network inference, Hematopoiesis, Probabilistic graphic
model, Differential equation
How to cite this article Wu S, Cui T, Zhang X, Tian T. 2020. A non-linear reverse-engineering method for inferring genetic regulatory networks. PeerJ 8:e9065 DOI 10.7717/peerj.9065
Submitted 6 January 2020
Accepted 5 April 2020
Published 29 April 2020
Corresponding author Tianhai Tian, tianhai.tian@monash.edu
Academic editor Walter de Azevedo Jr
Additional Information and Declarations can be found on page 20
Copyright 2020 Wu et al.
Distributed under Creative Commons CC-BY 4.0
INTRODUCTION
Hematopoiesis is a highly complex process that controls the proliferation, differentiation
and maturation of hematopoietic stem cells (HSCs) (Ng & Alexander, 2017). It has been
widely accepted that genetic regulatory networks control the developmental processes
of various types of blood cells (Cedar & Bergman, 2011). Although the regulatory
mechanisms have been studied for over a century, there are still many challenging questions
regarding the cell fate determination in hematopoiesis (Ottersbach et al., 2010). Thus, it
is imperative to unravel the regulatory mechanisms for the study of hematopoiesis.
In adult mammals, hematopoiesis occurs mostly in the bone marrow (Birbrair &
Frenette, 2016). HSCs are self-renewing and multipotent, and can
differentiate into multipotent progenitors (MPPs). Then, MPPs will differentiate
into two main lineages of blood cells, namely the myeloid lineage which starts at
common myeloid progenitors (CMPs) and the lymphoid lineage which starts at common
lymphoid progenitors (CLPs). In addition, the myeloid lineage has two distinct
progenitors, namely megakaryocyte-erythroid progenitors (MEPs) and granulocyte-
macrophage progenitors (GMPs). MEPs can differentiate into megakaryocytes and
erythrocytes, and GMPs can give rise to mast cells, macrophages and granulocytes.
Lymphoid lineage cells include T lymphocytes (T-cells), B lymphocytes (B-cells) and
natural killer cells (NK-cells) (Orkin & Zon, 2008). In this work, we focus on the fate
determination of HSCs in the myeloid lineage for the choice between erythrocytes and
neutrophils.
During the developmental process, a number of transcription factors (TFs) act as
regulators to control the fate determination of HSCs (Aggarwal et al., 2012). Among
them, the genetic complex Gata1-Gata2-PU.1 is a very important module for the cell-fate
choice of CMPs between erythrocytes and neutrophils (Friedman, 2007; Liew et al., 2006;
Ling et al., 2004). In particular, the Gata1-PU.1 complex forms a double negative
feedback module, in which each gene inhibits the expression of the other (Friedman, 2007).
Recently it has been elucidated that the fate determination of HSCs is defined not only by
the ratio of Gata1 and PU.1 (Hoppe et al., 2016), but also by a third party during the
regulation. For example, FOG-1 is a significant third party that regulates the Gata1-PU.1
module (Chang et al., 2002; Mancini et al., 2012). Erythropoietin receptor (EpoR) signaling
also plays an essential role in regulating the Gata1-PU.1 module (Zhao et al., 2006).
Although the regulatory mechanisms of the Gata1-Gata2-PU.1 complex in hematopoiesis
are relatively well studied, the connection of this triad complex with other significant genes
as well as the role of these genes in hematopoiesis are mostly unclear (Liew et al., 2006).
Mathematical modeling is an important method for inferring the detailed regulatory
mechanisms. In 1910, Archibald Hill proposed a classical non-linear ordinary differential
equation (ODE or Hill equation) model to describe the sigmoidal oxygen binding curve of
hemoglobin (Hill, 1910). Since then, the Hill equation has been applied to explore the
mechanisms in a wide range of genetic regulatory networks and biological systems.
For example, the genetic toggle switch was described by models with Hill equations
(Gardner, Cantor & Collins, 2000). In addition, the Hill equation was employed to
formalize the mechanisms of cell fate determination (Xiong & Ferrell, 2003; Laslo et al.,
2006; Huang et al., 2007). Recently, the Hill equation was also used to discover a regulatory
network of 52 genes with uniform activation and repression strengths (Li &
Wang, 2013). Another widely used approach is the Shea-Ackers formalism for studying
the thermodynamics of regulatory networks (Shea & Ackers, 1985). We developed a
mathematical model based on the Shea-Ackers formalism to study the regulations of the
Gata1-Gata2-PU.1 complex (Tian & Smith-Miles, 2014). A stochastic model was also
proposed to explore the function of noise in regulating the fate determination of HSCs.
Simulations suggested that fluctuations of protein numbers may lead the HSC to different
developmental pathways. In recent years substantial progress has been made in designing
various types of mathematical models for describing the regulatory mechanisms of
gene networks, including stochastic differential equations, stochastic kinetic systems,
qualitative differential equations, Michaelis-Menten formalism, S-system and power-law
formalism (de Jong, 2002; Liu & Wang, 2008; Wang & Tian, 2010; Maetschke et al., 2013;
Woods et al., 2016; Olariu & Peterson, 2018; Yang et al., 2018; Yang & Bao, 2019).
In particular, a number of mathematical models have been designed to realize the stable
states of gene expression levels in the differentiation of HSCs (Chang et al., 2006; Huang
et al., 2007; Chickarmane, Enver & Peterson, 2009; Olariu & Peterson, 2018). However,
the majority of these models only considered the functions of each gene independently,
namely the variable $x_i$ for the expression level of gene $i$ enters the model in the form $\sum_i a_i x_i$.
This type of model fails to represent the cooperative functions of genes.
There is a lack of investigations into the effect of possible protein heterodimers
and/or synergistic effect in genetic regulations, in which the variables $x_i$ also enter the model
in the form $\sum_{i,j} b_{ij} x_i x_j$. Most recently, single-cell studies have been conducted to
explore the hematopoietic system. Compared with the analysis of bulk cells, the advantage
of single-cell analysis is the ability to understand the heterogeneity within the cell
population (Guo et al., 2010; Ye, Huang & Guo, 2017). With the development of single-cell
analysis, researchers have developed further computational and statistical methods to
explore the regulatory mechanism of hematopoiesis. For example, the partial correlation
method, Boolean model and ODE model were employed to construct the genetic
regulatory networks from the single-cell expression profiles (Hamey et al., 2017; Wei et al.,
2017). In addition, a deep learning method was applied to unravel the fate decision in
hematopoiesis (Athanasiadis et al., 2017).
Recently, we proposed a general approach that combines both top-down and bottom-up
approaches to reconstruct the genetic regulatory networks of the fate choice between
erythrocytes and neutrophils (Wu, Cui & Tian, 2018). The key issues in that work were the
large number of unknown parameters and the high computational cost of adding potential
regulations. Regarding the number of parameters, a linear ODE model may have the least
number of unknown parameters among the models for all possible regulations between
genes. However, since the linear model can only describe linear relationships,
it is not appropriate for studying systems with complex non-linear
dynamics. Although non-linearity has been addressed by reverse-engineering
methods at the cost of more unknown parameters (Chickarmane & Peterson, 2008;
Crombach et al., 2012; Meister et al., 2013; Li & Wang, 2013; Wang et al., 2016), the issue
of protein heterodimers and/or synergistic effect between genes has not been discussed
in the majority of the literature. This work is designed to address these issues by
proposing a novel approach for reconstructing genetic regulatory networks. The first
innovation of this approach is the new non-linear ODE model as the bottom-up approach
to study the effect of protein heterodimers and/or synergistic effect explicitly. The second
innovation of this work is the proposed Extended Forward Search Algorithm as the
top-down approach to infer the structure of networks in our newly proposed non-linear
model. The proposed approach thus is able to not only reduce the complex structure
of genetic regulatory networks but also improve the inference efficiency substantially
because the number of parameters in the mathematical model is decreased. We examined
the capability of our proposed method by studying the genetic regulatory networks for the
fate determination of HSCs.
METHODS
Experimental data
In this work, we used the sub-series GSE49987 as the experimental data from the published
microarray dataset GSE49991 (May et al., 2013). This dataset contains the expression
profiles collected by experiments using the cell line FDCPmix. This dataset was generated
with the probe name version of Agilent Whole Mouse Genome Microarray 4 × 44 K
(May et al., 2013). It provides microarray gene expression profiles of hematopoietic stem
cells (HSCs) differentiating into erythrocytes and neutrophils. This microarray dataset is
available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49991. To convert
all microarray probe IDs to gene names, we pre-processed this dataset based on the
Ensembl BioMart and GO Enrichment Analysis (The Gene Ontology Consortium, 2017).
From a previous study, the regulatory network of 18 core genes during the hematopoiesis
has been curated (Moignard et al., 2013). Moreover, the same research team studied
the regulatory interaction of 26 core genes during the hematopoiesis (Moignard et al.,
2015). The total number of distinct genes in these two studies is 30. Thus, in our work we
considered 30 genes whose names are listed in Table S1. There are three repeated
experiments for each developmental process, each of which contains the expression levels
of 30 genes from HSCs to differentiated cells at 30 time points spanning over 1 week.
The observation time points are those starting from the HSCs/progenitors stage (1 point),
then every 2 h over the first day (12 points), every 3 h over the second day (8 points),
every 4 h over the third day (6 points), every 24 h until the fifth day (2 points), and the
seventh day (1 point). In this study, we used the average data of these three repeated tests as
the experimental data for each time point. The time points and expression data of four
genes are shown in Figs. 1 and 2.
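The sampling schedule and the averaging of the three repeated experiments can be sketched as follows. This is only an illustration: the exact hour values (for example 2, 4, ..., 24 h for the first day) are an assumption that reproduces the stated point counts, and the function names are hypothetical rather than taken from the dataset or its tools.

```python
def observation_times():
    """Return 30 sampling times in hours (assumed layout, see the text above)."""
    times = [0]                          # HSC/progenitor stage (1 point)
    times += list(range(2, 25, 2))       # every 2 h over day 1 (12 points)
    times += list(range(27, 49, 3))      # every 3 h over day 2 (8 points)
    times += list(range(52, 73, 4))      # every 4 h over day 3 (6 points)
    times += [96, 120]                   # every 24 h until day 5 (2 points)
    times += [168]                       # day 7 (1 point)
    return times


def average_replicates(replicates):
    """Average one gene's expression over replicates, time point by time point."""
    n_rep = len(replicates)
    return [sum(values) / n_rep for values in zip(*replicates)]
```

Calling `average_replicates` once per gene with its three replicate series gives one averaged time course per gene.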
Selection of candidate genes
Based on our research experience (Wang et al., 2016), it is challenging to study a dynamic
network with 30 genes. Thus, we conducted an extensive literature review to select
a smaller number of important genes based on their relationship with the three genes
Gata1, Gata2 and PU.1. These candidate genes should be essential for the cell-fate choice
in hematopoiesis, or they should interact significantly with these three genes. For example, gene
Scl/Tal1 interacts with Gata1, Eto2/Cbfa2t3 and Ldb1 (Goardon et al., 2006), and is a
regulator in the differentiation of hematopoietic stem cells (HSCs) (Shivdasani,
Mayer & Orkin, 1995; Zhang et al., 2005; Porcher et al., 1996; Real et al., 2012). In addition,
Eto2/Cbfa2t3 regulates the differentiation of HSCs by repressing the expression of target
gene Scl/Tal1 (Goardon et al., 2006). Moreover, Ldb1 is a significant transcription factor
(TF) for the differentiation of erythroid lineage (Soler et al., 2010). According to the
ChIP-Seq analysis, Ldb1 is necessary for HSCs to control their maintenance since it
binds to the majority of enhancer elements in hematopoiesis (Li et al., 2011).
We also included a number of genes with potential regulatory relationship with the
three genes Gata1, Gata2 and PU.1. For example, it was indicated that there might
be unclear regulations between Gata2 and Gfi1 (Moignard et al., 2013). Gfi1 is an
important TF in the regulation of HSC differentiation (van der Meer, Jansen & van der
Reijden, 2010; Lancrin et al., 2012). Gfi1 is required for the differentiation of common
lymphoid progenitors (CLPs) and common myeloid progenitors (CMPs) from HSCs
and exists in the majority of HSCs, CLPs and CMPs. Similar to gene Gfi1, gene Runx1 is
also expressed in most HSCs and progenitor cells. Gfi1 and/or Runx1 are then
expressed continually in most cells which differentiate into the granulocyte lineage
(North et al., 2004). Lmo2 is a master regulator of hematopoiesis (Inouea et al., 2013).
However, its specific role in regulation is still unclear. Experimental studies suggested
that the knockdown of Lmo2 does not affect the expression of Gata1 and Scl/Tal1
(Inouea et al., 2013). However, the overexpression of the Lmo2 gene also inhibited erythroid
differentiation (Visvader et al., 1997). In addition, gene Ets1 is a suppressor in the
erythrocyte differentiation. It is downregulated in erythrocyte differentiation by binding
to and activating the Gata2 promoter (Lulli et al., 2006). The last candidate gene is
Notch1, which inhibits the differentiation of granulocyte lineage by maintaining the
expression of gene Gata2. It also enhances the differentiation of HSCs into CLPs (Kumano et al.,
2001; Stier et al., 2002). Therefore, in this study we considered the regulatory networks
with the following 11 genes: Gata1, Gata2, PU.1/Sfpi1, Runx1, Eto2/Cbfa2t3, Ets1, Notch1,
Scl/Tal1, Ldb1, Gfi1 and Lmo2. The detailed information of the references for these 11
genes is also given in Table S2 in Supplemental Information.
Figure 1 Simulation results and experimental data of the regulatory network for erythrocyte differentiation. Red solid line: experimental microarray data; blue star-dashed line: simulation of the regulatory network. (A) Gene Gata1; (B) gene PU.1; (C) gene Ets1; (D) gene Tal1. Each panel plots the expression level of the gene against time. DOI: 10.7717/peerj.9065/g-1
Top-down approach: extended forward search algorithm
To reduce the number of unknown parameters in our proposed mathematical model,
we used probabilistic graphical models as the top-down approach to infer the
topological structure of gene regulatory networks. The probabilistic graphical model is a useful
tool for inferring the network structure (Noor et al., 2013). One type of probabilistic
graphical model is the Gaussian graphical model (GGM), which provides a simple and
effective method to characterize the regulatory relationship between genes. The GGM is
based on the calculation of the conditional dependencies among genes using the gene
expression data. The edge connecting two genes in the model is omitted if they are
conditionally independent given all other genes (Krämer, Schäfer & Boulesteix, 2009).
Figure 2 Simulation results and experimental data of the regulatory network for neutrophil differentiation. Red solid line: experimental microarray data; blue star-dashed line: simulation of the regulatory network. (A) Gene Gata1; (B) gene PU.1; (C) gene Ets1; (D) gene Tal1. Each panel plots the expression level of the gene against time. DOI: 10.7717/peerj.9065/g-2
In this work it is assumed that a system includes genes $\{G_1, \ldots, G_m\}$ with expression
levels $x_{ij}$ for gene $G_i$ at time point $j$. Compared with the existing methods that study
networks with genes only, this work will study gene networks that include not only genes
in the form of monomers $\{G_1, \ldots, G_m\}$, which are represented by the linear terms in
the model, but also protein heterodimers and/or synergistic effect $\{G_k G_l\}$ $(k, l = 1, \ldots, m)$,
which are represented by the non-linear terms (NLTs) in the model. There are two
reasons for using the NLTs $\{G_k G_l\}$. Firstly, we can use the product of two variables to
represent the synergistic effect of these two genes. Secondly, if the NLT represents the
protein heterodimer, we assume that the binding and dissociation reactions for the
heterodimer $\{G_k G_l\}$ reach an equilibrium state quickly. Thus the level of the heterodimer
$\{G_k G_l\}$ can be written as $C_{kl} \times G_k \times G_l$, where $C_{kl}$ is the equilibrium constant. We can
consider this constant $C_{kl}$ as a coefficient in our mathematical model. In both cases,
we only need to consider the product of the expression levels of these two genes, namely
$y_{klj} = x_{kj} x_{lj}$, as the level of NLT $\{G_k G_l\}$ at time $t_j$ for our algorithm computation. Since
the number of possible regulations from NLTs to genes is much larger than that of possible
regulations among genes (i.e. 726 vs 110), the regulations from NLTs to genes will
dominate the whole genetic regulatory system with high probability. However, the
regulations among genes, rather than the regulations from NLTs to genes, should be the
core mechanisms. To avoid the dominance of NLT regulations, we assume that the number
of regulations from NLTs to genes does not exceed that between genes.
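As an illustration of how the NLT levels $y_{klj} = x_{kj} x_{lj}$ could be computed from an expression table, the sketch below builds one product series per gene pair. It assumes only pairs with $k < l$ are kept, and the function name and data layout are hypothetical.

```python
from itertools import combinations


def build_nlts(expr):
    """Build NLT levels y[(k, l)][j] = x[k][j] * x[l][j] for gene pairs k < l.

    expr is a list of m rows, one per gene, each holding the expression
    levels of that gene at the same time points.
    """
    nlts = {}
    for k, l in combinations(range(len(expr)), 2):
        nlts[(k, l)] = [xk * xl for xk, xl in zip(expr[k], expr[l])]
    return nlts
```

With 11 genes this yields 55 product series, which matches the number of non-linear terms stated in the abstract.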
According to the GGM (Wang, Myklebost & Hovig, 2003; Wang et al., 2016), we
proposed a new algorithm, named Extended Forward Search Algorithm (EFSA), to
infer the topological structure of regulatory networks that includes both genes and NLTs.
Let $X = (x_1, x_2, \ldots, x_N)$ be a vector that consists of $m$ genes and $n$ NLTs ($N = m + n$).
The following three matrices are constructed, namely an $m \times m$ covariance matrix $A$ of
$m$ genes, an $m \times n$ covariance matrix $B$ to measure the covariance between $m$ genes
and $n$ NLTs, and an $n \times n$ covariance matrix $C$ of $n$ NLTs. The $N$-dimensional matrix $M$ is
defined by
$$M = \begin{pmatrix} A & B \\ B' & C \end{pmatrix}, \tag{1}$$
where $B'$ is the transpose of $B$. An initial empty graph $G$ is built by the $N$-dimensional
identity matrix. This initial graph $G$ consists of four matrices $G_1$, $G_2$, $G_3$ and $G_4$ which have
the same dimensions as $A$, $B$, $B'$ and $C$, respectively, namely
$$G = \begin{pmatrix} G_1 & G_2 \\ G_3 & G_4 \end{pmatrix}, \tag{2}$$
where $G_1$ and $G_4$ are identity matrices with dimensions $m$ and $n$, respectively, and $G_2$
and $G_3$ are $m \times n$ and $n \times m$ zero matrices, respectively.
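A minimal sketch of how the block matrix $M$ of Eq. (1) and the initial graph $G$ of Eq. (2) might be assembled is given below. It assumes the sample covariance with divisor (number of time points minus one) and orders the variables as $m$ genes followed by $n$ NLTs; the helper names are hypothetical.

```python
def covariance(u, v):
    """Sample covariance of two equally long series (divisor len - 1)."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)


def block_matrices(genes, nlts):
    """Return (M, G): the covariance matrix of Eq. (1) and the graph of Eq. (2).

    genes holds m series and nlts holds n series over the same time points.
    """
    series = list(genes) + list(nlts)      # order: m genes, then n NLTs
    size = len(series)
    M = [[covariance(series[r], series[c]) for c in range(size)]
         for r in range(size)]
    # Identity matrix: G_1 and G_4 are identities, G_2 and G_3 are zero blocks.
    G = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    return M, G
```

In this layout the top-left $m \times m$ block of `M` is $A$, the top-right block is $B$, and the bottom-right block is $C$.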
The proposed algorithm is given below.
Algorithm 1: extended forward search algorithm
1. Let $X = (x_1, x_2, \ldots, x_N)$ be a vector with $N$ elements, where $N$ is the number of
components, consisting of $m$ genes and $n$ NLTs. An initial empty graph $G$ is built by the
$N$-dimensional identity matrix, which is defined by Eq. (2).
2. Substitute all covariance values from the diagonal positions of sub-matrix $A$ into the
corresponding positions of sub-matrix $G_1$, and then, based on the updated $G_1$, use the
Iterative Maximum Likelihood Estimates Algorithm (IMLEA) to compute the new
covariance matrix (Dempster, Laird & Rubin, 1977).
3. Add an undirected edge $E^1_{ij}$ ($(i, j) \in [1, m]^2$) into $G_1$, namely add the symmetrical
covariance value between the $i$th gene and $j$th gene from the positions $A(i, j)$ and $A(j, i)$
into the positions $G_1(i, j)$ and $G_1(j, i)$, respectively. Then compute a new covariance
matrix by the IMLEA. Based on the deviance difference between the new covariance
matrix and that before addition, test the significance of the added edge $E^1_{ij}$ by using
the Chi-square distribution with one degree of freedom. The p-value of the Chi-square
test is used in the next step as the edge selection criterion. Record the p-value of this
tested edge and remove it from $G_1$.
4. Add a new undirected edge into $G_1$. Then, repeat the computation in Step 3.
After all possible undirected edges have been tested, sort all tested edges in ascending
order by their p-values. If the smallest p-value is lower than the predefined
cut-off value, add the edge with the smallest p-value into the sub-graph $G_1$ permanently.
5. Go back to Step 3 and add the second edge into the updated sub-graph $G_1$. Repeat the
computation in Steps 3 and 4 until the smallest p-value of an added edge is larger than
the cut-off p-value.
6. Based on the last updated undirected graph $G_1$, the graph orientation rules are applied to
transform the undirected graph into a directed acyclic graph (DAG) (Meek, 1995).
The inferred DAG with $m_1$ directed edges, denoted as $A_s$, represents the predicted
regulatory network among $m$ genes.
7. Test the possible edges between $m$ genes and $n$ NLTs. Based on the latest matrix $G$,
add an undirected edge $E^2_{ij}$ between the $i$th gene and the $j$th NLT. That is, add the
symmetrical covariance value between the $i$th gene and $j$th NLT from the positions
$B(i, j)$ and $B'(j, i)$ into the positions $G_2(i, j)$ and $G_3(j, i)$, respectively. Then, compute a
new covariance matrix by the IMLEA. Based on the deviance difference between the new
covariance matrix and that before addition, test the significance of the added edge $E^2_{ij}$ by
using the Chi-square distribution with one degree of freedom. The p-value of the
Chi-square test is used as the edge selection criterion. Record the p-value of this tested
edge $E^2_{ij}$ and remove it from $G$.
8. Repeat the computation in Step 7 for the regulation between genes and NLTs. The last
updated sub-graph $G_3$ with $n_1$ edges, denoted as $B_s$, is the predicted directed regulatory
network from $n$ NLTs to $m$ genes. Since we only consider regulations among genes
and those from NLTs to genes, the result matrix is given as follows:
$$G_s = \begin{pmatrix} A_s \\ B'_s \end{pmatrix}. \tag{3}$$
The output network includes $m_1$ directed edges among $m$ genes and $n_1$ directed edges
from $n$ NLTs to $m$ genes.
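The edge test in Steps 3 and 4 can be illustrated for the very first pass only, when every candidate edge is compared against the empty graph. In that special case the deviance difference of a Gaussian graphical model reduces to $-T \log(1 - r_{ij}^2)$, where $T$ is the number of observations and $r_{ij}$ the sample correlation, so no iterative fit is needed. The sketch below is a simplification under that assumption: it does not implement the IMLEA refit required once edges are already in the graph, and the function names and the default cut-off of 0.05 are hypothetical.

```python
import math


def correlation(u, v):
    """Pearson correlation of two equally long series."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    suu = sum((a - mu_u) ** 2 for a in u)
    svv = sum((b - mu_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def first_pass_edges(genes, cutoff=0.05):
    """Score every candidate gene-gene edge against the empty graph.

    Returns (ranked, chosen): all edges sorted by ascending p-value, and the
    best edge if its p-value is below the cut-off (otherwise None).
    """
    n_obs = len(genes[0])
    ranked = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r2 = min(correlation(genes[i], genes[j]) ** 2, 1.0 - 1e-12)
            # Deviance drop from adding edge (i, j) to the empty graph.
            deviance = -n_obs * math.log(1.0 - r2)
            # Chi-square survival function with one degree of freedom.
            p_value = math.erfc(math.sqrt(deviance / 2.0))
            ranked.append((p_value, (i, j)))
    ranked.sort()
    chosen = ranked[0][1] if ranked and ranked[0][0] < cutoff else None
    return ranked, chosen
```

A full forward search would add the chosen edge, refit the covariance matrix, and repeat the scoring until no remaining edge passes the cut-off.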
Note that we initially applied the GGM in our previous work to the whole matrix
$M$ directly (Wang et al., 2016). However, since the number of NLTs is much larger
than that of genes, numerical results showed that the majority of selected edges connect
NLTs, but few edges are selected to connect genes. This result is not appropriate because
the regulations between genes should be the primary mechanisms of the network.
Then we conducted another test, in which we did not consider the regulations between
NLTs by changing matrix $C$ into an identity matrix $I_m$. Matrix $M$ now is
$$M_1 = \begin{pmatrix} A_s & B \\ B' & I_m \end{pmatrix}. \tag{4}$$
However, when we applied the GGM to $M_1$ directly, a singularity problem arose during
the computation of IMLEA. To satisfy our intention and make the algorithm stable,
we proposed EFSA, which is executed in two steps. The first step selects regulations
between genes and the second step finds regulations from NLTs to genes. The EFSA can be
used to predict the gene-gene interactions and the effect from NLTs to genes based on the
time-course experimental data.
Bottom-up approach: mathematical model
For a regulatory network with $m$ genes, the expression level of the $i$-th gene at time $t$ is
denoted as $x_i(t)$. We used the following ordinary differential equation (ODE) model to
describe the dynamics of the network (de Jong, 2002)
$$\frac{dx}{dt} = F(t, x), \tag{5}$$
where $x = (x_1, \ldots, x_m)$ is a vector representing the expression levels of $m$ genes. A number
of mathematical formalisms have been proposed to describe the dynamical interactions
between different genes in the network, such as the models with linear functions
(de Jong, 2002)
$$F_i(t, x) = \sum_{j=1, j \neq i}^{n} a_{ij} x_j - k_i x_i \tag{6}$$
or the models with non-linear functions (Olariu & Peterson, 2018)
$$F_i(t, x) = \frac{\sum_{j=1}^{n} a_{ij} x_j}{1 + \sum_{j=1}^{n} b_{ij} x_j} - k_i x_i \tag{7}$$
The advantage of the model (Eq. 5) with the linear functions (Eq. 6) is that it has a
much smaller number of unknown parameters than the non-linear functions (Eq. 7).
However, the non-linear model is able to describe the non-linear dynamics more precisely.
Therefore, we proposed a method that combines the feature of additive terms in the
linear model and the advantages of the non-linear model. We applied the second-order truncated
Taylor series approach to approximate the non-linear function (Eq. 7). Here the
Taylor series is a mathematical formula to approximate a function by using a polynomial
function (Stewart, 2018). Thus, we proposed an ODE model (Eq. 5) with the following
functions
$$F_i(t, x) = \sum_{j=1, j \neq i}^{m} a_{ij} x_j + \sum_{1 \le j < k \le n} \beta_{ijk} x_j x_k - k_i x_i, \tag{8}$$
where $k_i$ is the degradation rate of $x_i$. This proposed model (Eq. 5) with the non-linear
function (Eq. 8) is based on the following assumptions:
1. The regulations from different genes to a particular gene are additive. Similarly, the
regulations from non-linear terms (NLTs) to a particular gene are also additive.
2. The regulation from gene $j$ to gene $i$ is represented by $a_{ij} x_j$, where $a_{ij}$ is the coefficient of
regulation strength.
3. The regulation of NLT $x_j x_k$ to gene $i$ is represented by $\beta_{ijk} x_j x_k$, where $\beta_{ijk}$ consists of the
regulation strength and equilibrium constant $C_{jk}$, as we discussed in the sub-section
Top-down Approach.
4. The auto-regulation is not considered, namely $a_{ii} = 0$, to avoid confusion between
auto-regulation term $a_{ii} x_i$ and degradation term $k_i x_i$. Note that the issue of
auto-regulation may be addressed using a model with non-linear function (Eq. 7).
In addition, we just consider the effect of NLTs $x_j x_k$ for $j \neq k$ since the expression levels of
$x_j$ may be highly correlated to that of $x_j^2$. Therefore, we assume that $\beta_{ijj} = 0$.
5. If the value of $a_{ij}$ is positive (negative or zero), it means that gene $x_j$ activates (represses
or has no regulation to) the expression of gene $x_i$. A similar assumption is applied to the
value of $\beta_{ijk}$.
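To make the structure of Eq. (8) concrete, the sketch below evaluates its right-hand side for all genes and advances the state by one explicit Euler step. The data layout (a nested list for $a_{ij}$, a dictionary keyed by $(i, j, k)$ with $j < k$ for $\beta_{ijk}$), the function names and the choice of Euler integration are all assumptions made for illustration; the source does not state which ODE solver was used.

```python
def rhs(x, alpha, beta, k):
    """Right-hand side of Eq. (8) for every gene.

    x:     expression levels of the m genes
    alpha: m x m nested list, alpha[i][j] = a_ij (the diagonal is ignored)
    beta:  dict mapping (i, j, l) with j < l to beta_ijl; missing keys mean 0
    k:     degradation rates k_i
    """
    m = len(x)
    out = []
    for i in range(m):
        # Additive linear regulations from the other genes (a_ii = 0).
        linear = sum(alpha[i][j] * x[j] for j in range(m) if j != i)
        # Additive regulations from the non-linear terms x_j * x_l.
        nonlinear = sum(b * x[j] * x[l]
                        for (g, j, l), b in beta.items() if g == i)
        out.append(linear + nonlinear - k[i] * x[i])
    return out


def euler_step(x, alpha, beta, k, dt):
    """One explicit Euler step of dx/dt = F(t, x) (integrator is an assumption)."""
    return [xi + dt * fi for xi, fi in zip(x, rhs(x, alpha, beta, k))]
```

Storing $\beta_{ijk}$ sparsely reflects the fact that only the regulations retained by the structure inference carry non-zero coefficients.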
We emphasize that the proposed method in this work is substantially different from our
previous work (Wang et al., 2016). The first difference is that the proposed non-linear
model (Eq. 8) is different from the non-linear model in Wang et al. (2016). This new
model not only can study the regulations from genes to genes, as we considered in our
previously proposed model (Wang et al., 2016), but also can investigate the effects of
heterodimers and/or synergistic effect in genetic regulation. This new model also leads
to the second difference compared with our previous top-down approach, namely the
proposed Extended Forward Search Algorithm (EFSA) not only includes the probabilistic
graphical model in our previous work (Wang et al., 2016) but also can predict the
possible regulations from NLTs to genes. In addition, in this work, we will infer a
medium-sized network rst by using EFSA and then reduce the network size by removing
regulations from the network in the Results section, rather than inferring a core
network rst and then adding regulations to the core network in our previous approach
(Wang et al., 2016).
Parameter inference
When considering the fully connected graph among $m$ genes and $n$ non-linear terms (NLTs), we have an ordinary differential equation (ODE) system with $m$ differential equations. The total number of unknown coefficients is $m(m+n)$, comprising $m(m-1)$ coefficients $a_{ij}$, $mn$ coefficients $\beta_{ijk}$ and $m$ degradation rates $k_i$. After applying the Extended Forward Search Algorithm (EFSA), we have an inferred regulatory network which contains only $m_1$ edges among genes and $n_1$ edges from NLTs to genes. Thus, the numbers of coefficients $a_{ij}$ and $\beta_{ijk}$ are reduced from $m(m-1)$ to $m_1$ and from $mn$ to $n_1$, respectively. It is easier to estimate the parameters for the inferred network than for the fully connected network.
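As a quick check of these counts, the following minimal sketch evaluates the number of unknown coefficients before and after structure inference. The function names are illustrative and are not taken from the authors' code; the sample values (11 genes, 55 NLTs, and 46 or 40 edges of each type) are those reported in the Results section.

```python
def full_parameter_count(m: int, n: int) -> int:
    """Unknown coefficients in the fully connected model.

    m*(m-1) gene-to-gene coefficients a_ij (no auto-regulation),
    m*n NLT-to-gene coefficients beta_ijk, and m degradation rates k_i.
    """
    return m * (m - 1) + m * n + m  # algebraically equal to m * (m + n)


def reduced_parameter_count(m: int, m1: int, n1: int) -> int:
    """Unknown coefficients after structure inference.

    m1 edges among genes, n1 edges from NLTs to genes, m degradation rates.
    """
    return m1 + n1 + m


if __name__ == "__main__":
    print(full_parameter_count(11, 55))         # fully connected model
    print(reduced_parameter_count(11, 46, 46))  # erythroid network
    print(reduced_parameter_count(11, 40, 40))  # neutrophil network
```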
In this work, we used a MATLAB Genetic Algorithm toolbox to estimate the parameters in the proposed mathematical model (Chipperfield, Fleming & Fonseca, 1994). The algorithm begins by generating a population of initial parameter values, for example, 100 values. Each initial value is called an individual and the whole population is called one generation. The algorithm then calculates the fitness value of each individual in the current generation. Based on the fitness values, it next creates new values for each individual and thus forms the population of the next generation. This process is repeated until a pre-defined number of generations has been calculated. In this work, we used the following functions: crtbp to generate the initial binary populations, reins to effect fitness-based reinsertion, select to give a convenient interface to the selection routines, recombine to conduct crossover operators, and mut to conduct binary and integer mutations. Detailed information on these functions and their alternatives can be found in the relevant reference (Chipperfield, Fleming & Fonseca, 1994).
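The generation loop described above can be illustrated with a minimal Python sketch. It is not the MATLAB toolbox used in this work: it uses a real-valued encoding with truncation selection, uniform crossover and Gaussian mutation instead of the binary operators (crtbp, reins, select, recombine, mut), and the function name and all default settings are assumptions made for illustration only.

```python
import random


def genetic_minimize(fitness, n_params, lower, upper,
                     pop_size=100, generations=200,
                     mutation_rate=0.1, seed=0):
    """Minimise `fitness` over the box [lower, upper]^n_params with a simple GA."""
    rng = random.Random(seed)
    # Initial generation: individuals drawn uniformly from the search box.
    population = [[rng.uniform(lower, upper) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the current generation by fitness (smaller is better).
        ranked = sorted(population, key=fitness)
        parents = ranked[:pop_size // 2]   # truncation selection
        children = [ranked[0][:]]          # elitism: keep the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation, clipped back into the search box.
            child = [min(upper, max(lower, g + rng.gauss(0.0, 0.1 * (upper - lower))))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return min(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness: squared distance to the point (1, 1, 1).
    toy = lambda p: sum((g - 1.0) ** 2 for g in p)
    print(genetic_minimize(toy, n_params=3, lower=-3.0, upper=3.0))
```

In the setting of this work, `fitness` would be the simulation error of the ODE model for a given parameter vector, and the box bounds would play the role of the sampling range for the parameters.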
To ensure the accuracy of the estimates, we set the number of generations to 1,000 and the number of individuals in each generation to 300. For the parameter vector $(a_{ij}, \beta_{ijk}, k_i)$, we used the uniform distribution over the interval $(W_{\min}, W_{\max})$ to generate the initial estimates. Here $W_{\min}$ and $W_{\max}$ are the minimal and maximal values, respectively, for sampling the parameters. The values of $W_{\min}$ and $W_{\max}$ are adjusted by computation. For example, if the majority of the estimated parameters are close to $W_{\min}$, then we further decrease the value of $W_{\min}$. However, if the majority of the estimated values are well above $W_{\min}$, then we increase the value of $W_{\min}$ accordingly. A similar consideration is applied to $W_{\max}$. In this study, for the erythroid lineage pathway, numerical results suggest that the values of $W_{\min}$ and $W_{\max}$ for $(a_{ij}, \beta_{ijk}, k_i)$ are $(-3, -3, 0)$ and $(3, 3, 1)$, respectively. For the neutrophil lineage pathway, numerical results suggest that the values of $W_{\min}$ and $W_{\max}$ for $(a_{ij}, \beta_{ijk}, k_i)$ are $(-2.5, -2.5, 0)$ and $(2.5, 2.5, 1)$, respectively. Each run of the algorithm uses an initial random number to generate an initial set of model parameters, which leads to one set of estimated parameters. For each model, we used 200 different initial random numbers, which lead to 200 different sets of estimated model parameters. Denote by $x_i(t_j)$ and $x_i^{*}(t_j)$ the observation data and numerical simulations, respectively, at time point $t_j$ for $j = 1, 2, \ldots, M$. The simulation error is calculated by

$$E = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{M}\left(x_i(t_j) - x_i^{*}(t_j)\right)^2}. \tag{9}$$

We selected the top ten sets with the minimal estimated errors out of 200 estimates for
further analysis and comparison.
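The error measure of Eq. (9) and the selection of the best estimates can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `simulate` is a hypothetical callable that maps a parameter set to simulated expression levels indexed as [gene][time point], and the function names are assumptions.

```python
import math


def simulation_error(observed, simulated):
    """Eq. (9): square root of the summed squared differences
    over all genes and all time points.

    `observed` and `simulated` are nested lists indexed as [gene][time point].
    """
    return math.sqrt(sum((obs - sim) ** 2
                         for obs_row, sim_row in zip(observed, simulated)
                         for obs, sim in zip(obs_row, sim_row)))


def select_best(estimates, observed, simulate, keep=10):
    """Rank parameter sets by simulation error and keep the `keep` smallest.

    Returns a list of (error, parameter_set) pairs, smallest error first.
    """
    scored = [(simulation_error(observed, simulate(p)), p) for p in estimates]
    scored.sort(key=lambda pair: pair[0])
    return scored[:keep]
```

With 200 estimated parameter sets in `estimates`, calling `select_best(estimates, observed, simulate, keep=10)` would return the ten sets carried forward to the robustness analysis.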
Robustness analysis
We noted that, if a model with the estimated parameters is not robust, a perturbation to
the parameters might lead to substantial variations of the model output. Thus, we next
used the robustness property of the model to select the inferred model parameter sets from
the Genetic Algorithm. This property was designed to examine the robustness of the
inferred model to the perturbations of model parameters (Kitano, 2004). Robustness
property is also an important method for understanding the variations in genetic
regulatory networks mathematically (Masel & Siegal, 2009). Note that our perturbation
test is a mathematical technique. It is different from the perturbation of biological
experiments, which may be conducted by the over-expression/knock-down tests. Although
we will conduct removal tests by removing edges from the developed model, these tests
are designed to remove the unnecessary (or unimportant) regulations in the network.
In this perturbation test, the inferred parameter $k_i$ is taken as the unperturbed one, and the perturbed parameter is generated by

$$\bar{k}_i = k_i(1 + \mu\varepsilon), \tag{10}$$

where $\varepsilon$ is a sample generated from either the normal distribution or the uniform distribution. In this work, we used the standard Gaussian random variable $N(0,1)$ to generate samples. In addition, $\mu$ is a parameter that determines the magnitude of the perturbation (Wang et al., 2016) and hence the variation of the simulations. Numerical results suggest that when the value of $\mu$ is small, the perturbation has a small effect on the system dynamics, and it is difficult to distinguish the robustness properties of the model with different parameter sets. However, if the value of $\mu$ is large, the perturbation makes the model output substantially different, and it is difficult to measure the robustness property. To obtain an appropriate level of variation for the robustness analysis, $\mu = 0.4$ was employed in this study.
For each of the top ten sets of parameters determined in the previous sub-section, we first obtained $N$ (= 5,000) sets of perturbed model parameters by using Eq. (10) and then used these parameter sets to obtain $N$ corresponding simulations. We used $x_{ij}^{(k)}(p)$ and $x_{ij}^{(k)}$ to denote the simulation of variable $x_i$ at time point $t_j$ obtained by the $k$-th perturbed and unperturbed model parameters, respectively. Then, we defined

$$E^{(k)} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{M}\left(x_{ij}^{(k)}(p) - x_{ij}^{(k)}\right)^2} \tag{11}$$

as the measure for the robustness property of the model with the $k$-th perturbed parameter set. Afterwards, we defined the robust average for the given parameter set as

$$RA = \frac{1}{N}\sum_{k=1}^{N} E^{(k)}, \tag{12}$$

and the robust standard deviation as

$$RSTD = \sqrt{\frac{1}{N-1}\sum_{k=1}^{N}\left(E^{(k)} - RA\right)^2} \tag{13}$$

over $N$ perturbation tests. Smaller values of RA and RSTD mean that the model with the given parameter set is more robust.
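The perturbation test of Eqs. (10) to (13) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: `simulate` is a hypothetical callable returning simulated expression levels indexed as [gene][time point], every entry of the parameter vector is perturbed, and an independent Gaussian sample is drawn for each parameter.

```python
import math
import random
import statistics


def robustness(simulate, params, mu=0.4, n_tests=5000, seed=0):
    """Return (RA, RSTD) for one parameter set, following Eqs. (10)-(13)."""
    rng = random.Random(seed)
    baseline = simulate(params)  # unperturbed simulation
    errors = []
    for _ in range(n_tests):
        # Eq. (10): scale each parameter by (1 + mu * eps), eps ~ N(0, 1).
        perturbed = [p * (1.0 + mu * rng.gauss(0.0, 1.0)) for p in params]
        output = simulate(perturbed)
        # Eq. (11): distance between perturbed and unperturbed simulations.
        errors.append(math.sqrt(sum((a - b) ** 2
                                    for row_a, row_b in zip(output, baseline)
                                    for a, b in zip(row_a, row_b))))
    ra = statistics.fmean(errors)    # Eq. (12): robust average
    rstd = statistics.stdev(errors)  # Eq. (13): uses the N - 1 denominator
    return ra, rstd
```

Applying `robustness` to each of the ten selected parameter sets and comparing the returned pairs gives the ranking used to choose the more robust model.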
RESULTS
Inference of regulatory network
To reduce the complexity of regulatory networks, we first used the Extended Forward Search Algorithm (EFSA) to predict the topological structure of genetic networks. The algorithm controls the number of edges by adjusting a pre-defined cut-off value. This value is equivalent to the significance level in statistics. If the threshold is too low, we may miss some significant regulations. However, if the threshold is relatively high, it is quite possible to select insignificant regulations. This work considers networks including 11 genes and 55 non-linear terms (NLTs). For the sub-network of the 11 genes only (i.e. matrix $A_s$ in Eq. 3), to ensure statistical significance, we set the threshold to 0.1 for both the erythroid regulatory network and the neutrophil regulatory network. This threshold value (i.e. 0.1) was chosen as a balance between selecting too many insignificant regulations and obtaining too few candidate regulations. We then had 46 and 40 directed edges for the erythroid regulatory network and the neutrophil network, respectively.
For the regulations from NLTs to genes (i.e. matrix $B_s$ in Eq. 3), the size of matrix $B_s$ is much larger than that of $A_s$. To avoid the dominance of the regulations from NLTs to genes, we also set the cut-off value to 0.1 for the two networks, and take only the first 46 and 40 directed edges from NLTs to genes for the erythrocyte and neutrophil differentiation, respectively, if more edges are selected when using the cut-off value 0.1. The reason we still applied the threshold 0.1 here is that the number of selected edges that satisfy this value is much larger than the required number (i.e. 46 for the erythroid regulatory network and 40 for the neutrophil regulatory network). Since the edges are selected and ranked by their significance, we can simply select the top 46 edges and 40 edges for the erythroid and neutrophil pathways, respectively, without conducting any further numerical tests.
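The thresholding and capping step can be sketched as follows. This is only an illustration of the selection rule described above, not of EFSA itself: it assumes that each candidate edge already carries a score for which smaller means more significant, and the function name, tuple layout and example scores are hypothetical.

```python
def select_edges(candidates, cutoff=0.1, max_edges=None):
    """Keep edges whose score is below `cutoff`, most significant first.

    `candidates` is a list of (source, target, score) tuples, where a smaller
    score means a more significant edge (an assumption of this sketch).
    If `max_edges` is given, only the top `max_edges` edges are returned.
    """
    kept = sorted((e for e in candidates if e[2] < cutoff), key=lambda e: e[2])
    return kept if max_edges is None else kept[:max_edges]


if __name__ == "__main__":
    # Made-up scores for generic nodes, purely for illustration.
    demo = [("g1*g2", "g1", 0.03), ("g2*g3", "g3", 0.20),
            ("g1*g3", "g3", 0.01), ("g2*g4", "g2", 0.08)]
    print(select_edges(demo, cutoff=0.1, max_edges=2))
```

For the NLT-to-gene edges, `max_edges` would be set to the number of gene-to-gene edges already selected (46 or 40), so that the two edge types are balanced.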
Figures S1 and S2 in Supplemental Information present the inferred regulatory networks for the erythroid and neutrophil networks, respectively. Note that there are 11 and 17 isolated NLTs in the erythroid and neutrophil networks, respectively, since no significant edges have been selected from these NLTs by our algorithm. All these isolated NLTs are listed in the Isolated NLTs Table. Moreover, all arrows in these figures only represent the direction of regulations, rather than the types of regulations (i.e. positive or negative regulation). We will study the detailed regulatory mechanisms in the next subsection. We found that the target gene of a protein heterodimer is a component of that heterodimer in all cases. A possible explanation of this observation is that the expression level of a heterodimer is the product of the expression levels of the two corresponding genes (namely $x_i x_j$ for genes $i$ and $j$ with expression levels $x_i$ and $x_j$, respectively). Thus, the expression data of the NLTs $\{x_i x_j\}$ may be highly correlated with those of the component genes, namely $\{x_i\}$ or $\{x_j\}$.
Inference of mathematical model
Having constructed the regulatory networks in the previous sub-section, we next study the detailed dynamics of the genetic networks in the fate determination of hematopoietic stem cells (HSCs) by using our proposed mathematical model. The major step is to infer the values of the unknown parameters in the model (Eq. 8). If we consider the fully connected model, there are 11 × (11 + 55) = 726 parameters. However, after the application of EFSA, the number of unknown parameters is reduced to 103 (including 46 directed edges between genes, 46 directed edges from non-linear terms (NLTs) to genes and 11 self-degradation rate constants) for the differentiation of erythrocytes and to 91 (including 40 directed edges between genes, 40 directed edges from NLTs to genes and 11 self-degradation rate constants) for the differentiation of neutrophils. We next applied the Genetic Algorithm to estimate these unknown parameters for the two networks. We used 200 different random numbers to obtain different initial values of the rate constants $(a_{ij}, \beta_{ijk}, k_i)$ over the defined range $(W_{\min}, W_{\max})$, which was discussed in the Methods section. This leads to 200 different sets of estimated parameters. Then, for each differentiated lineage, we chose the ten sets of estimated results with the smallest estimation errors for further robustness analysis. According to the definition of estimation error (Eq. 9), the optimal inferred network for the erythrocyte differentiation in our tests has estimation error 0.9902. In addition, its robust average (Eq. 12) and robust standard deviation (Eq. 13) are 0.3977 and 0.1066, respectively. For the neutrophil differentiation, the optimal inferred network has estimation error 0.8726, robust average 0.3983 and robust standard deviation 0.1275.
Figures 1 and 2 present the simulation results based on the optimal estimated parameters for the expression levels of four genes, namely Gata1, PU.1, Ets1 and Tal1, for the differentiation of erythrocytes and neutrophils, respectively. The expression levels of Gata1 increase continuously in both simulated and experimental data during the erythrocyte differentiation. However, during the neutrophil differentiation, the experimental data of Gata1 keep fluctuating and then decrease slightly at the end of differentiation, which is matched by our simulation. For gene PU.1, both microarray and simulated data decline in the differentiation of erythrocytes but climb during the differentiation of neutrophils. Similarly, the expression levels of Ets1 in microarray data increase during erythrocyte differentiation but decrease during neutrophil differentiation. Simulation results also fit the trends for both differentiation pathways. The experimental data of Tal1 increase with fluctuations during the first 60 h of erythrocyte differentiation, and then rise rapidly after the first 60 h. Our simulated results follow the same trend as the expression levels of Tal1. Thus, our simulation results fit the trends of the expression levels of these genes very well during the two developmental processes. Figures S3 and S4 in Supplemental Information give the simulation results of the other six genes for the differentiation of erythrocytes and neutrophils, respectively.
Reduction of network model: edge deletion
We have obtained two regulatory networks with 92 directed edges and 80 directed edges for erythroid and neutrophil differentiation, respectively. Next, we tested whether potentially insignificant edges could be deleted from our predicted regulatory networks. In the first step, we tested the deletion of regulations from non-linear terms (NLTs) to genes. We removed one edge in each test to form a temporary system model, and then examined the simulation error and robustness property of the new model. Afterwards, we permanently removed the edge whose removal led to the minimal change in simulation error and robustness property, and thus formed an updated model. This test was repeated until both the simulation error and the robustness property of the updated model were much worse than those of the original network without any removal. In the second step, we evaluated the regulatory interactions between the 11 genes using the same method as in the first step.
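The deletion procedure amounts to a greedy backward elimination, which the following minimal sketch illustrates. Several details are assumptions introduced for illustration and are not specified in the text: `evaluate` is a hypothetical callable that refits the model on a given edge list and returns (simulation error, robust average), the two criteria are combined by a simple sum when ranking candidate removals, and "much worse" is expressed through a relative `tolerance`.

```python
def greedy_edge_deletion(edges, evaluate, tolerance=0.1):
    """Backward elimination of edges.

    `evaluate(edges)` returns (simulation_error, robust_average) for the model
    refitted on `edges`; both are 'smaller is better'. `tolerance` is the
    relative worsening, against the original network, at which deletion stops.
    Returns (remaining_edges, removed_edges).
    """
    base_se, base_ra = evaluate(edges)
    current = list(edges)
    removed = []
    while len(current) > 1:
        # Try removing each remaining edge in turn (one temporary model per edge).
        trials = []
        for edge in current:
            reduced = [e for e in current if e != edge]
            se, ra = evaluate(reduced)
            trials.append((se + ra, se, ra, edge, reduced))
        # Choose the removal that leaves the model in the best shape.
        _, se, ra, edge, reduced = min(trials, key=lambda t: t[0])
        # Stop once both criteria are clearly worse than the original network.
        if se > base_se * (1 + tolerance) and ra > base_ra * (1 + tolerance):
            break
        current = reduced
        removed.append(edge)
    return current, removed
```

In this work the procedure is run in two passes, first over the edges from NLTs to genes and then over the edges among genes. Each call to `evaluate` involves a full parameter estimation and robustness test, so the cost grows with the number of candidate edges.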
For the erythrocyte differentiation, Table 1 suggests that after removing 3 regulations from NLTs to genes, the estimation error (Eq. 9) is improved (shown in DEL1). Then, we tested the reduction of gene-to-gene regulations. The final result suggests that, after we deleted the (Ldb1 → Lmo2), (Notch1 → Lmo2), (Cbfa2t3 → Lmo2) and (Runx1 → Lmo2) edges, the estimation error (Eq. 9) is slightly increased. However, the robustness property is better than that of the DEL1 model since the robust average (Eq. 12) is decreased. Thus, for the erythroid differentiation, numerical tests recommended removing a total of seven edges from our predicted regulatory network. We stopped the deletion test after obtaining the DEL5 model. If we proceed with further deletion, both the simulation error and the robustness property of the temporary network become much worse than those of the original network without removal.

Table 1 Edge deletion test for erythrocyte differentiation. RR, Removed regulation; SE, Simulation error, defined by Eq. (9); RA, Robust average, defined by Eq. (12); RSTD, Robust standard deviation, defined by Eq. (13).

| Model | RR | SE | RA | RSTD |
|---|---|---|---|---|
| OES | N/A | 0.9902 | 0.3977 | 0.1066 |
| DEL1 | Gata2-Notch1 → Notch1; Tal1-Gfi1 → Gfi1; Cbfa2t3-Gfi1 → Gfi1 | 0.9826 | 0.4594 | 0.1259 |
| DEL2 | Ldb1 → Lmo2 | 0.9955 | 0.3938 | 0.1124 |
| DEL3 | Notch1 → Lmo2 | 0.9861 | 0.4506 | 0.1263 |
| DEL4 | Cbfa2t3 → Lmo2 | 1.0451 | 0.3820 | 0.0962 |
| DEL5 | Runx1 → Lmo2 | 1.0298 | 0.3471 | 0.0904 |

Note: Description of different models: OES, The original model without any deletion; DEL1, Model based on OES by removing regulations from NLTs to genes; DEL2, Model based on DEL1 by removing a regulation among genes; DEL3, Model based on DEL2 by removing a regulation among genes; DEL4, Model based on DEL3 by removing a regulation among genes; DEL5, Model based on DEL4 by removing a regulation among genes.
Table 2 shows that, for the neutrophil differentiation, there are no insignificant regulations from NLTs to genes, because the removal of any edge from NLTs to genes will increase the simulation error (Eq. 9) substantially and/or reduce the robustness by increasing the values of the robust average (Eq. 12) and robust standard deviation (Eq. 13). For the regulations between genes, we removed the following four regulations, namely (Gata2 → Ldb1), (Runx1 → Cbfa2t3), (Ldb1 → Lmo2) and (Tal1 → Lmo2), and formed an updated system. Table 2 shows that the simulation error and robustness property of the updated system are close to those of the original system without any removal of edges. Thus, for the neutrophil differentiation, numerical tests recommended removing only four edges from our predicted regulatory network. Coincidentally, we stopped the deletion test after obtaining the DEL5 model, for the same reason as for the erythrocyte differentiation.
Figures 3 and 4 present the inferred regulatory networks after the edge deletion test for erythroid and neutrophil differentiation, respectively. Initially, we had 92 directed edges for the erythrocyte pathway and 80 directed edges for the neutrophil pathway. After the edge deletion, seven and four directed edges were removed from the erythrocyte network and neutrophil network, respectively, since the removal of these edges has little negative influence on the simulation error (Eq. 9), robust average (Eq. 12) and robust standard deviation (Eq. 13). Thus, there are 85 and 76 directed edges left for the erythrocyte and neutrophil pathways, respectively.
Table 2 Edge deletion test for neutrophil differentiation. RR, Removed regulation; SE, Simulation error, defined by Eq. (9); RA, Robust average, defined by Eq. (12); RSTD, Robust standard deviation, defined by Eq. (13).

| Model | RR | SE | RA | RSTD |
|---|---|---|---|---|
| OES | N/A | 0.8726 | 0.3983 | 0.1275 |
| DEL1 | No Suggestion | N/A | N/A | N/A |
| DEL2 | Gata2 → Ldb1 | 0.8726 | 0.3943 | 0.1273 |
| DEL3 | Runx1 → Cbfa2t3 | 0.8726 | 0.3928 | 0.1265 |
| DEL4 | Ldb1 → Lmo2 | 0.8748 | 0.4183 | 0.1333 |
| DEL5 | Tal1 → Lmo2 | 0.8809 | 0.3925 | 0.1237 |

Note: Description of different models: OES, The original model without any deletion; DEL1, Model based on OES by removing regulations from NLTs to genes; DEL2, Model based on DEL1 by removing a regulation among genes; DEL3, Model based on DEL2 by removing a regulation among genes; DEL4, Model based on DEL3 by removing a regulation among genes; DEL5, Model based on DEL4 by removing a regulation among genes.

DISCUSSION
This work was designed to develop a mathematical framework that was able to realize nonlinear gene expression dynamics accurately. In particular, we intended to investigate the effect of possible protein heterodimers and/or synergistic effect in genetic regulation.
In this study, we designed the Extended Forward Search Algorithm (EFSA) to predict the
topology of regulatory networks connecting genes and heterodimers. We also proposed a
new mathematical model for inferring dynamic mechanisms of regulatory networks.
Using the EFSA, we derived two regulatory networks of 11 genes for erythrocyte and
neutrophil differentiation pathways. According to the predicted networks and
experimental data, we estimated parameters in our proposed mathematical model based
on the criteria of simulation error and robustness property. By removing regulations with
less importance based on simulation error and robustness property, we developed two
gene networks that regulate erythrocyte and neutrophil differentiation pathways.
Numerical results suggested that our proposed method is capable of reconstructing genetic
regulatory networks effectively and accurately.
To infer the regulatory mechanisms of heterodimers, we combined the top-down approach (i.e. probabilistic graphical model) and the bottom-up approach (i.e. mathematical model). We first used the top-down approach to simplify the network topology and reduce the number of unknown parameters in the mathematical model. Then the Genetic Algorithm was used to estimate the unknown parameters. The combination of these two approaches reduced the errors in simulation and also improved the robustness property of the mathematical model. In this work, we initially considered a network of medium-sized complexity. We then reduced the network complexity by removing edges from the network, rather than studying the core network and then adding edges to it as in our previous study (Wang et al., 2016). The reason for changing the method from "adding edge" to "removing edge" in this work is mainly the high computational cost of the "adding edge" tests, since the number of candidate edges in the "removing edge" test is much smaller than that in the "adding edge" test. Thus, in this work, we used the EFSA to obtain more candidate edges and then used the dynamic model to remove unimportant edges. If the number of potential regulations derived from the probabilistic graphical model is relatively large, the removal of one single regulation from the potential network may not change the simulation error. Numerical results suggested that a couple of regulations should be removed simultaneously in order to achieve changes in simulation error.

Figure 3 Predicted genetic regulatory network of erythrocyte pathway. The genetic regulatory network predicted by the Extended Forward Search Algorithm with 11 genes and 41 non-linear terms (NLTs) (14 isolated NLTs excluded) after the edge deletion test, which is related to the fate determination of the erythrocyte pathway: the regulatory network for hematopoietic stem cells differentiating to megakaryocyte-erythroid progenitors. The network is visualized by Cytoscape software. Color Reference Table: regulation among genes; regulation from NLTs to Ets1, Gata1, Tal1, Cbfa2t3, Runx1, Notch1, PU.1 and Lmo2. Isolated NLTs Table: Gata1-Runx1, Gata1-Notch1, Gata2-Notch1, Gata2-Ldb1, Runx1-Lmo2, Runx1-Ldb1, Tal1-Ldb1, Tal1-Gfi1, Cbfa2t3-Lmo2, Cbfa2t3-Ldb1, Cbfa2t3-Gfi1, Tal1-Cbfa2t3, Lmo2-Ldb1, Gfi1-Ets1. DOI: 10.7717/peerj.9065/g-3
The inferred regulatory networks from our proposed models are partially supported by experimental observations. For example, the regulation of the Gata1-Gata2-PU.1 complex in our inferred networks agrees with the experimental results (May et al., 2013). The Gata1-PU.1 heterodimer plays an important role in regulating hematopoiesis (Zhang et al., 2000), and it is also included in our inferred model. In addition, the Ldb1-Lmo2 dimer is activated with significant expression profiles during the erythroid differentiation process (Xu et al., 2003), which is consistent with our prediction. Moreover, there is evidence for a synergistic effect of Tal1, Lmo2 and Gata1 (Mead et al., 2001), which has been inferred in our regulatory networks as well.

Figure 4 Predicted genetic regulatory network of neutrophil pathway. The genetic regulatory network predicted by the Extended Forward Search Algorithm with 11 genes and 38 non-linear terms (NLTs) (17 isolated NLTs excluded) after the edge deletion test, which is related to the fate determination of the neutrophil pathway: the regulatory network for hematopoietic stem cells differentiating to granulocyte-macrophage progenitors. The network is visualized by Cytoscape software. Color Reference Table: regulation among genes; regulation from NLTs to Ets1, Gata1, PU.1, Lmo2, Notch1 and Ldb1. Isolated NLTs Table: Gata1-Gata2, Gata1-Lmo2, Gata1-Ldb1, Gata2-Runx1, Gata2-Tal1, Gata2-Cbfa2t3, Runx1-Cbfa2t3, Runx1-Lmo2, Runx1-Ldb1, PU.1-Lmo2, Tal1-Cbfa2t3, Gata2-Lmo2, Tal1-Ldb1, Lmo2-Ldb1, Gata2-Ldb1, Runx1-Tal1, Gfi1-Ets1. DOI: 10.7717/peerj.9065/g-4
However, not all of the predictions can be confirmed by the existing experimental observations. The first explanation is that the non-linear terms in our mathematical model are introduced by a mathematical operation (i.e. the Taylor series). Some of these non-linear terms may be needed for realizing the nonlinear dynamics accurately, but may not be supported by biological mechanisms. Note that another inference method, called the semi-supervised method, can include the validated regulations first and then infer the unvalidated regulations (Maetschke et al., 2013). Secondly, our inferred regulatory network may predict some potential regulations between genes and from non-linear terms to genes, which may be confirmed by future experimental studies. Thus, the inferred regulations in this work may provide testable predictions for further experimental studies to explore the detailed mechanism of hematopoiesis.
This work also raised a number of important issues in the study of genetic regulations. One issue is that our non-linear model still cannot fit all the expression data very well due to noise in the data. Figures 1 and 2 show that the noise in expression data may increase the simulation error of our proposed model. A large noise ratio in expression data is a challenging issue in mathematical modeling, since large variations in the data may lead to incorrect inference results. In that case, stochastic modeling may be a more appropriate approach to describe the noise in gene expression data (Samad et al., 2005; Tian, 2010; Chowdhury, Chetty & Evans, 2015). In addition, the Gaussian graphical model is based on the covariance matrix. However, the correlation coefficient is only suitable for measuring linear correlation. Currently, other approaches, such as mutual information and conditional mutual information, have been used to measure both linear and non-linear correlation relationships between gene expression data (Zhang et al., 2012, 2015; Zhao et al., 2016). Finally, this research determines the regulatory mechanisms based on numerical simulation and robustness property. More information from experimental studies will be important to improve the accuracy of the model and make more reasonable predictions. In addition, we may use other key criteria to select mathematical models, such as Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Bayes factor (Kadane & Lazar, 2004). All these issues will be interesting topics of our future research.
CONCLUSION
In conclusion, this study proposes a new method to construct the network topology
from genes and heterodimers by a new top-down approach and then develops a
non-linear ordinary differential equation model to infer the dynamic mechanisms of
regulatory networks. The two derived networks may provide insights regarding the
genetic regulations in the cell fate determination of hematopoietic stem cells.
The proposed method can also be applied to model other regulatory pathways and
biological systems.
ADDITIONAL INFORMATION AND DECLARATIONS
Funding
This work was supported by the National Natural Science Foundation of China
(Nos. 11871238 and 11931019). The funders had no role in study design, data collection
and analysis, decision to publish, or preparation of the manuscript.
Grant Disclosures
The following grant information was disclosed by the authors:
National Natural Science Foundation of China: 11871238 and 11931019.
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Siyuan Wu performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
Tiangang Cui analyzed the data, prepared figures and/or tables, and approved the final draft.
Xinan Zhang analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
Tianhai Tian conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
Data Availability
The following information was supplied regarding data availability:
The data is available at NCBI GEO: GSE49991. The code is available at GitHub:
https://github.com/ThaddeusWu/PeerJ_Submission.
Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.9065#supplemental-information.
REFERENCES
Aggarwal R, Lu J, Pompili VJ, Das H. 2012. Hematopoietic stem cells: transcriptional regulation,
ex vivo expansion and clinical application. Current Molecular Medicine 12(1):3449
DOI 10.2174/156652412798376125.
Athanasiadis EI, Botthof JG, Andres H, Ferreira L, Lio P, Cvejic A. 2017. Single-cell
RNA-sequencing uncovers transcriptional states and fate decisions in haematopoiesis.
Nature Commmunication 8(1):631 DOI 10.1038/s41467-017-02305-6.
Birbrair A, Frenette PS. 2016. Niche heterogeneity in the bone marrow. Annals of the New York
Academy of Science 1370(1):8296 DOI 10.1111/nyas.13016.
Wu et al. (2020), PeerJ, DOI 10.7717/peerj.9065 20/25
Cedar H, Bergman Y. 2011. Epigenetics of haematopoietic cell development. Nature Reviews
Immunology 11(7):478488 DOI 10.1038/nri2991.
Chang AN, Cantor AB, Fujiwara Y, Lodish MB, Droho S, Crispino JD, Orkin SH. 2002.
GATA-factor dependence of the multitype zinc-nger protein FOG-1 for its essential role in
megakaryopoiesis. Proceedings of the National Academy of Sciences of the United States of
America 99(14):92379242 DOI 10.1073/pnas.142302099.
Chang HH, Oh PY, Ingber DE, Huang S. 2006. Multistable and multistep dynamics in neutrophil
differentiation. BMC Cell Biology 7(1):11 DOI 10.1186/1471-2121-7-11.
Chickarmane V, Enver T, Peterson C. 2009. Computational modeling of the hematopoietic
erythroid-myeloid switch reveals insights into cooperativity, priming, and irreversibility.
PLOS Computational Biology 5(1):e1000268 DOI 10.1371/journal.pcbi.1000268.
Chickarmane V, Peterson C. 2008. A computational model for understanding stem cell,
trophectoderm and endoderm lineage determination. PLOS ONE 3(10):e3478
DOI 10.1371/journal.pone.0003478.
Chippereld AJ, Fleming PJ, Fonseca CM. 1994. Genetic algorithm tools for control systems
engineering. In: Proceedings of Adaptive Computing in Engineering Design and Control. 128133.
Chowdhury AR, Chetty M, Evans R. 2015. Stochastic S-system modeling of gene regulatory
network. Cognitive Neurodynamics 9(5):535547 DOI 10.1007/s11571-015-9346-0.
Crombach A, Wotton KR, Cicin-Sain D, Ashyraliyev M, Jaeger J. 2012. Efcient
reverse-engineering of a developmental gene regulatory network. PLOS Computational Biology
8(7):e1002589 DOI 10.1371/journal.pcbi.1002589.
de Jong H. 2002. Modeling and simulation of genetic regulatory systems: a literature review.
Journal of Computational Biology 9(1):67103 DOI 10.1089/10665270252833208.
Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39(1):1–22
DOI 10.1111/j.2517-6161.1977.tb01600.x.
Friedman AD. 2007. Transcriptional control of granulocyte and monocyte development. Oncogene
26(47):6816–6828 DOI 10.1038/sj.onc.1210764.
Gardner TS, Cantor CR, Collins JJ. 2000. Construction of a genetic toggle switch in
Escherichia coli. Nature 403(6767):339–342 DOI 10.1038/35002131.
Goardon N, Lambert JA, Rodriguez P, Nissaire P, Herblot S, Thibault P, Dumenil D,
Strouboulis J, Romeo P-H, Hoang T. 2006. ETO2 coordinates cellular proliferation and
differentiation during erythropoiesis. EMBO Journal 25(2):357–366
DOI 10.1038/sj.emboj.7600934.
Guo G, Huss M, Tong GQ, Wang C, Sun LL, Clarke ND, Robson P. 2010. Resolution of cell fate
decisions revealed by single-cell gene expression analysis from zygote to blastocyst.
Developmental Cell 18(4):675–685 DOI 10.1016/j.devcel.2010.02.012.
Hamey FK, Nestorowa S, Kinston SJ, Kent DG, Wilson NK, Göttgens B. 2017. Reconstructing
blood stem cell regulatory network models from single-cell molecular profiles. Proceedings of the
National Academy of Sciences of the United States of America 114(23):5822–5829
DOI 10.1073/pnas.1610609114.
Hill AV. 1910. The possible effects of the aggregation of the molecules of hæmoglobin on its
dissociation curves. Journal of Physiology 40:iv–vii.
Hoppe PS, Schwarzfischer M, Loeffler D, Kokkaliaris KD, Hilsenbeck O, Moritz N, Endele M,
Filipczyk A, Gambardella A, Ahmed N, Etzrodt M, Coutu DL, Rieger MA, Marr C,
Strasser MK, Schauberger B, Burtscher I, Ermakova O, Bürger A, Lickert H, Nerlov C,
Theis FJ, Schroeder T. 2016. Early myeloid lineage choice is not initiated by random PU.1 to
GATA1 protein ratios. Nature 535(7611):299–302 DOI 10.1038/nature18320.
Huang S, Guo Y, May G, Enver T. 2007. Bifurcation dynamics in lineage-commitment in bipotent
progenitor cells. Developmental Biology 305(2):695–713 DOI 10.1016/j.ydbio.2007.02.036.
Inoue A, Fujiwara T, Okitsu Y, Katsuoka Y, Fukuhara N, Onishi Y, Ishizawa K,
Harigae H. 2013. Elucidation of the role of LMO2 in human erythroid cells.
Experimental Hematology 41(12):1062–1076 DOI 10.1016/j.exphem.2013.09.003.
Kadane JB, Lazar NA. 2004. Methods and criteria for model selection. Journal of the American
Statistical Association 99(465):279–290 DOI 10.1198/016214504000000269.
Kitano H. 2004. Biological robustness. Nature Reviews Genetics 5(11):826–837
DOI 10.1038/nrg1471.
Krämer N, Schäfer J, Boulesteix A-L. 2009. Regularized estimation of large-scale gene association
networks using graphical Gaussian models. BMC Bioinformatics 10(1):384
DOI 10.1186/1471-2105-10-384.
Kumano K, Chiba S, Shimizu K, Yamagata T, Hosoya N, Saito T, Takahashi T, Hamada Y,
Hirai H. 2001. Notch1 inhibits differentiation of hematopoietic cells by sustaining GATA-2
expression. Blood 98(12):3283–3289 DOI 10.1182/blood.V98.12.3283.
Lancrin C, Mazan M, Stefanska M, Patel R, Lichtinger M, Costa G, Vargel Ö, Wilson NK,
Möröy T, Bonifer C, Göttgens B, Kouskoff V, Lacaud G. 2012. GFI1 and GFI1B control the
loss of endothelial identity of hemogenic endothelium during hematopoietic commitment. Blood
120(2):314–322 DOI 10.1182/blood-2011-10-386094.
Laslo P, Spooner CJ, Warmash A, Lancki DW, Lee H-J, Sciammas R, Gantner BN, Dinner AR,
Singh H. 2006. Multilineage transcriptional priming and determination of alternate
hematopoietic cell fates. Cell 126(4):755–766 DOI 10.1016/j.cell.2006.06.052.
Li C, Wang J. 2013. Quantifying cell fate decisions for differentiation and reprogramming of a
human stem cell network: landscape and biological paths. PLOS Computational Biology
9(8):e1003165 DOI 10.1371/journal.pcbi.1003165.
Li L, Jothi R, Cui K, Lee JY, Cohen T, Gorivodsky M, Tzchori I, Zhao Y, Hayes SM,
Bresnick EH, Zhao K, Westphal H, Love PE. 2011. Nuclear adaptor Ldb1 regulates a
transcriptional program essential for the maintenance of hematopoietic stem cells.
Nature Immunology 12(2):129–136 DOI 10.1038/ni.1978.
Liew CW, Rand KD, Simpson RJY, Yung WW, Mansfield RE, Crossley M, Proetorius-Ibba M,
Nerlov C, Poulsen FM, Mackay JP. 2006. Molecular analysis of the interaction between the
hematopoietic master transcription factors GATA-1 and PU.1. Journal of Biological Chemistry
281(38):28296–28306 DOI 10.1074/jbc.M602830200.
Ling KW, Ottersbach K, Van Hamburg JP, Oziemlak A, Tsai FY, Orkin SH, Ploemacher R,
Hendriks RW, Dzierzak E. 2004. GATA-2 plays two functionally distinct roles during the
ontogeny of hematopoietic stem cells. Journal of Experimental Medicine 200(7):871–872
DOI 10.1084/jem.20031556.
Liu P, Wang F. 2008. Inference of biochemical network models in S-system using multi-objective
optimization approach. Bioinformatics 24(8):1085–1092 DOI 10.1093/bioinformatics/btn075.
Lulli V, Romania P, Morsilli O, Gabbianelli M, Pagliuca A, Mazzeo S, Testa U, Peschle C,
Marziali G. 2006. Overexpression of Ets-1 in human hematopoietic progenitor cells blocks
erythroid and promotes megakaryocytic differentiation. Cell Death and Differentiation
13(7):1064–1074 DOI 10.1038/sj.cdd.4401811.
Maetschke SR, Madhamshettiwar PB, Davis MJ, Ragan MA. 2013. Supervised, semi-supervised
and unsupervised inference of gene regulatory networks. Briefings in Bioinformatics
15(2):195–211 DOI 10.1093/bib/bbt034.
Mancini E, Sanjuan-Pla A, Luciani L, Moore S, Grover A, Zay A, Rasmussen KD, Luc S,
Bilbao D, O'Carroll D, Jacobsen SE, Nerlov C. 2012. FOG-1 and GATA-1 act sequentially to
specify definitive megakaryocytic and erythroid progenitors. EMBO Journal 31(2):351–365
DOI 10.1038/emboj.2011.390.
Masel J, Siegal ML. 2009. Robustness: mechanisms and consequences. Trends in Genetics
25(9):395–403 DOI 10.1016/j.tig.2009.07.005.
May G, Soneji S, Tipping AJ, Teles J, McGowan SJ, Wu M, Guo Y, Fugazza C, Brown J,
Karlsson G, Pina C, Olariu V, Taylor S, Tenen DG, Peterson C, Enver T. 2013. Dynamic
analysis of gene expression and genome-wide transcription factor binding during lineage
specification of multipotent progenitors. Cell Stem Cell 13(6):754–768
DOI 10.1016/j.stem.2013.09.003.
Mead P, Deconinck A, Huber T, Orkin S, Zon L. 2001. Primitive erythropoiesis in the Xenopus
embryo: the synergistic role of LMO-2, SCL and GATA-binding proteins. Development
128(12):2301–2308.
Meek C. 1995. Causal inference and causal explanation with background knowledge.
Uncertainty in Artificial Intelligence 11:403–410.
Meister A, Li YH, Choi B, Wong WH. 2013. Learning a nonlinear dynamical system model of
gene regulation: a perturbed steady-state approach. Annals of Applied Statistics 7(3):1311–1333
DOI 10.1214/13-AOAS645.
Moignard V, Macaulay IC, Swiers G, Buettner F, Schütte J, Calero-Nieto FJ, Kinston S, Joshi A,
Hannah R, Theis FJ, Jacobsen SE, De Bruijn M, Göttgens B. 2013. Characterization of
transcriptional networks in blood stem and progenitor cells using high-throughput single-cell
gene expression analysis. Nature Cell Biology 15(4):363–372 DOI 10.1038/ncb2709.
Moignard V, Woodhouse S, Haghverdi L, Lilly AJ, Tanaka Y, Wilkinson AC, Buettner F,
Macaulay IC, Jawaid W, Diamanti E, Nishikawa S-I, Piterman N, Kouskoff V, Theis FJ,
Fisher J, Göttgens B. 2015. Decoding the regulatory network of early blood development from
single-cell gene expression measurements. Nature Biotechnology 33(3):269–279
DOI 10.1038/nbt.3154.
Ng AP, Alexander WS. 2017. Haematopoietic stem cells: past, present and future. Cell Death &
Disease 3(17002):371–380.
Noor A, Serpedin E, Nounou M, Nounou H, Mohamed N, Chouchane L. 2013. An overview of
the statistical methods used for inferring gene regulatory networks and protein–protein
interaction networks. Advances in Bioinformatics 2013(6912):1–12 DOI 10.1155/2013/953814.
North TE, Stacy T, Matheny CJ, Speck NA, De Bruijn MF. 2004. Runx1 is expressed in adult
mouse hematopoietic stem cells and differentiating myeloid and lymphoid cells, but not in
maturing erythroid cells. Stem Cells 22(2):158–168 DOI 10.1634/stemcells.22-2-158.
Olariu V, Peterson C. 2018. Kinetic models of hematopoietic differentiation. Wiley
Interdisciplinary Reviews: Systems Biology and Medicine 11(1):e1424 DOI 10.1002/wsbm.1424.
Orkin SH, Zon LI. 2008. Hematopoiesis: an evolving paradigm for stem cell biology. Cell
132(4):631–644 DOI 10.1016/j.cell.2008.01.025.
Ottersbach K, Smith A, Wood A, Göttgens B. 2010. Ontogeny of haematopoiesis: recent advances
and open questions. British Journal of Haematology 148(3):345–355
DOI 10.1111/j.1365-2141.2009.07953.x.
Porcher C, Swat W, Rockwell K, Fujiwara Y, Alt F, Orkin SH. 1996. The T cell leukemia
oncoprotein SCL/tal-1 is essential for development of all hematopoietic lineages. Cell
86(1):47–57 DOI 10.1016/S0092-8674(00)80076-8.
Real PJ, Ligero G, Ayllon V, Ramos-Mejia V, Bueno C, Gutierrez-Aranda I,
Navarro-Montero O, Lako M, Menendez P. 2012. SCL/TAL1 regulates hematopoietic
specification from human embryonic stem cells. Molecular Therapy 20(7):1443–1453
DOI 10.1038/mt.2012.49.
Samad HE, Khammash M, Petzold L, Gillespie D. 2005. Stochastic modelling of gene regulatory
networks. International Journal of Robust and Nonlinear Control 15(15):691–711
DOI 10.1002/rnc.1018.
Shea MA, Ackers GK. 1985. The OR control system of bacteriophage lambda: a physical-chemical
model for gene regulation. Journal of Molecular Biology 181(2):211–230
DOI 10.1016/0022-2836(85)90086-5.
Shivdasani RA, Mayer EL, Orkin SH. 1995. Absence of blood formation in mice lacking the T-cell
leukaemia oncoprotein Tal1/SCL. Nature 373(6513):432–434 DOI 10.1038/373432a0.
Soler E, Andrieu-Soler C, De Boer E, Bryne JC, Thongjuea S, Stadhouders R, Palstra R-J,
Stevens M, Kockx C, Van IJcken W, Hou J, Steinhoff C, Rijkers E, Lenhard B, Grosveld F.
2010. The genome-wide dynamics of the binding of Ldb1 complexes during erythroid
differentiation. Genes and Development 24(3):277–289 DOI 10.1101/gad.551810.
Stewart J. 2018. Calculus, chapter infinite sequences and series. Cengage Learning. Available at
https://www.stewartcalculus.com/.
Stier S, Cheng T, Dombkowski D, Carlesso N, Scadden DT. 2002. Notch1 activation increases
hematopoietic stem cell self-renewal in vivo and favors lymphoid over myeloid lineage outcome.
Blood 99(7):2369–2378 DOI 10.1182/blood.V99.7.2369.
The Gene Ontology Consortium. 2017. Expansion of the gene ontology knowledgebase and
resources. Nucleic Acids Research 45(D1):D331–D338.
Tian T. 2010. Stochastic models for inferring genetic regulation from microarray gene expression
data. Biosystems 99(3):192–200 DOI 10.1016/j.biosystems.2009.11.002.
Tian T, Smith-Miles K. 2014. Mathematical modelling of GATA-switching for regulating the
differentiation of hematopoietic stem cell. BMC Systems Biology 8(Suppl. 1):S8
DOI 10.1186/1752-0509-8-S1-S8.
van der Meer LT, Jansen JH, van der Reijden BA. 2010. Gfi1 and Gfi1b: key regulators of
hematopoiesis. Leukemia 24(11):1834–1843 DOI 10.1038/leu.2010.195.
Visvader JE, Mao X, Fujiwara Y, Hahm K, Orkin SH. 1997. The LIM-domain binding protein
Ldb1 and its partner LMO2 act as negative regulators of erythroid differentiation. Proceedings of
the National Academy of Sciences of the United States of America 94(25):13707–13712
DOI 10.1073/pnas.94.25.13707.
Wang J, Myklebost O, Hovig E. 2003. MGraph: graphical models for microarray data analysis.
Bioinformatics 19(17):2210–2211 DOI 10.1093/bioinformatics/btg298.
Wang J, Tian T. 2010. Quantitative model for inferring dynamic regulation of the tumour
suppressor gene P53. BMC Bioinformatics 11(36):45 DOI 10.1186/1471-2105-11-36.
Wang J, Wu Q, Hu XT, Tian T. 2016. An integrated approach to infer dynamic protein–gene
interactions, a case study of the human P53 protein. Methods 110:3–13
DOI 10.1016/j.ymeth.2016.08.001.
Wei J, Hu X, Zou X, Tian T. 2017. Reverse-engineering of gene networks for regulating early
blood development from single-cell measurements. BMC Medical Genomics 10(S5):72
DOI 10.1186/s12920-017-0312-z.
Woods ML, Leon M, Perez-Carrasco R, Barnes CP. 2016. A statistical approach reveals designs
for the most robust stochastic gene oscillators. ACS Synthetic Biology 5(6):459–470
DOI 10.1021/acssynbio.5b00179.
Wu S, Cui T, Tian T. 2018. Mathematical modelling of genetic network for regulating the fate
determination of hematopoietic stem cells. In: 2018 International Conference on Bioinformatics
and Biomedicine (BIBM). 2167–2173.
Xiong W, Ferrell JE. 2003. A positive-feedback-based bistable memory module that governs a cell
fate decision. Nature 426(6965):460–465 DOI 10.1038/nature02089.
Xu Z, Huang S, Chang L-S, Agulnick A, Brandt S. 2003. Identification of a Tal1 target gene
reveals a positive role for the LIM domain-binding protein Ldb1 in erythroid gene expression
and differentiation. Molecular and Cellular Biology 23(21):7585–7599
DOI 10.1128/MCB.23.21.7585-7599.2003.
Yang B, Bao W. 2019. RNDEtree: regulatory network with differential equation based on flexible
neural tree with novel criterion function. IEEE Access 7:58255–58263
DOI 10.1109/ACCESS.2019.2913084.
Yang B, Bao W, Huang D-S, Chen Y. 2018. Inference of large-scale time-delayed gene regulatory
network with parallel mapreduce cloud platform. Scientific Reports 8(1):17787
DOI 10.1038/s41598-018-36180-y.
Ye F, Huang W, Guo G. 2017. Studying hematopoiesis using single-cell technologies.
Journal of Hematology & Oncology 10(1):27 DOI 10.1186/s13045-017-0401-7.
Zhang P, Zhang X, Iwama A, Yu C, Smith KA, Mueller BU, Narravula S, Torbett BE, Orkin SH,
Tenen DG. 2000. PU.1 inhibits GATA-1 function and erythroid differentiation by blocking
GATA-1 DNA binding. Blood 96(8):2641–2648 DOI 10.1182/blood.V96.8.2641.
Zhang X, Zhao J, Hao J-K, Zhao X-M, Chen L. 2015. Conditional mutual inclusive information
enables accurate quantification of associations in gene regulatory networks. Nucleic Acids
Research 43(5):e31 DOI 10.1093/nar/gku1315.
Zhang X, Zhao X-M, He K, Lu L, Cao Y, Liu J, Hao J-K, Liu Z-P, Chen L. 2012. Inferring gene
regulatory networks from gene expression data by path consistency algorithm based on
conditional mutual information. Bioinformatics 28(1):98–104
DOI 10.1093/bioinformatics/btr626.
Zhang Y, Payne KJ, Zhu Y, Price MA, Parrish YK, Zielinska E, Barsky LW, Crooks GM. 2005.
SCL expression at critical points in human hematopoietic lineage commitment. Stem Cells
23(6):852–860 DOI 10.1634/stemcells.2004-0260.
Zhao J, Zhou Y, Zhang X, Chen L. 2016. Part mutual information for quantifying direct
associations in networks. Proceedings of the National Academy of Sciences of the United States of
America 113(18):5130–5135 DOI 10.1073/pnas.1522586113.
Zhao W, Kitidis C, Fleming MD, Lodish HF, Ghaffari S. 2006. Erythropoietin stimulates
phosphorylation and activation of GATA-1 via the PI3-kinase/AKT signaling pathway. Blood
107(3):907–915 DOI 10.1182/blood-2005-06-2516.
... • Chapter 4: This chapter presents a general approach with a new inference algorithm, Extended Forward Search Algorithm, and a new mathematical model to infer the genetic regulatory networks of genes and the protein heterodimers and/or synergistic effects [154]. ...
... The proposed algorithm is given below [154]. ...
... Here the Taylor series is a dynamic formula to approximate a function by using a polynomial function [126]. Thus, we proposed an ODE model (4.5) with the following functions [154] ...
Thesis
Mathematical modelling and inference methods are powerful tools to study genetic regulatory networks. This thesis focuses on the development of mathematical methods to explore and analyze the dynamical mechanisms of genetic regulation related to cell fate determination in hematopoiesis.
... In the past twenty years, many modeling approaches have been developed to infer GRN architectures using "omics" data [9,13–15]. GRN inference models can be broadly categorized into three distinct categories, based on the algorithms and hypotheses they employ (see reviews: [16–24]): (i) data-driven static models, which do not simulate the biological processes such as transcription or translation, but hypothesize that interacting genes have correlated expression and use the correlations to infer GRN architecture [25,26]; (ii) discrete models, which simulate the time evolution of discrete variables that qualitatively describe the activity of genes [27,28]; and (iii) continuous dynamical models which simulate the dynamics of gene expression processes in a quantitative manner based on a set of linear [29] or non-linear [30,31] ordinary differential equations (ODEs). ...
Article
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and thus are critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data since it has historically been less widely available in comparison to “static” transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach (named EA) that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. Our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among TFs when applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae. Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans. We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn provide validation or improvement for the model. 
In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN.
... To reduce the complexity, the hybrid approaches, which combine the correlation-based methods and model-based methods together, are used to infer the gene regulatory networks. The correlation-based methods are employed first to generate sparse networks that are the basis for the next step to use the model-based methods [39–43]. In recent years, there has been a trend to use machine learning techniques for developing genetic regulatory networks [44–46]. ...
Article
One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer the regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of the dependence of variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistent algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed by the six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.
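The order-averaging idea described in this abstract can be illustrated with a minimal Python sketch. It is not the published SPCA: the per-order inference step `toy_infer` is a hypothetical, deliberately simple stand-in (each variable is linked to its best-correlated predecessor in the given order) used in place of the path-consistency algorithm, and the names `aggregate_edge_weights`, `n_orders`, and `threshold` are assumptions made for illustration. Only the outer loop (sample random variable orders, infer one network per order, average the edge indicators into weights) reflects the description above.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def toy_infer(data, order, threshold=0.5):
    """Hypothetical order-dependent stand-in for a real inference step.

    Each variable is linked to the earlier variable in `order` with the
    largest absolute correlation, provided it exceeds `threshold`.
    """
    edges = set()
    for pos, v in enumerate(order):
        best, best_r = None, threshold
        for u in order[:pos]:
            r = abs(pearson(data[u], data[v]))
            if r > best_r:
                best, best_r = u, r
        if best is not None:
            edges.add(frozenset((best, v)))
    return edges


def aggregate_edge_weights(data, n_orders=50, seed=0):
    """Average edge indicators over networks inferred from random orders."""
    rng = random.Random(seed)
    variables = list(range(len(data)))
    counts = {frozenset(p): 0 for p in combinations(variables, 2)}
    for _ in range(n_orders):
        order = variables[:]
        rng.shuffle(order)  # one random variable order per run
        for edge in toy_infer(data, order):
            counts[edge] += 1
    return {tuple(sorted(e)): c / n_orders for e, c in counts.items()}


# Made-up toy data: variable 1 tracks variable 0; variable 2 is unrelated.
data = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 8.1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = aggregate_edge_weights(data)
ranked = sorted(weights, key=weights.get, reverse=True)
```

Here `ranked[0]` is the edge with the largest aggregated weight; in a real application the stand-in would be replaced by an actual path-consistency implementation, and the number of sampled orders would be chosen to suit the dataset.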
... We will discuss Energy LEACH and multi-hop LEACH to achieve improved performance. A Genetic Algorithm (GA) is famous as an essential tool for optimizing complicated challenges based on the fitness principle of the gene (Lambora et al., 2019; Wu et al., 2020). In addition to optimization, it also serves the purpose of machine learning and for research and development. ...
Article
A decentralized form represents a wireless network that facilitates the computers to direct communication without any router. The mobility of individual nodes is necessary within the restricted radio spectrum where contact is often possible on an Adhoc basis. The routing protocol must face the critical situation in these networks forwarding exploration between communicating nodes may create the latency problem in the future. The assault is one of the issues has direct impact network efficiency by disseminating false messages or altering routing detail. Hence, an enhanced routing approach proposes to defend against such challenges. The efficiency of the designated model of wireless devices relies on various output parameters to ensure the requirements. The high energy efficient algorithms: LEACH with FUZZY LOGIC, GENETIC, and FIREFLY are the most effective in optimizing scenarios. The firefly algorithm applies in a model of hybrid state logic with energy parameters: data percentage, transmission rate, and real-time application where the architecture methodology needs to incorporate the design requirements for the attacks within the specified network environment, which can affect energy and packet distribution under various system parametric circumstances. These representations can determine with the statistical linear congestion model in a wireless sensor network mixed state environment.
Preprint
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and thus are critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data since it has historically been less widely available in comparison to “static” transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. Our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among TFs when applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae . Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans . We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn provide validation or improvement for the model. 
In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN. Author Summary The establishment of distinct transcriptional programs, where specific sets of genes are activated or repressed, is fundamental to all forms of life. Sequence-specific DNA-binding proteins, often referred to as regulatory transcription factors, form interconnected gene regulatory networks (GRNs) which underlie the establishment and maintenance of specific transcriptional programs. Since their discovery, many modeling approaches have sought to understand the structure and regulatory behaviors of these GRNs. The field of GRN inference uses experimental measurements of transcript abundance to predict how regulatory transcription factors interact with their downstream target genes to establish specific transcriptional programs. However, most prior approaches have been limited by the exclusive use of “static” or steady-state measurements. We have developed a unique approach which incorporates dynamic transcriptional data into a sophisticated ordinary differential equation model to infer GRN structures that give rise to distinct transcriptional programs. Our model not only outperforms six other leading models, it also is capable of accurately predicting how changes in GRN structure will impact the resulting transcriptional programs. These unique attributes of our model, combined with “real world” experimental validation of our model predictions, represent a significant advance in the field of gene regulatory network inference.
Article
Background: Single-cell technologies provide unprecedented opportunities to study heterogeneity of molecular mechanisms. In particular, single-cell RNA-sequence data have been successfully used to infer gene regulatory networks with stochastic expressions. However, there are still substantial challenges in measuring the relationships between genes and selecting the important genetic regulations. Objective: This perspective provides a brief review of effective methods for the inference of gene regulatory networks. Methods: We concentrate on two types of inference methods, namely the model-free methods and mechanistic methods for constructing gene networks. Results: For the model-free methods, we mainly discuss two issues, namely the measures for quantifying gene relationship and criteria for selecting significant connections between genes. The issue for mechanistic methods is different mathematical models to describe genetic regulations accurately. Conclusions: We advocate the development of ensemble methods that combine two or more methods.
Article
Gene regulatory network (GRN) could provide guidance for understanding the internal laws of biological phenomena and analyzing several diseases. Ordinary differential equation model, which owns continuity and flexibility, has been utilized to identify gene regulatory network over the past decade. In this paper, we propose a novel algorithm, which is named RNDEtree, nonlinear ordinary differential equation model based on flexible neural tree to improve the accuracy of GRN reconstruction. In this model, flexible neural tree can be utilized to approximate the nonlinear regulation function of ordinary differential equation model. Multi expression programming is proposed to evolve the structure of flexible neural tree and brain storm optimization algorithm is utilized to optimize the parameters of RNDEtree model. In order to improve the false-positive ratio of this method, a novel fitness function is proposed, in which sparse and minimum redundancy maximum relevance (mRMR) terms are considered when optimizing RNDEtree. The performances of our proposed algorithm can be evaluated by the benchmark datasets from DREAM challenge and real biological dataset in Escherichia coli (E. coli). Experiment results demonstrate that the proposed method could infer gene regulatory networks more accurately than other state-of-the-art methods.
Article
Inference of gene regulatory network (GRN) is crucial to understand intracellular physiological activity and function of biology. The identification of large-scale GRN has been a difficult and hot topic of system biology in recent years. In order to reduce the computation load for large-scale GRN identification, a parallel algorithm based on restricted gene expression programming (RGEP), namely MPRGEP, is proposed to infer instantaneous and time-delayed regulatory relationships between transcription factors and target genes. In MPRGEP, the structure and parameters of time-delayed S-system (TDSS) model are encoded into one chromosome. An original hybrid optimization approach based on genetic algorithm (GA) and gene expression programming (GEP) is proposed to optimize TDSS model with MapReduce framework. Time-delayed GRNs (TDGRN) with hundreds of genes are utilized to test the performance of MPRGEP. The experiment results reveal that MPRGEP could infer gene regulatory networks more accurately than other state-of-the-art methods, and obtain a convincing speedup.
Article
As cell and molecular biology is becoming increasingly quantitative, there is an upsurge of interest in mechanistic modeling at different levels of resolution. Such models mostly concern kinetics and include gene and protein interactions as well as cell population dynamics. The final goal of these models is to provide experimental predictions, which is now taking on. However, even without matured predictions, kinetic models serve the purpose of compressing a plurality of experimental results into something that can empower the data interpretation, and importantly, suggesting new experiments by turning “knobs” in silico. Once formulated, kinetic models can be executed in terms of molecular rate equations for concentrations or by stochastic simulations when only a limited number of copies are involved. Developmental processes, in particular those of stem and progenitor cell commitments, are not only topical but also particularly suitable for kinetic modeling due to the finite number of key genes involved in cellular decisions. Stem and progenitor cell commitment processes have been subject to intense experimental studies over the last decade with some emphasis on embryonic and hematopoietic stem cells. Gene and protein interactions governing these processes can be modeled by binary Boolean rules or by continuous‐valued models with interactions set by binding strengths. Conceptual insights along with tested predictions have emerged from such kinetic models. Here we review kinetic modeling efforts applied to stem cell developmental systems with focus on hematopoiesis. We highlight the future challenges including multi‐scale models integrating cell dynamical and transcriptional models. This article is categorized under: • Models of Systems Properties and Processes > Mechanistic Models • Developmental Biology > Stem Cell Biology and Regeneration
Article
Background: Recent advances in omics technologies have raised great opportunities to study large-scale regulatory networks inside the cell. In addition, single-cell experiments have measured the gene and protein activities in a large number of cells under the same experimental conditions. However, a significant challenge in computational biology and bioinformatics is how to derive quantitative information from the single-cell observations and how to develop sophisticated mathematical models to describe the dynamic properties of regulatory networks using the derived quantitative information. Methods: This work designs an integrated approach to reverse-engineer gene networks for regulating early blood development based on single-cell experimental observations. The wanderlust algorithm is initially used to develop the pseudo-trajectory for the activities of a number of genes. Since the gene expression data in the developed pseudo-trajectory show large fluctuations, we then use Gaussian process regression methods to smooth the gene expression data in order to obtain pseudo-trajectories with much smaller fluctuations. The proposed integrated framework consists of both bioinformatics algorithms to reconstruct the regulatory network and mathematical models using differential equations to describe the dynamics of gene expression. Results: The developed approach is applied to study the network regulating early blood cell development. A graphic model is constructed for a regulatory network with forty genes and a dynamic model using differential equations is developed for a network of nine genes. Numerical results suggest that the proposed model is able to match experimental data very well. We also examine the networks with more regulatory relations and numerical results show that more regulations may exist. We test the possibility of auto-regulation but numerical simulations do not support the positive auto-regulation. 
In addition, robustness is used as an importantly additional criterion to select candidate networks. Conclusion: The research results in this work shows that the developed approach is an efficient and effective method to reverse-engineer gene networks using single-cell experimental observations.
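The smoothing step described in the Methods can be sketched as follows. This is a minimal illustration of Gaussian process regression on a noisy series ordered along pseudo-time, not the authors' implementation: the squared-exponential kernel, the hyperparameter values, the synthetic sine-shaped signal, and the function names `rbf_kernel` and `gp_smooth` are all assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of pseudo-times."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_smooth(t, y, t_new, length_scale=0.2, variance=1.0, noise=0.1):
    """Posterior mean of a GP fitted to (t, y), evaluated at t_new.

    The sample mean of y is used as a constant prior mean; `noise` is the
    assumed standard deviation of the observation noise.
    """
    mean_y = y.mean()
    K = rbf_kernel(t, t, length_scale, variance) + noise ** 2 * np.eye(len(t))
    K_s = rbf_kernel(t_new, t, length_scale, variance)
    # Solve K @ alpha = (y - mean_y) through a Cholesky factorisation.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean_y))
    return mean_y + K_s @ alpha

# Synthetic stand-in for one gene's expression along a pseudo-trajectory.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
truth = np.sin(2.0 * np.pi * t)
noisy = truth + rng.normal(scale=0.3, size=t.size)
smooth = gp_smooth(t, noisy, t, noise=0.3)
```

In practice the length scale and noise level would be estimated from the data, for example by maximising the marginal likelihood, rather than fixed by hand as in this sketch.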
Article
The success of marker-based approaches for dissecting haematopoiesis in mouse and human is reliant on the presence of well-defined cell surface markers specific for diverse progenitor populations. An inherent problem with this approach is that the presence of specific cell surface markers does not directly reflect the transcriptional state of a cell. Here, we used a marker-free approach to computationally reconstruct the blood lineage tree in zebrafish and order cells along their differentiation trajectory, based on their global transcriptional differences. Within the population of transcriptionally similar stem and progenitor cells, our analysis reveals considerable cell-to-cell differences in their probability to transition to another committed state. Once the fate decision is executed, the suppression of transcription of ribosomal genes and the upregulation of lineage-specific factors coordinately control lineage differentiation. Evolutionary analysis further demonstrates that this haematopoietic programme is highly conserved between zebrafish and higher vertebrates.
Article
The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/.
Article
The discovery and characterisation of haematopoietic stem cells has required decades of research. The identification of adult bone marrow as a source of haematopoietic cells capable of protecting an organism from otherwise lethal irradiation led to an intense search for their identity and characteristics. Using functional assays along with evolving techniques for the isolation of haematopoietic cells, haematopoietic stem cell populations could be enriched and their characteristics analysed. The key haematopoietic stem cell characteristics of pluripotentiality and the ability for self-renewal have emerged as characteristics of several haematopoietic stem cell populations, including those that have recently challenged the conventional concepts of the haematopoietic hierarchy. Human allogeneic stem cell therapy relies on these functional characteristics of haematopoietic stem cells, which can be isolated from peripheral blood, bone marrow or cord blood, with the additional requirement that immunological barriers need to be overcome to allow sustained engraftment while minimising the risk of graft-versus-host disease developing in the recipient of transplanted stem cells. Current and future research will continue to focus on the identification of haematopoietic stem cell regulators and methods for in vitro and in vivo stem cell manipulation, including genome editing, to expand the scope, potential and safety of therapy using haematopoietic stem cells.
Article
The lineage-specific transcription factors GATA-1 and PU.1 can physically interact to inhibit each other's function, but the mechanism of repression of GATA-1 function by PU.1 has not been elucidated. Both the N terminus and the C terminus of PU.1 can physically interact with the C-terminal zinc finger of GATA-1. It is demonstrated that the PU.1 N terminus, but not the C terminus, is required for inhibiting GATA-1 function. Induced overexpression of PU.1 in K562 erythroleukemia cells blocks hemin-induced erythroid differentiation. In this system, PU.1 does not affect the expression of GATA-1 messenger RNA, protein, or nuclear localization. However, GATA-1 DNA binding decreases dramatically. By means of electrophoretic mobility shift assays with purified proteins, it is demonstrated that the N-terminal 70 amino acids of PU.1 can specifically block GATA-1 DNA binding. In addition, PU.1 had a similar effect in the G1ER cell line, in which the GATA-1 null erythroid cell line G1E has been transduced with a GATA-1–estrogen receptor fusion gene, which is directly dependent on induction of the GATA-1 fusion protein to effect erythroid maturation. Consistent with in vitro binding assays, overexpression of PU.1 blocked DNA binding of the GATA-1 fusion protein as well as GATA-1–mediated erythroid differentiation of these G1ER cells. These results demonstrate a novel mechanism by which function of a lineage-specific transcription factor is inhibited by another lineage-restricted factor through direct protein–protein interactions. These findings contribute to understanding how protein–protein interactions participate in hematopoietic differentiation and leukemogenesis.
Article
Adult blood contains a mixture of mature cell types, each with specialized functions. Single hematopoietic stem cells (HSCs) have been functionally shown to generate all mature cell types for the lifetime of the organism. Differentiation of HSCs toward alternative lineages must be balanced at the population level by the fate decisions made by individual cells. Transcription factors play a key role in regulating these decisions and operate within organized regulatory programs that can be modeled as transcriptional regulatory networks. As dysregulation of single HSC fate decisions is linked to fatal malignancies such as leukemia, it is important to understand how these decisions are controlled on a cell-by-cell basis. Here we developed and applied a network inference method, exploiting the ability to infer dynamic information from single-cell snapshot expression data based on expression profiles of 48 genes in 2,167 blood stem and progenitor cells. This approach allowed us to infer transcriptional regulatory network models that recapitulated differentiation of HSCs into progenitor cell types, focusing on trajectories toward megakaryocyte-erythrocyte progenitors and lymphoid-primed multipotent progenitors. By comparing these two models, we identified and subsequently experimentally validated a difference in the regulation of nuclear factor, erythroid 2 (Nfe2) and core-binding factor, runt domain, alpha subunit 2, translocated to, 3 homolog (Cbfa2t3h) by the transcription factor Gata2. Our approach confirms known aspects of hematopoiesis, provides hypotheses about regulation of HSC differentiation, and is widely applicable to other hierarchical biological systems to uncover regulatory relationships.