PROCEEDINGS Open Access
An estimation method for inference of gene regulatory network using Bayesian network with uniting of partial problems
Yukito Watanabe*, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda
From The Tenth Asia Pacific Bioinformatics Conference (APBC 2012)
Melbourne, Australia. 17-19 January 2012
Abstract
Background: Bayesian networks (BNs) have been widely used to estimate gene regulatory networks. Many BN methods have been developed to estimate networks from microarray data. However, two serious problems reduce the effectiveness of current BN methods. The first is that BN-based methods require a huge amount of computational time to estimate large-scale networks. The second is that the estimated network cannot have cyclic structures, even if the actual network has such structures.
Results: In this paper, we present a novel BN-based deterministic method with reduced computational time that allows cyclic structures. Our approach generates all combinations of three genes (triplets), estimates the network of each triplet by BN, and unites these networks into a single network containing all genes. This method reduces the search space for predicting gene regulatory networks without degrading the solution accuracy compared with the greedy hill climbing (GHC) method. The computational time grows with the cube of the number of genes. In addition, the network estimated by our method can include cyclic structures.
Conclusions: We verified the effectiveness of the proposed method for all known gene regulatory networks and
their expression profiles. The results demonstrate that this approach can predict regulatory networks with reduced
computational time without degrading the solution accuracy compared with the GHC method.
Background
Finding gene regulations is an important objective of systems biology [1,2]. Causal gene regulatory interactions are widely described using gene regulatory networks. Estimating gene regulatory networks can help reveal complicated regulations.
Recently, microarray technology [3,4] has rapidly produced a wealth of information about gene expression activities. The volume of data necessitates computational methods to identify and analyze the underlying gene regulatory networks [5]. A number of analytical methods have been proposed to estimate gene regulatory networks from gene expression profiles. Boolean networks, graphical Gaussian models (GGMs), differential equation models, and Bayesian networks (BNs) are widely used models.
A Boolean network is a discrete dynamical network [6,7]. In a Boolean network, the state of a gene is represented by a Boolean variable (ON or OFF), and interactions between genes are represented by Boolean functions that determine the state of a gene on the basis of the states of certain other genes. Hence, continuous gene expression data must be transformed into binary data before a Boolean network can be estimated, and much information is lost in this binary encoding. Because gene expression cannot be described adequately by only two states, Boolean networks are limited by their very definition.
A GGM is an undirected probabilistic graphical model [8]. This model allows the identification of conditional independence relations among the nodes under the assumption that the data follow a multivariate Gaussian distribution. In a GGM, regulations between genes are estimated by calculating partial correlations between pairs of variables. Therefore, a GGM does not identify the direction of the regulatory relationship between two genes; it only quantifies the correlation between their gene expression data.
* Correspondence: wyukito@ist.osaka-u.ac.jp
Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
Watanabe et al. BMC Genomics 2012, 13(Suppl 1):S12
http://www.biomedcentral.com/1471-2164/13/S1/S12
© 2012 Watanabe et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A differential equation model describes gene expression changes as a function of the expression of other genes and environmental factors [9-11]. The flexibility of such models allows complex relations among components to be described. In a differential equation model, the regulation of a gene is described as a function of several gene expression levels. When the input data include experimental noise, this model cannot estimate the gene regulatory network accurately. In addition, if insufficient data are available, the model overfits.
A BN is a graphical model for representing probabilistic relationships among a set of random variables [12-16]. These relationships are encoded in the structure of a directed acyclic graph whose nodes are the random variables, and they are described by a joint probability distribution. In a BN, causal interactions among more than three genes can be estimated. An advantage of BNs over the above models is that they deal better with experimental noise.
Using a BN, it is hard to estimate a large-scale network because the search space grows exponentially as the number of genes increases. Overcoming this problem has therefore been the focus of much research. The proposed solutions can be divided into three types. The first type limits the number of genes in the estimation; even when a large-scale network is estimated, often only part of the network is of interest. The second type parallelizes the estimation on a supercomputer or another high-performance computer; effective parallelization makes it possible to estimate large-scale networks. The third type improves the algorithm itself; these methods reduce computational time by estimating the network heuristically.
An example of the first type of solution was proposed by Peña et al. [17]. This method overcomes the problem of the user having to decide in advance which genes are included in or excluded from the learning process. The method receives a seed gene S and a positive integer R from the user and returns a BN. It starts the BN from the seed gene S, then adds the parents and children of all the genes in the BN R + 1 times, and prunes some genes. In this way, the user avoids deciding in advance which genes to include.
A solution of the second type, proposed by Tamada et al. [18], can estimate gene regulatory networks consisting of more than 20,000 genes from gene expression data. The method is massively parallelized on a supercomputer. It repeatedly estimates subnetworks by hill climbing in parallel for genes selected by neighbor node sampling. The method overcomes the computational problem of the BN by brute force, using the supercomputer. Even if a supercomputer can effectively provide a large-scale network, an estimation method designed to run on a workstation is also required.
A solution of the third type for estimating gene regulatory networks, the greedy hill climbing (GHC) method, was implemented by Bøttcher et al. [19]. By comparing networks that differ only by a single directed edge, either added, removed, or reversed, a GHC method can estimate larger networks than a search of all possible networks, and can do so on a workstation rather than a supercomputer, thus overcoming two problems at once. However, the estimation accuracy of this method is not high, because it tends to get trapped in locally optimal solutions.
In this paper, we present a novel BN-based deterministic method with reduced computational time to overcome the above-mentioned problems. The proposed method can estimate networks as large as those estimated by the GHC method, runs on a workstation, and estimates more accurately than the GHC method. To achieve this higher accuracy, we take a different approach. First, our method generates all subsets of three genes. Then, for each subset, we score all possible networks using the BN method, and we unite the resulting networks into a single network including all genes. This approach enables us to estimate more accurately than the GHC method in the same computational time.
To verify the effectiveness of the proposed method, we perform two experiments to evaluate scalability and accuracy: one to verify that the proposed method can estimate networks as large as those estimated by the GHC method, and one to verify that it can estimate more accurately than the GHC method. These experiments are performed using randomly sampled genes. In addition, we conduct a third experiment to confirm, using real data, that our method outperforms the GHC method.
Results
Bayesian networks
Let D = (V, E) be a directed acyclic graph (DAG), where V is a finite set of nodes and E is a finite set of directed edges between the nodes [19]. The DAG defines the structure of the BN.
Each node $v \in V$ in the graph corresponds to a random variable $x_v$. The set of variables associated with the graph D is then $X = \{x_v\}$. Often we do not distinguish between a variable $x_v$ and the corresponding node v. To each node v with parents pa(v), a local probability distribution $p(x_v \mid x_{\mathrm{pa}(v)})$ is attached. The set of local probability distributions for all variables in the network is P. A BN for a set of random variables X is the pair (D, P).
Directed edges in D encode conditional dependencies between the random variables X through the factorization of the joint probability distribution:
$$p(x) = \prod_{v \in V} p\left(x_v \mid x_{\mathrm{pa}(v)}\right). \qquad (1)$$
As a measure of how well a DAG D represents the conditional dependencies between the random variables, we use the relative probability
$$p(D, d) = p(d \mid D)\, p(D), \qquad (2)$$
and refer to it as a network score, where d is the data and $p(d \mid D)$ is called the likelihood of D.
The log network score contribution of a node is evaluated whenever the node is learned. The log network score N(D) is given by
$$N(D) = \log p(D, d). \qquad (3)$$
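The factorization in Eq. (1) is what makes the network score decomposable: the log score of a DAG is a sum of per-node terms, each depending only on a node and its parents. The Python sketch below illustrates this with a BIC approximation for linear-Gaussian data, swapped in for the Bayesian score that the deal package actually computes. The names local_score and network_score are hypothetical, and a uniform structure prior p(D) is assumed, so the result plays the role of N(D) only up to an additive constant.

```python
import numpy as np


def local_score(data: np.ndarray, child: int, parents: list) -> float:
    """BIC-style log score of one node given its parents (linear Gaussian model).

    Stand-in for deal's Bayesian score: the maximised Gaussian log-likelihood
    of the child regressed on its parents, minus (k/2) log N for k parameters.
    """
    n = data.shape[0]
    y = data[:, child]
    # design matrix: intercept column plus one column per parent
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    var = max(float(resid @ resid) / n, 1e-12)  # ML estimate of residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * var) + 1.0)
    k = X.shape[1] + 1  # regression coefficients plus the variance
    return float(loglik - 0.5 * k * np.log(n))


def network_score(data: np.ndarray, parents_of: dict) -> float:
    """Log score of a DAG: by Eq. (1) it is a sum of per-node terms."""
    return sum(local_score(data, v, ps) for v, ps in parents_of.items())


rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.5, size=200)  # x1 depends on x0
x2 = rng.normal(size=200)                        # x2 is independent
data = np.column_stack([x0, x1, x2])

with_edge = network_score(data, {0: [], 1: [0], 2: []})
empty = network_score(data, {0: [], 1: [], 2: []})
print(with_edge > empty)
```

Because the score decomposes, changing the parents of one node only requires recomputing that node's term, which is what hill-climbing searches exploit.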
The number of possible DAGs grows super-exponentially with the number of nodes, and the problem of identifying the network with the highest score is NP-hard. If the number of random variables in a network is large, it is computationally infeasible to calculate the network score for all possible DAGs. In such situations, a heuristic search strategy, the GHC method, is used.
The GHC method is as follows.
1. Select an initial DAG $D_0$ randomly from which to start the search.
2. Calculate the Bayes scores of $D_0$ and of all possible networks that differ from it by only one directed edge, that is, networks in which an edge is added to $D_0$, an edge in $D_0$ is deleted, or the direction of an edge in $D_0$ is reversed.
3. Among all these networks, select the one that increases the Bayes score the most.
4. If the Bayes score was not improved, stop the search. Otherwise, make the selected network the new $D_0$ and repeat from step 2.
In the GHC method, the maximum number of these steps in the search can be limited, and the search can be restarted an arbitrary number of times. The parameter settings are described later in this paper.
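The four steps and the two limits can be sketched in Python as follows. This is an illustrative re-implementation, not the deal package's code: the names ghc, single_edge_moves and random_dag are hypothetical, the score is any function of the edge set, and max_steps=50, restarts=0 mirror the default parameter set used in the experiments.

```python
import itertools
import random
from typing import Callable, FrozenSet, Iterator, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]


def is_acyclic(n: int, edges: Graph) -> bool:
    """Kahn's algorithm: True if the directed graph on nodes 0..n-1 has no cycle."""
    indeg = [0] * n
    for _, b in edges:
        indeg[b] += 1
    ready = [v for v in range(n) if indeg[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return visited == n


def single_edge_moves(n: int, edges: Graph) -> Iterator[Graph]:
    """Graphs that differ from `edges` by one added, deleted or reversed edge."""
    for a, b in itertools.permutations(range(n), 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                  # delete a -> b
            yield (edges - {(a, b)}) | {(b, a)}     # reverse a -> b
        elif (b, a) not in edges:
            yield edges | {(a, b)}                  # add a -> b


def random_dag(n: int, rng: random.Random, p: float = 0.3) -> Graph:
    """Random initial DAG: edges only point forward in a shuffled node order."""
    order = list(range(n))
    rng.shuffle(order)
    return frozenset((order[i], order[j])
                     for i in range(n) for j in range(i + 1, n) if rng.random() < p)


def ghc(n: int, score: Callable[[Graph], float],
        max_steps: int = 50, restarts: int = 0, seed: int = 0) -> Tuple[Graph, float]:
    """Greedy hill climbing with a step limit and random restarts."""
    rng = random.Random(seed)
    best, best_score = frozenset(), float("-inf")
    for _ in range(restarts + 1):
        current = random_dag(n, rng)                      # step 1
        current_score = score(current)
        for _ in range(max_steps):
            # step 2: score every acyclic single-edge neighbour
            scored = [(score(g), g) for g in single_edge_moves(n, current)
                      if is_acyclic(n, g)]
            if not scored:
                break
            top_score, top = max(scored, key=lambda sg: sg[0])  # step 3
            if top_score <= current_score:
                break                                      # step 4: no improvement
            current, current_score = top, top_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


# Toy score (hypothetical): +1 per edge of a target network, -1 per other edge.
target = frozenset({(0, 1), (1, 2), (2, 3)})


def toy_score(g: Graph) -> float:
    return len(g & target) - len(g - target)


print(ghc(4, toy_score))
```

With this toy score, every non-optimal graph has an improving move (delete a wrong edge or add a missing target edge), so the search ends at the target network. With a real score, a greedy move can stop at a local optimum, which is why restarts are offered.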
Methods
We propose a new method to estimate a gene regulatory network with reduced computational time. The proposed method is composed of three steps: dividing the whole problem into partial problems, estimating the gene regulatory networks of the partial problems, and uniting the estimated networks. In this section, we describe our BN-based method using the analysis of a set of expression data as an example. This example includes five genes $V = \{v_i \mid 1 \le i \le 5\}$. A conceptual representation of our approach is presented in Figure 1. We call a search of all possible networks an exhaustive search to distinguish it from the GHC method.
Step 1: Dividing the whole problem into partial problems
Our approach first divides the set of all genes V into all subsets with three genes (triplets) $t = \{v_i, v_j, v_k \in V \mid 1 \le i < j < k \le 5\}$. For example, our approach obtains $\binom{5}{3} = 10$ partial problems: {v1, v2, v3}, {v1, v2, v4}, ..., {v3, v4, v5}.
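Step 1 amounts to enumerating 3-combinations of the gene set. A minimal Python sketch using only the standard library, with the gene names of the example above:

```python
from itertools import combinations
from math import comb

genes = ["v1", "v2", "v3", "v4", "v5"]
triplets = list(combinations(genes, 3))  # each triplet keeps the i < j < k order

print(len(triplets), comb(len(genes), 3))
print(triplets[0], triplets[1], triplets[-1])
```

This prints 10 10 and then ('v1', 'v2', 'v3') ('v1', 'v2', 'v4') ('v3', 'v4', 'v5'), matching the partial problems listed above.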
Step 2: Estimating gene regulatory networks
After making the partial problems, we next independently calculate the scores of all the possible networks of each partial problem by exhaustive search and obtain the estimated DAGs G. The number of possible alternative networks for a triplet {v1, v2, v3} is $3^3 = 27$, because there are three cases for each potential edge $(v_i, v_j)$ $(1 \le i < j \le 3)$: a directed edge from $v_i$ to $v_j$, a directed edge from $v_j$ to $v_i$, or no edge.
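The 27 alternatives can be enumerated directly, as in the Python sketch below (function names are hypothetical). Each of the three pairs gets one of the three edge states; note that two of the 27 assignments are directed 3-cycles rather than DAGs, and the text does not state how these two are treated when scoring.

```python
from itertools import product

EDGE_STATES = ("i->j", "j->i", "none")


def triplet_networks(triplet):
    """Yield all 3**3 = 27 networks of a triplet as frozensets of directed edges."""
    a, b, c = triplet
    pairs = [(a, b), (a, c), (b, c)]
    for states in product(EDGE_STATES, repeat=len(pairs)):
        edges = set()
        for (x, y), state in zip(pairs, states):
            if state == "i->j":
                edges.add((x, y))
            elif state == "j->i":
                edges.add((y, x))
        yield frozenset(edges)


def is_directed_triangle(edges):
    """With one edge per pair, a 3-node graph is cyclic iff every node has out-degree 1."""
    out_degree = {}
    for x, _ in edges:
        out_degree[x] = out_degree.get(x, 0) + 1
    return len(edges) == 3 and all(d == 1 for d in out_degree.values())


networks = list(triplet_networks(("v1", "v2", "v3")))
cyclic = [g for g in networks if is_directed_triangle(g)]
print(len(networks), len(cyclic))  # 27 candidate networks, 2 of them directed cycles
```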
Let $c = (D, S_D, R_D)$ be a tuple, where $D \in G$ is a DAG, $S_D = p(D, d)$ is the score of D, with p(D, d) given by Equation 2, and $R_D$ is the rank of D by score within its partial problem.
We add the tuples of all the partial problems to Z, where Z is a set of such tuples c. For example, when we have the 10 partial problems {v1, v2, v3}, {v1, v2, v4}, ..., {v3, v4, v5}, we add 10 × 27 = 270 tuples of networks to Z.
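The bookkeeping for Z can be sketched in Python as below. The score function is a hypothetical placeholder (random numbers here, only to exercise the code) for $S_D = p(D, d)$. Each tuple also records its triplet, which the paper's (D, S_D, R_D) carries implicitly through D, and ties in score are broken by enumeration order, which is an assumption.

```python
import random
from itertools import combinations, product


def build_Z(genes, score):
    """Return Z as a list of (triplet, network, S_D, R_D) tuples.

    A network is encoded as the edge states of the pairs (a, b), (a, c), (b, c)
    of its triplet: 1 = forward, -1 = backward, 0 = no edge. `score` is a
    hypothetical stand-in for S_D = p(D, d) computed from expression data.
    """
    Z = []
    for triplet in combinations(genes, 3):
        scored = [(net, score(triplet, net)) for net in product((1, -1, 0), repeat=3)]
        scored.sort(key=lambda item: item[1], reverse=True)  # best network first
        for rank, (net, s) in enumerate(scored, start=1):    # R_D = 1 for the best
            Z.append((triplet, net, s, rank))
    return Z


rng = random.Random(0)
Z = build_Z(["v1", "v2", "v3", "v4", "v5"], lambda triplet, net: rng.random())
print(len(Z))  # 10 triplets x 27 networks = 270 tuples
```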
Step 3: Uniting estimated partial problems
To solve the original problem, this step unites the three-gene networks into a single gene regulatory network. The policy of this step is to classify the relationship between each pair of genes $(v_i, v_j)$ into one of the three edge types (a directed edge from $v_i$ to $v_j$, a directed edge from $v_j$ to $v_i$, or no edge between $v_i$ and $v_j$) according to the scores calculated in Step 2.
To select an edge type between genes $v_i$ and $v_j$, we calculate a value of edge $(v_i, v_j)$ for each of the three types t as follows:
$$w_t(v_i, v_j) = \sum_{(D, S_D, 1) \in Z} S_D, \qquad (4)$$
where the sum runs over the rank-1 tuples whose DAG D has edge $(v_i, v_j)$ of type t. We then select the edge type that has the highest total value.
When two or more edge types share the highest total value, we use the edge scores of the partial-problem networks of rank 2 and beyond.
Algorithm
Input: V = {V_1, ..., V_n}: a set of genes; GEP: gene expression profiles of V
Output: G_V: network including genes V
Variable: Z: a set of tuples (graph, score, rank)
1: Make a collection of sets 𝒱 that includes all the subsets of V with three elements
2-1: for each U in 𝒱 do
2-2:   Make a collection of sets D_U that includes all the DAGs of U
2-3:   for each D in D_U do
2-4:     calculate rank R_D and score S_D with GEP
2-5:     add (D, S_D, R_D) to Z
2-6:   end for
2-7: end for
3-1: i ← 1
3-2: repeat
3-3:   for each edge between genes (x, y) in D of (D, S_D, i) do
3-4:     sum all S_D of (D, S_D, i) for each of the three edge types
3-5:     if one edge type has the highest total S_D then
3-6:       add an edge between genes (x, y) to G_V
3-7:     end if
3-8:     if two or more edge types have the highest total S_D then
3-9:       for each edge between genes (x or y, w) in G_V, where w is a gene ≠ x, y do
3-10:        select edge between genes (x, y) from D of (D, S_D, i), where D includes genes x, y, and w
3-11:      end for
3-12:      add edge (x, y) selected in (3-10) with the highest S_D to G_V
3-13:    end if
3-14:  end for
3-15:  i ← i + 1
3-16: until directions of all edges in G_V are assigned
3-17: return G_V
A flowchart of the algorithm can be found in Figure 2.
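Eq. (4) and the rank fallback of Step 3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, networks are sets of directed edges, and a tie at rank i is simply re-evaluated with the rank i + 1 tuples, whereas lines 3-8 to 3-12 of the algorithm additionally consult neighbouring edges already in G_V. The toy Z uses made-up scores for the four triplets of genes a to d.

```python
from itertools import combinations


def edge_type(edges, x, y):
    """Edge type of the pair (x, y) in a network: '->', '<-' or 'none'."""
    if (x, y) in edges:
        return "->"
    if (y, x) in edges:
        return "<-"
    return "none"


def unite(Z, genes, max_rank=27):
    """Sum S_D per edge type over tuples of rank i (Eq. (4) for i = 1).

    Simplified tie handling: a tied pair is re-evaluated with the rank i + 1
    tuples; the neighbour-based selection of lines 3-8 to 3-12 is omitted.
    Returns the set of directed edges of the united network G_V.
    """
    united = set()
    for x, y in combinations(genes, 2):
        for rank in range(1, max_rank + 1):
            totals = {}
            for triplet, edges, s, r in Z:
                if r == rank and x in triplet and y in triplet:
                    t = edge_type(edges, x, y)
                    totals[t] = totals.get(t, 0.0) + s
            if not totals:
                break  # no tuples of this rank: leave the pair undecided
            best = max(totals.values())
            winners = [t for t, v in totals.items() if v == best]
            if len(winners) == 1:
                if winners[0] == "->":
                    united.add((x, y))
                elif winners[0] == "<-":
                    united.add((y, x))
                break
    return united


# Hypothetical rank-1 networks (made-up scores S_D) for the four triplets of a-d.
Z = [
    (("a", "b", "c"), {("a", "b"), ("b", "c")}, 0.2, 1),
    (("a", "b", "d"), {("a", "b")}, 0.4, 1),
    (("a", "c", "d"), {("c", "a")}, 0.5, 1),
    (("b", "c", "d"), {("b", "c")}, 0.4, 1),
]
print(sorted(unite(Z, ["a", "b", "c", "d"])))
```

Every partial network here is a DAG, yet the united result is the directed cycle a → b → c → a, which illustrates how uniting partial results can yield cyclic structures.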
Computational experiments
To verify the effectiveness of the proposed method, we performed three experiments. The first experiment measures the computational time for different numbers of genes. Its purpose is to verify that the proposed method is able to estimate gene regulatory networks as large as those estimated by the GHC method. The second experiment demonstrates that the proposed method is more accurate than the GHC method. The third experiment shows, through an example, that our algorithm works well for inferring real gene regulatory networks: we estimate networks that include a known gene regulatory network and compare the network estimated by the proposed method with that estimated by the GHC method.
Figure 1 Conceptual representation of our approach. Yellow circles represent genes. Blue circles represent partial problems. Small directed edges represent regulatory relationships between genes. Large directed edges represent the flow of the method.
Implementation, system, and materials
Steps 1 and 2 are implemented in R 2.10.1 using the deal package, version 1.2-33. Step 3 is implemented in Perl 5.10.1.
The GHC method is implemented in the deal package, version 1.2-33. In these experiments, the maximum number of actions, i.e., adding, deleting, or reversing a directed edge, is set at 50 and the number of restarts is set at 0. We call these parameters the default parameter set.
We performed all the experiments on a computer with an Intel Core2 Duo 6600 CPU (2.40 GHz) and 3.0 GB of memory, running Ubuntu 10.04.
We used a dataset of two time-series gene expression profiles, from mouse adipocytes and osteoblasts, including 45102 genes. The number of time points is 62.
Experiment 1. We verified that the proposed method can estimate gene regulatory networks as large as those estimated by the GHC method. We applied the proposed method, an exhaustive search, and the GHC method and compared their estimation times for 3 to 70 genes. In this experiment, we selected genes from the mouse adipocyte gene expression profile by random sampling. We ran this process 50 times and calculated the mean estimation time. The results are summarized in Figure 3.
Figure 2 Flowchart of the algorithm. Circles represent start and end points. Rectangles represent generic processing steps. Diamonds represent decision steps.
In Figure 3, the horizontal axis corresponds to the number of genes and the vertical axis to the logarithm of the estimation time. The proposed method was able to estimate the network including 70 genes, and its estimation times were close to those of the GHC method: shorter for 40 or more genes and longer for 15 or fewer genes. The estimation time of the exhaustive search was already very large at 5 genes.
Experiment 2. We verified that the estimation accuracy of the proposed method is higher than that of the GHC method for nearly identical estimation times. We compared the estimation results of the exhaustive search with those of the proposed method and the GHC method. In this experiment, we randomly selected five genes 100 times from each of the mouse adipocyte and osteoblast gene expression profiles and estimated the network of these five genes by the proposed method and by the GHC method. For five genes there are 59049 (= 3^10) candidate networks, one for each assignment of the three edge types to the 10 gene pairs, and all of them are ranked by their exhaustive-search scores. This ranking was used to evaluate the networks estimated by the proposed method and the GHC method. The results are shown in Figure 4.
The two bar charts in Figure 4 show the ranks of the 100 networks estimated by the proposed method and by the GHC method. The left bar chart shows the results for adipocytes, and the right one those for osteoblasts. The correspondence count is the number of times that the network estimated by the proposed method or the GHC method coincided with a given network of the exhaustive search. The rank in the exhaustive search is the position of that network when all candidate networks are ranked by their exhaustive-search scores. As there are 59049 candidate networks for five nodes, the ranks run from 1st to 59049th.
For the proposed method, the correspondence count for the 1st to 10th networks of the exhaustive search exceeded 50. For the 30001st to 59049th networks of the exhaustive search, the correspondence count of the GHC method exceeded 50, whereas that of the proposed method was less than 10.
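The rank-based evaluation can be sketched as follows. The function names are hypothetical, and the handling of tied scores (tied networks share the best available rank) is an assumption, since the text does not specify it.

```python
from typing import Sequence


def exhaustive_rank(estimated_score: float, all_scores: Sequence[float]) -> int:
    """1-based rank of a network among all candidates; the best score has rank 1."""
    return 1 + sum(1 for s in all_scores if s > estimated_score)


def count_in_class(ranks: Sequence[int], low: int, high: int) -> int:
    """Correspondence count: how many estimated networks fall in ranks low..high."""
    return sum(1 for r in ranks if low <= r <= high)


scores = [9.0, 5.0, 7.0, 1.0]  # hypothetical exhaustive-search scores
print(exhaustive_rank(7.0, scores))
```

Applied to 100 estimated networks, count_in_class(ranks, 1, 10) gives the bar height for the top-10 class of a chart such as Figure 4.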
Experiment 3. We used a known gene regulatory network and verified that the proposed method can estimate more accurately than the GHC method with the same or less computational time. We compared the regulations estimated by the proposed method with those estimated by the GHC method. In this experiment, we used 40 genes from the mouse adipocyte gene expression profile. Seven of these are Pparg and the genes that regulate or are regulated by Pparg in adipocytes; these are shown in Figure 5(a). The remaining 33 genes were selected by random sampling. The results and the known network are shown in Figure 5. In this experiment, we used two parameter sets for the GHC method. One is the default parameter set. In the other, the maximum number of actions is 100 and the number of restarts is 10; this setting returns a better network but requires about 20-fold longer computational time than the default.
In Figure 5, the results for the default and the other parameter set are shown as networks (b) and (c), respectively. We call (c) the network estimated by the highly accurate GHC method in this experiment. Network (d) is estimated by the proposed method. The edges in networks (b), (c), and (d) are categorized according to the edges of network (a): red edges also appear in network (a), blue edges have the opposite direction to those in network (a), and black edges correspond to no relationship in network (a).
Figure 3 Comparison of the estimation time. The estimation time of the exhaustive search, the GHC method, and the proposed method.
Figure 5 shows that the proposed method was able to estimate more correctly than the GHC method. The sensitivity and selectivity of the proposed method were 33% and 30%, those of the GHC method were 0% and 0%, and those of the highly accurate GHC method were 11% and 14%. Networks (b), (c), and (d) have many edges that the known gene regulatory network does not have, but some of these edges can be interpreted as indirect regulations. For example, in Figure 5(d), there is a black edge from C/EBPa to Stat1. This edge describes the indirect regulation from C/EBPa to Stat1 via Pparg, because there are edges from C/EBPa to Pparg and from Pparg to Stat1 in Figure 5(a).
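The text does not define sensitivity and selectivity. A common reading, assumed in this sketch, is sensitivity = TP / (edges in the known network) and selectivity = TP / (estimated edges), where an estimated edge counts as a true positive only if it appears in network (a) with the same direction. The function name is hypothetical, and the edge sets below are toy data, not those of Figure 5.

```python
def edge_metrics(estimated: set, known: set) -> tuple:
    """Sensitivity and selectivity (read as precision) of a directed edge set."""
    tp = len(estimated & known)  # same pair and same direction
    sensitivity = tp / len(known) if known else 0.0
    selectivity = tp / len(estimated) if estimated else 0.0
    return sensitivity, selectivity


known = {("A", "B"), ("B", "C"), ("C", "D")}                  # toy reference network
estimated = {("A", "B"), ("C", "B"), ("A", "C"), ("D", "E")}  # one reversed edge
print(edge_metrics(estimated, known))
```

Under this strict reading, reversed (blue) and unrelated (black) edges both lower selectivity; a looser reading could credit reversed edges or indirect regulations such as the C/EBPa to Stat1 edge.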
Discussion
The GHC method tends to produce locally optimal solutions. For example, in Figure 4, the results of the GHC method have two peaks, corresponding to the rank classes 1-10 and 30001-59049. We cannot completely avoid selecting a locally optimal solution when using the GHC method, because the solution depends on the initial DAG from which the search is started. To obtain the best network with the GHC method, the estimation must be repeated from different initial DAGs. In contrast, the proposed method is deterministic and produces a single result as the best network.
The results of our experiments indicate that dividing the set of all genes and uniting the resulting networks can estimate more accurately than the GHC method. With the GHC method, the maximum number of actions, i.e., adding, deleting, or reversing a directed edge, and the number of restarts can be adjusted. If these parameters are increased as much as possible, the estimation accuracy can be made comparable to that of the exhaustive search. However, this would spoil the main advantage of the GHC method, its speed. The GHC method selects the action that increases the network score the most; therefore, a regulation that increases the network score only slightly is rarely selected. In this sense, the search of the GHC method is considerably biased, and this bias becomes more pronounced when the limiting parameters are set strictly. With the proposed method, regulations that have a positive effect are selected regardless of whether that effect is slight or strong. For example, in Figure 5, the GHC method could not estimate the regulatory relationship between Pparg and C/EBPb, even when the numbers of restarts and actions were significantly increased.
Figure 4 Comparison of the estimated networks. Frequency with which the networks estimated by the GHC method and the proposed method correspond to those of the exhaustive search (ranks 1 to 59049).
We verified that the proposed method can estimate networks as large as those estimated using the GHC method. In the proposed method, we spend at most 0.1 second estimating the network of one partial problem with three genes and repeat this estimation $\binom{n}{3}$ times. Because each partial problem involves only three genes, the proposed method can estimate the network with a small amount of memory compared with the GHC method, which, like the exhaustive search, requires a large amount of memory. When we estimate a network for a data set with a large number of genes using the GHC method, it is easy to run out of memory, making the actual computational time longer than the theoretical time.
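The per-triplet bound gives a simple upper estimate of the Step 2 cost. With at most 0.1 s per partial problem, n genes need at most C(n, 3) × 0.1 s; for n = 70 this is 54,740 partial problems, or at most about 5,474 s (roughly 1.5 hours). The sketch below only encodes this arithmetic; it ignores Step 3 and any overheads, and the function name is hypothetical.

```python
from math import comb

SECONDS_PER_TRIPLET = 0.1  # upper bound per partial problem, as stated in the text


def step2_time_bound(n_genes: int) -> float:
    """Worst-case time in seconds for Step 2: one exhaustive search per triplet."""
    return comb(n_genes, 3) * SECONDS_PER_TRIPLET


for n in (10, 40, 70):
    # comb(n, 3) = n(n-1)(n-2)/6, so the bound grows as O(n**3)
    print(n, comb(n, 3), round(step2_time_bound(n)))
```

Because C(n, 3) = n(n-1)(n-2)/6, this bound is the cubic growth in the number of genes stated in the abstract.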
Conclusions
In this study, we presented a novel BN-based deterministic method with reduced computational time. We confirmed experimentally that the proposed method can reduce the computational time drastically without degrading the solution accuracy. The proposed method can estimate networks as large as those estimated by the GHC method. Furthermore, the proposed method can estimate more accurately than the GHC method, even when the computational time of the GHC method is increased to more than 20 times that of the proposed method.
Figure 5 Comparison of the network including Pparg and genes that regulate or are regulated by Pparg. (a) is the known gene regulatory network. (b) is the network estimated by the GHC method with the maximum number of actions set at 50 and the number of restarts set at 0. (c) is the network estimated by the GHC method with the maximum number of actions set at 100 and the number of restarts set at 10. (d) is the network estimated by the proposed method. Blue circles represent genes. Red edges indicate edges also in network (a), blue edges indicate edges with a different direction from those in network (a), and black edges indicate that there are no such relationships in network (a).
Acknowledgements
This work was partially supported by a Grant-in-Aid for Scientific Research (22680023 and 22310125) from the Japan Society for the Promotion of Science (JSPS), and by the HPCI STRATEGIC PROGRAM Computational Life Science and Application in Drug Discovery and Medical Development from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT).
This article has been published as part of BMC Genomics Volume 13 Supplement 1, 2012: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/13?issue=S1.
Authors’ contributions
YW implemented the algorithm and performed the analyses. YW, SS, YT, and
HM conceived and designed the experiments and wrote the paper.
Competing interests
The authors declare that they have no competing interests.
Published: 17 January 2012
References
1. Xu R, Hu X, Wunsch DC: Inference of genetic regulatory networks from time series gene expression data. International Joint Conference on Neural Networks 2004.
2. Schlitt T, Brazma A: Current approaches to gene regulatory network modelling. BMC Bioinformatics 2007, 8(Suppl 6):S9.
3. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997, 278(5338):680-686.
4. Spellman P, Sherlock G, Zhang M, Iyer V, Anders K, Eisen M, Brown P, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273-3297.
5. Kitano H: Systems biology: a brief overview. Science 2002, 295:1662-1664.
6. Xiao Y: A tutorial on analysis and simulation of boolean gene regulatory network models. Curr Genomics 2009, 10(7):511-525.
7. Kim H, Lee J, Park T: Boolean networks using the chi-square test for inferring large-scale gene regulatory networks. BMC Bioinformatics 2007, 8:837.
8. Toh H, Horimoto K: Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 2002, 18(2):287-297.
9. Savageau MA: Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology. Addison-Wesley Educational Publishers Inc; 1976.
10. Chen T, He HL, Church GM: Modeling gene expression with differential equations. Pac Symp Biocomput 1999, 29-40.
11. Iba H, Mimura A: Inference of a gene regulatory network by means of interactive evolutionary computing. Proc of Fourth Conference on Computational Biology and Genome Informatics 2002.
12. Heckerman D: A Tutorial on Learning with Bayesian Networks. Microsoft Research; 1996.
13. Bottcher SG: Learning Bayesian Networks with Mixed Variables. Department of Mathematical Sciences; 2004.
14. Friedman N, Linial M, Nachman I, Pe’er D: Using Bayesian networks to analyze expression data. J Comput Biol 2000, 7:601-620.
15. Pe’er D, Regev A, Elidan G, Friedman N: Inferring subnetworks from perturbed expression profiles. Bioinformatics 2001, 17(Suppl 1):S215-S224.
16. Kim S, Imoto S, Miyano S: Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data. Biosystems 2004, 75:57-65.
17. Pena JM, Bjorkegren J, Tegner J: Growing Bayesian network models of gene networks from seed genes. Bioinformatics 2005, 21:ii224-ii229.
18. Tamada Y, Imoto S, Araki H, Nagasaki M, Print C, Charnock-Jones DS, Miyano S: Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers. IEEE/ACM Trans Comput Biol Bioinform 2011, 8(3):683-697.
19. Bottcher SG, Dethlefsen C: deal: A Package for Learning Bayesian Networks. J Stat Softw; 2003.
doi:10.1186/1471-2164-13-S1-S12
Cite this article as: Watanabe et al.: An estimation method for inference of gene regulatory network using Bayesian network with uniting of partial problems. BMC Genomics 2012, 13(Suppl 1):S12.