Conference Paper

Learning Optimal Bayesian Networks Using A* Search.

DOI: 10.5591/978-1-57735-516-8/IJCAI11-364
Conference: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011
Source: DBLP


This paper formulates learning an optimal Bayesian network as a shortest path finding problem. An A* search algorithm is introduced to solve the problem. With the guidance of a consistent heuristic, the algorithm learns an optimal Bayesian network by searching only the most promising parts of the solution space. Empirical results show that the A* search algorithm significantly improves the time and space efficiency of existing methods on a set of benchmark datasets.
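The abstract describes the formulation only briefly, so a rough sketch may help: in the shortest-path view, each search node is the subset of variables already placed in an ordering, the start node is the empty set, the goal is the full variable set, and the edge that appends a variable X to a subset U costs the best local score of X with parents drawn from U. The Python sketch below is a minimal, assumed illustration of that traversal; the function names, the `best_score` oracle, and the simple acyclicity-relaxing heuristic are stand-ins, not the authors' implementation.

```python
import heapq
from itertools import count

def astar_shortest_path_bn(variables, best_score):
    """A* over the order graph: start at the empty set, goal is the full
    variable set; the edge that adds X to U costs the best local score of
    X with parents restricted to U.  `best_score(x, candidates)` is an
    assumed oracle returning (cost, parent_set), with lower cost better.
    """
    full = frozenset(variables)

    def h(placed):
        # Relaxed heuristic: each remaining variable picks its best parents
        # from all other variables, ignoring acyclicity.
        return sum(best_score(x, full - {x})[0] for x in full - placed)

    tie = count()                       # tie-breaker for equal f-values
    start = frozenset()
    g = {start: 0.0}
    chosen = {start: {}}                # subset -> parent sets picked so far
    frontier = [(h(start), next(tie), start)]
    expanded = set()

    while frontier:
        _, _, u = heapq.heappop(frontier)
        if u == full:
            return g[u], chosen[u]      # total score and a DAG structure
        if u in expanded:
            continue
        expanded.add(u)
        for x in full - u:              # append one more variable
            cost, parents = best_score(x, u)
            v = u | {x}
            new_g = g[u] + cost
            if new_g < g.get(v, float("inf")):
                g[v] = new_g
                new_chosen = dict(chosen[u])
                new_chosen[x] = parents
                chosen[v] = new_chosen
                heapq.heappush(frontier, (new_g + h(v), next(tie), v))
    return None
```

In practice the local scores would be precomputed from data with a decomposable scoring function (e.g., MDL) before the search starts; the sketch treats that step as a black box.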

  • Source
    • "It is a probabilistic approach to graph construction [14] [30] that has become an efficient representation for reconstructing gene regulatory networks. A Bayesian network represents causal relationships between the nodes [37], [38], [39], [40], [41] rather than a flow of information. In a gene regulatory network, this causal relation is drawn between the expression levels Xi of the genes i involved in the system. "
    ABSTRACT: A Gene Regulatory Network (GRN) specifies the series of regulatory interactions between different genes. A target gene is acted on by a signal that originates from the expression of its regulator gene. A gene is said to be expressed when it synthesizes a protein, and the amount of protein synthesized determines its level of expression. The same gene can behave as a 'target' in one interaction and a 'regulator' in the next. If there are many interacting genes in a biological system, a network can be formed from them in which the genes are treated as nodes and an interaction between any two genes is treated as an edge. This network is known as a Gene Regulatory Network. Simulation of a GRN addresses the problem of reconstructing the network on the basis of the expression levels of the interacting genes. Various mathematical tools are used to design the system, and different optimization techniques are used to find the optimal design. The design process starts with time-dependent (time-series) and condition-dependent (steady-state) gene expression data, available from microarray chips. A target gene is activated depending on the collective interactions made to it. The problem can be modeled using a neural network, and the application of fuzzy logic may improve the design. There are two issues to address. One is to uncover the parameters involved in the GRN, known as the parameter estimation problem. The other is to predict the network structure step by step while learning the parameters. Applications of meta-heuristic algorithms have proved efficient in resolving both issues.
    International Journal of Scientific and Engineering Research 09/2013; 4(9):10. · 3.20 Impact Factor
  • Source
    • "These algorithms use local search to find 'good' networks; however, they offer no guarantee of finding the one that optimizes the scoring function. Recently, exact algorithms for learning optimal Bayesian networks have been developed based on dynamic programming [15-17,30,31], branch and bound [18], linear and integer programming (LP) [22,23], and heuristic search [19-21]. These algorithms have enabled us to learn optimal Bayesian networks for datasets with dozens of variables. "
    ABSTRACT: In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by the approximation methods have unknown quality and may affect the reliability of their conclusions. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold-standard Bayesian networks. Because all optimal algorithms learn equivalent networks, this ensures that only the choice of scoring function affects the learned networks. Another shortcoming of the previous studies stems from their use of random synthetic networks as test cases. There is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study suggests that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) score (or equivalently, the Bayesian information criterion, BIC) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), the Bayesian Dirichlet equivalence score (BDeu), and the factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures. We believe this finding results from using datasets generated from real-world applications rather than from the random processes used in previous studies, and from using learning algorithms that select high-scoring structures rather than random models. Other findings of our study support existing work: for example, large sample sizes result in learned structures closer to the true underlying structure; the BDeu score is sensitive to its parameter settings; and fNML performs well on small datasets. We also tested a greedy hill-climbing algorithm and observed results similar to those of the optimal algorithm.
    BMC Bioinformatics 09/2012; 13 Suppl 15(Suppl 15):S14. DOI:10.1186/1471-2105-13-S15-S14 · 2.58 Impact Factor
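The second citing abstract above compares decomposable scoring functions such as MDL/BIC, AIC, BDeu, and fNML. As a small, hedged illustration of what such a score computes, the sketch below evaluates the BIC/MDL score of a discrete network; the data layout, function name, and toy two-gene example are assumptions for illustration, not the evaluation code used in the cited study.

```python
import math
from collections import Counter

def bic_score(data, structure, arities):
    """BIC/MDL score of a discrete Bayesian network (higher is better).

    data      : list of dicts mapping variable name -> observed state
    structure : dict mapping each variable -> tuple of its parents
    arities   : dict mapping each variable -> number of distinct states
    """
    n = len(data)
    score = 0.0
    for x, parents in structure.items():
        # Sufficient statistics: counts of (x state, parent configuration)
        # and of each parent configuration alone.
        joint = Counter((row[x], tuple(row[p] for p in parents)) for row in data)
        pa_counts = Counter(tuple(row[p] for p in parents) for row in data)
        # Maximum-likelihood log-likelihood contribution of this family.
        for (x_val, pa_val), n_xpa in joint.items():
            score += n_xpa * math.log(n_xpa / pa_counts[pa_val])
        # Complexity penalty: (r_x - 1) * q_x free parameters,
        # each charged log(n) / 2.
        q = 1
        for p in parents:
            q *= arities[p]
        score -= 0.5 * math.log(n) * (arities[x] - 1) * q
    return score

# Toy usage with a hypothetical two-variable network A -> B.
data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 0}]
print(bic_score(data, {"A": (), "B": ("A",)}, {"A": 2, "B": 2}))
```

Swapping the log(n)/2 penalty factor for 1 would give the AIC penalty instead; BDeu and fNML replace the likelihood term itself and are not shown here.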
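The first citing abstract above treats a gene regulatory network as a directed graph with genes as nodes and regulatory interactions as edges, where the same gene can be a target in one interaction and a regulator in another. The fragment below is only a schematic way to hold that structure in code; the gene names and the adjacency representation are illustrative assumptions.

```python
from collections import defaultdict

# Regulator -> list of target genes; gene names are made up for illustration.
grn = defaultdict(list)

def add_interaction(regulator, target):
    """Record a directed regulatory edge regulator -> target."""
    grn[regulator].append(target)

# geneB is a target in one interaction and a regulator in another.
add_interaction("geneA", "geneB")
add_interaction("geneB", "geneC")

def regulators_of(gene):
    """Genes with an edge into `gene` (its parents in the network)."""
    return [r for r, targets in grn.items() for t in targets if t == gene]

print(regulators_of("geneB"))   # ['geneA']
```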