Article

Large-scale dynamic gene regulatory network inference combining differential equation models with local dynamic Bayesian network analysis

Monsanto Company, Mail zone CC1A, Chesterfield, MO 63017, USA.
Bioinformatics (Impact Factor: 4.62). 08/2011; 27(19):2686-91. DOI: 10.1093/bioinformatics/btr454
Source: PubMed

ABSTRACT Reverse engineering gene regulatory networks, especially large networks, from time series gene expression data remains a challenge to the systems biology community. In this article, a new hybrid algorithm integrating ordinary differential equation models with dynamic Bayesian network analysis, called Differential Equation-based Local Dynamic Bayesian Network (DELDBN), was proposed and implemented for gene regulatory network inference.
The performance of DELDBN was benchmarked with an in vivo dataset from yeast. DELDBN significantly improved the accuracy and sensitivity of network inference compared with other approaches. The local causal discovery algorithm implemented in DELDBN also reduced the complexity of the network inference algorithm and improved its scalability to infer larger networks. We have demonstrated the applicability of the approach to a network containing thousands of genes with a dataset from human HeLa cell time series experiments. The local network around BRCA1 was investigated in particular and validated with independent published studies. The BRCA1 network was significantly enriched with known BRCA1-relevant interactions, indicating that DELDBN can effectively infer large gene regulatory networks from time series data.
The R scripts are provided in File 3 in Supplementary Material.
zheng.li@monsanto.com; jingdong.liu@monsanto.com
Supplementary data are available at Bioinformatics online.
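
To illustrate the differential equation side of such an approach, the following is a minimal sketch and not the DELDBN implementation (the authors' R scripts are in the Supplementary Material). It assumes a generic linear model dx/dt = A x, approximates the derivative with forward differences, and estimates A by ordinary least squares. The function names, the toy matrix A_true, and the 0.1 edge threshold are hypothetical choices made for this example; the local dynamic Bayesian network step described in the abstract is not shown.

```python
import numpy as np

def simulate_linear_ode(A, x0, dt, steps):
    """Euler-integrate dx/dt = A x to produce a toy expression time series."""
    X = np.empty((steps + 1, len(x0)))
    X[0] = x0
    for t in range(steps):
        X[t + 1] = X[t] + dt * (A @ X[t])
    return X

def infer_interactions(X, dt):
    """Estimate A from forward-difference derivatives by least squares."""
    dXdt = (X[1:] - X[:-1]) / dt  # approximates dx/dt at each time point
    # Solve X[:-1] @ A.T ~= dXdt for A.T
    A_T, *_ = np.linalg.lstsq(X[:-1], dXdt, rcond=None)
    return A_T.T

# Hypothetical 3-gene network: gene 0 activates gene 1, gene 1 represses gene 2.
# Diagonal entries model self-decay.
A_true = np.array([[-1.0,  0.0,  0.0],
                   [ 0.8, -0.5,  0.0],
                   [ 0.0, -0.6, -0.2]])

X = simulate_linear_ode(A_true, np.array([1.0, 0.5, 0.2]), dt=0.1, steps=40)
A_hat = infer_interactions(X, dt=0.1)

# Off-diagonal coefficients above an (assumed) threshold are reported as edges
# in the form (regulator, target).
edges = sorted((j, i) for i in range(3) for j in range(3)
               if i != j and abs(A_hat[i, j]) > 0.1)
print(edges)
```

On noise-free simulated data the regression recovers the coefficients; with real, noisy, sparsely sampled expression data a plain regression like this would typically need regularization or a variable-selection step, which is the kind of role the local causal discovery step plays in the approach described above.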

  • ABSTRACT: Mutual information (MI), a quantity describing the nonlinear dependence between two random variables, has been widely used to construct gene regulatory networks (GRNs). Despite its good performance, MI cannot separate the direct regulations from indirect ones among genes. Although the conditional mutual information (CMI) is able to identify the direct regulations, it generally underestimates the regulation strength, i.e. it may result in false negatives when inferring gene regulations. In this work, to overcome the problems, we propose a novel concept, namely conditional mutual inclusive information (CMI2), to describe the regulations between genes. Furthermore, with CMI2, we develop a new approach, namely CMI2NI (CMI2-based network inference), for reverse-engineering GRNs. In CMI2NI, CMI2 is used to quantify the mutual information between two genes given a third one through calculating the Kullback-Leibler divergence between the postulated distributions of including and excluding the edge between the two genes. The benchmark results on the GRNs from DREAM challenge as well as the SOS DNA repair network in Escherichia coli demonstrate the superior performance of CMI2NI. Specifically, even for gene expression data with small sample size, CMI2NI can not only infer the correct topology of the regulation networks but also accurately quantify the regulation strength between genes. As a case study, CMI2NI was also used to reconstruct cancer-specific GRNs using gene expression data from The Cancer Genome Atlas (TCGA). CMI2NI is freely accessible at http://www.comp-sysbio.org/cmi2ni. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Nucleic Acids Research 12/2014; DOI:10.1093/nar/gku1315 · 8.81 Impact Factor
  • ABSTRACT: Microarray and RNA-seq experiments have become an important part of modern genomics and systems biology. Obtaining meaningful biological data from these experiments is an arduous task that demands close attention to many details. Negligence at any step can lead to gene expression data containing inadequate or composite information that is recalcitrant to pattern extraction. Therefore, it is imperative to carefully consider experimental design before launching a time-consuming and costly experiment. Currently, most genomics experiments have two objectives: (1) to generate two or more groups of comparable data for identifying differentially expressed genes, gene families, biological processes, or metabolic pathways under experimental conditions; and (2) to build local gene regulatory networks and identify hierarchically important regulators governing biological processes and pathways of interest. Since the first objective aims to identify the active molecular identities and the second provides a basis for understanding the underlying molecular mechanisms through inferring causality relationships mediated by treatment, an optimal experiment produces biologically relevant and extractable data that meet both objectives without substantially increasing the cost. This review discussed the major issues that researchers commonly face when embarking on a microarray or RNA-seq experiment and summarized important aspects of experimental design, which aim to help researchers deliberate how to generate gene expression profiles with low background noise but more interaction to facilitate novel biological knowledge discoveries in modern plant genomics.
    Molecular Plant 11/2014; 8(2). DOI:10.1093/mp/ssu136 · 6.61 Impact Factor
  • ABSTRACT: Microarray data are often used to infer regulatory networks. Quantile normalization (QN) is a popular method to reduce array-to-array variation. We show that in the context of time series measurements QN may not be the best choice for this task, especially if the inference is based on a continuous-time ODE model. We propose an alternative normalization method that is better suited for network inference from time series data.
    07/2014; 3(3):203-211. DOI:10.3390/microarrays3030203
