Zhu, J. et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat. Genet. 40, 854-861

Rosetta Inpharmatics, LLC, Seattle, Washington 98109, USA.
Nature Genetics (Impact Factor: 29.35). 08/2008; 40(7):854-61. DOI: 10.1038/ng.167
Source: PubMed


A key goal of biology is to construct networks that predict complex system behavior. We combine multiple types of molecular data, including genotypic, expression, transcription factor binding site (TFBS), and protein-protein interaction (PPI) data previously generated from a number of yeast experiments, in order to reconstruct causal gene networks. Networks based on different types of data are compared using metrics devised to assess the predictive power of a network. We show that a network reconstructed by integrating genotypic, TFBS and PPI data is the most predictive. This network is used to predict causal regulators responsible for hot spots of gene expression activity in a segregating yeast population. We also show that the network can elucidate the mechanisms by which causal regulators give rise to larger-scale changes in gene expression activity. We then prospectively validate predictions, providing direct experimental evidence that predictive networks can be constructed by integrating multiple, appropriate data types.



Available from: Roger Bumgarner, Feb 19, 2014
    • "Using genome information is a promising approach to identify directionality that is less susceptible to confounding. Previous applications in data integration using gene expression data and genotypes have followed a similar logic [9-12]. For example, Mehrabian et al. [9] integrated genotypic and phenotypic data in a segregating mouse population to generate causal relationships. "
    ABSTRACT: Understanding causal relationships among large numbers of variables is a fundamental goal of biomedical sciences and can be facilitated by Directed Acyclic Graphs (DAGs) where directed edges between nodes represent the influence of components of the system on each other. In an observational setting, some of the directions are often unidentifiable because of Markov equivalency. Additional exogenous information, such as expert knowledge or genotype data, can help establish directionality among the endogenous variables. In this study, we use principal component analysis to extract information across the genome in order to generate a robust statistical causal network among phenotypes, the variables of primary interest. The method is applied to 590,020 SNP genotypes measured on 1,596 individuals to generate the statistical causal network of 13 cardiovascular disease risk factor phenotypes. First, principal component analysis was used to capture information across the genome. The principal components were then used to identify a robust causal network structure, GDAG, among the phenotypes. Analyzing a robust causal network over risk factors reveals the flow of information in direct and alternative paths, and also identifies predictors and good targets for intervention. For example, the analysis identified BMI as influencing multiple other risk factor phenotypes and as a good target for intervention to lower disease risk.
    Full-text · Article · Jan 2016 · Journal of Biomedical Informatics
    • "Whereas interaction networks can present a global and holistic view of the interacting elements directly or indirectly involved in disease progression, probabilistic causal networks can elucidate causal relationships as well as potential mechanisms (Zhu et al., 2008, 2012). Bayesian networks represent one class of probabilistic causal modeling approaches that are in widespread use today. "
    ABSTRACT: Posttraumatic stress disorder (PTSD) and other deployment-related outcomes originate from a complex interplay between constellations of changes in DNA, environmental traumatic exposures, and other biological risk factors. These factors affect not only individual genes or bio-molecules but also entire biological networks, which in turn increase or decrease the risk of illness or affect illness severity. This review focuses on recent developments in the field of systems biology that use multidimensional data to discover biological networks affected by combat exposure and post-deployment disease states. By integrating large-scale, high-dimensional molecular, physiological, clinical, and behavioral data, the molecular networks that directly respond to perturbations that can lead to PTSD can be identified and causally associated with PTSD, providing a path to identify key drivers. Neural progenitor cells reprogrammed from the fibroblasts of PTSD patients could be established as an in vitro assay for high-throughput screening of approved drugs, to determine which drugs reverse the abnormal expression of the pathogenic biomarkers or the abnormal neuronal properties.
    Full-text · Article · Aug 2014 · European Journal of Psychotraumatology
    • "For example, gene co-expression networks constructed with correlation-based measures have been used to identify transitive relationships (Zhou et al., 2002), gene regulatory patterns (van Noort et al., 2004), and biological modules (Mason et al., 2009). Further, they have been successfully combined with transcription factor, eQTL, and PPI data into integrative Bayesian networks (Zhu et al., 2008). Gene co-expression networks are therefore one commonly used example of a complex model built from high-throughput biological data collections. "
    ABSTRACT: A fundamental goal of systems biology is to create models that describe relationships between biological components. Networks are an increasingly popular approach to this problem. However, a scientist interested in modeling biological (e.g., gene expression) data as a network is quickly confounded by the fundamental problem: how to construct the network? It is fairly easy to construct a network, but is it the network for the problem being considered? This is an important problem with three fundamental issues: How to weight edges in the network in order to capture actual biological interactions? What is the effect of the type of biological experiment used to collect the data from which the network is constructed? How to prune the weighted edges (or what cut-off to apply)? Differences in the construction of networks could lead to different biological interpretations. Indeed, we find that there are statistically significant dissimilarities in the functional content and topology between gene co-expression networks constructed using different edge weighting methods, data types, and edge cut-offs. We show that different types of known interactions, such as those found through Affinity Capture-Luminescence or Synthetic Lethality experiments, appear in significantly varying amounts in networks constructed in different ways. Hence, we demonstrate that different biological questions may be answered by the different networks. Consequently, we posit that the approach taken to build a network can be matched to biological questions to get targeted answers. More study is required to understand the implications of different network inference approaches and to draw reliable conclusions from networks used in the field of systems biology.
    Full-text · Article · Aug 2014
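The Journal of Biomedical Informatics abstract above first uses principal component analysis to summarize genotype information across the genome, and only then learns the GDAG structure. The sketch below illustrates just that first step on a hypothetical toy matrix. It makes several assumptions: rows are individuals, columns are SNPs, and genotypes are coded as 0/1/2 allele dosages. It extracts only the leading component by power iteration in plain Python. The structure-learning step is not shown. A genome-scale matrix such as 590,020 SNPs by 1,596 individuals would instead call for an optimized SVD routine.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's mean dosage so every column has mean zero."""
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in genotypes]


def top_principal_component(genotypes, iters=200, seed=0):
    """Leading PC of an individuals x SNPs dosage matrix via power iteration.

    Returns (per-individual scores, per-SNP loadings); the overall sign is
    arbitrary. Assumes the centered matrix is not all zeros.
    """
    x = center_columns(genotypes)
    n, p = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # random start vector
    for _ in range(iters):
        xv = [sum(x[i][j] * v[j] for j in range(p)) for i in range(n)]  # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(x[i][j] * v[j] for j in range(p)) for i in range(n)]
    return scores, v


# Hypothetical dosages: individuals 0-1 and 2-3 form two genetic groups.
G = [
    [0, 0, 1],
    [0, 1, 0],
    [2, 2, 1],
    [2, 1, 2],
]
scores, loadings = top_principal_component(G)
print([round(s, 3) for s in scores])
```

On this toy matrix the leading component places the first two individuals on one side and the last two on the other. That is the kind of genome-wide summary the abstract describes feeding into the causal network among phenotypes.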