Zhu, J. et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat. Genet. 40, 854-861 (2008).

Rosetta Inpharmatics, LLC, Seattle, Washington 98109, USA.
Nature Genetics (Impact Factor: 29.35). 08/2008; 40(7):854-61. DOI: 10.1038/ng.167
Source: PubMed


A key goal of biology is to construct networks that predict complex system behavior. We combine multiple types of molecular data, including genotypic, expression, transcription factor binding site (TFBS), and protein-protein interaction (PPI) data previously generated from a number of yeast experiments, in order to reconstruct causal gene networks. Networks based on different types of data are compared using metrics devised to assess the predictive power of a network. We show that a network reconstructed by integrating genotypic, TFBS and PPI data is the most predictive. This network is used to predict causal regulators responsible for hot spots of gene expression activity in a segregating yeast population. We also show that the network can elucidate the mechanisms by which causal regulators give rise to larger-scale changes in gene expression activity. We then prospectively validate predictions, providing direct experimental evidence that predictive networks can be constructed by integrating multiple, appropriate data types.



Available from: Roger Bumgarner, Feb 19, 2014
    • "Whereas interaction networks can present a global and holistic view of the interacting elements directly or indirectly involved in disease progression, probabilistic causal networks can elucidate causal relationships as well as potential mechanisms (Zhu et al., 2008, 2012). Bayesian networks represent one class of probabilistic causal modeling approaches that are in widespread use today. "
    ABSTRACT: Posttraumatic stress disorder (PTSD) and other deployment-related outcomes originate from a complex interplay between constellations of changes in DNA, environmental traumatic exposures, and other biological risk factors. These factors affect not only individual genes or bio-molecules but also the entire biological networks that in turn increase or decrease the risk of illness or affect illness severity. This review focuses on recent developments in the field of systems biology which use multidimensional data to discover biological networks affected by combat exposure and post-deployment disease states. By integrating large-scale, high-dimensional molecular, physiological, clinical, and behavioral data, the molecular networks that directly respond to perturbations that can lead to PTSD can be identified and causally associated with PTSD, providing a path to identify key drivers. Reprogrammed neural progenitor cells from fibroblasts from PTSD patients could be established as an in vitro assay for high throughput screening of approved drugs to determine which drugs reverse the abnormal expression of the pathogenic biomarkers or neuronal properties.
    European Journal of Psychotraumatology 08/2014; 5. DOI:10.3402/ejpt.v5.23938 · 2.40 Impact Factor
    • "For example, gene co-expression networks constructed with correlation-based measures have been used to identify transitive relationships (Zhou et al., 2002), gene regulatory patterns (van Noort et al., 2004), and biological modules (Mason et al., 2009). Further, they have been successfully combined with transcription factor, eQTL, and PPI data into integrative Bayesian networks (Zhu et al., 2008). Gene co-expression networks are therefore one commonly used example of a complex model built from the high-throughput biological data collections. "
    ABSTRACT: A fundamental goal of systems biology is to create models that describe relationships between biological components. Networks are an increasingly popular approach to this problem. However, a scientist interested in modeling biological (e.g., gene expression) data as a network is quickly confounded by the fundamental problem: how to construct the network? It is fairly easy to construct a network, but is it the network for the problem being considered? This is an important problem with three fundamental issues: How to weight edges in the network in order to capture actual biological interactions? What is the effect of the type of biological experiment used to collect the data from which the network is constructed? How to prune the weighted edges (or what cut-off to apply)? Differences in the construction of networks could lead to different biological interpretations. Indeed, we find that there are statistically significant dissimilarities in the functional content and topology between gene co-expression networks constructed using different edge weighting methods, data types, and edge cut-offs. We show that different types of known interactions, such as those found through Affinity Capture-Luminescence or Synthetic Lethality experiments, appear in significantly varying amounts in networks constructed in different ways. Hence, we demonstrate that different biological questions may be answered by the different networks. Consequently, we posit that the approach taken to build a network can be matched to biological questions to get targeted answers. More study is required to understand the implications of different network inference approaches and to draw reliable conclusions from networks used in the field of systems biology.
    08/2014; 2(02):139-161. DOI:10.1017/nws.2014.13
    • "By first clustering transcripts with similar expression into groups, sparse partial least squares regression framework has been proposed to select markers associated with each cluster of genes (Chun and Keles, 2009). Adaptive multi-task least absolute shrinkage and selection operator (LASSO; Zhu et al., 2008) has been developed for detecting eQTLs that takes into account related expression traits simultaneously while incorporating many regulatory features. On the other hand, the graph-guided fused LASSO (Kim and Xing, 2009; Kim et al., 2009) considers regulatory networks over multiple expression traits within an association analysis, but previous knowledge on genomic locations is not incorporated. "
    H Gao · T Zhang · Y Wu · L Jiang · J Zhan · J Li · R Yang
    ABSTRACT: Given the drawbacks of implementing multivariate analysis for mapping multiple traits in genome-wide association study (GWAS), principal component analysis (PCA) has been widely used to generate independent 'super traits' from the original multivariate phenotypic traits for the univariate analysis. However, parameter estimates in this framework may not be the same as those from the joint analysis of all traits, leading to spurious linkage results. In this paper, we propose to perform the PCA for residual covariance matrix instead of the phenotypical covariance matrix, based on which multiple traits are transformed to a group of pseudo principal components. The PCA for residual covariance matrix allows analyzing each pseudo principal component separately. In addition, all parameter estimates are equivalent to those obtained from the joint multivariate analysis under a linear transformation. However, a fast least absolute shrinkage and selection operator (LASSO) for estimating the sparse oversaturated genetic model greatly reduces the computational costs of this procedure. Extensive simulations show statistical and computational efficiencies of the proposed method. We illustrate this method in a GWAS for 20 slaughtering traits and meat quality traits in beef cattle. Heredity advance online publication, 2 July 2014; doi:10.1038/hdy.2014.57.
    Heredity 07/2014; 114(4). DOI:10.1038/hdy.2014.57 · 3.81 Impact Factor