Article

Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks

Rosetta Inpharmatics, LLC, Seattle, Washington 98109, USA.
Nature Genetics (Impact Factor: 29.65). 08/2008; 40(7):854-61. DOI: 10.1038/ng.167
Source: PubMed

ABSTRACT: A key goal of biology is to construct networks that predict complex system behavior. We combine multiple types of molecular data, including genotypic, expression, transcription factor binding site (TFBS), and protein-protein interaction (PPI) data previously generated from a number of yeast experiments, in order to reconstruct causal gene networks. Networks based on different types of data are compared using metrics devised to assess the predictive power of a network. We show that a network reconstructed by integrating genotypic, TFBS and PPI data is the most predictive. This network is used to predict causal regulators responsible for hot spots of gene expression activity in a segregating yeast population. We also show that the network can elucidate the mechanisms by which causal regulators give rise to larger-scale changes in gene expression activity. We then prospectively validate predictions, providing direct experimental evidence that predictive networks can be constructed by integrating multiple, appropriate data types.
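The abstract says networks are compared with metrics of predictive power, but it does not define those metrics here. One simple score of this kind asks whether the genes a network places downstream of a regulator overlap the genes that actually respond when that regulator is perturbed. The overlap is then tested with a one-sided hypergeometric test. This is a hedged illustration only, not necessarily the metric used in the paper. The function names, the toy network and the gene sets below are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def downstream(network: dict[str, set[str]], regulator: str) -> set[str]:
    """All genes reachable from `regulator` along directed edges."""
    seen: set[str] = set()
    stack = [regulator]
    while stack:
        for child in network.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    seen.discard(regulator)  # a feedback loop may lead back to the regulator
    return seen


def enrichment(network, regulator, signature, universe):
    """Overlap between predicted downstream genes and an observed response signature."""
    predicted = downstream(network, regulator) & universe
    observed = signature & universe
    overlap = len(predicted & observed)
    p = hypergeom_sf(overlap, len(universe), len(observed), len(predicted))
    return overlap, p


# Hypothetical toy data: edges point from regulator to target.
network = {"TF1": {"A", "B"}, "A": {"C"}, "TF2": {"D"}}
universe = {"TF1", "TF2", "A", "B", "C", "D", "E", "F", "G", "H"}
signature = {"A", "B", "C", "E"}  # genes that responded when TF1 was perturbed
overlap, p = enrichment(network, "TF1", signature, universe)
print(overlap, round(p, 4))  # 3 0.0333
```

With thousands of genes and many regulators, the resulting p-values would also need a multiple-testing correction before two networks are compared.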

Available from: Roger Bumgarner, Feb 19, 2014
    • "Although PPI data were used previously (Zhu et al., 2008) in the context of GRN inference, the approach used by previous researchers was very different from the approach used in this study. For instance, protein interactions among target genes were used by Zhu et al. (2008) to determine co-regulation of multiple genes. Here, we use protein interactions among TFs to determine combinatorial regulation by multiple TFs."
    ABSTRACT: Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm, which can infer GRNs from mRNA expression profiles of samples subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based methods in some circumstances.
    Frontiers in Bioengineering and Biotechnology 05/2014; 2. DOI:10.3389/fbioe.2014.00013
    • "… (2010); Zhang et al. (2012)]. Methods are available to integrate multiple genomic data to draw causal inference on a biological network [Schadt et al. (2005); Zhu et al. (2008); Hageman et al. (2011); Neto et al. (2013)]. We focus in this paper on joint analysis of multiple eQTL SNPs of a gene and their corresponding mRNA expression for their effects on disease phenotypes."
    ABSTRACT: Genetic association studies have been a popular approach for assessing the association between common Single Nucleotide Polymorphisms (SNPs) and complex diseases. However, other genomic data involved in the mechanism from SNPs to disease, e.g., gene expressions, are usually neglected in these association studies. In this paper, we propose to exploit gene expression information to more powerfully test the association between SNPs and diseases by jointly modeling the relations among SNPs, gene expressions and diseases. We propose a variance component test for the total effect of SNPs and a gene expression on disease risk. We cast the test within the causal mediation analysis framework with the gene expression as a potential mediator. For eQTL SNPs, the use of gene expression information can enhance power to test for the total effect of a SNP-set, defined as the combined direct and indirect effects of the SNPs mediated through the gene expression, on disease risk. We show that the test statistic under the null hypothesis follows a mixture of χ² distributions, which can be evaluated analytically or empirically using the resampling-based perturbation method. We construct tests for each of three disease models: one determined by SNPs only, one determined by SNPs and gene expression, and one that also includes their interactions. As the true disease model is unknown in practice, we further propose an omnibus test to accommodate different underlying disease models. We evaluate the finite sample performance of the proposed methods using simulation studies, and show that our proposed test performs well and that the omnibus test nearly reaches the optimal power attainable when the disease model is known and correctly specified. We apply our method to re-analyze the overall effect of the SNP-set and expression of the ORMDL3 gene on the risk of asthma.
    The Annals of Applied Statistics 03/2014; 8(1):352-376. DOI:10.1214/13-AOAS690 · 1.69 Impact Factor
    • "These studies have the potential to identify the mechanisms controlling transcript levels because the genotype can be considered to be causative, although it may affect gene expression indirectly. Linkage analysis does not immediately identify the causative allele or the mechanism for the observed expression variation, but incorporation of other high-throughput data—such as protein–protein interactions, knowledge of TF binding sites, and knowledge of the effects that known regulators have on genome-wide expression—can facilitate the challenging process of mechanistic dissection (Lee et al. 2006; Choi and Kim 2008; Zhu et al. 2008). Highlights among the many findings of these studies (and subsequent reanalyses of the same data) include the following: A large category of linkages corresponds to cis-effects on single genes. "
    ABSTRACT: The term "transcriptional network" refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
    Genetics 09/2013; 195(1):9-36. DOI:10.1534/genetics.113.153262 · 4.87 Impact Factor
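The Annals of Applied Statistics abstract above states that the null distribution of its variance-component statistic is a mixture of χ² distributions. This sketch estimates the tail probability by Monte Carlo. It assumes the statistic has the form Q = Σ w_j χ²₁ and that the mixture weights w_j are already known. It is a plain simulation, not the paper's resampling-based perturbation method, and analytic approximations also exist. The function name and parameters are hypothetical.

```python
import random


def mixture_chisq_pvalue(q_obs: float, weights: list[float],
                         n_sim: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Q >= q_obs) when Q = sum_j w_j * chi2_1."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # A chi-square(1) draw is the square of a standard normal draw.
        q = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if q >= q_obs:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_sim + 1)


# Sanity check: a single unit weight is an ordinary chi-square(1),
# whose 95th percentile is about 3.841, so the estimate should be near 0.05.
print(mixture_chisq_pvalue(3.841, [1.0]))
```

Simulation is easy to verify but slow for very small p-values. That is one reason analytic evaluation of such mixtures is attractive in genome-wide settings.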
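The Genetics review excerpt above notes that linkage analysis in segregating populations ties expression variation to genotype. That is also the genotypic data the main article integrates. This sketch computes a single-marker regression LOD score for one expression trait. It assumes haploid segregants genotyped at biallelic markers, with alleles coded 0/1. Real eQTL studies also scan thousands of traits and set significance thresholds, for example by permutation. The function names and data are hypothetical.

```python
from math import log10


def marker_lod(genotypes: list[int], expression: list[float]) -> float:
    """Single-marker LOD score for one expression trait in haploid segregants.

    LOD = (n / 2) * log10(RSS_null / RSS_marker), where RSS_null uses one overall
    mean and RSS_marker uses a separate mean for each allele group.
    """
    n = len(expression)
    overall = sum(expression) / n
    rss_null = sum((y - overall) ** 2 for y in expression)
    rss_marker = 0.0
    for allele in set(genotypes):
        group = [y for g, y in zip(genotypes, expression) if g == allele]
        mean = sum(group) / len(group)
        rss_marker += sum((y - mean) ** 2 for y in group)
    if rss_marker == 0.0:
        # Constant trait (LOD 0) or perfect separation by allele (unbounded LOD).
        return 0.0 if rss_null == 0.0 else float("inf")
    return (n / 2) * log10(rss_null / rss_marker)


def scan(markers: dict[str, list[int]], expression: list[float]) -> dict[str, float]:
    """LOD score of one trait at every marker; a peak suggests a linked locus."""
    return {name: marker_lod(g, expression) for name, g in markers.items()}


# Hypothetical data: 6 segregants, alleles coded 0/1 at two markers.
expression = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]
markers = {"m1": [0, 0, 0, 1, 1, 1], "m2": [0, 1, 0, 1, 0, 1]}
print(scan(markers, expression))
```

Marker m1 splits the segregants into low and high expressers and gets a LOD of about 5.4. Marker m2 stays near zero. A hot spot, in this setting, is a marker position where many different traits peak at once.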