Network-Assisted Investigation of Combined Causal Signals from Genome-Wide Association Studies in Schizophrenia

Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America.
PLoS Computational Biology (Impact Factor: 4.62). 07/2012; 8(7):e1002587. DOI: 10.1371/journal.pcbi.1002587
Source: PubMed


With the recent success of genome-wide association studies (GWAS), a wealth of association data has been accumulated for more than 200 complex diseases/traits, creating a strong demand for data integration and interpretation. A combined analysis of multiple GWAS datasets, or an integrative analysis of GWAS data and other high-throughput data, has been particularly promising. In this study, we proposed a framework for the integrative analysis of multiple GWAS datasets that overlays association signals onto the protein-protein interaction network, and demonstrated it using schizophrenia datasets. Building on a dense module search algorithm, we first searched for significantly enriched subnetworks for schizophrenia in each single GWAS dataset and then implemented a discovery-evaluation strategy to identify module genes with consistent association signals. We validated the module genes in an independent dataset, and also examined them through meta-analysis of the related SNPs using multiple GWAS datasets. As a result, we identified 205 module genes with a joint effect significantly associated with schizophrenia; these module genes included a number of well-studied candidate genes such as DISC1, GNA12, GNA13, GNAI1, GPR17, and GRIN2B. Further functional analysis suggested that these genes are involved in neuronal-related processes. Additionally, meta-analysis found that 18 SNPs in 9 module genes had P(meta)<1 × 10⁻⁴, including the gene HLA-DQA1 located in the MHC region on chromosome 6, which was reported in previous studies using the largest cohort of schizophrenia patients to date. These results demonstrate that our bi-directional network-based strategy is efficient for identifying disease-associated genes with modest signals in GWAS datasets. This approach can be applied to any other complex disease or trait for which multiple GWAS datasets are available.
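
The dense module search step can be illustrated with a small sketch. The abstract does not give the scoring function or the stopping rule, so everything below is an assumption made for illustration: gene-level P values are converted to z-scores, a module of k genes is scored as Z_m = Σ z_i / √k, and a module grows greedily from a seed gene as long as the best neighbouring gene raises the score by more than a factor of (1 + r). The function names, the parameter r, and the toy network are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist


def gene_z(p_value):
    """Convert a gene-level P value to a z-score (assumed transform)."""
    return NormalDist().inv_cdf(1.0 - p_value)


def module_score(genes, z):
    """Assumed module score: Z_m = sum(z_i) / sqrt(k) for k genes."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))


def grow_module(seed, graph, z, r=0.1):
    """Greedily expand a module from a seed gene.

    At each step, add the neighbouring gene that gives the highest
    module score, and stop when the gain is not larger than a factor
    of (1 + r) over the current score (hypothetical stopping rule).
    """
    module = {seed}
    while True:
        current = module_score(module, z)
        # Candidates: scored genes adjacent to any gene already in the module.
        candidates = {
            n for g in module for n in graph.get(g, ()) if n in z
        } - module
        if not candidates:
            break
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda n: module_score(module | {n}, z))
        if module_score(module | {best}, z) <= current * (1 + r):
            break
        module.add(best)
    return module


# Toy protein-protein interaction network with made-up gene labels and scores.
toy_graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
toy_z = {"A": 2.0, "B": 2.5, "C": 0.1, "D": 3.0}
toy_module = grow_module("A", toy_graph, toy_z)
```

On this toy network the module seeded at A absorbs B and D and leaves out C, whose weak signal would dilute the combined score. In a real analysis the significance of each module would still have to be assessed, for example by permutation, which this sketch does not attempt.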



Available from: Todd L Edwards,
  • Source
    • "Although most drug targets are not identified through GWAS studies, they are obviously as much involved in the disease mechanism as GWAS genes, and so may be expected to have similar properties, particularly in terms of pathway and network relationships. A number of studies have incorporated network information to aid in identifying various classes of genes, for example using a network module formalism to combine signals from multiple GWAS studies [13,14] and using network flow models to predict drug targets from expression and other data in prostate cancer [15]. Network models have also been used to identify pathways implicated in cancer [16]. "
    ABSTRACT: Background: Genome wide association studies (GWAS) have revealed a large number of links between genome variation and complex disease. Among other benefits, it is expected that these insights will lead to new therapeutic strategies, particularly the identification of new drug targets. In this paper, we evaluate the power of GWAS studies to find drug targets by examining how many existing drug targets have been directly 'rediscovered' by this technique, and the extent to which GWAS results may be leveraged by network information to discover known and new drug targets. Results: We find that only a very small fraction of drug targets are directly detected in the relevant GWAS studies. We investigate two possible explanations for this observation. First, we find evidence of negative selection acting on drug target genes as a consequence of strong coupling with the disease phenotype, so reducing the incidence of SNPs linked to the disease. Second, we find that GWAS genes are substantially longer on average than drug targets and than all genes, suggesting there is a length related bias in GWAS results. In spite of the low direct relationship between drug targets and GWAS reported genes, we found these two sets of genes are closely coupled in the human protein network. As a consequence, machine-learning methods are able to recover known drug targets based on network context and the set of GWAS reported genes for the same disease. We show the approach is potentially useful for identifying drug repurposing opportunities. Conclusions: Although GWA studies do not directly identify most existing drug targets, there are several reasons to expect that new targets will nevertheless be discovered using these data. Initial results on drug repurposing studies using network analysis are encouraging and suggest directions for future development.
    BMC Genomics 05/2014; 15 Suppl 4(Suppl 4):S5. DOI:10.1186/1471-2164-15-S4-S5 · 3.99 Impact Factor
  • Source
    • "The lack of explanatory power of GWAS thus calls for additional methods to uncover the mechanisms that underlie complex diseases. Over the last few years, different approaches have been explored to discover functional relationships between genes at the associated loci, for instance, by searching for genes with similar functions or within the same molecular pathway [3] [4] [5]. One widely used approach is the gene set enrichment analysis, which determines whether an a priori defined set of genes is statistically enriched for disease associations. "
    ABSTRACT: Most common diseases are complex, involving multiple genetic and environmental factors and their interactions. In the past decade, genome-wide association studies (GWAS) have successfully identified thousands of genetic variants underlying susceptibility to complex diseases. However, the results from these studies often do not provide evidence on how the variants affect downstream pathways and lead to the disease. Therefore, in the post-GWAS era the greatest challenge lies in combining GWAS findings with additional molecular data to functionally characterize the associations. The advances in various -omics techniques have made it possible to investigate the effect of risk variants on intermediate molecular levels, such as gene expression, methylation, protein abundance or metabolite levels. As disease aetiology is complex, no single molecular analysis is expected to fully unravel the disease mechanism. Multiple molecular levels can interact and also show plasticity in different physiological conditions, cell types and disease stages. There is therefore a great need for new integrative approaches that can combine data from different molecular levels and can help construct the causal inference from genotype to phenotype. Systems genetics is such an approach; it is used to study genetic effects within the larger scope of systems biology by integrating genotype information with various -omics datasets as well as with environmental and physiological variables. In this review, we describe this approach and discuss how it can help us unravel the molecular mechanisms through which genetic variation causes disease. This article is part of a Special Issue entitled: From Genome to Function.
    Biochimica et Biophysica Acta 05/2014; 1842(10). DOI:10.1016/j.bbadis.2014.04.025 · 4.66 Impact Factor
  • Source
    • "Specifically, Rossin et al. found that PPI connections between loci defined in GWAS of a specific disease were more densely connected than chance expectation [24], and Nicolae et al. [14] observed that SNPs found in GWAS were more likely to be eSNPs. The comprehensiveness of our work relied on combining eQTL data with the PPI network and not merely GWAS data, as described in previous studies [27]. This allowed us to examine source-target connections across the network, rather than be limited to studying the source nodes as in GWAS-PPI analyses. "
    ABSTRACT: In recent years many genetic variants (eSNPs) have been reported as associated with expression of transcripts in trans. However, the causal variants and regulatory mechanisms through which they act remain mostly unknown. In this paper we follow two kinds of usual suspects: SNPs that alter coding regions or transcription factors, identifiable by sequencing data with transcriptional profiles in the same cohort. We show these interpretable genomic regions are enriched for eSNP association signals, thereby naturally defining source-target gene pairs. We map these pairs onto a protein-protein interaction (PPI) network and study their topological properties. For exonic eSNP sources, we report source-target proximity and high target degree within the PPI network. These pairs are more likely to be co-expressed and the eSNPs tend to have a cis effect, modulating the expression of the source gene. In contrast, transcription factor source-target pairs are not observed to have such properties, but instead a transcription factor source tends to assemble into units of defined functional roles along with its gene targets, and to share with them the same functional cluster of the PPI network. Our results suggest two modes of trans regulation: transcription factor variation frequently acts via a modular regulation mechanism, with multiple targets that share a function with the transcription factor source. Notwithstanding, exon variation often acts by a local cis effect, delineating shorter paths of interacting proteins across functional clusters of the PPI network.
    Genome biology 07/2013; 14(7):R71. DOI:10.1186/gb-2013-14-7-r71 · 10.81 Impact Factor