Comparison of Pathway Analysis Approaches Using Lung Cancer GWAS Data Sets

Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada.
PLoS ONE (Impact Factor: 3.23). 02/2012; 7(2):e31816. DOI: 10.1371/journal.pone.0031816
Source: PubMed


Pathway analysis has been proposed as a complement to single-SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations; GenGen, a version of Gene Set Enrichment Analysis (GSEA), which uses a Kolmogorov-Smirnov-like running-sum statistic as the test statistic; and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. After mapping more than 300,000 SNPs from each data set, nearly 18,000 genes were available for analysis. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). These included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR ≤ 0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but the results suggest that mSUMSTAT has advantages over the other approaches and may be a useful pathway analysis tool alongside other methods such as the commonly used GSEA (GenGen) approach.
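To make the three kinds of test statistic named in the abstract more concrete, the Python sketch below computes, for a toy pathway, a one-sided Fisher exact p-value with an EASE-style adjustment, a GSEA-style weighted running-sum enrichment score, and an mSUMSTAT-style mean χ². It is a hypothetical, simplified illustration and not the EASE, GenGen or mSUMSTAT software. The function names are our own, gene-level χ² scores are assumed as input (for example, the largest SNP χ² within each gene), the EASE adjustment follows the common description of subtracting one from the pathway hit count, and the permutation procedures that the real methods use to assess significance are omitted. SLAT's p-value combination is not shown.

```python
"""Toy pathway test statistics (hypothetical sketch, not the original software)."""
from math import comb


def fisher_upper_tail(hits: int, list_size: int, pathway_size: int, total: int) -> float:
    """One-sided Fisher exact p-value P(X >= hits) for a hypergeometric X.

    total: genes analysed; pathway_size: genes in the pathway;
    list_size: genes on the 'significant' list; hits: list genes in the pathway.
    """
    start = max(hits, 0)                 # a negative hit count means "whole distribution"
    top = min(list_size, pathway_size)   # largest possible overlap
    tail = sum(
        comb(pathway_size, x) * comb(total - pathway_size, list_size - x)
        for x in range(start, top + 1)   # math.comb returns 0 when k > n
    )
    return tail / comb(total, list_size)


def ease_score(hits: int, list_size: int, pathway_size: int, total: int) -> float:
    """EASE-style conservative score: remove one pathway hit, then run the Fisher test.

    Assumed simplification of EASE's jackknifed Fisher exact probability.
    """
    return fisher_upper_tail(hits - 1, list_size, pathway_size, total)


def enrichment_score(gene_scores: dict[str, float], pathway: set[str], p: float = 1.0) -> float:
    """GSEA-style weighted Kolmogorov-Smirnov-like running-sum statistic.

    Genes are ranked by score (largest first); the running sum rises by
    |score|**p / (sum over pathway genes) at a pathway gene and falls by
    1 / (number of non-pathway genes) otherwise. Returns the signed maximum deviation.
    """
    ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
    in_set = [g in pathway for g in ranked]
    n_miss = len(ranked) - sum(in_set)
    norm = sum(abs(gene_scores[g]) ** p for g, hit in zip(ranked, in_set) if hit)
    if norm == 0 or n_miss == 0:
        raise ValueError("pathway must contain scored genes and must not cover every gene")
    running, best = 0.0, 0.0
    for g, hit in zip(ranked, in_set):
        running += abs(gene_scores[g]) ** p / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def mean_chi2(gene_scores: dict[str, float], pathway: set[str]) -> float:
    """mSUMSTAT-style statistic: average chi-square over the pathway's scored genes (assumed gene-level)."""
    values = [gene_scores[g] for g in pathway if g in gene_scores]
    if not values:
        raise ValueError("no scored genes in pathway")
    return sum(values) / len(values)
```

Averaging rather than summing keeps the mSUMSTAT-style statistic on the same scale regardless of pathway size, which is one reason a mean can be less sensitive to gene-set size than a sum; in practice its significance would still be judged against a permutation null.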



Available from: Geoffrey Liu
    • "This is a conservative strategy designed to minimize false-positive associations, while missing many true-positive associations that do not meet statistical significance. In contrast, pathway analysis methods are a companion to GWAS studies that consider much larger proportions of the top SNPs and investigate their aggregate associations to known biological groupings or metabolic pathways [12]. Pathway analysis studies have been successful in identifying additional biological insight and finding groupings of genes that represent biological disease processes [28]. "
    ABSTRACT: We show here that combining two existing genome-wide association studies (GWAS) yields additional biologically relevant information, beyond that obtained by either GWAS separately. We propose Joint GWAS Analysis, a method that compares a pair of GWAS for similarity among the top SNP associations, top genes identified, gene functional clusters, and top biological pathways. We show that Joint GWAS Analysis identifies additional enriched biological pathways that would be missed by traditional Single-GWAS analysis. Furthermore, we examine the similarities of six complex genetic disorders at the SNP level, gene level, gene-cluster level, and pathway level. We make concrete hypotheses regarding novel pathway associations for several complex disorders considered, based on the results of Joint GWAS Analysis. Together, these results demonstrate that common complex disorders share substantially more genomic architecture than has been previously realized and that the meta-analysis of GWAS need not be limited to GWAS of the same phenotype to be informative.
    Genomics Data 12/2014; 2:202–211. DOI:10.1016/j.gdata.2014.04.004
    • "Besides these fundamental differences, the used bioinformatics databases and/or the releases also vary between the studies (see Table S1). Due to these facts, it is not straightforward to compare different pathway analysis methods [20]. "
    ABSTRACT: Genome-wide association studies (GWAS) led to the identification of numerous novel loci for a number of complex diseases. Pathway-based approaches using genotypic data provide tangible leads which cannot be identified by single marker approaches as implemented in GWAS. The available pathway analysis approaches mainly differ in the employed databases and in the applied statistics for determining the significance of the associated disease markers. So far, pathway-based approaches using GWAS data failed to consider the overlapping of genes among different pathways or the influence of protein–interactions. We performed a multistage integrative pathway (MIP) analysis on three common diseases - Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) - incorporating genotypic, pathway, protein- and domain-interaction data to identify novel associations between these diseases and pathways. Additionally, we assessed the sensitivity of our method by studying the influence of the most significant SNPs on the pathway analysis by removing those and comparing the corresponding pathway analysis results. Apart from confirming many previously published associations between pathways and RA, CD and T1D, our MIP approach was able to identify three new associations between disease phenotypes and pathways. This includes a relation between the influenza-A pathway and RA, as well as a relation between T1D and the phagosome and toxoplasmosis pathways. These results provide new leads to understand the molecular underpinnings of these diseases. The developed software herein used is available at
    PLoS ONE 10/2013; 8(10):e78577. DOI:10.1371/journal.pone.0078577 · 3.23 Impact Factor
    ABSTRACT: Pancreatic cancer is the fourth leading cause of cancer death in the U.S., and the etiology of this highly lethal disease has not been well defined. To identify genetic susceptibility factors for pancreatic cancer, we conducted pathway analysis of genome-wide association study (GWAS) data in 3,141 pancreatic cancer patients and 3,367 controls of European ancestry. Using the gene set ridge regression in association studies (GRASS) method, we analyzed 197 pathways identified from the Kyoto Encyclopedia of Genes and Genomes database. We used the logistic kernel machine (LKM) test to identify major contributing genes to each pathway. We conducted functional enrichment analysis of the most significant genes (P<0.01) using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Two pathways were significantly associated with risk of pancreatic cancer after adjusting for multiple comparisons (P<0.00025) and in replication testing: neuroactive ligand-receptor interaction (Ps<0.00002) and the olfactory transduction pathway (P = 0.0001). The LKM test identified four genes that were significantly associated with risk of pancreatic cancer after Bonferroni correction (P<1×10⁻⁵): ABO, HNF1A, OR13C4, and SHH. Functional enrichment analysis using DAVID consistently found the G protein-coupled receptor signaling pathway (including both the neuroactive ligand-receptor interaction and olfactory transduction pathways) to be the most significant pathway for pancreatic cancer risk in this study population. These novel findings provide new perspectives on genetic susceptibility to and molecular mechanisms of pancreatic cancer.
    PLoS ONE 10/2012; 7(10):e46887. DOI:10.1371/journal.pone.0046887 · 3.23 Impact Factor
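The pathway-level threshold quoted in the pancreatic cancer abstract (P<0.00025 across 197 pathways) appears consistent with a Bonferroni correction. The short Python sketch below shows that arithmetic; the family-wise level of 0.05 is our assumption, since the abstract does not state it, and the helper name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 197 KEGG pathways tested at an assumed family-wise alpha of 0.05
threshold = bonferroni_threshold(0.05, 197)
print(f"{threshold:.2e}")  # prints 2.54e-04
```

Bonferroni is conservative when tests are correlated, as overlapping pathways are, so the reported cut-off should be read as a strict rather than an exact family-wise bound.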