SNP Set Association Analysis for Genome-Wide Association Studies

Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
PLoS ONE (Impact Factor: 3.23). 05/2013; 8(5):e62495. DOI: 10.1371/journal.pone.0062495
Source: PubMed


Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease on the basis of millions of single nucleotide polymorphisms (SNPs). To avoid the low power caused by excessive correction for multiple comparisons in single-locus association analysis, several methods group SNPs into a SNP set based on genomic features and then test the joint effect of the set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets are generated under models with 0, 1, and ≥2 causal SNPs. Our simulation results show that all of these methods control the type I error at the nominal significance level. SPCA is always more powerful than the other methods across different settings of linkage disequilibrium structure and minor allele frequency in the simulated datasets. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
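
To make the idea of testing a SNP set jointly more concrete, here is a minimal sketch of a PCA-based set test. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the 80% variance threshold, the R-squared statistic, and the permutation-based p-value are all choices made for this example, and the simulated genotypes are drawn independently (no linkage disequilibrium).

```python
import numpy as np

def leading_pc_scores(G, var_explained=0.8):
    """Orthonormal scores of the leading PCs of a genotype matrix G (subjects x SNPs).

    The 0.8 threshold is an assumption of this sketch, not a value from the paper.
    """
    X = G - G.mean(axis=0)
    sd = X.std(axis=0)
    X = X[:, sd > 0] / sd[sd > 0]            # drop monomorphic SNPs, then standardize
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)    # cumulative share of variance per PC
    k = int(np.searchsorted(frac, var_explained)) + 1
    return U[:, :min(k, U.shape[1])]

def set_statistic(scores, y):
    """R-squared from regressing the centered phenotype on the orthonormal PC scores."""
    yc = y - y.mean()
    yc = yc / np.linalg.norm(yc)
    return float(np.sum((scores.T @ yc) ** 2))

def permutation_pvalue(G, y, n_perm=999, seed=0, var_explained=0.8):
    """Permutation p-value for the joint association between a SNP set and y."""
    rng = np.random.default_rng(seed)
    scores = leading_pc_scores(G, var_explained)   # unsupervised, so computed once
    observed = set_statistic(scores, y)
    exceed = sum(
        set_statistic(scores, rng.permutation(y)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Hypothetical toy data: 500 subjects, 10 independent SNPs, one causal SNP.
    rng = np.random.default_rng(42)
    G = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
    logit = -0.5 + 0.6 * G[:, 0]
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    print("set-level p-value:", permutation_pvalue(G, y))
```

Because the components are computed without looking at the phenotype, they can be reused across permutations. A supervised variant such as SPCA would first select SNPs using the phenotype, so that selection step would have to be repeated inside every permutation.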



Available from: Ruyang Zhang
  • Source
    • "Genes acting on the neoplastic lineage, intrinsically or extrinsically, to affect the cancer phenotype can be discovered using modifier genetics (Dietrich et al. 1993; Gould and Dove 1996; Cormier et al. 1997). Genome-wide discovery programs for polymorphic modifiers of cancer risk are being carried out both in human populations (Gabriel et al. 2002; Carvajal-Carmona et al. 2011; Peters et al. 2012; Cai et al. 2013) and in rodent models (Ewart-Toland et al. 2003; Elahi et al. 2009; Kwong and Dove 2009; Crist et al. 2011; Liu et al. 2011; Smits et al. 2011; Quan et al. 2011; Eversley et al. 2012; Nnadi et al. 2012). However, identifying the causative elements underlying a polymorphic quantitative risk modifier is a Herculean task (Drinkwater and Gould 2012; e.g., see Lewis and Tomlinson 2012). "
    ABSTRACT: A central goal in the analysis of complex traits is to identify genes that modify a phenotype. Modifiers of a cancer phenotype may act either intrinsically or extrinsically on the salient cell lineage. Germline point mutagenesis by ethylnitrosourea can provide alleles for a gene of interest that include loss-, gain-, or alteration-of-function. Unlike strain polymorphisms, point mutations with heterozygous quantitative phenotypes are detectable in both essential and non-essential genes, and are unlinked from other variants that might confound their identification and analysis. This report analyzes strategies seeking quantitative mutational modifiers of Apc(Min) in the mouse. To identify a quantitative modifier of a phenotype of interest, a cluster of test progeny is needed. The cluster size can be increased as necessary for statistical significance if the founder is a male whose sperm is cryopreserved. A second critical element in this identification is a mapping panel free of polymorphic modifiers of the phenotype, to enable low-resolution mapping followed by targeted resequencing to identify the causative mutation. Here, we describe the development of a panel of six "isogenic mapping partner lines" for C57BL/6J, carrying single-nucleotide markers introduced by mutagenesis. One such derivative, B6.SNVg, shown to be phenotypically neutral in combination with Apc(Min), is an appropriate mapping partner to locate induced mutant modifiers of the Apc(Min) phenotype. The evolved strategy can complement four current major initiatives in the genetic analysis of complex systems: the Genome-wide Association Study; the Collaborative Cross; the Knockout Mouse Project; and The Cancer Genome Atlas.
    Article · Apr 2014 · G3-Genes Genomes Genetics
  • Source
    ABSTRACT: Diarrhea is a highly common infection among children, responsible for significant morbidity and mortality worldwide. After pneumonia, diarrhea remains the second leading cause of neonatal deaths. Numerous viral, bacterial, and parasitic enteric pathogens are associated with diarrhea. With increasing antibiotic resistance among enteric pathogens, there is an urgent need for global surveillance of the mutations and resistance genes primarily responsible for resistance to antibiotic treatment. Single nucleotide polymorphisms are important in this regard, as they have vast potential as molecular diagnostics for gene-disease or pharmacogenomics association studies linking genotype to phenotype. DBDiaSNP is a comprehensive repository of mutations and resistance genes among various diarrheal pathogens and hosts, intended to advance breakthroughs with applications ranging from the development of sequence-based diagnostic tools to drug discovery. It contains information about 946 mutations and 326 resistance genes compiled from the literature and various web resources. As of March 2015, it houses various pathogen genes and the mutations responsible for antibiotic resistance. The pathogens include, for example, DEC (diarrheagenic E. coli), Salmonella spp., Campylobacter spp., Shigella spp., Clostridium difficile, Aeromonas spp., Helicobacter pylori, Entamoeba histolytica, Vibrio cholerae, and viruses. It also includes mutations from hosts (e.g., humans, pigs, others) that render them either susceptible or resistant to a certain type of diarrhea. DBDiaSNP is therefore intended as an integrated open access database for researchers and clinicians working on diarrheal diseases. Additionally, we note that DBDiaSNP is one of the first antibiotic resistance databases for diarrheal pathogens covering clinically relevant mutations and resistance genes from a broad range of pathogens and hosts. For future translational research involving integrative biology and global health, the database offers real potential, particularly for developing countries and for worldwide monitoring and personalized, effective treatment of pathogens associated with diarrhea. The database is accessible in the public domain at .
    Article · May 2015 · Omics: a journal of integrative biology
  •
    ABSTRACT: With a typical sample size of a few thousand subjects, a single genome-wide association study (GWAS) using traditional one single nucleotide polymorphism (SNP)-at-a-time methods can only detect genetic variants conferring a sizable effect on disease risk. Set-based methods, which analyze sets of SNPs jointly, can detect variants with smaller effects acting within a gene, a pathway, or other biologically relevant sets. Although self-contained set-based methods (those that test sets of variants without regard to variants not in the set) are generally more powerful than competitive set-based approaches (those that rely on comparison of variants in the set of interest with variants not in the set), there is no consensus as to which self-contained methods are best. In particular, several self-contained set tests have been proposed to directly or indirectly "adapt" to the a priori unknown proportion and distribution of effects of the truly associated SNPs in the set, which is a major determinant of their power. A popular adaptive set-based test is the adaptive rank truncated product (ARTP), which seeks the set of SNPs that yields the best-combined evidence of association. We compared the standard ARTP, several ARTP variations we introduced, and other adaptive methods in a comprehensive simulation study to evaluate their performance. We used permutations to assess significance for all the methods and thus provide a level playing field for comparison. We found the standard ARTP test to have the highest power across our simulations followed closely by the global model of random effects (GMRE) and a least absolute shrinkage and selection operator (LASSO)-based test.
    Article · Dec 2015 · Genetic Epidemiology
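
The adaptive rank truncated product mentioned in the last abstract can be sketched as follows. This is a minimal illustration of the general idea, not the exact procedure or any of the variations evaluated in that study. It assumes the per-SNP p-values have already been computed for the observed data and for each permutation replicate; the function name `artp_pvalue` and the input layout are hypothetical.

```python
import numpy as np

def artp_pvalue(pvals, truncation_points):
    """Adaptive rank truncated product from a matrix of per-SNP p-values.

    pvals: array of shape (B + 1, L); row 0 holds the observed per-SNP
           p-values, rows 1..B hold p-values from B permutation replicates.
    truncation_points: candidate numbers K of smallest p-values to combine.
    """
    pvals = np.asarray(pvals, dtype=float)
    n_rows = pvals.shape[0]
    # Sort each row ascending; cumulative sums of logs give the log of the
    # product of the K smallest p-values for every K at once.
    cum_log = np.cumsum(np.log(np.sort(pvals, axis=1)), axis=1)
    W = cum_log[:, np.asarray(truncation_points) - 1]   # smaller = stronger evidence
    # Empirical p-value of every row against all rows, per truncation point.
    s = np.empty_like(W)
    for j in range(W.shape[1]):
        col = W[:, j]
        s[:, j] = (col[None, :] <= col[:, None]).sum(axis=1) / n_rows
    # Adapt over truncation points, then calibrate the minimum with the same
    # permutation replicates.
    min_s = s.min(axis=1)
    return float((min_s <= min_s[0]).sum() / n_rows)
```

Reusing the same permutation replicates both to estimate the p-value at each truncation point and to calibrate the minimum over truncation points avoids a nested permutation loop.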
