SNP Set Association Analysis for Genome-Wide Association Studies

Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
PLoS ONE (Impact Factor: 3.23). 05/2013; 8(5):e62495. DOI: 10.1371/journal.pone.0062495
Source: PubMed


Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease on the basis of millions of single nucleotide polymorphisms (SNPs). To avoid the low power caused by heavy correction for multiple comparisons in single-locus association analysis, several methods have been proposed that group SNPs into a SNP set based on genomic features and then test the joint effect of the set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets are generated under models with 0, 1, and ≥2 causal SNPs. Our simulation results show that all of these methods control the type I error at the nominal significance level. SPCA is always more powerful than the other methods across the different settings of linkage disequilibrium structure and minor allele frequency in the simulated datasets. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
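
The idea of testing a SNP set jointly through its principal components can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything here is an assumption made for illustration: the function names (`pca_scores`, `r_squared`, `snp_set_pvalue`), the 80% variance cutoff for choosing the number of components, the use of ordinary least squares on the component scores, and the permutation-based p-value. It covers only the plain PCA variant, not SPCA, KPCA, or SIR.

```python
import numpy as np

def pca_scores(G, var_explained=0.8):
    """Project genotypes (n samples x p SNPs, coded 0/1/2) onto the top PCs.

    Assumes at least one SNP is polymorphic, so total variance is nonzero.
    """
    X = G - G.mean(axis=0)                        # center each SNP column
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)    # cumulative variance share
    k = min(int(np.searchsorted(ratio, var_explained)) + 1, len(s))
    return U[:, :k] * s[:k]                       # PC scores, shape (n, k)

def r_squared(Z, y):
    """Fraction of phenotype variance explained by regressing y on Z."""
    A = np.column_stack([np.ones(len(y)), Z])     # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def snp_set_pvalue(G, y, n_perm=999, var_explained=0.8, seed=0):
    """Permutation p-value for the joint effect of a SNP set on phenotype y."""
    rng = np.random.default_rng(seed)
    Z = pca_scores(G, var_explained)              # unsupervised: y is not used
    observed = r_squared(Z, y)
    permuted = np.array(
        [r_squared(Z, rng.permutation(y)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(permuted >= observed)) / (n_perm + 1)
```

Because the components are computed without reference to the phenotype, permuting only `y` gives a valid null distribution here. A supervised variant such as SPCA would select SNPs using the phenotype first, so that selection step would have to be repeated inside each permutation.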


Available from: Ruyang Zhang
  • Source
    • "Genes acting on the neoplastic lineage, intrinsically or extrinsically, to affect the cancer phenotype can be discovered using modifier genetics (Dietrich et al. 1993; Gould and Dove 1996; Cormier et al. 1997). Genome-wide discovery programs for polymorphic modifiers of cancer risk are being carried out both in human populations (Gabriel et al. 2002; Carvajal-Carmona et al. 2011; Peters et al. 2012; Cai et al. 2013) and in rodent models (Ewart-Toland et al. 2003; Elahi et al. 2009; Kwong and Dove 2009; Crist et al. 2011; Liu et al. 2011; Smits et al. 2011; Quan et al. 2011; Eversley et al. 2012; Nnadi et al. 2012). However, identifying the causative elements underlying a polymorphic quantitative risk modifier is a Herculean task (Drinkwater and Gould 2012; e.g., see Lewis and Tomlinson 2012). "
    ABSTRACT: A central goal in the analysis of complex traits is to identify genes that modify a phenotype. Modifiers of a cancer phenotype may act either intrinsically or extrinsically on the salient cell lineage. Germline point mutagenesis by ethylnitrosourea can provide alleles for a gene of interest that include loss-, gain-, or alteration-of-function. Unlike strain polymorphisms, point mutations with heterozygous quantitative phenotypes are detectable in both essential and non-essential genes, and are unlinked from other variants that might confound their identification and analysis. This report analyzes strategies seeking quantitative mutational modifiers of Apc(Min) in the mouse. To identify a quantitative modifier of a phenotype of interest, a cluster of test progeny is needed. The cluster size can be increased as necessary for statistical significance if the founder is a male whose sperm is cryopreserved. A second critical element in this identification is a mapping panel free of polymorphic modifiers of the phenotype, to enable low-resolution mapping followed by targeted resequencing to identify the causative mutation. Here, we describe the development of a panel of six "isogenic mapping partner lines" for C57BL/6J, carrying single-nucleotide markers introduced by mutagenesis. One such derivative, B6.SNVg, shown to be phenotypically neutral in combination with Apc(Min), is an appropriate mapping partner to locate induced mutant modifiers of the Apc(Min) phenotype. The evolved strategy can complement four current major initiatives in the genetic analysis of complex systems: the Genome-wide Association Study; the Collaborative Cross; the Knockout Mouse Project; and The Cancer Genome Atlas.
    G3-Genes Genomes Genetics 04/2014; 4(6). DOI:10.1534/g3.114.010595 · 3.20 Impact Factor
  • Source
    ABSTRACT: Diarrhea is a highly common infection among children, responsible for significant morbidity and mortality worldwide. After pneumonia, diarrhea remains the second leading cause of neonatal deaths. Numerous viral, bacterial, and parasitic enteric pathogens are associated with diarrhea. With increasing antibiotic resistance among enteric pathogens, there is an urgent need for global surveillance of the mutations and resistance genes primarily responsible for resistance to antibiotic treatment. Single nucleotide polymorphisms (SNPs) are important in this regard, as they have vast potential as molecular diagnostics for gene-disease or pharmacogenomics association studies linking genotype to phenotype. DBDiaSNP is a comprehensive repository of mutations and resistance genes among various diarrheal pathogens and hosts, intended to advance breakthroughs with applications ranging from the development of sequence-based diagnostic tools to drug discovery. It contains information about 946 mutations and 326 resistance genes compiled from the literature and various web resources. As of March 2015, it houses various pathogen genes and the mutations responsible for antibiotic resistance. The pathogens include, for example, diarrheagenic E. coli (DEC), Salmonella spp., Campylobacter spp., Shigella spp., Clostridium difficile, Aeromonas spp., Helicobacter pylori, Entamoeba histolytica, Vibrio cholerae, and viruses. It also includes mutations from hosts (e.g., humans, pigs, others) that render them either susceptible or resistant to a certain type of diarrhea. DBDiaSNP is therefore intended as an integrated open access database for researchers and clinicians working on diarrheal diseases. Additionally, we note that DBDiaSNP is one of the first antibiotic resistance databases for diarrheal pathogens, covering clinically relevant mutations and resistance genes from a broad range of pathogens and hosts. For future translational research involving integrative biology and global health, the database offers considerable potential, particularly for developing countries and for worldwide monitoring and personalized, effective treatment of pathogens associated with diarrhea. The database is accessible in the public domain at .
    Omics: a journal of integrative biology 05/2015; 19(6):354-360. DOI:10.1089/omi.2015.0030 · 2.36 Impact Factor