The Use of Imputed Values in the Meta-Analysis of Genome-Wide Association Studies

Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
Genetic Epidemiology (Impact Factor: 2.6). 11/2011; 35(7):597-605. DOI: 10.1002/gepi.20608
Source: PubMed

ABSTRACT In genome-wide association studies (GWAS), it is common practice to impute the genotypes of untyped single nucleotide polymorphisms (SNPs) by exploiting the linkage disequilibrium structure among SNPs. The use of imputed genotypes improves genome coverage and makes it possible to perform meta-analyses that combine results from studies genotyped on different platforms. A popular way of using imputed data is the "expectation-substitution" method, which treats the imputed dosage as if it were the true genotype. In current practice, the estimates given by the expectation-substitution method are usually combined in meta-analysis using an inverse variance weighting (IVM) scheme. However, the IVM is not optimal, because the estimates given by the expectation-substitution method are generally biased. The optimal weight is, in fact, proportional to the inverse variance and the expected value of the effect size estimates. We show both theoretically and numerically that the bias of the estimates is very small under the practical conditions of low effect sizes in GWAS. This finding validates the use of the expectation-substitution method and shows that the inverse variance is a good approximation of the optimal weight. Through simulation, we compared the power of the IVM method with that of several other methods, including the optimal weight, the regular z-score meta-analysis, and a recently proposed "imputation aware" meta-analysis method (Zaitlen and Eskin [2010] Genet Epidemiol 34:537-542). Our results show that the performance of the inverse variance weight is always indistinguishable from that of the optimal weight and similar to or better than that of the other two methods.
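As a rough illustration of the quantities the abstract refers to, the sketch below computes an imputed dosage, a fixed-effect inverse variance weighted estimate, and a z-score combination. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the z-score combination assumes the common square-root-of-sample-size weighting, and the paper's optimal weight and the "imputation aware" method are not reproduced here.

```python
import math


def dosage(p_het, p_hom_alt):
    """Expected allele count (0 to 2) from imputed genotype probabilities.

    Expectation-substitution uses this value in place of the true genotype.
    """
    return p_het + 2.0 * p_hom_alt


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse variance weighted meta-analysis.

    betas: per-study effect size estimates
    ses:   per-study standard errors
    Returns (pooled estimate, pooled standard error, z statistic).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


def zscore_meta(zs, ns):
    """Z-score meta-analysis.

    Assumption: each study is weighted by the square root of its sample size.
    """
    weights = [math.sqrt(n) for n in ns]
    numerator = sum(w * z for w, z in zip(weights, zs))
    return numerator / math.sqrt(sum(w * w for w in weights))


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(dosage(0.3, 0.1))
    print(inverse_variance_meta([0.10, 0.20], [0.05, 0.10]))
    print(zscore_meta([2.0, 2.0], [100, 100]))
```

In this sketch the more precise study (smaller standard error) dominates the pooled estimate, which is the behavior the inverse variance scheme is meant to provide.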


Available from: Shuo Jiao, Sep 27, 2015
  • "Excluding poorly genotyped variants from only a subset of individuals introduces an unequal sample size across sites, making the downstream statistics more complex. Commonly, this is overcome via the imputation of missing data [48], in which the state of an un-genotyped marker is inferred from the haplotypes of the other individuals. This approach may be valid when data is missing due to technical reasons (low coverage sequencing or poor hybridization to genotyping arrays); however, it is likely to miss-infer the correct state if more than two alleles are present at a site, which will occur whenever SVs and CNVs overlap a SNP."
    ABSTRACT: Over the last 10 years, high-density SNP arrays and DNA re-sequencing have illuminated the majority of the genotypic space for a number of organisms, including humans, maize, rice and Arabidopsis. For any researcher willing to define and score a phenotype across many individuals, Genome Wide Association Studies (GWAS) present a powerful tool to reconnect this trait back to its underlying genetics. In this review we discuss the biological and statistical considerations that underpin a successful analysis or otherwise. The relevance of biological factors including effect size, sample size, genetic heterogeneity, genomic confounding, linkage disequilibrium and spurious association, and statistical tools to account for these are presented. GWAS can offer a valuable first insight into trait architecture or candidate loci for subsequent validation.
    Plant Methods 07/2013; 9(1):29. DOI:10.1186/1746-4811-9-29 · 3.10 Impact Factor
  • ABSTRACT: The genetic traits that result in autoimmune diseases represent complicating factors in explicating the molecular and cellular elements of autoimmune responses and how these responses can be overcome or manipulated. This article focuses on the major non-major histocompatibility complex genes that have been found to be linked to autoimmune diseases. A given gene may associate with a number of autoimmune diseases and, conversely, a given disease may link to a number of common autoimmune disease (AD) genes. Collaboration and interaction among genes and the number of diseases that develop and the extensive risk factors shared among ADs further complicate the outcome. This article describes the various relationships between gene regions associated with multiple ADs and the complexity of those relationships.
    Critical Reviews in Immunology 01/2012; 32(3):193-285. · 3.70 Impact Factor
  • ABSTRACT: Background & aims: Heritable factors contribute to the development of colorectal cancer. Identifying the genetic loci associated with colorectal tumor formation could elucidate the mechanisms of pathogenesis. Methods: We conducted a genome-wide association study that included 14 studies, 12,696 cases of colorectal tumors (11,870 cancer, 826 adenoma), and 15,113 controls of European descent. The 10 most statistically significant, previously unreported findings were followed up in 6 studies; these included 3056 colorectal tumor cases (2098 cancer, 958 adenoma) and 6658 controls of European and Asian descent. Results: Based on the combined analysis, we identified a locus that reached the conventional genome-wide significance level of P less than 5.0 × 10^-8: an intergenic region on chromosome 2q32.3, close to nucleic acid binding protein 1 (most significant single nucleotide polymorphism: rs11903757; odds ratio [OR], 1.15 per risk allele; P = 3.7 × 10^-8). We also found evidence for 3 additional loci with P values less than 5.0 × 10^-7: a locus within the laminin gamma 1 gene on chromosome 1q25.3 (rs10911251; OR, 1.10 per risk allele; P = 9.5 × 10^-8), a locus within the cyclin D2 gene on chromosome 12p13.32 (rs3217810; OR, 0.84 per risk allele; P = 5.9 × 10^-8), and a locus in the T-box 3 gene on chromosome 12q24.21 (rs59336; OR, 0.91 per risk allele; P = 3.7 × 10^-7). Conclusions: In a large genome-wide association study, we associated polymorphisms close to nucleic acid binding protein 1 (which encodes a DNA-binding protein involved in DNA repair) with colorectal tumor risk. We also provided evidence for an association between colorectal tumor risk and polymorphisms in laminin gamma 1 (this is the second gene in the laminin family to be associated with colorectal cancers), cyclin D2 (which encodes cyclin D2), and T-box 3 (which encodes a T-box transcription factor and is a target of Wnt signaling to β-catenin). The roles of these genes and their products in cancer pathogenesis warrant further investigation.
    Gastroenterology 12/2012; 144(4). DOI:10.1053/j.gastro.2012.12.020 · 16.72 Impact Factor