Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures.

Department of Medical Genetics, University of Cambridge, Cambridge Institute for Medical Research, Addenbrooke's Hospital, Cambridge, UK.
Genetic Epidemiology (Impact Factor: 2.95). 05/2006; 30(3):259-75. DOI: 10.1002/gepi.20142
Source: PubMed

ABSTRACT A popular approach for testing and estimating genotype and haplotype effects associated with a disease outcome is to conduct a population-based case/control study, in which haplotypes are not directly observed but may be inferred probabilistically from unphased genotype data. A variety of methods exist to analyse the resulting data while accounting for the uncertainty in haplotype assignment, but most focus on the issue of testing the global null hypothesis that no genotype or haplotype effects exist. A more interesting question, once a region of disease association has been identified, is to estimate the relevant genotypic or haplotypic effects and to perform tests of complex null hypotheses such as the hypothesis that some loci, but not others, are associated with disease. Here I examine the assumptions behind, and the performance of, two classes of methods for addressing this question. The first is a weighted regression approach in which posterior probabilities of haplotype assignments are used as weights in a logistic regression analysis, generating a test based on either a weighted pseudo-likelihood, or a weighted log-likelihood. The second is a multiple imputation approach using either an improper procedure in which the posterior probabilities are used to generate replicate imputed data sets, or a proper data augmentation procedure. I compare these approaches to a simple expectation substitution (haplotype trend regression) approach. In simulations, all methods gave unbiased parameter estimation but the weighted pseudo-likelihood, expectation substitution and multiple imputation methods had superior confidence interval coverage. For the weighted pseudo-likelihood and expectation substitution methods it was necessary to estimate posterior haplotype assignment probabilities using the combined case/control data, whereas for the multiple imputation approaches it was necessary to estimate these probabilities in the case and control groups separately. 
Overall, multiple imputation was the easiest approach to implement in standard statistical software and to extend to more complex models such as those that include gene-gene or gene-environment interactions.
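
The abstract does not say how the estimates from the replicate imputed data sets are combined. A minimal sketch, assuming the standard Rubin's rules for multiple imputation, is given below. The function name pool_rubin and the example numbers are hypothetical and are not taken from the paper; each input estimate would be, for instance, a log odds ratio for one haplotype from a logistic regression fitted to one imputed data set.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules (assumed here).

    estimates: one point estimate per imputed data set.
    variances: the squared standard error of each of those estimates.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # average of the per-imputation estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


if __name__ == "__main__":
    # Hypothetical log odds ratios from three imputed data sets.
    est, se = pool_rubin([0.4, 0.5, 0.6], [0.01, 0.01, 0.01])
    print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term is what carries the uncertainty in haplotype assignment into the final standard error; treating a single imputed data set as if it were observed would drop that term.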

  • Source
    ABSTRACT: Killer Immunoglobulin-like Receptors (KIRs) are surface receptors of natural killer cells that bind to their corresponding Human Leukocyte Antigen (HLA) class I ligands, making them interesting candidate genes for HLA-associated autoimmune diseases, including type 1 diabetes (T1D). However, allelic and copy number variation in the KIR region effectively masks it from standard genome-wide association studies: single nucleotide polymorphism (SNP) probes targeting the region are often discarded by standard genotype callers since they exhibit variable cluster numbers. Quantitative Polymerase Chain Reaction (qPCR) assays address this issue. However, their cost is prohibitive at the sample sizes required for detecting effects typically observed in complex genetic diseases. We propose a more powerful and cost-effective alternative, which combines signals from SNPs with more than three clusters found in existing datasets, with qPCR on a subset of samples. First, we showed that noise and batch effects in multiplexed qPCR assays are addressed through normalisation and simultaneous copy number calling of multiple genes. Then, we used supervised classification to impute copy numbers of specific KIR genes from SNP signals. We applied this method to assess copy number variation in two KIR genes, KIR3DL1 and KIR3DS1, which are suitable candidates for T1D susceptibility since they encode the only KIR molecules known to bind with HLA-Bw4 epitopes. We find no association between KIR3DL1/3DS1 copy number and T1D in 6744 cases and 5362 controls, a sample size twenty-fold larger than in any previous KIR association study. Due to our sample size, we can exclude odds ratios larger than 1.1 for the common KIR3DL1/3DS1 copy number groups at the 5% significance level. We found no evidence of association of KIR3DL1/3DS1 copy number with T1D, either overall or dependent on HLA-Bw4 epitope. Five other KIR genes, KIR2DS4, KIR2DL3, KIR2DL5, KIR2DS5 and KIR2DS1, in high linkage disequilibrium with KIR3DL1 and KIR3DS1, are also unlikely to be significantly associated. Our approach could potentially be applied to other KIR genes to allow cost-effective assaying of gene copy number in large samples.
    BMC Genomics 04/2014; 15(1):274. · 4.04 Impact Factor
  • Source
    ABSTRACT: Genome-wide association studies allow detection of non-genotyped disease-causing variants through testing of nearby genotyped SNPs. This approach may fail when there are no genotyped SNPs in strong LD with the causal variant. Several genotyped SNPs in weak LD with the causal variant may, however, considered together, provide equivalent information. This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. Here we present a new method and accompanying software designed for this scenario. Our approach proceeds by selecting, for each genotyped "anchor" SNP, a nearby genotyped "partner" SNP, chosen via a specific algorithm we have developed. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. In simulations, our method captures much of the signal captured by imputation, while taking a fraction of the time and disc space, and generating a smaller number of false-positives. We apply our method to a case/control study of severe malaria genotyped using the Affymetrix 500K array. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels. Our method also increases the signal of association from P≈2×10⁻⁶ to P≈6×10⁻¹¹. Our method thus, in some cases, eliminates the need for more complex methods such as sequencing and imputation, and provides a useful additional test that may be used to identify genetic regions of interest.
    Genetic Epidemiology 02/2014; · 2.95 Impact Factor
  • Source
    ABSTRACT: Meng, J. F. & Fingerlin, T. E. 2008: Linear models for analysis of multiple single nucleotide polymorphisms with quantitative traits in unrelated individuals. Ann. Zool. Fennici 45: 429–440. Population-based genetic association studies are increasingly used to explore the association between genetic polymorphisms and outcomes such as disease-status and disease-related quantitative traits. Because multiple polymorphisms are typically available, there are several statistical analysis strategies that might be appropriate depending on the goal of the study. In this paper, we compare several linear model parameterizations that might be used to perform a test of association between a genomic region defined by multiple SNPs and a quantitative trait. We compare via simulation the type I error and power of the omnibus F-test to detect association. As expected, there is no one most powerful test across the genetic models we considered, although tests based on simple parameterizations that do not rely on phase information can be as powerful as more complicated haplotype-based tests even when it is a haplotype that is truly associated with the trait.
    Annales Zoologici Fennici 10/2008; 45(5). · 1.03 Impact Factor