Cordell, H.J. Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures. Genet. Epidemiol. 30, 259-275 (2006).

Department of Medical Genetics, University of Cambridge, Cambridge Institute for Medical Research, Addenbrooke's Hospital, Cambridge, UK.
Genetic Epidemiology (Impact Factor: 2.6). 04/2006; 30(3):259-75. DOI: 10.1002/gepi.20142
Source: PubMed


A popular approach for testing and estimating genotype and haplotype effects associated with a disease outcome is to conduct a population-based case/control study, in which haplotypes are not directly observed but may be inferred probabilistically from unphased genotype data. A variety of methods exist to analyse the resulting data while accounting for the uncertainty in haplotype assignment, but most focus on the issue of testing the global null hypothesis that no genotype or haplotype effects exist. A more interesting question, once a region of disease association has been identified, is to estimate the relevant genotypic or haplotypic effects and to perform tests of complex null hypotheses such as the hypothesis that some loci, but not others, are associated with disease. Here I examine the assumptions behind, and the performance of, two classes of methods for addressing this question. The first is a weighted regression approach in which posterior probabilities of haplotype assignments are used as weights in a logistic regression analysis, generating a test based on either a weighted pseudo-likelihood, or a weighted log-likelihood. The second is a multiple imputation approach using either an improper procedure in which the posterior probabilities are used to generate replicate imputed data sets, or a proper data augmentation procedure. I compare these approaches to a simple expectation substitution (haplotype trend regression) approach. In simulations, all methods gave unbiased parameter estimation but the weighted pseudo-likelihood, expectation substitution and multiple imputation methods had superior confidence interval coverage. For the weighted pseudo-likelihood and expectation substitution methods it was necessary to estimate posterior haplotype assignment probabilities using the combined case/control data, whereas for the multiple imputation approaches it was necessary to estimate these probabilities in the case and control groups separately. 
Overall, multiple imputation was the easiest approach to implement in standard statistical software and to extend to more complex models such as those that include gene-gene or gene-environment interactions.
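
As a rough illustration of the approaches compared above, the following Python sketch shows how posterior haplotype probabilities could feed each one: an expected haplotype count (expectation substitution), weighted pseudo-observations (weighted regression), and a random draw of one haplotype pair per individual (improper imputation). The function names (`expected_dosage`, `weighted_rows`, `impute_once`, `rubin_pool`) and the toy input layout are hypothetical and are not taken from the paper. The pooling step uses Rubin's standard combining rules, which the abstract does not spell out, and the logistic regression fit itself is assumed to be done in standard statistical software.

```python
import random
from statistics import mean, variance

# Hypothetical input layout (an assumption, not the paper's data format):
# for one individual, a list of (haplotype_pair, posterior_probability),
# with the probabilities summing to 1 over the compatible pairs.


def expected_dosage(configs, hap):
    """Expectation substitution: expected number of copies of `hap`."""
    return sum(p * pair.count(hap) for pair, p in configs)


def weighted_rows(configs, outcome):
    """Weighted regression: one pseudo-observation per compatible pair,
    carrying the posterior probability as its regression weight."""
    return [(pair, outcome, p) for pair, p in configs]


def impute_once(configs, rng):
    """Improper imputation: draw one pair from the posterior probabilities."""
    pairs = [pair for pair, _ in configs]
    probs = [p for _, p in configs]
    return rng.choices(pairs, weights=probs, k=1)[0]


def rubin_pool(estimates, variances):
    """Pool per-data-set estimates and their variances (Rubin's rules).

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation sample variance of the estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    return pooled, within + (1 + 1 / m) * between
```

In use, `impute_once` would be called for every individual to build each replicate data set, the same logistic model would be fitted to each replicate, and the resulting coefficients and squared standard errors would be passed to `rubin_pool`.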

    • "We tested for association of T1D with the predicted copy numbers from the qPCR and SNP datasets using logistic regression. We allowed for uncertainty in the copy number call when estimating individual odds ratios by using the ten multiply imputed datasets generated from the qPCR posterior probabilities [15], and averaging the estimates over those with the R mitools package [13]. We allowed for statistical interaction with HLA-Bw4 by repeating the association test in the subsets of carriers of the target ligand HLA-Bw4 epitopes, HLA-Bw4 for KIR3DL1 and the putative ligand HLA-Bw4-80I for KIR3DS1. "
    ABSTRACT: Killer Immunoglobulin-like Receptors (KIRs) are surface receptors of natural killer cells that bind to their corresponding Human Leukocyte Antigen (HLA) class I ligands, making them interesting candidate genes for HLA-associated autoimmune diseases, including type 1 diabetes (T1D). However, allelic and copy number variation in the KIR region effectively mask it from standard genome-wide association studies: single nucleotide polymorphism (SNP) probes targeting the region are often discarded by standard genotype callers since they exhibit variable cluster numbers. Quantitative Polymerase Chain Reaction (qPCR) assays address this issue. However, their cost is prohibitive at the sample sizes required for detecting effects typically observed in complex genetic diseases. We propose a more powerful and cost-effective alternative, which combines signals from SNPs with more than three clusters found in existing datasets, with qPCR on a subset of samples. First, we showed that noise and batch effects in multiplexed qPCR assays are addressed through normalisation and simultaneous copy number calling of multiple genes. Then, we used supervised classification to impute copy numbers of specific KIR genes from SNP signals. We applied this method to assess copy number variation in two KIR genes, KIR3DL1 and KIR3DS1, which are suitable candidates for T1D susceptibility since they encode the only KIR molecules known to bind with HLA-Bw4 epitopes. We find no association between KIR3DL1/3DS1 copy number and T1D in 6744 cases and 5362 controls; a sample size twenty-fold larger than in any previous KIR association study. Due to our sample size, we can exclude odds ratios larger than 1.1 for the common KIR3DL1/3DS1 copy number groups at the 5% significance level. We found no evidence of association of KIR3DL1/3DS1 copy number with T1D, either overall or dependent on HLA-Bw4 epitope. Five other KIR genes, KIR2DS4, KIR2DL3, KIR2DL5, KIR2DS5 and KIR2DS1, in high linkage disequilibrium with KIR3DL1 and KIR3DS1, are also unlikely to be significantly associated. Our approach could potentially be applied to other KIR genes to allow cost effective assaying of gene copy number in large samples.
    BMC Genomics 04/2014; 15(1):274. DOI:10.1186/1471-2164-15-274 · 3.99 Impact Factor
    • "A theoretical advantage of studying haplotypes is that the genetic variants on a particular haplotype may confer a unique phenotype when they occur together (Silverman, 2007). The question of whether to combine or separate case and control populations for haplotype inferences has been considered by (Cordell, 2006), who demonstrated this to be specific to the analytical method chosen. For a logistic regression analysis weighted according to haplotype probabilities, it is advised to be inferred separately for case and controls populations (Mensah et al., 2007). "
    ABSTRACT: Maedi-Visna (MV) and ovine pulmonary adenocarcinoma (OPA) are two retroviral diseases occurring worldwide that affect adult sheep. Differences in incidence, which may be related to sheep-rearing and housing choices, as well as to genetics, and disease progression have been reported for both diseases. In this work four microsatellites located in immune-relevant regions, the major histocompatibility complex (MHC) region, interferon-γ and interleukin-12p35, were genotyped to determine their association with disease progression. The analysed sample included Latxa sheep with and without OPA and MV-characteristic lesions in their lungs. The microsatellites in the MHC were the most diverse, while the ones located in the cytokines were the least polymorphic. In the case of IFN-γ the results suggested the presence of null alleles. Significant results were detected for several microsatellite alleles in the association analysis carried out by logistic regression. All statistical analyses included a flock effect adjustment to avoid false positives due to genetic structuration. MHC Class I microsatellite alleles OMHC1*205 and OMHC1*193 were associated with disease progression for Maedi and OPA, respectively. Moreover, MHC Class II microsatellite allele DRB2*275 was associated with presence of lesions in Maedi. Furthermore, the MHC microsatellites were combined for a bioinformatic haplotype inference with the PHASE software. In total, 73 haplotypes were detected, 18 of them in more than 6 animals. After standard and weighted logistic regression analysis, two of them were significantly associated with susceptibility: OMHC1*205-DRB2*271 for Maedi and OMHC1*193-DRB2*271 for OPA, both with the Class I microsatellite alleles associated in the marker by marker study. Although more extensive analyses are needed to disentangle the relationship between host genetics and disease, as far as we know this is the first study demonstrating a significant association between sheep MHC Class I microsatellite alleles and susceptibility to Maedi-Visna and OPA viral diseases.
    Veterinary Immunology and Immunopathology 01/2012; 145(1-2):438-46. DOI:10.1016/j.vetimm.2011.12.020 · 1.54 Impact Factor
    • "In the context of sibship analysis, two studies have used expected haplotype counts, estimated under the null hypothesis, in a form of weighted analysis (Jonasdottir et al., 2008; Stone et al., 2010). This type of approach has a history of application in genetic epidemiology (Cordell, 2006), although in principle its results will be biased towards the null hypothesis, and neither study considered the effect of population stratification on their models. A general approach to missing data in nuclear families has been proposed (Dudbridge, 2008), which considers all possible completions of missing data with reasonable robustness to confounding by population stratification."
    ABSTRACT: A common design in family-based association studies consists of siblings without parents. Several methods have been proposed for analysis of sibship data, but they mostly do not allow for missing data, such as haplotype phase or untyped markers. On the other hand, general methods for nuclear families with missing data are computationally intensive when applied to sibships, since every family has missing parents that could have many possible genotypes. We propose a computationally efficient model for sibships by conditioning on the sets of alleles transmitted into the sibship by each parent. This means that the likelihood can be written only in terms of transmitted alleles and we do not have to sum over all possible untransmitted alleles when they cannot be deduced from the siblings. The model naturally accommodates missing data and admits standard theory of estimation, testing, and inclusion of covariates. Our model is quite robust to population stratification and can test for association in the presence of linkage. We show that our model has similar power to FBAT for single marker analysis and improved power for haplotype analysis. Compared to summing over all possible untransmitted alleles, we achieve similar power with considerable reductions in computation time.
    Annals of Human Genetics 05/2011; 75(3):428-38. DOI:10.1111/j.1469-1809.2010.00636.x · 2.21 Impact Factor