Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures

Department of Medical Genetics, University of Cambridge, Cambridge Institute for Medical Research, Addenbrooke's Hospital, Cambridge, UK.
Genetic Epidemiology (Impact Factor: 2.95). 04/2006; 30(3):259-75. DOI: 10.1002/gepi.20142
Source: PubMed

ABSTRACT A popular approach for testing and estimating genotype and haplotype effects associated with a disease outcome is to conduct a population-based case/control study, in which haplotypes are not directly observed but may be inferred probabilistically from unphased genotype data. A variety of methods exist to analyse the resulting data while accounting for the uncertainty in haplotype assignment, but most focus on the issue of testing the global null hypothesis that no genotype or haplotype effects exist. A more interesting question, once a region of disease association has been identified, is to estimate the relevant genotypic or haplotypic effects and to perform tests of complex null hypotheses such as the hypothesis that some loci, but not others, are associated with disease. Here I examine the assumptions behind, and the performance of, two classes of methods for addressing this question. The first is a weighted regression approach in which posterior probabilities of haplotype assignments are used as weights in a logistic regression analysis, generating a test based on either a weighted pseudo-likelihood, or a weighted log-likelihood. The second is a multiple imputation approach using either an improper procedure in which the posterior probabilities are used to generate replicate imputed data sets, or a proper data augmentation procedure. I compare these approaches to a simple expectation substitution (haplotype trend regression) approach. In simulations, all methods gave unbiased parameter estimation but the weighted pseudo-likelihood, expectation substitution and multiple imputation methods had superior confidence interval coverage. For the weighted pseudo-likelihood and expectation substitution methods it was necessary to estimate posterior haplotype assignment probabilities using the combined case/control data, whereas for the multiple imputation approaches it was necessary to estimate these probabilities in the case and control groups separately. 
Overall, multiple imputation was the easiest approach to implement in standard statistical software and to extend to more complex models such as those that include gene-gene or gene-environment interactions.
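
The data layouts behind the three families of methods named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the haplotype labels, the posterior probabilities, and the helper names (`expected_dosage`, `weighted_rows`, `impute_once`, `rubin_combine`) are all hypothetical, no logistic regression is actually fitted, and the numbers passed to `rubin_combine` are placeholders standing in for per-imputation estimates and variances. The pooling step uses the standard Rubin's rules for multiple imputation, which the abstract does not spell out.

```python
import random

# Illustrative posterior probabilities over phased haplotype pairs.
# All names and numbers here are hypothetical, not taken from the paper.
posteriors = [
    {("A-B", "a-b"): 0.9, ("A-b", "a-B"): 0.1},  # ambiguous double heterozygote
    {("A-B", "A-B"): 1.0},                        # phase known with certainty
    {("A-B", "a-b"): 0.7, ("A-b", "a-B"): 0.3},  # ambiguous double heterozygote
]
outcomes = [1, 1, 0]  # 1 = case, 0 = control


def expected_dosage(posterior, haplotype):
    """Expectation substitution: posterior mean number of copies (0 to 2)."""
    return sum(p * pair.count(haplotype) for pair, p in posterior.items())


def weighted_rows(posterior, outcome, haplotype):
    """Weighted regression layout: one row per compatible pair, weight = posterior."""
    return [(outcome, pair.count(haplotype), p) for pair, p in posterior.items()]


def impute_once(posterior, rng):
    """Improper imputation: draw one haplotype pair from the posterior."""
    pairs = list(posterior)
    weights = [posterior[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    return pooled, within + (1 + 1 / m) * between


# Expectation substitution: one covariate value per individual.
dosages = [expected_dosage(post, "A-B") for post in posteriors]

# Weighted regression: (outcome, haplotype count, weight) rows for a weighted fit.
rows = [
    row
    for post, y in zip(posteriors, outcomes)
    for row in weighted_rows(post, y, "A-B")
]

# Multiple imputation: five replicate data sets drawn from the posteriors.
rng = random.Random(0)
imputed = [[impute_once(post, rng) for post in posteriors] for _ in range(5)]

# Placeholder per-imputation estimates and variances, pooled with Rubin's rules.
pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In a real analysis each imputed data set, or the weighted rows, would be passed to a logistic regression routine. The abstract's recommendation on whether posteriors are estimated from the combined sample or separately in cases and controls would decide how `posteriors` is built.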

  • Source
    ABSTRACT: Maedi-Visna (MV) and ovine pulmonary adenocarcinoma (OPA) are two retroviral diseases occurring worldwide that affect adult sheep. Differences in incidence, which may be related to sheep-rearing and housing choices, as well as to genetics, and disease progression have been reported for both diseases. In this work four microsatellites located in immune-relevant regions, the major histocompatibility complex (MHC) region, interferon-γ and interleukin-12p35, were genotyped to determine their association with disease progression. The analysed sample included Latxa sheep with and without OPA and MV-characteristic lesions in their lungs. The microsatellites in the MHC were the most diverse, while those located in the cytokine genes were the least polymorphic. In the case of IFN-γ the results suggested the presence of null alleles. Significant results were detected for several microsatellite alleles in the association analysis carried out by logistic regression. All statistical analyses included a flock effect adjustment to avoid false positives due to genetic structure. MHC Class I microsatellite alleles OMHC1*205 and OMHC1*193 were associated with disease progression for Maedi and OPA, respectively. Moreover, MHC Class II microsatellite allele DRB2*275 was associated with presence of lesions in Maedi. Furthermore, the MHC microsatellites were combined for a bioinformatic haplotype inference with the PHASE software. In total, 73 haplotypes were detected, 18 of them in more than 6 animals. After standard and weighted logistic regression analysis, two of them were significantly associated with susceptibility: OMHC1*205-DRB2*271 for Maedi and OMHC1*193-DRB2*271 for OPA, both containing the Class I microsatellite alleles that were associated in the marker-by-marker study.
Although more extensive analyses are needed to disentangle the relationship between host genetics and disease, as far as we know this is the first study demonstrating a significant association between sheep MHC Class I microsatellite alleles and susceptibility to Maedi-Visna and OPA viral diseases.
    Veterinary Immunology and Immunopathology 01/2012; 145(1-2):438-46. DOI:10.1016/j.vetimm.2011.12.020 · 1.75 Impact Factor
  • Source
    ABSTRACT: A common design in family-based association studies consists of siblings without parents. Several methods have been proposed for analysis of sibship data, but they mostly do not allow for missing data, such as haplotype phase or untyped markers. On the other hand, general methods for nuclear families with missing data are computationally intensive when applied to sibships, since every family has missing parents that could have many possible genotypes. We propose a computationally efficient model for sibships by conditioning on the sets of alleles transmitted into the sibship by each parent. This means that the likelihood can be written only in terms of transmitted alleles and we do not have to sum over all possible untransmitted alleles when they cannot be deduced from the siblings. The model naturally accommodates missing data and admits standard theory of estimation, testing, and inclusion of covariates. Our model is quite robust to population stratification and can test for association in the presence of linkage. We show that our model has similar power to FBAT for single marker analysis and improved power for haplotype analysis. Compared to summing over all possible untransmitted alleles, we achieve similar power with considerable reductions in computation time.
    Annals of Human Genetics 05/2011; 75(3):428-38. DOI:10.1111/j.1469-1809.2010.00636.x · 1.93 Impact Factor
  • Source
    ABSTRACT: Penalized likelihood methods have become increasingly popular in recent years for evaluating haplotype-phenotype association in case-control studies. Although a retrospective likelihood is dictated by the sampling scheme, these penalized methods are typically built on prospective likelihoods due to their modeling simplicity and computational feasibility. It has been well documented that for unpenalized methods, prospective analyses of case-control data can be valid but less efficient than their retrospective counterparts when testing for association, and result in substantial bias when estimating the haplotype effects. For penalized methods, which combine effect estimation and testing in one step, the impact of using a prospective likelihood is not clear. In this work, we examine the consequences of ignoring the sampling scheme for haplotype-based penalized likelihood methods. Our results suggest that the impact of prospective analyses depends on (1) the underlying genetic mode and (2) the genetic model adopted in the analysis. When the correct genetic model is used, the difference between the two analyses is negligible for additive and slight for dominant haplotype effects. For recessive haplotype effects, the more appropriate retrospective likelihood clearly outperforms the prospective likelihood. If an additive model is incorrectly used, as the true underlying genetic mode is unknown a priori, both retrospective and prospective penalized methods suffer from a sizeable power loss and increase in bias. The impact of using the incorrect genetic model is much bigger on retrospective analyses than prospective analyses, and results in comparable performances for both methods. An application of these methods to the Genetic Analysis Workshop 15 rheumatoid arthritis data is provided.
    Genetic Epidemiology 12/2010; 34(8):892-911. DOI:10.1002/gepi.20545 · 2.95 Impact Factor