Article

Retrospective analysis of haplotype-based case-control studies under a flexible model for gene-environment association.

Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, People's Republic of China.
Biostatistics (Impact Factor: 2.24). 02/2008; 9(1):81-99. DOI: 10.1093/biostatistics/kxm011
Source: PubMed

ABSTRACT Genetic epidemiologic studies often involve investigation of the association of a disease with a genomic region in terms of the underlying haplotypes, that is, the combination of alleles at multiple loci along homologous chromosomes. In this article, we consider the problem of estimating haplotype-environment interactions from case-control studies when some of the environmental exposures themselves may be influenced by genetic susceptibility. We specify the distribution of the diplotypes (haplotype pairs) given environmental exposures for the underlying population based on a novel semiparametric model that allows haplotypes to be potentially associated with environmental exposures, while allowing the marginal distribution of the diplotypes to maintain certain population genetics constraints such as Hardy-Weinberg equilibrium. The marginal distribution of the environmental exposures is allowed to remain completely nonparametric. We develop a semiparametric estimating equation methodology and related asymptotic theory for estimation of the disease odds ratios associated with the haplotypes, environmental exposures, and their interactions, as well as the parameters that characterize haplotype-environment associations and the marginal haplotype frequencies. The problem of phase ambiguity of genotype data is handled using a suitable expectation-maximization algorithm. We study the finite-sample performance of the proposed methodology using simulated data. An application of the methodology is illustrated using a case-control study of colorectal adenoma, designed to investigate how the smoking-related risk of colorectal adenoma can be modified by "NAT2," a smoking-metabolism gene that may potentially influence susceptibility to smoking itself.
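
As a rough illustration of the Hardy-Weinberg constraint mentioned in the abstract, the sketch below computes marginal diplotype probabilities from haplotype frequencies. It is not the article's semiparametric model: it covers only the textbook Hardy-Weinberg rule, and the haplotype labels, the frequencies, and the function name `hwe_diplotype_probs` are hypothetical values chosen for the example.

```python
from itertools import combinations_with_replacement


def hwe_diplotype_probs(hap_freqs):
    """Probability of each unordered haplotype pair under Hardy-Weinberg equilibrium.

    hap_freqs maps a haplotype label to its population frequency (summing to 1).
    """
    probs = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        p1, p2 = hap_freqs[h1], hap_freqs[h2]
        # Homozygous pair: p^2. Heterozygous pair: 2 * p1 * p2, because the
        # two orderings of the pair are indistinguishable.
        probs[(h1, h2)] = p1 * p1 if h1 == h2 else 2.0 * p1 * p2
    return probs


# Hypothetical haplotype frequencies, for illustration only.
hap_freqs = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
diplotype_probs = hwe_diplotype_probs(hap_freqs)

for pair, prob in diplotype_probs.items():
    print(pair, round(prob, 4))
```

Under this rule the diplotype probabilities sum to one whenever the haplotype frequencies do, which is the marginal constraint the abstract says the proposed model is able to preserve.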

Available from: Yi-Hau Chen, Jul 29, 2014
    ABSTRACT: Two important contributors to missing heritability are believed to be rare variants and gene-environment interaction (GXE). Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region that is implicated to be associated through single nucleotide variant (SNV) analysis. Further, rHTV can tag associated rare SNV and provide greater power to detect them than popular collapsing methods. Recently we proposed Logistic Bayesian LASSO (LBL) for detecting rHTV association with case-control data. LBL shrinks the unassociated (especially common) haplotypes toward zero so that an associated rHTV can be identified with greater power. Here, we incorporate environmental factors and their interactions with haplotypes in LBL. As LBL is based on retrospective likelihood, this extension is not trivial. We model the joint distribution of haplotypes and covariates given the case-control status. We apply the approach (LBL-GXE) to the Michigan, Mayo, AREDS, Pennsylvania Cohort Study on Age-related Macular Degeneration (AMD). LBL-GXE detects interaction of a specific rHTV in the CFH gene with smoking. To the best of our knowledge, this is the first time in the AMD literature that an interaction of smoking with a specific (rather than pooled) rHTV has been implicated. We also carry out simulations and find that LBL-GXE has reasonably good power for detecting interactions with rHTV while keeping the type I error rates well controlled. Thus, we conclude that LBL-GXE is a useful tool for uncovering missing heritability.
    Genetic Epidemiology 01/2014; 38(1). DOI:10.1002/gepi.21773 · 2.95 Impact Factor
    ABSTRACT: In the past decade, many statistical methods have been proposed for the analysis of case–control genetic data with an emphasis on haplotype-based disease association studies. Most of the methodology has concentrated on the estimation of genetic (haplotype) main effects. Most methods accounted for environmental and gene-environment interaction effects by utilizing prospective-type analyses that may lead to biased estimates when used with case–control data. Several recent publications addressed the issue of retrospective sampling in the analysis of case–control genetic data in the presence of environmental factors by developing new efficient semiparametric statistical methods. I present the new Stata command, haplologit, that implements efficient profile-likelihood semiparametric methods for fitting gene–environment models in the very important special cases of a) a rare disease, b) a single candidate gene in Hardy-Weinberg equilibrium, and c) independence of genetic and environmental factors.