Statistical inference on the penetrances of rare genetic mutations based on a case-family design.

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA.
Biostatistics (Impact Factor: 2.24). 02/2010; 11(3):519-32. DOI: 10.1093/biostatistics/kxq009

ABSTRACT: We propose a formal statistical inference framework for evaluating the penetrance of a rare genetic mutation using family data generated under a kin-cohort type of design, in which phenotype and genotype information is collected from the first-degree relatives (siblings and/or offspring) of case probands who carry the targeted mutation. Our approach is built on a likelihood model that requires only a few mild assumptions, and it can be used for age-dependent penetrance estimation with adjustment for covariates. Furthermore, the derived likelihood accommodates unobserved risk factors that are correlated among family members. Simulation studies confirm the validity of the approach. We apply it to estimate the age-dependent cancer risk among carriers of MSH2 or MLH1 mutations.
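The likelihood idea in the abstract can be illustrated with a deliberately simplified sketch. It assumes a dominant rare mutation, so each first-degree relative of a heterozygous carrier proband carries the mutation with probability 1/2; a single fixed age horizon instead of age-dependent risk; no covariates; no within-family correlation; and a known cumulative risk f0 for noncarriers. Genotyped carrier relatives then contribute Bernoulli(f1) terms, and ungenotyped relatives contribute Bernoulli((f1 + f0)/2) mixture terms. The function names and the grid-search maximizer are hypothetical illustrations, not the authors' estimator.

```python
import math


def log_likelihood(f1, f0, carrier_counts, untyped_counts):
    """Binomial log-likelihood (up to a constant) for relatives of carrier probands.

    Hypothetical simplification of a kin-cohort likelihood:
      f1: cumulative risk in mutation carriers (the unknown penetrance).
      f0: cumulative risk in noncarriers, assumed known here.
      carrier_counts: (affected, total) among relatives genotyped as carriers.
      untyped_counts: (affected, total) among relatives with unknown genotype,
          each assumed to carry the mutation with probability 1/2.
    """
    a_c, n_c = carrier_counts
    a_u, n_u = untyped_counts
    # Marginal risk of an ungenotyped relative: 50/50 mixture of carrier and noncarrier risk.
    p_u = 0.5 * f1 + 0.5 * f0
    ll = a_c * math.log(f1) + (n_c - a_c) * math.log(1.0 - f1)
    ll += a_u * math.log(p_u) + (n_u - a_u) * math.log(1.0 - p_u)
    return ll


def estimate_penetrance(f0, carrier_counts, untyped_counts, steps=9999):
    """Maximize the log-likelihood over a fine grid of f1 values in (0, 1)."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda f1: log_likelihood(f1, f0, carrier_counts, untyped_counts))


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(estimate_penetrance(0.1, (30, 100), (50, 200)))
```

With these made-up counts the estimate lies between the carrier-only value 0.3 and the value 0.4 implied by the ungenotyped relatives alone (0.25 = (0.4 + 0.1)/2). Because the log-likelihood is concave in f1, a grid search is adequate; one crude way to add age dependence would be to repeat the fit within age bands.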


Available from: Hong Zhang, Apr 18, 2014
  • ABSTRACT: Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking their p-values rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) of a specific disease-associated SNP as the probability that the SNP is "T-selected," that is, that its chi-square statistic for the trend test of association is among the T largest (equivalently, its p-value is among the T smallest). The corresponding proportion positive (PP) is the fraction of selected SNPs that are truly disease-associated. We study DP and PP analytically and by simulation, under both fixed-effects and random-effects models of genetic risk, which allow for heterogeneity in genetic risk. DP increases with genetic effect size and with case-control sample size, and it decreases with the number of non-disease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. DP increases very slowly with T, and the gain in DP per unit increase in T declines rapidly as T grows. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio of 1.2 or less per minor disease allele, even a CCGWAS with 1000 cases and 1000 controls requires an impractically large T to achieve an acceptable DP, which yields PP values so low that the study becomes futile and misleading. We further calculate the initial CCGWAS sample size that minimizes the total cost of a research program that also includes follow-up studies of the T-selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if follow-up studies are costly.
    Biostatistics 05/2008; 9(2):201-15. DOI: 10.1093/biostatistics/kxm032
  • ABSTRACT: Multivariate survival data arise from case-control family studies in which the ages at disease onset of family members may be correlated. In this paper, we consider a multivariate survival model whose marginal hazard function follows the proportional hazards model. We use a frailty-based approach in the spirit of Glidden and Self (1999) to account for the correlation of ages at onset among family members. Specifically, we first estimate the baseline hazard function nonparametrically by the innovation theorem, and then obtain maximum pseudolikelihood estimators of the regression and correlation parameters by plugging in the estimated baseline hazard function. We establish a connection with a previously proposed approach based on generalized estimating equations. Simulation studies and an analysis of case-control family data on breast cancer illustrate the practical utility of the methodology.
    Biostatistics 08/2006; 7(3):387-98. DOI: 10.1093/biostatistics/kxj014
  • ABSTRACT: Estimating the marginal hazard function from correlated failure-time data arising from case-control family studies is complicated by the noncohort study design and by risk heterogeneity due to unmeasured risk factors shared among family members. In this article, we propose a two-stage estimation procedure that accounts for both factors. At the first stage, we estimate the dependence parameter of the risk-heterogeneity distribution without first, or simultaneously, estimating the marginal distribution. At the second stage, treating the dependence parameter as known, we estimate the marginal hazard function by iterating between estimating the risk heterogeneity (frailty) of each family and maximizing the partial likelihood function with an offset that accounts for the risk heterogeneity. We also propose an iterative procedure to improve the efficiency of the dependence-parameter estimate. A simulation study shows that both methods perform well in finite samples. We illustrate the method with a case-control family study of early-onset breast cancer.
    Biometrics 01/2005; 60(4):936-44. DOI: 10.1111/j.0006-341X.2004.00249.x
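The detection probability defined in the CCGWAS abstract above can be approximated by simulation under simplifying assumptions: the N - 1 null SNPs have independent 1-df chi-square trend statistics, and the single disease SNP has a noncentral 1-df chi-square statistic with noncentrality ncp (in practice ncp would depend on the odds ratio, allele frequency, and numbers of cases and controls; here it is simply an input). The disease SNP is T-selected exactly when at most T - 1 null statistics exceed it, so each simulated disease statistic contributes a binomial tail probability. The independence assumption and all function names are illustrative, not the paper's analytic derivation.

```python
import math
import random


def null_survival(x):
    """P(chi-square with 1 df > x), computed as P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


def binom_cdf(k_max, n, p):
    """P(Binomial(n, p) <= k_max), summing terms in log space for stability."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k_max >= n else 0.0
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(min(k_max, n) + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def detection_probability(ncp, n_snps, top_t, reps=2000, seed=1):
    """Monte Carlo DP for one disease SNP among n_snps SNPs.

    Hypothetical simplification: null statistics are independent chi-square(1),
    and the disease SNP's statistic is noncentral chi-square(1, ncp).
    """
    rng = random.Random(seed)
    n_null = n_snps - 1
    acc = 0.0
    for _ in range(reps):
        z = rng.gauss(math.sqrt(ncp), 1.0)
        stat = z * z  # noncentral chi-square draw with 1 df and noncentrality ncp
        # The disease SNP is T-selected iff at most top_t - 1 null statistics exceed it.
        acc += binom_cdf(top_t - 1, n_null, null_survival(stat))
    return acc / reps


if __name__ == "__main__":
    for t in (10, 100):
        print(t, detection_probability(5.0, 10000, t))
```

With ncp = 0 the result should be close to T/N, since the disease SNP is then exchangeable with the null SNPs. With a fixed seed, raising T from 10 to 100 never lowers the estimate, because the same simulated statistics are reused and the binomial tail can only grow.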
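The frailty approach in the abstract citing Glidden and Self (1999) links a marginal proportional hazards model to the joint distribution of relatives' ages at onset. One common concrete choice, assumed here rather than taken from the paper, is a gamma frailty with variance theta: integrating it out gives the Clayton form S(t1, t2) = (S1^(-theta) + S2^(-theta) - 1)^(-1/theta), where S1 and S2 are the marginal survival functions, and Kendall's tau equals theta/(theta + 2). The constant baseline hazard in marginal_survival is a hypothetical simplification; the paper estimates the baseline nonparametrically.

```python
import math


def marginal_survival(t, x, beta, baseline_rate):
    """Marginal proportional-hazards survival S(t | x) = exp(-Lambda0(t) * exp(beta * x)).

    Hypothetical simplification: constant baseline hazard, Lambda0(t) = baseline_rate * t.
    """
    return math.exp(-baseline_rate * t * math.exp(beta * x))


def joint_survival(s1, s2, theta):
    """Joint survival of two relatives implied by a shared gamma frailty with
    variance theta, expressed through their marginal survivals s1 and s2."""
    if theta == 0.0:
        return s1 * s2  # independence limit as theta -> 0
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)


def kendalls_tau(theta):
    """Kendall's tau for the Clayton model with parameter theta."""
    return theta / (theta + 2.0)


if __name__ == "__main__":
    # Illustrative values: two siblings, one exposed (x = 1) and one not (x = 0).
    s1 = marginal_survival(40.0, 1.0, 0.5, 0.01)
    s2 = marginal_survival(40.0, 0.0, 0.5, 0.01)
    print(joint_survival(s1, s2, 1.0), s1 * s2, kendalls_tau(1.0))
```

For instance, with marginal survivals 0.5 and 0.6 and theta = 1, the joint survival is 0.375, larger than the independence value 0.3, reflecting positive dependence within the pair.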
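The second stage in the early-onset breast cancer abstract, iterating between family frailty estimates and a partial likelihood with an offset, resembles the familiar EM algorithm for a shared gamma frailty Cox model. The sketch below implements that generic algorithm for a single covariate with the dependence parameter theta held fixed. It assumes cohort-style sampling and untied event times; in particular, it makes no correction for the case-control (noncohort) ascertainment that the paper addresses, so it is not the authors' procedure. All names, including the simulate helper, are hypothetical.

```python
import math
import random


def risk_sums(recs, beta, w):
    """Reverse cumulative sums of r, r*x, r*x^2 over each record's risk set,
    with r = w[family] * exp(beta * x); recs are sorted by time (no ties assumed)."""
    n = len(recs)
    s0, s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        _, _, x, fam = recs[k]
        r = w[fam] * math.exp(beta * x)
        s0[k] = s0[k + 1] + r
        s1[k] = s1[k + 1] + r * x
        s2[k] = s2[k + 1] + r * x * x
    return s0, s1, s2


def newton_step(recs, beta, w):
    """One Newton-Raphson step on the Cox partial likelihood with offset log(w)."""
    s0, s1, s2 = risk_sums(recs, beta, w)
    score = info = 0.0
    for k, (_, event, x, _) in enumerate(recs):
        if event:
            mean = s1[k] / s0[k]
            score += x - mean
            info += s2[k] / s0[k] - mean * mean
    return score / info


def breslow(recs, beta, w):
    """Breslow cumulative baseline hazard at each sorted record's time."""
    s0, _, _ = risk_sums(recs, beta, w)
    cum, out = 0.0, []
    for k, (_, event, _, _) in enumerate(recs):
        if event:
            cum += 1.0 / s0[k]
        out.append(cum)
    return out


def fit_frailty_cox(families, theta, n_iter=30):
    """EM for a shared gamma frailty (mean 1, variance theta) Cox model, theta fixed.
    families: list of families, each a list of (time, event, x)."""
    recs = sorted((t, d, x, i) for i, fam in enumerate(families) for (t, d, x) in fam)
    n_fam = len(families)
    beta, w = 0.0, [1.0] * n_fam
    for _ in range(n_iter):
        for _ in range(3):  # M-step: refit beta with log(w) as an offset
            beta += newton_step(recs, beta, w)
        cumhaz = breslow(recs, beta, w)
        hazard_sum, events = [0.0] * n_fam, [0] * n_fam
        for (_, d, x, i), lam in zip(recs, cumhaz):
            hazard_sum[i] += lam * math.exp(beta * x)
            events[i] += d
        # E-step: posterior mean of each family's gamma frailty.
        w = [(1.0 / theta + events[i]) / (1.0 / theta + hazard_sum[i]) for i in range(n_fam)]
    return beta, w


def simulate(n_fam, size, beta, theta, base_rate=0.5, max_follow=3.0, seed=7):
    """Hypothetical cohort-style data: exponential onset times, uniform censoring."""
    rng = random.Random(seed)
    fams = []
    for _ in range(n_fam):
        frailty = rng.gammavariate(1.0 / theta, theta)  # mean 1, variance theta
        fam = []
        for _ in range(size):
            x = rng.gauss(0.0, 1.0)
            t = rng.expovariate(base_rate * frailty * math.exp(beta * x))
            c = rng.uniform(0.0, max_follow)
            fam.append((min(t, c), int(t <= c), x))
        fams.append(fam)
    return fams
```

Each frailty update is the posterior mean of a gamma frailty with shape and rate 1/theta, given the family's event count d_i and cumulative hazard H_i: (1/theta + d_i)/(1/theta + H_i). The Newton steps then refit beta with log(w) entering the linear predictor as an offset.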