Statistical inference on the penetrances of rare genetic mutations based on a case-family design

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA.
Biostatistics (Impact Factor: 2.65). 02/2010; 11(3):519-32. DOI: 10.1093/biostatistics/kxq009
Source: PubMed


We propose a formal statistical inference framework for evaluating the penetrance of a rare genetic mutation using family data generated under a kin-cohort type of design, in which phenotype and genotype information is collected from first-degree relatives (siblings and/or offspring) of case probands carrying the targeted mutation. Our approach is built upon a likelihood model that requires only minor assumptions, and it can be used for age-dependent penetrance estimation that permits adjustment for covariates. Furthermore, the derived likelihood allows for unobserved risk factors that are correlated among family members. The validity of the approach is confirmed by simulation studies. We apply the proposed approach to estimating the age-dependent cancer risk among carriers of the MSH2 or MLH1 mutation.
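
To make the notion of age-dependent penetrance with covariate adjustment concrete, the following is a minimal sketch only. It assumes a Weibull baseline hazard and a proportional hazards covariate effect, neither of which is stated in the abstract; the function name `penetrance` and the parameter values `shape`, `scale`, `beta` and `x` are hypothetical and are not estimates from the paper.

```python
import math


def penetrance(age, shape, scale, beta=0.0, x=0.0):
    """Cumulative risk of disease by `age` for a mutation carrier.

    ASSUMPTION (not from the paper): Weibull baseline cumulative hazard
    (age / scale) ** shape, multiplied by exp(beta * x) for a single
    covariate x (proportional hazards).
    """
    if age <= 0:
        return 0.0
    cumulative_hazard = (age / scale) ** shape * math.exp(beta * x)
    # Penetrance is one minus the survival function.
    return 1.0 - math.exp(-cumulative_hazard)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for age in (40, 50, 60, 70):
        baseline = penetrance(age, shape=3.0, scale=80.0)
        adjusted = penetrance(age, shape=3.0, scale=80.0, beta=0.5, x=1.0)
        print(age, round(baseline, 3), round(adjusted, 3))
```

Under these assumptions the curve rises monotonically with age, and a positive `beta` shifts it upward for individuals with `x = 1`. The likelihood described in the abstract additionally accounts for ascertainment through case probands and for within-family correlation, which this sketch does not attempt.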


Available from: Hong Zhang, Apr 18, 2014
  • ABSTRACT: Identifying loci that modify the risk of cancer for mutation carriers is an important topic in oncogenetics. Within this research area, we are concerned with the analysis of the association between a genetic variant (single-nucleotide polymorphism rs13281615) and breast cancer among women with a pathogenic mutation in the BRCA2 gene. As this mutation is rare, data were collected retrospectively according to a case-study design through genetic screening programmes. This involves a selection bias and an intrafamilial correlation, which complicate the statistical analysis. We derive a Cramér–von Mises-type statistic to test the equality of genotype-specific survival functions when the proportional hazards model does not hold. A Clayton copula is specified to model the residual familial dependence of the phenotype, and an innovative semiparametric bootstrap procedure is proposed to approximate the distribution of the test statistic under the null hypothesis. The proposed test is applied to data from European and North American mutation carriers, and its performance is evaluated by simulations.
    Article · Nov 2014 · Journal of the Royal Statistical Society: Series C (Applied Statistics)
  • ABSTRACT: With the discovery of an increasing number of causal genes for complex human disorders, it is crucial to assess the genetic risk of disease onset for individuals who are carriers of these causal mutations and to compare the distribution of the age-at-onset for such individuals with the distribution for noncarriers. In many genetic epidemiological studies that aim to estimate the effect of a causal gene on disease, the age-at-onset of disease is subject to censoring. In addition, the mutation carrier or noncarrier status of some individuals may be unknown, due to the high cost of in-person ascertainment by collecting DNA samples or because of the death of older individuals. Instead, the probability of such individuals’ mutation status can be obtained from various other sources. When mutation status is missing, the available data take the form of censored mixture data. Recently, various methods have been proposed for risk estimation using such data, but none is efficient for estimating a nonparametric distribution. We propose a fully efficient sieve maximum likelihood estimation method, in which we estimate the logarithm of the hazard ratio between genetic mutation groups using B-splines, while applying nonparametric maximum likelihood estimation to the reference baseline hazard function. Our estimator can be calculated via an expectation-maximization algorithm, which is much faster than existing methods. We show that our estimator is consistent and semiparametrically efficient and establish its asymptotic distribution. Simulation studies demonstrate the superior performance of the proposed method, which is used to estimate the distribution of the age-at-onset of Parkinson's disease for carriers of mutations in the leucine-rich repeat kinase 2 (LRRK2) gene.
    Article · Sep 2015 · Biometrika