Rare-Variant Extensions of the Transmission Disequilibrium Test: Application to Autism Exome Sequence Data

The American Journal of Human Genetics (Impact Factor: 10.99). 12/2013; 94(1). DOI: 10.1016/j.ajhg.2013.11.021
Source: PubMed

ABSTRACT Many population-based rare-variant (RV) association tests, which aggregate variants across a region, have been developed to analyze sequence data. A drawback of analyzing population-based data is that it is difficult to adequately control for population substructure and admixture, and spurious associations can occur. For RVs, this problem can be substantial, because the spectrum of rare variation can differ greatly between populations. A solution is to analyze parent-child trio data, by using the transmission disequilibrium test (TDT), which is robust to population substructure and admixture. We extended the TDT to test for RV associations using four commonly used methods. We demonstrate that for all RV-TDT methods, using proper analysis strategies, type I error is well-controlled even when there are high levels of population substructure or admixture. For trio data, unlike for population-based data, RV allele-counting association methods will lead to inflated type I errors. However, type I errors can be properly controlled by obtaining p values empirically through haplotype permutation. The power of the RV-TDT methods was evaluated and compared to the analysis of case-control data with a number of genetic and disease models. The RV-TDT was also used to analyze exome data from 199 Simons Simplex Collection autism trios and an association was observed with variants in ABCA7. Given the problem of adequately controlling for population substructure and admixture in RV association studies and the growing number of sequence-based trio studies, the RV-TDT is extremely beneficial to elucidate the involvement of RVs in the etiology of complex traits.
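
For orientation, the classic single-variant TDT that the RV-TDT methods extend can be sketched as a McNemar-type statistic over heterozygous parents. This is a minimal illustration only, not the RV-TDT implementation from the paper: the function name `tdt` and its two count arguments are hypothetical, and the p value here is asymptotic, whereas the abstract notes that RV allele-counting methods need empirical p values obtained by haplotype permutation.

```python
from math import erfc, sqrt


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic single-variant TDT (illustrative sketch).

    transmitted:   heterozygous parents who passed the tested allele
                   to the affected child
    untransmitted: heterozygous parents who passed the other allele

    Returns the chi-square statistic (1 degree of freedom) and its
    asymptotic p value.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")

    # McNemar-type statistic: (b - c)^2 / (b + c)
    chi2 = (transmitted - untransmitted) ** 2 / n

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    stat, p = tdt(60, 40)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

Only transmissions from heterozygous parents enter the statistic. Because each comparison is made within a family, the test is not confounded by population substructure, which is the property the abstract relies on.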

  • ABSTRACT: Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
    PLoS ONE 12/2014; 9(12):e115971. DOI:10.1371/journal.pone.0115971 · 3.53 Impact Factor
  • ABSTRACT: The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease.
  • ABSTRACT: Compound heterozygous mutations are mutations that occur on different copies of genes and may completely “knock-out” gene function. Compound heterozygous mutations have been implicated in a large number of diseases, but there are few statistical methods for analyzing their role in disease, especially when such mutations are rare. A major barrier is that phase information is required to determine whether both gene copies are affected and phasing rare variants is difficult. Here, we propose a method to test compound heterozygous and recessive disease models in case–parent trios. We propose a simple algorithm for phasing and show via simulations that tests based on phased trios have almost the same power as tests using true phase information. A further complication in the study of compound heterozygous mutations is that only families where both parents carry mutations are informative. Thus, the informative sample size will be quite small even when the overall sample size is not, making asymptotic approximations of the null distribution of the test statistic inappropriate. To address this, we develop an exact test that will give appropriate P-values regardless of sample size. Using simulation, we show that our method is robust to population stratification and significantly outperforms other methods when the causal model is recessive.
    Genetic Epidemiology 01/2015; DOI:10.1002/gepi.21885 · 2.95 Impact Factor