Zeggini E. Next-generation association studies for complex traits. Nat Genet 43:287-288

Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Nature Genetics (Impact Factor: 29.35). 03/2011; 43(4):287-8. DOI: 10.1038/ng0411-287
Source: PubMed


A new study successfully applies complementary whole-genome sequencing and imputation approaches to establish robust disease associations in an isolated population. This strategy is poised to help elucidate the role of variants at the low end of the allele frequency spectrum in the genetic architecture of complex traits.

  • Source
    • "The study of rare variation can be empowered by focusing on population isolates (31,32). Isolated populations are characterized by increased phenotypic, genetic and environmental homogeneity. "
    ABSTRACT: The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common-frequency and rare variants. Targeted genotyping arrays and next generation sequencing technologies at the whole genome and whole exome scales are increasingly employed to access sequence variation across the full minor allele frequency spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically-relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.
    Human Molecular Genetics 08/2013; 22(R1). DOI:10.1093/hmg/ddt376 · 6.39 Impact Factor
  • Source
    • "The recently released 1000 Genomes haplotypes [3] are a particularly large and dense reference panel that will be commonly used as an imputation reference panel, particularly in GWAS consortia. At the same time, theoretical studies and empirical studies using other primary reference panels, have shown that imputation accuracy in a study population can be increased by use of an additional reference panel such as whole genome or exome sequence data drawn from a subset of the population under study [2] [4] [5] [6] [7] [8] [9]. "
    ABSTRACT: The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%.
    PLoS ONE 07/2013; 8(7):e68604. DOI:10.1371/journal.pone.0068604 · 3.23 Impact Factor
  • Source
    • "Our gene prioritization approach differs from current statistical methods for gene burden testing (reviewed in [19,20]) in that we do not require a control population to prioritize genes, although a matched control population, if available, is useful for eliminating genes that are artifactually significant as a result of sequencing and variant calling errors. In addition, we show that the causal genes can be identified without including allele frequency information, although we expect that using allele frequency information to filter the list of variants detected in exome sequencing should make it easier to detect disease genes. "
    ABSTRACT: Background: Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman-Sheldon syndrome. Conclusions: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. VEST is available as a stand-alone software package at and is hosted by the CRAVAT web server at
    BMC Genomics 05/2013; 14(3). DOI:10.1186/1471-2164-14-S3-S3 · 3.99 Impact Factor