Deleterious- and Disease-Allele Prevalence in Healthy Individuals: Insights from Current Predictions, Mutation Databases, and Population-Scale Resequencing

The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
The American Journal of Human Genetics (Impact Factor: 10.93). 12/2012; 91(6):1022-1032. DOI: 10.1016/j.ajhg.2012.10.015


We have assessed the numbers of potentially deleterious variants in the genomes of apparently healthy humans by using (1) low-coverage whole-genome sequence data from 179 individuals in the 1000 Genomes Pilot Project and (2) current predictions and databases of deleterious variants. Each individual carried 281-515 missense substitutions, 40-85 of which were homozygous, predicted to be highly damaging. They also carried 40-110 variants classified by the Human Gene Mutation Database (HGMD) as disease-causing mutations (DMs), 3-24 variants in the homozygous state, and many polymorphisms putatively associated with disease. Whereas many of these DMs are likely to represent disease-allele-annotation errors, between 0 and 8 DMs (0-1 homozygous) per individual are predicted to be highly damaging, and some of them provide information of medical relevance. These analyses emphasize the need for improved annotation of disease alleles both in mutation databases and in the primary literature; some HGMD mutation data have been recategorized on the basis of the present findings, an iterative process that is both necessary and ongoing. Our estimates of deleterious-allele numbers are likely to be subject to both overcounting and undercounting. However, our current best mean estimates of ∼400 damaging variants and ∼2 bona fide disease mutations per individual are likely to increase rather than decrease as sequencing studies ascertain rare variants more effectively and as additional disease alleles are discovered.



Available from: David N Cooper
    • "Finally, the introduction of next-generation sequencing in clinical laboratories is causing an explosion in the number of DNA variants identified in and around genes [Yang et al., 2013]. Unfortunately, interpreting the clinical implications of variants in or near splice sites is challenging as functional annotation of DNA variants in publically available databases is inadequate [Xue et al., 2012]. "
    ABSTRACT: Assessment of the functional consequences of variants near splice sites is a major challenge in the diagnostic laboratory. To address this issue, we created Expression Minigenes (EMGs) to determine the RNA and protein products generated by splice site variants (n = 10) implicated in cystic fibrosis (CF). Experimental results were compared with the splicing predictions of eight in silico tools. EMGs containing the full-length Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) coding sequence and flanking intron sequences generated wild-type transcript and fully processed protein in Human Embryonic Kidney (HEK293) and CF bronchial epithelial (CFBE41o-) cells. Quantification of variant induced aberrant mRNA isoforms was concordant using fragment analysis and pyrosequencing. The splicing patterns of c.1585-1G>A and c.2657+5G>A were comparable to those reported in primary cells from individuals bearing these variants. Bioinformatics predictions were consistent with experimental results for 9/10 variants (MES), 8/10 variants (NNSplice) and 7/10 variants (SSAT and Sroogle). Programs that estimate the consequences of mis-splicing predicted 11/16 (HSF and ASSEDA) and 10/16 (Fsplice and SplicePort) experimentally observed mRNA isoforms. EMGs provide a robust experimental approach for clinical interpretation of splice site variants and refinement of in silico tools.
    Human Mutation 10/2014; 35(10). DOI:10.1002/humu.22624 · 5.14 Impact Factor
    • "The analysis of the reconstructed mitochondrial genomes highlighted a widespread distribution of polymorphisms in healthy samples. Noteworthy, a parallelism may be observed between the enrichment in damaging and probably damaging rare variants within nuclear low frequency alleles of the 1000 Genomes Low Coverage [50] and Exon Pilot Projects [51], and the numerous group of mitochondrial pathogenic predicted alleles [20-22] and mutations with a confirmed disease association [17,18], detected in our dataset. "
    ABSTRACT: Background: Whole Exome Sequencing (WES) is one of the most used and cost-effective next generation technologies that allows sequencing of all nuclear exons. Off-target regions may be captured if they present high sequence similarity with baits. Bioinformatics tools have been optimized to retrieve a large amount of WES off-target mitochondrial DNA (mtDNA), by exploiting the aspecificity of probes, partially overlapping to Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project represents one of the widest resources to extract mtDNA sequences from WES data, considering the large effort the scientific community is undertaking to reconstruct human population history using mtDNA as marker, and the involvement of mtDNA in pathology. Results: A previously published pipeline aimed at assembling mitochondrial genomes from off-target WES reads and further improved to detect insertions and deletions (indels) and heteroplasmy in a dataset of 1242 samples from the 1000 Genomes project, enabled to obtain a nearly complete mitochondrial genome from 943 samples (76% analyzed exomes). The robustness of our computational strategy was highlighted by the reduction of reads amount recognized as mitochondrial in the original annotation produced by the Consortium, due to NumtS filtering. Conclusions: To the best of our knowledge, this is likely the most extended population-scale mitochondrial genotyping in humans enriched with the estimation of heteroplasmies.
    BMC Genomics 05/2014; 15(Suppl 3). DOI:10.1186/1471-2164-15-S3-S2 · 3.99 Impact Factor
    • "Evolution requires de novo germline mutations that are newly generated in germ lineage cells and inheritable to the offspring. It is evident that germline mutations occur, because sporadic and deleterious mutations that cannot be transmitted to offspring continuously appear in human populations [1-4]. The human de novo germline mutation rate is estimated to be 1.20 × 10^-8/nucleotide/generation [1]. "
    ABSTRACT: Spontaneous germline mutations generate genetic diversity in populations of sexually reproductive organisms, and are thus regarded as a driving force of evolution. However, the cause and mechanism remain unclear. 8-oxoguanine (8-oxoG) is a candidate molecule that causes germline mutations, because it makes DNA more prone to mutation and is constantly generated by reactive oxygen species in vivo. We show here that endogenous 8-oxoG caused de novo spontaneous and heritable G to T mutations in mice, which occurred at different stages in the germ cell lineage and were distributed throughout the chromosomes. Using exome analyses covering 40.9 Mb of mouse transcribed regions, we found increased frequencies of G to T mutations at a rate of 2 × 10^-7 mutations/base/generation in offspring of Mth1/Ogg1/Mutyh triple knockout (TOY-KO) mice, which accumulate 8-oxoG in the nuclear DNA of gonadal cells. The roles of MTH1, OGG1, and MUTYH are specific for the prevention of 8-oxoG-induced mutation, and 99% of the mutations observed in TOY-KO mice were G to T transversions caused by 8-oxoG; therefore, we concluded that 8-oxoG is a causative molecule for spontaneous and inheritable mutations of the germ lineage cells.
    Scientific Reports 04/2014; 4:4689. DOI:10.1038/srep04689 · 5.58 Impact Factor