A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes

Wellcome Trust Sanger Institute, Hinxton, UK.
Science (Impact Factor: 33.61). 02/2012; 335(6070):823-8. DOI: 10.1126/science.1215040
Source: PubMed


Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of
protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative
LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes
typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious
LoF alleles, including 26 known and 21 predicted severe disease–causing variants, as well as common LoF variants in nonessential
genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for
using these differences to prioritize candidate genes found in clinical sequencing studies.

Available from: Daniel MacArthur
  • Source
    • "Therefore, there is a growing interest in characterizing and monitoring autozygosity in this breed to preserve genetic diversity and allow the long-term sustainability of breeding programs in Brazil. Evidence from whole-genome sequencing studies in humans indicates that highly deleterious variants are common across healthy individuals (MacArthur et al., 2012; Xue et al., 2012), and although no such systematic survey has been conducted in cattle to the present date, it is highly expected that unfavorable alleles also segregate in cattle populations. Therefore, the use of ever-smaller numbers of animals as founders is expected to inadvertently increase autozygosity of such unfavorable alleles (Szpiech et al., 2013), potentially causing economic losses. "
    ABSTRACT: The use of relatively low numbers of sires in cattle breeding programs, particularly in those for carcass and weight traits in Nellore beef cattle (Bos indicus) in Brazil, has always raised concerns about inbreeding, which affects conservation of genetic resources and sustainability of this breed. Here, we investigated the distribution of autozygosity levels based on runs of homozygosity (ROH) in a sample of 1,278 Nellore cows, genotyped for over 777,000 SNPs. We found ROH segments larger than 10 Mb in over 70% of the samples, representing signatures most likely related to the recent massive use of few sires. However, the average genome coverage by ROH (>1 Mb) was lower than previously reported for other cattle breeds (4.58%). In spite of 99.98% of the SNPs being included within a ROH in at least one individual, only 19.37% of the markers were encompassed by common ROH, suggesting that the ongoing selection for weight, carcass and reproductive traits in this population is too recent to have produced selection signatures in the form of ROH. Three short-range highly prevalent ROH autosomal hotspots (occurring in over 50% of the samples) were observed, indicating candidate regions most likely under selection since before the foundation of Brazilian Nellore cattle. The putative signatures of selection on chromosomes 4, 7, and 12 may be involved in resistance to infectious diseases and fertility, and should be the subject of future investigation.
    Full-text · Article · Jan 2015 · Frontiers in Genetics
  • Source
    • "It has been suggested that each genome contains 1.5 × 10^5 new single nucleotide variants (SNVs) that are not present in the dbSNP database (Pelak et al., 2010). These variants are present in different genome regions, like the exome, and may be related to genes involved in human diseases (MacArthur et al., 2012). "
    ABSTRACT: It is assumed that DNA sequences are conserved in the diverse cell types present in a multicellular organism like the human being. Thus, in order to compare the sequences in the genome of DNA from different individuals, nucleic acid is commonly isolated from a single tissue. In this regard, blood cells are widely used for this purpose because of their availability. Thus blood DNA has been used to study familial genetic diseases that affect other tissues and organs, such as the liver, heart, and brain. While this approach is valid for the identification of familial diseases in which mutations are present in parental germinal cells and, therefore, in all the cells of a given organism, it is not suitable to identify sporadic diseases in which mutations might occur in specific somatic cells. This review addresses somatic DNA variations in different tissues or cells (mainly in the brain) of single individuals and discusses whether the dogma of DNA invariance between cell types is indeed correct. We will also discuss how single nucleotide somatic variations arise, focusing on the presence of specific DNA mutations in the brain.
    Full-text · Article · Nov 2014 · Frontiers in Aging Neuroscience
  • Source
    • "In the DGRP population, we studied one type of LoF, PTCs, with respect to their DAFs, transcriptional profiles, chromosomal distributions and roles in the loss of new genes. We confirmed the previous finding that PTC-containing genes tend to have narrow expression breadth (Lee and Reinhardt 2012; MacArthur et al. 2012). Furthermore, we also discovered two novel patterns. "
    ABSTRACT: Widespread premature termination codon mutations (PTCs) were recently observed in human and fly populations. We took advantage of the population resequencing data in the Drosophila Genetic Reference Panel (DGRP) to investigate how the expression profile and the evolutionary age of genes shaped the allele frequency distribution of PTCs. After generating a high-quality dataset of PTCs, we clustered genes harboring PTCs into three categories: genes encoding low-frequency PTCs (≤ 1.5%), moderate-frequency PTCs (1.5%-10%) and high-frequency PTCs (> 10%). All three groups show narrow transcription compared to PTC-free genes, with the moderate- and high-PTC frequency groups showing a pronounced pattern. Moreover, nearly half (42%) of the PTC-encoding genes are not expressed in any tissue. Interestingly, the moderate-frequency PTC group is strongly enriched for genes expressed in midgut, whereas genes harboring high-frequency PTCs tend to have sex-specific expression. We further find that although young genes born in the last 60 million years (Myr) compose a mere 9% of the genome, they represent 16%, 30% and 50% of the genes containing low-, moderate- and high-frequency PTCs, respectively. Among DNA-based and RNA-based duplicated genes, the child copy is approximately twice as likely to contain PTCs as the parent copy, whereas young de novo genes are as likely to encode PTCs as DNA-based duplicated new genes. Based on these results, we conclude that expression profile and gene age jointly shaped the landscape of PTC-mediated gene loss. Therefore, we propose that new genes may need a long time to become stably maintained after the origination.
    Full-text · Article · Nov 2014 · Molecular Biology and Evolution