A high-resolution map of human evolutionary constraint using 29 mammals.

Broad Institute of Harvard and Massachusetts Institute of Technology, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.
Nature (Impact Factor: 42.35). 10/2011; 478(7370):476-82. DOI: 10.1038/nature10530
Source: PubMed

ABSTRACT The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

Available from: Wesley C Warren, Jun 30, 2015
  • Source
    ABSTRACT: The pteropodid fruit bat genus Eidolon is comprised of two extant species: E. dupreanum on Madagascar and E. helvum on the African mainland and offshore islands. Recent population genetic studies of E. helvum indicate widespread panmixia across the continent, although island populations off western Africa show genetic structure. Little is known about the genetic connectivity of E. dupreanum or the divergence time between these two sister species. We examine sequence data for one mitochondrial (cyt-b) and three nuclear regions (β-fib, RAG1, and RAG2) to assess population genetic structure within E. dupreanum and divergence between the two Eidolon spp. In addition, we characterize the demographic history of both taxa using coalescent-based methods. We find little evidence for population structure within E. dupreanum, and suggest that this reflects dispersal based on seasonal fruit availability and a preference for roosting sites in exposed rock outcrops. However, despite apparent panmixia in both Eidolon spp. and large dispersal distances reported in previous studies for E. helvum, these two taxa diverged in the mid-to-late Miocene. Both species are also characterized by population expansion and young, Pleistocene clade ages, although slower population growth in E. dupreanum is likely explained by its divergence via colonization from the mainland. Finally, we discuss the implications of population connectivity in E. dupreanum in the context of its potential role as a reservoir host for pathogens capable of infecting humans.
    Acta Chiropterologica 12/2014; 16(2). DOI:10.3161/150811014X687242 · 0.83 Impact Factor
  • Source
    ABSTRACT: Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole genome alignment (WGA). Using the same model as the successful Assemblathon competitions we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three datasets were used; two were simulated and based on primate and mammalian phylogenies and one was comprised of 20 real fly genomes. In total 35 submissions were assessed, submitted by ten teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable difference in the alignment quality of differently annotated regions and found few tools aligned the duplications analysed. We found many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all datasets, submissions and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
    Genome Research 10/2014; 24(12). DOI:10.1101/gr.174920.114 · 13.85 Impact Factor
  • Source
    ABSTRACT: Genome-wide association studies have revealed numerous risk loci associated with diverse diseases. However, identification of disease-causing variants within association loci remains a major challenge. Divergence in gene expression due to cis-regulatory variants in noncoding regions is central to disease susceptibility. We show that integrative computational analysis of phylogenetic conservation with a complexity assessment of co-occurring transcription factor binding sites (TFBS) can identify cis-regulatory variants and elucidate their mechanistic role in disease. Analysis of established type 2 diabetes risk loci revealed a striking clustering of distinct homeobox TFBS. We identified the PRRX1 homeobox factor as a repressor of PPARG2 expression in adipose cells and demonstrate its adverse effect on lipid metabolism and systemic insulin sensitivity, dependent on the rs4684847 risk allele that triggers PRRX1 binding. Thus, cross-species conservation analysis at the level of co-occurring TFBS provides a valuable contribution to the translation of genetic association signals to disease-related molecular mechanisms.
    Cell 01/2014; 156(1-2):343-58. DOI:10.1016/j.cell.2013.10.058 · 33.12 Impact Factor