Detection of nonneutral substitution rates on mammalian phylogenies.

Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA.
Genome Research (Impact Factor: 13.85). 10/2009; 20(1):110-21. DOI: 10.1101/gr.097857.109
Source: PubMed

ABSTRACT Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
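The idea behind a likelihood ratio test for departures from the neutral rate in either direction can be illustrated with a deliberately simplified sketch. This is not phyloP's implementation: it assumes the number of substitutions in an element is observed directly and follows a Poisson distribution with mean equal to a scale factor `rho` times the neutral expectation, whereas the tests in the abstract work with phylogenetic likelihoods on an alignment. The function names, the Poisson model, and the sign convention (positive for conservation, negative for acceleration) are assumptions made for illustration only.

```python
import math


def rate_lrt(n_subs, expected_neutral):
    """Two-sided likelihood ratio test of H0: rho = 1 under a Poisson model.

    n_subs           -- observed substitution count (hypothetical input)
    expected_neutral -- expected count under neutral drift (hypothetical input)
    Returns (rho_hat, lrt_statistic, p_value).
    """
    if expected_neutral <= 0:
        raise ValueError("expected_neutral must be positive")
    if n_subs < 0:
        raise ValueError("n_subs must be non-negative")

    # Maximum-likelihood estimate of the rate scale factor.
    rho_hat = n_subs / expected_neutral

    # 2 * (logL(rho_hat) - logL(1)); the n * log(n / t) term vanishes at n = 0.
    if n_subs == 0:
        stat = 2.0 * expected_neutral
    else:
        stat = 2.0 * (n_subs * math.log(rho_hat) - (n_subs - expected_neutral))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return rho_hat, stat, p_value


def signed_score(n_subs, expected_neutral):
    """-log10(p), positive if rho_hat < 1 (slower), negative otherwise (faster)."""
    rho_hat, _, p_value = rate_lrt(n_subs, expected_neutral)
    score = -math.log10(max(p_value, 1e-300))  # avoid log10(0) on underflow
    return score if rho_hat < 1.0 else -score


if __name__ == "__main__":
    # No substitutions where 3 are expected: evidence of conservation.
    print(rate_lrt(0, 3.0), signed_score(0, 3.0))
    # Twelve substitutions where 3 are expected: evidence of acceleration.
    print(rate_lrt(12, 3.0), signed_score(12, 3.0))
```

A single signed score of this kind lets one test report both conservation and acceleration, which is the two-directional behavior the abstract describes; the chi-square approximation is asymptotic and may be poor for very small expected counts.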

  • ABSTRACT: The human β-globin, δ-globin and ɛ-globin genes contain almost identical coding strand sequences centered about codon 6 having potential to form a stem-loop with a 5'GAGG loop. Provided with a sufficiently stable stem, such a structure can self-catalyze depurination of the loop 5'G residue, leading to a potential mutation hotspot. Previously, we showed that such a hotspot exists about codon 6 of β-globin, with by far the highest incidence of mutations across the gene, including those responsible for 6 anemias (notably Sickle Cell Anemia) and β-thalassemias. In contrast, we show here that despite identical loop sequences, there is no mutational hotspot in the δ- or ɛ1-globin potential self-depurination sites, which differ by only one or two base pairs in the stem region from that of the β-globin gene. These differences result in either one or two additional mismatches in the potential 7-base pair-forming stem region, thereby weakening its stability, so that either DNA cruciform extrusion from the duplex is rendered ineffective or the lifetime of the stem-loop becomes too short to permit self-catalysis to occur. Having that same loop sequence, paralogs HB-γ1 and HB-γ2 totally lack stem-forming potential. Hence the absence in δ- and ɛ1-globin genes of a mutational hotspot in what must now be viewed as non-functional homologs of the self-depurination site in β-globin. Such stem-destabilizing variants appeared early among vertebrates and remained conserved among mammals and primates. Thus, this study has revealed conserved sequence determinants of self-catalytic DNA depurination associated with variability of mutation incidence among human β-globin paralogs.
    05/2015; 778. DOI:10.1016/j.mrfmmm.2015.05.001
  • ABSTRACT: Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at
    Scientific Reports 05/2015; 5. DOI:10.1038/srep10576 · 5.08 Impact Factor
  • ABSTRACT: The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture.
    Cell Reports 05/2015; DOI:10.1016/j.celrep.2015.04.023 · 7.21 Impact Factor

