Detection of nonneutral substitution rates on mammalian phylogenies.

Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA.
Genome Research (Impact Factor: 13.85). 10/2009; 20(1):110-21. DOI: 10.1101/gr.097857.109
Source: PubMed

ABSTRACT Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
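As a rough illustration of a likelihood ratio test for a rate departure in either direction, the sketch below uses a deliberately simplified model that is not taken from the paper or from phyloP. It assumes substitution counts in an element are Poisson with a known neutral expectation, and it tests a single rate scale factor rho against the neutral value rho = 1. The function name `rate_lrt` and its arguments are hypothetical.

```python
import math


def rate_lrt(observed_subs, expected_neutral_subs):
    """Toy two-sided likelihood ratio test for a substitution-rate scale factor.

    Assumption (not from the paper): the substitution count is Poisson with
    mean rho * expected_neutral_subs, and the neutral model fixes rho = 1.
    """
    n = observed_subs
    lam0 = expected_neutral_subs
    if lam0 <= 0:
        raise ValueError("expected_neutral_subs must be positive")
    if n < 0:
        raise ValueError("observed_subs must be non-negative")

    # Maximum-likelihood estimate of the scale factor under the alternative.
    rho_hat = n / lam0

    # 2 * [log L(rho_hat) - log L(1)] for the Poisson model;
    # the n * log(rho_hat) term is taken as 0 when n == 0.
    log_term = n * math.log(rho_hat) if n > 0 else 0.0
    stat = max(2.0 * (log_term - (n - lam0)), 0.0)

    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))

    if rho_hat < 1.0:
        direction = "conserved"
    elif rho_hat > 1.0:
        direction = "accelerated"
    else:
        direction = "neutral"
    return rho_hat, stat, p_value, direction


if __name__ == "__main__":
    # Hypothetical counts: 5 substitutions expected under neutrality.
    for n in (0, 5, 12):
        print(n, rate_lrt(n, 5.0))
```

In this toy model, a count well below the neutral expectation is reported as conserved and a count well above it as accelerated. The tests described in the abstract operate on a full phylogenetic substitution model and can be restricted to particular clades, which this sketch does not attempt.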

  • ABSTRACT: RNA-binding proteins control many aspects of cellular biology through binding single-stranded RNA binding motifs (RBMs). However, RBMs can be buried within their local RNA structures, thus inhibiting RNA-protein interactions. N(6)-methyladenosine (m(6)A), the most abundant and dynamic internal modification in eukaryotic messenger RNA, can be selectively recognized by the YTHDF2 protein to affect the stability of cytoplasmic mRNAs, but how m(6)A achieves its wide-ranging physiological role needs further exploration. Here we show in human cells that m(6)A controls the RNA-structure-dependent accessibility of RBMs to affect RNA-protein interactions for biological regulation; we term this mechanism 'the m(6)A-switch'. We found that m(6)A alters the local structure in mRNA and long non-coding RNA (lncRNA) to facilitate binding of heterogeneous nuclear ribonucleoprotein C (HNRNPC), an abundant nuclear RNA-binding protein responsible for pre-mRNA processing. Combining photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) and anti-m(6)A immunoprecipitation (MeRIP) approaches enabled us to identify 39,060 m(6)A-switches among HNRNPC-binding sites; and global m(6)A reduction decreased HNRNPC binding at 2,798 high-confidence m(6)A-switches. We determined that these m(6)A-switch-regulated HNRNPC-binding activities affect the abundance as well as alternative splicing of target mRNAs, demonstrating the regulatory role of m(6)A-switches on gene expression and RNA maturation. Our results illustrate how RNA-binding proteins gain regulated access to their RBMs through m(6)A-dependent RNA structural remodelling, and provide a new direction for investigating RNA-modification-coded cellular biology.
    Nature 02/2015; 518(7540):560-4. DOI:10.1038/nature14234 · 42.35 Impact Factor
  • ABSTRACT: Purpose: Familial exudative vitreoretinopathy (FEVR) is a developmental disease that can cause visual impairment and retinal detachment at a young age. Four genes involved in the Wnt signaling pathway were previously linked to this disease: NDP, FZD4, LRP5, and TSPAN12. Identification of novel disease-causing alleles allows for a deeper understanding of the disease, better molecular diagnosis, and improved treatment. Methods: Sequencing libraries from 92 FEVR patients were generated using a custom capture panel to enrich for 163 known or suspected retinal disease-causing genes in humans. Samples were processed using next generation sequencing (NGS) techniques followed by data analysis to identify and classify single nucleotide variants and indels. Sanger validation and segregation testing were used to verify suspected variants. This is the largest study of a FEVR cohort utilizing NGS that we are aware of. Results: Of the cohort of 92, 45 patients were potentially solved (48.9%). Solved cases resulted from the determination of 49 unique variants, 41 of which are novel. Eighteen of the novel variants were highly likely to cause FEVR due to their nature (frameshifting indels, splicing mutations, and nonsense variants). Conclusions: We were able to determine probable disease-causing variants in a large number of FEVR patients, the majority of which were novel. Knowledge of these variants will help to further characterize and diagnose FEVR.
  • ABSTRACT: Epigenomic data from ENCODE can be used to associate specific combinations of chromatin marks with regulatory elements in the human genome. Hidden Markov models and the expectation-maximization (EM) algorithm are often used to analyze epigenomic data. However, the EM algorithm can have overfitting problems in data sets where the chromatin states show high class-imbalance and it is often slow to converge. Here we use spectral learning instead of EM and find that our software Spectacle overcame these problems. Furthermore, Spectacle is able to find enhancer subtypes not found by ChromHMM but strongly enriched in GWAS SNPs. Spectacle is available at
    Genome Biology 12/2015; 16(1). DOI:10.1186/s13059-015-0598-0 · 10.47 Impact Factor

