Detection of non-neutral substitution rates on mammalian phylogenies

Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA.
Genome Research (Impact Factor: 14.63). 10/2009; 20(1):110-21. DOI: 10.1101/gr.097857.109
Source: PubMed


Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.

    • "This hallmark has been successfully applied to alternative splicing to identify putative regulatory elements in alternative exons and flanking intronic sequences (Sugnet et al. 2006; Wang et al. 2008). We compared the PhyloP conservation scores (placental mammals using a 46-way alignment) for different classes of alternative events associated with isoform-specific polyribosome association (Siepel et al. 2005; Pollard et al. 2010). As expected, cassette exons appear to be less conserved than adjacent constitutive exons, suggesting recent evolution (Gelfman et al. 2012). "
    ABSTRACT: Gene expression profiling is widely used as a measure of the protein output of cells. However, it is becoming more evident that there are multiple layers of post-transcriptional gene regulation that greatly impact protein output (Battle et al., Science 347:664-667, 2014; Khan et al., Science 342:1100-1104, 2013; Vogel et al., Mol Syst Biol 6:400, 2010). Alternative splicing (AS) impacts the expression of protein coding genes in several ways. Firstly, AS exponentially increases the coding capacity of genes, generating multiple transcripts from the same genomic sequence. Secondly, alternatively spliced mRNAs are differentially subjected to RNA degradation via pathways such as nonsense mediated decay (AS-NMD) or microRNAs (Shyu et al., EMBO J 27:471-481, 2008). And thirdly, cytoplasmic export from the nucleus and translation are regulated in an isoform-specific manner, adding an extra layer of regulation that impacts the protein output of the cell (Martin and Ephrussi, Cell 136:719-730, 2009; Sterne-Weiler et al., Genome Res 23:1615-1623, 2013). These data highlight the need for a method that allows analyzing both the nuclear events (AS) and the cytoplasmic fate (polyribosome-binding) of individual mRNA isoforms. In order to determine how alternative splicing determines the polyribosome association of mRNA isoforms we developed Frac-seq. Frac-seq combines subcellular fractionation and high throughput RNA sequencing (RNA-seq). Frac-seq gives a window onto the translational fate of specific alternatively spliced isoforms on a genome-wide scale. There is evidence of preferential translation of specific mRNA isoforms (Coldwell and Morley, Mol Cell Biol 26:8448-8460, 2006; Sanford et al., Genes Dev 18:755-768; Zhong et al., Mol Cell 35:1-10, 2009; Michlewski et al., Mol Cell 30:179-189, 2008); the advantage of Frac-seq is that it allows analyzing the binding of alternatively spliced isoforms to polyribosomes and comparing their relative abundance to the cytosolic fraction. Polyribosomes are resolved by sucrose gradient centrifugation of cytoplasmic extracts, subsequent reading and extraction. The total mRNA fraction is taken prior to ultracentrifugation as a measure of all mRNAs present in the sample. Both populations of RNAs are then isolated using phenol-chloroform precipitation; polyadenylated RNAs are selected and converted into libraries and sequenced. Bioinformatics analysis is then performed to measure alternatively spliced isoforms; several tools can be used such as MISO, RSEM, or Cufflinks (Katz et al., Nat Methods 7:1009-1015, 2010; Li and Dewey, BMC Bioinformatics 12:323, 2011; Trapnell et al., Nat Protoc 7:562-578, 2012). Comparison of total mRNAs and polyribosome-bound mRNAs can be used as a measure of the polyribosome association of specific isoforms based on the presence/absence of specific alternative splicing events in each fraction. Frac-seq shows that not all isoforms from a gene are equally loaded into polyribosomes, that preferential loading of an mRNA does not always correlate with its expression in the cytoplasm, and that the presence of specific events such as microRNA binding sites or premature termination codons determines the loading of specific isoforms into polyribosomes.
    Chapter · Jan 2016
    • "We used phyloP combined with the phastCons program in PHAST (Pollard et al. 2010; Hubisz et al. 2011) to scan whole-genome alignments in vertebrates for sequences that are present in therian and non-therian vertebrates but changed significantly in the therian mammal ancestor and remained highly conserved during therian diversification. We identified 177,346 vertebrate genomic regions that are conserved among therians (therian conserved regions; TCRs), of which 4797 have a strong signature for accelerated evolution in the therian ancestor (false discovery rate <1%, table S1, Supplementary Material online). "
    ABSTRACT: Mammals have evolved remarkably different sensory, reproductive, metabolic, and skeletal systems. To explore the genetic basis for these differences, we developed a comparative genomics approach to scan whole-genome multiple sequence alignments to identify regions that evolved rapidly in an ancestral lineage but are conserved within extant species. This pattern suggests that ancestral changes in function were maintained in descendants. After applying this test to therian mammals, we identified 4797 accelerated regions, many of which are non-coding and located near developmental transcription factors. We then used mouse transgenic reporter assays to test if non-coding accelerated regions are enhancers and to determine how therian-specific substitutions affect their activity in vivo. We discovered enhancers with expression specific to the therian version in brain regions involved in the hormonal control of milk ejection, uterine contractions, blood pressure, temperature, and visual processing. This work underscores the idea that changes in developmental gene expression are important for mammalian evolution, and it pinpoints candidate genes for unique aspects of mammalian biology.
    Article · Dec 2015 · Molecular Biology and Evolution
    • "Both the proband and the unaffected mother harboured the heterozygous sequence change NCBI36/hg18:chr18:2,631,610 T > C. This nucleotide is conserved in primates (human, chimpanzee, gorilla, orang-utan, gibbon, rhesus monkey, crab-eating macaque, baboon, green monkey and squirrel monkey); the evolutionary conservation score, phyloP [20], calculated across 44 vertebrates, was 0.557, indicating a conserved nucleotide. The second novel sequence change, NCBI36/hg18: 2,631,886 G > A, was found in the proband from family 2. Molecular testing based on EcoRI/BlnI analysis [10] identified two intact D4Z4 repeats in the proband and >11 in the mother. "
    ABSTRACT: Facioscapulohumeral dystrophy (FSHD) is commonly associated with contraction of the D4Z4 macro-satellite repeat on chromosome 4q35 (FSHD1) or mutations in the SMCHD1 gene (FSHD2). Recent studies have shown that the clinical manifestation of FSHD1 can be modified by mutations in the SMCHD1 gene within a given family. The absence of either D4Z4 contraction or SMCHD1 mutations in a small cohort of patients suggests that the disease could also be due to disruption of gene regulation. In this study, we postulated that mutations responsible for exerting a modifier effect on FSHD might reside within remotely acting regulatory elements that have the potential to interact at a distance with their cognate gene promoter via chromatin looping. To explore this postulate, genome-wide Hi-C data were used to identify genomic fragments displaying the strongest interaction with the SMCHD1 gene. These fragments were then narrowed down to shorter regions using ENCODE and FANTOM data on transcription factor binding sites and epigenetic marks characteristic of promoters, enhancers and silencers. We identified two regions, located respectively ~14 and ~85 kb upstream of the SMCHD1 gene, which were then sequenced in 229 FSHD/FSHD-like patients (200 with D4Z4 repeat units <11). Three heterozygous sequence variants were found ~14 kb upstream of the SMCHD1 gene. One of these variants was found to be of potential functional significance based on DNA methylation analysis. Further functional ascertainment will be required in order to establish the clinical/functional significance of the variants found. In this study, we propose an improved approach to predict the possible locations of remotely acting regulatory elements that might influence the transcriptional regulation of their associated gene(s). It represents a new way to screen for disease-relevant mutations beyond the immediate vicinity of the specific disease gene. It promises to be useful for investigating disorders in which mutations could occur in remotely acting regulatory elements.
    Article · Dec 2015 · Human Genomics