Detection of non-neutral substitution rates on mammalian phylogenies

Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA.
Genome Research (Impact Factor: 14.63). 10/2009; 20(1):110-21. DOI: 10.1101/gr.097857.109
Source: PubMed


Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
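The likelihood ratio idea behind the first of these tests can be illustrated with a deliberately simplified sketch. It is not phyloP's implementation: it assumes only two aligned sequences, a Jukes-Cantor substitution model, and a known neutral branch length `t0`, whereas phyloP works on a full phylogeny with a fitted neutral model. The function names, the scale parameter `rho`, and the one-degree-of-freedom chi-square approximation for the p-value are all assumptions made for illustration. The signed score follows the convention that positive values indicate conservation and negative values indicate acceleration.

```python
import math

def jc69_diff_prob(rho, t0):
    """Probability that a site differs between two sequences under
    Jukes-Cantor when the neutral branch length t0 is scaled by rho."""
    return 0.75 * (1.0 - math.exp(-4.0 * rho * t0 / 3.0))

def log_lik(rho, k, n, t0):
    """Log-likelihood of observing k differing sites out of n."""
    p = jc69_diff_prob(rho, t0)
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def mle_scale(k, n, t0):
    """Maximum-likelihood estimate of the rate scale rho."""
    p_hat = k / n
    if p_hat >= 0.75:          # saturated: the likelihood keeps rising with rho
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0) / t0

def lrt(k, n, t0):
    """Likelihood ratio test of rho = 1 (neutral) against free rho.
    Returns (rho_hat, test statistic, approximate two-sided p-value)."""
    rho_hat = mle_scale(k, n, t0)
    stat = 2.0 * (log_lik(rho_hat, k, n, t0) - log_lik(1.0, k, n, t0))
    stat = max(stat, 0.0)      # guard against tiny negative rounding error
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return rho_hat, stat, p_value

def signed_score(rho_hat, p_value):
    """-log10(p), positive for slower-than-neutral (conservation),
    negative for faster-than-neutral (acceleration)."""
    magnitude = -math.log10(p_value) if p_value > 0.0 else math.inf
    return magnitude if rho_hat < 1.0 else -magnitude

if __name__ == "__main__":
    # Hypothetical counts: 100 aligned sites, neutral branch length 0.5.
    for k in (0, 36, 60):
        rho_hat, stat, p = lrt(k, 100, 0.5)
        print(k, round(rho_hat, 3), round(stat, 2), signed_score(rho_hat, p))
```

In this toy setting, far fewer differences than the neutral model predicts give `rho_hat` below 1 and a positive score, while an excess of differences gives `rho_hat` above 1 and a negative score.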

    • "Whenever needed, we manually investigated the raw sequence reads using the Integrative Genomics Viewer (IGV) [13] to exclude false positives calls. The impact of the selected candidate pathogenic missense mutations on the protein function was assessed using three in silico "pathogenicity" predictor tools such as PolyPhen-2 [14], MutationTaster [15] and SIFT [16], accounting also for nucleotide evolutionary conservation across species evaluated by the PhyloP algorithm [17]. Furthermore, in order to verify modifications at protein level the protein models have been created (see Supplementary)."
    ABSTRACT: The technological improvements over the last years made considerable progresses in the knowledge of the etiology of Intellectual Disability (ID). However at present very little is known about the genetic heterogeneity underlying the Non-Syndromic form of ID (NS-ID). To investigate the genetic basis of NS-ID we analyzed 43 trios and 22 isolated NS-ID patients using a Targeted Sequencing (TS) approach. 71 NS-ID genes have been selected and sequenced in all subjects. We found putative pathogenic mutations in 7 out of 65 patients. The pathogenic role of mutations was evaluated through sequence comparison and structural analysis was performed to predict the effect of alterations in a 3D computational model through molecular dynamics simulations. Additionally, a deep patient clinical re-evaluation has been performed after the molecular results. This approach allowed us to find novel pathogenic mutations with a detection rate close to 11% in our cohort of patients. This result supports the hypothesis that many NS-ID related genes still remain to be discovered and that NS-ID is a more complex phenotype compared to syndromic form, likely caused by a complex and broad interaction between genes alterations and environment factors.
    Mutation Research - Fundamental and Molecular Mechanisms of Mutagenesis 09/2015; 781. DOI:10.1016/j.mrfmmm.2015.09.002
    • "Specifically, we used a maximum likelihood test (Pollard et al. 2010) to first identify 113,577 DHSs that exhibit significant evolutionary constraint across primates, which manifest as regions of low sequence divergence compared with carefully defined putatively neutral flanking sequence (FDR = 0.01) (Fig. 1). Next, for DHSs that are conserved in primates, we performed a second likelihood ratio test (Pollard et al. 2010) and identified 524 regulatory sequences that have experienced a significant acceleration of evolution in the human lineage and therefore exhibit an excess of human-specific substitutions (FDR = 0.05) (Fig. 1; Supplemental Table 2). "
    ABSTRACT: It has long been hypothesized that changes in gene regulation have played an important role in human evolution, but regulatory DNA has been much more difficult to study compared with protein-coding regions. Recent large-scale studies have created genome-scale catalogs of DNase I hypersensitive sites (DHSs), which demark potentially functional regulatory DNA. To better define regulatory DNA that has been subject to human-specific adaptive evolution, we performed comprehensive evolutionary and population genetics analyses on over 18 million DHSs discovered in 130 cell types. We identified 524 DHSs that are conserved in nonhuman primates but accelerated in the human lineage (haDHS), and estimate that 70% of substitutions in haDHSs are attributable to positive selection. Through extensive computational and experimental analyses, we demonstrate that haDHSs are often active in brain or neuronal cell types; play an important role in regulating the expression of developmentally important genes, including many transcription factors such as SOX6, POU3F2, and HOX genes; and identify striking examples of adaptive regulatory evolution that may have contributed to human-specific phenotypes. More generally, our results reveal new insights into conserved and adaptive regulatory DNA in humans and refine the set of genomic substrates that distinguish humans from their closest living primate relatives.
    • ") and PhyloP, which compares the probability of observed substitutions under the hypothesis of neutral evolutionary rate: positive scores suggest constraint (conservation) (Pollard et al., 2010). Effects of amino acid changes were analysed using SIFT (probability of being pathogenic: 0 = highest; 1 = lowest) (Adzhubei et al., 2010) and PolyPhen-2 (probability of being pathogenic: 0 = lowest; 1 = highest) (Sim et al., 2012). "
    ABSTRACT: Cerebral palsy is a sporadic disorder with multiple likely aetiologies, but frequently considered to be caused by birth asphyxia. Genetic investigations are rarely performed in patients with cerebral palsy and there is little proven evidence of genetic causes. As part of a large project investigating children with ataxia, we identified four patients in our cohort with a diagnosis of ataxic cerebral palsy. They were investigated using either targeted next generation sequencing or trio-based exome sequencing and were found to have mutations in three different genes, KCNC3, ITPR1 and SPTBN2. All the mutations were de novo and associated with increased paternal age. The mutations were shown to be pathogenic using a combination of bioinformatics analysis and in vitro model systems. This work is the first to report that the ataxic subtype of cerebral palsy can be caused by de novo dominant point mutations, which explains the sporadic nature of these cases. We conclude that at least some subtypes of cerebral palsy may be caused by de novo genetic mutations and patients with a clinical diagnosis of cerebral palsy should be genetically investigated before causation is ascribed to perinatal asphyxia or other aetiologies.
    Brain 05/2015; 138(7). DOI:10.1093/brain/awv117 · 9.20 Impact Factor