Exonic Transcription Factor Binding Directs Codon Choice and Affects Protein Evolution

Science. 12/2013; 342(6164):1367-1372. DOI: 10.1126/science.1243490
Source: PubMed


Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition
sequences. We used genomic deoxyribonuclease I footprinting to map nucleotide-resolution TF occupancy across the human exome
in 81 diverse cell types. We found that ~15% of human codons are dual-use codons (“duons”) that simultaneously specify both
amino acids and TF recognition sites. Duons are highly conserved and have shaped protein evolution, and TF-imposed constraint
appears to be a major driver of codon usage bias. Conversely, the regulatory code has been selectively depleted of TFs that
recognize stop codons. More than 17% of single-nucleotide variants within duons directly alter TF binding. Pervasive dual
encoding of amino acid and regulatory information appears to be a fundamental feature of genome evolution.
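To make "dual-use" concrete at the level of coordinates, the sketch below tiles a coding sequence into codons and flags each codon that overlaps a DNase I footprint. It is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a single plus-strand exon whose coding sequence starts in frame at `cds_start`, uses half-open [start, end) coordinates, and counts a codon as a duon if at least one of its bases falls inside a footprint (the paper's exact criterion may differ). The function name `flag_duons` and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def flag_duons(cds_start: int, cds_end: int,
               footprints: List[Interval]) -> Tuple[List[Interval], List[bool]]:
    """Split [cds_start, cds_end) into in-frame codons and flag codons that
    overlap any footprint (hypothetical criterion: at least one shared base).

    Assumes a single plus-strand exon; a trailing partial codon is ignored.
    """
    # A codon starting at s fits only if s + 3 <= cds_end.
    codons = [(s, s + 3) for s in range(cds_start, cds_end - 2, 3)]
    flags = []
    for c_start, c_end in codons:
        # Two half-open intervals overlap iff each starts before the other ends.
        hit = any(c_start < f_end and f_start < c_end
                  for f_start, f_end in footprints)
        flags.append(hit)
    return codons, flags


if __name__ == "__main__":
    # Invented example: a 30-bp coding sequence and two footprints.
    codons, flags = flag_duons(100, 130, [(104, 112), (125, 127)])
    # prints: 4 of 10 codons overlap a footprint
    print(sum(flags), "of", len(codons), "codons overlap a footprint")
```

A real exome-wide analysis would also have to handle minus-strand genes, codons split across exon boundaries, and the resolution of the footprint calls themselves.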

  • Citing article
    • "For the latter data set, there were only raw sequencing reads available for the 40 cell types when we started this study. Because processing the huge number of reads to call footprints is both time-consuming and sensitive to informatics pipelines, which may create inconsistency between our data and the data of Stergachis et al. (2013), in this study we considered only the approximately 8.4 million footprints provided by Neph et al. (2012). We show that the biased codon usage and evolutionary conservation within DNase I footprints (i.e., the duon hypothesis) based on the footprinting data of 41 + 40 = 81 cells can be well reproduced using the footprints from the 41 cells (supplementary fig. "
    ABSTRACT: A genome contains two distinct types of DNA sequence: coding sequences and regulatory sequences. A recent study of the occupancy of transcription factors (TFs) in human cells suggested that protein-coding sequences also serve as codes for TF occupancy, and proposed a "duon" hypothesis in which up to 15% of the codons of human protein-coding genes are constrained by additional coding requirements that regulate gene expression. This hypothesis challenges our basic understanding of the human genome. We re-analyzed the data and found that the previous study was confounded by ascertainment bias related to base composition. Using an unbiased comparison in which G/C and A/T sites are considered separately, we find a similar level of conservation in TF-bound codons and TF-depleted codons, suggesting little or no extra purifying selection imposed by TF occupancy on the codons of human genes. Given the generally short binding motifs of TFs and the open chromatin structure during transcription, we argue that TF occupancy on protein-coding sequences is mostly passive and evolutionarily neutral, with functions in the regulation of gene expression yet to be determined. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
    Article · Jan 2015 · Molecular Biology and Evolution
  • Citing article
    • "(Stergachis et al., 2013; Weatheritt and Babu, 2013 "
    ABSTRACT: Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e., gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the resulting clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located its clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, compared with DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters. The observation of ‘clusters-within-clusters’ parallels the ‘domains within domains’ phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
    Article · Dec 2014 · Computational Biology and Chemistry
  • Citing article
    • "To date, the role of gene body methylation remains unclear, although intriguing correlations have been identified related to differential promoter use and alternative splicing (Maunakea et al., 2010; Shukla et al., 2011). The recent discovery of dual-use codons (duons) for transcription factor binding generates further possible regulatory roles for cytosine modifications in gene bodies, as it could impact transcription factor binding (Stergachis et al., 2013). In this regard, our BGS and TAB-BGS results from the HOXA9 locus showing elevated 5hmC focused specifically at an exon-intron junction upon depletion of DNMT3B are intriguing. "
    ABSTRACT: Global patterns of DNA methylation, mediated by the DNA methyltransferases (DNMTs), are disrupted in all cancers by mechanisms that remain largely unknown, hampering their development as therapeutic targets. Combinatorial acute depletion of all DNMTs in a pluripotent human tumor cell line, followed by epigenome and transcriptome analysis, revealed DNMT functions in fine detail. DNMT3B occupancy regulates methylation during differentiation, whereas an unexpected interplay was discovered in which DNMT1 and DNMT3B antithetically regulate methylation and hydroxymethylation in gene bodies, a finding confirmed in other cell types. DNMT3B mediated non-CpG methylation, whereas DNMT3L influenced the activity of DNMT3B toward non-CpG versus CpG site methylation. Altogether, these data reveal functional targets of each DNMT, suggesting that isoform selective inhibition would be therapeutically advantageous. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
    Article · Nov 2014 · Cell Reports
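The Molecular Biology and Evolution re-analysis (Jan 2015) attributes the duon conservation signal to ascertainment bias related to base composition, and compares G/C and A/T sites separately. The sketch below uses invented counts, not data from either study, to illustrate why stratifying can matter. If, in a given table, TF-bound sites are enriched for G/C and G/C sites happen to be more conserved, a pooled comparison can make bound sites look more conserved even when bound and unbound sites are equally conserved within each base class. The `COUNTS` table and both function names are hypothetical.

```python
# Invented counts: (base class, TF-bound?) -> (conserved sites, total sites).
# In this toy table, bound sites are 80% G/C and unbound sites are 20% G/C.
COUNTS = {
    ("GC", True): (64, 80),
    ("AT", True): (10, 20),
    ("GC", False): (16, 20),
    ("AT", False): (40, 80),
}


def pooled_rate(counts, bound):
    """Fraction conserved over all sites with the given TF-bound status."""
    conserved = total = 0
    for (_base_class, is_bound), (c, n) in counts.items():
        if is_bound == bound:
            conserved += c
            total += n
    return conserved / total


def stratified_rate(counts, base_class, bound):
    """Fraction conserved within one base class ("GC" or "AT")."""
    c, n = counts[(base_class, bound)]
    return c / n


if __name__ == "__main__":
    print("pooled:   bound %.2f vs unbound %.2f"
          % (pooled_rate(COUNTS, True), pooled_rate(COUNTS, False)))
    for base_class in ("GC", "AT"):
        print("%s only: bound %.2f vs unbound %.2f"
              % (base_class,
                 stratified_rate(COUNTS, base_class, True),
                 stratified_rate(COUNTS, base_class, False)))
```

In this toy table the pooled gap (0.74 vs 0.56) disappears within each base class (0.80 vs 0.80 for G/C, 0.50 vs 0.50 for A/T). That is the shape of the re-analysis argument. Whether real exome data behave this way is precisely what the two studies dispute.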
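The GenomeCluster abstract (Computational Biology and Chemistry, Dec 2014) does not spell out its clustering rule. The following is therefore only a minimal sketch of one common way to cluster elements identified by chromosome coordinates, and it is not GenomeCluster's algorithm. Elements on the same chromosome are sorted by start and chained into one group while the gap to the running group end is at most a hypothetical `max_gap` (in bp). Groups of two or more elements count as clusters. The clustering level is the proportion of elements that fall in a cluster, following the abstract's definition. The function names, the threshold, and the example coordinates are assumptions.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Element = Tuple[str, int, int]  # (chromosome, start, end), half-open


def find_clusters(elements: Iterable[Element], max_gap: int) -> List[List[Element]]:
    """Chain same-chromosome elements whose gap to the running group end is
    <= max_gap bp; return only groups with at least two elements."""
    by_chrom = defaultdict(list)
    for elem in elements:
        by_chrom[elem[0]].append(elem)
    clusters = []
    for chrom in sorted(by_chrom):
        items = sorted(by_chrom[chrom], key=lambda e: (e[1], e[2]))
        group = [items[0]]
        group_end = items[0][2]
        for elem in items[1:]:
            if elem[1] - group_end <= max_gap:
                group.append(elem)
                group_end = max(group_end, elem[2])
            else:
                if len(group) >= 2:
                    clusters.append(group)
                group = [elem]
                group_end = elem[2]
        if len(group) >= 2:
            clusters.append(group)
    return clusters


def clustering_level(elements: Iterable[Element], max_gap: int) -> float:
    """Proportion of elements that belong to some cluster."""
    elements = list(elements)
    if not elements:
        return 0.0
    clustered = sum(len(c) for c in find_clusters(elements, max_gap))
    return clustered / len(elements)


if __name__ == "__main__":
    elems = [("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 1000, 1100),
             ("chr2", 50, 80), ("chr2", 90, 120)]
    clusters = find_clusters(elems, max_gap=100)
    print(len(clusters), "clusters; clustering level",
          clustering_level(elems, max_gap=100))
```

Re-running the same function on the spans of the resulting clusters, with a larger gap, would give one crude notion of the "super-clusters" the abstract mentions.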