Codons Support the Maintenance of Intrinsic DNA Polymer Flexibility over Evolutionary Timescales

T.H. Gosnell School of Life Sciences, Rochester Institute of Technology.
Genome Biology and Evolution (Impact Factor: 4.23). 08/2012; 4(9):870-81. DOI: 10.1093/gbe/evs073
Source: PubMed


Despite our long familiarity with how the genetic code specifies the amino acid sequence, we still know little about why it is organized in the way that it is. Contrary to the view that the organization of the genetic code is a "frozen accident" of evolution, recent studies have demonstrated that it is highly nonrandom, with implications for both codon assignment and usage. We hypothesize that this inherent nonrandomness may facilitate the coexistence of both sequence and structural information in DNA. Here, we take advantage of a simple metric of intrinsic DNA flexibility to analyze mutational effects on the four phosphate linkages present in any given codon. Application of a simple evolutionary neutral model of substitution to random sequences, translated with alternative genetic codes, reveals that the standard code is highly optimized to favor synonymous substitutions that maximize DNA polymer flexibility, potentially counteracting neutral evolutionary drift toward stiffer DNA caused by spontaneous deamination. Comparison to existing mutational patterns in yeast also demonstrates evidence of strong selective constraint on DNA flexibility, especially at so-called "silent" sites. We also report a fundamental relationship between DNA flexibility, codon usage bias, and several important evolutionary descriptors of comparative genomics (e.g., base composition, transition/transversion ratio, and nonsynonymous vs. synonymous substitution rate). Recent advances in structural genomics have emphasized the role of the DNA polymer's flexibility in both gene function and whole genome folding, thereby implicating possible reasons for codons to facilitate the multiplexing of both genetic and structural information within the same molecular context.
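The distinction between synonymous and nonsynonymous substitutions that the abstract relies on can be illustrated with a minimal sketch, assuming the standard genetic code and counting every single-nucleotide change from each sense codon. The names `CODON_TABLE` and `classify_substitutions` are hypothetical, and neither the DNA flexibility metric nor the neutral substitution model used in the paper is reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions(table=CODON_TABLE):
    """Count single-nucleotide changes from sense codons by their effect."""
    counts = {"synonymous": 0, "nonsynonymous": 0, "nonsense": 0}
    for codon, aa in table.items():
        if aa == "*":
            continue  # only substitutions starting from sense codons are counted
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = table[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitutions())
```

Under the standard code this gives 134 synonymous, 392 nonsynonymous, and 23 nonsense changes out of 549. Passing a different codon-to-amino-acid mapping as `table` gives the corresponding counts for an alternative code, which is the kind of comparison the abstract describes.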



Available from: Gregory Alan Babbitt, Jul 16, 2014
  • ABSTRACT: While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately account for why codon bias strongly tracks neighboring intergene GC content, suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an ‘accessory’ during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context.
    Full-text · Article · Aug 2014 · Nucleic Acids Research
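The GC3 quantity mentioned in this abstract, GC content at third codon positions, can be computed as in the following minimal sketch. The function name `gc3` is hypothetical, the input is assumed to be an in-frame coding sequence, and any trailing partial codon is ignored.

```python
def gc3(cds):
    """Return the fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    third_positions = cds[2:usable:3]
    if not third_positions:
        raise ValueError("sequence must contain at least one complete codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA have third bases G, C, A.
    print(gc3("ATGGCCAAA"))
```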
  • ABSTRACT: Background: It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. Software and implementation: We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered both at SourceForge and as a download supplement at this journal. Results: To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly more information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-LOGOS in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. Conclusion: TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides), where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
    Full-text · Article · Sep 2015 · Source Code for Biology and Medicine
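The idea of measuring Shannon information content at dinucleotide steps rather than at single nucleobases can be illustrated with a simplified sketch. It is not the TRX-LOGOS implementation (which is an R package and works with BI-BII conformation shifts, not reproduced here). It assumes equal-length aligned binding sites, treats each dinucleotide step as one symbol from a 16-letter alphabet, and applies no small-sample correction. All function names are hypothetical.

```python
import math
from collections import Counter


def information_content(columns, alphabet_size):
    """Per-column information in bits: log2(alphabet_size) minus Shannon entropy."""
    max_bits = math.log2(alphabet_size)
    result = []
    for column in columns:
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        result.append(max_bits - entropy)
    return result


def base_columns(sites):
    """Columns of single nucleobases from equal-length aligned sites."""
    return [tuple(column) for column in zip(*sites)]


def step_columns(sites):
    """Columns of overlapping dinucleotide steps (one per phosphate linkage)."""
    length = len(sites[0])
    return [tuple(site[i:i + 2] for site in sites) for i in range(length - 1)]


if __name__ == "__main__":
    sites = ["ACGT", "ACGA", "ACGC", "ACGG"]  # toy alignment, not real data
    print(information_content(base_columns(sites), 4))
    print(information_content(step_columns(sites), 16))
```

In the toy alignment the last base is uniformly variable, so its column carries 0 bits at the nucleobase level, while the final dinucleotide step still carries 2 of a possible 4 bits because its first base is fixed.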
  • ABSTRACT: A long-held presupposition in the field of bioinformatics holds that genetic, and now even epigenetic 'information', can be abstracted from the physicochemical details of the macromolecular polymers in which it resides. It is perhaps rather ironic that this basic conjecture originated upon the first observations of DNA structure itself. This static model of DNA led very quickly to the conclusion that only the nucleobase sequence itself is rich enough in molecular complexity to replicate a complex biology. This idea has been pervasive throughout genomic science, higher education and popular culture ever since, to the point that most of us would accept it unquestioningly as fact. What is more alarming is that this conjecture is driving a significant portion of the technological development in modern genomics towards methods strongly rooted in DNA sequencing, thereby reducing a dynamic multi-dimensional biology into single-dimensional forms of data. Evidence countering this central tenet of bioinformatics has been quietly mounting over many decades, prompting some to propose that the genome must be studied from the perspective of its molecular reality, rather than as a body of information to be represented symbolically. Here, we explore the epistemological boundary between bioinformatics and molecular biology, and warn against an 'overtly' bioinformatic perspective. We review a selection of new bioinformatic methods that move beyond sequence-based approaches to include consideration of databased three-dimensional structures. However, we also note that these hybrid methods still ignore the most important element of gene function when attempting to improve outcomes: the fourth dimension of molecular dynamics over time.
    Full-text · Article · Jan 2016 · Gene