Article

The Triplet Code From First Principles


Abstract

The temporal order ("chronology") in which amino acids and their respective codons appeared on the evolutionary scene is reconstructed. A consensus chronology of amino acids is built on the basis of 60 different criteria, each offering a certain temporal order. After several steps of filtering, the chronology vectors are averaged, resulting in the consensus order: G, A, D, V, P, S, E, (L, T), R, (I, Q, N), H, K, C, F, Y, M, W. It reveals two important features: the amino acids synthesized in the imitation experiments of S. Miller appeared first, while the amino acids associated with codon capture events came last. The reconstruction of the codon chronology is based on the above consensus temporal order of amino acids, supplemented by the stability and complementarity rules first suggested by M. Eigen and P. Schuster, and on the earlier established processivity rule. At no point in the reconstruction was the consensus amino-acid chronology in conflict with these three rules. The derived genealogy of all 64 codons suggested several important predictions that are confirmed. The reconstruction of the origin and evolutionary history of the triplet code thus becomes a powerful research tool for molecular evolution studies, especially in its early stages.
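The averaging step described in the abstract can be illustrated with a minimal Python sketch. It assumes each criterion is expressed as a mapping from amino acid to rank (1 = earliest) and that the consensus is the plain mean rank. The function name `consensus_order` and the three toy criteria are hypothetical, and the filtering steps mentioned in the abstract are not reproduced here.

```python
from collections import defaultdict


def consensus_order(rankings):
    """Average the rank each amino acid receives across criteria.

    rankings: list of dicts mapping one-letter amino acid code -> rank
              (1 = earliest). A criterion may omit some amino acids.
    Returns a list of (amino_acid, mean_rank) sorted from earliest to latest.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranking in rankings:
        for aa, rank in ranking.items():
            totals[aa] += rank
            counts[aa] += 1
    means = {aa: totals[aa] / counts[aa] for aa in totals}
    # Ties in mean rank are broken alphabetically so the output is deterministic.
    return sorted(means.items(), key=lambda item: (item[1], item[0]))


# Three made-up criteria (NOT data from the paper), each ranking four amino acids.
toy_criteria = [
    {"G": 1, "A": 2, "D": 3, "W": 4},
    {"A": 1, "G": 2, "D": 3, "W": 4},
    {"G": 1, "D": 2, "A": 3, "W": 4},
]

order = [aa for aa, _ in consensus_order(toy_criteria)]
print(order)
```

Amino acids whose mean ranks coincide would be reported as a group in a consensus of this kind, which is one plausible reading of the parenthesized entries such as (L, T) in the order above.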


... Many attempts have been made to derive a universal order of the recruitment of amino acids during evolution (Trifonov 2000;Trifonov 2004). Using a combination of 60 different criteria, Trifonov reconstructed a 'consensus temporal order of amino acids' (Trifonov 2004). Although this consensus order has been criticized on several grounds (Knight 2001), it should be noted that the resulting list of amino acids is in a nearly perfect agreement with the combined results of Miller and Urey type experiments. ...
... small and sensitive to the choice of N (Fig. 9 and data not shown) but the non-optimality of the assignments of Tyr, Cys, and Trp was striking and is unambiguous (Fig. 9). Taking into account that Tyr, Cys, and Trp are among the 'latest' amino acids according to Trifonov's consensus of amino acid appearance (Trifonov 2004), and that they are coded by supercodons with the lowest stability of codon-anticodon interactions (Fig. 2), it appears most likely that the primordial 2-letter genetic code did not accommodate these amino acids that were added to the amino acid repertoire only after the transition to the standard 3-letter code. Given these observations, we assessed the error minimization level of 2-letter codes without assigning the supercodons UAN and UGN (Fig. 10). ...
Preprint
We investigated the error-minimization properties of putative primordial codes that consisted of 16 supercodons, with the third base being completely redundant, using a previously derived cost function and the error minimization percentage as the measure of a code's robustness to mistranslation. It is shown that, when the 16-supercodon table is populated with 10 putative primordial amino acids, inferred from the results of abiotic synthesis experiments and other evidence independent of the code evolution, and with minimal assumptions used to assign the remaining supercodons, the resulting 2-letter codes are nearly optimal in terms of the error minimization level. The results of the computational experiments with putative primordial genetic codes that contained only two meaningful letters in all codons and encoded 10 to 16 amino acids indicate that such codes are likely to have been nearly optimal with respect to the minimization of translation errors. This near-optimality could be the outcome of extensive early selection during the co-evolution of the code with the primordial, error-prone translation system, or a result of a unique, accidental event. Under this hypothesis, the subsequent expansion of the code resulted in a decrease of the error minimization level that became sustainable owing to the evolution of a high-fidelity translation system.
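The idea of scoring a 16-supercodon code for robustness to mistranslation can be sketched as follows. This is an illustration under stated assumptions, not the cost function of the preprint: the cost here is the mean squared change in a single amino-acid property over all single-base misreadings of a doublet, and the minimization percentage is assumed to be (mean random cost − code cost) / (mean random cost − best random cost) × 100, estimated from shuffled codes. The names `code_cost` and `minimization_percentage`, the four placeholder amino acids, and their property values are all hypothetical.

```python
import itertools
import random

BASES = "UCAG"
# The 16 two-letter "supercodons" (third base treated as fully redundant).
DOUBLETS = ["".join(p) for p in itertools.product(BASES, repeat=2)]


def neighbors(doublet):
    """Yield the six doublets that differ from `doublet` by exactly one base."""
    for pos in range(2):
        for base in BASES:
            if base != doublet[pos]:
                yield doublet[:pos] + base + doublet[pos + 1:]


def code_cost(code, prop):
    """Mean squared property change over all single-base misreadings."""
    diffs = [
        (prop[code[d]] - prop[code[n]]) ** 2
        for d in DOUBLETS
        for n in neighbors(d)
    ]
    return sum(diffs) / len(diffs)


def minimization_percentage(code, prop, samples=2000, seed=0):
    """Compare `code` with random shuffles of the same amino-acid assignments.

    The best cost is only estimated from the sample, so values above 100
    are possible when `code` beats every sampled shuffle.
    """
    rng = random.Random(seed)
    assigned = [code[d] for d in DOUBLETS]
    costs = []
    for _ in range(samples):
        shuffled = assigned[:]
        rng.shuffle(shuffled)
        costs.append(code_cost(dict(zip(DOUBLETS, shuffled)), prop))
    mean_cost, best_cost = sum(costs) / len(costs), min(costs)
    if mean_cost == best_cost:
        return 0.0
    return 100.0 * (mean_cost - code_cost(code, prop)) / (mean_cost - best_cost)


# Toy example: four placeholder amino acids with made-up property values,
# assigned in blocks by the first base of the doublet.
toy_prop = {"w": 0.0, "x": 1.0, "y": 2.0, "z": 3.0}
block_code = {d: "wxyz"[BASES.index(d[0])] for d in DOUBLETS}
block_cost = code_cost(block_code, toy_prop)
block_pct = minimization_percentage(block_code, toy_prop)
```

A block-structured assignment like the toy one keeps similar amino acids on neighbouring doublets, so its cost falls below the average of shuffled codes, which is the sense in which a code is said to minimize translation errors.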
... This assay and related techniques leverage high-throughput sequencing to measure the activity of thousands of candidate sequences in a mixed pool [50][51][52][53]. The six substrates (analogs of tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine) represent a range of sizes and biochemical classes (aromatic, aliphatic, sulfur-containing), as well as amino acids thought to be early (Leu, Ile, Val) and late (Trp, Phe, Met) incorporations into the genetic code [54][55][56][57][58]. Because of this span, the chosen amino acids should be considered model systems to study trends in rate enhancement, specificity, and proximity of ribozymes in sequence space, rather than as a detailed model of the early prebiotic emergence of the genetic code. ...
... Within the group of small hydrophobic side chains, both β-branched and -unbranched residues were included. The set includes side chains that are considered early as well as side chains that are considered late additions to the genetic code [54][55][56][57][58]. In particular, aromatic residues, of which two were chosen to assess specificity within the class, are thought to have been added relatively late. ...
... Such properties of overlapping fitness landscapes could facilitate the expansion from a weakly active, promiscuous ribozyme to an elaborated system of ribozyme-substrate pairs. While the order in which amino acids were incorporated into the genetic code is a subject of debate, the amino acid substrates tested here include those that are generally believed to be early (L, I, V) and late (W, F, M) additions to the code [54][55][56][57][58]. The aromatic residues were generally preferred by all ribozyme families. ...
Article
Full-text available
Systems of catalytic RNAs presumably gave rise to important evolutionary innovations, such as the genetic code. Such systems may exhibit particular tolerance to errors (error minimization) as well as coding specificity. While often assumed to result from natural selection, error minimization may instead be an emergent by-product. In an RNA world, a system of self-aminoacylating ribozymes could enforce the mapping of amino acids to anticodons. We measured the activity of thousands of ribozyme mutants on alternative substrates (activated analogs for tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine). Related ribozymes exhibited shared preferences for substrates, indicating that adoption of additional amino acids by existing ribozymes would itself lead to error minimization. Furthermore, ribozyme activity was positively correlated with specificity, indicating that selection for increased activity would also lead to increased specificity. These results demonstrate that by-products of ribozyme evolution could lead to adaptive value in specificity and error tolerance. Complex biochemical systems exhibit traits that appear to be highly adapted. Studies of catalytic RNA demonstrate that adaptive traits, such as increased specificity and error tolerance, could originate as evolutionary by-products.
... Although several analyses suggest the probable sequence of the late amino acid incorporation into the genetic code, many debates and questions about the order and factors that influenced these events remain open [56,130,131]. ...
... To start with, cysteine is regarded as one of the latest additions to the code according to the Trifonov meta-analysis [130]. At the same time, it is one of the most active and unique amino acids involved particularly in Fe-S clusters (such as in ferredoxin, considered one of the earliest protein domains) and conflicting hypotheses have been proposed as to whether these features were indispensable in early evolution. ...
... Finally, Met, Trp and Tyr are considered the latest additions to the amino acid alphabet [58,130]. A study by Granold et al. [58] suggests that they were incorporated into the genetic code during the great oxidation event as they showed antioxidant properties. ...
Article
Full-text available
Recent developments in Origins of Life research have focused on substantiating the narrative of an abiotic emergence of nucleic acids from organic molecules of low molecular weight, a paradigm that typically sidelines the roles of peptides. Nevertheless, the simple synthesis of amino acids, the facile nature of their activation and condensation, their ability to recognize metals and cofactors and their remarkable capacity to self-assemble make peptides (and their analogues) favourable candidates for one of the earliest functional polymers. In this mini-review, we explore the ramifications of this hypothesis. Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative ‘peptide-polynucleotide stage’. For example, early peptides and mini-proteins could have served as catalysts, compartments and structural hubs. In sum, we shed light on the role of early peptides and small proteins before and during the nucleotide world, in which nascent life fully grasped the potential of primordial proteins, and which has left an imprint on the idiosyncratic properties of extant proteins.
... In the book Genesis and Evolutionary Development of Life, Oparin proposed that prokaryotes may have emerged three to four billion years ago with a highly ambiguous and/or primitive genetic sequence encoding proteins built from about seven most abundant amino acids in the primordial soup. Therefore, LUCA may just have had a simpler genetic codon system than that in present-day life forms [6], which gained higher complexity along the process of life evolution [7,8]. ...
... It is considered that the process of amino acid acquisition and reduction during life evolution follows the pattern assumed in the neutral model theory [9,10]. Specifically, newly recruited amino acids will sacrifice the use of older ones to gain a foothold [7,11,12]. However, the introduction of new amino acids must be slow because frequent changes in protein composition can be fatal to life [13,14]. ...
... In 1975, Wong proposed an evolutionary map of the genetic code and defined two minor amino acid centres (Phe-Tyr and Val-Leu) [17]. In 2000 and 2004, Trifonov suggested the temporal appearance order of amino acids and their respective codons based on various criteria such as thermostability, complementarity, and processivity [7,18]. The coevolution of amino acids and genetic codons is still ongoing today [19,20] and is supported by several recent findings. ...
Article
Full-text available
The mechanisms that shaped the pattern of amino acid recruitment into proteins early in the history of life remain a mystery. In this study, we conducted genome-wide analyses of amino acid usage and genetic codon structure in 7270 species across the three domains of life. The analyses revealed a ubiquitous usage bias of amino acids that was likely independent of codon usage bias. Taking advantage of codon usage bias, we performed pseudotime analysis to re-determine the chronological order of species emergence, which suggested a new species relationship by tracing the imprint of codon usage evolution. Furthermore, multidimensional data integration showed that the amino acids A, D, E, G, L, P, R, S, T and V might have been the first recruited into the proteins of the last universal common ancestor (LUCA). The data analysis also indicated that the remaining amino acids were most probably incorporated gradually into the proteogenesis process along two long-timescale parallel evolutionary routes: I→F→Y→C→M→W and K→N→Q→H. This study provides new insight into the origin of life, particularly in terms of the basic protein composition of early life. Our work provides crucial information that will help in further understanding protein structure and function in relation to their evolutionary history.
... Specifically, it has been shown that the non-random assignments of amino acids in the standard code can be almost completely explained by incremental code evolution by codon capture or ambiguity reduction processes. However, this conclusion relies on the exact order of amino acids recruitment to the genetic code (Trifonov 2000;Trifonov 2004), primarily, on a specific interpretation of the evolution of biosynthetic pathways for amino acids, which remains a controversial issue. ...
... Phase 1 amino acids are orange, and phase 2 amino acids are green. The numbers show the order of amino acid appearance in the code according to (Trifonov 2004). The arrows define 13 precursor-product pairs of amino acids, their color defines the biosynthetic families of Glu (blue), Asp (dark-green), Phe (magenta), Ser (red), and Val (light-green) validity of the statistical analysis of Wong (Wong 1975) appears dubious (Ronneberg, Landweber, and Freeland 2000). ...
Preprint
The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly non-random. The three main concepts on origin and evolution of the code are the stereochemical theory; the coevolution theory; and the error minimization theory. These theories are not mutually exclusive and are also compatible with the frozen accident hypothesis. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translational error but there is a huge number of more robust codes, so that the standard code potentially could evolve from a random code via a short sequence of codon series reassignments. Thus, much of the evolution that led to the standard code can be interpreted as a combination of frozen accident with selection for translational error minimization although contributions from coevolution of the code with metabolic pathways and/or weak affinities between amino acids and nucleotide triplets cannot be ruled out. However, such scenarios for the code evolution are based on formal schemes whose relevance to the actual primordial evolution is uncertain, so much caution in interpretation is necessary. A real understanding of the code's origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system.
... Trifonov [52], on the basis of a comprehensive analysis of 60 physical, chemical, and biochemical properties of amino acids and their corresponding codons/anticodons, established the temporal sequence ("chronology") of amino acids and codons in the genetic code. The final consensus revealed the following temporal order of the canonical amino acids: 1-Gly, 2-Ala, 3-Asp, 4-Val, 5-Pro, 6-Ser, 7-Glu, 8-Thr, 9-Leu, 10-Arg, 11-Ile, 12-Gln, 13-Asn, 14-His, 15-Lys, 16-Cys, 17-Phe, 18-Tyr, 19-Met, 20-Trp. ...
... Higgs and Pudritz [20] then used additional thermodynamic criteria for a more in-depth analysis of the chronology of amino acids in the code. The temporal order of amino acids they established coincided almost completely with that of Trifonov [52], but Pro was moved from the fifth to the seventh position, in place of Glu. Also, according to the coevolution theory [51], Glu is produced directly from α-ketoglutarate, and Pro is an imino acid derived from Glu. ...
... It is usually enhanced by the endophytic microbial communities living inside healthy host tissues (seed, root, leaf, flower), playing an important role in mitigating abiotic and biotic stresses in plants. (NOTE: Because of its central role in linking amino acids with nucleotide triplets contained in tRNAs, tRNA ligase is thought to be among the first proteins that appeared in evolution [5]). The evolutionary analyses presented in this phylogenetic tree were conducted in MEGA X [6,7]. ...
Article
Full-text available
According to the World Health Organization (WHO), depression is a leading cause of disability worldwide and a major contributor to the overall global burden of mental disorders. An increasing number of studies have revealed that among 20 different amino acids, high proline consumption is a dietary factor with the strongest impact on depression in humans and animals, including insects. Recent studies acknowledged that gut microbiota play a key role in proline-related pathophysiology of depression. In addition, the multi-omics approach has alleged that a high level of metabolite proline is directly linked to depression severity, while variations in levels of circulating proline are dependent on microbiome composition. The gut–brain axis proline analysis is a gut microbiome model of studying depression, highlighting the critical importance of diet, but nothing is known about the role of the plant microbiome–food axis in determining proline concentration in the diet and thus about preventing excessive proline intake through food consumption. In this paper, we discuss the protocooperative potential of a holistic study approach combining the microbiota–gut–brain axis with the microbiota–plant–food–diet axis, as both are involved in proline biogenesis and metabolism and thus in its effect on mood and cognitive function. In preharvest agriculture, the main scientific focus must be directed towards plant symbiotic endophytes, as scavengers of abiotic stresses in plants and modulators of high proline concentration in crops/legumes/vegetables under climate change. It is also implied that postharvest agriculture—including industrial food processing—may be critical in designing a proline-balanced diet, especially if corroborated with microbiome-based preharvest agriculture, within a circular agrifood system.
The microbiome is suggested as a target for selecting beneficial plant endophytes in aiming for a balanced dietary proline content, as it is involved in the physiology and energy metabolism of eukaryotic plant/human/animal/insect hosts, i.e., in core aspects of this amino acid network, while opening new venues for an efficient treatment of depression that can be adapted to vast groups of consumers and patients. In that regard, the use of artificial intelligence (AI) and molecular biomarkers combined with rapid and non-destructive imaging technologies were also discussed in the scope of enhancing integrative science outcomes, agricultural efficiencies, and diagnostic medical precisions.
... The latter are also proteinogenic amino acids, but are not believed to have been encoded early, based on the proposed genealogy of the 64 codons of the genetic code. [39,40] It is not unlikely that they were present in prebiotic mixtures, though. Phenylalanine has been found in the product mixture of the Miller experiment. ...
... Today's biosynthesis of tryptophan is long and complex, and the aromatic amino acids are known to have appeared late in the genetic code. [39,40] From a chemical point of view, though, it is not unreasonable to assume that the known abiotic syntheses of tryptophan could have produced concentrations as high as those found in the oceanic lithosphere. [45,46] Indoles are resonance-stabilized aromatic compounds that can form readily under a range of chemical conditions, as known from the Fischer indole synthesis, the Reissert indole synthesis, the Bartoli indole synthesis, the Japp-Klingemann reaction, the Leimgruber-Batcho indole synthesis, the Madelung synthesis, the Nenitzescu indole synthesis, and the Baeyer-Emmerling indole synthesis, to name just a few. ...
Article
Full-text available
The formation of peptides from amino acids is one of the processes associated with life. Because of the dominant role of translation in extant biology, peptide‐forming processes that are RNA induced are of particular interest. We have previously reported the formation of phosphoramidate‐linked peptido RNAs as the products of spontaneous condensation reactions between ribonucleotides and free amino acids in aqueous solution. We now asked whether four‐helix bundle (4HB) DNA or RNA folding motifs with a single‐ or double‐nucleotide gap next to a 5’‐phosphate can act as reaction sites for phosphoramidate formation. For glycine, this was found to be the case, whereas phenylalanine and tryptophan showed accelerated formation of peptides without a covalent link to the nucleic acid. Free peptides with up to 11 tryptophan or phenylalanine residues were found in precipitates forming in the presence of gap‐containing DNA or RNA 4HBs. Control experiments using motifs with just a nick or primer alone did not have the same effect. Because folded structures with a gap in a double helix are likely products of hybridization of strands formed in statistically controlled oligomerization reactions, our results are interesting in the context of prebiotic scenarios. Independent of a putative role in evolution, our findings suggest that for some aromatic amino acids an RNA‐induced pathway for oligomerization exists that does not have a discernable link to translation.
... The set of prebiotically available amino acids has been studied in prebiotic model reactions [18][19][20][36][37][38] and by several approaches comparing the amino acids' chemical complexity and their codons [39]. Because the current study is focused on the earliest self-replicating systems long before LUCA, the results from prebiotic model reactions were prioritized. The frequency of the ten amino acids in the ten peptides used in this study was based on their abundance in a representative prebiotic model reaction that was analyzed by modern LC-MS, and should be seen as one of the many similar, plausible distributions for prebiotic scenarios [18,19]: 22 × glycine, 14 × alanine, 9 × aspartate, 8 × β-alanine, 6 × α-aminobutyric acid, 6 × β-aminobutyric acid, 5 × valine, 5 × α-aminoisobutyric acid, 3 × γ-aminobutyric acid, 2 × serine. ...
Article
Full-text available
From a library of 10¹⁴ RNA sequences, the strongest benefit of a prebiotically plausible peptide was by peptide 4 (arrow) on ribozyme S2. The names 1–10 of ten tested peptides are indicated, together with their overall charges (0, −, +).
... On the other hand, the retrieval of tRNAs does not depend on which type of ExGC the tRNA molecule encodes, but rather on how anciently the tRNA-activating aa was incorporated into the genetic code: the more ancient the incorporation of the corresponding aa, the better the recovery of the tRNA molecule. For example, glycyl-tRNA (Gly-tRNA) is better recovered than tryptophanyl-tRNA (Trp-tRNA), because Gly was among the first aa to be incorporated into the genetic code and Trp among the last [171,172]. ...
Article
From the most ancient RNAs, which followed an RNY pattern and folded into small hairpins, modern RNA molecules evolved by two different pathways, dubbed Extended Genetic Code 1 and 2, finally conforming to the current standard genetic code. Herein, we describe the evolutionary path of the RNAome based on these evolutionary routes. In general, all the RNA molecules analysed contain portions encoded by both genetic codes, but crucial features seem to be better recovered by Extended 2 triplets. In particular, the whole Peptidyl Transferase Centre, anti-Shine-Dalgarno motif, and a characteristic quadruplet of the RNA moiety of RNAse-P are clearly unveiled. Differences between bacteria and archaea are also detected; in most cases, the biological sequences are more stable than their controls. We then describe an evolutionary trajectory of the RNAome formation, based on two complementary evolutionary routes: one leading to the formation of essentials, while the other complemented the molecules, with the cooperative assembly of their constituents giving rise to modern RNAs.
... AMY3 deglutathionylation and S-S reduction were promoted by GRX and TRX, respectively [161]. Implications of reversible S-glutathionylation in the regulation of glycolytic and respiratory metabolism were reviewed a short while ago [6]. More recently, the cytosolic NADP-dependent isocitrate dehydrogenase (cICDH) was shown to be subject to regulatory S-glutathionylation (Figure 4A) in a study that provides an example of GSNO as S-glutathionylation agent [174]. ...
Preprint
Full-text available
Cys is one of the least abundant amino acids in proteins. However, it is often highly conserved and usually found in important structural and functional regions of proteins. Its unique chemical properties allow it to undergo several post-translational modifications, many of which are mediated by reactive oxygen, nitrogen, sulfur or carbonyl species. Thus, in addition to their role in catalysis, protein stability and metal binding, Cys residues are crucial for redox regulation of metabolism and signal transduction. In this review, we discuss Cys post-translational modifications (PTMs) and their role in plant metabolism and signal transduction. These modifications include oxidation of the thiol group (S-sulfenylation, S-sulfinylation and S-sulfonylation), the formation of disulfide bridges, S-glutathionylation, persulfidation, S-cyanylation, S-nitrosation, S-carbonylation, S-acylation, prenylation, CoAlation as well as the formation of thiohemiacetal. For each of these PTMs, we discuss the origin of the modifier, the mechanisms involved in PTM, as well as their reversibility. Examples of the involvement of Cys PTMs in the modulation of protein structure, function, stability, and localization are presented to highlight their importance in the regulation of plant metabolic and signaling pathways.
... [18] for the latter). The recruitment order of codons in the literature [19] generally agrees with the recruitment order of codons in the roadmap. The recruitment order of the amino acids obtained from the roadmap can also be used to explain the variation trends of amino acid frequencies [20] (Fig S2b4, S2b5). ...
Preprint
The post-genomic era has brought opportunities to bridge traditionally separate fields on early history of life. New methods promote a deeper understanding of the origin of biodiversity. Relative stabilities of base triplexes are able to regulate base substitutions in triplex DNAs. We constructed a roadmap based on such a regulation to explain concurrent origins of the genetic code and the homochirality of life. Based on the recruitment order of codons in the roadmap and the complete genome sequences, we reconstructed the three-domain tree of life. The Phanerozoic biodiversity curve has been reconstructed based on genomic, climatic and eustatic data; this result supports tectonic cause of mass extinctions. Our results indicate that chirality played a crucial role in the origin and evolution of life. Here is Part I of my two-part series paper; technical details are in Part II of this paper (see “Concurrent origins of the genetic code and the homochirality of life, and the origin and evolution of biodiversity. Part II: Technical appendix” on bioRxiv).
... It is likely that amino acids existed before the genetic code and so thermodynamics provides the best explanation in terms of their prevalence and evolution through time (Vranova et al., 2011). In this scenario the genetic code maps onto the thermodynamic outcome of amino acid synthesis (Trifonov, 2004), where a peptide world predates a nucleic acid world (Fried et al., 2022). ...
... Root-Bernstein, 1982; Biro 2007; Lella and Mahalakshmi, 2017; Frenkel-Pinter et al, 2020). The most abundant amino acids are (and possibly were) the structurally versatile Gly and Ala (Trifonov 2004). Peptide flexibility is guaranteed by the small, rotating Gly, rigidity is guaranteed by Ala, Tyr, Trp, Phe and conformation changes are guaranteed by His. ...
Article
Full-text available
The central dogma of molecular biology dictates that, with only a few exceptions, information proceeds from DNA to protein through an RNA intermediate. Examining the enigmatic steps from prebiotic to biological chemistry, we take another road, suggesting that primordial peptides acted as templates for the self-assembly of the first nucleic acid polymers. Arguing in favour of a sort of archaic "reverse translation" from proteins to RNA, our basic premise is a Hadean Earth where key biomolecules such as amino acids, polypeptides, purines, pyrimidines, nucleosides and nucleotides were available under different prebiotically plausible conditions, including meteorite delivery, shallow ponds and hydrothermal vent scenarios. Supporting a protein-first scenario alternative to the RNA world hypothesis, we propose the primeval occurrence of short two-dimensional peptides termed "selective amino acid- and nucleotide-matching oligopeptides" (henceforward SANMAOs) that noncovalently bind at the same time the polymerized amino acids and the single nucleotides dispersed in the prebiotic milieu. In this theoretical paper, we describe the chemical features of this hypothetical oligopeptide, its biological plausibility and its virtues from an evolutionary perspective. We provide a theoretical example of SANMAO's selective pairing between amino acids and nucleosides, simulating a poly-Glycine peptide that acts as a template to build a purinic chain corresponding to the glycine's extant triplet codon GGG. Further, we discuss how SANMAO might have endorsed the formation of low-fidelity RNA's polymerized strains, well before the appearance of the accurate genetic material's transmission ensured by the current translation apparatus.
... The ten peptides were chosen to represent sets of amino acids that reflect different stages in the early evolution of life (Fig. 1A): peptides P1 to P6 are dominated by amino acids that are abundant in Miller-type reactions [38] and were likely present in the earliest stages of life [39]. Several of these peptides explored the addition of arginine and lysine: arginine precursors are generated in cyanosulfidic models for prebiotic reactions [19,20], and lysine has been identified in meteorites [40,41]. In contemporary proteins, these two amino acids are the most common side chains in direct contact with RNA [42] and were therefore promising to explore for their benefit to ribozyme function. ...
Article
Full-text available
Early stages of life likely employed catalytic RNAs (ribozymes) in many functions that are today filled by proteins. However, the earliest life forms must have emerged from heterogenous chemical mixtures, which included amino acids, short peptides, and many other compounds. Here we explored whether the presence of short peptides can help the emergence of catalytic RNAs. To do this, we conducted an in vitro selection for catalytic RNAs from randomized sequence in the presence of ten different peptides with a prebiotically plausible length of eight amino acids. This in vitro selection generated dozens of ribozymes, one of them with ∼900-fold higher activity in the presence of one specific peptide. Unexpectedly, the beneficial peptide had retained its N-terminal Fmoc protection group, and this group was required to benefit ribozyme activity. The same, or higher benefit resulted from peptide conjugates with prebiotically plausible polyaromatic hydrocarbons (PAHs) such as fluorene and naphthalene. This shows that PAH-peptide conjugates can act as potent cofactors to enhance ribozyme activity. The results are discussed in the context of the origin of life.
... It is currently thought that glycine, alanine, leucine, proline, and serine are among the eight amino acids first recruited into genetic code. 76,77 The UV photochemistry of all five has now been studied. All of the amino acids studied thus far directly produce HOCO radicals when exposed to 213 nm light. ...
Article
The ultraviolet photochemistry of the amino acids glycine, leucine, proline, and serine in their neutral forms was investigated using parahydrogen matrix-isolation spectroscopy. Irradiation by 213 nm light destroys the chirality of all three chiral amino acids as a result of the α-carbonyl C-C bond cleavage and hydrocarboxyl (HOCO) radical production. The temporal behavior of the Fourier-transform infrared spectra revealed that HOCO radicals rapidly reach a steady state, which occurs predominantly due to photodissociation of HOCO into CO + OH or CO2 + H. In glycine and leucine, the amine radicals generated by the α-carbonyl C-C bond cleavage rapidly undergo hydrogen elimination to yield methanimine and 3-methylbutane-1-imine, respectively. Breaking of the α-carbonyl C-C bond in proline appeared to yield 1-pyrroline, although due to its weak absorption it remains unconfirmed. In serine, additional products were formaldehyde and E/Z ethanimine. The present study shows that the direct production of HOCO previously observed in α-alanine generalizes to other amino acids of varying structure. It also revealed a tendency for amino acid photolysis to form imines rather than amine radicals. HOCO should be useful in the search for amino acids in interstellar space, particularly in combination with simple imine molecules.
... It is also conceivable that during the establishment of the genetic code, primitive tRNA genes underwent duplication and mutation in a manner that incrementally expanded the genetic code by matching new amino acids with new codon-anticodon interactions. This indicates that the near-universal genetic code of extant life emerged by descent with modification from smaller and simpler codes containing fewer codons, potentially where glycine, alanine, aspartic acid, and valine were among the first amino acids to join the genetic code (Trifonov 2004; Macé and Gillet 2016). As the genetic code increased in complexity, coded proteins would have gradually replaced many of the functions once carried out by RNA. ...
Article
Darwin’s assertion that “it is mere rubbish thinking, at present, of origin of life” (quoted from Peretó et al. 2009) is no longer valid. By synthesizing origin of life (OoL) research from its inception to recent findings, with a focus on (i) proof-of-principle prebiotically plausible syntheses and (ii) molecular relics of the ancient RNA World, we present a comprehensive up-to-date description of science’s understanding of the OoL and the RNA World hypothesis. Based on these observations, we solidify the consensus that RNA evolved before coded proteins and DNA genomes, such that the biosphere began with an RNA core where much of the translation apparatus and related RNA architecture arose before RNA transcription and DNA replication. This supports the conclusion that the OoL was a gradual process of chemical evolution involving a series of transitional forms between prebiotic chemistry and the last universal common ancestor (LUCA) during which RNA played a central role, and that many of the events and their relative order of occurrence along this pathway are known. The integrative nature of this synthesis also extends previous descriptions and concepts and should help inform future questions and experiments about the ancient RNA World and the OoL.
... To bring this number down, it should be noted that not all the possible amino acid pairs are available for bonding and higher affinities generate more persisting biomolecules (Root-Bernstein, 1982; Biro 2007; Lella and Mahalakshmi, 2017; Frenkel-Pinter et al, 2020). The most abundant amino acids are (and possibly were) Gln and the structurally versatile Gly and Ala (Trifonov 2004). Peptide flexibility is guaranteed by the small, rotating Gly, rigidity is guaranteed by Ala, Tyr, Trp, Phe and conformation changes are guaranteed by His. ...
Preprint
The central dogma of molecular biology dictates that, with only a few exceptions, information proceeds from DNA to protein through an RNA intermediate. Examining the enigmatic steps from prebiotic to biological chemistry, we take another road suggesting that primordial peptides acted as template for the self-assembly of the first nucleic acids polymers. Arguing in favour of a sort of archaic “reverse translation” from proteins to RNA, our basic premise is a Hadean Earth where key biomolecules such as amino acids, polypeptides, purines, pyrimidines, nucleosides and nucleotides were available under different prebiotically plausible conditions, including meteorites delivery, shallow ponds and hydrothermal vents scenarios. Supporting a protein-first scenario alternative to the RNA world hypothesis, we propose the primeval occurrence of short peptides termed “selective amino acid- and nucleotide-matching oligopeptides” (henceforward SANMAOs) that noncovalently bind at the same time the polymerized amino acids and the single nucleotides dispersed in the prebiotic milieu. We describe the chemical features of this hypothetical oligopeptide, its biological plausibility and its virtues from an evolutionary perspective. We provide a theoretical example of SANMAO’s selective pairing between amino acids and nucleosides, simulating a poly-Glycine peptide that acts as a template to build a purinic chain corresponding to the glycine’s extant triplet codon GGG. Further, we discuss how SANMAO might have endorsed the formation of low-fidelity RNA’s polymerized strains, well before the appearance of the accurate genetic material’s transmission ensured by the current translation apparatus.
... Another possibility is that there was a major, ancient shift in REs on the branch separating Bacteria from Archaea+eukaryotes, followed by changes in response to specific factors such as the shifts in genomic base composition or environmental shifts (e.g., high salt concentrations in Halobacteriaceae). Notably, the aromatic amino acids, as well as some of the other amino acids involved in highly variable pairs, are thought to be late additions to the genetic code and to have increased in frequency since the time of the last universal common ancestor of life [63][64][65]. Regardless of the basis for the RE shifts, it seems clear that the shifts represent an important feature of protein evolution. ...
Preprint
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and they are known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We trained these models separately on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to classify protein alignments correctly at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the “model space” for protein sequence evolution. If we look beyond the information that these clade-specific models reveal about protein evolution the models themselves are likely to be useful tools for phylogenomic inference across the tree of life.
... On the other side, the universal set of amino acids is a comprehensive assessment of biosynthetic cost, solubility, stability, etc. [48]. A consensus chronology of amino acids has been built based on many different criteria [49,50]. ...
Article
Extant biology uses RNA to record genetic information and proteins to execute biochemical functions. Nucleotides are translated into amino acids via transfer RNA in the central dogma. tRNA is essential in translation as it connects the codon and the cognate amino acid. To reveal how the translation emerged in the prebiotic context, we start with the structure and dissection of tRNA, followed by the theory and hypothesis of tRNA and amino acid recognition. Last, we review how amino acids assemble on the tRNA and further form peptides. Understanding the origin of life will also promote our knowledge of artificial living systems.
... The stereochemical matching of amino acids and nucleotide triplets/anti-triplets gives insight into how the genetic code originated [47,48]. There is a predicted order in which amino acids were assigned to codons during the development of the genetic code [49]; in this order, amino acids that were integrated early into the genetic code tend to bind their codons whilst those that were integrated late tend to bind their anticodons [48]. The idea that translation was initially based on codon-amino acid affinities before it was mediated by tRNA adaptors may explain the direct interactions between mRNAs and their cognate proteins in modern cells [50]. ...
Article
It is not entirely clear why, at some stage in its evolution, terrestrial life adopted double-stranded DNA as the hereditary material. To explain this, we propose that small, double-stranded, polynucleotide circlets have special catalytic properties. We then use this proposal as the basis for a ‘view from here’ that we term the Circlet hypothesis as part of a broader Ring World. To maximize the potential explanatory value of this hypothesis, we speculate boldly about the origins of several of the fundamental characteristics and briefly describe the main methods or treatments applied. The principal prediction of the paper is that the highly constrained, conformational changes will occur preferentially in dsDNA, dsRNA and hybrid RNA-DNA circlets that are below a critical size (e.g., 306 bp) and that these will favor the polymerization of precursors into RNA and DNA. We can conclude that the Circlet hypothesis and the Ring World therefore have the attraction of offering the same solution to the fundamental problems that probably confront both the earliest cells and the most recent ones.
... Clusters in the conformation [4Fe-4S] coordinated by monomeric cysteine (Figure 3.30) were chosen for this experiment as evidence suggests that their abiotic formation in alkaline, anaerobic environments is possible. Cysteine is considered to be a late addition amino acid (Trifonov, 2004), and although its synthesis has been calculated to be thermodynamically unfavourable under alkaline conditions (Amend and Shock, 1998), recent work has reported high yields of cysteine via its prebiotic formation in water (Foden et al., 2020), ...
Thesis
There is little agreement on how life might have started on Earth. Following life as a guide, phylogenetic and comparative biochemical studies point to an autotrophic origin, with complexity accruing over time. In modern metabolism, the universal energy currency is adenosine triphosphate (ATP), which not only drives metabolism through phosphorylation and condensation reactions but is also key for the synthesis of the informational molecules RNA, DNA, and proteins. Such deep conservation suggests an early origin of ATP, before the emergence of genes or genetically-encoded macromolecular machines such as the ATPase. This thesis explores the plausibility of an early emergence of ATP and the role it might have had at the origin of life. I first confirm earlier work showing moderate (15-20%) ATP yield from the non-enzymatic phosphorylation of ADP by acetyl phosphate (AcP), before systematically exploring the prebiotic context for this synthesis. AcP is a universally conserved intermediate between acetyl-CoA and ATP, bridging between thioester and phosphate metabolism. I show that it is possible to form moderate yields of ATP in a variety of aqueous environments. The combination of AcP and the catalyst Fe³⁺ is surprisingly favoured. No other prebiotically relevant metal ion, mineral, and phosphorylating agent tested here favoured ADP phosphorylation. Nor could AcP phosphorylate other nucleoside diphosphates to the triphosphates. I demonstrate a reaction mechanism that implicates the N7 and the N6 amino group on the adenine ring in the Fe³⁺-catalysed phosphorylation of ADP, implying a deep significance of the adenine base. Finally, I explore how ATP might have facilitated condensation reactions to generate nucleotide and peptide polymers in an aqueous environment, using life as a guide. These efforts met with limited success, confirming that condensation reactions are not facile in water. 
Nonetheless, my findings overall support the approach of taking life as a guide to study the origin of life.
... The recruitment order of the 20 amino acids from No.1 to No.20 can be obtained by the roadmap (Fig 3a, 9a), which meets the basic requirement that Phase I amino acids appeared earlier than the Phase II amino acids (Wong 1975; Wong and Lazcano 2009). The species with complete genome sequences are sorted by the order R 10/10 according to their amino acid frequencies, where the order R 10/10 is defined as the ratio of the average amino acid frequencies for the last 10 amino acids to that for the first 10 amino acids (Li and Zhang 2009; Trifonov 2000; Trifonov 2004; Trifonov et al. 2001; Trifonov et al. 2006). Along the evolutionary direction indicated by the increasing R 10/10, the amino acid frequencies vary in different monotonous manners for the 20 amino acids respectively (Fig 9). ...
Preprint
Nirenberg’s genetic code chart shows a profound correspondence between codons and amino acids. The aim of this article is to try to explain the primordial formation of the codon degeneracy. It remains a puzzle how informative molecules arose from the supposed prebiotic random sequences. If introducing an initial driving force based on the relative stabilities of triplex base pairs, the prebiotic sequence evolution became innately nonrandom. Thus, the primordial assignment of the 64 codons to the 20 amino acids has been explained in detail according to base substitutions during the coevolution of tRNAs with aaRSs; meanwhile, the classification of aaRSs has also been explained.
... 12 The building blocks of proteins also appeared at different stages of evolution. Eck and Dayhoff found, for example, that the earliest four amino acids to appear were Ala, Asp, Ser, and Gly. 8 Based on multiple criteria, 13 the order of appearance of the amino acids in life is now accepted to be: Gly, Ala, Asp, Val, Pro, Ser, Glu, Leu/Thr, Arg, Ile/Gln/Asn, His, Lys, Cys, Phe, Tyr, Met, Trp. Interestingly, as many others have noticed, this suggests that the earliest peptides were disordered because the most primitive amino acids are not structure-promoting. ...
Article
Aromatic residues appeared relatively late in the evolution of protein sequences to stabilize the globular proteins' folding core and are less abundant in the intrinsically disordered regions (IDRs). Recent advances in protein liquid–liquid phase separation (LLPS) studies have also shown that aromatic residues in IDRs often act as “stickers” to promote multivalent interactions in forming higher‐order oligomers. To study how general these structure‐promoting residues are in IDRs, we compared levels of sequence disorder in RNA binding proteins (RBPs), which are often found to undergo LLPS, and the human proteome. We found that aromatic residues appear more frequently than expected in the IDRs of RBPs and, through multiple sequence alignment analysis, those aromatic residues are often conserved among chordates. Using TDP‐43, FUS, and some other well‐studied LLPS proteins as examples, the conserved aromatic residues are important to their LLPS‐related functions. These analyses suggest that aromatic residues may have contributed twice to evolution: stabilizing structured proteins and assembling biomolecular condensates.
... Compared to a more common residue and/or a residue encoded by several codons, this would be a better indication to signify a start residue. A notable difficulty here is that the consensus on the order of introduction of the 20 usual residues in the genetic code ranks methionine as one of the very last amino acids to enter the code [13]. There would therefore have been a period with another start residue or without a defined start residue. ...
Article
Unlike its shorter analog, cysteine, and its methylated derivative, methionine, homocysteine is not today a proteinogenic amino acid. However, this thiol containing amino acid is capable of forming an activated species intramolecularly. Its thiolactone could have made it an interesting molecular building block at the origin of life on Earth. Here we study the cyclization of homocysteine in water and show theoretically and experimentally that in an acidic medium the proportion of thiolactone is significant. This thiolactone easily reacts with amino acids to form dipeptides. We envision that these reactions may help interpret why a methionine residue is introduced at the start of all protein synthesis.
... It is possible that RNA rings, which share some homology with tRNA loops, are analogs of the first, earlier RNAs of the consensus type that mimic primordial, minimally coding, self-replicating RNAs (presumably proto-tRNAs) and that possess a corresponding anticodon and adaptor properties [5,6]. There are more than 40 hypotheses on the order in which amino acids were integrated into the genetic code [22], each of which was tested and ranked against each of the 22 nucleotide positions in the RNA rings, for both homology hypotheses (with tRNA and with the origin of replication) and for both types of deamination (A→G and C→T). At the same time, there is a view that tRNAs, before becoming specific translational adaptors, were used in replication [23] (which does not contradict the notion of a relict mechanism of joint [24] or amino acid-guided [25] replication/translation). ...
Article
A special hypothetical mechanism of variable Individual Epitope Reverse Translation (at least 2 types) of eukaryotic cell is probably capable of reproducing primary linear (sense- / antisense-, CRISPR-, repeat-like, etc.) and secondary conformational (similar to quadruplexes, RNA-hairpins, RNA-ring-structures; etc.) oligonucleotide structures formed in the mitochondrial membrane-bound supramolecular and containing nanomolecular inclusions hypothetical particle of the retranslosome. This is the so-called nucleic acid equivalents of protein epitope, oligo-NEs, monomeric in ~15–30 and oligomeric in ~(15–30)n nucleotides, potentially capable of participating in the regulation of expression (activation, termination, switching) and modification of genes / genome, as well as in the creation of protein / enzyme-containing nucleoprotein platform- / module- / complex-like formations in normal, pathologically altered (in particular, tumor) and virus-infected cells. Recently, in the GenBank databases, they are shown realistically and built / calculated bioinformatically in silico so-called minimum theoretical of 22 nucleotides and longer RNA-ring (stem-loop) structures, the composition of which depends, firstly, on constantly occurring chemical and enzymatic processes (including deamination mutations), and the properties of which, secondly, link, respectively, with the early (era of the so-called circular code) and later (era of modern universal coding, including the circular code as a component) evolutionary periods of the formation of the whole genetic code. It is generally accepted that the emergence and formation, respectively, of early evolutionary (proto-tRNA, proto-rRNA) and modern variants of molecules of the translational machine of mitochondria and cytoplasm is associated with stem-loop RNA-ring structures, similar to independently proposed oligo-NEs, such as tRNA, rRNA and gene products of ribosomal and other proteins.
... It is possible that RNA rings, which share some homology with tRNA loops, are analogs of the first, earlier RNAs of the consensus type that mimic primordial, minimally coding, self-replicating RNAs (presumably proto-tRNAs) and that possess a corresponding anticodon and adaptor properties [5,6]. There are more than 40 hypotheses on the order in which amino acids were integrated into the genetic code [22], each of which was tested and ranked against each of the 22 nucleotide positions in the RNA rings, for both homology hypotheses (with tRNA and with the origin of replication) and for both types of deamination (A→G and C→T). At the same time, there is a view that tRNAs, before becoming specific translational adaptors, were used in replication [23] (which does not contradict the notion of a relict mechanism of joint [24] or amino acid-guided [25] replication/translation). ...
Article
A special hypothetical mechanism of variable epitope-by-epitope reverse translation (of at least 2 types) of an individual epitope of the eukaryotic cell is probably capable of reproducing primary linear (sense/antisense-type, CRISPR-type, repeat-like, etc.) and secondary conformational (similar to quadruplex, RNA-hairpin, RNA-ring structures, etc.) oligonucleotide structures. These structures are formed in the retranslosome, a hypothetical mitochondrial membrane-bound supramolecular particle containing nanomolecular inclusions. They are the so-called nucleic acid equivalents (NEs) of a protein epitope, oligo-NEs, monomeric at ~15–30 and oligomeric at ~(15–30)n nucleotides, potentially capable of participating in the regulation of expression (activation, termination, switching) and modification of genes/genome, as well as in the creation of protein/enzyme-containing nucleoprotein platform-, module-, and complex-like formations in normal cells, in some pathologically altered (in particular, tumor) cells, and in virus-infected cells. Recently, minimal theoretical RNA-ring (stem-loop) structures of ~22 nucleotides and longer have been shown in actual GenBank databases and constructed/calculated bioinformatically in silico. Their composition depends on constantly occurring chemical and enzymatic processes (including deamination mutations), and their properties are linked, respectively, to the early (era of the circular code) and later (era of modern universal coding, which includes the circular code as a component) evolutionary periods of the formation of the genetic code. It is generally accepted that the emergence and establishment of, respectively, early evolutionary (proto-tRNA, proto-rRNA) and modern variants of the molecular components of the translational machinery of mitochondria and cytoplasm, such as tRNA, rRNA, and mRNA of ribosome-associated protein genes, are associated with RNA-ring stem-loop structures similar to the earlier and independently proposed oligo-NEs.
... Thus, the current situation, when any of the three positions of the triplet can be occupied with any of the nucleotides, is to be considered a result of an evolutionary process. Trifonov (2004, 2009) presents the extended version of the two-letter theory. His main emphasis was on the rules of codon transformation. ...
Article
We address issues of description of the origin and evolution of the genetic code from a semiotics standpoint. Developing the concept of codepoiesis introduced by Barbieri, a new idea of semio-poiesis is proposed. Semio-poiesis, a recursive auto-referential processing of semiotic system, becomes a form of organization of the bio-world when and while notions of meaning and aiming are introduced into it. The description of the genetic code as a semiotic system (grammar and vocabulary) allows us to apply the method of internal reconstruction to it: on the basis of heterogeneity and irregularity of the current state, to explicate possible previous states and various ways of forming mechanisms of coding and textualization. The revealed patterns are consistent with hypotheses about the origin and evolution of the genetic code.
... Next, the genetic code expanded to 16 codons (GGC, GGG, GCC, GCG, GAC, GAG, GUC, GUG, CUC, CUG, CCC, CCG, CAC, CAG, CGC, and CGG) encoding 10 amino acids (Gly, Ala, Asp, Val, Glu, Leu, Pro, His, Gln, and Arg) [119]. The order of the appearance of amino acids in the prebiotic GADV world was approximately the same as the evolutionary order of amino acid formation established by Trifonov [120] based on 60 different criteria. ...
Article
The origin of the genetic code and the translation system is probably the central and most difficult problem in investigations of the origin of life and one of the most complex problems in evolutionary biology in general. There are multiple hypotheses on the emergence and development of existing genetic systems that propose mechanisms for the origin and early evolution of the genetic code, as well as for the emergence of replication and translation. Here, we discuss the best-known of these hypotheses, although none of them provides a description of the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is the currently prevailing scientific idea on the early evolution of biological and prebiological structures, the main advantage of which is the assumption that RNAs as the first living systems were self-sufficient, i.e., capable of functioning as both catalysts and templates. However, this hypothesis also has significant limitations. In particular, no ribozymes with processive polymerase activity have yet been discovered or synthesized. Taking into account the mutual dependence of proteins and nucleic acids on each other in the current world, many authors propose early evolution scenarios based on the coevolution of these two classes of organic molecules. They postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world hypothesis, according to which the emergence of translation was preceded by the era of self-replicating RNAs. Although such scenarios are less parsimonious from the evolutionary point of view, since they require simultaneous emergence and evolution of two classes of organic molecules, as well as the emergence of synchronized replication and translation, their major advantage is that they explain the development of processive and much more accurate protein-dependent replication.
... Quantum chemical calculations of amino acids, and biochemical experiments with amino acids on animal membrane surfaces, suggested that tyrosine and tryptophan were added to the genetic code to prevent oxidative stress during the rise in concentration of molecular oxygen in the biosphere (Granold et al. 2018). The order of recruitment of amino acids into the protein synthesis system has also been proposed, based on amino acid properties (Francis 2013), on the amino acid frequency in ancestral sequences (Jordan et al. 2005), and on consideration of 60 other factors (Trifonov 2004). However, more sequence-based evidence about the route of evolution is required. ...
Article
Extant organisms commonly use 20 amino acids in protein synthesis. In the translation system, aminoacyl-tRNA synthetase (ARS) selectively binds an amino acid and transfers it to the cognate tRNA. It is postulated that the amino acid repertoire of ARS expanded during the development of the translation system. In this study we generated composite phylogenetic trees for seven ARSs (SerRS, ProRS, ThrRS, GlyRS-1, HisRS, AspRS, and LysRS) which are thought to have diverged by gene duplication followed by mutation, before the evolution of the last universal common ancestor. The composite phylogenetic tree shows that the AspRS/LysRS branch diverged from the other five ARSs at the deepest node, with the GlyRS/HisRS branch and the other three ARSs (ThrRS, ProRS and SerRS) diverging at the second deepest node. ThrRS diverged next, and finally ProRS and SerRS diverged from each other. Based on the phylogenetic tree, sequences of the ancestral ARSs prior to the evolution of the last universal common ancestor were predicted. The amino acid specificity of each ancestral ARS was then postulated by comparison with amino acid recognition sites of ARSs of extant organisms. Our predictions demonstrate that ancestral ARSs had substantial specificity and that the number of amino acid types amino-acylated by proteinaceous ARSs was limited before the appearance of a fuller range of proteinaceous ARS species. From an assumption that 10 amino acid species are required for folding and function, proteinaceous ARS possibly evolved in a translation system composed of preexisting ribozyme ARSs, before the evolution of the last universal common ancestor.
... The genetic code then expanded to 16 codons (GGC, GGG, GCC, GCG, GAC, GAG, GUC, GUG, CUC, CUG, CCC, CCG, CAC, CAG, CGC, and CGG) encoding 10 amino acids (Gly, Ala, Asp, Val, Glu, Leu, Pro, His, Gln, and Arg) [119]. The order of appearance of amino acids in the prebiotic GADV world approximately coincides with the evolutionary order of amino acid formation established by Trifonov [120] on the basis of 60 different criteria. ...
Article
The origin of the genetic code and the translation system is possibly the central and most difficult problem in the study of the origin of life and one of the most difficult in all of evolutionary biology. There are many hypotheses on the emergence and development of modern genetic systems that address the origin and early evolution of the genetic code, as well as the emergence of replication and translation. The most widely known hypotheses are discussed in this review. However, none of these hypotheses describes all stages of the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is currently the prevailing scientific idea on the early evolution of biological and prebiological entities. Its main advantage is that it proposes RNAs as the first living systems, molecules that are self-sufficient with respect to reproduction and able to function both as the catalytic component of the system and as the template. However, it also has significant shortcomings. In particular, a processive ribozyme polymerase has so far been neither discovered nor obtained experimentally. Given the mutual dependence of proteins and nucleic acids in the modern world, many authors propose scenarios of early evolution based on the coevolution of these two classes of organic molecules. Such hypotheses postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world, where the emergence of translation was preceded by an era of self-replicating RNAs. Although such scenarios are less parsimonious from an evolutionary point of view, since they require the simultaneous emergence and evolution of two classes of organic molecules, as well as the synchronization in time of the emergence of replication and translation, their major advantage is that they provide for the immediate development of a much more accurate and processive protein-based replication.
... The recruitment order of the 20 amino acids from No. 1 to No. 20 can be obtained by the roadmap (Figures 3a and 9), which meets the basic requirement that Phase I amino acids appeared earlier than the Phase II amino acids [1,2]. The species with complete genome sequences are sorted by the order R 10/10 according to their amino acid frequencies, where the order R 10/10 is defined as the ratio of the average amino acid frequencies for the last 10 amino acids to that for the first 10 amino acids [8,36,63-65]. Along the evolutionary direction indicated by the increasing R 10/10, the amino acid frequencies vary in different monotonous manners for the 20 amino acids, respectively (Figure 9). ...
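The ratio R 10/10 defined in this excerpt is simple to compute once a recruitment order is fixed. The sketch below is illustrative only: the function name `r_10_10` is hypothetical, and the order used is the consensus chronology from the abstract at the top of this page (with ties taken in the order printed), not the roadmap order of the citing paper, which is not given here.

```python
# Consensus order from the abstract above, used here only as an
# illustrative stand-in for the citing paper's own recruitment order.
ILLUSTRATIVE_ORDER = list("GADVPSELTRIQNHKCFYMW")


def r_10_10(freqs, order=ILLUSTRATIVE_ORDER):
    """Ratio of the mean frequency of the last 10 amino acids in
    `order` to the mean frequency of the first 10.

    `freqs` maps one-letter amino acid codes to frequencies.
    """
    if len(order) != 20 or len(set(order)) != 20:
        raise ValueError("order must list 20 distinct amino acids")
    mean_first = sum(freqs[a] for a in order[:10]) / 10
    mean_last = sum(freqs[a] for a in order[10:]) / 10
    return mean_last / mean_first


# A proteome with uniform composition has R 10/10 equal to 1.
uniform = {a: 0.05 for a in ILLUSTRATIVE_ORDER}
print(r_10_10(uniform))
```

A proteome enriched in late-recruited amino acids would give a value above 1, and one enriched in early amino acids a value below 1.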
Article
Full-text available
Nirenberg’s genetic code chart shows a profound correspondence between codons and amino acids. The aim of this article is to explain the primordial formation of codon degeneracy. It remains a puzzle how informative molecules arose from the supposed prebiotic random sequences. If an initial driving force based on the relative stabilities of triplex base pairs is introduced, prebiotic sequence evolution becomes innately nonrandom. On this basis, the primordial assignment of the 64 codons to the 20 amino acids is explained in detail in terms of base substitutions during the coevolution of tRNAs with aaRSs; the classification of aaRSs is explained as well.
... It is interesting to notice that Trp was one of the latest amino acids incorporated into the genetic code. While different contradicting theories were developed to rationalise this peculiar event, its late occurrence has drawn unanimous consensus (Davis 2002; Trifonov 2004; Wong 2005; José et al. 2009, 2011; Palacios-Pérez and José 2019). Albeit a direct correlation between these hypotheses and our energetic results cannot be unequivocally drawn, it is striking that while for the majority of cases the α-amino acid form is the most stable among their isomers, Trp stands out being in the 55th absolute position (although starting from 97,406 structures). ...
Article
Full-text available
The secular debate on the origin of life on our planet represents one of the open challenges for the scientific community. In this endeavour, chemistry has a pivotal role in disclosing novel scenarios that allow us to understand how the formation of simple organic molecules would be possible in the early primitive geological ages of Earth. Amino acids play a crucial role in biological processes. They are known to be formed in experiments simulating primitive conditions and were found in meteoric samples retrieved throughout the years. Understanding their formation is a key step for prebiotic chemistry. Following this reasoning, we performed a computational investigation over 100,000 structural isomers of natural amino acids. The results we have found suggest that natural amino acids are among the most thermodynamically stable structures and, therefore, one of the most probable ones to be synthesised among their possible isomers.
Article
Full-text available
We discuss the influence of linguistic metaphors on understanding genetic information processing and reveal the semantic roots and heuristic value of metaphors related to genetic reading. They are derived from the equation of life with a book; thus, genetic information processing is understood as a set of operations on text. Some essential gene expression features are shared with human reading: a faculty can identify the biochemical sequences based on their functions in an abstract system and distinguish between a type and its context-dependent tokens. Genetic information can be considered as a dual biochemical and semiotic entity, and as a complement to biochemical methods a semiotic apparatus can also be applied.
Article
How life developed in its earliest stages is a central but notoriously difficult question in science. The earliest lifeforms likely used a reduced set of codon sequences that were progressively completed over time, driven by chemical, physical, and combinatorial constraints. However, despite its importance for prebiotic chemistry, UV radiation has not been considered a selection pressure for the evolution of early codon sequences. In this proof-of-principle study, we quantified the UV susceptibility of large pools of DNA protogenomes and tested the timing of evolutionary incorporation of codon sequences using a Monte Carlo method utilizing sequence-context-dependent damage rates previously determined by high throughput sequencing experiments. We traced the UV-radiation selection pressure on early protogenomes comprising a limited number of codon sequences to late protogenomes with access to all codons. The modeling showed that in just minutes under early sunlight, the choice of the first codons determined whether most of the protogenomes remained intact or became damaged entirely. The results correlated with earlier chemical models of the evolution of the genetic code. Our results show how UV could have played a crucial role in the evolution of the early genetic code for a DNA-based genome and provide the concept for future RNA-based studies.
Article
The current “consensus” order in which amino acids were added to the genetic code is based on potentially biased criteria, such as the absence of sulfur-containing amino acids from the Urey–Miller experiment which lacked sulfur. More broadly, abiotic abundance might not reflect biotic abundance in the organisms in which the genetic code evolved. Here, we instead identify which protein domains date to the last universal common ancestor (LUCA) and then infer the order of recruitment from deviations of their ancestrally reconstructed amino acid frequencies from the still-ancient post-LUCA controls. We find that smaller amino acids were added to the code earlier, with no additional predictive power in the previous consensus order. Metal-binding (cysteine and histidine) and sulfur-containing (cysteine and methionine) amino acids were added to the genetic code much earlier than previously thought. Methionine and histidine were added to the code earlier than expected from their molecular weights and glutamine later. Early methionine availability is compatible with inferred early use of S-adenosylmethionine and early histidine with its purine-like structure and the demand for metal binding. Even more ancient protein sequences—those that had already diversified into multiple distinct copies prior to LUCA—have significantly higher frequencies of aromatic amino acids (tryptophan, tyrosine, phenylalanine, and histidine) and lower frequencies of valine and glutamic acid than single-copy LUCA sequences. If at least some of these sequences predate the current code, then their distinct enrichment patterns provide hints about earlier, alternative genetic codes.
Article
Full-text available
Cys is one of the least abundant amino acids in proteins. However, it is often highly conserved and is usually found in important structural and functional regions of proteins. Its unique chemical properties allow it to undergo several post-translational modifications, many of which are mediated by reactive oxygen, nitrogen, sulfur, or carbonyl species. Thus, in addition to their role in catalysis, protein stability, and metal binding, Cys residues are crucial for the redox regulation of metabolism and signal transduction. In this review, we discuss Cys post-translational modifications (PTMs) and their role in plant metabolism and signal transduction. These modifications include the oxidation of the thiol group (S-sulfenylation, S-sulfinylation and S-sulfonylation), the formation of disulfide bridges, S-glutathionylation, persulfidation, S-cyanylation S-nitrosation, S-carbonylation, S-acylation, prenylation, CoAlation, and the formation of thiohemiacetal. For each of these PTMs, we discuss the origin of the modifier, the mechanisms involved in PTM, and their reversibility. Examples of the involvement of Cys PTMs in the modulation of protein structure, function, stability, and localization are presented to highlight their importance in the regulation of plant metabolic and signaling pathways.
Article
Ribosomal translation at the origin of life requires controlled aminoacylation to produce mono-aminoacyl esters of tRNAs. Herein, we show that transient annealing of short RNA oligo:amino acid mixed anhydrides to an acceptor strand enables the sequential transfer of aminoacyl residues to the diol of an overhang, first forming aminoacyl esters then peptidyl esters. Using N-protected aminoacyl esters prevents unwanted peptidyl ester formation in this manner. However, N-acyl-aminoacyl transfer is not stereoselective.
Chapter
Life's origin is an enigma. Mankind has been pondering as to how it all began for millennia, yet are we any closer to uncovering the answer to this enigma? It would seem not, but we are slowly and surely edging towards discovery of the processes and mechanics by which life emerged on Earth. There are more than a couple of dozen hypotheses which claim to have the answer, but in reality, there is no absolute front runner. We have categorised these hypotheses under the following four banners: metabolism, genetic, proteins and vesicles first. In this chapter we strive to demonstrate how they conflict with one another and to this effect we have brought into focus both the top‐down and bottom‐up approaches to the question of the origin of life in general, as well as answering the question as to which came first, chemolithoautotrophs and photolithoautotrophs? In addition, the part played by viruses (in particular the RNA ones) during the origin of life is addressed.
Chapter
Full-text available
There are many Archaea with flat, polygonal shapes, resembling shaped droplets in form. We hypothesized that the transition from abiogenesis to a living cell started with shaped droplets. We begin with proteins that have a transmembrane domain, such as the S layer, which is a common feature of Archaea and Bacteria. We show how the anchor transmembrane domain may have been hydrophobically “selected” from random mixed chirality peptides by the cell membrane itself when it was thinner, possibly produced from short meteorite amphiphiles. These peptides may have been the first proteins, formed mostly from three “core” amino acids that we deduce from t‐RNA phylogeny: alanine, glycine, and isoleucine. They may have formed by PCR‐like wet/dry cycles along with RNA, which may have protected them from hydrolysis as they grew, causing the membrane to thicken to its present thickness, a positive feedback, abiotic process. The possibly daily flattening and swelling of vesicles as the temperature changed at night, with 10^11 day/night cycles available, might have aided the PCR‐like activity. Thus, the peptide/protein world may have been simultaneous with the RNA world. The juxtaposition of peptides with RNA may have led to the genetic code. Interactions between membrane‐selected peptides and amphiphiles may have led to chirality of both.
Article
Full-text available
Simple Summary: The relative rates of amino acid substitution over evolutionary time reflect the chemical properties of amino acids. Substitutions that result in an amino acid similar to an ancestral residue accumulate more rapidly than those resulting in a dissimilar amino acid. The substitution rates for each amino acid pair are the parameters in models of evolutionary change for proteins. Although the best-fitting model of protein evolution is known to differ among taxa, a comprehensive picture of model changes across the tree of life is not available. In principle, models of protein change might reflect evolutionary history (i.e., closely related taxa have similar models) or the environment (i.e., taxa living in similar environments have similar models). We estimated models of amino acid evolution for organisms across the tree of life, finding evidence that history and the environment have both contributed to model differences. Bacterial models differed from archaeal and eukaryotic models. Models for Halobacteriaceae (archaea that live in highly saline environments) and Thermoprotei (a group of thermophilic archaea) were found to be very distinctive. The rates of substitution for pairs of aromatic amino acids were especially variable. Overall, these results paint a picture of the “evolutionary model space” for proteins across the tree of life.
Abstract: The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the “model space” for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
Preprint
Full-text available
The central dogma of molecular biology dictates that, with only a few exceptions, information proceeds from DNA to protein through an RNA intermediate. Examining the enigmatic steps from prebiotic to biological chemistry, we take another road, suggesting that primordial peptides acted as templates for the self-assembly of the first nucleic acid polymers. Arguing in favour of a sort of archaic “reverse translation” from proteins to RNA, our basic premise is a Hadean Earth where key biomolecules such as amino acids, polypeptides, purines, pyrimidines, nucleosides and nucleotides were available under different prebiotically plausible conditions, including meteorite delivery, shallow pond and hydrothermal vent scenarios. Supporting a protein-first scenario as an alternative to the RNA world hypothesis, we propose the primeval occurrence of short peptides termed “selective amino acid- and nucleotide-matching oligopeptides” (henceforward SANMAOs) that noncovalently bind at the same time the polymerized amino acids and the single nucleotides dispersed in the prebiotic milieu. We describe the chemical features of this hypothetical oligopeptide, its biological plausibility and its virtues from an evolutionary perspective. We provide a theoretical example of SANMAO’s selective pairing between amino acids and nucleosides, simulating a poly-Glycine peptide that acts as a template to build a purinic chain corresponding to glycine’s extant triplet codon GGG. Further, we discuss how SANMAO might have promoted the formation of low-fidelity polymerized RNA strands, well before the appearance of the accurate transmission of genetic material ensured by the current translation apparatus.
Preprint
The DNA sequences available in the prebiotic era were the genomic building blocks of the first life forms on Earth and have therefore been a matter of intense debate [1,2]. On the surface of the Early Earth, ultraviolet (UV) light is a key energy source [3], which is known to damage nucleic acids [4]. However, a systematic study of the sequence selectivity upon UV exposure under Early Earth conditions is still missing. In this work, we quantify the UV stability of all possible canonical DNA sequences and derive information on codon appearance under UV irradiation as selection pressure. We irradiate a model system of random 8mers at 266 nm and determine its UV stability via next-generation sequencing. As a result, we obtain the formation rates of the dominant dimer lesions as a function of their neighboring sequences and find a strong sequence selectivity. On the basis of our experimental results, we simulate the photodamage of short proto-genomes of 150 bases length by a Monte Carlo approach. Our results strongly argue for UV compatibility of early life and allow the ranking of codon evolutionary models with respect to their UV resistance.
Chapter
How life originated from the inanimate mixture of organic and inorganic compounds on the primordial Earth remains one of the great unknowns in science. This origin of life, or abiogenesis, continues to be examined, both theoretically and experimentally, in the context of the conditions and materials required for natural life to have begun on Earth. This book provides a broad but in-depth analysis of the latest discoveries in prebiotic chemistry from the microscopic to the macroscopic scale, utilising experimental insight to provide a bottom-up approach to plausibly explaining how life arose. With contributions from global leaders, this book is an ideal reference for postgraduate students and a single source of comprehensive information on the latest technical and theoretical advancements for researchers in a variety of fields from astrochemistry and astrophysics to organic chemistry and evolution.
Article
Full-text available
Comparative path lengths in amino acid biosynthesis and other molecular indicators of the timing of codon assignment were examined to reconstruct the main stages of code evolution. The codon tree obtained was rooted in the 4 N-fixing amino acids (Asp, Glu, Asn, Gln) and 16 triplets of the NAN set. This small, locally phased (commaless) code evidently arose from ambiguous translation on a poly(A) collector strand, in a surface reaction network. Copolymerisation of these amino acids yields polyanionic peptide chains, which could anchor uncharged amide residues to a positively charged mineral surface. From RNA virus structure and replication in vitro, the first genes seemed to be RNA segments spliced into tRNA. Expansion of the code reduced the risk of mutation to an unreadable codon. This step was conditional on initiation at the 5′-codon of a translated sequence. Incorporation of increasingly hydrophobic amino acids accompanied expansion. As codons of the NUN set were assigned most slowly, they received the most nonpolar amino acids. The origin of ferredoxin and Gln synthetase was traced to mid-expansion phase. Surface metabolism ceased by the end of code expansion, as cells bounded by a proteo-phospholipid membrane, with a protoATPase, had emerged. Incorporation of positively charged and aromatic amino acids followed. They entered the post-expansion code by codon capture. Synthesis of efficient enzymes with acid–base catalysis was then possible. Both types of aminoacyl-tRNA synthetases were attributed to this stage. tRNA sequence diversity and error rates in RNA replication indicate the code evolved within 20 million yr in the preIsuan era. These findings on the genetic code provide empirical evidence, from a contemporaneous source, that a surface reaction network, centred on C-fixing autocatalytic cycles, rapidly led to cellular life on Earth.
Article
Full-text available
Twelve nonprotein amino acids appear to be present in the Murchison meteorite. The identity of eight of them has been conclusively established as N-methylglycine, beta-alanine, 2-methylalanine, alpha-amino-n-butyric acid, beta-amino-n-butyric acid, gamma-amino-n-butyric acid, isovaline, and pipecolic acid. Tentative evidence is presented for the presence of N-methylalanine, N-ethylglycine, beta-aminoisobutyric acid, and norvaline. These amino acids appear to be extraterrestrial in origin and may provide new evidence for the hypothesis of chemical evolution.
Article
Full-text available
It is suggested that protein synthesis may have begun without even a primitive ribosome if the primitive tRNA could take up two configurations and could bind to the messenger RNA with five base-pairs instead of the present three. This idea would impose base sequence restrictions on the early messages and on the early genetic code such that the first four amino acids coded were glycine, serine, aspartic acid and asparagine. A possible mechanism is suggested for the polymerization of the early message.
Article
Full-text available
The genetic code, formerly thought to be frozen, is now known to be in a state of evolution. This was first shown in 1979 by Barrell et al. (G. Barrell, A. T. Bankier, and J. Drouin, Nature [London] 282:189-194, 1979), who found that the universal codons AUA (isoleucine) and UGA (stop) coded for methionine and tryptophan, respectively, in human mitochondria. Subsequent studies have shown that UGA codes for tryptophan in Mycoplasma spp. and in all nonplant mitochondria that have been examined. Universal stop codons UAA and UAG code for glutamine in ciliated protozoa (except Euplotes octacarinatus) and in a green alga, Acetabularia. E. octacarinatus uses UAA for stop and UGA for cysteine. Candida species, which are yeasts, use CUG (leucine) for serine. Other departures from the universal code, all in nonplant mitochondria, are CUN (leucine) for threonine (in yeasts), AAA (lysine) for asparagine (in platyhelminths and echinoderms), UAA (stop) for tyrosine (in planaria), and AGR (arginine) for serine (in several animal orders) and for stop (in vertebrates). We propose that the changes are typically preceded by loss of a codon from all coding sequences in an organism or organelle, often as a result of directional mutation pressure, accompanied by loss of the tRNA that translates the codon. The codon reappears later by conversion of another codon and emergence of a tRNA that translates the reappeared codon with a different assignment. Changes in release factors also contribute to these revised assignments. We also discuss the use of UGA (stop) as a selenocysteine codon and the early history of the code.
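The codon reassignments listed in this abstract can be modelled as small overrides on top of the standard code table. The sketch below is a minimal illustration under that assumption: the names `translate`, `HUMAN_MITO` and `CILIATE` are hypothetical, only reassignments named in the abstract are encoded, and stop codons are simply rendered as `*` rather than terminating translation.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering;
# '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Overrides taken from the abstract above (illustrative subset).
HUMAN_MITO = {"AUA": "M", "UGA": "W"}   # Ile -> Met, stop -> Trp
CILIATE = {"UAA": "Q", "UAG": "Q"}      # stop -> Gln


def translate(rna, overrides=None):
    """Translate an RNA string codon by codon, applying any overrides."""
    code = {**STANDARD, **(overrides or {})}
    codons = (rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3))
    return "".join(code[c] for c in codons)


message = "AUGAUAUGA"
print(translate(message))              # standard code
print(translate(message, HUMAN_MITO))  # human mitochondrial reading
```

Keeping each variant as a short dictionary of differences makes it easy to see how few assignments separate a variant code from the universal one.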
Article
Full-text available
Pheromone 3 mRNA of the ciliate Euplotes octocarinatus contains three in-frame UGA codons that are translated as cysteines. This was revealed from cDNA sequencing and from plasma desorption mass spectrometry of cleaved pheromone 3 in connection with pyridylethylation of the fragments. N-terminal sequence analysis of carboxymethylated protein confirmed this conclusion for the first of the three UGA codons. Besides UGA the common cysteine codons UGU and UGC are also used to encode cysteine. UAA functions as a termination codon. No UAG codon was found. In connection with results reported for other ciliates, this suggests that the role of the classic termination codons had not yet been established when the ciliates started to diverge from other eukaryotes.
Article
Full-text available
Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a "substitution score matrix" that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a "log-odds" matrix, with a specific target distribution for aligned pairs of amino acid residues. In the light of information theory, it is possible to express the scores of a substitution matrix in bits and to see that different matrices are better adapted to different purposes. The most widely used matrix for protein sequence comparison has been the PAM-250 matrix. It is argued that for database searches the PAM-120 matrix generally is more appropriate, while for comparing two specific proteins with suspected homology the PAM-200 matrix is indicated. Examples discussed include the lipocalins, human alpha 1 B-glycoprotein, the cystic fibrosis transmembrane conductance regulator and the globins.
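The log-odds interpretation mentioned in this abstract can be written as a one-line formula: the score for aligning residues i and j, in bits, is log2(q_ij / (p_i p_j)), where q_ij is the target frequency of the aligned pair and p_i, p_j are background frequencies. The sketch below assumes this standard form with no extra scaling factor; the function name `log_odds_bits` and the example frequencies are made up for illustration and are not taken from any PAM matrix.

```python
import math


def log_odds_bits(q_ij, p_i, p_j):
    """Score, in bits, for aligning residues i and j.

    q_ij: target frequency of the aligned pair (i, j)
    p_i, p_j: background frequencies of the two residues
    """
    if min(q_ij, p_i, p_j) <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(q_ij / (p_i * p_j))


# Hypothetical frequencies: a pair seen twice as often as chance
# scores about +1 bit; a pair seen at chance level scores about 0.
print(log_odds_bits(0.02, 0.1, 0.1))
print(log_odds_bits(0.01, 0.1, 0.1))
```

Pairs observed less often than expected by chance receive negative scores, which is what lets a local alignment terminate.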
Article
Full-text available
Glutathione peroxidase (GSHPx) is an important selenium-containing enzyme which protects cells from peroxide damage and also has a role in leukotriene formation. We report the identification of a genomic recombinant as encoding the entire mouse GSHPx gene. Surprisingly, the selenocysteine in the active site of the enzyme is encoded by TGA: this has been confirmed by primer extension/dideoxy sequencing experiments using reticulocyte mRNA. The same site of transcription initiation is used in three tissues in which the GSHPx mRNA is expressed at high levels (erythroblast, liver and kidney). Like some other regulated 'house-keeping' genes, the GSHPx gene has Sp1 binding site consensus sequences but no 'ATA' and 'CAAT' consensus sequences upstream of the transcription initiation site. Moreover, there is a cluster of two Sp1 binding site consensus sequences and two SV40 core enhancer sequences in the 3' region of the gene, close to the previously mapped position of a DNase I-hypersensitive site found only in tissues expressing the GSHPx mRNA at high levels.
Article
Full-text available
The degeneracy rules of genetic code including the distribution of terminators have been deduced through the minimization of mutational deterioration (MD). The MD of a given group of codons is divided into three parts: transitional, transversional and wobble's. The averaged mutational deteriorations (AMD) of various amino acids have been proved in order of their degrees of irreplaceability.
Article
Full-text available
UGA is a nonsense or termination (opal) codon throughout prokaryotes and eukaryotes. However, mitochondria use not only UGG but also UGA as a tryptophan codon. Here, we show that UGA also codes for tryptophan in Mycoplasma capricolum, a wall-less bacterium having a genome only 20-25% the size of the Escherichia coli genome. This conclusion is based on the following evidence. First, the nucleotide sequence of the S3 and L16 ribosomal protein genes from M. capricolum includes UGA codons in the reading frames; they appear at positions corresponding to tryptophan in E. coli S3 and L16. Second, a tRNATrp gene and its product tRNA found in M. capricolum have the anticodon sequence 5' U-C-A 3', which can form a complementary base-pairing interaction with UGA.
Article
Full-text available
Analysis of an almost complete mammalian mitochondrial DNA sequence has identified 23 possible tRNA genes and we speculate here that these are sufficient to translate all the codons of the mitochondrial genetic code. This number is much smaller than the minimum of 31 required by the wobble hypothesis. For each of the eight genetic code boxes with four codons for one amino acid we find a single specific tRNA gene with T in the first (wobble) position of the anticodon. We suggest that these tRNAs with U in the wobble position can recognize all four codons in these genetic code boxes either by a "two out of three" base interaction or by U.N wobble.
Article
Full-text available
The most primitive code is assumed to be a GC code: GG coding for glycine, CC coding for proline, GC coding for alanine, CG coding for "arginine." The genetic code is assumed to have originated with the coupling of glycine to its anticodon CC mediated by a copper-montmorillonite. The polymerization of polyproline followed when it was coupled to its anticodon GG. In this case the aminoacyl-tRNA synthetase was a copper-montmorillonite. The first membrane is considered to be a beta sheet formed from polyglycine. As the code grew more complicated, the alternative hydrophobic-hydrophilic polypeptide (alanine-"arginine") was coded for by the alternating CG copolymer. This alternating polypeptide (ala-"arg") began to function as both a primitive membrane and as an aminoacyl-tRNA synthetase. The evolution of protein structure is tightly coupled to the evolution of the membrane. The alpha helix was evolved as lipids became part of the structure of biological membranes. The membrane finally became the fluid mosaic structure that is now universal.
Article
Full-text available
Two ideas have essentially been used to explain the origin of the genetic code: Crick's frozen accident and Woese's amino acid-codon specific chemical interaction. Whatever the origin and codon-amino acid correlation, it is difficult to imagine the sudden appearance of the genetic code in its present form of 64 codons coding for 20 amino acids without appealing to some evolutionary process. On the contrary, it is more reasonable to assume that it evolved from a much simpler initial state in which a few triplets were coding for each of a small number of amino acids. Analysis of genetic code through information theory and the metabolism of pyrimidine biosynthesis provide evidence that suggests that the genetic code could have begun in an RNA world with the two letters A and U grouped in eight triplets coding for seven amino acids and one stop signal. This code could have progressively evolved by making gradual use of letters G and C to end with 64 triplets coding for 20 amino acids and three stop signals. According to proposed evidence, DNA could have appeared after the four-letter structure was already achieved. In the newborn DNA world, T substituted U to get higher physicochemical and genetic stability.
Article
Full-text available
DNA sequences of the complete cytochrome b gene are shown to contain robust phylogenetic signal for the strepsirrhine primates (i.e., lemurs and lorises). The phylogeny derived from these data conforms to other molecular studies of strepsirrhine relationships despite the fact that uncorrected nucleotide distances are high for nearly all intrastrepsirrhine comparisons, with most in the 15%-20% range. Cytochrome b sequences support the hypothesis that Malagasy lemuriforms and Afro-Asian lorisiforms each comprise clades that share a sister-group relationship. A study (Adkins and Honeycutt 1994) of the cytochrome c oxidase subunit II (COII) gene placed one Malagasy primate (Daubentonia) at the base of the strepsirrhine clade, thereby suggesting a diphyletic Lemuriformes. The reanalysis of COII third-position transversions, either alone or in combination with cytochrome b third-position transversions, however, yields a tree that is congruent with phylogenetic hypotheses derived from cytochrome b and other genetic data sets.
Article
Before enzymes and templates: theory of surface metabolism.
Book
1. Origins of Life's Ingredients.- 1.1. The Setting.- 1.2. Prebiotic Syntheses (Stage I).- 1.3. Prebiotic Polymerization (Stage II).- 1.4. Summary.- 2. The Precellular, or Simple Interacting Systems, Level (Stage III).- 2.1. Synthetic Models of Protobionts.- 2.2. Autocatalysis.- 2.3. The Present Status of the Life Origins Problem-A Critical Assessment.- 3. The Genetic Mechanism: I. DNA, Nucleoids, and Chromatin.- 3.1. Introduction.- 3.2. The Focal Ingredients.- 3.3. The Key Macromolecule.- 3.4. Replication of DNA.- 3.5. Chromatin and the Chromosome.- 4. The Genetic Mechanism: II. the Cell's Employment of DNA.- 4.1. The Types of Ribonucleic Acid.- 4.2. Translation and Protein Synthesis.- 5. The Genetic Mechanism: III. Transcription, Processing, and an Analytical Synopsis.- 5.1. Transcription of the DNA Molecule.- 5.2. An Alternative Protein-Synthesizing System.- 5.3. An Annotated Synopsis-Summary and Analysis.- 6. Micromolecular Evolution-The Origin of the Genetic Code.- 6.1. Conceptual Approaches.- 6.2. Mathematical Concepts.- 6.3. Biochemical Approaches.- 6.4. A Biological Concept.- 7. The Transfer Ribonucleic Acids.- 7.1. The Characteristic Molecular Features of tRNAs.- 7.2. Codon-Anticodon Interactions.- 7.3. Summary of tRNA Structural Features.- 8. Reactive Sites and the Evolution of Transfer RNAs.- 8.1. Reactive Sites of tRNAs.- 8.2. Evolutionary Relations of tRNAs.- 8.3. Origin and Evolution of tRNA.- 9. The Genetic Mechanism of Viruses.- DNA Viruses.- 9.1. Double-Stranded DNA Viruses.- 9.2. Single-Stranded DNA Viruses.- RNA Viruses.- 9.3. Single-Stranded RNA Viruses.- 9.4. Double-Stranded RNA Viruses.- Proteinaceous Viruses.- Summary and Conclusions.- 10. The Origin of Early Life.- 10.1. A Preliminary Definition of Life.- 10.2. The Distinctive Characteristics of Viruses.- 10.3. Possible Steps in the Origins of Early Life.- References.
Article
The proposed model for a 'realistic hypercycle' is closely associated with the molecular organization of a primitive replication and translation apparatus. Hypercyclic organization offers selective stabilization and evolutive adaptation for all geno- and phenotypic constituents of the functionally linked ensemble. It originates in a molecular quasi-species and evolves by way of mutation and gene duplication to greater complexity. Its early structure appears to be reflected in: the assignment of codons to amino acids, in sequence homologies of tRNAs, in dual enzymic functions of replication and translation, and in the structural and functional organization of the genome of the prokaryotic cell.
Article
Pyrrolysine is a lysine derivative encoded by the UAG codon in methylamine methyltransferase genes of Methanosarcina barkeri. Near a methyltransferase gene cluster is the pylT gene, which encodes an unusual transfer RNA (tRNA) with a CUA anticodon. The adjacent pylS gene encodes a class II aminoacyl-tRNA synthetase that charges the pylT-derived tRNA with lysine but is not closely related to known lysyl-tRNA synthetases. Homologs of pylS and pylT are found in a Gram-positive bacterium. Charging a tRNACUA with lysine is a likely first step in translating UAG amber codons as pyrrolysine in certain methanogens. Our results indicate that pyrrolysine is the 22nd genetically encoded natural amino acid.
Article
Fundamental questions regarding the structure of the genetic code and origin of proteinous amino acids can be resolved through an understanding of the process by which the code evolved to accommodate increased variety of encoded amino acids.
Article
The fact that proteins contain only α-amino acids and that protein structure is determined by 3'→5' linked ribonucleotides is postulated to be the result of the copolymerization of these molecules in the prebiotic environment. Ribonucleotides therefore represent partial degradation products and proteins represent a side reaction developing from copolymerization. The basic structural unit of copolymerization is a nucleotide substituted with an amino acid at the 2' position. Characteristics of modern amino and ribonucleic acid structure are all consistent with and necessary for this hypothesis. The characteristics and individual base assignments of the code also provide strong support for origin from the postulated copolymers. All characteristics of the code can be accounted for by this single hypothesis.
Article
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
Article
The fact that proteins contain only alpha-amino acids and that protein structure is determined by 3'→5' linked ribonucleotides is postulated to be the result of the copolymerization of these molecules in the prebiotic environment. Ribonucleotides therefore represent partial degradation products and proteins represent a side reaction developing from copolymerization. The basic structural unit of copolymerization is a nucleotide substituted with an amino acid at the 2' position. Characteristics of modern amino and ribonucleic acid structure are all consistent with and necessary for this hypothesis. The characteristics and individual base assignments of the code also provide strong support for origin from the postulated copolymers. All characteristics of the code can be accounted for by this single hypothesis.
Article
Some of the basic problems presented by the rapid evolution of a universal genetic code can be resolved by a mechanism of co-evolution of the code and the amino acids it serves.
Article
The dependence of amino acid frequency on sequence length has been examined for the 20 natural amino acids using a set of 2275 protein sequences with little sequence identity. As expected, the frequency of cysteine increases dramatically for sequences shorter than 100 amino acids with a length-dependence that corresponds to an average of two Cys per sequence independent of length. Surprisingly dramatic changes were also observed for the frequencies of arginine, lysine, aspartic acid, and glutamic acid: Arg and Lys frequencies increase for short sequences whereas Asp and Glu frequencies decrease. These changes do not appear to be due to an over-abundance of DNA- and membrane-binding proteins in the database and may, therefore, be related to protein stability. Possible stabilizing mechanisms include increased hydrogen bonding by Arg and increased hydrophobic stabilization due to the amphiphilic character of Arg and Lys. These observations suggest that amino acid composition played an important role in the evolution of small proteins.
Article
We have calculated the average effect of changing a codon by a single base for all possible single-base changes in the genetic code and for changes in the first, second, and third codon positions separately. Such values were calculated for an amino acid's polar requirement, hydropathy, molecular volume, and isoelectric point. For each attribute the average effect of single-base changes was also calculated for a large number of randomly generated codes that retained the same level of redundancy as the natural code. Amino acids whose codons differed by a single base in the first and third codon positions were very similar with respect to polar requirement and hydropathy. The major differences between amino acids were specified by the second codon position. Codons with U in the second position are hydrophobic, whereas most codons with A in the second position are hydrophilic. This accounts for the observation of complementary hydropathy. Single-base changes in the natural code had a smaller average effect on polar requirement than all but 0.02% of random codes. This result is most easily explained by selection to minimize deleterious effects of translation errors during the early evolution of the code.
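The calculation described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the Kyte-Doolittle scale stands in for "hydropathy" (the abstract does not name a scale), the effect of a change is measured as a mean squared difference, changes to or from stop codons are skipped, and random codes are generated by permuting the 20 amino acids among the existing synonymous codon blocks so that redundancy is retained. All function and variable names are hypothetical.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (assumed scale; see lead-in).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_squared_change(code, prop, positions=(0, 1, 2)):
    """Average squared change in `prop` over all single-base changes
    at the given codon positions (0-based), ignoring stop codons."""
    diffs = []
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in positions:
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other != "*":
                    diffs.append((prop[aa] - prop[other]) ** 2)
    return sum(diffs) / len(diffs)


def shuffled_code(rng):
    """Random code with the same block structure: amino acids are permuted
    among the synonymous codon sets, stop codons stay in place."""
    aas = sorted(set(AMINO) - {"*"})
    perm = dict(zip(aas, rng.sample(aas, len(aas))))
    perm["*"] = "*"
    return {codon: perm[aa] for codon, aa in CODE.items()}


# Per-position effects in the natural code (first, second, third base).
per_position = [mean_squared_change(CODE, HYDROPATHY, (p,)) for p in range(3)]

# Fraction of random codes with a smaller overall effect than the natural code.
rng = random.Random(0)
natural = mean_squared_change(CODE, HYDROPATHY)
random_scores = [mean_squared_change(shuffled_code(rng), HYDROPATHY)
                 for _ in range(200)]
fraction_better = sum(s < natural for s in random_scores) / len(random_scores)
```

The sample of 200 random codes is far too small to resolve a tail as thin as the 0.02% quoted in the abstract for polar requirement; it only shows the shape of the comparison.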
Article
Differences in assignments from those in the universal genetic code occur in codes of mitochondria. In this report, the published sequences of the mitochondrial genes for COI and ND1 in a platyhelminth (Fasciola hepatica) are examined and it is concluded that AAA may be a codon for asparagine instead of lysine, whereas AAG is the sole codon for lysine in this species.
Article
Analysis of the nucleotide sequence of 1,400 transfer RNAs has revealed the imprint of a prototypic genetic code in position 3-4-5 of the acceptor stem. It appears only in the transfer RNAs for the primordial amino acids, i.e., those found by chemical condensation of a nitrogen-methane-water-ammonia mixture. The model for primitive protein synthesis as mentioned by Crick assumes a direct interaction between the amino acid and a prototypic adaptor oligonucleotide. This has hitherto appeared irreconcilable with the large spatial separation between the aminoacylation site and the anticodon in present day transfer RNAs. The observations reported here show how this paradox can be resolved by a process of duplication and cleavage of a prototypic adaptor.
Article
For the first time it is shown that each of the three codon bases has a general correlation with a different, predictable amino acid property, depending on position within the codon. In addition to the previously recognized link between the mid-base and the hydrophobic-hydrophilic spectrum, we show that, with the exception of G, the first base is generally invariant within a synthetic pathway. G-coded amino acids show a different order, being found only at the head of the synthetic pathways. The redundancy of the nature of the third base has a previously unrecognised relationship with molecular weight. The bases U and A (transversions) are associated with the most sharply defined or opposite states in both the first and second position, C somewhat less so or intermediate, and G neutral. The apparently systematic nature of these relationships has profound implications for the origin of the genetic code. It appears to be the remains of the first language of the cell, predating the tRNA/ribosome system, persisting with remarkably little change at a deeper level of organisation than the codon language.
Article
L-Arginine competitively inhibits the reaction of GTP with the Tetrahymena ribosomal self-splicing intron. In order to define this RNA binding site for arginine, Ki's have now been measured for numerous arginine-like competitive inhibitors. Detailed consideration of the Ki's suggests a tripartite binding model. The dissociation constants of the inhibitors can be consistently interpreted if the guanidino group of arginine binds in the GTP site by utilizing the H-bonds otherwise made to the N1-H and 2 NH2 of the guanine pyrimidine ring. The positive charge of the arginine guanidino group also enhances binding. A second requirement is for the precise length of the aliphatic arm connecting the guanidino with the alpha-carbon. The positive charge of the alpha-amino group is the third feature essential to effective inhibition. The negative carboxyl charge of arginine inhibits binding, and the substituents on the alpha-carbon are probably oriented, with the alpha-amino group near the phosphate backbone of the RNA. This orientation contributes strongly to the L stereoselectivity of the amino acid site on the RNA. When spaced optimally, net contribution to the free energy of binding is of the same order for the guanidino group and for the arginine alpha-carbon substituents, but the guanidino apparently contributes more to binding free energy. Taken together, these observations extend the previous binding model [Yarus, M. (1988) Science (Washington, D.C.) 240, 1751-1758]. The observed dependence of binding on universal characteristics of amino acids suggests that RNA binding sites with other amino acid specificities could exist.
Article
Nucleotide sequences carry genetic information of many different kinds, not just instructions for protein synthesis (triplet code). Several codes of nucleotide sequences are discussed including: (1) the translation framing code, responsible for correct triplet counting by the ribosome during protein synthesis; (2) the chromatin code, which provides instructions on appropriate placement of nucleosomes along the DNA molecules and their spatial arrangement; (3) a putative loop code for single-stranded RNA-protein interactions. The codes are degenerate and corresponding messages are not only interspersed but actually overlap, so that some nucleotides belong to several messages simultaneously. Tandemly repeated sequences frequently considered as functionless "junk" are found to be grouped into certain classes of repeat unit lengths. This indicates some functional involvement of these sequences. A hypothesis is formulated according to which the tandem repeats are given the role of weak enhancer-silencers that modulate, in a copy number-dependent way, the expression of proximal genes. Fast amplification and elimination of the repeats provides an attractive mechanism of species adaptation to a rapidly changing environment.
Article
The structure of the genetic code suggests that amino acid biosynthesis and hydrophobicity were important factors in shaping the genetic code, as the primitive code coevolved with new varieties of amino acids generated by the expanding pathways of biosynthesis. The current code is exceptionally stable. Deviant codes nonetheless have been observed in a number of mitochondrial and cellular genomes. Even the membership of encoded amino acids is undergoing expansion to include phosphoserine and selenocysteine. Experimental mutation of the code also has proven feasible, in a replacement of tryptophan by 4-fluorotryptophan as a component constituent of proteins. Such mutations, introducing novel varieties of encoded amino acids, will open up a new dimension in protein engineering and design.
Article
Excerpt In 1966, Fitch proposed the ambiguity reduction hypothesis of the origin of the genetic code, based on a view that the origin of life was a process in which local (pre)biological order arose from molecular chaos on the earth, driven by the asymmetric energy budget of the earth's atmosphere, a process in which subsets of random biochemical events gradually became the programmed rule of the system. This in turn led to a view, regarding the origin of the genetic code, that suggests that originally there may have been little specificity regarding which amino acids were charged to the various RNA acceptors that paired to the message. Under such conditions, no messenger RNA is likely to produce exactly the same protein twice. The advantages of obtaining a well-defined protein sequence, however, would have gradually reduced the variability in the assignment of amino acids to codons until the current genetic code emerged....
Article
Excerpt In the past three decades, a wide variety of experiments have been designed to simulate conditions on the primitive earth and to demonstrate how organic compounds that made up the first living organisms were synthesized. This paper reviews this work and indicates the status of such syntheses. There is too much material to review in detail, and the reader is directed to a number of more complete discussions (Miller and Orgel 1974; Kenyon and Steinman 1969; Lemmon 1970). Composition of the Primitive Atmosphere There is no agreement on the constituents of the primitive atmosphere. It is to be noted that there is no geological evidence concerning the conditions on the earth from 4.5 × 10⁹ to 3.8 × 10⁹ years, since no rocks older than 3.8 × 10⁹ years are known. Even the 3.8 × 10⁹-year-old Isua Rocks in Greenland are not sufficiently well preserved to infer details of the...
Article
Analysis of the interaction between mRNA codons and tRNA anticodons suggests a model for the evolution of the genetic code. Modification of the nucleic acid following the anticodon is at present essential in both eukaryotes and prokaryotes to ensure fidelity of translation of codons starting with A, and the amino acids which could be coded for before the evolution of the modifying enzymes can be deduced.
Article
This paper presents a method of constructing a scheme of the genetic code with the advantage over the traditional that the table of codons is organized by the principle of gradual complication of the chemical structure of the amino acid moving from one triplet to an adjacent one. A hypothetical scheme of the evolution of the biological code is proposed.
Article
Differences between mitochondrial codes and the universal code indicate that an evolutionary simplification has taken place, rather than a return to a more primitive code. However, these differences make it evident that the universal code is not the only code possible, and therefore earlier codes may have differed markedly from the previous code. The present universal code is probably a "frozen accident." The change in CUN codons from leucine to threonine (Neurospora vs. yeast mitochondria) indicates that neutral or near-neutral changes occurred in the corresponding proteins when this code change took place, caused presumably by a mutation in a tRNA gene.
Article
Evolutionary history of tRNA is studied by comparative sequence analysis of two specified tRNA's at various phylogenetic levels and of tRNA families within four different species. Criteria are developed that allow 1) to distinguish between convergent and divergent evolution, 2) to determine the mechanism of divergence and 3) to estimate the degree of randomization of the variable parts of the sequences. The conclusion of these investigations is that tRNA's represent ancient molecules that existed in the form of a mutant distribution prior to their integration into genomes.
Article
The theory of self-reproductive molecular systems involves the consequence that translation must have started from a selected distribution of RNA molecules, that comprised GC-rich sequences of a length less than 100 nucleotides. This implies a joint function of messenger and adaptor, which both had to be recruited from the same mutant distribution. The reconstruction of tRNA precursors yields such a molecule showing some reverberation of a codon pattern GNC. These findings suggest that tRNA has been the earliest component of the translation machinery.
Article
The availability of specialized sequence databanks for Escherichia coli, Saccharomyces cerevisiae and Bacillus subtilis made it possible to build a set of 105 protein-coding genes that are homologous in these three species. An analysis of the triplets at both the nucleotide and amino acid level revealed that the codon bias of some amino acids is significantly higher at conserved rather than at non-conserved positions. Comparisons of homologous genes in E. coli and Salmonella typhimurium, and in S. cerevisiae and Drosophila melanogaster, led to the same conclusion. A special case was made for serine in E. coli, whose major codon is AGC for non-conserved and TCC for conserved residues. We interpret this observation as evidence that the primordial codons for serine were TCN, while codons AGY appeared later. This conclusion is substantiated by an analysis of the codon usage of catalytic serine residues in ancient, ubiquitous and essential proteins (ATP synthases and topoisomerases). It is shown that in these proteins the proportion of the catalytic serine residues coded by TCN is significantly higher than the one expected from the overall codon usage of serine residues.
Article
A two-substrate Michaelis-Menten mechanism previously proposed for the self-replication of RNA-like oligomers is developed. Differential growth depends on the existence of two pairs of complementary monomers and leads to 2^n groups of 2^n components each (n is the oligomer size). As n increases the 2^n groups tend to overlap with one another, and the efficiency of the process to increase the information content of the strands decreases. In a second stage we suppose that randomly synthesized peptides with one predominant amino acid interacted with the ribotides, increasing the growth rate of some of them, and at the same time had their mean life increased by interactions with other ribotides of the same kinetic group. Natural selection could have preserved a favourable codon-anticodon-amino acid correlation, the precursor of the modern genetic code.
Article
A diversification of the genetic code based on the number of codons available for the proteinous amino acids is established. Three groups of amino acids during evolution of the code are distinguished. On the basis of their chemical complexity those amino acids emerging later in a translation process are derived. Codon number and chemical complexity indicate that His, Phe, Tyr, Cys and either Lys or Asn were introduced in the second stage, whereas the number of codons alone gives evidence that Trp and Met were introduced in the third stage. The amino acids of stage 1 use purine-rich codons, while all the amino acids introduced in the second stage, in contrast, use pyrimidines in the third position of their codons. A low abundance of pyrimidines during early translation is derived. This assumption is supported by experiments on non-enzymatic replication and interactions of hairpin loops with a complementary strand. A back extrapolation concludes a high purine content of the first nucleic acids, which gradually decreased during their evolution. Amino acids independently available from prebiotic synthesis were thus correlated to purine-rich codons. Implications on the prebiotic replication are discussed also in the light of recent codon usage data.
Article
Recently, shifted periodicities 1 modulo 3 and 2 modulo 3 have been identified in protein (coding) genes of both prokaryotes and eukaryotes with autocorrelation functions analysing eight of 64 trinucleotides (Arquès et al., 1995). This observation suggests that the trinucleotides are associated with frames in protein genes. In order to verify this hypothesis, a distribution of the 64 trinucleotides AAA,...,TTT is studied in both gene populations by using a simple method based on the trinucleotide frequencies per frame. In protein genes, the trinucleotides can be read in three frames: the reading frame 0 established by the ATG start trinucleotide and frame 1 (resp. 2) which is the frame 0 shifted by 1 (resp. 2) nucleotide in the 5′–3′ direction. Then, the occurrence frequencies of the 64 trinucleotides are computed in the three frames. By classifying each of the 64 trinucleotides in its preferential occurrence frame, i.e. the frame associated with its highest frequency, three subsets of trinucleotides can be identified in the three frames. This approach is applied in the two gene populations.
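The per-frame counting described in this abstract is simple enough to sketch directly. The Python below is a minimal illustration for a single gene, assuming the sequence starts at the first base of its ATG so that frame 0 is the reading frame; the function names `frame_counts` and `preferential_frame` are hypothetical, and ties between frames are resolved in favour of the lowest frame number, a choice the abstract does not specify.

```python
from collections import Counter


def frame_counts(gene):
    """Count trinucleotides read in frames 0, 1 and 2 of `gene`.

    Frame f reads non-overlapping trinucleotides starting at offset f.
    """
    counts = [Counter(), Counter(), Counter()]
    for frame in range(3):
        for i in range(frame, len(gene) - 2, 3):
            counts[frame][gene[i:i + 3]] += 1
    return counts


def preferential_frame(gene):
    """Map each observed trinucleotide to the frame where its relative
    frequency is highest (lowest frame number wins ties)."""
    counts = frame_counts(gene)
    freqs = [
        {tri: n / sum(c.values()) for tri, n in c.items()}
        for c in counts
    ]
    observed = set().union(*counts)
    return {
        tri: max(range(3), key=lambda f: freqs[f].get(tri, 0.0))
        for tri in observed
    }
```

Grouping the keys of the returned mapping by frame gives the three subsets of trinucleotides mentioned in the abstract; applying this to a gene population would mean pooling the counts over all genes before taking frequencies.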
Article
Early protein synthesis is thought to have involved a reduced amino acid alphabet. What is the minimum number of amino acids that would have been needed to encode complex protein folds similar to those found in nature today? Here we show that a small beta-sheet protein, the SH3 domain, can be largely encoded by a five letter amino acid alphabet but not by a three letter alphabet. Furthermore, despite the dramatic changes in sequence, the folding rates of the reduced alphabet proteins are very close to that of the naturally occurring SH3 domain. This finding suggests that despite the vast size of the search space, the rapid folding of biological sequences to their native states is not the result of extensive evolutionary optimization. Instead, the results support the idea that the interactions which stabilize the native state induce a funnel shape to the free energy landscape sufficient to guide the folding polypeptide chain to the proper structure.
Article
mRNA sequences are known to carry a hidden periodical pattern (GCU)n, which may be considered a remnant of sequence organization of mRNA early in its evolution, dominated by codons for alanine and their point mutation derivatives. A similar pattern is characteristic of the master (consensus) tRNA sequence derived in 1981 by Eigen and Winkler-Oswatitsch. The master tRNA sequence is thought to represent one of the earliest mRNAs. From analysis of literature and from our own calculations presented in this work, the (GCU)n pattern appears to be the most expandable in the norm and in disease. The speculation is put forward that (GCU)n and polyalanine have been key players at the beginning of the triplet code, and the first codons, apart from the GCU triplet, were point change derivatives of the generic triplet GCU, coding for amino acids present in the early prebiotic-biotic environment. The set of the earliest amino acids is derived on the basis of structural simplicity, presence in imitated prebiotic conditions and involvement with class II aminoacyl-tRNA synthetases. The set consists of six amino acids: Ala, Asp, Gly, Pro, Ser and Thr. All these amino acids are, indeed, encoded by the GCU triplet and its derivatives, as predicted. Thus, the pairs GCN (Ala), GAU (Asp), GGU (Gly), CCU (Pro), UCU (Ser) and ACU (Thr) can be viewed as an early triplet code.
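The claim that the six earliest amino acids are all encoded by GCU and its point-change derivatives can be checked by enumeration. The following Python sketch assumes the present-day standard code for the translation step; the names `point_derivatives`, `derived`, and `EARLY_SET` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def point_derivatives(codon):
    """All nine codons that differ from `codon` at exactly one position."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    ]


# GCU plus its single-base derivatives, translated with the standard code.
derived = {codon: CODE[codon] for codon in ["GCU"] + point_derivatives("GCU")}

# The six earliest amino acids named in the abstract (one-letter codes).
EARLY_SET = {"A", "D", "G", "P", "S", "T"}
covered = EARLY_SET <= set(derived.values())
extra = set(derived.values()) - EARLY_SET
```

Under the standard code the enumeration covers all six amino acids of the set. It also yields GUU (Val), a point derivative of GCU whose amino acid is not among the six listed in the abstract.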
Article
Improved thermodynamic parameters for prediction of RNA duplex formation are derived from optical melting studies of 90 oligoribonucleotide duplexes containing only Watson-Crick base pairs. To test end or base composition effects, new sets of duplexes are included that have identical nearest neighbors, but different base compositions and therefore different ends. Duplexes with terminal GC pairs are more stable than duplexes with the same nearest neighbors but terminal AU pairs. Penalizing terminal AU base pairs by 0.45 kcal/mol relative to terminal GC base pairs significantly improves predictions of ΔG°37 from a nearest-neighbor model. A physical model is suggested in which the differential treatment of AU and GC ends accounts for the dependence of the total number of Watson-Crick hydrogen bonds on the base composition of a duplex. On average, the new parameters predict ΔG°37, ΔH°, ΔS°, and TM within 3.2%, 6.0%, 6.8%, and 1.3 °C, respectively. These predictions are within the limit of the model, based on experimental results for duplexes predicted to have identical thermodynamic parameters.
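The role of the terminal AU penalty in a nearest-neighbor prediction can be written out schematically. The expression below is a sketch of the general form only: the initiation term and the individual nearest-neighbor increments are placeholders whose fitted values are not given in the abstract, other correction terms used in full nearest-neighbor treatments are omitted, and the symbol $n_{\mathrm{AU,term}}$ (the number of terminal AU pairs, 0, 1, or 2) is introduced here for illustration.

```latex
\Delta G^{\circ}_{37}(\text{duplex})
  \;=\; \Delta G^{\circ}_{37}(\text{init})
  \;+\; \sum_{i=1}^{N-1} \Delta G^{\circ}_{37}\bigl(\text{NN}_{i,\,i+1}\bigr)
  \;+\; n_{\mathrm{AU,term}} \times 0.45\ \text{kcal/mol}
```

Here $N$ is the number of base pairs and $\text{NN}_{i,\,i+1}$ is the stack formed by adjacent pairs $i$ and $i+1$. Two duplexes with identical nearest neighbors share the same sum, so in this form any predicted difference between them comes from the terminal term, which is the end effect the abstract describes.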