Article

The Triplet Code From First Principles

Abstract

The temporal order ("chronology") in which amino acids and their respective codons appeared on the evolutionary scene is reconstructed. A consensus chronology of amino acids is built from 60 different criteria, each offering a certain temporal order. After several filtering steps, the chronology vectors are averaged, resulting in the consensus order: G, A, D, V, P, S, E, (L, T), R, (I, Q, N), H, K, C, F, Y, M, W. It reveals two important features: the amino acids synthesized in S. Miller's imitation experiments appeared first, while the amino acids associated with codon capture events came last. The reconstruction of the codon chronology is based on this consensus temporal order of amino acids, supplemented by the stability and complementarity rules first suggested by M. Eigen and P. Schuster, and on the earlier established processivity rule. At no point in the reconstruction was the consensus amino-acid chronology in conflict with these three rules. The derived genealogy of all 64 codons suggested several important predictions that are confirmed. The reconstruction of the origin and evolutionary history of the triplet code thus becomes a powerful research tool for molecular evolution studies, especially in its early stages.
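
The averaging step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each criterion supplies a complete ordering of the same items and that the consensus is obtained by sorting on mean rank; the filtering steps are not described in the abstract and are omitted. The function name consensus_order and the toy rankings are hypothetical and are not the 60 criteria used in the paper.

```python
from statistics import mean


def consensus_order(rankings):
    """Return (consensus order, mean rank per item) for a list of rankings.

    Each ranking is a list of the same items, earliest first.
    Assumption: plain averaging of 1-based ranks, no filtering or weighting.
    """
    items = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != items or len(ranking) != len(items):
            raise ValueError("every ranking must list the same items exactly once")

    # Mean 1-based position of each item across all rankings.
    mean_rank = {
        item: mean(ranking.index(item) + 1 for ranking in rankings)
        for item in items
    }
    # Lower mean rank means earlier; ties are broken alphabetically.
    order = sorted(items, key=lambda item: (mean_rank[item], item))
    return order, mean_rank


# Hypothetical toy data: three made-up criteria ranking five amino acids.
toy_criteria = [
    ["G", "A", "D", "V", "W"],
    ["A", "G", "V", "D", "W"],
    ["G", "D", "A", "V", "W"],
]
order, ranks = consensus_order(toy_criteria)
print(order)
```

With these toy inputs, G has the lowest mean rank and W the highest, so the sketch places G first and W last. Items with equal mean rank would correspond to the parenthesized groups in the consensus order, such as (L, T).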

... This assay and related techniques leverage high-throughput sequencing to measure the activity of thousands of candidate sequences in a mixed pool [50][51][52][53]. The six substrates (analogs of tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine) represent a range of sizes and biochemical classes (aromatic, aliphatic, sulfur-containing), as well as amino acids thought to be early (Leu, Ile, Val) and late (Trp, Phe, Met) incorporations into the genetic code [54][55][56][57][58]. Because of this span, the chosen amino acids should be considered model systems to study trends in rate enhancement, specificity, and proximity of ribozymes in sequence space, rather than as a detailed model of the early prebiotic emergence of the genetic code. ...
... Within the group of small hydrophobic side chains, both β-branched and unbranched residues were included. The set includes side chains that are considered early as well as side chains that are considered late additions to the genetic code [54][55][56][57][58]. In particular, aromatic residues, of which two were chosen to assess specificity within the class, are thought to have been added relatively late. ...
... Such properties of overlapping fitness landscapes could facilitate the expansion from a weakly active, promiscuous ribozyme to an elaborated system of ribozyme-substrate pairs. While the order in which amino acids were incorporated into the genetic code is a subject of debate, the amino acid substrates tested here include those that are generally believed to be early (L, I, V) and late (W, F, M) additions to the code [54][55][56][57][58]. The aromatic residues were generally preferred by all ribozyme families. ...
Article
Full-text available
Systems of catalytic RNAs presumably gave rise to important evolutionary innovations, such as the genetic code. Such systems may exhibit particular tolerance to errors (error minimization) as well as coding specificity. While often assumed to result from natural selection, error minimization may instead be an emergent by-product. In an RNA world, a system of self-aminoacylating ribozymes could enforce the mapping of amino acids to anticodons. We measured the activity of thousands of ribozyme mutants on alternative substrates (activated analogs for tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine). Related ribozymes exhibited shared preferences for substrates, indicating that adoption of additional amino acids by existing ribozymes would itself lead to error minimization. Furthermore, ribozyme activity was positively correlated with specificity, indicating that selection for increased activity would also lead to increased specificity. These results demonstrate that by-products of ribozyme evolution could lead to adaptive value in specificity and error tolerance. Complex biochemical systems exhibit traits that appear to be highly adapted. Studies of catalytic RNA demonstrate that adaptive traits, such as increased specificity and error tolerance, could originate as evolutionary by-products.
... Although several analyses suggest the probable sequence of the late amino acid incorporation into the genetic code, many debates and questions about the order and factors that influenced these events remain open [56,130,131]. ...
... To start with, cysteine is regarded as one of the latest additions to the code according to the Trifonov meta-analysis [130]. At the same time, it is one of the most active and unique amino acids involved particularly in Fe-S clusters (such as in ferredoxin, considered one of the earliest protein domains) and conflicting hypotheses have been proposed as to whether these features were indispensable in early evolution. ...
... Finally, Met, Trp and Tyr are considered the latest additions to the amino acid alphabet [58,130]. A study by Granold et al. [58] suggests that they were incorporated into the genetic code during the great oxidation event as they showed antioxidant properties. ...
Article
Full-text available
Recent developments in Origins of Life research have focused on substantiating the narrative of an abiotic emergence of nucleic acids from organic molecules of low molecular weight, a paradigm that typically sidelines the roles of peptides. Nevertheless, the simple synthesis of amino acids, the facile nature of their activation and condensation, their ability to recognize metals and cofactors and their remarkable capacity to self-assemble make peptides (and their analogues) favourable candidates for one of the earliest functional polymers. In this mini-review, we explore the ramifications of this hypothesis. Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative ‘peptide-polynucleotide stage’. For example, early peptides and mini-proteins could have served as catalysts, compartments and structural hubs. In sum, we shed light on the role of early peptides and small proteins before and during the nucleotide world, in which nascent life fully grasped the potential of primordial proteins, and which has left an imprint on the idiosyncratic properties of extant proteins.
... In the book Genesis and Evolutionary Development of Life, Oparin proposed that prokaryotes may have emerged three to four billion years ago with a highly ambiguous and/or primitive genetic sequence encoding proteins built from about seven most abundant amino acids in the primordial soup. Therefore, LUCA may just have had a simpler genetic codon system than that in present-day life forms [6], which gained higher complexity along the process of life evolution [7,8]. ...
... It is considered that the process of amino acid acquisition and reduction during life evolution follows the pattern assumed in the neutral model theory [9,10]. Specifically, newly recruited amino acids will sacrifice the use of older ones to gain a foothold [7,11,12]. However, the introduction of new amino acids must be slow because frequent changes in protein composition can be fatal to life [13,14]. ...
... In 1975, Wong proposed an evolutionary map of the genetic code and defined two minor amino acid centres (Phe-Tyr and Val-Leu) [17]. In 2000 and 2004, Trifonov suggested the temporal appearance order of amino acids and their respective codons based on various criteria such as thermostability, complementarity, and processivity [7,18]. The coevolution of amino acids and genetic codons is still ongoing today [19,20] and is supported by several recent findings. ...
Article
Full-text available
The mechanisms that shaped the pattern of amino acid recruitment into proteins early in the history of life remain a mystery. In this study, we conducted genome-wide analyses of amino acid usage and codon structure in 7270 species across the three domains of life. The analyses revealed a ubiquitous amino acid usage bias that was likely independent of codon usage bias. Taking advantage of codon usage bias, we performed pseudotime analysis to re-determine the chronological order of species emergence, which suggested a new species relationship traced from the imprint of codon usage evolution. Furthermore, multidimensional data integration showed that the amino acids A, D, E, G, L, P, R, S, T and V might have been the first recruited into the proteins of the last universal common ancestor (LUCA). The data analysis also indicated that the remaining amino acids were most probably incorporated gradually into the proteogenesis process along two long-timescale parallel evolutionary routes: I→F→Y→C→M→W and K→N→Q→H. This study provides new insight into the origin of life, particularly in terms of the basic protein composition of early life. Our work provides crucial information that will help in further understanding protein structure and function in relation to their evolutionary history.
... Since the decoding of genetic information during RNA translation also proceeds using codons that map the sequence of triplets to amino acids, it is conceivable that constraints on the order of triplets in RNA sequences in the primordial world would also constrain the assignment of amino acids to specific triplets (codons), i.e., the emergence of the genetic code. This is especially so because the availability and abundance of specific amino acids has changed dramatically during the earth's early history with simple amino acids such as glycine envisioned to have emerged early in the prebiotic era and more complex ones such as tyrosine later [35][36][37][38]. ...
... Thus, to investigate potential relationships between the topological connections of the triplets, the structure of the genetic code and chronology of amino acids, each node in the triplet network was also labelled by the encoded amino acid based on the standard genetic code (Fig. 2). Each node was also color coded by a consensus [35,37,38] of whether the assigned amino acid has been consistently produced in prebiotic chemistry experiments [39][40][41][42], and has been detected in meteorites ('early amino acid') [43,44] or emerged later ('late amino acids'). Furthermore, studies show that the concentration of the early amino acids in the recent evolution of proteins is mostly decreasing while that of late amino acids is mostly increasing [35,36]. ...
... The 10 early amino acids were based on prebiotic chemistry experiments and also have been identified in meteorites in the following order of relative abundance: Gly, Ala, Asp, Glu, Val, Ser, Ile, Leu, Pro, Thr [35]. This is also in agreement with consensus approaches for the recruitment of amino acids into the genetic code [37,38] and correlates with expectation of free-energy synthesis of different amino acids [45]. ...
Preprint
Full-text available
Life on earth relies on three types of information polymers: DNA, RNA and proteins. In all organisms and viruses, these molecules are synthesized by the copying of pre-existing templates. A triplet-based code known as the genetic code guides the synthesis of proteins by complex enzymatic machines that decode genetic information in RNA sequences. The origin of the genetic code is one of the most fundamental questions in biology. In this study, computational analysis of about 5,000 species level metagenomes using techniques for the analysis of human language suggests that the genomes of extant organisms contain relics of a distinct triplet code that potentially predates the genetic code. This code defines the relationship between adjacent triplets in DNA/RNA sequences, whereby these triplets predominantly differ by a single base. Furthermore, adjacent triplets encode amino acids that are thought to have emerged around the same period in the earth’s early history. The results suggest that the order of triplets in primordial RNA sequences was associated with the availability of specific amino acids, perhaps due to a coupling of a triplet-based primordial RNA synthesis mechanism to a primitive mechanism of peptide bond formation. Together, this coupling could have given rise to early nucleic acid sequences and a system for encoding amino acid sequences in RNA, i.e. the genetic code. Thus, the central role of triplets in biology potentially extends to the primordial world, contributing to both the origins of genomes and the origins of genetically coded protein synthesis.
Significance: One of the most intriguing discoveries in biology is that the order of amino acids in each protein is determined by the order of nucleotides (commonly represented by the letters A, U, G, C) in a biological molecule known as RNA. 
The genetic code serves as a dictionary that maps each of the 64 triplets ‘words’ in RNA to the 20 amino acids, thereby specifying how information encoded in RNA is decoded into sequences of amino acids (i.e., proteins). The deciphering of the genetic code was one of the greatest discoveries of the 20th century (1968 Nobel Prize in Medicine and Physiology) and is central to modern molecular biology. Yet, how it came to be that the order of triplets in RNA encodes the sequence of the protein synthesized remains one of the most important enigmas of biology. Paradoxically, in all life forms proteins cannot be synthesized without RNA and RNA itself cannot also be synthesized without proteins, presenting a chicken and egg dilemma. By analyzing thousands of microbial genomes using approaches drawn from the field of natural language processing, this study finds that the order of triplets across genomes contains relics of an ancient triplet code, distinct from but closely connected to the genetic code. Unlike the genetic code which specifies the relationship between information in RNA and the sequence of proteins, this ancient code describes the relationship between adjacent triplets in extant genome sequences, whereby such triplets are often different from each other by a single letter. Triplets that are closely related by this ancient code encode amino acids that are thought to have emerged around the same period in the earth’s early history. In other words, a fossil record of the chronological order of appearance of amino acids on early earth appears written in genome sequences. This potentially demonstrates that the process by which RNA sequences were synthesized in the primordial world relied on triplets and was coupled to amino acids available at the time. 
Hence, the connections between primordial RNA synthesis and a primitive mechanism for linking amino acids to form peptides could have enabled one type of molecule (RNA) to code for the other (protein), facilitating the emergence of the genetic code.
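
The statement that adjacent triplets predominantly differ by a single base can be made concrete with a small sketch. It is not the preprint's method, which applies language-analysis techniques to metagenomes. It simply assumes a sequence read as non-overlapping, in-frame triplets and reports the fraction of neighbouring triplet pairs at Hamming distance 1. All function names and the example sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def split_triplets(seq):
    """Split a sequence into non-overlapping triplets, dropping any remainder."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def single_base_neighbor_fraction(seq):
    """Fraction of adjacent triplet pairs that differ by exactly one base."""
    triplets = split_triplets(seq)
    pairs = list(zip(triplets, triplets[1:]))
    if not pairs:
        return 0.0
    hits = sum(hamming(a, b) == 1 for a, b in pairs)
    return hits / len(pairs)


# Made-up sequences for illustration only.
print(single_base_neighbor_fraction("GGCGCCGACGUC"))  # GGC GCC GAC GUC
print(single_base_neighbor_fraction("GGCAAAUUU"))     # GGC AAA UUU
```

In the first sequence every neighbouring pair differs at exactly one position, so the fraction is 1.0; in the second every pair differs at all three positions, so it is 0.0. A real analysis would also need a null expectation, for example from shuffled sequences, before calling any fraction predominant.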
... Showing a list of all twenty α-amino acids used by all biology on the Earth for making proteins. (a) Trifonov's (2004), sequential listing of reconstruction of appearance of amino acids as per evolutionary time-frame [10]. Trifonov [9]. ...
... However, proteins in general also contain disulfide bridges, which are formed by the sulfur-containing amino acid cysteine (with its thiol, S-H, side chain), which is conspicuously absent from the organic inventory of meteorites [14]. Cysteine is thought to be of biogenic origin (i.e., made by living entities) and, in terms of its relative importance, ranks almost at the bottom according to Trifonov (column a, Table 1), meaning that it may have been acquired later in the evolutionary time frame [10,18]. The absence of disulfide bridges implies that PrDs were somewhat simpler than, for example, proteins that contain cysteine. ...
Article
Full-text available
In this paper, the hypothesis that prions and prion-like molecules could have initiated the chemical evolutionary process that led to the eventual emergence of life is reappraised. The prions-first hypothesis is a specific application of the protein-first hypothesis, which asserts that protein-based chemical evolution preceded the evolution of genetic encoding processes. The genetics-first hypothesis asserts that an “RNA-world era” came before protein-based chemical evolution and rests on a singular premise that molecules such as RNA, acetyl-CoA, and NAD are relics of a long line of chemical evolutionary processes preceding the Last Universal Common Ancestor (LUCA). Nevertheless, we assert that prions and prion-like molecules may also be relics of chemical evolutionary processes preceding LUCA. Supporting this assertion is the observation that prions and prion-like molecules are involved in a plethora of activities in contemporary biology in both complex (eukaryotes) and primitive life forms. Furthermore, a literature survey reveals that small RNA virus genomes harbor information about prions (and amyloids). If, as has been presumed by proponents of the genetics-first hypotheses, small viruses were present during an RNA world era and were involved in some of the earliest evolutionary processes, this places prions and prion-like molecules potentially at the heart of the chemical evolutionary process whose eventual outcome was life. We deliberate on the case for prions and prion-like molecules as the frontier molecules at the dawn of evolution of living systems.
... Trifonov [52], on the basis of a comprehensive analysis of 60 physical, chemical, and biochemical properties of amino acids and their corresponding codons/anticodons, established the temporal sequence ("chronology") of amino acids and codons in the genetic code. The final consensus revealed the following temporal order of the canonical amino acids: 1-Gly, 2-Ala, 3-Asp, 4-Val, 5-Pro, 6-Ser, 7-Glu, 8-Thr, 9-Leu, 10-Arg, 11-Ile, 12-Gln, 13-Asn, 14-His, 15-Lys, 16-Cys, 17-Phe, 18-Tyr, 19-Met, 20-Trp. ...
... Higgs and Pudritz [20] then used additional thermodynamic criteria for a deeper analysis of the chronology of amino acids in the code. The temporal order of amino acids they established coincided almost completely with that of Trifonov [52], but Pro was moved from the fifth to the seventh position, in place of Glu. Also, according to the coevolution theory [51], Glu is produced directly from α-ketoglutarate, and Pro is an imino acid derived from Glu. ...
... It is usually enhanced by the endophytic microbial communities living inside healthy host tissues (seed, root, leaf, flower), playing an important role in mitigating abiotic and biotic stresses in plants. (NOTE: Because of its central role in linking amino acids with nucleotide triplets contained in tRNAs, tRNA ligase is thought to be among the first proteins that appeared in evolution [5]). The evolutionary analyses presented in this phylogenetic tree were conducted in MEGA X [6,7]. ...
Article
Full-text available
According to the World Health Organization (WHO), depression is a leading cause of disability worldwide and a major contributor to the overall global burden of mental disorders. An increasing number of studies have revealed that among 20 different amino acids, high proline consumption is a dietary factor with the strongest impact on depression in humans and animals, including insects. Recent studies acknowledged that gut microbiota play a key role in proline-related pathophysiology of depression. In addition, the multi-omics approach has suggested that a high level of metabolite proline is directly linked to depression severity, while variations in levels of circulating proline are dependent on microbiome composition. The gut–brain axis proline analysis is a gut microbiome model of studying depression, highlighting the critical importance of diet, but nothing is known about the role of the plant microbiome–food axis in determining proline concentration in the diet and thus about preventing excessive proline intake through food consumption. In this paper, we discuss the protocooperative potential of a holistic study approach combining the microbiota–gut–brain axis with the microbiota–plant–food–diet axis, as both are involved in proline biogenesis and metabolism and thus in its effect on mood and cognitive function. In preharvest agriculture, the main scientific focus must be directed towards plant symbiotic endophytes, as scavengers of abiotic stresses in plants and modulators of high proline concentration in crops/legumes/vegetables under climate change. It is also implied that postharvest agriculture, including industrial food processing, may be critical in designing a proline-balanced diet, especially if corroborated with microbiome-based preharvest agriculture, within a circular agrifood system. 
The microbiome is suggested as a target for selecting beneficial plant endophytes in aiming for a balanced dietary proline content, as it is involved in the physiology and energy metabolism of eukaryotic plant/human/animal/insect hosts, i.e., in core aspects of this amino acid network, while opening new venues for an efficient treatment of depression that can be adapted to vast groups of consumers and patients. In that regard, the use of artificial intelligence (AI) and molecular biomarkers combined with rapid and non-destructive imaging technologies were also discussed in the scope of enhancing integrative science outcomes, agricultural efficiencies, and diagnostic medical precisions.
... The latter are also proteinogenic amino acids, but are not believed to have been encoded early, based on the proposed genealogy of the 64 codons of the genetic code. [39,40] It is not unlikely that they were present in prebiotic mixtures, though. Phenylalanine has been found in the product mixture of the Miller experiment. ...
... Today's biosynthesis of tryptophan is long and complex, and the aromatic amino acids are known to have appeared late in the genetic code. [39,40] From a chemical point of view, though, it is not unreasonable to assume that the known abiotic syntheses of tryptophan could have produced concentration as high as those found in the oceanic lithosphere. [45,46] Indoles are resonance-stabilized aromatic compounds that can form readily under a range of chemical conditions, as known from the Fischer indole synthesis, the Reissert indole synthesis, the Bartoli indole synthesis, the Japp-Klingemann reaction, the Leimgruber-Batcho indole synthesis, the Madelung synthesis, the Nenitzescu indole synthesis, and the Baeyer-Emmerling indole synthesis, to name just a few. ...
Article
Full-text available
The formation of peptides from amino acids is one of the processes associated with life. Because of the dominant role of translation in extant biology, peptide-forming processes that are RNA-induced are of particular interest. We had previously reported the formation of phosphoramidate-linked peptido RNAs as the products of spontaneous condensation reactions between ribonucleotides and free amino acids in aqueous solution. We now asked whether four-helix bundle (4HB) DNA or RNA folding motifs with a single- or double-nucleotide gap next to a 5'-phosphate can act as reaction sites for phosphoramidate formation. For glycine, this was found to be the case, whereas phenylalanine and tryptophan showed accelerated formation of peptides without a covalent link to the nucleic acid. Free peptides with up to 11 tryptophan or phenylalanine residues were found in precipitates forming in the presence of gap-containing DNA or RNA 4HB's. Control experiments using motifs with just a nick or primer alone did not have the same effect. Because folded structures with a gap in a double helix are likely products of hybridization of strands formed in statistically controlled oligomerization reactions, our results are interesting in the context of prebiotic scenarios. Independent of a putative role in evolution, our findings suggest that for some aromatic amino acids an RNA-induced pathway for oligomerization exists that does not have a discernable link to translation.
... This assay and related techniques leverage high-throughput sequencing to measure the activity of thousands of candidate sequences in a mixed pool [50][51][52][53]. The six substrates (analogs of tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine) represent a range of sizes and biophysical classes (aromatic, aliphatic, sulfur-containing), as well as supposed early (Leu, Ile, Val) and late (Trp, Phe, Met) incorporations into the genetic code [54][55][56][57][58]. Our findings indicate extensive opportunities for co-option to incorporate new substrates into the system. ...
... Such properties of overlapping fitness landscapes could facilitate the expansion from a weakly active, promiscuous ribozyme to an elaborated system of ribozyme-substrate pairs. While the order in which amino acids were incorporated into the genetic code is a subject of debate, the amino acid substrates tested here include those that are generally believed to be early (L, I, V) and late (W, F, M) additions to the code [54][55][56][57][58]. Interestingly, the aromatic residues were generally preferred by all ribozyme families. ...
Preprint
Full-text available
The emergence of the genetic code was a major transition in the evolution from a prebiotic RNA world to the earliest modern cells. A prominent feature of the standard genetic code is error minimization, or the tendency of mutations to be unusually conservative in preserving biophysical features of the amino acid. While error minimization is often assumed to result from natural selection, it has also been speculated that error minimization may be a by-product of emergence of the genetic code. During establishment of the genetic code in an RNA world, self-aminoacylating ribozymes would enforce the mapping of amino acids to anticodons. Here we show that expansion of the genetic code, through co-option of ribozymes for new substrates, could result in error minimization as an emergent property. Using self-aminoacylating ribozymes previously identified during an exhaustive search of sequence space, we measured the activity of thousands of candidate ribozymes on alternative substrates (activated analogs for tryptophan, phenylalanine, leucine, isoleucine, valine, and methionine). Related ribozymes exhibited preferences for biophysically similar substrates, indicating that co-option of existing ribozymes to adopt additional amino acids into the genetic code would itself lead to error minimization. Furthermore, ribozyme activity was positively correlated with specificity, indicating that selection for increased activity would also lead to increased specificity. These results demonstrate that by-products of the evolution and functional expansion of a ribozyme system could lead to adaptive properties of a genetic code. Such 'spandrels' could thus underlie significant prebiotic developments.
... For example, the frozen accident is related to the amino acid alphabet expanding. Trifonov et al studied the temporal order of 20 kinds of amino acids [Trifonov et al, 1997;Trifonov, 2004]. Wong proposed the coevolution theory which indicated the coevolution existed between amino acids and codes [Wong, 1975;. ...
... If there is enough knowledge on the amino-acid chronology (including the historical variation on the degeneracies of these amino acids) then we are able to deduce a more real picture on the genetic code evolution through GMD minimization under the varying constraints. Trifonov (2004) indicated two important features of amino acid evolution: the amino acids synthesized in Miller experiments appeared first, and those associated with codon capture events (when all 64 triplets are already engaged and codons for new amino acid have to be captured from the established codon repertoires) came last. Due to lack of the knowledge on the amino acid degeneracy we propose a simplified model as follows. ...
Article
Full-text available
A quantitative theory on the construction and the evolution of the genetic code is proposed. Through introducing the concept of mutational deterioration (MD) and developing a theoretical formalism on MD minimization we have proved: 1,the redundancy distribution of codons in the genetic code obeys MD minimization principle; 2, the hydrophilic-hydrophobic distribution of amino acids on the code table is global MD (GMD) minimal; 3, the standard genetic code can be deduced from the adaptive minimization of GMD; 4, the variants of the standard genetic code can be explained quantitatively by use of GMD formalism and the general trend of the evolution is GMD non-increasing which reflects the selection on the code. We have demonstrated that the redundancy distribution of codons and the hydrophobic-hydrophilic (H-P) distribution of amino acids are robust in the code relative to the mutational parameter, and indicated that the GMD can be looked as a non-fitness function on the adaptive landscape. Finally, an important aspect on the symmetry of the code construction, the Yin-Yang duality is investigated. The Yin-Yang duality among codons affords a sound basis for understanding the H-P structure in the genetic code. The approximate universality of the canonical genetic code and the discoveries of various deviant codes in a wide range of organisms strongly reveal that the genetic code is still evolving. Several mechanisms on code evolution were proposed, for example, the codon capture and the ambiguous decoding by tRNA [Knight et al, 2001; Santos et al, 2004]. However, a unified theory still lacks for a full explanation of the genetic code evolution both in its high universality and various deviations. Evidently, the point is closely related to the construction of the code. The construction of the genetic code obeys some general rules that afford a basis for understanding the universality and changeability of the code. 
On the other hand, the error minimization property of the genetic code was analyzed by several authors [Di Giuilo et al, 1994; Freeland & Hurst, 1998]. But it is still unclear why the canonical genetic code takes the standard form with error non-minimized and what are the evolutionary constraints for deducing the standard code. In the article we emphasize the unified understanding of the code construction and code evolution. We shall indicate that the unification between code construction and code evolution can be achieved through introducing the concept of mutational deterioration (MD) and developing a theoretical formalism for MD minimization. The materials are organized in the article as follows. In the first section we will review the mutational deterioration theory on the redundancy distribution in the genetic code. Then the adaptive minimization of global mutational deterioration and the accuracy of the genetic
... Root-Bernstein, 1982; Biro 2007; Lella and Mahalakshmi, 2017; Frenkel-Pinter et al, 2020). The most abundant amino acids are (and possibly were) the structurally versatile Gly and Ala (Trifonov 2004). Peptide flexibility is guaranteed by the small, rotating Gly, rigidity is guaranteed by Ala, Tyr, Trp, Phe and conformation changes are guaranteed by His. ...
Article
Full-text available
The central dogma of molecular biology dictates that, with only a few exceptions, information proceeds from DNA to protein through an RNA intermediate. Examining the enigmatic steps from prebiotic to biological chemistry, we take another road suggesting that primordial peptides acted as template for the self-assembly of the first nucleic acids polymers. Arguing in favour of a sort of archaic "reverse translation" from proteins to RNA, our basic premise is a Hadean Earth where key biomolecules such as amino acids, polypeptides, purines, pyrimidines, nucleosides and nucleotides were available under different prebiotically plausible conditions, including meteorites delivery, shallow ponds and hydrothermal vents scenarios. Supporting a protein-first scenario alternative to the RNA world hypothesis, we propose the primeval occurrence of short two-dimensional peptides termed "selective amino acid-and nucleotide-matching oligopeptides" (henceforward SANMAOs) that noncovalently bind at the same time the polymerized amino acids and the single nucleotides dispersed in the prebiotic milieu. In this theoretical paper, we describe the chemical features of this hypothetical oligopeptide, its biological plausibility and its virtues from an evolutionary perspective. We provide a theoretical example of SANMAO's selective pairing between amino acids and nucleosides, simulating a poly-Glycine peptide that acts as a template to build a purinic chain corresponding to the glycine's extant triplet codon GGG. Further, we discuss how SANMAO might have endorsed the formation of low-fidelity RNA's polymerized strains, well before the appearance of the accurate genetic material's transmission ensured by the current translation apparatus.
... The ten peptides were chosen to represent sets of amino acids that reflect different stages in the early evolution of life (Fig. 1A): peptides P1 to P6 are dominated by amino acids that are abundant in Miller-type reactions 38 and were likely present in the earliest stages of life. 39 Several of these peptides explored the addition of arginine and lysine: arginine precursors are generated in cyanosulfidic models for prebiotic reactions, 19,20 and lysine has been identified in meteorites. 40,41 In contemporary proteins, these two amino acids are the most common side chains in direct contact with RNA 42 and were therefore promising to explore for their benefit to ribozyme function. ...
Article
Full-text available
Early stages of life likely employed catalytic RNAs (ribozymes) in many functions that are today filled by proteins. However, the earliest life forms must have emerged from heterogenous chemical mixtures, which included amino acids, short peptides, and many other compounds. Here we explored whether the presence of short peptides can help the emergence of catalytic RNAs. To do this, we conducted an in vitro selection for catalytic RNAs from randomized sequence in the presence of ten different peptides with a prebiotically plausible length of eight amino acids. This in vitro selection generated dozens of ribozymes, one of them with ∼900-fold higher activity in the presence of one specific peptide. Unexpectedly, the beneficial peptide had retained its N-terminal Fmoc protection group, and this group was required to benefit ribozyme activity. The same, or higher benefit resulted from peptide conjugates with prebiotically plausible polyaromatic hydrocarbons (PAHs) such as fluorene and naphthalene. This shows that PAH-peptide conjugates can act as potent cofactors to enhance ribozyme activity. The results are discussed in the context of the origin of life.
... It is currently thought that glycine, alanine, leucine, proline, and serine are among the eight amino acids first recruited into genetic code. 76,77 The UV photochemistry of all five has now been studied. All of the amino acids studied thus far directly produce HOCO radicals when exposed to 213 nm light. ...
Article
The ultraviolet photochemistry of the amino acids glycine, leucine, proline, and serine in their neutral forms was investigated using parahydrogen matrix-isolation spectroscopy. Irradiation by 213 nm light destroys the chirality of all three chiral amino acids as a result of the α-carbonyl C-C bond cleavage and hydrocarboxyl (HOCO) radical production. The temporal behavior of the Fourier-transform infrared spectra revealed that HOCO radicals rapidly reach a steady state, which occurs predominantly due to photodissociation of HOCO into CO + OH or CO2 + H. In glycine and leucine, the amine radicals generated by the α-carbonyl C-C bond cleavage rapidly undergo hydrogen elimination to yield methanimine and 3-methylbutane-1-imine, respectively. Breaking of the α-carbonyl C-C bond in proline appeared to yield 1-pyrroline, although due to its weak absorption it remains unconfirmed. In serine, additional products were formaldehyde and E/Z ethanimine. The present study shows that the direct production of HOCO previously observed in α-alanine generalizes to other amino acids of varying structure. It also revealed a tendency for amino acid photolysis to form imines rather than amine radicals. HOCO should be useful in the search for amino acids in interstellar space, particularly in combination with simple imine molecules.
... It is also conceivable that during the establishment of the genetic code, primitive tRNA genes underwent duplication and mutation in a manner that incrementally expanded the genetic code by matching new amino acids with new codon-anticodon interactions. This indicates that the near-universal genetic code of extant life emerged by descent with modification from smaller and simpler codes containing fewer codons, potentially where glycine, alanine, aspartic acid, and valine were among the first amino acids to join the genetic code (Trifonov 2004; Macé and Gillet 2016). As the genetic code increased in complexity, coded proteins would have gradually replaced many of the functions once carried out by RNA. ...
Article
Full-text available
Darwin’s assertion that “it is mere rubbish thinking, at present, of origin of life” (quoted from Peretó et al. 2009) is no longer valid. By synthesizing origin of life (OoL) research from its inception to recent findings, with a focus on (i) proof-of-principle prebiotically plausible syntheses and (ii) molecular relics of the ancient RNA World, we present a comprehensive up-to-date description of science’s understanding of the OoL and the RNA World hypothesis. Based on these observations, we solidify the consensus that RNA evolved before coded proteins and DNA genomes, such that the biosphere began with an RNA core where much of the translation apparatus and related RNA architecture arose before RNA transcription and DNA replication. This supports the conclusion that the OoL was a gradual process of chemical evolution involving a series of transitional forms between prebiotic chemistry and the last universal common ancestor (LUCA) during which RNA played a central role, and that many of the events and their relative order of occurrence along this pathway are known. The integrative nature of this synthesis also extends previous descriptions and concepts and should help inform future questions and experiments about the ancient RNA World and the OoL.
... To bring this number down, it should be noted that not all the possible amino acid pairs are available for bonding and higher affinities generate more persisting biomolecules (Root-Bernstein, 1982; Biro 2007; Lella and Mahalakshmi, 2017; Frenkel-Pinter et al, 2020). The most abundant amino acids are (and possibly were) Gln and the structurally versatile Gly and Ala (Trifonov 2004). Peptide flexibility is guaranteed by the small, rotating Gly, rigidity is guaranteed by Ala, Tyr, Trp, Phe and conformation changes are guaranteed by His. ...
Preprint
Full-text available
The central dogma of molecular biology dictates that, with only a few exceptions, information proceeds from DNA to protein through an RNA intermediate. Examining the enigmatic steps from prebiotic to biological chemistry, we take another road suggesting that primordial peptides acted as template for the self-assembly of the first nucleic acids polymers. Arguing in favour of a sort of archaic “reverse translation” from proteins to RNA, our basic premise is a Hadean Earth where key biomolecules such as amino acids, polypeptides, purines, pyrimidines, nucleosides and nucleotides were available under different prebiotically plausible conditions, including meteorites delivery, shallow ponds and hydrothermal vents scenarios. Supporting a protein-first scenario alternative to the RNA world hypothesis, we propose the primeval occurrence of short peptides termed “selective amino acid- and nucleotide-matching oligopeptides” (henceforward SANMAOs) that noncovalently bind at the same time the polymerized amino acids and the single nucleotides dispersed in the prebiotic milieu. We describe the chemical features of this hypothetical oligopeptide, its biological plausibility and its virtues from an evolutionary perspective. We provide a theoretical example of SANMAO’s selective pairing between amino acids and nucleosides, simulating a poly-Glycine peptide that acts as a template to build a purinic chain corresponding to the glycine’s extant triplet codon GGG. Further, we discuss how SANMAO might have endorsed the formation of low-fidelity RNA’s polymerized strains, well before the appearance of the accurate genetic material’s transmission ensured by the current translation apparatus.
... Another possibility is that there was a major, ancient shift in REs on the branch separating Bacteria from Archaea+eukaryotes, followed by changes in response to specific factors such as the shifts in genomic base composition or environmental shifts (e.g., high salt concentrations in Halobacteriaceae). Notably, the aromatic amino acids, as well as some of the other amino acids involved in highly variable pairs, are thought to be late additions to the genetic code and to have increased in frequency since the time of the last universal common ancestor of life [63][64][65]. Regardless of the basis for the RE shifts, it seems clear that the shifts represent an important feature of protein evolution. ...
Preprint
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and they are known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We trained these models separately on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to classify protein alignments correctly at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the “model space” for protein sequence evolution. If we look beyond the information that these clade-specific models reveal about protein evolution the models themselves are likely to be useful tools for phylogenomic inference across the tree of life.
... On the other side, the universal set of amino acids is a comprehensive assessment of biosynthetic cost, solubility, stability, etc. [48]. A consensus chronology of amino acids has been built based on many different criteria [49,50]. ...
Article
Full-text available
Extant biology uses RNA to record genetic information and proteins to execute biochemical functions. Nucleotides are translated into amino acids via transfer RNA in the central dogma. tRNA is essential in translation as it connects the codon and the cognate amino acid. To reveal how the translation emerged in the prebiotic context, we start with the structure and dissection of tRNA, followed by the theory and hypothesis of tRNA and amino acid recognition. Last, we review how amino acids assemble on the tRNA and further form peptides. Understanding the origin of life will also promote our knowledge of artificial living systems.
... The stereochemical matching of amino acids and nucleotide triplets/anti-triplets gives insight into how the genetic code originated [47,48]. There is a predicted order in which amino acids were assigned to codons during the development of the genetic code [49]; in this order, amino acids that were integrated early into the genetic code tend to bind their codons whilst those that were integrated late tend to bind their anticodons [48]. The idea that translation was initially based on codon-amino acid affinities before it was mediated by tRNA adaptors may explain the direct interactions between mRNAs and their cognate proteins in modern cells [50]. ...
Article
Full-text available
It is not entirely clear why, at some stage in its evolution, terrestrial life adopted double-stranded DNA as the hereditary material. To explain this, we propose that small, double-stranded, polynucleotide circlets have special catalytic properties. We then use this proposal as the basis for a ‘view from here’ that we term the Circlet hypothesis as part of a broader Ring World. To maximize the potential explanatory value of this hypothesis, we speculate boldly about the origins of several of the fundamental characteristics and briefly describe the main methods or treatments applied. The principal prediction of the paper is that the highly constrained, conformational changes will occur preferentially in dsDNA, dsRNA and hybrid RNA-DNA circlets that are below a critical size (e.g., 306 bp) and that these will favor the polymerization of precursors into RNA and DNA. We can conclude that the Circlet hypothesis and the Ring World therefore have the attraction of offering the same solution to the fundamental problems that probably confront both the earliest cells and the most recent ones.
... Clusters in the conformation [4Fe-4S] coordinated by monomeric cysteine (Figure 3.30) were chosen for this experiment as evidence suggests that their abiotic formation in alkaline, anaerobic environments is possible. Cysteine is considered to be a late addition amino acid (Trifonov, 2004), and although its synthesis has been calculated to be thermodynamically unfavourable under alkaline conditions (Amend and Shock, 1998), recent work has reported high yields of cysteine via its prebiotic formation in water (Foden et al., 2020), ...
Thesis
There is little agreement on how life might have started on Earth. Following life as a guide, phylogenetic and comparative biochemical studies point to an autotrophic origin, with complexity accruing over time. In modern metabolism, the universal energy currency is adenosine triphosphate (ATP), which not only drives metabolism through phosphorylation and condensation reactions but is also key for the synthesis of the informational molecules RNA, DNA, and proteins. Such deep conservation suggests an early origin of ATP, before the emergence of genes or genetically-encoded macromolecular machines such as the ATPase. This thesis explores the plausibility of an early emergence of ATP and the role it might have had at the origin of life. I first confirm earlier work showing moderate (15-20%) ATP yield from the non-enzymatic phosphorylation of ADP by acetyl phosphate (AcP), before systematically exploring the prebiotic context for this synthesis. AcP is a universally conserved intermediate between acetyl-CoA and ATP, bridging between thioester and phosphate metabolism. I show that it is possible to form moderate yields of ATP in a variety of aqueous environments. The combination of AcP and the catalyst Fe³⁺ is surprisingly favoured. No other prebiotically relevant metal ion, mineral, and phosphorylating agent tested here favoured ADP phosphorylation. Nor could AcP phosphorylate other nucleoside diphosphates to the triphosphates. I demonstrate a reaction mechanism that implicates the N7 and the N6 amino group on the adenine ring in the Fe³⁺-catalysed phosphorylation of ADP, implying a deep significance of the adenine base. Finally, I explore how ATP might have facilitated condensation reactions to generate nucleotide and peptide polymers in an aqueous environment, using life as a guide. These efforts met with limited success, confirming that condensation reactions are not facile in water. 
Nonetheless, my findings overall support the approach of taking life as a guide to study the origin of life.
... The recruitment order of the 20 amino acids from No. 1 to No. 20 can be obtained by the roadmap (Fig 3a, 9a), which meets the basic requirement that Phase I amino acids appeared earlier than the Phase II amino acids (Wong 1975; Wong and Lazcano 2009). The species with complete genome sequences are sorted by the order R 10/10 according to their amino acid frequencies, where the order R 10/10 is defined as the ratio of the average amino acid frequencies for the last 10 amino acids to that for the first 10 amino acids (Li and Zhang 2009; Trifonov 2000; Trifonov 2004; Trifonov et al. 2001; Trifonov et al. 2006). Along the evolutionary direction indicated by the increasing R 10/10, the amino acid frequencies vary in different monotonous manners for the 20 amino acids respectively (Fig 9). ...
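The R 10/10 ratio defined in this excerpt can be illustrated with a minimal sketch. The excerpt takes its recruitment order from the citing preprint's own roadmap, which is not reproduced here, so the sketch substitutes the consensus chronology of the cited article (G, A, D, V, P, S, E, L, T, R, I, Q, N, H, K, C, F, Y, M, W) as an assumption. The function name `r_10_10` and the frequency values are hypothetical.

```python
# Assumed recruitment order: the consensus chronology from the cited article,
# used as a stand-in for the citing preprint's own roadmap order.
ORDER = list("GADVPSELTRIQNHKCFYMW")


def r_10_10(freqs, order=ORDER):
    """Ratio of the mean frequency of the last 10 amino acids in `order`
    to the mean frequency of the first 10."""
    first, last = order[:10], order[10:]
    mean_first = sum(freqs[aa] for aa in first) / len(first)
    mean_last = sum(freqs[aa] for aa in last) / len(last)
    return mean_last / mean_first


# Hypothetical proteome: early amino acids at 6% each, late ones at 4% each.
example = {aa: 0.06 for aa in ORDER[:10]}
example.update({aa: 0.04 for aa in ORDER[10:]})
ratio = r_10_10(example)  # below 1: early amino acids dominate
```

Under this definition, a proteome enriched in late-recruited amino acids yields a larger ratio, which is the direction the excerpt uses to sort species.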
Preprint
Full-text available
Nirenberg’s genetic code chart shows a profound correspondence between codons and amino acids. The aim of this article is to explain the primordial formation of codon degeneracy. It remains a puzzle how informative molecules arose from the supposed prebiotic random sequences. If an initial driving force based on the relative stabilities of triplex base pairs is introduced, prebiotic sequence evolution becomes innately nonrandom. On this basis, the primordial assignment of the 64 codons to the 20 amino acids is explained in detail in terms of base substitutions during the coevolution of tRNAs with aaRSs; the classification of aaRSs is explained as well.
... Compared to a more common residue and/or a residue encoded by several codons, this would be a better indication to signify a start residue. A notable difficulty here is that the consensus on the order of introduction of the 20 usual residues in the genetic code ranks methionine as one of the very last amino acids to enter the code [13]. There would therefore have been a period with another start residue or without a defined start residue. ...
Article
Full-text available
Unlike its shorter analog, cysteine, and its methylated derivative, methionine, homocysteine is not today a proteinogenic amino acid. However, this thiol containing amino acid is capable of forming an activated species intramolecularly. Its thiolactone could have made it an interesting molecular building block at the origin of life on Earth. Here we study the cyclization of homocysteine in water and show theoretically and experimentally that in an acidic medium the proportion of thiolactone is significant. This thiolactone easily reacts with amino acids to form dipeptides. We envision that these reactions may help interpret why a methionine residue is introduced at the start of all protein synthesis.
... It is possible that RNA rings, which have some homology with tRNA loops, are analogs of the first, earlier RNAs of the consensus type that mimic primordial, minimally coding and self-replicating RNAs (presumably proto-tRNAs) possessing a corresponding anticodon and adaptor properties [5,6]. There are more than 40 hypotheses on the order in which amino acids were integrated into the genetic code [22], each of which was tested / ranked against each of the 22 nucleotide positions in the RNA rings, for both homology hypotheses (with tRNA and with the origin of replication) and for both types of deamination (A→G and C→T). At the same time, there is a view that tRNAs, before becoming specific translational adaptors, were used in replication [23] (which does not contradict the notion of a relict mechanism of joint [24] or amino acid-guided [25] replication / translation). ...
Article
Full-text available
A special hypothetical mechanism of variable Individual Epitope Reverse Translation (of at least 2 types) in the eukaryotic cell is probably capable of reproducing primary linear (sense- / antisense-, CRISPR-, repeat-like, etc.) and secondary conformational (similar to quadruplexes, RNA hairpins, RNA-ring structures, etc.) oligonucleotide structures. These structures are formed in the retranslosome, a hypothetical mitochondrial membrane-bound supramolecular particle containing nanomolecular inclusions. They are the so-called nucleic acid equivalents of a protein epitope, oligo-NEs, monomeric at ~15–30 and oligomeric at ~(15–30)n nucleotides, potentially capable of participating in the regulation of expression (activation, termination, switching) and modification of genes / the genome, as well as in the creation of protein- / enzyme-containing nucleoprotein platform- / module- / complex-like formations in normal, pathologically altered (in particular, tumor) and virus-infected cells. Recently, so-called minimal theoretical RNA-ring (stem-loop) structures of 22 nucleotides and longer have been shown in the GenBank databases and built / calculated bioinformatically in silico. Their composition depends on constantly occurring chemical and enzymatic processes (including deamination mutations), and their properties are linked, respectively, with the early (era of the so-called circular code) and later (era of modern universal coding, including the circular code as a component) evolutionary periods of the formation of the whole genetic code. It is generally accepted that the emergence and formation of, respectively, early evolutionary (proto-tRNA, proto-rRNA) and modern variants of the molecules of the translational machinery of mitochondria and cytoplasm, such as tRNA, rRNA and gene products of ribosomal and other proteins, are associated with stem-loop RNA-ring structures similar to the independently proposed oligo-NEs.
... It is possible that RNA rings, which have some homology with tRNA loops, are analogs of the first, earlier RNAs of the consensus type that mimic primordial, minimally coding and self-replicating RNAs (presumably proto-tRNAs) possessing a corresponding anticodon and adaptor properties [5,6]. There are more than 40 hypotheses on the order in which amino acids were integrated into the genetic code [22], each of which was tested / ranked against each of the 22 nucleotide positions in the RNA rings, for both homology hypotheses (with tRNA and with the origin of replication) and for both types of deamination (A→G and C→T). At the same time, there is a view that tRNAs, before becoming specific translational adaptors, were used in replication [23] (which does not contradict the notion of a relict mechanism of joint [24] or amino acid-guided [25] replication / translation). ...
Article
Full-text available
A special hypothetical mechanism of variable epitope-by-epitope reverse translation (of at least 2 types) of an individual epitope of the eukaryotic cell is probably capable of reproducing primary linear (sense- / antisense-, CRISPR-, repeat-like, etc.) and secondary conformational (similar to quadruplex, RNA-hairpin, RNA-ring structures, etc.) oligonucleotide structures. These structures are formed in the retranslosome, a hypothetical mitochondrial membrane-bound supramolecular particle containing nanomolecular inclusions. They are the so-called nucleic acid equivalents (NEs) of a protein epitope, oligo-NEs, monomeric at ~15–30 and oligomeric at ~(15–30)n nucleotides, potentially capable of participating in the regulation of expression (activation, termination, switching) and modification of genes / the genome, as well as in the creation of protein- / enzyme-containing nucleoprotein platform- / module- / complex-like formations in normal, some pathologically altered (in particular, tumor) and virus-infected cells. Recently, minimal theoretical RNA-ring (stem-loop) structures of ~22 nucleotides and longer have been shown in the GenBank databases and built / calculated bioinformatically in silico. Their composition depends on constantly occurring chemical and enzymatic processes (including deamination mutations), and their properties are linked, respectively, with the early (era of the circular code) and later (era of modern universal coding, which includes the circular code as a component) evolutionary periods of the formation of the genetic code. It is generally accepted that the emergence and formation of, respectively, early evolutionary (proto-tRNA, proto-rRNA) and modern variants of the component molecules of the translational machinery of mitochondria and cytoplasm, such as tRNA, rRNA and mRNAs of ribosome-associated protein genes, are associated with RNA-ring stem-loop structures similar to the previously and independently proposed oligo-NEs.
... Thus, the current situation, when any of the three positions of the triplet can be occupied with any of the nucleotides, is to be considered a result of an evolutionary process. Trifonov (2004, 2009) presents the extended version of the two-letter theory. His main emphasis was on the rules of codon transformation. ...
Article
Full-text available
We address issues of description of the origin and evolution of the genetic code from a semiotics standpoint. Developing the concept of codepoiesis introduced by Barbieri, a new idea of semio-poiesis is proposed. Semio-poiesis, a recursive auto-referential processing of semiotic system, becomes a form of organization of the bio-world when and while notions of meaning and aiming are introduced into it. The description of the genetic code as a semiotic system (grammar and vocabulary) allows us to apply the method of internal reconstruction to it: on the basis of heterogeneity and irregularity of the current state, to explicate possible previous states and various ways of forming mechanisms of coding and textualization. The revealed patterns are consistent with hypotheses about the origin and evolution of the genetic code.
... Next, the genetic code expanded to 16 codons (GGC, GGG, GCC, GCG, GAC, GAG, GUC, GUG, CUC, CUG, CCC, CCG, CAC, CAG, CGC, and CGG) encoding 10 amino acids (Gly, Ala, Asp, Val, Glu, Leu, Pro, His, Gln, and Arg) [119]. The order of the appearance of amino acids in the prebiotic GADV world was approximately the same as the evolutionary order of amino acid formation established by Trifonov [120] based on 60 different criteria. ...
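The 16-codon stage described in this excerpt can be checked with a short sketch. It assumes, as a pattern read off the codon list rather than something the excerpt states, that the set consists of every codon with G or C in the first and third positions and any base in the second. The lookup table holds only the standard-code assignments for those codons, and the names `sns_codons` and `STANDARD_CODE_SUBSET` are hypothetical.

```python
from itertools import product

# Standard genetic code assignments (RNA alphabet) for the 16 codons only.
STANDARD_CODE_SUBSET = {
    "GGC": "Gly", "GGG": "Gly", "GCC": "Ala", "GCG": "Ala",
    "GAC": "Asp", "GAG": "Glu", "GUC": "Val", "GUG": "Val",
    "CUC": "Leu", "CUG": "Leu", "CCC": "Pro", "CCG": "Pro",
    "CAC": "His", "CAG": "Gln", "CGC": "Arg", "CGG": "Arg",
}


def sns_codons():
    """Enumerate codons with G or C at positions 1 and 3 and any base at
    position 2 (an assumed pattern, inferred from the codon list)."""
    strong = "GC"
    any_base = "GCAU"
    return ["".join(triplet) for triplet in product(strong, any_base, strong)]


codons = sns_codons()
amino_acids = sorted({STANDARD_CODE_SUBSET[c] for c in codons})
```

Enumerating the pattern gives 2 × 4 × 2 = 16 codons, and mapping them through the standard code gives ten distinct amino acids, in agreement with the counts quoted in the excerpt.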
Article
The origin of the genetic code and translation system is probably the central and most difficult problem in investigations of the origin of life and one of the most complex problems in evolutionary biology in general. There are multiple hypotheses on the emergence and development of existing genetic systems that propose mechanisms for the origin and early evolution of the genetic code, as well as for the emergence of replication and translation. Here, we discuss the most well-known of these hypotheses, although none of them provides a description of the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is currently the prevailing scientific idea on the early evolution of biological and prebiological structures; its main advantage is the assumption that RNAs as the first living systems were self-sufficient, i.e., capable of functioning as both catalysts and templates. However, this hypothesis also has significant limitations. In particular, no ribozymes with processive polymerase activity have yet been discovered or synthesized. Taking into account the mutual dependence of proteins and nucleic acids in the current world, many authors propose early evolution scenarios based on the coevolution of these two classes of organic molecules. They postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world hypothesis, according to which the emergence of translation was preceded by the era of self-replicating RNAs. Although such scenarios are less parsimonious from the evolutionary point of view, since they require the simultaneous emergence and evolution of two classes of organic molecules, as well as the emergence of synchronized replication and translation, their major advantage is that they explain the development of processive and much more accurate protein-dependent replication.
... Quantum chemical calculations of amino acids, and biochemical experiments with amino acids on animal membrane surfaces, suggested that tyrosine and tryptophan were added to the genetic code to prevent oxidative stress during the rise in concentration of molecular oxygen in the biosphere (Granold et al. 2018). The order of recruitment of amino acids into the protein synthesis system has also been proposed, based on amino acid properties (Francis 2013), on the amino acid frequency in ancestral sequences (Jordan et al. 2005), and on consideration of 60 other factors (Trifonov 2004). However, more sequence-based evidence about the route of evolution is required. ...
Article
Full-text available
Extant organisms commonly use 20 amino acids in protein synthesis. In the translation system, aminoacyl-tRNA synthetase (ARS) selectively binds an amino acid and transfers it to the cognate tRNA. It is postulated that the amino acid repertoire of ARS expanded during the development of the translation system. In this study we generated composite phylogenetic trees for seven ARSs (SerRS, ProRS, ThrRS, GlyRS-1, HisRS, AspRS, and LysRS) which are thought to have diverged by gene duplication followed by mutation, before the evolution of the last universal common ancestor. The composite phylogenetic tree shows that the AspRS/LysRS branch diverged from the other five ARSs at the deepest node, with the GlyRS/HisRS branch and the other three ARSs (ThrRS, ProRS and SerRS) diverging at the second deepest node. ThrRS diverged next, and finally ProRS and SerRS diverged from each other. Based on the phylogenetic tree, sequences of the ancestral ARSs prior to the evolution of the last universal common ancestor were predicted. The amino acid specificity of each ancestral ARS was then postulated by comparison with amino acid recognition sites of ARSs of extant organisms. Our predictions demonstrate that ancestral ARSs had substantial specificity and that the number of amino acid types amino-acylated by proteinaceous ARSs was limited before the appearance of a fuller range of proteinaceous ARS species. From an assumption that 10 amino acid species are required for folding and function, proteinaceous ARS possibly evolved in a translation system composed of preexisting ribozyme ARSs, before the evolution of the last universal common ancestor.
... The genetic code then expanded to 16 codons (GGC, GGG, GCC, GCG, GAC, GAG, GUC, GUG, CUC, CUG, CCC, CCG, CAC, CAG, CGC, and CGG), which encoded 10 amino acids (Gly, Ala, Asp, Val, Glu, Leu, Pro, His, Gln, and Arg) [119]. The order in which amino acids appeared in the prebiotic GADV world roughly coincides with the evolutionary order of amino acid emergence established by Trifonov [120] on the basis of 60 different criteria. ...
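The 16 codons quoted above all have G or C in the first and third positions. As a rough check, the following sketch enumerates codons of that form and translates them with the standard genetic code. The pattern-based enumeration and the names `CODON_TABLE` and `sns_codons` are illustrative assumptions, not taken from the cited review.

```python
from itertools import product

# Standard genetic code in the usual compact form: codons enumerated with
# bases in the order U, C, A, G (first position varies slowest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def sns_codons():
    """Hypothetical helper: codons with G or C at positions 1 and 3."""
    return [a + b + c for a in "GC" for b in "GCAU" for c in "CG"]


codons = sns_codons()
encoded = {CODON_TABLE[c] for c in codons}
print(len(codons), "codons encode", len(encoded), "amino acids:", sorted(encoded))
```

Under the standard code this pattern gives 16 codons covering the ten amino acids named in the snippet (one-letter codes G, A, D, V, E, L, P, H, Q, R).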
Article
The origin of the genetic code and the translation system is perhaps the central and most difficult problem in the study of the origin of life, and one of the most difficult in all of evolutionary biology. There are many hypotheses on the emergence and development of modern genetic systems, addressing the origin and early evolution of the genetic code as well as the emergence of replication and translation. The most widely known hypotheses are considered in this review. However, none of them describes all stages of the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is currently the dominant scientific idea about the early evolution of biological and prebiological entities. Its main advantage is that it proposes RNA as the first living systems: molecules that are self-sufficient with respect to reproduction and can function both as the catalytic component of the system and as the template. However, it also has significant shortcomings. In particular, a processive ribozyme polymerase has so far been neither discovered nor obtained experimentally. Given the mutual dependence of proteins and nucleic acids in the modern world, many authors propose scenarios of early evolution based on the coevolution of these two classes of organic molecules. Such hypotheses postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world, where the emergence of translation was preceded by an era of self-replicating RNAs. Although such scenarios are less parsimonious from an evolutionary point of view, since they require the simultaneous emergence and evolution of two classes of organic molecules as well as the temporal synchronization of the emergence of replication and translation, their great advantage is that they offer the immediate development of much more accurate and processive protein-based replication.
... The recruitment order of the 20 amino acids from No.1 to No.20 can be obtained by the roadmap (Figures 3a and 9), which meets the basic requirement that Phase I amino acids appeared earlier than the Phase II amino acids [1,2]. The species with complete genome sequences are sorted by the order R_10/10 according to their amino acid frequencies, where the order R_10/10 is defined as the ratio of the average amino acid frequencies for the last 10 amino acids to that for the first 10 amino acids [8,36,63-65]. Along the evolutionary direction indicated by the increasing R_10/10, the amino acid frequencies vary in different monotonous manners for the 20 amino acids, respectively (Figure 9). ...
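The ratio defined in this snippet can be computed as in the minimal sketch below. The cited paper derives its recruitment order from its own roadmap, which is not reproduced here. As a stand-in, and purely as an assumption, the sketch uses the consensus order from the abstract at the top of this page with ties flattened. The function name `r_10_10` and the toy frequencies are hypothetical.

```python
from statistics import mean

# ASSUMPTION: stand-in recruitment order (consensus chronology from the
# abstract above, ties flattened); the cited paper uses its own order.
ASSUMED_ORDER = list("GADVPSELTRIQNHKCFYMW")


def r_10_10(freqs, order=ASSUMED_ORDER):
    """Ratio of the mean frequency of the last 10 amino acids in `order`
    to the mean frequency of the first 10."""
    if len(order) != 20 or set(order) != set(freqs):
        raise ValueError("need frequencies for exactly the 20 ordered amino acids")
    early = mean(freqs[aa] for aa in order[:10])
    late = mean(freqs[aa] for aa in order[10:])
    return late / early


# Toy proteome in which early amino acids are four times as frequent as late ones.
toy = {aa: 0.08 for aa in ASSUMED_ORDER[:10]}
toy.update({aa: 0.02 for aa in ASSUMED_ORDER[10:]})
ratio = r_10_10(toy)
```

Sorting species by this value then amounts to `sorted(species, key=lambda s: r_10_10(freqs_by_species[s]))`, where `freqs_by_species` is a hypothetical mapping from species to frequency dictionaries.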
Article
Full-text available
Nirenberg’s genetic code chart shows a profound correspondence between codons and amino acids. The aim of this article is to try to explain the primordial formation of the codon degeneracy. It remains a puzzle how informative molecules arose from the supposed prebiotic random sequences. If an initial driving force based on the relative stabilities of triplex base pairs is introduced, prebiotic sequence evolution becomes innately nonrandom. Thus, the primordial assignment of the 64 codons to the 20 amino acids has been explained in detail according to base substitutions during the coevolution of tRNAs with aaRSs; meanwhile, the classification of aaRSs has also been explained.
... It is interesting to notice that Trp was one of the latest amino acids incorporated into the genetic code. While different contradicting theories were developed to rationalise this peculiar event, its late occurrence has drawn unanimous consensus (Davis 2002; Trifonov 2004; Wong 2005; José et al. 2009, 2011; Palacios-Pérez and José 2019). Albeit a direct correlation between these hypotheses and our energetic results cannot be unequivocally drawn, it is striking that while for the majority of cases the α-amino acid form is the most stable among their isomers, Trp stands out being in the 55th absolute position (although starting from 97,406 structures). ...
Article
Full-text available
The secular debate on the origin of life on our planet represents one of the open challenges for the scientific community. In this endeavour, chemistry has a pivotal role in disclosing novel scenarios that allow us to understand how the formation of simple organic molecules would be possible in the early primitive geological ages of Earth. Amino acids play a crucial role in biological processes. They are known to be formed in experiments simulating primitive conditions and were found in meteoritic samples retrieved throughout the years. Understanding their formation is a key step for prebiotic chemistry. Following this reasoning, we performed a computational investigation over 100,000 structural isomers of natural amino acids. The results we have found suggest that natural amino acids are among the most thermodynamically stable structures and, therefore, one of the most probable ones to be synthesised among their possible isomers.
... Eck and Dayhoff found for example, that the earliest four amino acids to appear were Ala, Asp, Ser, and Gly (Eck and Dayhoff 1966), and based on multiple criteria, the order of appearance of the amino acids in life is now accepted to be (Trifonov 2004): ...
Preprint
Full-text available
Aromatic residues appeared relatively late in the evolution of protein sequences. They stabilize the hydrophobic core of globular proteins and are typically absent from intrinsically disordered regions (IDRs). However, recent advances in protein liquid-liquid phase separation (LLPS) studies have shown that aromatic residues in IDRs often act as important “stickers”, promoting multivalent interactions and the formation of higher-order oligomers. To reconcile this apparent contradiction, we compared levels of sequence disorder in RNA binding proteins and the human proteome and found that aromatic residues appear more frequently than expected in the IDRs of RNA binding proteins, which are often found to undergo LLPS. Phylogenetic analysis shows that aromatic residues are highly conserved among chordates, highlighting their importance in LLPS-driven functional assembly. These results suggest therefore that aromatic residues have contributed twice to evolution: in stabilizing structured proteins and in the assembly of biomolecular condensates.
... These four amino acids are a recurrent theme in early evolution. They are consistently identified as the most ancient amino acids by different approaches to the origin of the genetic code (Trifonov 2004; Copley, Smith and Morowitz 2005; Wong 2005); they were recently shown to form without enzymes from the corresponding ketoacids using only Fe²⁺ as the catalyst and hydroxylamine as the amino donor (Muchowska, Varma and Moran 2019); they are the first acids that should arise in prebiotic metabolism if metabolism started out via the acetyl-CoA pathway (Preiner et al. 2020) generating the reductive amination products of acetate, pyruvate, oxaloacetate and α-ketoglutarate (Muchowska, Varma and Moran 2019) and they are central biosynthetic precursors of nucleic acid bases (Martin and Russell 2007). The accumulation in the cell wall of ancient amino acids that are very central to metabolism is consistent with an ancient nature of non-ribosomal peptide bond formation. ...
Article
Full-text available
Bacteria near-universally contain a cell wall sacculus of murein (peptidoglycan), the synthesis of which has been intensively studied for over 50 years. In striking contrast, archaeal species possess a variety of other cell wall types, none of them closely resembling murein. Interestingly though, one type of archaeal cell wall termed pseudomurein found in the methanogen orders Methanobacteriales and Methanopyrales is a structural analogue of murein in that it contains a glycan backbone that is cross-linked by a L-amino acid peptide. Here, we present taxonomic distribution, gene cluster and phylogenetic analyses that confirm orthologues of 13 bacterial murein biosynthesis enzymes in pseudomurein-containing methanogens, most of which are distantly related to their bacterial counterparts. We also present the first structure of an archaeal pseudomurein peptide ligase from Methanothermus fervidus DSM1088 (Mfer336) to a resolution of 2.5 Å and show that it possesses a similar overall tertiary three domain structure to bacterial MurC and MurD type murein peptide ligases. Taken together the data strongly indicate that murein and pseudomurein biosynthetic pathways share a common evolutionary history.
... Moreover, cysteine was found in several eukaryotic species analyzed in this study, except in primates and marine mammals. This amino acid is considered to be a later addition to the genetic code (Trifonov, 2004), and it appears to be under firm evolutionary pressure. The lack of this residue in primate and marine mammal HO-1 may also be explained by the nature of the cysteine side chain, which can react under stress conditions that lead to increased oxidant levels (Shenton and Grant, 2003), such as the presence of pathogens; thus, a stress-inducible protein like HO-1 could benefit from the lack of an oxidant-sensitive amino acid. ...
Article
Full-text available
Cetacea is a clade well-adapted to the aquatic lifestyle, with diverse adaptations and physiological responses, as well as a robust antioxidant defense system. Serious injuries caused by boats and fishing nets are common in bottlenose dolphins (Tursiops truncatus); however, these animals do not show signs of serious infections. Evidence suggests an adaptive response to tissue damage and associated infections in cetaceans. Heme oxygenase (HO) is a cytoprotective protein that participates in the anti-inflammatory response. HO catalyzes the first step in the oxidative degradation of the heme group. Various stimuli, including inflammatory mediators, regulate the inducible HO-1 isoform. This study aims to characterize HO-1 of the bottlenose dolphin in silico and compare its structure to the terrestrial mammal protein. Upstream HO-1 sequence of the bottlenose dolphin was obtained from NCBI and Ensembl databases, and the gene structure was determined using bioinformatics tools. Five exons and four introns were identified, and proximal regulatory elements were detected in the upstream region. The presence of 10 α-helices, three 3₁₀ helices, the heme group lodged between the proximal and distal helices, and a histidine-25 in the proximal helix serving as a ligand to the heme group were inferred for T. truncatus. Amino acid sequence alignment suggests HO-1 is a conserved protein. The HO-1 "fingerprint" and histidine-25 appear to be fully conserved among all species analyzed. Evidence of positive selection within an α-helix configuration without changes in protein configuration and evidence of purifying selection were found, indicating evolutionary conservation of the coding sequence structure.
... THE HYPOTHESIS OF TWO ALPHABETS. Eduard Trifonov (2004) presents the extended version of the two-letter theory. His main emphasis is on the rules of codon transformation. ...
Article
Full-text available
“The code is meaningless unless translated” (Monod 1971, 143). We address issues of a description of the origin and evolution of the genetic code from the semiotics standpoint. Developing the concept of codepoiesis introduced by M. Barbieri, a new idea of semio-poiesis is proposed. Semio-poiesis, a recursive auto-referential processing of a semiotic system, becomes a form of organization of the bio-world when and while notions of meaning and aiming are introduced into it. The description of the genetic code as a semiotic system (grammar and vocabulary) allows us to apply the method of internal reconstruction to it: on the basis of heterogeneity and irregularity of the current state, to explicate possible previous states and various ways of forming coding and textualization mechanisms. The revealed patterns and irregularities are consistent with hypotheses about the origin and evolution of the genetic code.
... Methionine is thought to have been introduced late in the genetic code, just before the last introduced amino acid, tryptophan [112]. Does this mean that a start codon / start amino acid pair appeared only late in the chemical evolution process? Or was this role assumed by another codon/amino acid pair? ...
Article
Two sulfur-containing amino acids are included in the list of the 20 classical protein amino acids. A methionine residue is introduced at the start of the synthesis of all current proteins. Cysteine, thanks to its thiol function, plays an essential role in a very large number of catalytic sites. Here we present what is known about the prebiotic synthesis of these two amino acids and homocysteine, and we discuss their introduction into primitive peptides and more elaborate proteins. Contents: 1 Introduction; 2 Sulfur Sources; 3 Prebiotic Synthesis of Cysteine; 4 Prebiotic Synthesis of Methionine; 5 Homocysteine and Its Thiolactone; 6 Methionine and Cystine in Proteins; 7 Prebiotic Scenarios Using Sulfur Amino Acids; 8 Introduction of Cys and Met in the Genetic Code; 9 Conclusion
... [48−52] These octamers contain several conformers of zwitterionic Ser molecules, while the crystalline form of solid Ser is built upon a single zwitterionic conformer [53−56]. Furthermore, Ser is assumed to have been among the first amino acids recruited by life [57,58] and is therefore of significant interest for biochemical evolution and for the origin of homochirality of life [59]. The direct involvement of Ser₈ or its protonated form Ser₈H⁺ as a promoter of chiral amplification has been discussed in the literature. ...
Article
Structural changes at the molecular level, occurring at the onset of condensation, can be probed by angle-resolved valence photoelectron spectroscopy, which is inherently sensitive to the electronic structure. For larger condensed systems like aerosol particles, the observation of intrinsic angular anisotropies in photoemission (β parameters) is challenging due to the strong reduction of their magnitude by electron transport effects. Here, we use a less common, more sensitive observable in the form of the chiral asymmetry parameter to perform a comparative study of the VUV photoelectron spectroscopy and photoelectron circular dichroism (PECD) between pure gas phase enantiomers of the amino acid serine and their corresponding homochiral nanoparticles. We observe a relatively large (1%) and strongly kinetic energy-dependent asymmetry, discussed in terms of the emergence of local order and conformational changes potentially counterbalancing the loss of angular information due to electron transport scattering. This demonstrates the potential of PECD as a sensitive probe of the condensation effects from the gas phase to bulk-like chiral aerosol particles surpassing the potential of conventional photoemission observables such as β parameters.
... The modern 20 amino acid alphabet originated from a much smaller early genetic code of only a few prebiotic amino acids. A wealth of information established a likely chronological order of amino acids entering the genetic code [14]. An extensive study by Newton et al. explored the solubility and secondary structure content of random proteins made from the likely earliest 5, 9, 16, and the modern 20 amino acids [15]. ...
Article
Natural proteins are the result of billions of years of evolution. The earliest predecessors of today’s proteins are believed to have emerged from random polypeptides. While we have no means to determine how this process exactly happened, there is great interest in understanding how it reasonably could have happened. We are reviewing how researchers have utilized in vitro selection and molecular evolution methods to investigate plausible scenarios for the emergence of early functional proteins. The studies range from analyzing general properties and structural features of unevolved random polypeptides to isolating de novo proteins with specific functions from synthetic randomized sequence libraries or generating novel proteins by combining evolution with rational design. While the results are exciting, more work is needed to fully unravel the mechanisms that seeded protein-dominated biology.
... The level of sC/sAC occurrence in the acceptor-TΨC arm is highly correlated with the antiquity of the corresponding aa. Eight out of the nine conserving aa (Table 2), that is, all except His, are listed among the 10 earliest appearing aa, according to a consensus chronology built on 60 criteria [33], which is compatible with the results of the Miller-Urey experiment [34]. In particular, the robustly conserving aa Ala, Asp, Gly, Pro, and Ser are among the six most ancient aa. ...
Article
Full-text available
The mechanism and evolution of the recognition scheme between key components of the translation system, i.e., tRNAs, synthetases and elongation factors, are fundamental issues in understanding the translation of genetic information into proteins. Statistical analysis of bacterial tRNA sequences reveals that for six amino acids, a string of 10 nucleotides preceding the tRNA 3′ end carries cognate coding triplets to nearly full extent. The triplets conserved in positions 63‐67 are implicated in the recognition by the elongation factor EF‐Tu, and those conserved in positions 68‐72, in the identification of cognate tRNAs and their derived minihelices by class IIa synthetases. These coding triplets are suggested to have primordial origin, being engaged in aminoacylation of prebiotic tRNAs and in the establishment of the canonical codon set.
... NAN amino acids are both prominent in Miller discharge (Miller 1987) experiments (Asp, Glu) and also absent (e.g., His; Higgs and Pudritz 2009). Consensus primordial amino acid lists, consulting 60 (!) chemical criteria include them, and also do not (Trifonov 2004). They are mixed in assignment to the two classes of aminoacyl-tRNA synthetases (Wetzel 1995). ...
Article
Full-text available
Wobble coding is inevitable during evolution of the Standard Genetic Code (SGC). It ultimately splits half of NN U/C/A/G coding boxes with different assignments. Further, it contributes to pervasive SGC order by reinforcing close spacing for identical SGC assignments. But wobble cannot appear too soon, or it will inhibit encoding and more decisively, obstruct evolution of full coding tables. However, these prior results assumed Crick wobble, NN U/C and NN A/G, read by a single adaptor RNA. Superwobble translates NN U/C/A/G codons, using one adaptor RNA with an unmodified 5′ anticodon U (appropriate to earliest coding) in modern mitochondria, plastids, and mycoplasma. Assuming the SGC was selected when evolving codes most resembled it, characteristics of the critical selection events can be calculated. For example, continuous superwobble infrequently evolves SGC-like coding tables. So, continuous superwobble is a very improbable origin hypothesis. In contrast, late-arising superwobble shares late Crick wobble’s frequent resemblance to SGC order. Thus late superwobble is possible, but yields SGC-like assignments less frequently than late Crick wobble. Ancient coding ambiguity, most simply, arose from Crick wobble alone. This is consistent with SGC assignments to NAN codons.
Chapter
Life's origin is an enigma. Mankind has been pondering as to how it all began for millennia, yet are we any closer to uncovering the answer to this enigma? It would seem not, but we are slowly and surely edging towards discovery of the processes and mechanics by which life emerged on Earth. There are more than a couple of dozen hypotheses which claim to have the answer, but in reality, there is no absolute front runner. We have categorised these hypotheses under the following four banners: metabolism, genetic, proteins and vesicles first. In this chapter we strive to demonstrate how they conflict with one another and to this effect we have brought into focus both the top‐down and bottom‐up approaches to the question of the origin of life in general, as well as answering the question as to which came first, chemolithoautotrophs and photolithoautotrophs? In addition, the part played by viruses (in particular the RNA ones) during the origin of life is addressed.
Chapter
There are many Archaea with flat, polygonal shapes, resembling shaped droplets in form. We hypothesized that the transition from abiogenesis to a living cell started with shaped droplets. We begin with proteins that have a transmembrane domain, such as the S layer, which is a common feature of Archaea and Bacteria. We show how the anchor transmembrane domain may have been hydrophobically “selected” from random mixed chirality peptides by the cell membrane itself when it was thinner, possibly produced from short meteorite amphiphiles. These peptides may have been the first proteins, formed mostly from three “core” amino acids that we deduce from t‐RNA phylogeny: alanine, glycine, and isoleucine. They may have formed by PCR‐like wet/dry cycles along with RNA, which may have protected them from hydrolysis as they grew, causing the membrane to thicken to its present thickness, a positive feedback, abiotic process. The possibly daily flattening and swelling of vesicles as the temperature changed at night, with 10^11 day/night cycles available, might have aided the PCR‐like activity. Thus, the peptide/protein world may have been simultaneous with the RNA world. The juxtaposition of peptides with RNA may have led to the genetic code. Interactions between membrane‐selected peptides and amphiphiles may have led to chirality of both.
Article
Full-text available
Simple Summary: The relative rates of amino acid substitution over evolutionary time reflect the chemical properties of amino acids. Substitutions that result in an amino acid similar to an ancestral residue accumulate more rapidly than those resulting in a dissimilar amino acid. The substitution rates for each amino acid pair are the parameters in models of evolutionary change for proteins. Although the best-fitting model of protein evolution is known to differ among taxa, a comprehensive picture of model changes across the tree of life is not available. In principle, models of protein change might reflect evolutionary history (i.e., closely related taxa have similar models) or the environment (i.e., taxa living in similar environments have similar models). We estimated models of amino acid evolution for organisms across the tree of life, finding evidence that history and the environment have both contributed to model differences. Bacterial models differed from archaeal and eukaryotic models. Models for Halobacteriaceae (archaea that live in highly saline environments) and Thermoprotei (a group of thermophilic archaea) were found to be very distinctive. The rates of substitution for pairs of aromatic amino acids were especially variable. Overall, these results paint a picture of the “evolutionary model space” for proteins across the tree of life. Abstract: The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. 
In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the “model space” for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
Preprint
Full-text available
The central dogma of molecular biology dictates that, with only a few exceptions, information proceeds from DNA to protein through an RNA intermediate. Examining the enigmatic steps from prebiotic to biological chemistry, we take another road, suggesting that primordial peptides acted as templates for the self-assembly of the first nucleic acid polymers. Arguing in favour of a sort of archaic “reverse translation” from proteins to RNA, our basic premise is a Hadean Earth where key biomolecules such as amino acids, polypeptides, purines, pyrimidines, nucleosides and nucleotides were available under different prebiotically plausible conditions, including meteorite delivery, shallow pond and hydrothermal vent scenarios. Supporting a protein-first scenario alternative to the RNA world hypothesis, we propose the primeval occurrence of short peptides termed “selective amino acid- and nucleotide-matching oligopeptides” (henceforward SANMAOs) that noncovalently bind at the same time the polymerized amino acids and the single nucleotides dispersed in the prebiotic milieu. We describe the chemical features of this hypothetical oligopeptide, its biological plausibility and its virtues from an evolutionary perspective. We provide a theoretical example of SANMAO’s selective pairing between amino acids and nucleosides, simulating a poly-glycine peptide that acts as a template to build a purine chain corresponding to glycine’s extant triplet codon GGG. Further, we discuss how SANMAOs might have promoted the formation of low-fidelity polymerized RNA strands, well before the appearance of the accurate transmission of genetic material ensured by the current translation apparatus.
Preprint
The DNA sequences available in the prebiotic era were the genomic building blocks of the first life forms on Earth and have therefore been a matter of intense debate [1,2]. On the surface of the Early Earth, ultraviolet (UV) light is a key energy source [3], which is known to damage nucleic acids [4]. However, a systematic study of the sequence selectivity upon UV exposure under Early Earth conditions is still missing. In this work, we quantify the UV stability of all possible canonical DNA sequences and derive information on codon appearance under UV irradiation as selection pressure. We irradiate a model system of random 8mers at 266 nm and determine its UV stability via next-generation sequencing. As a result, we obtain the formation rates of the dominant dimer lesions as a function of their neighboring sequences and find a strong sequence selectivity. On the basis of our experimental results, we simulate the photodamage of short proto-genomes of 150 bases length by a Monte Carlo approach. Our results strongly argue for UV compatibility of early life and allow the ranking of codon evolutionary models with respect to their UV resistance.
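The Monte Carlo step mentioned in this abstract might look roughly like the sketch below. It is not the authors' method. It assumes that lesions form only at adjacent pyrimidine pairs, that each such site is damaged independently with a fixed probability per irradiation step, and that this probability depends only on the dinucleotide (the abstract reports a dependence on neighbouring sequence, which is ignored here). All rate values and function names are hypothetical placeholders.

```python
import random

# Adjacent pyrimidine pairs, the sites where UV-induced dimers can form.
DIPYRIMIDINES = {"TT", "TC", "CT", "CC"}


def dimer_sites(seq):
    """Start indices of adjacent pyrimidine pairs in a DNA sequence."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in DIPYRIMIDINES]


def survival_fraction(seq, rate_per_step, steps, trials, rng):
    """Fraction of simulated copies of `seq` with no lesion after `steps`
    irradiation steps. `rate_per_step` maps each dipyrimidine to a
    HYPOTHETICAL per-step damage probability."""
    sites = dimer_sites(seq)
    intact = 0
    for _ in range(trials):
        damaged = any(
            rng.random() < rate_per_step[seq[i:i + 2]]
            for _ in range(steps)
            for i in sites
        )
        if not damaged:
            intact += 1
    return intact / trials


# Placeholder rates, not measured values.
rates = {"TT": 0.002, "TC": 0.001, "CT": 0.0005, "CC": 0.0002}
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(150))  # 150-base proto-genome
fraction = survival_fraction(genome, rates, steps=10, trials=200, rng=rng)
```

Ranking codon sets by UV resistance would then compare `survival_fraction` across proto-genomes built from each set, given rates taken from experiment rather than the placeholders used here.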
Chapter
How life originated from the inanimate mixture of organic and inorganic compounds on the primordial Earth remains one of the great unknowns in science. This origin of life, or abiogenesis, continues to be examined, both theoretically and experimentally, in the context of the conditions and materials required for natural life to have begun on Earth. This book provides a broad but in-depth analysis of the latest discoveries in prebiotic chemistry from the microscopic to the macroscopic scale, utilising experimental insight to provide a bottom-up approach to plausibly explaining how life arose. With contributions from global leaders, this book is an ideal reference for postgraduate students and a single source of comprehensive information on the latest technical and theoretical advancements for researchers in a variety of fields from astrochemistry and astrophysics to organic chemistry and evolution.
Article
Aromatic residues appeared relatively late in the evolution of protein sequences to stabilize the folding core of globular proteins and are less common in intrinsically disordered regions (IDRs). Recent advances in protein liquid–liquid phase separation (LLPS) studies have also shown that aromatic residues in IDRs often act as “stickers” to promote multivalent interactions in forming higher‐order oligomers. To study how general these structure‐promoting residues are in IDRs, we compared levels of sequence disorder in RNA binding proteins (RBPs), which are often found to undergo LLPS, and the human proteome. We found that aromatic residues appear more frequently than expected in the IDRs of RBPs and, through multiple sequence alignment analysis, that those aromatic residues are often conserved among chordates. In TDP‐43, FUS, and some other well‐studied LLPS proteins, the conserved aromatic residues are important to their LLPS‐related functions. These analyses suggest that aromatic residues may have contributed twice to evolution: stabilizing structured proteins and assembling biomolecular condensates.
Article
Natural selection of specific protobiomonomers during abiogenic development of the prototype genetic code is hindered by the diversity of structural, spatial, and rotational isomers that have identical elemental composition and molecular mass (M), but can vary significantly in their physicochemical characteristics, such as the melting temperature Tm, the Tm:M ratio, and the solubility in water, due to different positions of atoms in the molecule. These parameters differ between cis- and trans-isomers of dicarboxylic acids, spatial monosaccharide isomers, and structural isomers of α-, β-, and γ-amino acids. The stable planar heterocyclic molecules of the major nucleobases comprise four (C, H, N, O) or three (C, H, N) elements and contain a single -C=C bond and two nitrogen atoms in each heterocycle involved in C-N and C=N bonds. They exist as isomeric resonance hybrids of single and double bonds and as a mixture of tautomer forms due to the presence of -C=O and/or -NH2 side groups. They are thermostable, insoluble in water, and exhibit solid-state stability, which is of central importance for DNA molecules as carriers of genetic information. In M-Tm diagrams, proteinogenic amino acids and the corresponding codons are distributed fairly regularly relative to the distinct clusters of purine and pyrimidine bases, reflecting the correspondence between codons and amino acids that was established in different periods of genetic code development. The body of data on the evolution of the genetic code system indicates that the elemental composition and molecular structure of protobiomonomers, and their M, Tm, photostability, and aqueous solubility determined their selection in the emergence of the standard genetic code.
Article
Mechanical and thermal activation enable the solvent-free oligomerization of inactivated amino acids in the presence of TiO2 to form peptides. Our observations validate the usefulness of mechanochemistry to simulate and study prebiotic chemistry in an impact scenario where dry land and mineral surfaces are critical for the concentration and formation of organic building blocks. Abstract The presence of amino acids on the prebiotic Earth, either stemming from endogenous chemical routes or delivered by meteorites, is consensually accepted. Prebiotically plausible pathways to peptides from inactivated amino acids are still unclear as most oligomerization approaches rely on thermodynamically disfavored reactions in solution. Now, a combination of prebiotically plausible minerals and mechanochemical activation enables the oligomerization of glycine at ambient temperature in the absence of water. Raising the reaction temperature increases the degree of oligomerization concomitantly with the formation of a commonly unwanted cyclic glycine dimer (DKP). However, DKP is a productive intermediate in the mechanochemical oligomerization of glycine. The findings of this research show that mechanochemical peptide bond formation is a dynamic process that provides alternative routes towards oligopeptides and establishes new synthetic approaches for prebiotic chemistry.
Article
The presence of amino acids on the prebiotic Earth, either stemming from endogenous chemical routes or delivered by meteorites, is consensually accepted. In contrast, prebiotically plausible pathways to achieve peptides from inactivated amino acids are still unclear since most oligomerization approaches rely on thermodynamically disfavored reactions in solution. Here, we show that the combination of prebiotically plausible minerals and mechanochemical activation enables the oligomerization of glycine at ambient temperature in the absence of water. Raising the reaction temperature increases the degree of oligomerization concomitantly with the formation of a commonly unwanted cyclic glycine dimer (DKP). However, we show here that DKP is a productive intermediate in the mechanochemical oligomerization of glycine. The findings of this research show that mechanochemical peptide bond formation is a dynamic process that provides alternative routes towards oligopeptides and establishes new synthetic approaches for prebiotic chemistry.
Article
Inouye et al. (2020) use the observation that Ser is coded in the genetic code by two blocks of codons that differ in more than one base to understand some aspects of the origin of the genetic code organization. I argue instead that this observation per se cannot be used to understand any aspect of the origin of the genetic code, unless it is accompanied by other assumptions concerning, in this specific case: (i) the ancestrality of some amino acids, (ii) the hypothesis that the first mRNA to be translated was poly-G, which can be translated into poly-Gly, and (iii) an evolutionary mechanism for the genetic code origin based on the duplication of tRNAs. However, neither the tRNA duplication mechanism nor the existence of poly-G as the first mRNA to be translated is corroborated as a mechanism through which the genetic code would have been structured. For example, the origin of the actual mRNA should have been preceded by the evolution of a proto-mRNA which evidently already coded for more than one amino acid. Therefore, when it evolved from proto-mRNA, the mRNA should already have coded for more than one amino acid. In other words, poly-G as mRNA would most likely never have existed because the first mRNAs already had to code for more than one amino acid. On the contrary, all these assumptions would have been operational if the observations of Inouye et al. (2020) had been discussed within the coevolution theory of the origin of the genetic code, which they were not.
Article
Comparative path lengths in amino acid biosynthesis and other molecular indicators of the timing of codon assignment were examined to reconstruct the main stages of code evolution. The codon tree obtained was rooted in the 4 N-fixing amino acids (Asp, Glu, Asn, Gln) and 16 triplets of the NAN set. This small, locally phased (commaless) code evidently arose from ambiguous translation on a poly(A) collector strand, in a surface reaction network. Copolymerisation of these amino acids yields polyanionic peptide chains, which could anchor uncharged amide residues to a positively charged mineral surface. From RNA virus structure and replication in vitro, the first genes seemed to be RNA segments spliced into tRNA. Expansion of the code reduced the risk of mutation to an unreadable codon. This step was conditional on initiation at the 5′-codon of a translated sequence. Incorporation of increasingly hydrophobic amino acids accompanied expansion. As codons of the NUN set were assigned most slowly, they received the most nonpolar amino acids. The origin of ferredoxin and Gln synthetase was traced to mid-expansion phase. Surface metabolism ceased by the end of code expansion, as cells bounded by a proteo-phospholipid membrane, with a protoATPase, had emerged. Incorporation of positively charged and aromatic amino acids followed. They entered the post-expansion code by codon capture. Synthesis of efficient enzymes with acid–base catalysis was then possible. Both types of aminoacyl-tRNA synthetases were attributed to this stage. tRNA sequence diversity and error rates in RNA replication indicate the code evolved within 20 million yr in the preIsuan era. These findings on the genetic code provide empirical evidence, from a contemporaneous source, that a surface reaction network, centred on C-fixing autocatalytic cycles, rapidly led to cellular life on Earth.
Article
Twelve nonprotein amino acids appear to be present in the Murchison meteorite. The identity of eight of them has been conclusively established as N-methylglycine, beta-alanine, 2-methylalanine, alpha-amino-n-butyric acid, beta-amino-n-butyric acid, gamma-amino-n-butyric acid, isovaline, and pipecolic acid. Tentative evidence is presented for the presence of N-methylalanine, N-ethylglycine, beta-aminoisobutyric acid, and norvaline. These amino acids appear to be extraterrestrial in origin and may provide new evidence for the hypothesis of chemical evolution.
Article
It is suggested that protein synthesis may have begun without even a primitive ribosome if the primitive tRNA could take up two configurations and could bind to the messenger RNA with five base-pairs instead of the present three. This idea would impose base sequence restrictions on the early messages and on the early genetic code such that the first four amino acids coded were glycine, serine, aspartic acid and asparagine. A possible mechanism is suggested for the polymerization of the early message.
Article
The genetic code, formerly thought to be frozen, is now known to be in a state of evolution. This was first shown in 1979 by Barrell et al. (G. Barrell, A. T. Bankier, and J. Drouin, Nature [London] 282:189-194, 1979), who found that the universal codons AUA (isoleucine) and UGA (stop) coded for methionine and tryptophan, respectively, in human mitochondria. Subsequent studies have shown that UGA codes for tryptophan in Mycoplasma spp. and in all nonplant mitochondria that have been examined. Universal stop codons UAA and UAG code for glutamine in ciliated protozoa (except Euplotes octocarinatus) and in a green alga, Acetabularia. E. octocarinatus uses UAA for stop and UGA for cysteine. Candida species, which are yeasts, use CUG (leucine) for serine. Other departures from the universal code, all in nonplant mitochondria, are CUN (leucine) for threonine (in yeasts), AAA (lysine) for asparagine (in platyhelminths and echinoderms), UAA (stop) for tyrosine (in planaria), and AGR (arginine) for serine (in several animal orders) and for stop (in vertebrates). We propose that the changes are typically preceded by loss of a codon from all coding sequences in an organism or organelle, often as a result of directional mutation pressure, accompanied by loss of the tRNA that translates the codon. The codon reappears later by conversion of another codon and emergence of a tRNA that translates the reappeared codon with a different assignment. Changes in release factors also contribute to these revised assignments. We also discuss the use of UGA (stop) as a selenocysteine codon and the early history of the code.
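The reassignments listed in this abstract can be pictured as a small set of overrides applied to the standard codon table. The following is a minimal sketch, not taken from the cited work: the names `STANDARD`, `VERT_MITO`, and `translate` are hypothetical, and the variant table includes only the vertebrate mitochondrial changes the abstract names (AUA to Met, UGA to Trp, AGR to stop).

```python
from itertools import product

BASES = "UCAG"
# Standard code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" marks stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical variant table: only the vertebrate mitochondrial departures
# mentioned in the abstract are applied on top of the standard code.
VERT_MITO = dict(STANDARD, AUA="M", UGA="W", AGA="*", AGG="*")


def translate(rna, table):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Translating the same message with both tables shows how a reassigned stop codon changes the length and composition of the product.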
Article
Pheromone 3 mRNA of the ciliate Euplotes octocarinatus contains three in-frame UGA codons that are translated as cysteines. This was revealed from cDNA sequencing and from plasma desorption mass spectrometry of cleaved pheromone 3 in connection with pyridylethylation of the fragments. N-terminal sequence analysis of carboxymethylated protein confirmed this conclusion for the first of the three UGA codons. Besides UGA the common cysteine codons UGU and UGC are also used to encode cysteine. UAA functions as a termination codon. No UAG codon was found. In connection with results reported for other ciliates, this suggests that the role of the classic termination codons had not yet been established when the ciliates started to diverge from other eukaryotes.
Article
Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a "substitution score matrix" that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a "log-odds" matrix, with a specific target distribution for aligned pairs of amino acid residues. In the light of information theory, it is possible to express the scores of a substitution matrix in bits and to see that different matrices are better adapted to different purposes. The most widely used matrix for protein sequence comparison has been the PAM-250 matrix. It is argued that for database searches the PAM-120 matrix generally is more appropriate, while for comparing two specific proteins with suspected homology the PAM-200 matrix is indicated. Examples discussed include the lipocalins, human alpha 1 B-glycoprotein, the cystic fibrosis transmembrane conductance regulator and the globins.
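The log-odds reading of a substitution score can be made concrete with a short sketch. It assumes the usual form, score = log2(q_ij / (p_i p_j)), where q_ij is the target frequency of the aligned pair and p_i, p_j are background frequencies; the function names and the example frequencies are illustrative only and do not come from any published matrix.

```python
import math


def log_odds_bits(q_ij, p_i, p_j):
    """Score in bits for aligning residues i and j.

    q_ij: target frequency of the aligned pair (assumed given).
    p_i, p_j: background frequencies of the two residues.
    """
    return math.log2(q_ij / (p_i * p_j))


def target_frequency(score_bits, p_i, p_j):
    """Invert the score: the pair frequency a given score implicitly targets."""
    return p_i * p_j * 2.0 ** score_bits


# Illustrative numbers: a pair seen twice as often as chance scores about +1 bit.
example_score = log_odds_bits(0.02, 0.1, 0.1)
```

The inverse function illustrates the abstract's point that any matrix of scores implies a specific target distribution of aligned pairs.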
Article
Glutathione peroxidase (GSHPx) is an important selenium-containing enzyme which protects cells from peroxide damage and also has a role in leukotriene formation. We report the identification of a genomic recombinant as encoding the entire mouse GSHPx gene. Surprisingly, the selenocysteine in the active site of the enzyme is encoded by TGA: this has been confirmed by primer extension/dideoxy sequencing experiments using reticulocyte mRNA. The same site of transcription initiation is used in three tissues in which the GSHPx mRNA is expressed at high levels (erythroblast, liver and kidney). Like some other regulated 'house-keeping' genes, the GSHPx gene has Sp1 binding site consensus sequences but no 'ATA' and 'CAAT' consensus sequences upstream of the transcription initiation site. Moreover, there is a cluster of two Sp1 binding site consensus sequences and two SV40 core enhancer sequences in the 3' region of the gene, close to the previously mapped position of a DNase I-hypersensitive site found only in tissues expressing the GSHPx mRNA at high levels.
Article
The degeneracy rules of the genetic code, including the distribution of terminators, have been deduced through the minimization of mutational deterioration (MD). The MD of a given group of codons is divided into three parts: transitional, transversional and wobble. The averaged mutational deteriorations (AMD) of the various amino acids have been shown to follow the order of their degrees of irreplaceability.
Article
UGA is a nonsense or termination (opal) codon throughout prokaryotes and eukaryotes. However, mitochondria use not only UGG but also UGA as a tryptophan codon. Here, we show that UGA also codes for tryptophan in Mycoplasma capricolum, a wall-less bacterium having a genome only 20-25% the size of the Escherichia coli genome. This conclusion is based on the following evidence. First, the nucleotide sequence of the S3 and L16 ribosomal protein genes from M. capricolum includes UGA codons in the reading frames; they appear at positions corresponding to tryptophan in E. coli S3 and L16. Second, a tRNATrp gene and its product tRNA found in M. capricolum have the anticodon sequence 5' U-C-A 3', which can form a complementary base-pairing interaction with UGA.
Article
Analysis of an almost complete mammalian mitochondrial DNA sequence has identified 23 possible tRNA genes and we speculate here that these are sufficient to translate all the codons of the mitochondrial genetic code. This number is much smaller than the minimum of 31 required by the wobble hypothesis. For each of the eight genetic code boxes with four codons for one amino acid we find a single specific tRNA gene with T in the first (wobble) position of the anticodon. We suggest that these tRNAs with U in the wobble position can recognize all four codons in these genetic code boxes either by a "two out of three" base interaction or by U.N wobble.
Article
The most primitive code is assumed to be a GC code: GG coding for glycine, CC coding for proline, GC coding for alanine, CG coding for "arginine." The genetic code is assumed to have originated with the coupling of glycine to its anticodon CC mediated by a copper-montmorillonite. The polymerization of polyproline followed when it was coupled to its anticodon GG. In this case the aminoacyl-tRNA synthetase was a copper-montmorillonite. The first membrane is considered to be a beta sheet formed from polyglycine. As the code grew more complicated, the alternative hydrophobic-hydrophilic polypeptide (alanine-"arginine") was coded for by the alternating CG copolymer. This alternating polypeptide (ala-"arg") began to function as both a primitive membrane and as an aminoacyl-tRNA synthetase. The evolution of protein structure is tightly coupled to the evolution of the membrane. The alpha helix was evolved as lipids became part of the structure of biological membranes. The membrane finally became the fluid mosaic structure that is now universal.
Article
Two ideas have essentially been used to explain the origin of the genetic code: Crick's frozen accident and Woese's amino acid-codon specific chemical interaction. Whatever the origin and codon-amino acid correlation, it is difficult to imagine the sudden appearance of the genetic code in its present form of 64 codons coding for 20 amino acids without appealing to some evolutionary process. On the contrary, it is more reasonable to assume that it evolved from a much simpler initial state in which a few triplets were coding for each of a small number of amino acids. Analysis of genetic code through information theory and the metabolism of pyrimidine biosynthesis provide evidence that suggests that the genetic code could have begun in an RNA world with the two letters A and U grouped in eight triplets coding for seven amino acids and one stop signal. This code could have progressively evolved by making gradual use of letters G and C to end with 64 triplets coding for 20 amino acids and three stop signals. According to proposed evidence, DNA could have appeared after the four-letter structure was already achieved. In the newborn DNA world, T substituted U to get higher physicochemical and genetic stability.
Article
DNA sequences of the complete cytochrome b gene are shown to contain robust phylogenetic signal for the strepsirrhine primates (i.e., lemurs and lorises). The phylogeny derived from these data conforms to other molecular studies of strepsirrhine relationships despite the fact that uncorrected nucleotide distances are high for nearly all intrastrepsirrhine comparisons, with most in the 15%-20% range. Cytochrome b sequences support the hypothesis that Malagasy lemuriforms and Afro-Asian lorisiforms each comprise clades that share a sister-group relationship. A study (Adkins and Honeycutt 1994) of the cytochrome c oxidase subunit II (COII) gene placed one Malagasy primate (Daubentonia) at the base of the strepsirrhine clade, thereby suggesting a diphyletic Lemuriformes. The reanalysis of COII third-position transversions, either alone or in combination with cytochrome b third-position transversions, however, yields a tree that is congruent with phylogenetic hypotheses derived from cytochrome b and other genetic data sets.
Article
Before enzymes and templates: theory of surface metabolism.
Book
1. Origins of Life's Ingredients.- 1.1. The Setting.- 1.2. Prebiotic Syntheses (Stage I).- 1.3. Prebiotic Polymerization (Stage II).- 1.4. Summary.- 2. The Precellular, or Simple Interacting Systems, Level (Stage III).- 2.1. Synthetic Models of Protobionts.- 2.2. Autocatalysis.- 2.3. The Present Status of the Life Origins Problem-A Critical Assessment.- 3. The Genetic Mechanism: I. DNA, Nucleoids, and Chromatin.- 3.1. Introduction.- 3.2. The Focal Ingredients.- 3.3. The Key Macromolecule.- 3.4. Replication of DNA.- 3.5. Chromatin and the Chromosome.- 4. The Genetic Mechanism: II. the Cell's Employment of DNA.- 4.1. The Types of Ribonucleic Acid.- 4.2. Translation and Protein Synthesis.- 5. The Genetic Mechanism: III. Transcription, Processing, and an Analytical Synopsis.- 5.1. Transcription of the DNA Molecule.- 5.2. An Alternative Protein-Synthesizing System.- 5.3. An Annotated Synopsis-Summary and Analysis.- 6. Micromolecular Evolution-The Origin of the Genetic Code.- 6.1. Conceptual Approaches.- 6.2. Mathematical Concepts.- 6.3. Biochemical Approaches.- 6.4. A Biological Concept.- 7. The Transfer Ribonucleic Acids.- 7.1. The Characteristic Molecular Features of tRNAs.- 7.2. Codon-Anticodon Interactions.- 7.3. Summary of tRNA Structural Features.- 8. Reactive Sites and the Evolution of Transfer RNAs.- 8.1. Reactive Sites of tRNAs.- 8.2. Evolutionary Relations of tRNAs.- 8.3. Origin and Evolution of tRNA.- 9. The Genetic Mechanism of Viruses.- DNA Viruses.- 9.1. Double-Stranded DNA Viruses.- 9.2. Single-Stranded DNA Viruses.- RNA Viruses.- 9.3. Single-Stranded RNA Viruses.- 9.4. Double-Stranded RNA Viruses.- Proteinaceous Viruses.- Summary and Conclusions.- 10. The Origin of Early Life.- 10.1. A Preliminary Definition of Life.- 10.2. The Distinctive Characteristics of Viruses.- 10.3. Possible Steps in the Origins of Early Life.- References.
Article
The proposed model for a 'realistic hypercycle' is closely associated with the molecular organization of a primitive replication and translation apparatus. Hypercyclic organization offers selective stabilization and evolutive adaptation for all geno- and phenotypic constituents of the functionally linked ensemble. It originates in a molecular quasi-species and evolves by way of mutation and gene duplication to greater complexity. Its early structure appears to be reflected in: the assignment of codons to amino acids, in sequence homologies of tRNAs, in dual enzymic functions of replication and translation, and in the structural and functional organization of the genome of the prokaryotic cell.
Article
Pyrrolysine is a lysine derivative encoded by the UAG codon in methylamine methyltransferase genes of Methanosarcina barkeri. Near a methyltransferase gene cluster is the pylT gene, which encodes an unusual transfer RNA (tRNA) with a CUA anticodon. The adjacent pylS gene encodes a class II aminoacyl-tRNA synthetase that charges the pylT-derived tRNA with lysine but is not closely related to known lysyl-tRNA synthetases. Homologs of pylS and pylT are found in a Gram-positive bacterium. Charging a tRNACUA with lysine is a likely first step in translating UAG amber codons as pyrrolysine in certain methanogens. Our results indicate that pyrrolysine is the 22nd genetically encoded natural amino acid.
Article
Fundamental questions regarding the structure of the genetic code and origin of proteinous amino acids can be resolved through an understanding of the process by which the code evolved to accommodate increased variety of encoded amino acids.
Article
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
Article
The fact that proteins contain only alpha-amino acids and that protein structure is determined by 3'→5' linked ribonucleotides is postulated to be the result of the copolymerization of these molecules in the prebiotic environment. Ribonucleotides therefore represent partial degradation products and proteins represent a side reaction developing from copolymerization. The basic structural unit of copolymerization is a nucleotide substituted with an amino acid at the 2' position. Characteristics of modern amino and ribonucleic acid structure are all consistent with and necessary for this hypothesis. The characteristics and individual base assignments of the code also provide strong support for origin from the postulated copolymers. All characteristics of the code can be accounted for by this single hypothesis.
Article
Some of the basic problems presented by the rapid evolution of a universal genetic code can be resolved by a mechanism of co-evolution of the code and the amino acids it serves.
Article
The dependence of amino acid frequency on sequence length has been examined for the 20 natural amino acids using a set of 2275 protein sequences with little sequence identity. As expected, the frequency of cysteine increases dramatically for sequences shorter than 100 amino acids with a length-dependence that corresponds to an average of two Cys per sequence independent of length. Surprisingly dramatic changes were also observed for the frequencies of arginine, lysine, aspartic acid, and glutamic acid: Arg and Lys frequencies increase for short sequences whereas Asp and Glu frequencies decrease. These changes do not appear to be due to an over-abundance of DNA- and membrane-binding proteins in the database and may, therefore, be related to protein stability. Possible stabilizing mechanisms include increased hydrogen bonding by Arg and increased hydrophobic stabilization due to the amphiphilic character of Arg and Lys. These observations suggest that amino acid composition played an important role in the evolution of small proteins.
Article
We have calculated the average effect of changing a codon by a single base for all possible single-base changes in the genetic code and for changes in the first, second, and third codon positions separately. Such values were calculated for an amino acid's polar requirement, hydropathy, molecular volume, and isoelectric point. For each attribute the average effect of single-base changes was also calculated for a large number of randomly generated codes that retained the same level of redundancy as the natural code. Amino acids whose codons differed by a single base in the first and third codon positions were very similar with respect to polar requirement and hydropathy. The major differences between amino acids were specified by the second codon position. Codons with U in the second position are hydrophobic, whereas most codons with A in the second position are hydrophilic. This accounts for the observation of complementary hydropathy. Single-base changes in the natural code had a smaller average effect on polar requirement than all but 0.02% of random codes. This result is most easily explained by selection to minimize deleterious effects of translation errors during the early evolution of the code.
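The per-position calculation described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses only the Kyte-Doolittle hydropathy scale (one of several possible attributes), measures the effect as a mean absolute difference, and skips changes to or from stop codons. All names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" marks stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_effect_by_position(table, scale):
    """Mean absolute change in `scale` for single-base changes at each codon position."""
    effects = {}
    for pos in range(3):
        diffs = []
        for codon, aa in table.items():
            if aa == "*":
                continue  # assumption: changes from stop codons are ignored
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = table[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # assumption: changes to stop codons are ignored
                diffs.append(abs(scale[aa] - scale[mutant_aa]))
        effects[pos + 1] = sum(diffs) / len(diffs)
    return effects


effects = mean_effect_by_position(CODON_TABLE, HYDROPATHY)
```

Comparing the three values in `effects` against the same statistic for randomly shuffled codes would be the next step toward the kind of test the abstract describes; that shuffling step is left out here.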
Article
Differences in assignments from those in the universal genetic code occur in codes of mitochondria. In this report, the published sequences of the mitochondrial genes for COI and ND1 in a platyhelminth (Fasciola hepatica) are examined and it is concluded that AAA may be a codon for asparagine instead of lysine, whereas AAG is the sole codon for lysine in this species.
Article
Analysis of the nucleotide sequence of 1,400 transfer RNAs has revealed the imprint of a prototypic genetic code in position 3-4-5 of the acceptor stem. It appears only in the transfer RNAs for the primordial amino acids, i.e., those found by chemical condensation of a nitrogen-methane-water-ammonia mixture. The model for primitive protein synthesis as mentioned by Crick assumes a direct interaction between the amino acid and a prototypic adaptor oligonucleotide. This has hitherto appeared irreconcilable with the large spatial separation between the aminoacylation site and the anticodon in present day transfer RNAs. The observations reported here show how this paradox can be resolved by a process of duplication and cleavage of a prototypic adaptor.
Article
For the first time it is shown that each of the three codon bases has a general correlation with a different, predictable amino acid property, depending on position within the codon. In addition to the previously recognized link between the mid-base and the hydrophobic-hydrophilic spectrum, we show that, with the exception of G, the first base is generally invariant within a synthetic pathway. G-coded amino acids show a different order, being found only at the head of the synthetic pathways. The redundancy of the nature of the third base has a previously unrecognised relationship with molecular weight. The bases U and A (transversions) are associated with the most sharply defined or opposite states in both the first and second position, C somewhat less so or intermediate, and G neutral. The apparently systematic nature of these relationships has profound implications for the origin of the genetic code. It appears to be the remains of the first language of the cell, predating the tRNA/ribosome system, persisting with remarkably little change at a deeper level of organisation than the codon language.
Article
L-Arginine competitively inhibits the reaction of GTP with the Tetrahymena ribosomal self-splicing intron. In order to define this RNA binding site for arginine, Ki's have now been measured for numerous arginine-like competitive inhibitors. Detailed consideration of the Ki's suggests a tripartite binding model. The dissociation constants of the inhibitors can be consistently interpreted if the guanidino group of arginine binds in the GTP site by utilizing the H-bonds otherwise made to the N1-H and 2 NH2 of the guanine pyrimidine ring. The positive charge of the arginine guanidino group also enhances binding. A second requirement is for the precise length of the aliphatic arm connecting the guanidino with the alpha-carbon. The positive charge of the alpha-amino group is the third feature essential to effective inhibition. The negative carboxyl charge of arginine inhibits binding, and the substituents on the alpha-carbon are probably oriented, with the alpha-amino group near the phosphate backbone of the RNA. This orientation contributes strongly to the L stereoselectivity of the amino acid site on the RNA. When spaced optimally, net contribution to the free energy of binding is of the same order for the guanidino group and for the arginine alpha-carbon substituents, but the guanidino apparently contributes more to binding free energy. Taken together, these observations extend the previous binding model [Yarus, M. (1988) Science (Washington, D.C.) 240, 1751-1758]. The observed dependence of binding on universal characteristics of amino acids suggests that RNA binding sites with other amino acid specificities could exist.
Article
Nucleotide sequences carry genetic information of many different kinds, not just instructions for protein synthesis (triplet code). Several codes of nucleotide sequences are discussed including: (1) the translation framing code, responsible for correct triplet counting by the ribosome during protein synthesis; (2) the chromatin code, which provides instructions on appropriate placement of nucleosomes along the DNA molecules and their spatial arrangement; (3) a putative loop code for single-stranded RNA-protein interactions. The codes are degenerate and corresponding messages are not only interspersed but actually overlap, so that some nucleotides belong to several messages simultaneously. Tandemly repeated sequences frequently considered as functionless "junk" are found to be grouped into certain classes of repeat unit lengths. This indicates some functional involvement of these sequences. A hypothesis is formulated according to which the tandem repeats are given the role of weak enhancer-silencers that modulate, in a copy number-dependent way, the expression of proximal genes. Fast amplification and elimination of the repeats provides an attractive mechanism of species adaptation to a rapidly changing environment.
Article
The structure of the genetic code suggests that amino acid biosynthesis and hydrophobicity were important factors in shaping the genetic code, as the primitive code coevolved with new varieties of amino acids generated by the expanding pathways of biosynthesis. The current code is exceptionally stable. Deviant codes nonetheless have been observed in a number of mitochondrial and cellular genomes. Even the membership of encoded amino acids is undergoing expansion to include phosphoserine and selenocysteine. Experimental mutation of the code also has proven feasible, in a replacement of tryptophan by 4-fluorotryptophan as a component constituent of proteins. Such mutations, introducing novel varieties of encoded amino acids, will open up a new dimension in protein engineering and design.
Article
Excerpt In 1966, Fitch proposed the ambiguity reduction hypothesis of the origin of the genetic code, based on a view that the origin of life was a process in which local (pre)biological order arose from molecular chaos on the earth, driven by the asymmetric energy budget of the earth's atmosphere, a process in which subsets of random biochemical events gradually became the programmed rule of the system. This in turn led to a view, regarding the origin of the genetic code, that suggests that originally there may have been little specificity regarding which amino acids were charged to the various RNA acceptors that paired to the message. Under such conditions, no messenger RNA is likely to produce exactly the same protein twice. The advantages of obtaining a well-defined protein sequence, however, would have gradually reduced the variability in the assignment of amino acids to codons until the current genetic code emerged....
Article
Excerpt In the past three decades, a wide variety of experiments have been designed to simulate conditions on the primitive earth and to demonstrate how organic compounds that made up the first living organisms were synthesized. This paper reviews this work and indicates the status of such syntheses. There is too much material to review in detail, and the reader is directed to a number of more complete discussions (Miller and Orgel 1974; Kenyon and Steinman 1969; Lemmon 1970). Composition of the Primitive Atmosphere There is no agreement on the constituents of the primitive atmosphere. It is to be noted that there is no geological evidence concerning the conditions on the earth from 4.5 × 10⁹ to 3.8 × 10⁹ years, since no rocks older than 3.8 × 10⁹ years are known. Even the 3.8 × 10⁹-year-old Isua Rocks in Greenland are not sufficiently well preserved to infer details of the...
Article
Analysis of the interaction between mRNA codons and tRNA anticodons suggests a model for the evolution of the genetic code. Modification of the nucleic acid following the anticodon is at present essential in both eukaryotes and prokaryotes to ensure fidelity of translation of codons starting with A, and the amino acids which could be coded for before the evolution of the modifying enzymes can be deduced.
Article
This paper presents a method of constructing a scheme of the genetic code with the advantage over the traditional that the table of codons is organized by the principle of gradual complication of the chemical structure of the amino acid moving from one triplet to an adjacent one. A hypothetical scheme of the evolution of the biological code is proposed.
Article
Differences between mitochondrial codes and the universal code indicate that an evolutionary simplification has taken place, rather than a return to a more primitive code. However, these differences make it evident that the universal code is not the only code possible, and therefore earlier codes may have differed markedly from the present code. The present universal code is probably a "frozen accident." The change in CUN codons from leucine to threonine (Neurospora vs. yeast mitochondria) indicates that neutral or near-neutral changes occurred in the corresponding proteins when this code change took place, caused presumably by a mutation in a tRNA gene.
Article
Evolutionary history of tRNA is studied by comparative sequence analysis of two specified tRNA's at various phylogenetic levels and of tRNA families within four different species. Criteria are developed that allow 1) to distinguish between convergent and divergent evolution, 2) to determine the mechanism of divergence and 3) to estimate the degree of randomization of the variable parts of the sequences. The conclusion of these investigations is that tRNA's represent ancient molecules that existed in the form of a mutant distribution prior to their integration into genomes.
Article
The theory of self-reproductive molecular systems involves the consequence that translation must have started from a selected distribution of RNA molecules, that comprised GC-rich sequences of a length less than 100 nucleotides. This implies a joint function of messenger and adaptor, which both had to be recruited from the same mutant distribution. The reconstruction of tRNA precursors yields such a molecule showing some reverberation of a codon pattern GNC. These findings suggest that tRNA has been the earliest component of the translation machinery.
Article
The availability of specialized sequence databanks for Escherichia coli, Saccharomyces cerevisiae and Bacillus subtilis made it possible to build a set of 105 protein-coding genes that are homologous in these three species. An analysis of the triplets at both the nucleotide and amino acid level revealed that the codon bias of some amino acids is significantly higher at conserved rather than at non-conserved positions. Comparisons of homologous genes in E. coli and Salmonella typhimurium, and in S. cerevisiae and Drosophila melanogaster, led to the same conclusion. A special case was made for serine in E. coli, whose major codon is AGC for non-conserved and TCC for conserved residues. We interpret this observation as evidence that the primordial codons for serine were TCN, while codons AGY appeared later. This conclusion is substantiated by an analysis of the codon usage of catalytic serine residues in ancient, ubiquitous and essential proteins (ATP synthases and topoisomerases). It is shown that in these proteins the proportion of the catalytic serine residues coded by TCN is significantly higher than the one expected from the overall codon usage of serine residues.
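The serine comparison described above can be sketched as a simple tally of the two serine codon families (TCN and AGY) at conserved and non-conserved positions. This is a hedged illustration only: the function names and the input format of (codon, is_conserved) pairs are assumptions, and the abstract does not describe the authors' actual pipeline or statistical test.

```python
from collections import Counter


def serine_family(codon):
    """Return 'TCN' or 'AGY' for a serine codon, or None for any other codon."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3:
        return None
    if codon[:2] == "TC" and codon[2] in "TCAG":
        return "TCN"
    if codon in ("AGT", "AGC"):
        return "AGY"
    return None


def tcn_fraction(codons):
    """Fraction of serine codons that belong to the TCN family.

    Non-serine codons are ignored. Returns None if no serine codon is present.
    """
    counts = Counter(f for f in map(serine_family, codons) if f is not None)
    total = counts["TCN"] + counts["AGY"]
    return counts["TCN"] / total if total else None


def compare_positions(sites):
    """Compare TCN usage at conserved vs non-conserved positions.

    `sites` is a hypothetical input: an iterable of (codon, is_conserved) pairs,
    e.g. taken from aligned homologous genes.
    """
    sites = list(sites)
    conserved = [codon for codon, is_conserved in sites if is_conserved]
    variable = [codon for codon, is_conserved in sites if not is_conserved]
    return {
        "conserved": tcn_fraction(conserved),
        "non_conserved": tcn_fraction(variable),
    }
```

A higher TCN fraction at conserved positions than at non-conserved ones would be in line with the trend the abstract reports for E. coli; whether a difference is significant needs a statistical test, which this sketch omits.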
Article
A two-substrate Michaelis-Menten mechanism previously proposed for the self-replication of RNA-like oligomers is developed. Differential growth depends on the existence of two pairs of complementary monomers and leads to 2ⁿ groups of 2ⁿ components each (n is the oligomer size). As n increases the 2ⁿ groups tend to overlap with one another, and the efficiency of the process to increase the information content of the strands decreases. In a second stage we suppose that randomly synthesized peptides with one predominant amino acid interacted with the ribotides, increasing the growth rate of some of them, and at the same time had their mean life increased by interactions with other ribotides of the same kinetic group. Natural selection could have preserved a favourable codon-anticodon-amino acid correlation, the precursor of the modern genetic code.
Article
A diversification of the genetic code based on the number of codons available for the proteinous amino acids is established. Three groups of amino acids during evolution of the code are distinguished. On the basis of their chemical complexity those amino acids emerging later in a translation process are derived. Codon number and chemical complexity indicate that His, Phe, Tyr, Cys and either Lys or Asn were introduced in the second stage, whereas the number of codons alone gives evidence that Trp and Met were introduced in the third stage. The amino acids of stage 1 use purine-rich codons, while all the amino acids introduced in the second stage, in contrast, use pyrimidines in the third position of their codons. A low abundance of pyrimidines during early translation is derived. This assumption is supported by experiments on non-enzymatic replication and interactions of hairpin loops with a complementary strand. A back extrapolation concludes a high purine content of the first nucleic acids, which gradually decreased during their evolution. Amino acids independently available from prebiotic synthesis were thus correlated to purine-rich codons. Implications on the prebiotic replication are discussed also in the light of recent codon usage data.
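The codon-number criterion used in the abstract above can be reproduced from the standard genetic code. The sketch below groups the 20 amino acids by how many codons encode each one. It covers the codon-count criterion only: the chemical-complexity criterion and the assignment to stages are the authors' and are not modeled here. The identifier names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts():
    """Number of codons assigned to each amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def group_by_degeneracy():
    """Map codon number -> sorted list of amino acids with that many codons."""
    groups = {}
    for aa, n in codon_counts().items():
        groups.setdefault(n, []).append(aa)
    return {n: sorted(aas) for n, aas in sorted(groups.items())}


if __name__ == "__main__":
    for n, aas in group_by_degeneracy().items():
        print(n, " ".join(aas))
```

Met and Trp are the only amino acids with a single codon, which is the codon-number evidence the abstract cites for placing them in the third stage.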
Article
Recently, shifted periodicities 1 modulo 3 and 2 modulo 3 have been identified in protein (coding) genes of both prokaryotes and eukaryotes with autocorrelation functions analysing eight of 64 trinucleotides (Arquès et al., 1995). This observation suggests that the trinucleotides are associated with frames in protein genes. In order to verify this hypothesis, a distribution of the 64 trinucleotides AAA,...,TTT is studied in both gene populations by using a simple method based on the trinucleotide frequencies per frame. In protein genes, the trinucleotides can be read in three frames: the reading frame 0 established by the ATG start trinucleotide and frame 1 (resp. 2) which is the frame 0 shifted by 1 (resp. 2) nucleotide in the 5′–3′ direction. Then, the occurrence frequencies of the 64 trinucleotides are computed in the three frames. By classifying each of the 64 trinucleotides in its preferential occurrence frame, i.e. the frame associated with its highest frequency, three subsets of trinucleotides can