All the conserved detailed results of evolution stored in DNA must be read, transcribed, and translated via an RNA‐mediated process. This is required for the development and growth of each individual cell. Thus, all known living organisms fundamentally depend on these RNA‐mediated processes. In most cases, they are interconnected with other RNAs and their associated protein complexes and function in a strictly coordinated hierarchy of temporal and spatial steps (i.e., an RNA network). Clearly, all cellular life as we know it could not function without these key agents of DNA replication, namely rRNA, tRNA, and mRNA. Thus, any definition of life that lacks RNA functions and their networks misses the essential role of RNA agents that inherently regulate and coordinate (communicate to) cells, tissues, organs, and organisms. The precellular evolution of RNAs occurred at the core of the emergence of cellular life, and the question has remained how the precellular and cellular levels are interconnected historically and functionally. RNA networks and RNA communication can interconnect these levels. With the reemergence of virology in evolution, it became clear that communicating viruses and subviral infectious genetic parasites bridge these two levels by invading, integrating, coadapting, exapting, and recombining constituent parts in host genomes to meet cellular requirements for gene regulation and coordination. Therefore, a 21st century understanding of life is of an inherently social process based on communicating RNA networks, in which viruses and cells continuously interact.
RNA populations and their historical
Any definition of life that lacks RNA functions and
RNA networks misses these agents that inherently
regulate and coordinate cells, tissues, organs, and
organisms. Such definitions ignore the role of com-
munication and networks. The precellular evolu-
tion of RNAs at the core of the evolution of cellular
life is well documented, and the question remains
how both levels are interconnected historically and functionally.
RNAs bind to other RNAs, DNA, and proteins.11
In the early RNA world, proteins that bind to
RNA served to stabilize RNAs and thereby to
maintain their function. RNAs such as the ribosomal, transfer,
and messenger RNAs have long been known for
their crucial roles in cellular replication processes
through their helper functions in generating
proteins out of DNA, the genetic storage medium.
The evolutionary timeline of emergence ranges
from the (co-opted and exapted) two halves of
tRNAs, viroids, and RNA viruses, to the subunits
of ribosomal RNAs and messenger RNAs.12–14
Essential features of such RNA structures are self-
folding, catalysis, subunit assembly, association of
large stem-loop groups, cooperative evolution, and
surface proteinization.15 There is some evidence
for direct codon (RNA loop)-to-amino acid interactions.
This would indicate a pre-tRNA origin of a
preliminary genetic code functioning as an emerg-
ing network of RNA–amino acid interactions.16–18
Ribosomes are ribonucleoprotein complexes
responsible for protein synthesis in all forms of life.
They polymerize polypeptide chains templated by
the nucleotide sequences in messenger RNA and
mediated by transfer RNAs. Messenger RNA and
transfer RNAs move rapidly through the ribosome
while maintaining the translational reading frame.
This process is accompanied by large- and small-scale
conformational rearrangements in the ribosome,
mainly in its rRNA.19 Interestingly, ribosomes, as
essential agents of life, may have self-assembled in
an early RNA world, possibly out of prebiotic
matter.20,21
Noncoding RNAs can interact with and regu-
late each other through various molecular interac-
tions based on complementary base-pairing of their
nucleotides, generating a complex network includ-
ing different species of RNAs (e.g., snRNAs, snoR-
NAs, mRNAs, miRNAs, lncRNAs, and circRNAs)
and mobile genetic elements.22–25 Such a network
of RNA–RNA (also competitive) interactions per-
vades and modulates the physiological functioning
of canonical protein-coding pathways involved in
proliferation, differentiation, and even metastasis
in cancer. Interestingly, the noncoding RNAs in host
genomes are more conserved than the protein-coding
sequences.26 Does this indicate that the regulatory
functions are of greater importance than the
RNA interaction motifs: parasitizing,
networking, immunity, and identity
A main characteristic of RNA identities is that the
membership of RNA stem-loop groups can never
be fully specified because it can always be (1) par-
asitized by yet unknown parasites or (2) internally
reorganized and split into two from the original
population. The most important consequence of
this uncertainty is that it provides the inherent
capacity for novelty, that is, the precondition for
evolutionary innovation, such as new RNA groups
and increased complexity.27–29
To understand how RNA agents use interaction motifs
to form consortial biotic structures and share a
functional identity, we must look at the formation
of groups of RNA stem-loop structures.30
It has been found that single stem-loops interact
in a pure physicochemical mode without selective
forces, independently of whether they are derived
randomly or are artificially constructed.31,32 In con-
trast to this, if such single RNA stem-loops become
part of group formation, they transcend purely
physicochemical interaction patterns, and biologi-
cal selection emerges. As a result, we can find biolog-
ical identities capable of self/nonself-identification
and preclusion, immune-like functions, and
dynamically changing (adapting) membership/
identity roles.33 The primary emergence of biologi-
cal selection occurred here.
Single alterations in a base-pairing RNA stem
that lead to new loops or bulges may dynamically
alter not only this single loop or bulge but
may also change the whole group identity of which
this section of single-stranded RNA is part.34 This
means that any RNA group transcribed out of the
DNA storage medium may play a new, modu-
lar, unexpected role in cooperating or competing
RNAs, genetic regulations, counter-regulations, and
diseases or infection events. Such noncoding RNAs
most often derive from mobile genetic elements,
which are the major contributors to regulatory
ncRNAs and include endogenous retroviruses, their
defective derivatives, and other persistent genetic
parasites (not forgetting their degraded parts).35–40
Another highly interesting interaction motif of
RNAs is the RNA fragments that self-ligate into self-
replicating ribozymes, which spontaneously form
cooperative networks. It was found that the three-
membered RNA networks showed highly coopera-
tive growth dynamics. When such cooperative RNA
networks compete directly against selfish autocatalytic
cycles, the former grow faster. As a result,
it was shown that RNA cooperation outcompetes
RNA group behavior generated the origin of a
natural code
One of the key features of single-stranded
RNAs is their tendency to fold back on themselves
and form a double strand based on the complemen-
tary base pairing of the involved nucleotides.44–46
This provides the basis for the historical evolution
of the so-called stem-loops, a double-stranded RNA
stem with a single-stranded RNA loop (or bulge). In
contrast to the double-stranded RNA region, which
is not the primary source of complementary binding
to other RNAs, the single-stranded loops and bulges
are prone to such binding. They are the interaction
centers with other stem-loop structures.31 Interaction
is more likely when repetitive nucleotide sequences
cluster than when the sequence order is nonrepetitive.
Therefore, repetitive sequence order was favored by
biological selection for the growth of sequence space
in the early RNA world.47,48 In contrast, a
nonrepetitive nucleic acid code evolved, which is
used mainly to code for protein structures. This may
be consistent with the role of
proteins in an early RNA world: to stabilize RNA
structures and maintain their function. Later on,
proteins were exapted to build cells, tissues, organs,
and organisms. The tRNAs especially intercon-
nect(ed) the RNA world with the protein world.49–51
Interestingly, primary and secondary structure
analyses indicate a common ancestry for tRNAs and
parasitic RNAs such as ribosomal
The chemical binding of nucleotides able
to better bind repetitive and distributed loop
sequences can lead to an RNA that interacts with
larger RNA groups that have this repetition. This
can then allow for selection of those RNAs that
could participate in and build more extended
RNA interactions and identities. An RNA group
identity is characterized as a newly emerged social
property that can protect the identity and reject
the nonidentity of individual RNA agents, that is,
self/nonself-differentiation. This looks like a kind
of RNA-sensing or -monitoring property and can
explain RNA quasispecies preclusion.34,54
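The fold-back principle behind stem-loops, and the idea that single-stranded loops are the sites where RNAs contact one another, can be illustrated with a toy sketch. It is not a folding algorithm: the helper names, the default stem and loop lengths, and the restriction to perfect Watson–Crick pairs (no G–U wobble, no bulges, no free-energy model) are all assumptions made for illustration.

```python
# Toy sketch of stem-loop (hairpin) formation and loop-loop pairing.
# Assumptions: perfect Watson-Crick pairs only (no G-U wobble, no bulges,
# no energy model); helper names and default lengths are illustrative.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairs(a: str, b: str) -> bool:
    """Return True if bases a and b form a Watson-Crick pair."""
    return (a, b) in PAIRS


def find_hairpins(seq: str, stem_len: int = 4, min_loop: int = 3):
    """Yield (start, loop_start, loop_end, end) for each perfect hairpin.

    The 5' stem is seq[start:loop_start], the unpaired loop is
    seq[loop_start:loop_end], and the 3' stem is seq[loop_end:end].
    Base k of the 5' stem pairs with base k counted from the 3' end,
    i.e. the strand folds back on itself (antiparallel pairing).
    """
    n = len(seq)
    for start in range(n):
        loop_start = start + stem_len
        for loop_end in range(loop_start + min_loop, n - stem_len + 1):
            end = loop_end + stem_len
            if all(pairs(seq[start + k], seq[end - 1 - k]) for k in range(stem_len)):
                yield start, loop_start, loop_end, end


def loops_complementary(loop_a: str, loop_b: str, min_run: int = 3) -> bool:
    """Return True if some run of min_run bases in loop_a can pair
    antiparallel with a run in loop_b (a toy loop-loop contact check)."""
    for i in range(len(loop_a) - min_run + 1):
        for j in range(len(loop_b) - min_run + 1):
            if all(pairs(loop_a[i + k], loop_b[j + min_run - 1 - k]) for k in range(min_run)):
                return True
    return False


if __name__ == "__main__":
    rna = "GGGCAAAAGCCC"
    for s, ls, le, e in find_hairpins(rna):
        print(f"stem {rna[s:ls]} | loop {rna[ls:le]} | stem {rna[le:e]}")
    # prints: stem GGGC | loop AAAA | stem GCCC
    print(loops_complementary("GGA", "UCC"))  # True
    print(loops_complementary("AAA", "AAA"))  # False
```

With a shorter stem length, overlapping alternative hairpins are all reported (for the example sequence, `stem_len=3` yields two); choosing among such alternatives is what real folding tools do with thermodynamic models.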
But this social–chemical binding can add a
completely new and crucial communication
feature to the population of chemical molecules:
a semiotic interaction has started that is absent
from all other known molecular structures. A
semiotically determined code means that besides
the physicochemical boundaries, a sign-based
interaction (semiosis) now takes place with at least
the predominant interaction motifs (such as RNA
group identity).55 Natural semiotic codes normally
start out of social interactions, which require
agents and out of which emerge the complementary
functions of combinatorial rules (syntax), contex-
tual rules (pragmatics), and content-relevant rules
(semantics). This start of a natural semiotic code
gave rise to the genetic identities most relevant for
the evolution of viruses and, later on, cells.
The better-selected RNA groups then inher-
ently parasitize weaker groups or solitary stem-
loops, constantly producing new sequence space
by spontaneously generating binding-prone bulges
and loops. The repetitive structures form much
better interactions with (and parasitize) the com-
plementary repetitive structures than nonrepetitive
ones. The infectious lifestyle of later-derived viruses
emerged from such parasites. However, this inter-
actional modus is not what we might call life, but
any life depends on these social abilities and fea-
tures. Thus, an initial repetitive sequence gram-
mar emerges that characterizes the code with which
RNAs “talk” to each other, which means they com-
municate to generate group behavior.
The genetic identities of RNA stem-loop groups
(RNA networks), such as those from group I introns,
group II introns, viroids, RNA viruses, retrotrans-
posons, LTRs, non-LTRs, and subviral networks,
such as SINEs, LINEs, and Alu elements, have all
invaded and mostly persist in host genomes.56,57
They provide complex RNA-mediated networks. In
addition, mixed consortia of RNA and DNA virus-
derived parts (especially those encoding stem-loop
RNAs) also integrated into host genomes. The highly
dynamic RNA–protein networks, such as the ribosome,
editosome, and spliceosome, generate a large variety
of results and core functions out of DNA
content.30,58,59 These are all examples of complex
RNA group functions; and we may conclude here
that without socially interacting RNAs, there is no
effective (code-based) communication and no evo-
lution of viruses or cellular life.
Key features of RNA group evolution
Let us, for example, look at a single stem-loop RNA
within an RNA consortium that undergoes replica-
tion (perhaps for several rounds). Each replication
event (necessarily being low fidelity) produces its
own particular version of diversified progeny. Let
us say we get a new bulge in the stem. This bulge
then becomes available to provide a whole array of
possible outcomes (including counteracting ones).
It might:
• interact with the original template to either complement or inhibit it;
• provide an interaction point for other RNA progeny (including itself);
• provide a target site for cleavage or ligation;
• act in combination with other progeny to provide a more complex catalytic (ribozyme) function; or
• alter or provide a binding site to other participants, such as peptides (RNPs).
In other words, it now has a whole array of possible
(and multiple) usages (positive and negative). It is
important to note that the actual use will depend on
the context (circumstances and history) of the pop-
ulation it is in. Then, add to this all the other diverse
progeny from these few rounds of replication, all in
their own peculiar RNA region, and all providing
their own peculiar potential for use. Such a scenario
very rapidly becomes too complex to follow the fate
(fitness and usage) of any particular RNA.
But, if we now think in social terms, then the RNA
population (quasispecies consortia) can be consid-
ered as a “culture” that retains a common natural
code (with repetitive grammar) that provides a level
of group coherence (quasispecies selection).60 Each
individual diverse RNA then becomes like a poten-
tially new “word” for that “language” represented by
an agent within its population. The culture of a cer-
tain RNA population is then free to use it (possibly
even with multiple “meanings”) however it can or to
reject it. And if this culture changes by building new
bulges or loops, formerly rejected RNA stem-loops
may later on fit (be reused) into the assemblage.61
These RNA uses will also vary considerably with the
history of prior RNAs, as well as any possible inter-
actions with other quasispecies consortia. And these
uses can vary (and be lost) with time as the culture
adopts new meanings.
Whether or not viruses predate cellular life does
not alter the fact that some of the simplest RNA
viruses are built up entirely of RNA stem-loops.
Viruses and their relatives
Viruses have long been assumed to descend from
cells as escaped parasites since they are not able to
self-replicate, and therefore are not living entities.62
But as documented by several authors, this is not
correct. Numerous genes can be found in viruses
without any relation to cellular genes. This indi-
cates clearly that viruses must be older than cellular
life.6,63,64 At least with the discovery of mimiviruses
(and other giant viruses), it might be plausible that
giant viruses have cellular origins.65–70
We must constantly remember that viruses and
subviral infectious genetic parasites are the most
abundant biological agents on the planet.71,72 They
infect all cellular organisms and serve as key agents in
the generation of adaptive and innate immune
systems, which are essential for the survival of
cellular life forms since they are key for the
capacity for self/nonself-differentiation.73–75 The
invasion strategy of genomic parasites that results
in persistence within host genomes provides novel
evolutionary genetic identities not present prior
to the invasion.65,76 This is not error-dependent
evolution of novelty.
Importantly, fragmented parasitic genetic elements
can also provide an abundance of distributed
networks and be directly relevant in gene regulation in
all organisms.77,78 Best documented of these are the
persistent lifestyles of retroviruses as endogenous
retroviruses and their defective derivatives, such as
LTRs, SINEs, LINEs, and Alu elements.79
They share all the variants of genetic sequence
syntax from RNA to DNA, from single- to
double-stranded, and from repetitive to non-
repetitive sequence order, and only living entities
have these features. But the virus-related RNA-
dependent RNA polymerases, specific reverse
transcriptases, and specific RNases share the key
ability to transfer RNA into DNA and to insert RNA
interaction motifs into the DNA storage library of
the host.80–82 Other transmissible ribozymes such
as group II introns are related to spliceosomes and
retrotransposons, not to mention editosomes and
ribosomal subunits.83 Interestingly, the nucleolus,
which is the generator of ribosomal RNA subunits,
represents an RNA skeleton structure that is built
out of noncoding RNAs interconnected by Alu
elements, which are also remnants and fragmented
elements of former viral infections.84–86 Many, if not
all, viruses also produce noncoding RNAs,87 which
can provide a source of host and viral regulation.
One can now understand why the nucleolus has
prominent roles in development and aging, given
its noncoding RNA composition.88
Viruses and subviral infectious genetic parasites
remain conserved biological identities, dating to
before the origins of the extant domains of cel-
lular life.52,53,65 They are characterized by a great
ability to modulate genomic content and by perma-
nent evolutionary adaptability. They may recom-
bine not only with each other but also with those
that have single- or double-stranded RNA or DNA
They retain the ability to bridge RNA and DNA
living domains. In bridging the RNA- and protein-
based cellular worlds, they are also the main drivers
of the adaptability of cells by introducing RNA inter-
action motifs into cells.87,92,93 Viruses are also mas-
ters of co-opting cellular noncoding RNAs.94
Viruses and subviral infectious genetic parasites
thus represent biotic identities that are competent
to edit code.95 As editors, they can cooperate, build
communities, generate nucleotide sequences de
novo and insert/delete them into/from host genetic
content (without damaging host genetic content),
remain as mobile genetic elements (or similar
“defectives”), build counterbalancing “addiction
modules” (T/A and R/M),96,97 and determine host
genetic identities throughout all kingdoms of life,
including the virosphere. No other entities have such
an expansive capacity to edit and create code.
Viruses and subviral infectious genetic parasites
have been the main drivers of speciation in evolutionary
processes since the beginning of life. Most
importantly, they determine host genetic identities
throughout all the kingdoms of life through various
techniques, such as generating and installing addic-
tion modules persistently. All known species (even
highly related ones) have distinct patterns of colo-
nization by viruses and subviral infectious genetic
parasites. And all species have their peculiar pattern
of susceptibility to persistent viruses and subviral
infectious genetic parasites, and also their corre-
sponding acute versions. As an evolutionary rule,
colonization creates a distinct group identity that
initiates the process of speciation.27
The main behavioral motif that interconnects
communication of RNA groups, viruses,
and cells
The crucial question remains: How did evolution
connect the RNA world of interaction motifs with
cell-based life via infectious genetic parasites such
as viruses and their relatives? As noted above, we
can find an abundance of counterbalancing mod-
ules in cellular genomes, such as T/A, R/M, and
various similar counterbalancing agents, that are
inherently regulatory and are abundantly employed
for immune functions against genetic parasites
(although immune systems themselves are mostly
constituted by such parasitic agents).98–100
Cell-based organisms, be they prokaryotic or
eukaryotic, represent rare islands in a sea (or popu-
lation) of viruses, virus-like agents, and RNAs.101
This situation is a classic feature of life. This
means that cells are a rare resource for compet-
ing genetic parasites, persistent genome settlers, or
similar integration-driven behavioral motifs. The
competition in most cases is a rather complex one
between several virus populations that try to invade
host genomes.102 Whereas one virus population may
try to invade the cell, another population also tries
to invade or protect the host from the opposing
invading cloud. It is an arms race between the var-
ious virus populations and the immune system of
the host, which constantly tries to react and oppose
nonself-agents that themselves seek to adapt to the
host immune response.
Within these close, dense, and in many cases
fast and unexpected interactions that follow
agent colonization, the acquisition of identification
(self/nonself) agents also promotes the acquisition
and emergence of new addiction modules and
makes this an objective of the biological selection
processes. At the end of this process, the host has
changed its genetic identity through the integration of
several modules, including antagonizing agents; these
modules enrich its genetic identity but are absent
from cells of related species that were not
subjected to the same invasion process.103,104 For
example, enrichment with restriction/modification or
toxin/antitoxin modules, which protects the new
host, does not protect related organisms that lack
such modules, which means that related
species may now be killed by a toxin originating
from an infectious agent.105,106 In the long run of
evolution, this divides a species into two different
genetic identities, the noninfected (and nonprotected)
one and the infected (and protected) one.107
This may lead to the start of two different lineages.
Ann. N.Y. Acad. Sci. xxxx (2019) 1–16 ©2019 New York Academy of Sciences.
RNA-dependent RNA polymerases (RdRp) are
very ancient enzymes and crucial players for all
108 Together with reverse
transcriptase and various RNases they promote and
maintain RNA networks, and are essential agents
in key cellular processes, including generating DNA
sequences, a basic requirement for cellular life.109
The reverse transcriptases are also closely related
to those of the group II introns—but lacking the
intronic RNA sequence structure—as well as being
associated with a type of CRISPR/Cas system.110 The
CRISPR/Cas system is an effective prokaryotic adap-
tive immune system descended from mobile genetic
elements and most likely represents an exapted T/A
module, as presented above.111
Protein-based cells
Protein-based cells are metabolizing protein bod-
ies that have membranes, genetic information that
is inherited, and replication processes for repro-
duction. Their origin during evolution occurred
after the emergence of RNA networks and virus-like
structures.112,113 In contrast, an origin of cell-based
life without prior RNA networks and viruses
seems impossible, because all regulatory elements
in cells depend to varying degrees on RNA
functions.114 Cells represent identities in the liv-
ing world of either prokaryotic populations—with
their own mostly unicellular history and ecologi-
cal niche construction—or eukaryotes, with their
emergence, most often, of social cellular identities
from formerly free-living prokaryotes. Both of these
cellular identities are protein-based, genetically con-
served, and reproducible.115 This means the former
determinants of RNA or virus-based life are now
dominated by the identity of the protein-based cell.
Yet, these protein-based cells remain constantly
entangled with the virosphere: permanent infection
events and counter-defense actions take place, with
persistent genetic parasites constantly calibrating
immune functions against competing genetic
parasites, both acting with noncoding RNAs to adapt
or reject. Thus, protein-based cellular identities
remain strongly influenced by viruses
and subviral infectious genetic parasites.
Cells have long been assumed to be the basic
entity of all life. According to Woese’s categoriza-
tion of life into three domains, the most primitive
cells were archaea followed by bacteria and later on
the eukarya emerged. Previously, it was the main-
stream opinion that it took innumerable rounds of
mutations and natural selection for eukaryotes to
evolve. Such evolution results from small changes
in structure and function that accumulate over very
long periods of time. However, the serial endosym-
biotic theory of Lynn Margulis presents a different
evolutionary scenario.115 According to this narra-
tive, it is not mutations in unicellular prokaryotes
over very long time periods that led to the emergence
of eukaryotes. Instead, it was the formation
of cooperative symbiotic networks from formerly
free-living prokaryotes that resulted in unicellular
eukaryotes. This view requires new networks to be
formed via symbiosis.
The emergence of protein-based cellular life—
which constitutes what has been termed “life” exclu-
sively for many centuries—also provides a new
phenotype, with completely new protein interaction
in the early RNA world. Because protein bodies also
depend on RNA-mediated regulation, this vastly
extends the interactions between constituents of cells
and also diversifies how cells experience life–world
interactions. This had large consequences for niche
construction, adaptation processes, and competing
and cooperating populations. But all such
interactions remain linked to an inherently stable
genetic identity of the cellular organism. This
interacting cellular life world is a new level that
determines cellular life at least as much as genetic
information does, and genetic information now has
to be coordinated appropriately for the organism to survive.
Reproduction pathways include several steps and
substeps of transcription, translation, repair, and
immunity that are conserved in cellular evolu-
tion. But DNA alone does not specify cell fates.
The crucial evolutionary benefit derived from RNA
stem-loop networking and genetic parasites lies in the
posttranscriptional (epigenetic) modifications that
modulate genetic content into a dynamic and highly
adaptive behavioral modification of the stored
genetic information.116,117
Context-dependent natural genome editing
in cells
Although all cell types of any known organism
contain the same organism-specific genetic information,
this information is expressed according to their
spatiotemporal position and their contextual
(pragmatic) needs, such as developmental stages,
stress, damage repair, or changing environments.118
This means that depending on the context, such as
a cell being located within an organ of an organism,
the resulting expression leads to a new tissue-
and site-specific pattern at the right time and in
the right place to generate a specific cell type and
chromatin state.119,120
Chromatin marking enables a kind of identity
programming.121 This means that a specific cell
within an organism is able to obtain or even change
its identity through epigenetics, if developmental,
environmental, nutrition, or stress-related condi-
tions make it necessary. Because RNAs are mobile,
they can serve as signals throughout tissues, organs,
and even the whole organism. In this respect, it
is the imprinting of new experiences that leads to
variable meanings of genetic information, depend-
ing on the action of noncoding RNAs. With epige-
netic marking, life has an appropriate technique for
the emergence of memory and learning processes
for faster adaptation.122–125 Additionally, this epige-
netic memory tool plays important roles in trans-
generational inheritance, which also represents an
important evolutionary function.126–128
Both small RNAs and long noncoding RNAs
are able to direct chromatin changes through his-
tone modifications and DNA methylation. These
noncoding RNAs are able to direct chromatin-
modifying agents to specific targets. In small
RNA-driven silencing pathways, the regulatory
RNAs identify and mark potentially dangerous
nonself-elements for transcriptional silencing or
elimination.129,130 In other networks, homology
between the regulatory RNA and the target locus
marks the region as self and protects it from silenc-
ing or elimination. Interestingly, epigenetic marking
may originally have emerged to defend genomes
against genetic invaders.131,132
We can look at the shared behavioral motif of
how RNA can modify meaning out of a given
DNA sequence syntax and also how RNA affects
RNA editing.133,134 RNA editing is a co- or post-
transcriptional process that alters the RNA sequence
derived complementarily from the DNA from which
it was transcribed. Before RNA editing, the edito-
some (i.e., small nuclear RNAs complexed with a
variety of proteins) must be assembled in a strictly
coordinated process. RNA editing changes gene
sequences at the RNA level.135 The edited mRNA
specifies an amino acid sequence that differs
from the one expected from the genomic DNA
sequence as copied into the primary
transcript.136 RNA-editing alterations of such
transcribed RNA sequences occur by modification,
substitution, and insertion/deletion processes.137
Editing sites have to be identified individually to
differentiate the A, U, G, C to be edited from the
A, U, G, C that should not be edited.138 The
discriminating information can be found in the nucleotide
sequence surrounding a given site. This means
that context is crucial for identification. Thus, each
editing site carries its own identification context.139
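The point that identical bases are distinguished only by their surroundings can be made concrete with a toy sketch. C-to-U conversion is used here simply as one familiar kind of substitution editing, and the recognition context (a C is edited only when immediately preceded by a fixed 5' motif) is a hypothetical rule invented for illustration; it is not how any real editosome recognizes its sites, which involves far richer sequence and structural cues.

```python
# Toy sketch of context-dependent editing-site recognition.
# ASSUMPTION: the 5' motif below is a hypothetical recognition context,
# not a rule used by any real editing machinery.

UPSTREAM_MOTIF = "UUA"  # hypothetical recognition context


def edit_sites(rna: str, motif: str = UPSTREAM_MOTIF) -> list[int]:
    """Return positions of C residues immediately preceded by motif."""
    sites = []
    for i, base in enumerate(rna):
        # i >= len(motif) avoids negative slice indices at the 5' end
        if base == "C" and i >= len(motif) and rna[i - len(motif):i] == motif:
            sites.append(i)
    return sites


def apply_edits(rna: str, motif: str = UPSTREAM_MOTIF) -> str:
    """Edit C to U only at context-matched sites; all other Cs are untouched.

    Sites are found on the unedited sequence first, so an edit cannot
    create or destroy the context of another site.
    """
    chars = list(rna)
    for i in edit_sites(rna, motif):
        chars[i] = "U"
    return "".join(chars)


if __name__ == "__main__":
    rna = "GCAUUACGCAUUAC"
    print(edit_sites(rna))   # [6, 13]
    print(apply_edits(rna))  # GCAUUAUGCAUUAU
```

Of the four C residues in the example, only the two preceded by the (hypothetical) context are changed, which is the sense in which each site "carries its own identification context."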
RNA editing predates splicing (also in evolution),
and the two are temporally and functionally interconnected.
The editosome and spliceosome are important
interacting agents whose assembly depends
on editing.140–142 In ribosome and editosome
assembly, and also in spliceosome assembly,
construction of the needed ribonucleoprotein
introns and splice exons together. The spliceosomal
ribonucleoproteins are mainly small nuclear RNAs
complexed with at least 300 different proteins to
form five spliceosomal subunits.
Interestingly, the various steps in which the
subunits of the final spliceosome are produced are
counterbalanced by (formerly) competing genetic
parasites, all of them persistently integrated within
the host genome. After this final splicing proce-
dure of the mature spliceosome, the remaining
RNA products are actively discharged from the
spliceosome and the remaining ribonucleoprotein
particles are recycled for further catalytic processes
as multiuse modules. Depending on these regula-
tory events, the end product may vary concerning
the context dependency of the resulting regula-
tion process, which is highly sensitive to various
needs and circumstances. Consequently, spliceoso-
mal regulation differentiates the inclusion (splicing
enhancers) or exclusion (splicing silencers) of exons
in the final mRNA.140 Splicing regulation occurs by
competing cis-acting elements that precisely balance
regulatory proteins.143
Intron-exon genetic sequence construction
in cells
One of the most interesting aspects of RNA editing
is the complexity of genetic sequence construction,
with genes that code for proteins and noncoding
sections containing regulatory sequences, that is,
the division of genetic information into introns and
exons.144 Previously, introns that do not code for
proteins had been viewed as meaningless remnants
of former evolutionary stages remaining in the host
genome.145 Later on, it became clear that cellular
genomes represent a rather limited resource and
most likely do not contain senseless sequences.
With the resurgence of virology in host evolution,
especially the focus on genome-invading genetic
parasites with a persistent status (and their rela-
tives), introns more and more seemed to represent
former genetic parasites that are co-opted or exapted
for cellular genetic functions.146
To generate a coherent messenger RNA sequence
for use in a cellular replication process, the lining
up of the translationally relevant genetic sequences
coding for a protein requires that the introns be
removed and the remaining exons be ligated into
a protein-coding sequence. From our perspective, this
means that the identity of a protein-coding gene can
be found only if the introns, which represent rem-
nants of previous infection events by genetic para-
sites with repetitive sequence syntax, are removed.
Otherwise, the gene coding for a protein cannot be
produced coherently. This reminds us of the early
self/nonself-differentiation competence of repeti-
tive RNA, the forerunner of every evolutionarily
derived immune function and the core behavioral
motif needed to generate a genetic identity.147 The
repetitive sequences of introns must be removed to
get the nonrepetitive sequences to line up properly
in order to code for a protein. A very complex and
strict regulation must govern this division of func-
tions in every replication process throughout the
living world.148
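The cut-and-ligate step described above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that intron boundaries are already known and given as non-overlapping half-open coordinates; in cells the spliceosome itself recognizes the splice sites, guided by enhancers and silencers. The function names `splice` and `has_canonical_ends` and the example sequence are hypothetical; the GU...AG check reflects the canonical dinucleotides found at the ends of most spliceosomal introns.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove introns and ligate the remaining exons.

    `introns` holds hypothetical half-open [start, end) coordinates;
    they must not overlap and must lie within the sequence.
    """
    exons: List[str] = []
    prev = 0  # end of the last removed intron, i.e. start of the next exon
    for start, end in sorted(introns):
        if start < prev or end <= start or end > len(pre_mrna):
            raise ValueError(f"invalid or overlapping intron: ({start}, {end})")
        exons.append(pre_mrna[prev:start])  # keep the exon before this intron
        prev = end                          # skip over the intron
    exons.append(pre_mrna[prev:])           # keep the final exon
    return "".join(exons)                   # "ligation" modeled as concatenation


def has_canonical_ends(pre_mrna: str, start: int, end: int) -> bool:
    """Check for the GU...AG dinucleotides at the intron's 5' and 3' ends."""
    intron = pre_mrna[start:end]
    return len(intron) >= 4 and intron.startswith("GU") and intron.endswith("AG")


if __name__ == "__main__":
    # Hypothetical pre-mRNA: exon 1 + intron + exon 2
    pre = "AUGGCC" + "GUAAGUUUAG" + "GAAUAA"
    print(has_canonical_ends(pre, 6, 16))  # True
    print(splice(pre, [(6, 16)]))          # AUGGCCGAAUAA
```

The coordinates here stand in for decisions the cell makes itself; the sketch only shows that the exons line up into one continuous coding sequence once the introns are cut out.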
The repetitive sequences of the introns are also the preferred target sites through which genetic parasites invade the host genome, whereas the nonrepetitive sequences that code for proteins are normally not damaged or deformed by genome-invading agents.54,149–151 Intriguingly, intronic regions that do not code for proteins in many cases serve as a rich source of RNAs that are used for defense against transposable elements, indicating additional roles in
Communication is the key
The core paradigmatic assumptions of 20th century biology, including (1) the central dogma of molecular biology (DNA to RNA to proteins), (2) the notion that noncoding repetitive DNA is junk, and (3) the “one gene, one protein” hypothesis, have been falsified and no longer play important roles in the 21st century.155,156 A similar situation applies to the core concepts that sought to explain the genetic code, RNAs, viruses, and cellular life. These used mathematically determined concepts such as cybernetic systems theory, information theory, biophysics, and derivative concepts in an attempt to explain living processes mechanistically. Such approaches have uniformly failed to make essential progress in understanding the complexity of interactional (communicating) patterns.29,157 For example, the self-reproducing machine has been announced for more than half a century, but not a single self-replicating machine has yet been constructed, observed, or demonstrated.
Although cell–cell communication and numerous signaling processes within cells have been well known for over half a century, the explanations for this communication were subsumed under their corresponding physicochemical properties as well as under information-theoretical and formal systems-theoretical explanatory attempts. The word “system” can confuse many readers. If we define cells or even life as a system, as is currently accepted in science, we must keep in mind that we are not speaking of empirical observations but of a cybernetic, systems-theoretical perspective that exchanges “cells” for “systems” in a systems-theoretical (mathematical) construct. It is important to remember this and not to confuse such a theoretical explanatory model with existing life, or even to confuse the theoretical term with observed
This is not a minor disagreement over nuanced word usage. The key vocabulary used to describe the essential activities of cellular life (at least in the last six decades), including terms such as “genetic code,” “genetic information,” “cell–cell communication,” “nucleotide sequences,” “protein-coding sequences,” and “self/nonself-recognition,” comes from linguistics and communication science, not from chemistry, physics, or mathematics. But biology, like physics, chemistry, and mathematics, has yet to integrate the results of the discourse in the philosophy of science on communication and meaning, and therefore lacks clear expertise in understanding language and
Communication primarily is a kind of social
Communication designates social interaction. Socially interacting living agents need tools so that interaction may lead to the coordination and organization of common behaviors to reach goals. In contrast to physicochemical interactions on an abiotic planet, communicative interactions on biotic planets are mediated by signs. In cell-based organisms, such signs must be uttered through bodily expressed movements, phonetics, audiovisuality, tactility (e.g., vibrational), or semiochemical sensing.158
Communication, as rule-governed, sign-mediated interaction, differs from interactions in a purely physicochemical world without any biotic agents. Communicating living agents share a limited repertoire of signs that are used according to a limited number of rules, which must be followed to generate correct sign sequences that designate context-dependent content. Most interesting is the fact that such rules, although rather conservative, may be changed in extreme cases or if adaptation is necessary. Rule-following by living agents is rather flexible, in contrast to the natural laws that living agents strictly abide by. This means that communication is the essential tool for generating new signs, new sign sequences, new rules for sign use, and new content in response to unexpected contextual circumstances. Communicating living agents are able, in principle, to generate new communicative patterns for better or innovative adaptation to new and unforeseeable situations.159
Communication in all domains of life
If communication is the key to understanding life in the 21st century, it must be possible to identify communicative actions throughout all domains of life.160 Until the middle of the last century, language and communication were thought to be tools unique to humans. Today, we know many examples of nonhuman languages and communication processes.161–166 Therefore, the description of communication processes must in principle be valid for all organisms, from the simplest akaryote up to humans. The main characteristics of communication, namely its (1) social character, (2) dependence on signs, and (3) sign use according to three kinds of rules (combinatorial, context-specific, and content-coherent), are not compatible with the decades-long narrative suggested by information theory and systems theory, that is, the sender–receiver (coding–decoding) narrative, which was wrong in several respects.167 All empirical data clearly show that communication is not only an information-transfer process but also an interaction mediated by signs.
No natural code codes itself
As we know today, communicative properties and communicatively interacting agents cannot be sufficiently described by physicochemical analyses and mathematical theories of language and communication (such as systems theory and information theories) because:
• the hidden, context-dependent deep grammar that finally determines the meaning of superficial (visible) sign sequences cannot be identified, because it may vary according to its concrete contextual use, and
• the dependence of natural communication processes on socially interacting agents falls primarily within the expertise of sociology, not of physics, chemistry, or mathematics.
It is empirically evident that natural codes and natural languages do not code themselves or speak themselves. All scientific observations indicate that there must be agents that use and edit natural codes, such as the living agents that use and generate natural sign-based languages.159 Code or language characters do not build sequences by statistical ensemble mechanics, and the genetic code is not the result of a sequence of selected errors caused by mechanistic and thermodynamic conditions.60 The same is true for RNA group constructions and interactions as well as for viruses and their interaction motifs with cell-based life forms. The tools to construct appropriate concepts that can fully integrate present empirical data cannot be found in the theoretical approaches of the 20th century.168
If we look for a coherent explanation and understanding of life, we must add the communicative aspects of sign-mediated interactions in RNA networks, viruses, and cells, although the interactional patterns of these three levels are quite different historically and functionally:169
• In RNA communication, RNA stem-loops function both as catalytic drivers of reactions and as signs in sequences by themselves, representing nucleotide sequences, mainly in repetitive sequence order, that are stabilized by binding to protein structures (RNPs).
• Cellular life is organized and coordinated exclusively by sign-mediated interactions in the uptake, interpretation, and release of chemical substances. This means that cellular life constantly interacts during its typical life cycle through continuous communication processes using semiochemicals, that is, molecules that are produced, released, and taken up through signaling processes.
• Viruses and subviral infectious genetic parasites can do both of the above: (1) they generate, take up, and release semiochemicals for communication, and they mimic host communication in their interactions with the host organism, and (2) they function as catalytic units and sign sequences (viroids). They are the ideal intermediates between RNA networks and cells that have their own goals.
Disease: dysfunctional communication
Disease has long been assumed to result mostly from events dangerous to living organisms, caused by abiotic or biotic influences that disturb the organism or trigger a misleading mechanistic effect within it. In most cases, this is a rather complex event, because cascades of several distinct pathways are constantly involved, in collaboration with immune functions that try to restore health.
From the biocommunicative viewpoint, disease is the result of dysfunctional communication at one or more communication levels (i.e., RNA group communication, persistent virus communication, and cellular communication). As we have seen, these levels are rather complex and distinct from each other but are intertwined in all living organisms. RNA–RNA dysfunctions may be relevant to dysfunctional RNA–virus- and RNA–protein-based cell communication.
As we have seen, RNAs are key in nearly all functions of cell-based life. They have been integrated, combined, recombined, exapted, and co-opted by genetic parasites, such as viruses and their relatives (e.g., mobile genetic elements), which reached a persistent identity within host organisms. Persistent agents may range from fully functional viruses that have exapted old functional motifs for new traits (as in the syncytin genes or the neuronal Arc genes) to viral defectives, such as the great variety of transposable elements and other noncoding RNAs, all of which can be identified as repetitive sequence structures in either RNA or DNA.170–174
Without a doubt, transposable elements are key agents that change genome identities for various adaptive purposes in a rather flexible, module-based, and even social manner. But this inherent capability to change genome integrity and sequence order, with far-reaching consequences for the adaptability of the host organisms, must be restricted to ensure an acceptable level of genome integrity (e.g., genome immunity). Otherwise, mobile genetic elements may drive genome instability toward deregulation of well-conserved cellular traits and may cause disease in various ways, which may lead to a paradoxical arms race between evolutionary genome plasticity and stability.175
The crucial behavioral motif for such host integration events is the addiction module. As described above, addiction modules are known for counterbalancing (opposing) functions such as restriction/modification, toxin/antitoxin, insertion/deletion, amplification/silencing, endonuclease/ligase (antisense/sense), and various others, such as death/antideath programs.29 The counterbalancing properties evolved through socially selective forces and represent astonishingly complex interaction motifs. For example, a bacterial genome with 51 restriction motifs is counterbalanced by 51 modification
A great problem in understanding life in all its complexity is its communication: the intertwined regulation of the detailed steps and substeps in transcription, translation, immunity, repair, replication, etc. Based on the three levels of communication described above, disease can be understood as dysfunctional communication at one or more levels that leads to a deregulation of the involved counterbalancing agents. This may lead to dysfunction on one level that directs the intact communication of another level in a wrong direction (i.e., death or uncontrolled growth), as occurs in cancer. For example, the same pathways used by trophoblasts during embryo implantation in the uterus can be used for metastasis by cancer cells in which those molecular pathways have lost their stop signal.176–178 We must therefore keep in mind the multiple roles of deregulated RNAs in cancer to understand the consequences of dysregulated RNA communication.179–181 From this perspective, errors in replication events may initiate such dysfunctional communication because they damage or deform functional communication pathways.
Previously, the evolutionary beginning of life was defined by the emergence of self-replicating and metabolizing individual cells. Communication was regarded as simply a byproduct of physical interactions, and repetitive or parasitic DNA as simply junk. Now, we argue that life emerged from RNA networks capable of communication and self-identification via repetitive motifs. These networks underlie all extant life and are constituted by three levels of interactions (see below). All cells are genetically regulated by RNA networks that were initially transferred into cellular host genomes as repetitive motifs via genome-invading agents, such as viruses and subviral infectious genetic parasites (e.g., transposons). Genomic parasites persist when they introduce evolutionarily novel genetic identities that were absent before invasion (often via addiction modules). Formerly competing genetic parasites come together (via symbiosis) with host immune function to generate new regulatory tools that are themselves counterregulated. The three levels comprise interactions among (1) RNA groups, (2) viruses (both internal and external), and (3) cell-based organisms.
These networks not only constitute identity and regulate phenotype but are also inherently open to subsequent parasite invasion, which generates new, unpredictable, and thus noncomputable interaction profiles, such as de novo genetic sequences, new cooperation pathways, exaptation, and new traits from former parasite module-like parts that evolved for different purposes. Parasite-derived functions become degraded and available for reuse as regulatory modules and new behavioral motifs in all cellular life forms. If these conserved regulatory modules and their counterregulatory components fall out of balance, dysfunctional communication takes place and disease may be the consequence.