RESEARCH ARTICLE Open Access
HERV-W group evolutionary history in non-human primates: characterization of ERV-W orthologs in Catarrhini and related ERV groups in Platyrrhini
Nicole Grandi¹, Marta Cadeddu¹, Jonas Blomberg², Jens Mayer³ and Enzo Tramontano¹,⁴*
Abstract
Background: The genomes of all vertebrates harbor remnants of ancient retroviral infections that affected germ line cells during the last 100 million years. These sequences, named Endogenous Retroviruses (ERVs), have been transmitted to the offspring in a Mendelian fashion and remain relatively stable components of the host genome even long after their exogenous counterparts went extinct. Among human ERVs (HERVs), the HERV-W group is of particular interest for human physiology and pathology. A HERV-W provirus in locus 7q21.2 has been coopted during evolution to exert an essential role in the placenta, and the group's expression has been tentatively linked to Multiple Sclerosis and other diseases. Following up on a detailed analysis of 213 HERV-W insertions in the human genome, we now investigated the genomic spread of the ERV-W group within primate lineages.
Results: We analyzed HERV-W orthologous loci in the genome sequences of 12 non-human primate species belonging to Simiiformes (parvorders Catarrhini and Platyrrhini), Tarsiiformes and the most primitive Prosimians. Analysis of HERV-W orthologous loci in non-human Catarrhini primates revealed species-specific insertions in the genomes of Chimpanzee (3), Gorilla (4), Orangutan (6), Gibbon (2) and especially Rhesus Macaque (66). Such sequences were acquired in a retroviral fashion and, in the majority of cases, by L1-mediated formation of processed pseudogenes. There were also a number of LTR-LTR homologous recombination events that occurred subsequent to the separation of Catarrhini sub-lineages. Moreover, we retrieved 130 sequences in Marmoset and Squirrel Monkey (family Cebidae, Platyrrhini parvorder), identified as ERV1-1_CJa based on RepBase annotations, which appear closely related to the ERV-W group. Such sequences were also identified in Atelidae and Pitheciidae, representatives of the other Platyrrhini families. In contrast, no ERV-W-related sequences were found in genome sequence assemblies of Tarsiiformes and Prosimians.
Conclusions: Overall, our analysis now provides a detailed picture of the colonization of primate genomes by ERV-W sequences, revealing the exact dynamics of ERV-W locus formation as well as novel insights into the evolution and origin of the group.
Keywords: Comparative genomics, Endogenous retroviruses, HERV-W, Syncytin, ERV1-1, Viral evolution, Monkey and ape retroviruses
* Correspondence: tramon@unica.it
¹ Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy
⁴ Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
Full list of author information is available at the end of the article
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Grandi et al. BMC Evolutionary Biology (2018) 18:6
DOI 10.1186/s12862-018-1125-1
Background
The genomes of all vertebrates include a portion of sequences of viral origin, namely Endogenous Retroviruses (ERVs). ERVs belong to Class I Transposable Elements (TEs) and represent remnants of ancient infections that occurred mostly during the last 100 million years [1]. An essential step of the retroviral infectious cycle is reverse transcription, in which the single-stranded RNA genome is converted into a double-stranded DNA (provirus) that is then stably integrated in the host cell genome. In the case of ERVs, such integration occurred in germ line cells, allowing the subsequent Mendelian inheritance of proviral sequences by the offspring.
If not severely mutated, ERVs share with exogenous retroviruses a typical proviral structure, in which two Long Terminal Repeats (LTRs) flank the gag, pro, pol and env genes. Briefly, gag encodes matrix, capsid and nucleocapsid proteins; pro and pol encode the viral enzymes Protease, Reverse Transcriptase, Ribonuclease H and Integrase; and env encodes the envelope surface and transmembrane domains. The 5′ and 3′ LTRs are formed during reverse transcription from two unique regions (U3 and U5) separated by a repeated portion (R), and are identical at the time of formation.
ERVs, like all TEs, had a major role in vertebrate evolution [2] and greatly influenced host genomes by providing new functions and evolutionary stimuli, with relevant physiological effects on the host [3–5]. ERV colonization could cause genetic alterations, insertional mutagenesis, non-homologous recombination, rearrangements and disruption of genes [1,3,6–9]. ERV LTRs could provide additional regulatory elements, potentially acting as bidirectional promoters, enhancers, and alternative splice and polyadenylation sites [3,9–17]. Indeed, some ERV LTRs have been coopted as promoters/enhancers of nearby genes involved in embryonic development and pluripotency maintenance, which was likely beneficial to the host's evolution [18]. ERV proteins can likewise be coopted and greatly influence the host's biology and evolution, as in the case of the functional envelope proteins (Env) produced by an ERV-W and an ERV-FRD provirus, Syncytin-1 and Syncytin-2, respectively, which are involved in placental syncytiotrophoblast formation and in maternal immune tolerance to the fetus [19–22]. Notably, while Syncytin-1 is conserved only in the genomes of Hominoids and Syncytin-2 is shared by all primates except Tarsiiformes and Prosimians, functionally similar Env-derived proteins from different ERV groups have been domesticated independently on multiple occasions for the placental functions of several mammalian lineages, thus representing a process of convergent evolution [23,24]. ERV sequences devoid of functional Open Reading Frames (ORFs) can also modulate important host functions. For instance, the spread of ERVs during mammalian evolution dispersed a great number of interferon-inducible enhancers, thus shaping an effective regulatory network of innate immunity [25]. ERVs were also reported to influence host defence systems via RNA transcripts that can modulate host functions through a variety of mechanisms, including RNA interference and innate immune sensing of double-stranded RNA [10,26].
Besides these contributions to (human) physiology and evolution, pathological roles have also been suggested for HERVs [3–5], and their expression has been tentatively linked to a number of diseases [27–31], although no unequivocal cause-effect relationships have been established so far [3,31,32].
While ERVs and their exogenous counterparts currently co-exist in some vertebrates [33–35], the exogenous retroviruses that formed HERV insertions went extinct millions of years ago (MYa) and usually cannot be studied as replicating viruses. However, considerable information on ancestral retroviruses can be obtained from HERV sequences, which constitute approximately 8% of human genomic DNA [36], by comparative analysis of shared (orthologous) elements in non-human primate species. We recently analyzed the human genome sequence assembly GRCh37/hg19 with the RetroTector software [37], characterizing ~3200 near-complete HERV insertions [38]. The most ancient HERV groups formed before the separation of the parvorders Catarrhini (which includes the families Cercopithecidae, also known as Old World Monkeys, OWM, and Hominoidea) and Platyrrhini (also known as New World Monkeys, NWM), which occurred ~40 MYa [39,40] (Fig. 1); these groups are thus shared between primate species of both parvorders, as in the case of HERV-L and HERV-H [41]. Many other HERV groups, such as HERV-E and HERV-K(HML-2), are evolutionarily younger and were acquired after the evolutionary separation of Catarrhini and Platyrrhini.
Among HERVs, the HERV-W group has recently drawn considerable interest. In fact, as mentioned above, a HERV-W provirus in locus 7q21.2 (ERVWE1) retained an intact ORF producing a functional Env-like protein, Syncytin-1, coopted for placenta morphogenesis and homeostasis [19,20,42], while the group's overall expression has been investigated in various human pathological contexts [43].
In a previous study, we described in detail the distribution and genetic composition of 213 HERV-W loci in the human genome assembly GRCh37/hg19, providing a detailed overview of this HERV group [44]. Briefly, the HERV-W group comprises 65 proviruses, acquired through retroviral replication and having complete 5′ and 3′ LTRs; 135 processed pseudogenes, generated by L1 (Long Interspersed Nuclear Element 1) retrotransposition and accordingly having truncated LTRs [45,46]; and 13 unclassifiable elements lacking both LTRs. Phylogenetic and structural analysis classified HERV-W members into subgroups 1 and 2, which were acquired along the Catarrhini evolutionary lineage approximately between 40 and 20 MYa, after the lineage's separation from the parvorder Platyrrhini [44].
In order to further characterize the HERV-W group throughout primate evolution, we investigated HERV-W homologous sequences in primate species with publicly available genome assemblies (Fig. 1). In particular, we i) analyzed the non-human orthologs of HERV-W loci, as well as additional species-specific ERV-W sequences lacking orthologs in humans, in the genome sequences of 5 Catarrhini species, specifically Rhesus Macaque and 4 apes (Gibbon, Orangutan, Gorilla and Chimpanzee); ii) identified and characterized ERV elements closely related to ERV-W, named ERV1-1 in RepBase, in the Platyrrhini species Marmoset and Squirrel Monkey (family Cebidae); iii) found support for the presence of such ERV-W-related elements also in Spider Monkey and Red-bellied Titi, belonging to the other Platyrrhini families (Atelidae and Pitheciidae, respectively); and iv) corroborated the lack of elements closely related to (H)ERV-W in Tarsiiformes and in the more primitive Prosimians (including Lemuriformes and Lorisiformes).
Taken together, our findings provide a detailed description of the presence and distribution of ERV-W sequences within primate genomes, and further depict the group's evolutionary history in various primate lineages. Importantly, comparative analyses allowed us to characterize ERV-W species-specific insertions in Catarrhini primates, further detailing the group's dynamics while colonizing primate genomes. Moreover, hitherto unreported ERV elements closely related to ERV-W in Platyrrhini species provided important insights into putative ancestral sequence contributions.
Results
Comparative analysis of HERV-W orthologous loci in Catarrhini primate genome sequences
Subsequent to our recent characterization of 213 HERV-W loci in the human genome assembly hg19 [44], we now analyzed in detail the presence/absence of orthologous loci in the genome sequences of non-human primate species. For the sake of simplicity, we will refer to the respective non-human primate sequences as ERV-W, in order to distinguish them from the human (HERV-W) sequences. Making use of homologous genome regions and annotations provided by the UCSC Genome Browser [47–49], the presence of HERV-W-orthologous ERV-W loci was examined in the genome sequences of Rhesus Macaque, Gibbon, Orangutan, Gorilla and Chimpanzee by comparison of the respective ERV-W loci. To properly verify the presence of each ERV-W locus, we paid particular attention to the nucleotide sequence similarity of the genomic regions flanking its insertion site. Of note, since no comparable sequence information was available for the 2 HERV-W loci on chromosome Y, except for Chimpanzee, we considered the remaining 211 HERV-W loci in our investigation.
Our analysis generated an exhaustive comparative map of orthologous ERV-W insertions (Additional file 1: Table S1).
Fig. 1 Schematic of the phylogeny of the primate species analyzed in this study. Presence of (H)ERV-W or (H)ERV-W-related sequences in the respective species is indicated with a filled or an empty circle, respectively. Primate parvorders and infraorders are indicated in italics and bold, respectively. Estimated ages of divergence of evolutionary lineages in millions of years ago are given near tree nodes and were taken from Steiper and Young 2006 [39] (first number) and Perelman et al. 2011 [40] (second number). Species marked with an * lack assembled reference genome sequences.
Analysis of the genome sequences of the Hominoidea species Chimpanzee, Gorilla and Orangutan revealed an overall number of orthologous ERV-W loci comparable to the one observed in the human genome assembly GRCh37/hg19 [44], while the Gibbon and Rhesus genome sequences harbored a lower number of orthologous ERV-W loci (Table 1). The absence of an entire ERV-W insertion in some primates could be due to an integration having occurred after the separation of the respective evolutionary lineages, thus providing direct information on the time period of germ line colonization. It could, however, also depend on deletions, rearrangements, or errors in genome sequence assemblies or in their comparative analysis, particularly for primate species with less complete assemblies.
Based on our analysis, 123 out of 211 (H)ERV-W loci are shared by all analyzed Catarrhini primates, from human to Rhesus. However, when also considering the (H)ERV-W loci found in Rhesus and human but apparently absent in some intermediate primates (see above), the number of shared ERV-W loci increases to 131/211 (Fig. 2). These findings corroborate the view that the first and major wave of ERV-W loci formation occurred between 43 and 30 MYa, after the separation of Catarrhini and Platyrrhini but before the divergence of the Rhesus lineage from Hominoidea, in line with previously reported integration periods [44,46,50]. In addition to this first wave of formation, a total of 80 HERV-W loci lacked an ortholog in Rhesus but had orthologs only in subsequent Hominoidea species, suggesting the integration of about 66 novel HERV-W loci less than 30 MYa. In contrast, relatively few insertions (14) likely occurred later on, between 20 and 17 MYa (Fig. 2).
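The assignment logic behind Fig. 2 can be sketched as a small parsimony rule: an insertion shared by human and a given non-human species is taken to have formed in their common ancestor, and apparent absences in intermediate species (deletions, assembly gaps) are overlooked, as done when counting 131/211 shared loci. The Python sketch below is illustrative only; the function name `initial_formation` and the presence/absence calls for the three loci are hypothetical, and the species order simply follows the phylogeny in Fig. 1.

```python
from collections import Counter

# Non-human Catarrhini species ordered from the closest to the most distant
# relative of human, following the phylogeny in Fig. 1.
LINEAGE = ["Chimpanzee", "Gorilla", "Orangutan", "Gibbon", "Rhesus"]


def initial_formation(presence: dict) -> str:
    """Return the most distant species sharing an ortholog with human.

    The insertion is assigned to the common ancestor of human and that
    species; absences in intermediate species are ignored (treated as
    secondary losses or assembly gaps). Loci without any non-human
    ortholog are reported as "human-specific".
    """
    earliest = "human-specific"
    for species in LINEAGE:
        if presence.get(species, False):
            earliest = species
    return earliest


# Hypothetical presence/absence calls for three human loci (not real data).
loci = {
    "locus_A": {"Chimpanzee": True, "Gorilla": True, "Orangutan": True, "Gibbon": True, "Rhesus": True},
    # Apparently absent in Gorilla, but still assigned to the human-Rhesus ancestor.
    "locus_B": {"Chimpanzee": True, "Gorilla": False, "Orangutan": True, "Gibbon": True, "Rhesus": True},
    "locus_C": {"Chimpanzee": True, "Gorilla": True, "Orangutan": True, "Gibbon": False, "Rhesus": False},
}

tally = Counter(initial_formation(calls) for calls in loci.values())
print(tally)
```

Ignoring intermediate absences is the design choice that separates the 123 strictly shared loci from the 131 counted as shared; a stricter variant would require presence in every species along the lineage.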
Overall, the comparison of (H)ERV-W insertions in primate genome sequences indicated that the ERV-W group formed new loci throughout an extended period of time during evolution, due to both novel proviral integrations (n = 63) and L1-mediated processed pseudogene formations (n = 133). In particular, > 90% of ERV-W orthologs were acquired by Rhesus (n = 131) and Gibbon (n = 65) approximately between 43 and 20 MYa, showing in both species a 2:1 ratio of processed pseudogenes relative to proviruses. These data indicate that ERV-W processed pseudogene formation occurred over a considerable extent of time, also implying that ERV-W transcripts serving as templates for L1 retrotransposition must have been present in the germ line during that period. A pronounced decline in ERV-W locus formation was then observed in Orangutan, with 8 and 2 novel ERV-W processed pseudogenes and proviruses, respectively, as well as in Gorilla, harboring 3 novel ERV-W processed pseudogenes and no new proviral integration. This suggests that L1-mediated formation of ERV-W loci occurred over an extended period of time when compared to true provirus formation, and also to a significant extent in more recent primate lineages.
Of further note, no new formations of ERV-W loci were observed in Chimpanzee, while a HERV-W locus in chromosome 12q13.3 appeared to be human-specific because of an empty site in the orthologous genome regions of all non-human Catarrhini primates, possibly suggesting that a HERV-W insertion occurred less than 7 MYa [39,40]. However, the human-specificity of this sequence is uncertain due to the overall highly mutated structure of the locus and the lack of LTRs, which makes sequence divergence-based age estimation very unreliable [44].
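Since the 5′ and 3′ LTRs of a provirus are identical at integration, the differences accumulated between them are commonly used as a molecular clock; this is also why LTR-less loci such as the 12q13.3 element cannot be dated this way. A minimal sketch, assuming pre-aligned LTRs, an uncorrected p-distance (no Jukes-Cantor or similar multiple-hit correction) and an illustrative neutral rate of 0.002 substitutions/site/Myr that is not taken from this study; the function names are hypothetical.

```python
def ltr_divergence(ltr5: str, ltr3: str, gap: str = "-") -> float:
    """Proportion of differing sites between two pre-aligned LTRs.

    Columns with a gap in either LTR are skipped. This is a raw
    p-distance: no correction for multiple substitutions is applied.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differences / compared


def ltr_age_myr(divergence: float, rate: float = 0.002) -> float:
    """Approximate integration age in million years.

    Both LTRs accumulate substitutions independently after integration,
    hence the factor of 2. `rate` (substitutions/site/Myr) is an assumed
    illustrative value, not one estimated in this study.
    """
    return divergence / (2 * rate)


d = ltr_divergence("ACGTACGTAC", "ACGTACGTAA")  # toy LTRs: 1 difference over 10 sites
print(d, ltr_age_myr(d))
```

Real LTRs are a few hundred nucleotides long, and age estimates are sensitive to the chosen rate, so such values are usually reported as approximate ranges.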
Analysis of ERV-W sequences identified by sequence similarity searches in non-human Catarrhini identifies species-specific insertions
The above comparative analysis revealed an extended period of ERV-W loci formation throughout primate evolution, with 80 novel insertions since the separation of Gibbon and human lineages. Such an extended time period of ERV-W activity could thus likely also have resulted in species-specific insertions outside of the human evolutionary lineage, which would therefore lack an orthologous locus in humans. To identify potential species-specific ERV-W insertions, we performed UCSC Genome Browser BLAT searches of Catarrhini primate genome sequences using the assembled LTR17-HERV17-LTR17 RepBase HERV-W reference as a query. It is worth noting that this BLAT search approach identified a lower overall number of ERV-W loci in each non-human Catarrhini primate, suggesting that a proportion of ERV-W elements were not effectively detected (Table 2).
Table 1 Number of orthologous HERV-W loci in the analyzed Catarrhini primate genome sequences
Catarrhini species | Chimpanzee | Gorilla | Orangutan | Gibbon | Rhesus
ERV-W loci orthologous to the 211ᵃ human HERV-W elements | 205 | 207 | 205 | 190 | 131
ᵃ No reliable sequence information was available for two HERV-W loci in human chromosome Y (see text)
Fig. 2 Initial formation of 211 HERV-W loci based on respective orthologs in Catarrhini primate reference genomes. The number of orthologs to HERV-W loci initially formed in a particular primate species is given for each species for proviruses, L1-retrotransposed processed pseudogenes and undefined elements (see text for more details). For instance, 20 HERV-W loci were initially formed in the common ancestor of human and Gibbon, and 8 HERV-W processed pseudogenes were formed in the common ancestor of human and Orangutan. Note that the majority of HERV-W loci were initially formed in the common ancestor of human and Rhesus and are thus common to all Catarrhini genomes. Approximate time periods of last common ancestors of Catarrhini primate lineages are given in millions of years ago (MYa) below species names.
We further investigated these different outcomes by comparing the orthologous ERV-W loci retrieved by both approaches with the additional ones retrieved by BLAT searches only. Results showed that only 53–67% of the ERV-W orthologs (Table 1) were effectively identified by BLAT searches (Table 2, first row). The remaining BLAT-identified ERV-W loci could be explained by three corresponding states in the human GRCh37/hg19 assembly: i) presence of a HERV-W solitary LTR (Table 2, row 2); ii) presence of HERV-W-like elements with somewhat lower identity (~63% on average) to HERV17 (Table 2, row 3); iii) complete absence of a HERV-W or HERV-W-like sequence (Table 2, row 4). Each of these three conditions was analyzed separately, and the results are described below.
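The 53–67% range follows directly from the tables: each value is the Table 2 row 1 count divided by the corresponding Table 1 ortholog count. A short Python check using only numbers printed in Tables 1 and 2 (the variable and function names are ours) also sums row 4 to recover the 81/88 species-specific fraction discussed under iii.

```python
# Table 1: ERV-W orthologs of the 211 HERV-W loci found by comparative mapping.
ORTHOLOGS = {"Chimpanzee": 205, "Gorilla": 207, "Orangutan": 205, "Gibbon": 190, "Rhesus": 131}
# Table 2, row 1: how many of those orthologs the HERV17 BLAT search recovered.
BLAT_RECOVERED = {"Chimpanzee": 138, "Gorilla": 132, "Orangutan": 122, "Gibbon": 111, "Rhesus": 69}
# Table 2, row 4: loci lacking a human ortholog, as (total, species-specific).
NO_HUMAN_ORTHOLOG = {
    "Chimpanzee": (3, 3),
    "Gorilla": (5, 4),
    "Orangutan": (8, 6),
    "Gibbon": (4, 2),
    "Rhesus": (68, 66),
}


def recovery_percentages(found: dict, reference: dict) -> dict:
    """Percentage of reference orthologs recovered per species, rounded to integers."""
    return {sp: round(100 * found[sp] / reference[sp]) for sp in reference}


pct = recovery_percentages(BLAT_RECOVERED, ORTHOLOGS)
print(pct)
print("range:", min(pct.values()), "-", max(pct.values()), "%")

total = sum(t for t, _ in NO_HUMAN_ORTHOLOG.values())
specific = sum(s for _, s in NO_HUMAN_ORTHOLOG.values())
print(f"{specific}/{total} loci without a human ortholog are species-specific")
```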
i. ERV-W BLAT-identified sequences being solitary LTRs in the human genome. In 19 instances, a solitary LTR annotated as LTR17 was found at the orthologous position in the human reference genome (Table 2, row 2, and Additional file 1: Table S2), suggesting a previous event of LTR-LTR homologous recombination that eliminated the internal portion and one LTR [51] from ERV-W proviral integrations that had occurred either in Rhesus (14) or Gibbon (5), in line with the group's main period of germ line colonization. None of the solitary or corresponding proviral LTRs showed signatures of processed pseudogenes, which would likely have prevented homologous recombination due to the relatively short homologous sequences within the remaining 5′ and 3′ LTR portions.
ii. ERV-W BLAT-identified sequences corresponding to HERV-W-like elements with lower identity to HERV17. The lower-scoring HERV-W-like elements reported here (Table 2, row 3; Additional file 1: Table S3) had not been identified as HERV-W loci by BLAT searches in our recent characterization of the group in the human genome [44]. A closer inspection of RepeatMasker annotations revealed that, in the human genome sequence, some of these loci were composed of stretches of other Gammaretrovirus-like HERVs (γHERVs) (such as LTR12F flanking HERV9, HERV30 and HERVIP10FH internal portions), while they were annotated as HERV17 in non-human primates. Also, some of these loci were previously identified as non-canonical HERV9 elements, which are in fact closely related to the HERV-W group [38].
Interestingly, ~two-thirds of the HERV-W-like loci are present at orthologous positions ranging from Rhesus to human, and were thus likely formed during the main period of (H)ERV-W group activity. The remaining (H)ERV-W-like elements presumably entered primate genomes only in the evolutionarily separated lineages leading to Gibbon (3), Orangutan (2) and Gorilla (2), while no novel elements were observed for Chimpanzee, as already observed for HERV-W orthologous loci.
iii. ERV-W BLAT-identified sequences lacking an ortholog in humans. A number of ERV-W loci identified by BLAT searches in non-human Catarrhini species lacked orthologous loci in the human genome (Table 2, row 4 and Additional file 1: Table S4). In theory, such ERV-W loci may have formed in a species- or lineage-specific manner, and thus they could also provide information on the ERV-W group's time period(s) of activity (Fig. 1). Interestingly, the great majority (81/88) of these ERV-W sequences are actually species-specific insertions (Additional file 1: Table S4), also suggesting an extended period of ERV-W germ line colonization in primates. In particular, 77% of ERV-W insertions in Rhesus appeared to be absent in humans, with 66/68 elements still being species-specific when compared to more closely related non-human primate species. This further indicates that the main period of ERV-W activity ranges from 43 MYa to < 20 MYa, with a greater number of Rhesus-specific ERV-W acquisitions after the separation of its evolutionary lineage. The other non-human Catarrhini primates likewise showed some evidence for ERV-W insertions lacking a human ortholog: 4 loci in Gibbon (2 species-specific); 8 loci in Orangutan (6 species-specific); 5 loci in Gorilla (4 species-specific) and 3 in Chimpanzee (all species-specific) (Table 2, row 4 and Additional file 1: Table S4).
Table 2 Numbers and orthologs of ERV-W sequences identified by HERV17 BLAT searches in Catarrhini primate genome sequences
ERV-W sequence category | Chimpanzee | Gorilla | Orangutan | Gibbon | Rhesus
1) ERV-W loci with HERV-W orthologs in human genome | 138 (67%) | 132 (64%) | 122 (60%) | 111 (58%) | 69 (53%)
2) ERV-W loci corresponding to human solitary LTRs (n = 19) | 1 (17) | 1 (17) | 7 (10) | 10* (8) | 14* (0)
3) ERV-W loci present in human as non-canonical HERV-W(-like) elements | 29 | 27 | 24 | 21 | 20
4) ERV-W loci lacking an ortholog in human | 3 (3) | 5 (4) | 8 (6) | 4 (2) | 68 (66)
TOTAL | 171 | 165 | 160 | 145 | 168
1) Number of ERV-W elements with an orthologous locus among the 211 HERV-W loci; respective percentages are given in parentheses. Two HERV-W loci on human chromosome Y were excluded from the analysis (see text)
2) Numbers of ERV-W elements corresponding to a solitary LTR at the orthologous human position. Numbers in parentheses indicate the proviral insertions acquired in evolutionarily older primate species that were likewise a solitary LTR in the non-human primates analyzed. * indicates species with initial formations of proviruses that recombined to solitary LTRs in subsequent primate species: Gibbon (5) and Rhesus (14)
3) Numbers of ERV-W elements with an ortholog in the human reference genome sequence, yet being less similar to HERV-W. Those sequences were not identified as HERV-W elements in a previous analysis [68]
4) ERV-W loci absent at the orthologous human genome positions. Numbers in parentheses indicate the number of species-specific insertions
Also noteworthy, Rhesus and Gorilla showed 15 and 1 new proviruses, respectively, suggesting that ERV-W species-specific colonization was in part due to either intracellular provirus formation or re-infection, likely hinting at the sporadic acquisition of novel elements during the recent 10–5 MY. Similarly, species-specific formations of ERV-W processed pseudogenes in Rhesus (24), Orangutan (3), Gorilla (1) and Chimpanzee (1) further suggest that L1 retrotransposition of ERV-W transcripts has also been ongoing for considerable time periods outside of the human lineage, approximately between 43 and 5 MYa.
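The argument in item i (processed pseudogene LTRs offering too little homology for LTR-LTR recombination) can be illustrated with toy sequences. A provirus carries two complete U3-R-U5 LTRs, whereas a processed pseudogene is copied from a transcript running from R of the 5′ LTR to R of the 3′ LTR, so its ends retain only R-U5 and U3-R and share just the R region. In the sketch below, the U3, R and U5 strings are arbitrary placeholders rather than LTR17 sequence, the helper name is hypothetical, and the longest identical stretch is only a crude stand-in for the homology available to recombination.

```python
from difflib import SequenceMatcher

# Toy LTR segments (arbitrary, NOT real LTR17 sequence), chosen so that
# the only stretch shared between the pseudogene ends is R itself.
U3, R, U5 = "ATATATATAT", "GCCGGC", "TTTTTTTT"


def longest_shared_stretch(a: str, b: str) -> int:
    """Length of the longest identical stretch between two sequences."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


# Provirus: reverse transcription rebuilds two complete U3-R-U5 LTRs.
provirus_5, provirus_3 = U3 + R + U5, U3 + R + U5

# Processed pseudogene: reverse-transcribed from a transcript starting at R
# of the 5' LTR and ending at R of the 3' LTR, so only R-U5 and U3-R remain.
pseudogene_5, pseudogene_3 = R + U5, U3 + R

print("provirus LTR homology:", longest_shared_stretch(provirus_5, provirus_3))
print("pseudogene end homology:", longest_shared_stretch(pseudogene_5, pseudogene_3))
```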
Sequences closely related to HERV-W in Platyrrhini (New World Monkeys)
The UCSC Genome Browser BLAT search in the Platyrrhini species Marmoset (Callithrix jacchus) and Squirrel Monkey (Saimiri boliviensis) did not identify true ERV-W insertions, confirming that the group's spread has been limited to Catarrhini. However, our searches identified a group of apparently highly related sequences, indicated as ERV1-1_CJa-I and ERV1-1_CJa-LTR for the internal portion and the 5′ and 3′ LTRs, respectively, based on RepBase annotations. For the sake of brevity, these sequences will be referred to as ERV1-1.
Sequence similarities of HERV-W and ERV1-1 were further examined at the nucleotide level by comparison of representative proviral sequences (Fig. 3). The pairwise comparison between the ERV1-1 and HERV-W RepBase references, assembled as LTR-internal-LTR, revealed an overall 73% sequence identity between internal portions (~nt 2700 to 7750 in the HERV-W sequence), albeit a portion of the HERV-W env gene (~nt 7750 to 8570) appeared to be absent in the ERV1-1 reference (Fig. 3a). We further investigated ERV1-1 sequences by retrieving reasonably complete ERV1-1 proviruses, based on chromosome coordinates obtained from BLAT searches plus 5 kb of upstream and downstream flanking sequence each. The collected ERV1-1 sequences were analyzed for the presence of 5′ and 3′ LTRs, and the complete ERV1-1 proviruses from the Marmoset (59) and Squirrel Monkey (71) assemblies were used to generate two species-specific multiple alignments and, subsequently, two majority rule-based consensus sequences, named ERV1-1_CalJac_PVconsensus and ERV1-1_SaiBol_PVconsensus, respectively (Additional file 2). These consensus sequences were subjected to dot-plot comparison and pairwise alignment to assess differences between the ERV1-1 groups in the two NWM species (Fig. 3b). Since the two consensus sequences showed 98% overall identity, the ERV1-1_CalJac proviral consensus was chosen as representative of both species for subsequent analysis. Comparison of the ERV1-1_CalJac proviral consensus with the HERV-W RepBase reference (Fig. 3c) and with the HERV-W consensus previously built from the human proviral dataset [44] (Fig. 3d) revealed that the above-mentioned env portion was not represented in the ERV1-1 RepBase reference due to a larger deletion within the corresponding env gene region in the majority of ERV1-1 sequences, similar to a recurrent structural variant in approximately 80% of HERV-W elements [44]. Inclusion of this often-missing env portion in the ERV1-1_CalJac proviral consensus sequence thus confirmed the high sequence identity with HERV-W along the full-length env gene. Interestingly, the comparisons showed that ERV1-1 sequences also harbor a so-called "pre-gag" region between the 5′ LTR and the gag gene, as previously reported for HERV-W elements (~nt 800 to 2700 in LTR17-HERV17-LTR17) [44]. Of further note, in contrast to the proviral internal portion, ERV1-1 LTRs did not show pronounced similarity (overall 34%) to either the LTR17 RepBase sequence or the proviral HERV-W LTR consensus. Accordingly, BLAT searches did not identify sequences resembling LTR17 in the Marmoset or Squirrel Monkey genomes.
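The consensus and identity steps described above can be sketched in Python. The exact consensus settings used for ERV1-1_CalJac_PVconsensus are not specified here; the `ignore_gaps` option is an assumption that illustrates how a majority-rule consensus can retain a region (such as the often-missing env portion) that is deleted in most copies. Percent identity is computed over columns where both sequences have a residue, which is only one of several conventions. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

GAP = "-"


def majority_consensus(alignment, ignore_gaps=True):
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    With ignore_gaps=True the majority is taken among residues only, so a
    region deleted in most copies is still represented if at least one copy
    retains it. With ignore_gaps=False a gap majority drops the column.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(column)
        if ignore_gaps:
            counts.pop(GAP, None)
            if not counts:  # all-gap column
                continue
        symbol, _ = counts.most_common(1)[0]  # ties: first symbol seen wins
        if symbol != GAP:
            consensus.append(symbol)
    return "".join(consensus)


def percent_identity(seq1, seq2):
    """Identity over aligned columns where both sequences carry a residue."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != GAP and b != GAP]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


# Toy alignment: two copies carry a 2-nt deletion that a third copy retains.
toy_alignment = ["ACGT--TA", "ACGA--TA", "ACGTGGTA"]
print(majority_consensus(toy_alignment))
print(majority_consensus(toy_alignment, ignore_gaps=False))
print(percent_identity("ACGT-A", "ACGTTA"), percent_identity("ACGT", "ACGA"))
```

Comparing the two calls shows the trade-off: counting gaps reproduces what most copies look like (as the RepBase reference does for env), whereas ignoring gaps reconstructs the fuller ancestral-like sequence.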
Presence of ERV-W-related elements in other NWM families
To the best of our knowledge, unlike Marmoset and Squirrel Monkey, no genome sequence assemblies are available for the other two Platyrrhini families, Atelidae and Pitheciidae. We therefore performed BLAST searches of unassembled sequences of Spider Monkey (Ateles geoffroyi, Atelidae family) and Red-bellied Titi (Callicebus moloch, Pitheciidae family) available in the NCBI Trace Archive database, using both LTR17-HERV17-LTR17 and the ERV1-1_CalJac proviral consensus sequence as queries. Results confirmed the presence of ERV1-1 elements highly related to the ERV-W internal portion also in these two NWM families (data not shown).
Absence of elements closely related to ERV-W in Tarsiiformes and Prosimians
To complete our search for ERV-W-related sequences, we performed BLAT searches in UCSC Genome Browser assemblies of species representative of Tarsiiformes, i.e. Tarsier (Tarsius syrichta), and Prosimians, i.e. Bushbaby (Otolemur garnettii) and Mouse Lemur (Microcebus murinus). Only short matches with insignificant scores were retrieved, indicating the absence of ERV-W-related elements in these species (data not shown) and further confirming that their spread took place after the evolutionary separation of Prosimians and Simiiformes, which occurred ~60 MYa [39,40].
Analysis of retroviral puteins corroborates a close relationship of ERV1-1 with the ERV-W group
To further characterize sequence relationships between
the ERV1-1 and ERV-W groups, we analyzed their
phylogeny with respect to other endogenous and
exogenous Gammaretroviruses [38,52] at the amino
acid level, by using Maximum Likelihood (ML) analysis
of Gag, Pol and Env putative proteins (puteins) (Fig. 4).
To this aim, ERV1-1 ORFs were identified in Marmoset
and Squirrel Monkey ERV1-1 proviral consensus
sequences by the software RetroTector [37], which
reconstructs the amino acid sequences of the encoded
retroviral puteins. Subsequent ML analysis revealed that both
ERV1-1 Pol and Env puteins were most closely related
to the HERV-W puteins, further supporting a strong
evolutionary relationship between those groups. A less
pronounced relationship was found for the Gag putein
(Fig. 4), although the ERV1-1 Gag sequence was one of the
best hits identified by RetroTector for HERV-W Gag
recognition [38]. Interestingly, although
HERV-W appears to be the closest relative of ERV1-1,
ERV1-1 puteins also clustered with other
Gammaretrovirus-like families known to be related to
HERV-W, such as HERV9 and HERV30, possibly further
hinting at a common evolutionary origin of all
those (H)ERV groups.
Fig. 3 Pairwise nucleotide sequence comparisons depicting sequence similarities between HERV-W and ERV1-1 groups. Reference sequences and consensus
sequences were compared with each other as follows: (a) Callithrix jacchus ERV1-1 RepBase sequence and HERV-W RepBase sequence; (b) Callithrix jacchus and
Saimiri boliviensis ERV1-1 proviral consensus sequences as generated in this paper; (c) Callithrix jacchus ERV1-1 proviral consensus as generated in this paper and
HERV-W RepBase reference sequence; (d) Callithrix jacchus ERV1-1 proviral consensus sequence as generated in this paper and a HERV-W proviral consensus as
reported recently [44]. Sequence similarities in dot-plot comparisons are highlighted for sequence regions with at least 50% similarity along a 100-nucleotide
sequence window. Proviral gene and LTR regions are depicted.
Phylogeny and ERV1-1 sequence relationships with human solitary LTRs and HERV-W-like elements and with Catarrhini ERV-W elements without human orthologs
To further characterize the elements identified by BLAT
searches in the Catarrhini non-human primate genomes
and lacking orthologs in humans, the three subsets of
sequences mentioned above were compared with the
consensus sequences generated for HERV-W [44] and
ERV1-1 and with the reference sequences of other γHERVs
as provided by RepBase.
i) ERV-W BLAT-identified sequences being solitary
LTRs in human. ML phylogenetic analysis of human
solitary LTRs derived from ERV-W proviral insertions
in Rhesus (14) and Gibbon (5) confirmed that they
belong to the HERV-W group, clustering with the
LTR17 consensus (100% bootstrap support) and being
clearly separated from all other γHERV sequences
(Additional file 3).
ii) ERV-W BLAT-identified sequences corresponding to
HERV-W-like elements with lesser identity to
HERV17. ML phylogenetic analysis of HERV-W-like
elements with lower nucleotide identity to HERV17
revealed three clusters of sequences with reasonable
bootstrap support: cluster I, 96%; cluster II, 100%;
cluster III, 70% (Additional file 4). These three clusters
were separated from the other γHERVs with a 96%
bootstrap support and included 24 out of 29 HERV-
W-like sequences as well as HERV-W, HERV9,
HERV30 and ERV1-1 references. Cluster I elements
were most related to HERV-W, while cluster II
sequences showed closer relationships to HERV9 and
HERV30 (Additional file 4). Accordingly, RepeatMasker
analysis (Additional file 1: Table S3) confirmed that
cluster I members were annotated exclusively as
HERV17. Cluster II members included elements
structurally related to HERV17 and, in one case,
HERV30 in the internal portions, yet harboring
LTR12F (the HERV9 LTR in RepBase) as LTR type.
Cluster III members were only remotely
related to the other HERV-W-like elements
(bootstrap support = 52%), being clearly separated from
γHERVs (Additional file 4). RepeatMasker analysis,
however, identified these sequences either as LTR17
and HERV17 or as other related γHERVs (HERV9,
HERV30, HERVH, HERVIP10FH) (Additional file 1:
Table S3). Overall, these results demonstrated close
relationships, albeit of differing degrees, of HERV-W-like
elements with HERV-W, HERV9, HERV30 and
ERV1-1.
Fig. 4 Phylogenetic analysis of ERV1-1 Gag, Pol and Env puteins. ERV1-1 puteins, labeled with an empty triangle, were obtained by identification
and conceptual translation of Marmoset ERV1-1 proviral consensus sequence Open Reading Frames (see Methods). The other Gammaretroviral
putein sequences were retrieved from Vargiu et al. 2016 [38]. HERV-W puteins are marked with a filled triangle. The evolutionary relationships
were inferred by using the ML method based on the Poisson model. Phylogenies were tested by using the bootstrap method with 100 replicates
each: the obtained bootstrap values are reported near each node (bootstrap values lower than 30% are not shown). Length of branches indicates
the number of substitutions per site.
iii) ERV-W BLAT-identified sequences lacking an ortholog
in human. To verify the phylogeny of Catarrhini
ERV-W sequences lacking an ortholog in humans
with respect to the other γHERV sequences, Chimpanzee,
Gorilla, Orangutan and Gibbon full-length sequences
were analyzed separately (Fig. 5) from Rhesus
ERV-W sequences, whose phylogeny was inferred
from the pol gene only because of the relatively
high number of elements (Additional file 5).
All ERV-W sequences identified in Chimpanzee, Gorilla,
Orangutan and Gibbon grouped with the HERV-W
consensus (82% bootstrap) and were furthermore closely
related to ERV1-1 (78% bootstrap), followed by HERV9
and HERV30 (Fig. 5). A single sequence retrieved from
Gibbon (chr20:58,589,539-58,590,163) displayed a rather
weakly supported (64%) relationship with MER57.
The separately analyzed Rhesus ERV-W pol sequences
likewise formed a well-supported (90%) cluster with
HERV-W (Additional file 5). That phylogenetic clade
was also related to HERV9 and HERV30 with high
bootstrap support (99%). Six Rhesus ERV-W sequences
were instead located outside of that cluster. The actual
nature of those sequences was further examined by comparing
their full-length nucleotide sequences to a subset
of γHERV reference sequences by EMBOSS polydot
analysis (Additional file 6). In particular, a sequence related
to MER57 in the ML tree (chr4:4,004,556-4,011,519; 64%
bootstrap) shared longer stretches of identity exclusively
with the HERV-W consensus sequence. Four other
sequences that clustered together with 100% bootstrap
support and were furthermore weakly related to HERV-H
(31% bootstrap) displayed longer stretches of similarity
with both HERV-W and HERV-H consensus
sequences, possibly representing non-canonical mosaic
forms. Another sequence forming a separate branch in
the ML tree (chr1:51,551,811-51,557,699) did not show
appreciable similarity to any of the γHERV sequences
(Additional file 6).
Taken together, phylogenetic analysis confirmed the
ERV-W nature of almost all the retrieved ERV-W-like
elements without human orthologs in non-human Catarrhini
species, as well as the independent spread of
true (H)ERV-W elements in Rhesus later in primate
evolution.
Fig. 5 Phylogenetic analysis of Chimpanzee, Gorilla, Orangutan and Gibbon ERV-W nucleotide sequences lacking an ortholog in the human genome.
Gammaretrovirus-like HERV reference sequences were retrieved from RepBase. For the HERV-W group, both RepBase reference and the consensus
sequences generated previously from the proviral dataset [44] were included and marked with a filled square. The ERV1-1 reference sequence from
RepBase and the consensus generated from the proviral sequences dataset in this study are marked with an empty square. Evolutionary relationships
were inferred by using the ML method and the Kimura 2-parameter model. The resulting phylogeny was tested using the bootstrap method with 100
replicates: the obtained bootstrap values are reported near each node (bootstrap values lower than 30% are not shown). Length of branches indicates
the number of substitutions per site.
Discussion
Following up on our recent characterization of the
HERV-W group in the human genome [44], the present
work aimed to analyze the ERV-W elements integrated
in genome sequences of non-human primates, to provide
a complete and definitive depiction of the group's
spread during primate evolution. A number of studies,
in fact, suggested that the initial ERV-W colonization of
the primate germ line had occurred in Catarrhini after
their evolutionary separation from Platyrrhini, i.e. < 40
MYa, based on results from HERV-W pol PCR [53] and
Southern Blot [50] analysis of different non-human
primate samples, or from the nucleotide divergence
between HERV-W subfamilies [46]. Such results were
supported by the absence of ERV-W sequences in
Platyrrhini and Prosimians [46,50,53]. One of these works
additionally reported the presence of solitary ERV-W
LTRs in three Platyrrhini species based on PCR
results, suggesting that ERV-W LTR acquisition occurred
approximately 55 MYa [53]. Overall, the previously available
information suggests that the first (H)ERV-W
proviral acquisitions occurred around 25 MYa, and that the
group as a whole formed during a rather short period of
activity (~5 MY) [46,50,54]. This relatively low proliferation
rate had been attributed to the abundance of
HERV-W L1-processed pseudogenes, which are proliferation-incompetent
due to the lack of the 5′ LTR U3 and 3′ LTR U5
regions [46].
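The structural criterion above can be made concrete with a short Python sketch. A provirus carries U3-R-U5 in both LTRs, whereas a copy of a retroviral transcript spans R to R, so it lacks U3 in the 5′ LTR and U5 in the 3′ LTR. The `LocusStructure` record and `looks_processed` function are hypothetical; the LTR subregion annotation is assumed to come from an earlier step, and the poly(A) check (a typical hallmark of L1-mediated retrotransposition) is an added assumption rather than a criterion stated here.

```python
import re
from dataclasses import dataclass


@dataclass
class LocusStructure:
    """Hypothetical annotation of one ERV-W locus (from an earlier step)."""
    has_5ltr_u3: bool  # U3 present in the 5' LTR
    has_3ltr_u5: bool  # U5 present in the 3' LTR
    downstream: str    # genomic sequence just 3' of the element


def has_polya(downstream, min_run=10, window=50):
    """True if a run of >= min_run A's occurs within the first `window` nt."""
    head = downstream[:window].upper()
    return re.search("A{%d,}" % min_run, head) is not None


def looks_processed(locus):
    """Flag the R-to-R structure expected for a copy of a retroviral transcript.

    A provirus carries U3-R-U5 in both LTRs; a processed pseudogene lacks
    U3 in the 5' LTR and U5 in the 3' LTR. Requiring a poly(A) stretch is
    an extra assumption of this sketch.
    """
    return (not locus.has_5ltr_u3
            and not locus.has_3ltr_u5
            and has_polya(locus.downstream))
```

Such a flag only summarizes annotations that must first be obtained by aligning each locus to an LTR reference; it is not a substitute for inspecting the alignments.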
Our detailed analysis of primate genome sequences
definitively confirmed that the ERV-W group is
present exclusively in Catarrhini primates. However, our
searches for ERV-W orthologous loci in the genomes of
Hominoids and OWMs revealed that the group proliferated
for an extended time period, with novel locus formations
having occurred approximately between 43 and
20 MYa, in line with recent age estimates of single
HERV-W sequences [44]. Interestingly, a 2:1 ratio of
L1-mediated processed pseudogene formations relative to
true provirus formations was observed in Rhesus and
Gibbon, suggesting that a substantial formation of
ERV-W processed pseudogenes likewise occurred over
an extended time period. Similarly, ERV-W processed
pseudogenes were the main source of additional ERV-W
locus acquisitions also in Orangutan and Gorilla.
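Age ranges such as "between 43 and 20 MYa" follow from which species share an orthologous insertion: a locus present in a given lineage but absent from the next more distant one must have formed between the two splits. The Python sketch below illustrates that bracketing logic under stated assumptions: species are listed from the lineage closest to the reference genome to the most distant one, and divergence times are supplied by the user. The function name is hypothetical, and it is not the procedure used to produce the estimates in this study.

```python
def insertion_age_bracket(present, divergence_mya, order):
    """Bracket the formation time of a locus from orthologous presence/absence.

    `order`: species ordered from the lineage closest to the reference
    genome to the most distant one. `divergence_mya[sp]`: assumed time (MYa)
    at which `sp` split from the reference lineage. `present[sp]`: whether an
    orthologous copy is found in `sp`.

    Returns (min_age, max_age); max_age is None if every sampled species
    carries the element. Stops at the first absence, so later "presences"
    (e.g. assembly artefacts) are ignored in this simplified sketch.
    """
    min_age, max_age = 0.0, None
    for sp in order:
        if present[sp]:
            min_age = divergence_mya[sp]
        else:
            max_age = divergence_mya[sp]
            break
    return min_age, max_age
```

For example, with illustrative split times (not values from this study) of 30 MYa for Rhesus and 43 MYa for Marmoset, a locus shared by Chimpanzee, Gorilla, Orangutan, Gibbon and Rhesus but absent from Marmoset is bracketed as (30, 43). Deletions, LTR-LTR recombination and assembly gaps can all distort such patterns, which is why the sketch stops at the first absence instead of resolving conflicts.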
The spread of the ERV-W group within the parvorder
Catarrhini was further investigated through BLAT
searches at the UCSC Genome Browser, using the
RepBase HERV17 reference sequence as a query. That
strategy identified 4 ERV-W loci in Gibbon and 15 in
Rhesus that were likely formed between 43 and 20 MYa
and were present in the human genome only as solitary
LTRs. BLAT searches furthermore identified 29 ERV-W-
like elements with somewhat lower similarities to
HERV-W, mostly present in the Rhesus genome but also
found in Gibbon (3), Orangutan (2) and Gorilla (2).
In support of a longer time period of ERV-W locus
formations, some ERV-W loci in non-human primates
appeared to be species-specific and thus lack orthologs
in the other species. In particular, we identified 88 ERV-
W loci with corresponding empty sites in the human
genome, 81 of which could be interpreted as species-
specific insertions in respective primates: 66 in Rhesus, 2
in Gibbon, 6 in Orangutan, 4 in Gorilla, and 3 in Chimpanzee.
The latter further indicate lineage-specific formations
of ERV-W loci less than 10 MYa. Importantly,
species-specific acquisition of ERV-W loci occurred through
both full-length provirus formation and L1-mediated processed
pseudogene formation. It should be stressed here that
our analysis of (orthologous) ERV-W loci present (or ab-
sent) in the various available primate genome sequences
relies on comparative genomics data as provided by the
UCSC Genome Browser [49,55] and required a mini-
mum of 500 nt of upstream and downstream flanking
sequences to ensure analysis of truly homologous
genome regions. While some of the observed differences
in orthologous ERV-W loci may be due to errors in
genome sequence assemblies or (b)lastz alignments, it
appears that only a minority of loci are associated with,
or in close proximity to, for instance, gaps in assembled
genome sequences.
Taken together, our comparative analysis of primate
genome sequences thus provides a detailed evolutionary
history of (H)ERV-W sequences and their spread during
Catarrhini evolution, corroborating an extended period
of ERV-W locus formations that peaked between ~42
and 30 MYa and continued with sporadic, species- or
lineage-specific ERV-W locus formations until < 10
MYa, and confirming the absence of ERV-W sequences,
whether gene regions or LTRs, in NWMs.
Of note, our sequence searches identified an ERV
group closely related to ERV-W, named ERV1-1_CJa in
RepBase. Because of the lack of an established ERV
nomenclature, we designated those sequences as ERV1-1.
A total of 130 ERV1-1 loci were identified in the
genomes of Marmoset (59) and Squirrel Monkey (71),
and searches of unassembled genome sequence data
furthermore indicated the presence of ERV1-1 sequences
in species belonging to all three Platyrrhini families.
However, there was no evidence of ERV1-1 sequences
in Tarsiiformes and Prosimians, indicating that their
formation in the respective primate lineage occurred <
60 MYa based on estimated times of separation of the
respective lineages [39,40]. Also noteworthy, despite the
remarkable identity along the proviral internal portion,
none of the ERV1-1 loci showed signatures of processed
pseudogenes, as is the case for many (H)ERV-W loci
[45,46]. Since the two groups differ mainly in their LTRs,
this suggests a central role of LTRs in L1 recognition and
retrotransposition of (H)ERV-W transcripts. The established
close sequence relationships at both nucleotide and amino
acid level suggest that ERV1-1 and (H)ERV-W could derive
from a common ancestor, possibly also involving related
groups such as HERV9 and HERV30. As mentioned above,
such close sequence relationships do not apply to the
ERV1-1 LTRs, which appear very different in sequence from
(H)ERV-W LTRs. This is, however, in line with previous
observations in other ERV groups for which different
paths of evolution were taken by the proviral body and
the LTR sequences, resulting in different LTR subgroups
associated with otherwise monophyletic proviral bodies
(for instance, see [56,57]) and possibly leading to the
formation of retroviral chimeras [38].
Given the relatively recent availability of many
eukaryotic genome sequences and new bioinformatics
tools, the field of paleovirology is currently emerging. In
this context, ERVs may have a central role in understanding
the evolution of both host and virus. Regarding host
evolution, as described in the introduction, ERVs contributed
significantly to shaping the host genome by introducing
genetic variation and novel functions. In addition, as has
been shown for retroviruses undergoing an ongoing
process of endogenization, such as the Koala retrovirus
(KoRV) [34], the dynamics of retroviral/host evolution are
complex, suggesting that ERV acquisition may be an
effective defence strategy against exogenous pathogenic
viral infections [58]. Hence, the present study sets the
basis for further analysis of the role of specific ERV-W
sequences in primates, providing for the first time exhaustive
information regarding both the individual loci shared
by different species and those acquired exclusively by
one of them. Regarding viral evolution, our results showed
previously unreported similarities between ERV-W and ERV1-1
sequences, providing new insights into their evolution
and describing in greater detail the dynamics of the
ERV-W group's spread, regarding both ancient orthologous
insertions that are shared by primates including humans
and species-specific ERV-W loci formed in non-human
primates. Those findings, combined with a reasonably
accurate estimation of the times of integration
through a combined approach, now provide a complete
overview of the ERV-W group's colonization of primate
genomes and may help to better understand the complex
history of acquisition, cross-species transmission and
clade-specific amplification that has been shaped by host,
viral, and ecological factors [59].
Our study also leaves room for some speculation that
deserves further investigation. For example, the fact that
the majority of ERV-W sequences are shared by all the
analysed primates might suggest a relevant role of the
ERV-W group in the ancestral infected population, which
could possibly have been favoured in bottleneck events by
protection against deleterious exogenous infections,
as seen for KoRV, or by some other advantages. Similarly,
the species-specific insertions could instead have
provided, at least temporarily, specific advantages for those
species and lineages.
It is also worth mentioning that ERV-W locus acquisitions
in primates by L1-mediated processed pseudogene
formation during an extended period of time provided
novel insights into the mechanisms of the ERV-W
group's copy number increases and proliferation activity,
further highlighting the special link between ERV-W and
L1 [60,61]. The latter is still poorly understood,
especially regarding the specific molecular determinants that
limited L1 retroposition to (H)ERV-W transcripts
only, without involving any other (H)ERV groups [43].
Conclusions
The present study offers an exhaustive overview of the
germ line colonization by ERV-W during the evolution
of primates, revealing an unexpectedly long period
of activity and several species-specific activations, and
providing novel insights into the evolution of the group
and its close, previously unreported relationship with NWM
ERV1-1 elements. It also characterized the contribution of other
human TEs to the spread of ERV-W in primates, pointing
out that L1-mediated formation of ERV-W processed
pseudogenes was not a secondary phenomenon with
negative impact on the group's proliferation rate, but
instead a parallel and major mechanism of ERV-W locus
formation in all primate genomes.
Methods
Sequence collection
1) HERV-W orthologous ERV-W sequences in non-human Catarrhini primate genome sequences.
Identification and collection of ERV-W sequences
orthologous to previously characterized HERV-W loci
was done by using information provided by the UCSC
Genome Browser [49,55] for the following non-human
Catarrhini primate genome sequence assemblies:
Chimpanzee (Pan troglodytes, assembly Feb. 2011 -
CSAC 2.1.4/panTro4)
Gorilla (Gorilla gorilla gorilla, assembly May 2011 -
gorGor3.1/gorGor3)
Orangutan (Pongo pygmaeus abelii, assembly July
2007 - WUGSC 2.0.2/ponAbe2)
Gibbon (Nomascus leucogenys, assembly Oct. 2012 -
GGSC Nleu3.0/nomLeu3)
Rhesus (Macaca mulatta, assembly Oct. 2010 - BGI
CR_1.0/rheMac3)
Comparative analysis of presence or absence of HERV-W
orthologous loci involved examination of a minimum
of 500 nt of 5′ and 3′ flanking genomic sequence in
the respective primate genome sequences.
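The presence/absence test can be illustrated with a minimal Python sketch. It assumes (hypothetically) that both flanks of a HERV-W locus, at least 500 nt each, have already been aligned to the other primate genome and that the distance between their inner ends is known. The `FlankHit` record, the size tolerances and the category names are placeholders, and the rule set is an illustration rather than the procedure implemented through the UCSC comparative genomics data.

```python
from dataclasses import dataclass

MIN_FLANK = 500  # nt required on each side, as stated in the Methods


@dataclass
class FlankHit:
    """Hypothetical summary of the 5'/3' flank alignment in another genome."""
    flank5_len: int  # aligned length of the 5' flank (nt)
    flank3_len: int  # aligned length of the 3' flank (nt)
    gap: int         # nt between the inner ends of the two aligned flanks


def _near(value, target, tol):
    """True if value lies within a relative tolerance of target."""
    return abs(value - target) <= tol * target


def classify_ortholog(hit, element_len, ltr_len, tol=0.2, max_empty=50):
    """Assign an orthologous-site status from the inter-flank distance.

    Illustrative rules (assumptions, not the study's pipeline):
    gap ~ element length -> 'present'; gap ~ one LTR -> 'solitary LTR';
    gap <= max_empty nt (room for a target site duplication) -> 'empty site'.
    """
    if hit.flank5_len < MIN_FLANK or hit.flank3_len < MIN_FLANK:
        return "undetermined"  # flanks too short to trust homology
    if _near(hit.gap, element_len, tol):
        return "present"
    if _near(hit.gap, ltr_len, tol):
        return "solitary LTR"
    if hit.gap <= max_empty:
        return "empty site"
    return "other"
```

With placeholder lengths of 8,000 nt for the query element and 780 nt for one LTR, an inter-flank distance of 800 nt is reported as a solitary LTR and one of 10 nt as an empty site. Truncated elements such as processed pseudogenes would need their own expected length, so anything that fits no rule is left as "other" for manual inspection.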
2) ERV-W sequences in non-human Catarrhini primate genome sequences.
Additional ERV-W sequences in non-human Catarrhini
primate genome sequence assemblies were identified
by BLAT searches [62] at the UCSC Genome
Browser [49,55] using an assembled sequence consisting
of LTR17-HERV17-LTR17 as provided by RepBase [63]
as a query. The ERV-W loci thus identified were mapped
to the human genome to investigate the presence of
orthologous elements, by using UCSC Genome Browser
comparative genomics, as described above. Absence of a
HERV-W sequence in an orthologous genome region was
concluded when no HERV-W sequences were found by
BLAT searches using HERV17 and the ERV-W nucleotide
sequence from the respective orthologous primate genome
region (including flanking genomic regions) as queries.
3) ERV-W-related ERV1-1 sequences in Platyrrhini primate genome sequences.
ERV-W-related ERV1-1 elements were identified by a
UCSC Genome Browser BLAT search, using the
RepBase HERV17 sequence as a query, in the following
Platyrrhini primates (family Cebidae):
Marmoset (Callithrix jacchus, assembly March 2009 -
WUGSC 3.2/calJac3)
Squirrel Monkey (Saimiri boliviensis, assembly Oct.
2011 - Broad/saiBol1)
ERV1-1 sequences were retrieved including 500 nucleotides
of 5′ and 3′ flanking sequence, and proviruses with relatively
intact LTRs based on pairwise dot-plot comparison were
selected for subsequent analysis.
Since no assembled genome sequences were available
for representative members of the other two Platyrrhini
families, i.e. Atelidae and Pitheciidae, the presence of
ERV-W-related elements was assessed by BLAST
searches of unassembled genomic sequence data available
from the NCBI Trace Archive database (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?) for:
Spider Monkey (Ateles geoffroyi, Atelidae family),
Red-bellied Titi (Callicebus moloch, Pitheciidae
family)
using LTR17-HERV17-LTR17 and a majority-rule ERV1-1
consensus (Additional file 2) as queries.
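A majority-rule consensus keeps, at each column of a multiple alignment, the most frequent residue. The Python sketch below shows the idea for an existing gapped nucleotide alignment given as equal-length strings. The 50% threshold, the use of "N" for unresolved columns and the removal of gap-dominated columns are assumptions of this sketch, since the exact rules behind the consensus in Additional file 2 are not given here.

```python
from collections import Counter


def majority_consensus(aligned, min_fraction=0.5, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    At each column the most frequent non-gap base is kept if it occurs in
    more than `min_fraction` of the sequences that have a base there;
    otherwise 'N' is emitted. Columns where most sequences have a gap are
    dropped. Thresholds are illustrative assumptions.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        bases = [b.upper() for b in column if b != gap]
        if len(bases) <= len(column) / 2:
            continue  # gap-dominated column: skip
        base, count = Counter(bases).most_common(1)[0]
        out.append(base if count / len(bases) > min_fraction else "N")
    return "".join(out)
```

For instance, the three aligned sequences "ACGT-A", "ACGTTA" and "ACCT-A" yield "ACGTA": the third column is resolved as G by a 2:1 majority and the gap-dominated fifth column is dropped.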
4) ERV-W-like sequences in Tarsiiformes and Prosimian genome sequences.
ERV-W-like elements were searched by UCSC Genome
Browser BLAT using LTR17-HERV17-LTR17 and a
majority-rule ERV1-1 consensus (Additional file 2) as
queries in the following species:
Tarsier (Tarsius syrichta, Tarsiiformes, assembly Sep.
2013 - Tarsius_syrichta-2.0.1/tarSyr2)
Bushbaby (Otolemur garnettii, Lorisiformes,
assembly Mar. 2011 - Broad/otoGar3)
Mouse Lemur (Microcebus murinus, Lemuriformes,
assembly Jul. 2007 - Broad/micMur1)
Pairwise and multiple alignments of sequences
Multiple alignments of nucleotide and amino acid sequences
were generated by Geneious software, version 8.1.4 [64]
using MAFFT algorithms FFT-NS-i × 1000 or G-INS-I [65]
with default parameters. All multiple alignments were
visually inspected and, when necessary, manually optimized
before subsequent analysis. Pairwise sequence comparisons
were done using the Geneious dot-plot tool. Graphical
depictions of alignments were generated with Geneious and
further adapted manually.
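The dot-plot criterion used for Fig. 3 (at least 50% similarity along a 100-nucleotide window) can be approximated by an ungapped sliding-window comparison. The Python sketch below is a simplified, brute-force stand-in for the Geneious dot-plot tool, not its actual algorithm; it scales with the product of both sequence lengths and the window size, so it is meant only for short sequences or illustration.

```python
def dot_plot(seq_a, seq_b, window=100, min_identity=0.5, step=1):
    """Return (i, j) window start positions whose ungapped identity passes.

    Every window of seq_a is compared position by position with every
    window of seq_b; a dot is recorded when the fraction of identical
    positions is at least min_identity.
    """
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(0, len(a) - window + 1, step):
        wa = a[i:i + window]
        for j in range(0, len(b) - window + 1, step):
            wb = b[j:j + window]
            matches = sum(x == y for x, y in zip(wa, wb))
            if matches / window >= min_identity:
                hits.append((i, j))
    return hits
```

Plotting the returned (i, j) pairs as points reproduces the familiar diagonal runs of a dot-plot; increasing `step` thins the grid and speeds up comparisons of full-length proviruses.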
Phylogenetic analysis
1) Phylogenetic trees.
All phylogenetic trees were built from manually opti-
mized multiple alignments (see above) by MEGA soft-
ware, version 6 [66] using Maximum Likelihood (ML) or
Neighbor Joining (NJ) methods. For nucleotide align-
ments: ML trees were built using the Kimura 2-parameter
model, and phylogenies were tested by the bootstrap
method with 100 replicates. For amino acid alignments:
ML trees were built using the Poisson correction model,
and phylogenies were tested by the bootstrap method with
100 replicates; while NJ trees were built using the Poisson
correction model after applying pairwise deletion of miss-
ing sites, and phylogenies were tested by the bootstrap
method with 1000 replicates.
See figure legends and the manuscript text for further
details on specific phylogenetic analyses.
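The two substitution models named above correspond to standard closed-form distances: the Kimura 2-parameter distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, and the Poisson correction for amino acids, d = -ln(1 - p), where p is the proportion of differing sites. The Python sketch below computes both with pairwise deletion of gapped or ambiguous sites. It covers only the distance step (as used for NJ trees) and does not reproduce MEGA's ML tree search or bootstrap.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or non-ACGT character in either sequence are skipped
    (pairwise deletion). Raises ValueError (from math.log) if the
    sequences are too divergent for the correction (1 - 2P - Q <= 0).
    """
    transitions = transversions = valid = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        valid += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / valid, transversions / valid
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def poisson_distance(prot1, prot2, gap="-"):
    """Poisson-corrected amino acid distance d = -ln(1 - p), pairwise deletion."""
    pairs = [(x, y) for x, y in zip(prot1, prot2) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    return -math.log(1 - p)
```

For example, one transition over ten comparable sites gives P = 0.1 and Q = 0, so d = -1/2 ln(0.8), about 0.112 substitutions per site, slightly above the uncorrected 0.1.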
2) Calculation of pairwise nucleotide distances.
Pairwise divergence between aligned nucleotide sequences
was estimated by MEGA software, version 6 [66], using the
p-distance model and pairwise deletion after removal of
CpG dinucleotides.
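The uncorrected p-distance with pairwise deletion after CpG removal can be sketched in a few lines of Python. How CpG sites were defined in the original analysis is not specified here; the sketch assumes, as one possible convention, that any alignment column forming part of a CG dinucleotide in either sequence is excluded, and that CpGs are detected on adjacent alignment columns without skipping gaps.

```python
def cpg_mask(*aligned):
    """Alignment columns belonging to a 'CG' dinucleotide in any sequence.

    Assumption: CpG is detected on adjacent alignment columns (gaps are not
    skipped); the exact CpG-removal rule used in the study is not specified.
    """
    masked = set()
    for seq in aligned:
        s = seq.upper()
        for i in range(len(s) - 1):
            if s[i] == "C" and s[i + 1] == "G":
                masked.update((i, i + 1))
    return masked


def p_distance_no_cpg(seq1, seq2, gap="-"):
    """Uncorrected p-distance with pairwise deletion and CpG sites removed."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned")
    masked = cpg_mask(seq1, seq2)
    compared = diffs = 0
    for i, (x, y) in enumerate(zip(seq1.upper(), seq2.upper())):
        if i in masked or x == gap or y == gap:
            continue  # drop CpG columns and columns with a gap in either sequence
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared
```

Excluding CpG sites reduces the influence of their elevated mutation rate on divergence-based comparisons. In "ACGTTA" versus "ACGTAA", for example, the shared CG columns are masked and one difference over four remaining sites gives p = 0.25.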
ERV1-1 ORFs and prediction of putative proteins (puteins)
ERV1-1 Gag, Pol and Env amino acid sequences were
obtained from the bioinformatic reconstruction of
retroviral ORFs and puteins in a majority-rule ERV1-1
consensus (Additional file 2), by using i) the ReTe online
version (http://retrotector.neuro.uu.se/pub/queue.php?show=submit) [67] and
ii) the Geneious software [64] ORF finder and three-frame
translation functions.
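The ORF-finder and three-frame translation step can be approximated with the standard genetic code (NCBI translation table 1). The Python sketch below scans the three forward frames for stop-to-stop open stretches. The minimum length and the stop-to-stop definition (chosen because ancient ERV genes may have lost their start codons) are assumptions of this sketch, and it is not a substitute for RetroTector's putein reconstruction.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G.
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}


def translate(seq, frame=0):
    """Translate seq from offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    s = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(s[i:i + 3], "X")
                   for i in range(frame, len(s) - 2, 3))


def three_frame_orfs(seq, min_aa=100):
    """Stop-to-stop open reading frames in the three forward frames.

    Returns (frame, nt_start, peptide) tuples, where nt_start is the
    0-based position of the first codon of the open stretch.
    """
    orfs = []
    for frame in range(3):
        protein = translate(seq, frame)
        start = 0
        for k, aa in enumerate(protein + "*"):  # sentinel stop closes the last stretch
            if aa == "*":
                if k - start >= min_aa:
                    orfs.append((frame, frame + 3 * start, protein[start:k]))
                start = k + 1
    return orfs
```

Applied to a proviral consensus, the longest stretches per frame would be candidate gag, pol and env regions; reverse-strand frames are omitted because only a three-frame (forward) translation is described here.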
Additional files
Additional file 1: Table S1. HERV-W loci in the human reference genome
sequence and ERV-W orthologous sequences in non-human Catarrhini primates
reference genome sequences. Table S2: ERV-W loci in non-human Catarrhini
primate reference genome sequences with a solitary HERV-W LTR at the
orthologous human genome position. Table S3: ERV-W loci in non-human
Catarrhini primates corresponding to HERV-W-like elements with lesser
similarities to HERV17. Table S4: ERV-W loci in non-human Catarrhini primate
genome sequences lacking an ortholog in the human reference genome
sequence. (XLSX 85 kb)
Additional file 2: ERV1-1 consensus sequences in FASTA format.
(DOCX 187 kb)
Additional file 3: Phylogenetic analysis of human solitary LTRs orthologous
to ERV-W loci formed in Rhesus or Gibbon. Gammaretrovirus-like HERV LTR
sequences were retrieved from RepBase: the HERV-W group LTR17
reference sequence is marked with a filled square. The ERV1-1 LTR
consensus sequences were generated from the Marmoset (CalJac) and Squirrel
Monkey (SaiBol) proviral sequence datasets, and are marked with empty
squares. Evolutionary relationships were inferred by using the ML method
and the Kimura-2-parameter model. The resulting phylogeny was tested
using the bootstrap method with 100 replicates: the obtained bootstrap
values are reported near each node (bootstrap values lower than 30% are
not shown). Length of branches indicates the number of substitutions
per site. (PDF 15 kb)
Additional file 4: Phylogenetic analysis of HERV-W-like nucleotide sequences
orthologous to ERV-W loci identified in non-human primates by HERV17 BLAT
searches. Gammaretrovirus-like HERV reference sequences were retrieved
from RepBase. The HERV-W group RepBase LTR17-HERV17-LTR17 reference
sequence and the proviral HERV-W subgroup 1 and 2 consensus sequences
generated previously [44] are marked with a filled square. The ERV1-1
reference sequence from RepBase and the consensus generated from the
proviral sequence dataset in this study are marked with an empty square.
Evolutionary relationships were inferred by using the ML method and the
Kimura-2-parameter model. The resulting phylogeny was tested using the
bootstrap method with 100 replicates: bootstrap values are reported near
each node (bootstrap values lower than 30% are not shown). Length of
branches indicates the number of substitutions per site. (PDF 20 kb)
Additional file 5: Phylogenetic analysis of pol gene nucleotide sequence
from Rhesus ERV-W loci lacking an ortholog in the human reference
genome. Gammaretrovirus-like HERV pol gene reference sequences were
retrieved from RepBase. The HERV-W group pol sequences from RepBase
reference sequence and the proviral HERV-W consensus sequence
generated previously [44] are marked with a filled square. The ERV1-1 pol
sequences from the RepBase reference sequence and the consensus generated
from the ERV1-1 sequence dataset in this study are marked with an empty
square. Evolutionary relationships were inferred by using the ML method
and the Kimura-2-parameter model. The resulting phylogeny was tested
using the bootstrap method with 100 replicates: bootstrap values are
reported near each node (bootstrap values lower than 30% are not shown).
Length of branches indicates the number of substitutions per site. (PDF 23 kb)
Additional file 6: Polydot pairwise analyses of the 6 Rhesus ERV-W nucleotide
sequences lacking an ortholog in the human reference genome sequence and
showing unclear sequence relationships with other HERV sequences. Analyzed
consensus sequences marked * were generated in this study. Other
sequences were retrieved from RepBase. (PDF 174 kb)
Abbreviations
Env: Envelope; ERVs: Endogenous retroviruses; HERVs: Human endogenous
retroviruses; L1: Long interspersed nuclear element 1; LTRs: Long terminal
repeats; ML: Maximum likelihood; MYa: Million years ago; NWM: New
world monkeys; ORFs: Open reading frames; OWM: Old world monkeys;
TEs: Transposable elements; γHERVs: Gammaretrovirus-like HERVs
Acknowledgements
Not applicable
Funding
Nothing to declare.
Availability of data and materials
All the phylogenetic data presented in this study are deposited in TreeBASE
public repository and are available at the following URL: http://purl.org/phylo/treebase/phylows/study/TB2:S22051
Authors' contributions
NG performed the analysis and wrote the manuscript. MC participated in the
analysis. JB and JM participated in the analysis and in writing. ET conceived
and coordinated the study and participated in writing. All authors helped in
the editing and read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Author details
1 Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy.
2 Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
3 Institute of Human Genetics, University of Saarland, Homburg, Germany.
4 Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy.
Received: 4 April 2017 Accepted: 14 January 2018
References
1. Magiorkinis G, Blanco-Melo D, Belshaw R. The decline of human
endogenous retroviruses: extinction and survival. Retrovirology. 2015;12:8.
2. Biémont CA. Brief history of the status of transposable elements: from junk
DNA to major players in evolution. Genetics. 2010;186:108593.
3. Jern P, Coffin JM. Effects of retroviruses on host genome function. Annu Rev
Genet. 2008;42:70932.
4. Feschotte C, Gilbert C. Endogenous viruses: insights into viral evolution and
impact on host biology. Nat Rev Genet. 2012;13:28396.
5. Suntsova M, Garazha A, Ivanova A, Kaminsky D, Zhavoronkov A, Buzdin A.
Molecular functions of human endogenous retroviruses in health and
disease. Cell Mol Life Sci Springer Basel. 2015;72:365375.
6. Varmus HE. Form and function of retroviral proviruses. Science. 1982;216:81220.
7. Hedges DJ, Deininger PL. Inviting instability: transposable elements, double-
strand breaks, and the maintenance of genome integrity. Mutat Res. 2007;
616:4659.
8. Cordaux R, Batzer MA. The impact of retrotransposons on human genome
evolution. Nat Rev Genet. 2009;10:691703.
9. Kim H-S. Genomic impact, chromosomal distribution and transcriptional
regulation of HERV elements. Mol Cells. 2012;33:53944.
10. Meyer TJ, Rosenkrantz JL, Carbone L, Chavez SL. Endogenous retroviruses:
with us and against us. Front Chem. 2017;5:18.
11. Schön U, Seifarth W, Baust C, Hohenadl C, Erfle V, Leib-Mösch C. Cell
type-specific expression and promoter activity of human endogenous
retroviral long terminal repeats. Virology. 2001;279:28091.
12. Landry J-R, Rouhi A, Medstrand P, Mager DL. The Opitz syndrome gene
Mid1 is transcribed from a human endogenous retroviral promoter. Mol Biol
Evol. 2002;19:193442.
13. Sin H-SS, Huh J-WW, Kim D-SS, Kang DW, Min DS, Kim T-HH, et al.
Transcriptional control of the HERV-H LTR element of the GSDML gene in
human tissues and cancer cells. Arch Virol. 2006;151:198594.
14. Conley AB, Piriyapongsa J, Jordan IK. Retroviral promoters in the human
genome. Bioinformatics. 2008;24:15637.
15. Schön U, Diem O, Leitner L, Günzburg WH, Mager DL, Salmons B, et al.
Human endogenous retroviral long terminal repeat sequences as cell
type-specific promoters in retroviral vectors. J Virol. 2009;83:1264350.
16. Cohen CJ, Lock WM, Mager DL. Endogenous retroviral LTRs as promoters for
human genes: a critical assessment. Gene Elsevier BV. 2009;448:10514.
17. Yu HL, Zhao ZK, Zhu F. The role of human endogenous retroviral long
terminal repeat sequences in human cancer (review). Int J Mol Med.
2013;32:755–62.
18. Friedli M, Trono D. The developmental control of transposable
elements and the evolution of higher species. Annu Rev Cell Dev
Biol. 2015;31:429–51.
19. Blond JL, Lavillette D, Cheynet V, Bouton O, Oriol G, Chapel-Fernandes S, et al.
An envelope glycoprotein of the human endogenous retrovirus HERV-W
is expressed in the human placenta and fuses cells expressing the type D
mammalian retrovirus receptor. J Virol. 2000;74:3321–9.
Grandi et al. BMC Evolutionary Biology (2018) 18:6 Page 13 of 14
20. Mi S, Lee X, Li X, Veldman GM, Finnerty H, Racie L, et al. Syncytin is a
captive retroviral envelope protein involved in human placental morphogenesis.
Nature. 2000;403:785–9.
21. Blaise S, de Parseval N, Bénit L, Heidmann T. Genomewide screening
for fusogenic human endogenous retrovirus envelopes identifies
syncytin 2, a gene conserved on primate evolution. Proc Natl Acad Sci
U S A. 2003;100:13013–8.
22. Mangeney M, Renard M, Schlecht-Louf G, Bouallaga I, Heidmann O,
Letzelter C, et al. Placental syncytins: genetic disjunction between the
fusogenic and immunosuppressive activity of retroviral envelope proteins.
Proc Natl Acad Sci U S A. 2007;104:20534–9.
23. Lavialle C, Cornelis G, Dupressoir A, Esnault C, Heidmann O, Vernochet C, et
al. Paleovirology of syncytins, retroviral env genes exapted for a role in
placentation. Philos Trans R Soc Lond Ser B Biol Sci. 2013;368:20120507.
24. Imakawa K, Nakagawa S, Miyazawa T. Baton pass hypothesis: successive
incorporation of unconserved endogenous retroviral genes for placentation
during mammalian evolution. Genes Cells. 2015;20:771–88.
25. Chuong EB, Elde NC, Feschotte C. Regulatory evolution of innate
immunity through co-option of endogenous retroviruses. Science.
2016;351:1083–7.
26. Chiappinelli KB, Strissel PL, Desrichard A, et al. Inhibiting DNA
methylation causes an interferon response in cancer via dsRNA including
endogenous retroviruses. Cell. 2015;162:974–86.
27. Brodziak A, Zi E, Nowakowska-Zajdel E, Kokot T, Klakla K. The role of human
endogenous retroviruses in autoimmune diseases. Med Sci Monit. 2011;18:80–8.
28. Kassiotis G. Endogenous retroviruses and the development of cancer. J
Immunol. 2014;192:1343–9.
29. Zhao M, Wang Z, Yung S, Lu Q. Epigenetic dynamics in immunity and
autoimmunity. Int J Biochem Cell Biol. 2015;67:65–74.
30. Trela M, Nelson PN, Rylance PB. The role of molecular mimicry and other
factors in the association of human endogenous retroviruses and
autoimmunity. APMIS. 2016;124:88–104.
31. Christensen T. Human endogenous retroviruses in neurologic disease.
APMIS. 2016;124:116–26.
32. Voisset C, Weiss RA, Griffiths DJ. Human RNA "rumor" viruses: the search for
novel human retroviruses in chronic disease. Microbiol Mol Biol Rev.
2008;72:157–96.
33. Baillie GJ, van de Lagemaat LN, Baust C, Mager DL. Multiple groups of
endogenous betaretroviruses in mice, rats, and other mammals. J Virol.
2004;78:5784–98.
34. Tarlinton RE, Meers J, Young PR. Retroviral invasion of the koala genome.
Nature. 2006;442:79–81.
35. Arnaud F, Caporale M, Varela M, Biek R, Chessa B, Alberti A, et al. A paradigm
for virus-host coevolution: sequential counter-adaptations between
endogenous and exogenous retroviruses. PLoS Pathog. 2007;3:e170.
36. International Human Genome Sequencing Consortium. Finishing the
euchromatic sequence of the human genome. Nature. 2004;431:931–45.
37. Sperber G, Airola T, Jern P, Blomberg J. Automated recognition of retroviral
sequences in genomic data: RetroTector. Nucleic Acids Res. 2007;35:4964–76.
38. Vargiu L, Rodriguez-Tomé P, Sperber GO, Cadeddu M, Grandi N, Blikstad V,
et al. Classification and characterization of human endogenous retroviruses;
mosaic forms are common. Retrovirology. 2016;13:7.
39. Steiper ME, Young NM. Primate molecular divergence dates. Mol
Phylogenet Evol. 2006;41:384–94.
40. Perelman P, Johnson WE, Roos C, Seuánez HN, Horvath JE, Moreira MA, et al.
A molecular phylogeny of living primates. PLoS Genet. 2011;7:e1001342.
41. Bannert N, Kurth R. The evolutionary dynamics of human endogenous
retroviral families. Annu Rev Genomics Hum Genet. 2006;7:149–73.
42. Blond JL, Besème F, Duret L, Bouton O, Bedin F, Perron H, et al. Molecular
characterization and placental expression of HERV-W, a new human
endogenous retrovirus family. J Virol. 1999;73:1175–85.
43. Grandi N, Tramontano E. Type W human endogenous retrovirus (HERV-W)
integrations and their mobilization by L1 machinery: contribution to the
human transcriptome and impact on the host physiopathology. Viruses.
2017;9.
44. Grandi N, Cadeddu M, Blomberg J, Tramontano E. Contribution of type
W human endogenous retrovirus to the human genome:
characterization of HERV-W proviral insertions and processed
pseudogenes. Retrovirology. 2016;13:1–25.
45. Pavlícek A, Paces J, Elleder D. Processed pseudogenes of human
endogenous retroviruses generated by LINEs: their integration, stability, and
distribution. Genome Res. 2002;12:391–9.
46. Costas J. Characterization of the intragenomic spread of the human
endogenous retrovirus family HERV-W. Mol Biol Evol. 2002;19:526–33.
47. Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron:
duplication, deletion, and rearrangement in the mouse and human
genomes. Proc Natl Acad Sci U S A. 2003;100:11484–9.
48. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, et al.
Human-mouse alignments with BLASTZ. Genome Res. 2003;13:103–7.
49. Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, et al.
The UCSC genome browser database: 2014 update. Nucleic Acids
Res. 2014;42:D764–70.
50. Voisset C, Blancher A, Perron H, Mandrand B, Mallet F, Paranhos-Baccalà
G. Phylogeny of a novel family of human endogenous retrovirus
sequences, HERV-W, in humans and other primates. AIDS Res Hum
Retrovir. 1999;15:1529–33.
51. Mager DL, Goodchild NL. Homologous recombination between the LTRs of
a human retrovirus-like element causes a 5-kb deletion in two siblings. Am
J Hum Genet. 1989;45:848–54.
52. Blikstad V, Benachenhou F, Sperber GO, Blomberg J. Evolution of human
endogenous retroviral sequences: a conceptual account. Cell Mol Life Sci.
2008;65:3348–65.
53. Kim HS, Takenaka O, Crow TJ. Isolation and phylogeny of endogenous
retrovirus sequences belonging to the HERV-W family in primates. J Gen
Virol. 1999;80:2613–9.
54. Voisset C, Bouton O, Bedin F, Duret L, Mandrand B, Mallet F, et al.
Chromosomal distribution and coding capacity of the human endogenous
retrovirus HERV-W family. AIDS Res Hum Retrovir. 2000;16:731–40.
55. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The
human genome browser at UCSC. Genome Res. 2002;12:996–1006.
56. Lavie L, Medstrand P, Schempp W, Mayer J, Meese E. Human endogenous
retrovirus family HERV-K(HML-5): status, evolution, and reconstruction of an
ancient betaretrovirus in the human genome. J Virol. 2004;78:8788–98.
57. Flockerzi A, Burkhardt S, Schempp W, Meese E, Mayer J. Human
endogenous retrovirus HERV-K14 families: status, variants, evolution, and
mobilization of other cellular sequences. J Virol. 2005;79:2941–9.
58. Gifford RJ. Evolution at the host-retrovirus interface. BioEssays. 2006;28:1153–6.
59. Escalera-Zamudio M, Greenwood AD. On the classification and evolution of
endogenous retrovirus: human endogenous retroviruses may not be
"human" after all. APMIS. 2016;124:44–51.
60. Beck CR, Garcia-Perez JL, Badge RM, Moran JV. LINE-1 elements in structural
variation and disease. Annu Rev Genomics Hum Genet. 2011;12:187–215.
61. Hancks DC, Kazazian HH. Roles for retrotransposon insertions in human
disease. Mob DNA. 2016;7:9.
62. Kent WJ. BLAT: the BLAST-like alignment tool. Genome Res. 2002;12:656–64.
63. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J.
Repbase update, a database of eukaryotic repetitive elements. Cytogenet
Genome Res. 2005;110:462–7.
64. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al.
Geneious basic: an integrated and extendable desktop software platform for
the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–9.
65. Katoh K, Standley DM. MAFFT multiple sequence alignment software
version 7: improvements in performance and usability. Mol Biol Evol.
2013;30:772–80.
66. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular
evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
67. Sperber G, Lövgren A, Eriksson N-E, Benachenhou F, Blomberg J.
RetroTector online, a rational tool for analysis of retroviral elements in small
and medium size vertebrate genomic sequences. BMC Bioinformatics. 2009;
10 Suppl 6:S4.
68. Grandi N, Cadeddu M, Blomberg J, Tramontano E. Contribution of type
W human endogenous retrovirus to the human genome:
characterization of HERV-W proviral insertions and processed
pseudogenes. Retrovirology. 2016;accepted.
... Although they were subsequently inherited in a Mendelian fashion by all descendants of the original host, there was little evolutionary pressure to maintain them in their intact form; more likely the opposite. Chimpanzees and gorillas have remarkably similar sets of retroviral loci (19)(20)(21), except for different mutations and the dozen or so new integrations that occurred in each species since our last shared ancestor lived approximately 6 million years ago. ...
... This raised the possibility that HIV-reactive antibodies in patients are, in fact, antibodies against HERV proteins that have a sufficient degree of sequence homology with HIV proteins. Indeed, two papers (170,171) reported that 16% of RA patients have antibodies against an epitope in the HERV-K envelope protein (amino acids 19–37). It should be noted that the percentage of positive patients in the earlier papers was higher, presumably because the tested antigens were full-length proteins in their native state, while later papers mainly used selected peptides and therefore may have missed many autoantibodies. ...
More than 200 human disorders include various manifestations of autoimmunity. The molecular events that lead to these diseases are still incompletely understood and their causes remain largely unknown. Numerous potential triggers of autoimmunity have been proposed over the years, but very few of them have been conclusively confirmed or firmly refuted. Viruses have topped the lists of suspects for decades, and it seems that many viruses, including those of the Herpesviridae family, indeed can influence disease initiation and/or promote exacerbations by a number of mechanisms that include prolonged anti-viral immunity, immune subverting factors, and mechanisms, and perhaps “molecular mimicry”. However, no specific virus has yet been established as being truly causative. Here, we discuss a different, but perhaps mechanistically related possibility, namely that retrotransposons or retroviruses that infected us in the past and left a lasting copy of themselves in our genome still can provoke an escalating immune response that leads to autoimmune disease. Many of these loci still encode for retroviral proteins that have retained some, or all, of their original functions. Importantly, these endogenous proviruses cannot be eliminated by the immune system the way it can eliminate exogenous viruses. Hence, if not properly controlled, they may drive a frustrated and escalating chronic, or episodic, immune response to the point of a frank autoimmune disorder. Here, we discuss the evidence and the proposed mechanisms, and assess the therapeutic options that emerge from the current understanding of this field.
... It is worth noting that we found HERVH sequences not only in the species in which they have been reported previously (Homo sapiens, Gorilla gorilla gorilla, Pongo abelii, Papio anubis, Chlorocebus aethiops, Callithrix jacchus, Pan troglodytes, Nomascus siki and Aotus nancymaae) but also in some new genera of Catarrhini, such as Mandrillus, Rhinopithecus, and Colobus ( Figure 1B), further confirming the widespread and ancient nature of HERVHF. We also found other types of F-HERVs, such as HERVW9, HERVIPADP, HERVK, and HSERVIII members ( Figure 1B), in primates, which was consistent with previous studies [35][36][37][38] but with hosts expanded in this study. Together, these results demonstrated that F-HERVs are ancient, and humans inherited such elements via vertical transmission from nonhuman primates. ...
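The vertical-transmission argument rests on a simple rule: an insertion found at the orthologous position in several species most parsimoniously predates their last common ancestor. The sketch below encodes that rule for the catarrhine species named on this page. It assumes correct presence/absence calls, no lineage-specific deletions, and an insertion that is present in human; the clade table is an illustrative simplification, not a method taken from the cited works.

```python
from __future__ import annotations

# Illustrative, human-anchored nesting of catarrhine clades, ordered from the
# most recent split to the most ancient. Each entry lists the sampled species.
CLADES: list[tuple[str, frozenset[str]]] = [
    ("human only", frozenset({"human"})),
    ("Hominini", frozenset({"human", "chimpanzee"})),
    ("Homininae", frozenset({"human", "chimpanzee", "gorilla"})),
    ("Hominidae", frozenset({"human", "chimpanzee", "gorilla", "orangutan"})),
    ("Hominoidea", frozenset({"human", "chimpanzee", "gorilla", "orangutan", "gibbon"})),
    ("Catarrhini", frozenset({"human", "chimpanzee", "gorilla", "orangutan",
                              "gibbon", "rhesus macaque"})),
]


def insertion_clade(present_in: set[str]) -> str:
    """Return the smallest clade that contains every species carrying the insertion.

    Assumes the insertion is an orthologue of a human locus and that no copy was
    lost in any lineage, so shared presence implies a single shared origin.
    """
    if "human" not in present_in:
        raise ValueError("this sketch only handles human-anchored insertions")
    for name, members in CLADES:
        if present_in <= members:
            return name
    raise ValueError(f"species outside the modelled tree: {present_in - CLADES[-1][1]}")


print(insertion_clade({"human", "chimpanzee", "gorilla", "orangutan"}))  # Hominidae
```

Under the no-loss assumption, an insertion shared by human and rhesus macaque but missing from the apes would still be assigned to Catarrhini; real analyses have to weigh deletions and assembly gaps before drawing that conclusion.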
Human endogenous retroviruses (HERVs) are viral "fossils" in the human genome that originated from the ancient integration of exogenous retroviruses. Although HERVs have sporadically been reported in nonhuman primate genomes, their deep origination in pan-primates remains to be explored. Hence, based on the in silico genomic mining of full-length HERVs in 49 primates, we performed the largest systematic survey to date of the distribution, phylogeny, and functional predictions of HERVs. Most importantly, we obtained conclusive evidence of nonhuman origin for most contemporary HERVs. We found that various supergroups, including HERVW9, HUERSP, HSERVIII, HERVIPADP, HERVK, and HERVHF, were widely distributed in Strepsirrhini, Platyrrhini (New World monkeys) and Catarrhini (Old World monkeys and apes). We found that numerous HERVHFs are spread by vertical transmission within Catarrhini and one HERVHF was traced in 17 species, indicating its ancient nature. We also discovered that 164 HERVs were likely involved in genomic rearrangement and 107 HERVs were potentially coopted in the form of noncoding RNAs (ncRNAs) in humans. In summary, we provided comprehensive data on the deep origination of modern HERVs in pan-primates.
... This is indeed what is observed. Grandi et al. [39] searched the genomes of twelve nonhuman primates for orthologs of 211 human ERV-W elements. They identified 205 ERV-Ws in orthologous loci in chimpanzees, 207 in gorillas, 205 in orangutans, 190 in gibbons and 131 in rhesus macaques. ...
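The orthologue counts quoted above are easier to compare when expressed as proportions of the 211 human loci that were queried. The snippet below is plain arithmetic on those reported numbers. It does not re-run any genome search, and the species labels are simply dictionary keys.

```python
# Orthologue counts as quoted in the snippet above (211 human ERV-W loci queried).
HUMAN_LOCI = 211
ORTHOLOGUES = {
    "chimpanzee": 205,
    "gorilla": 207,
    "orangutan": 205,
    "gibbon": 190,
    "rhesus macaque": 131,
}


def retention_percent(found: int, total: int = HUMAN_LOCI) -> float:
    """Percentage of the human loci that have an orthologous ERV-W copy in another genome."""
    if not 0 <= found <= total:
        raise ValueError("found must lie between 0 and total")
    return 100.0 * found / total


for species, n in ORTHOLOGUES.items():
    print(f"{species:<15} {n:>3}/{HUMAN_LOCI}  {retention_percent(n):5.1f}%")
```

This prints about 97.2% for chimpanzee and orangutan, 98.1% for gorilla, 90.0% for gibbon and 62.1% for rhesus macaque. The shared fraction therefore drops for the more distantly related catarrhines.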
One of the most sophisticated philosophies of science is the methodology of scientific research programmes (MSRP), developed by Imre Lakatos. According to MSRP, scientists are working within so-called research programmes, consisting of a hard core of fixed convictions and a flexible protective belt of auxiliary hypotheses. Anomalies are accommodated by changes to the protective belt that do not affect the hard core. Under MSRP, research programmes are appraised as ‘progressive’ if they successfully predict novel facts but are judged as ‘degenerative’ if they merely offer ad hoc solutions to anomalies. This paper applies these criteria to the evolutionary research programme as it has performed during half a century of ERV research. It describes the early history of the field and the emergence of the endogenization-amplification theory on the origins of retroviral-like sequences. It then discusses various predictions and postdictions that were generated by the programme, regarding orthologous ERVs in different species, the presence of target site duplications and the divergence of long terminal repeats, and appraises how the programme has dealt with data that did not conform to initial expectations. It is concluded that the evolutionary research programme has been progressive with regard to the issues here examined.
... In marmoset (Callithrix jacchus) of Cebidae, 51 out of the 56 SPRE-like elements extended in the 3′ direction overlapped with RepeatMasker tracks, and most of them were ERV1-1_CJa-I and ERV1-3_CJ-I (Fig. 3f ). These two families were specific to New World monkey, and ERV1-1_CJa-I was similar to ERV-W [30,31]. Together, while SPRE-like elements are not present in the exogenous retroviruses, infections and/or transpositions of the different SPRE-harboring retroviruses independently pushed up the SPRE-like elements' numbers in mammalian genomes. ...
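The marmoset figure above (51 of 56 SPRE-like elements overlapping RepeatMasker tracks) is the result of an interval-intersection query. Below is a minimal brute-force sketch of that kind of count. The coordinates are hypothetical, and intervals are taken as 0-based and half-open, as in the BED format. Genome-scale work would normally use an indexed interval tool rather than this quadratic loop.

```python
from __future__ import annotations

# (chromosome, start, end), 0-based and half-open as in BED; values below are hypothetical.
Interval = tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share a chromosome and each starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(elements: list[Interval], annotations: list[Interval]) -> int:
    """Number of elements that overlap at least one annotation interval."""
    return sum(any(overlaps(e, r) for r in annotations) for e in elements)


elements = [("chr1", 100, 150), ("chr1", 400, 450), ("chr2", 100, 150)]
annotations = [("chr1", 120, 300), ("chr1", 440, 600)]
print(count_overlapping(elements, annotations), "of", len(elements))  # 2 of 3
```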
Background: Retroviruses utilize multiple unique RNA elements to control RNA processing and translation. However, it is unclear what functional RNA elements are present in endogenous retroviruses (ERVs). Gene co-option from ERVs sometimes entails the conservation of viral cis-elements required for gene expression, which might reveal the RNA regulation in ERVs. Results: Here, we characterized an RNA element found in ERVs consisting of three specific sequence motifs, called SPRE. The SPRE-like elements were found in different ERV families but not in any exogenous viral sequences examined. We observed more than a thousand copies of the SPRE-like elements in several mammalian genomes; in human and marmoset genomes, they overlapped with lineage-specific ERVs. SPRE was originally found in human syncytin-1 and syncytin-2. Indeed, several mammalian syncytin genes (mac-syncytin-3 of macaque, syncytin-Ten1 of tenrec, and syncytin-Car1 of Carnivora) contained the SPRE-like elements. A reporter assay revealed that the enhancement of gene expression by SPRE depended on the reporter genes. Mutation of SPRE impaired the wild-type syncytin-2 expression while the same mutation did not affect codon-optimized syncytin-2, suggesting that SPRE activity depends on the coding sequence. Conclusions: These results indicate multiple independent invasions of various mammalian genomes by retroviruses harboring SPRE-like elements. Functional SPRE-like elements are found in several syncytin genes derived from these retroviruses. This element may facilitate the expression of viral genes, which were suppressed due to inefficient codon frequency or repressive elements within the coding sequences. These findings provide new insights into the long-term evolution of RNA elements and molecular mechanisms of gene expression in retroviruses.
... At the time of integration, the HERV genome was composed of four retroviral genes (gag, pro, pol, and env) flanked by two LTRs, while the MaLRs genome was similar but lacked the env gene [3,4]. LTR-retrotransposons accumulated several mutations over time, and solitary LTRs were generated by recombination occurrences [3][4][5]. Despite that, both solo LTRs and proviral MaLRs/HERVs can contribute to human biology and development [6][7][8]. ...
Human Endogenous retroviruses (HERVs) and Mammalian Apparent LTR Retrotransposons (MaLRs) are remnants of ancient retroviral infections that represent a large fraction of our genome. The HERV and MaLR transcriptional activity is regulated in developmental stages, adult tissues, and pathological conditions. In this work, we used a bioinformatics approach based on RNA-sequencing (RNA-seq) to study the expression and modulation of HERVs and MaLRs in a scenario of activation of the immune response. We analyzed transcriptome data from subjects before and after the administration of an inactivated vaccine against the Hantaan orthohantavirus, the causative agent of Korean hemorrhagic fever, to investigate HERV and MaLR expression and differential expression in response to the vaccine. Specifically, we described the HERV transcriptome in PBMCs and identified HERV and MaLR loci differentially expressed after the 2nd, 3rd, and 4th inactivated vaccine administrations. We found that the expression of 545 HERV and MaLR elements increased in response to the vaccine and that the activation of several individual HERV and MaLR loci is specific for each vaccine administration and correlated with different genes and immune-related pathways.
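Calling a locus "increased" after vaccination starts from normalized counts and a fold change. The sketch below shows only that arithmetic: counts-per-million scaling followed by a log2 ratio with a pseudocount. It is not a differential-expression test. Dedicated RNA-seq tools (for example DESeq2 or edgeR) also model replicate variability and report adjusted p-values. The locus names and counts here are hypothetical.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each locus by the library's total read count."""
    total = sum(counts.values())
    return {locus: 1e6 * n / total for locus, n in counts.items()}


def log2_fold_change(before: dict[str, int], after: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """Descriptive log2(after/before) of CPM values; the pseudocount avoids log(0).

    Both libraries must report the same set of loci.
    """
    b, a = cpm(before), cpm(after)
    return {locus: math.log2((a[locus] + pseudocount) / (b[locus] + pseudocount))
            for locus in before}


before = {"locus_A": 100, "locus_B": 900}  # hypothetical read counts
after = {"locus_A": 300, "locus_B": 700}
lfc = log2_fold_change(before, after)
print({k: round(v, 2) for k, v in lfc.items()})  # {'locus_A': 1.58, 'locus_B': -0.36}
```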
... Importantly, such RetroTector-based classification can be used as a starting point for the deep characterization of individual HERV group of interest, which can possibly include additional members previously missed due to their structural incompleteness or accumulation of mutations affecting key retroviral motifs. In fact, while some HERV groups have been described in great detail (see for instance HERV-W [13,14] and HERV-H [15]), most of them are still poorly known in terms of total members, genomic distribution, and nucleotide structure. Overall, the best characterized class of HERV is surely Class II, which includes betaretrovirus-like elements currently divided into 10 groups named HML (Human MMTV-like) from 1 to 10 due to their identity to exogenous Mouse Mammary Tumor Virus (MMTV). ...
Endogenous Retroviruses (ERVs) are ancient relics of infections that affected the primate germ line and constitute about 8% of our genome. Growing evidence indicates that ERVs had a major role in vertebrate evolution, being occasionally domesticated by the host physiology. In addition, human ERV (HERV) expression is highly investigated for a possible pathological role, even if no clear associations have been reported yet. In fact, on the one side, the study of HERV expression in high-throughput data is a powerful and promising tool to assess their actual dysregulation in diseased conditions; but, on the other side, the poor knowledge about the various HERV group genomic diversity and individual members somehow prevented the association between specific HERV loci and a given molecular mechanism of pathogenesis. The present study is focused on the HERV-K(HML7) group that—differently from the other HERV-K members—still remains poorly characterized. Starting from an initial identification performed with the software RetroTector, we collected 23 HML7 proviral insertions and about 160 HML7 solitary LTRs that were analyzed in terms of genomic distribution, revealing a significant enrichment in chromosome X and the frequent localization within human gene introns as well as in pericentromeric and centromeric regions. Phylogenetic analyses showed that HML7 members form a monophyletic group, which based on age estimation and comparative localization in non-human primates had its major diffusion between 20 and 30 million years ago. Structural characterization revealed that besides 3 complete HML7 proviruses, the other group members shared a highly defective structure that, however, still presents recognizable functional domains, making it worth further investigation in the human population to assess the presence of residual coding potential.
... Combinatorial analysis of functional datasets and cluster analysis for ERV-ORFs in humans and mice revealed that many expressed ERV-ORFs were tissue- and lineage-specific. This is consistent with previous studies showing that specific ERV genes, such as ERVW-1, are expressed in limited tissues [18,[54][55][56] and are shared exclusively among apes [57]. In addition, we demonstrated that fractions of each viral-like protein domain in ERV-ORFs varied among mammalian lineages ( Fig. 1a and b). ...
Background: Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections of mammalian germline cells. A large proportion of ERVs lose their open reading frames (ORFs), while others retain them and become exapted by the host species. However, it remains unclear what proportion of ERVs possess ORFs (ERV-ORFs), become transcribed, and serve as candidates for co-opted genes. Results: We investigated characteristics of 176,401 ERV-ORFs containing retroviral-like protein domains (gag, pro, pol, and env) in 19 mammalian genomes. The fractions of ERVs possessing ORFs were overall small (~ 0.15%) although they varied depending on domain types as well as species. The observed divergence of ERV-ORF from their consensus sequences showed bimodal distributions, suggesting that a large proportion of ERV-ORFs either recently, or anciently, inserted themselves into mammalian genomes. Alternatively, very few ERVs lacking ORFs were found to exhibit similar divergence patterns. To identify candidates for ERV-derived genes, we estimated the ratio of non-synonymous to synonymous substitution rates (dN/dS) for ERV-ORFs in human and non-human mammalian pairs, and found that approximately 42% of the ERV-ORFs showed dN/dS < 1. Further, using functional genomics data including transcriptome sequencing, we determined that approximately 9.7% of these selected ERV-ORFs exhibited transcriptional potential. Conclusions: These results suggest that purifying selection operates on a certain portion of ERV-ORFs, some of which may correspond to uncharacterized functional genes hidden within mammalian genomes. Together, our analyses suggest that more ERV-ORFs may be co-opted in a host-species specific manner than we currently know, which are likely to have contributed to mammalian evolution and diversification.
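The dN/dS < 1 criterion mentioned above separates ORFs under apparent purifying selection from those evolving neutrally. As an illustration only, the sketch below starts from difference and site counts that are assumed to be already available (the Nei–Gojobori-style counting step is not shown). It then applies the textbook Jukes–Cantor correction. The cited study may well have used a different estimator, and the numbers are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); the correction is undefined beyond that")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous (dN) to synonymous (dS) distances."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical counts for one ERV-ORF pair: 10 of 100 nonsynonymous sites and
# 10 of 50 synonymous sites differ, giving dN/dS of about 0.46 (< 1).
ratio = dn_ds(10, 100, 10, 50)
print(f"dN/dS = {ratio:.2f}")
```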
Background and Aim: Endogenous retroviruses (ERVs), found in all vertebrates including non-human primates (NHPs), are genetically inherited. Recent studies have therefore explored human ERVs (HERVs) for human immunodeficiency virus vaccine development, because the hypervariability of exogenous retroviruses makes conventional vaccines ineffective. HERVs have also been found to induce an immune response in cancer patients. This study aimed to identify and molecularly characterize ERVs from two Indonesian NHPs, Macaca fascicularis and Macaca nemestrina, and to describe the phylogenetic relationship of these isolates with the simian ERVs (SERVs) characterized in other species and countries. Materials and Methods: Whole blood samples (5 mL) were taken from 131 long-tailed macaques and 58 pig-tailed macaques in captive breeding facilities at Bogor, Indonesia, for DNA extraction. The DNA samples were screened by SYBR Green real-time polymerase chain reaction (PCR) with env-specific primers (simian retroviruses [SRV]1-5 7585U19 and SRV1-5 7695L21). Samples were scored SERV-positive when the cycle threshold (CT) was < 24 and the melting temperature (TM) fell within 80°C–82°C. Whole-genome nucleotide sequences from two SERV-positive pig-tailed macaque samples were then generated by primer walking and analyzed with bioinformatics programs such as 4Peaks, Clustal Omega, and BLAST (NCBI). Finally, a phylogenetic tree was constructed using the neighbor-joining method in MEGA X. Results: SYBR Green real-time PCR amplification results indicated that SERV (Mn B1 and Mn B140910)-positive samples had CT values of 22.37–22.54 and TM of 82°C.
Moreover, whole-genome sequencing yielded sequences of 7991 nucleotides, comprising long terminal repeat, gag, pro, pol, and env genes, identical between the sequenced samples. Furthermore, the phylogenetic tree indicated that both M. nemestrina samples had 99%–100% nucleotide identity to the Mn 92227 sample identified at the National Primate Center University of Washington (NaPRC UW), which was imported from Indonesia in 1998 and confirmed as a novel SERV strain. Although the SERV whole-genome nucleotide and env amino acid sequences clustered with SRV-2 (identity values of 82% and 79%, respectively), they had 99%–100% nucleotide identity to Mn 92227. Meanwhile, the gag, pro, and pol amino acid sequences clustered with SRV-1, SRV-3, SRV-4, SRV-5, SRV-8, and SERV/1997, with identity values of 82% and 88%. Conclusion: The SYBR Green real-time PCR profiles showed similarities with Mn 92227. Subsequent phylogenetic analysis confirmed that both samples (Mn B1 and Mn B140919) from pig-tailed macaques in the country of origin belonged to the novel SERV strain identified at NaPRC UW. These strains could therefore be used in biomedical research on ERVs.
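The screening rule stated above (CT < 24 and a melting temperature of 80–82 °C) amounts to a two-condition filter. The sketch below encodes it. Treating the Tm window as inclusive and counting a missing CT (no amplification) as negative are both assumptions, and the field names and sample IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CT_CUTOFF = 24.0          # positive only if CT is strictly below this value
TM_WINDOW = (80.0, 82.0)  # melting-temperature window in °C (assumed inclusive)


@dataclass
class QpcrCall:
    sample_id: str
    ct: Optional[float]   # None when the well never crossed the threshold
    tm: Optional[float]   # melt-curve peak in °C


def is_serv_positive(call: QpcrCall) -> bool:
    """Apply the CT and Tm criteria; wells without amplification count as negative."""
    if call.ct is None or call.tm is None:
        return False
    return call.ct < CT_CUTOFF and TM_WINDOW[0] <= call.tm <= TM_WINDOW[1]


calls = [
    QpcrCall("sample_a", 22.37, 82.0),  # values within the range reported above
    QpcrCall("sample_b", 30.1, 81.0),   # hypothetical late amplification
    QpcrCall("sample_c", 21.5, 76.0),   # hypothetical off-target melt peak
]
print([c.sample_id for c in calls if is_serv_positive(c)])  # ['sample_a']
```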
Whereas DNA viruses are known to be abundant, diverse, and commonly key ecosystem players, RNA viruses are insufficiently studied outside disease settings. In this study, we analyzed ≈28 terabases of Global Ocean RNA sequences to expand Earth's RNA virus catalogs and their taxonomy, investigate their evolutionary origins, and assess their marine biogeography from pole to pole. Using new approaches to optimize discovery and classification, we identified RNA viruses that necessitate substantive revisions of taxonomy (doubling phyla and adding >50% new classes) and evolutionary understanding. "Species"-rank abundance determination revealed that viruses of the new phyla "Taraviricota," a missing link in early RNA virus evolution, and "Arctiviricota" are widespread and dominant in the oceans. These efforts provide foundational knowledge critical to integrating RNA viruses into ecological and epidemiological models.
Endogenous retroviruses (ERVs) are relics of past infection that constitute up to 8% of the human genome. Understanding the genetic evolution of the ERV family and the interplay of ERVs and encoded RNAs and proteins with host function has become a new frontier in biology.
Human Endogenous Retroviruses (HERVs) are ancient infection relics constituting ~8% of our DNA. While HERVs' genomic characterization is still ongoing, impressive amounts of data have been obtained regarding their general expression across tissues. Among HERVs, one of the most studied is the W group, which is the sole HERV group specifically mobilized by the long interspersed element-1 (LINE-1) machinery, providing a source of novel insertions by retrotransposition of HERV-W processed pseudogenes, and comprising a member encoding a functional envelope protein coopted for human placentation. The HERV-W group has been intensively investigated for its putative role in several diseases, such as cancer, inflammation, and autoimmunity. Despite major interest in the link between HERV-W expression and human pathogenesis, no conclusive correlation has been demonstrated so far. In general, (i) the absence of a proper identification of the specific HERV-W sequences expressed in a given condition, and (ii) the lack of studies attempting to connect the various observations in the same experimental conditions are the major problems preventing the definitive assessment of the HERV-W impact on human physiopathology. In this review, we summarize the current knowledge on the HERV-W group presence within the human genome and its expression in physiological tissues as well as in the main pathological contexts.
Mammalian genomes are scattered with thousands of copies of endogenous retroviruses (ERVs), mobile genetic elements that are relics of ancient retroviral infections. After inserting copies into the germ line of a host, most ERVs accumulate mutations that prevent the normal assembly of infectious viral particles, becoming trapped in host genomes and unable to leave to infect other cells. While most copies of ERVs are inactive, some are transcribed and encode the proteins needed to generate new insertions at novel loci. In some cases, old copies are removed via recombination and other mechanisms. This creates a shifting landscape of ERV copies within host genomes. New insertions can disrupt normal expression of nearby genes via directly inserting into key regulatory elements or by containing regulatory motifs within their sequences. Further, the transcriptional silencing of ERVs via epigenetic modification may result in changes to the epigenetic regulation of adjacent genes. In these ways, ERVs can be potent sources of regulatory disruption as well as genetic innovation. Here, we provide a brief review of the association between ERVs and gene expression, especially as observed in pre-implantation development and placentation. Moreover, we will describe how disruption of the regulated mechanisms of ERVs may impact somatic tissues, mostly in the context of human disease, including cancer, neurodegenerative disorders, and schizophrenia. Lastly, we discuss the recent discovery that some ERVs may have been pressed into the service of their host genomes to aid in the innate immune response to exogenous viral infections.
Article
Background Human endogenous retroviruses (HERVs) are ancient sequences that integrated into germ line cells and were vertically transmitted to the offspring; they constitute about 8% of our genome. Over time, HERVs accumulated mutations that compromised their coding capacity. A prominent exception is the HERV-W locus 7q21.2, which produces a functional Env protein (Syncytin-1) coopted for placental syncytiotrophoblast formation. While the expression of HERV-W sequences has been investigated for its correlation with disease, an exhaustive description of the group's composition and characteristics is still not available, and current information on the HERV-W group derives from studies published some years ago that necessarily relied on the rough human genome assemblies available at the time. This hampers comparison and correlation with current human genome assemblies. Results In the present work we identified and described in detail the distribution and genetic composition of 213 HERV-W elements. The bioinformatics analysis led to the characterization of several previously unreported features and provided a phylogenetic classification into two main subgroups with different ages and structural characteristics. New findings on the genomic context of HERV-W insertions and on their co-localization with sequences putatively involved in disease development are also reported. Conclusions The present work is a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background useful for clarifying the role of HERV-W in pathologies of poorly understood etiology. To our knowledge, it represents the most complete and exhaustive HERV-W dataset to date.
Article
Over evolutionary time, the dynamic nature of a genome is driven, in part, by the activity of transposable elements (TEs) such as retrotransposons. On a shorter time scale, it has been established that new TE insertions can cause single-gene disease in an individual. In humans, the non-LTR retrotransposon Long INterspersed Element-1 (LINE-1 or L1) is the only active autonomous TE. In addition to mobilizing its own RNA to new genomic locations via a "copy-and-paste" mechanism, LINE-1 is able to retrotranspose other RNAs, including Alu, SVA, and occasionally cellular RNAs. To date, 124 LINE-1-mediated insertions that result in genetic disease have been reported in humans. Disease-causing LINE-1 insertions have provided a wealth of insight and the foundation for valuable tools to study these genomic parasites. In this review, we provide an overview of LINE-1 biology followed by highlights from new reports of LINE-1-mediated genetic disease in humans.
Article
Human endogenous retroviruses (HERVs) represent the inheritance of ancient germ-line cell infections by exogenous retroviruses and the subsequent transmission of the integrated proviruses to the descendants. ERVs have the same internal structure as exogenous retroviruses. While no replication-competent HERVs have been identified, some retain up to three of four intact ORFs. HERVs have been classified before, with varying scope and depth, notably in the RepBase/RepeatMasker system. However, existing classifications are bewildering, and there is a need for a systematic, unifying and simple classification. We strove for a classification that is traceable to previous classifications and that encompasses HERV variation within a limited number of clades. The human genome assembly GRCh37/hg19 was analyzed with RetroTector, which primarily detects relatively complete Class I and II proviruses. A total of 3173 HERV sequences were identified. The structure of and relations between these proviruses were resolved through a multi-step classification procedure involving a novel type of similarity image analysis ("Simage"), which allowed heterogeneous (noncanonical) HERVs to be discriminated from homogeneous (canonical) ones. Of the 3173 HERVs, 1214 were canonical and segregated into 39 canonical clades (groups), belonging to class I (Gamma- and Epsilon-like), class II (Beta-like) and class III (Spuma-like). The groups were chosen based on (1) sequence similarity (nucleotide and Pol amino acid), (2) degree of fit to previously published clades, often from RepBase, and (3) taxonomic markers. The groups fell into 11 supergroups. The 1959 noncanonical HERVs contained 31 additional, less well-defined groups. Simage analysis revealed several types of mosaicism, notably recombination and secondary integration. By comparing flanking sequences, LTRs and completeness of gene structure, we deduced that some noncanonical HERVs proliferated after the recombination event.
Groups were further divided into envelope subgroups (94 in total) based on sequence similarity and characteristic "immunosuppressive domain" motifs. Intra- and inter(super)group, as well as intraclass, recombination involving envelope genes ("env snatching") was a common event. LTR divergence indicated that HERV-K(HML2) and HERVFC had the most recent integrations, and HERVL and HUERSP3 the oldest. A comprehensive HERV classification and characterization approach was undertaken, which should be applicable to the classification of all ERVs. Recombination was common among HERV ancestors.
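The LTR-divergence argument rests on a molecular-clock idea: the 5' and 3' LTRs of a provirus are identical at integration and then mutate independently, so their divergence d grows at roughly twice the neutral substitution rate r, giving an approximate age T ≈ d / (2r). The Python sketch below illustrates only that reasoning under stated assumptions: an uncorrected p-distance over an already aligned LTR pair, gap columns skipped, and an assumed neutral rate of 2.2 × 10⁻⁹ substitutions per site per year. Published analyses often apply substitution-model corrections (e.g. Jukes-Cantor), and this is not necessarily the method used in the cited study; the function names and example sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned LTRs (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return diffs / compared


def ltr_integration_age(ltr5: str, ltr3: str, rate: float = 2.2e-9) -> float:
    """Estimate years since integration as T = d / (2 * rate).

    Assumes both LTRs were identical at integration and have since diverged
    independently at the (assumed) neutral rate `rate`, in substitutions per
    site per year. No multiple-hit correction is applied.
    """
    d = p_distance(ltr5, ltr3)
    return d / (2 * rate)


if __name__ == "__main__":
    ltr5 = "ACGTACGTAC" * 10                   # hypothetical 100-bp LTR
    ltr3 = "T" + ltr5[1:50] + "T" + ltr5[51:]  # same LTR with 2 substitutions
    age = ltr_integration_age(ltr5, ltr3)
    print(f"d = {p_distance(ltr5, ltr3):.2f}, age ~ {age / 1e6:.1f} Myr")
    # prints: d = 0.02, age ~ 4.5 Myr
```

Under these assumptions, a group whose LTR pairs show lower divergence (such as the HERV-K(HML2) elements mentioned above) would receive younger age estimates than groups with highly divergent LTRs.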
Article
Endogenous retroviruses (ERVs) are abundant in mammalian genomes and contain sequences modulating transcription. The impact of ERV propagation on the evolution of gene regulation remains poorly understood. We found that ERVs have shaped the evolution of a transcriptional network underlying the interferon (IFN) response, a major branch of innate immunity, and that lineage-specific ERVs have dispersed numerous IFN-inducible enhancers independently in diverse mammalian genomes. CRISPR-Cas9 deletion of a subset of these ERV elements in the human genome impaired expression of adjacent IFN-induced genes and revealed their involvement in the regulation of essential immune functions, including activation of the AIM2 inflammasome. Although these regulatory sequences likely arose in ancient viruses, they now constitute a dynamic reservoir of IFN-inducible enhancers fueling genetic innovation in mammalian immune defenses.
Article
Human Endogenous Retroviruses (HERVs) have been implicated in autoimmune and other diseases. Molecular mimicry has been postulated as a potential mechanism of autoimmunity. Exogenous viruses have also been reported to be associated with the same diseases, as have genetic and environmental factors. If molecular mimicry were shown to be an initiating mechanism of some autoimmune diseases, then therapeutic options such as blocking antibodies and peptides might help halt these diseases at the outset. Bioinformatic and molecular modelling techniques have been employed to investigate molecular mimicry, and the evidence for the association between HERVs and autoimmunity is reviewed. The most convincing evidence for molecular mimicry is in rheumatoid arthritis, where HERV K-10 shares amino acid sequences with IgG1Fc, a target for rheumatoid factor. Systemic lupus erythematosus is an example of a condition associated with several autoantibodies, and several endogenous and exogenous viruses have been reported to be associated with the disease. The lack of a clear link between any single virus and this condition, together with the spectrum of clinical manifestations, suggests that genetic and environmental factors and the inflammatory response to one or more viruses might also be major factors in the pathogenesis of lupus and other autoimmune conditions. Where there are strong associations between a virus and an autoimmune condition, such as between hepatitis C and cryoglobulinaemia, bioinformatics and molecular modelling can also help to understand how molecular mimicry might enable HERVs to trigger disease.
Article
Endogenous retroviruses are known to be pathogenic in species other than humans. Disease associations for Human Endogenous RetroViruses (HERVs) are emerging, but so far no unequivocal pathogenetic cause-effect relationship has been established. A role for HERVs has been proposed in neurological and neuropsychiatric diseases as diverse as multiple sclerosis (MS) and schizophrenia (SCZ). For MS in particular, many aspects of the activation and involvement of specific HERV families (HERV-H/F and HERV-W/MSRV) have been reported, both in circulating cells and in the central nervous system. Notably, envelope genes and their gene products (Envs) appear strongly associated with the disease. For SCZ, ALS, and HIV-associated dementia (HAD), evidence is accumulating for involvement of the HERV-K family, as well as HERV-H/F and/or HERV-W. Because most HERV sequences remain quiescent under non-pathological conditions, activation is plausibly a prerequisite for causality; the importance of the regulatory pathways and epigenetic mechanisms controlling HERV activation and derepression, as well as the involvement of retroviral restriction factors, is therefore emerging. HERV-directed antiretrovirals have potential as novel therapeutic paradigms in neurologic disease, particularly in MS. The possible protective or ameliorative effects of antiretroviral therapy in MS are supported by reports that treatment of HIV infection may be associated with a significantly decreased risk of MS. Further studies of HERVs, their role in neurologic diseases, and their potential as therapeutic targets are essential.