JOURNAL OF VIROLOGY, Oct. 2011, p. 9863–9876
Copyright © 2011, American Society for Microbiology. All Rights Reserved.
Vol. 85, No. 19
Widespread Endogenization of Densoviruses and Parvoviruses in
Animal and Human Genomes▿†
Huiquan Liu,1,2 Yanping Fu,2 Jiatao Xie,2 Jiasen Cheng,2 Said A. Ghabrial,3 Guoqing Li,1,2
Youliang Peng,4 Xianhong Yi,2 and Daohong Jiang1,2*
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, Hubei Province,
People’s Republic of China1; Provincial Key Lab of Plant Pathology of Hubei Province, College of Plant Science and Technology,
Huazhong Agricultural University, Wuhan, 430070, Hubei Province, People’s Republic of China2; Department of
Plant Pathology, University of Kentucky, 201F Plant Science Building, 1405 Veterans Drive, University of
Kentucky, Lexington, Kentucky 40546-03123; and State Key Laboratories for Agrobiotechnology,
China Agricultural University, Yuanming-Yuan West Road No. 2,
Haidian District, 100193 Beijing, People’s Republic of China4
Received 22 April 2011/Accepted 14 July 2011
Parvoviruses infect humans and a broad range of animals, from mammals to crustaceans, and generally are
associated with a variety of acute and chronic diseases. However, many others cause persistent infections and
are not known to be associated with any disease. Viral persistence is likely related to the ability to integrate into
the chromosomal DNA and to establish a latent infection. However, there is little evidence for genome
integration of parvoviral DNA except for Adeno-associated virus (AAV). Here we performed a systematic search
for homologs of parvoviral proteins in publicly available eukaryotic genome databases followed by experimen-
tal verification and phylogenetic analysis. We conclude that parvoviruses have frequently invaded the germ
lines of diverse animal species, including mammals, fishes, birds, tunicates, arthropods, and flatworms. The
identification of orthologous endogenous parvovirus sequences in the genomes of humans and other mammals
suggests that parvoviruses have coexisted with mammals for at least 98 million years. Furthermore, some of
the endogenized parvoviral genes were expressed in eukaryotic organisms, suggesting that these viral genes are
also functional in the host genomes. Our findings may provide novel insights into parvovirus biology, host
interactions, and evolution.
Members of the family Parvoviridae infect a wide variety of
hosts, ranging from insects to primates. These viruses contain
linear single-stranded DNA (ssDNA) genomes and typically
possess two major gene cassettes; one encodes the nonstruc-
tural protein (NS or Rep) essential for viral gene expression
and DNA replication, and the other encodes the structural
proteins of the capsid (CP or VP) (5, 38). Members of this
family have been classified into two subfamilies, the Parvoviri-
nae (vertebrate viruses) and the Densovirinae (arthropod viruses).
Generally, parvoviruses cause a wide range of acute or
chronic diseases; many, however, are not known to be associ-
ated with any disease (6). Parvoviruses frequently cause per-
sistent infections, but the persistence mechanisms remain un-
known. Viral persistence is likely related to the ability to
integrate into the chromosomal DNA and to establish a latent
infection, such as for retroviruses (17, 22) and some DNA
tumor viruses (11, 36, 50, 51). Adeno-associated virus (AAV), a
nonautonomous parvovirus, can establish latency through site-
specific genome integration into human chromosome 19 in cell
culture (29, 41), and the autonomous parvovirus minute virus
of mice (MVM) has been shown to integrate in a site-specific
manner into episomes (12). However, it is not known whether
integration into the host germ line DNA and consequent trans-
mission to offspring (endogenization) take place.
Recently, Kerr and Boschetti (27) identified some short re-
gions (17 to 26 nucleotides [nt]) of sequence identity between
several human and rodent parvoviruses and their respective
host genomes; this could be biologically relevant to the persis-
tence of these viruses in host tissues. However, there is no clear
evidence of integration of these viruses. The presence within
the shrimp genome of sequences clearly related to infectious
hypodermal and hematopoietic necrosis virus (IHHNV) (46), a
common parvovirus of shrimp, implies that integration of au-
tonomous parvoviruses may have occurred widely but has not
been well documented. The increasing availability of eukary-
otic genome data and viral sequences opens up the scope for
further investigating integration events as well as the mecha-
nisms of pathogenesis and persistence of parvoviruses.
Hence, we performed a systematic search for homologs of
parvoviral proteins in the publicly available eukaryotic genome
databases, and our subsequent phylogenetic analysis confirmed
that parvoviruses have been frequently endogenized into the
nuclear genomes of various animals. While our paper was
being prepared for submission, two independent groups of
investigators reported that sequences derived from two genera,
the parvoviruses and dependoviruses in the subfamily Parvo-
virinae, are integrated in the genomes of vertebrate species (3,
26). Here we report our more comprehensive and convincing
results based on ample critical data analysis and laboratory
research. Our studies not only have corroborated the endog-
enization of viruses in the subfamily Parvovirinae in vertebrate
species but have also confirmed that numerous densoviruses
(subfamily Densovirinae) have been endogenized into the ge-
nomes of invertebrate species. Furthermore, we identified a
syntenic endogenous parvovirus in human and other mammal
species that dates this endogenization event back to at least 98
million years. In addition, we have confirmed the expression of
some endogenous viral genes in host genomes. The implica-
tions of these findings, which are pertinent to virus-cell inter-
action and evolution, are also discussed.
* Corresponding author. Mailing address: Plant Pathology, College
of Plant Science and Technology, Huazhong Agricultural University,
Wuhan 430070, Hubei Province, People’s Republic of China. Phone:
86-27-87280487. Fax: 86-27-87397735. E-mail: daohongjiang@mail
† Supplemental material for this article may be found at http://jvi
▿ Published ahead of print on 27 July 2011.
MATERIALS AND METHODS
Data mining. All database searches were performed online and were com-
pleted in April 2010. To screen for parvovirus-related DNA sequences (PRDs)
in eukaryotic genomes, we performed tBLASTn searches using as queries the NS
and CP protein sequences of representative parvoviruses against the refseq_genomic,
chromosome, whole-genome shotgun (WGS), and eukaryote
genomic BLAST databases at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
Through an iterative process of screening, all nonredundant hits from these
searches with E values of ≤1e−5 were extracted. The relationships between
sequences found to be similar in these searches were determined by reverse
BLAST comparisons using each extracted hit as a BLASTx query against the
nonredundant (NR) protein database. The region beside each PRD was scanned
for adjacent transposable elements (TEs) or repetitive sequences by a WSCensor
(http://www.ebi.ac.uk/Tools/censor/) search against a reference collection of se-
quence repeats (25).
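The hit-extraction step described above can be sketched in code. This is a minimal illustration only, not the procedure used in this study: it assumes hits exported in the standard 12-column BLAST tabular format, treats the 1e-5 E-value cutoff as inclusive, and defines "nonredundant" as one hit per subject interval. The names Hit, parse_tabular, and significant_nonredundant are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    s_start: int
    s_end: int
    evalue: float


def parse_tabular(lines):
    """Parse 12-column BLAST tabular rows (qseqid, sseqid, ..., evalue, bitscore)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[8]), int(f[9]), float(f[10])))
    return hits


def significant_nonredundant(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, one per subject interval.

    Assumption: two hits are redundant when they cover the same subject
    interval, regardless of strand; the hit with the lowest E value is kept.
    """
    best = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        key = (h.subject, min(h.s_start, h.s_end), max(h.s_start, h.s_end))
        if key not in best or h.evalue < best[key].evalue:
            best[key] = h
    return sorted(best.values(), key=lambda h: h.evalue)


# Made-up example rows (not data from this study).
rows = [
    "NS_query\tcontigA\t35.0\t200\t120\t5\t1\t200\t1000\t1600\t1e-20\t95.0",
    "CP_query\tcontigA\t30.0\t200\t130\t6\t1\t200\t1600\t1000\t3e-8\t60.0",
    "NS_query\tcontigB\t28.0\t100\t70\t2\t1\t100\t50\t350\t0.01\t30.0",
]
kept = significant_nonredundant(parse_tabular(rows))
```

In this toy input, the second row covers the same subject interval as the first on the opposite strand and the third row fails the cutoff, so a single hit is retained.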
Examination of possible chimeras or errors in assembling of PRDs. Sequence
similarities between parvoviruses and eukaryotic genomes could be attributed to
trivial contamination of eukaryotic genomic DNA with viral sequences during
cloning or sequence assembly. To rule out this possibility, we searched against
archival data of the eukaryotic genome sequencing projects, using the PRDs and their flank-
ing cellular sequences as megaBLAST queries at the NCBI Trace Archive
(http://www.ncbi.nlm.nih.gov/blast/mmtrace.shtml) with a cutoff value of ≥95% nu-
cleotide identity, and carefully examined the junctions between PRD and cellular
sequences. The statistics of junction coverage that show the number of trace
records containing the junctions between PRDs and cellular sequences are listed
in Data Set S1 in the supplemental material.
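The junction coverage statistic described above can be illustrated with a short sketch. It assumes that each trace read has already been aligned to the assembled contig and is summarized as (start, end, percent identity); the 95% identity cutoff follows the text, whereas the min_flank parameter (the number of bases a read must extend on each side of the junction) and the function names are hypothetical choices made for illustration.

```python
def spans_junction(read_start, read_end, junction, min_flank=20):
    """True if the aligned read extends at least min_flank bases on both sides."""
    lo, hi = sorted((read_start, read_end))
    return lo <= junction - min_flank and hi >= junction + min_flank


def junction_coverage(alignments, junction, min_identity=95.0, min_flank=20):
    """Count reads that pass the identity cutoff and span the PRD/host junction."""
    return sum(
        1
        for start, end, identity in alignments
        if identity >= min_identity
        and spans_junction(start, end, junction, min_flank)
    )


# Made-up alignments on contig coordinates (not data from this study).
alignments = [
    (100, 700, 99.1),  # spans the junction
    (480, 520, 98.0),  # spans the junction with exactly 20 bases on each side
    (450, 505, 99.0),  # ends too close to the junction
    (300, 800, 90.0),  # below the identity cutoff
]
coverage = junction_coverage(alignments, junction=500)
```

A junction supported by many independent reads is unlikely to be an artifact of chimeric cloning or misassembly, which is the reasoning behind the statistic.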
Phylogenetic analyses. The putative amino acid sequences of all available
PRDs, which were obtained according to BLASTx hits, were used for the phy-
logenetic analysis with NS or CP proteins of representative exogenous vertebrate
or arthropod parvoviruses. Multiple alignments of protein sequences were con-
structed using COBALT (37) (http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi?link_loc=BlastHomeAd).
The in-frame stop codons within PRDs were
indicated as X. Protein maximum-likelihood (ML) trees were inferred with
PhyML-mixtures (20, 30), assuming the EX2 mixture model (30) and SPR tree
topology search strategy (21). Gaps in alignment were systematically treated as
unknown characters. The reliability of internal branches was evaluated based on
approximate likelihood ratio test (aLRT) statistics (1).
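The convention of marking in-frame stop codons as X can be illustrated as follows. This is a sketch under stated assumptions: it uses the standard genetic code and a single reading frame, and the function name translate_with_x is hypothetical. In this study the putative amino acid sequences were derived from BLASTx hits, not from a routine like this one.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_x(dna, frame=0):
    """Translate one reading frame, writing X for every in-frame stop codon.

    Codons containing characters other than A, C, G, or T are also written
    as X (unknown residue). A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append("X" if aa == "*" else aa)
    return "".join(protein)


# ATG GCC TAA GGT: the in-frame TAA stop codon is reported as X.
example = translate_with_x("ATGGCCTAAGGT")
```

Writing stops as X keeps a degraded sequence in a single alignable string, so that it can be aligned with intact viral proteins instead of being split at each stop codon.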
Detection of expression of PRDs from animal nuclear genomes. To investigate
whether PRDs can be expressed in animals, we first screened parvovirus-related
cDNA sequences in the NCBI expressed sequence tag (EST) database using the
method described in “Data mining” above. Subsequently, we compared the
identified parvovirus-related cDNA sequences with those of animal genomes and
parvoviruses by megaBLAST to determine whether they were expressed se-
quences from animal nuclear genomes or contaminating sequences from exog-
enous incidental viruses.
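The decision logic for parvovirus-related cDNAs can be summarized in a short sketch. The inputs are taken to be the best megaBLAST nucleotide identities of an EST against the host genome and against known exogenous parvoviruses (None when there is no hit). The 95% threshold and the function name classify_est are assumptions made for illustration; the text does not state a numerical cutoff for this step.

```python
def classify_est(identity_to_host, identity_to_virus, min_identity=95.0):
    """Classify a parvovirus-related EST from its best megaBLAST identities.

    Both arguments are percent nucleotide identities, or None if no hit.
    The min_identity threshold is a hypothetical value.
    """
    matches_host = identity_to_host is not None and identity_to_host >= min_identity
    matches_virus = identity_to_virus is not None and identity_to_virus >= min_identity
    if matches_host:
        return "expressed PRD from the host genome"
    if matches_virus:
        return "likely contamination from an exogenous virus"
    return "unresolved"


# Made-up identities (not data from this study).
labels = [
    classify_est(99.2, None),   # matches the host genome
    classify_est(None, 98.7),   # matches only a known virus
    classify_est(None, 71.0),   # matches neither closely
]
```

An EST that matches neither source closely stays unresolved, for example when the genome of the relevant animal has not been sequenced.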
PCR amplification and DNA sequencing. Bovine, cat, dog, horse, guinea pig,
mouse C57BL/6J, porcine, rabbit, Sprague-Dawley rat, and sheep genomic
DNA samples were obtained from Zyagen Laboratories, and Drosophila
sechellia, Drosophila persimilis, and Drosophila willistoni genomic DNA samples
were acquired from the Drosophila Species Stock Center. To amplify the candi-
date DNA fragments from these DNA samples by PCR, primer pairs were
designed based on the virus-related sequences and their flanking cellular se-
quences (see Table S1 in the supplemental material for the primer pairs used).
PCR products were fractionated by gel electrophoresis on 1% agarose gels and
stained with ethidium bromide. DNA was sequenced by Sanger methods at the
Beijing Genomics Institute (BGI).
Nucleotide sequence accession numbers. New sequences generated in this
study were deposited in GenBank under accession numbers HM469386 to
HM469391 and HM989956 to HM989958.
RESULTS
Identification of parvovirus-related DNA sequences in ani-
mal nuclear genomes. We systematically screened the assem-
bled genomes of 209 eukaryotes in genomic BLAST databases,
as well as other, uncompleted eukaryotic genomes in the WGS
database, using the methods described in Materials and Meth-
ods. This process identified 275 significant matches to parvo-
viral NS or CP proteins in diverse animal nuclear genomes
(Table 1; see Data Set S1 in the supplemental material), which
we named parvovirus-related DNA sequences (PRDs). The
PRD copy numbers varied among animal species. For exam-
ple, the tammar wallaby contains 39 PRDs, although its ge-
nome sequencing is not yet complete. In contrast, only one
PRD homologous to parvovirus CP was found in the human
genome, although humans are infected by many isolates be-
longing to distinct species of parvoviruses. Some PRDs were
found in puffer fish, blood fluke, and two sea squirt species.
Curiously, in terms of known viral families, parvoviruses com-
prise one group of DNA viruses that have not been found in
fishes (14, 49). Parvoviruses have also not yet been reported in
tunicates and flatworms (49).
Characteristics of PRDs. Compared to intact parvovirus ge-
nomes, endogenous parvoviral sequences, in most cases, com-
prised only individual viral genes or gene fragments. Some,
however, consisted of complete copies of viral genomes, but
several contained rearranged viral genome structures or were
interrupted by TEs or repeated nuclear element insertions
(Fig. 1). It should be noted that some PRDs appear truncated
because of sequencing or assembly gaps; apparent truncation may
also be caused by sequence divergence of the PRDs. Furthermore, many PRD copies have
resulted from segmental duplication within animal genomes, as
they shared unambiguous flanking cellular sequences (see Fig.
S1 in the supplemental material). The 5′ or 3′ noncoding
sequences of endogenous parvoviruses were generally not de-
tected, possibly due to sequence degradation. In the genome of
the African savanna elephant, however, a PRD located within
a reverse transcriptase gene shared significant sequence simi-
larities not only with the complete Rep protein of Bovine
adeno-associated virus but also with its 5′ noncoding sequence
(Fig. 2). In summary, DNA copies of complete parvovirus
genomes or individual genes could have been endogenized into
host genomes and subsequently subjected to amplification, de-
letions, insertions, and rearrangements in the host genome.
Examination of the potential coding capacities of the endog-
enized viral sequences indicated that most were defective, con-
taining numerous in-frame stop codons, frameshift mutations,
and insertions or deletions (Fig. 1; see Data Set S1 in the
supplemental material), which suggests that these are unlikely
to have functional potential as a virus. This characteristic was
especially apparent for vertebrate PRDs. The invertebrate
PRDs, however, were more conserved, and some of them re-
tained uninterrupted viral open reading frames (see Data Set
S1 in the supplemental material).
Although AAV integrates into the human genome in a site-
specific manner (29, 41), the PRDs were scattered in an ap-
parently random fashion within animal genomes, but most of
them were adjacent to or directly connected to TEs or re-
peated elements (Fig. 1 and 2; see Data Set S1 in the supple-
mental material), which might be involved in the integration.
TABLE 1. Distribution of parvovirus-related DNA sequences among animal genomes
Species
Homo sapiens (human)
Pan troglodytes (chimpanzee)
Gorilla gorilla gorilla (Western lowland gorilla)
Pongo abelii (Sumatran orangutan)
Macaca mulatta (rhesus monkey)
Callithrix jacchus (white-tufted-ear marmoset)
Microcebus murinus (gray mouse lemur)
Otolemur garnettii (small-eared galago)
Tarsius syrichta (Philippine tarsier) (b)
Mus musculus (house mouse)
Rattus norvegicus (Norway rat)
Dipodomys ordii (Ord’s kangaroo rat)
Cavia porcellus (domestic guinea pig)
Canis lupus familiaris (dog)
Ailuropoda melanoleuca (giant panda)
Felis catus (domestic cat)
Equus caballus (horse)
Bos taurus (cattle)
Sus scrofa (pig)
Lama pacos (alpaca)
Ovis aries (sheep)
Monodelphis domestica (gray short-tailed opossum)
Macropus eugenii (tammar wallaby)
Ornithorhynchus anatinus (platypus)
Myotis lucifugus (little brown bat)
Pteropus vampyrus (large flying fox)
Dasypus novemcinctus (nine-banded armadillo)
Procavia capensis (cape rock hyrax)
Echinops telfairi (small Madagascar hedgehog)
Loxodonta africana (African savanna elephant)
Whale or dolphin
Tursiops truncatus (bottlenosed dolphin)
Rabbits and hares
Oryctolagus cuniculus (rabbit)
Ochotona princeps (American pika)
Tetraodon nigroviridis (spotted green pufferfish)
Taeniopygia guttata (zebra finch)
Ciona intestinalis (sea squirt)
Ciona savignyi (Pacific transparent sea squirt)
Lepeophtheirus salmonis (salmon louse) strain Pacific
Ixodes scapularis (black-legged tick) strain Wikel colony
Acyrthosiphon pisum (pea aphid) strain LSR1
Drosophila (fruit fly) grimshawi strain TSC 15287-2541.00
Drosophila (fruit fly) persimilis strain MSH-3
Drosophila (fruit fly) willistoni strain TSC 14030-0811.24
Drosophila (fruit fly) sechellia strain Rob3c
Rhodnius prolixus (Triatomid bug)
Schistosoma mansoni (blood fluke)
No. of genes, total: 169 106 275
a Boldface, completed genomic assembly; underlining, whole-genome shotgun assembly; standard font, unfinished genomic sequence.
b The top Blast hit is with the U94 gene of human herpesvirus 6, a parvovirus Rep homolog.
Exclusion of the possibility that PRDs are contaminant se-
quences. Several lines of evidence indicate that PRDs are real
cellular sequences rather than contaminants originating from
exogenous incidental viruses. First, a closer inspection of the
raw sequence reads used for WGS assembly indicated deep
sequencing coverage across the junctions between PRDs and
cellular sequences (see Data Set S1 in the supplemental ma-
terial). Moreover, identical PRD loci were present in two dif-
ferent assemblies (the NCBI reference assembly and the Cel-
era assembly) of mouse and rat genomes and in three different
assemblies (the GRCh37 primary reference assembly and al-
ternate Celera and HuRef assemblies) of the human genome.
These results strongly suggest that PRDs were not artifacts of
cloning or sequence assembly. Second, most PRDs underwent
various degrees of degradation, and several also contained
rearranged viral structures or were interrupted by TEs or re-
peated nuclear element insertions, suggesting that PRDs in-
vaded animal genomes millions of years ago. Finally, many
PRDs arose via segmental duplication events within the animal
genomes; hence, these represent an established germ line integration.
To validate these observations, we amplified and sequenced
some of the proposed junctions between PRDs and cellular
sequences from bovine, cat, dog, horse, guinea pig, mouse,
porcine, rabbit, rat, sheep, and three fruit fly species (Drosoph-
ila sechellia, D. persimilis, and D. willistoni) (Fig. 3; see Table S1
in the supplemental material). The results revealed that the
PCR products were of the expected sizes and the experimental
sequences were identical to sequences of the animal genomes
containing both the expected host sequences and PRDs.
Phylogenetic analysis of PRDs and exogenous viruses. To
evaluate the genetic relationship of PRDs and exogenous par-
voviruses, we performed a comprehensive phylogenetic analy-
sis using deduced amino acid sequences of all available PRDs
with the NS or CP protein sequences of representative parvo-
viruses. As shown in Fig. 4, the PRDs were most clearly placed
within the subclades of phylogenetic trees of subfamily Parvo-
virinae or Densovirinae, thus strongly suggesting that the PRDs
were derived from members of the family Parvoviridae. Nu-
merous PRDs from different animal species and even different
lineages clustered together and formed a sister clade to mod-
ern exogenous viruses (see examples in the upper part of the
trees shown in Fig. 4A and B), suggesting that these repre-
sented extinct or undescribed lineages sharing a common an-
cestor with known modern viruses. Moreover, the phylogenetic
pattern of these PRDs is not consistent with the evolutionary
relatedness of their hosts, indicating that these PRDs have
most likely invaded these species via independent integration
events over time rather than via a single event in their ancestor.
The PRDs from two tunicate species showed clustering of
PRD units from the same species (Fig. 4B). Thus, the integra-
tion of virus probably occurred before the separation of these
two species. However, it is also possible that these PRDs were
derived from amplification events after integration into the two
genomes. Indeed, the flanking sequences of some PRDs in the
sea squirt genome show high sequence similarities (see Fig. S1
in the supplemental material).
FIG. 1. Schematic representation of some PRDs and their most related viruses. Arrowhead boxes indicate viral-like genes (red, nonstructural
proteins; blue, structural proteins). Green rectangular boxes indicate transposable elements. Colored sectors connect corresponding homologous
regions, and the percent amino acid identity scores are indicated. Wavy and vertical lines within boxes indicate sequences containing frameshifts
and stop codons compared with viral genes, respectively. Black arrowheads indicate primers which were used to amplify the junctions between PRD
and host sequences. See Table S1 in the supplemental material for PCR primer sequences and their chromosomal locations.
Although many PRDs from the same species (such as those
in pea aphid and black-legged tick) clustered together and
shared high sequence identity with each other (Fig. 4C and D),
their flanking host sequences were not homologous. This sug-
gests that multiple endogenization of the same or very similar
viruses occurred. In contrast, some PRDs from individual spe-
cies (such as those in Triatomid bug, tammar wallaby, Norway
rat, and domestic guinea pig) were placed within different
clades (Fig. 4A, B, and C), suggesting that integration of dis-
tinct parvoviral species occurred in the same host genome.
Notably, the sole PRD in the genome of Philippine tarsier is
most closely related to the U94 gene of human herpesvirus 6
(48), a homolog of the parvovirus Rep gene (44, 48), indicating
the possibility that it was directly derived from a herpesvirus
integration event (Fig. 4A) (see Fig. S2 in the supplemental material).
FIG. 2. Schematic representation and alignment of a PRD in the African savanna elephant genome and Bovine adeno-associated virus (BAAV).
Sequence alignment of 5′ untranslated regions (1) and predicted amino acid sequences (2) of BAAV Rep and PRD are shown. Conserved
nucleotides (amino acid residues) are shaded in orange. Green interrupted rectangular boxes indicate transposable elements, the length of which
was not drawn to scale. Colored sectors connect corresponding homologous regions; percent nucleotide or amino acid identities are indicated.
FIG. 3. PCR using animal total DNAs. PCR products were fractionated by gel electrophoresis on 1% agarose gels and stained with ethidium
bromide. Marker, DNA marker DL 2000. Arrowheads indicate bands of the expected sizes in lanes with more than one band. The sequences of
bands of the expected sizes from guinea pig, horse, D. sechellia NS, D. persimilis, mouse, pig, rabbit, rat NSCP-2, and cat were deposited in
GenBank under accession numbers HM469386 to HM469391 and HM989956 to HM989958.
The integration of parvoviruses could have occurred over a
wide range of the evolutionary time scale. For example, several
PRDs were most closely related to one species of modern
exogenous viruses in the phylogenetic trees (green-shaded taxa
in Fig. 4A, B, and C), clearly suggesting that integrations of
these viruses involved relatively recent events. Some PRDs, on
the other hand, were located at the base of the extant virus
clades and have accumulated numerous degeneration muta-
tions, suggesting that these were derived from integration of
ancestral parvoviruses millions of years ago. Furthermore,
through genomic syntenic analysis, we found that the human
PRD-related sequences were present at similar locations in the
genomes of primates, carnivores, ungulates, and dolphins but
not in placental African savanna elephant (Fig. 5A and B; see
Table S2 in the supplemental material). These human-related
PRDs were located in an intron of human Ellis van Creveld
syndrome 2 (limbin) gene orthologs. Phylogenetic analysis of
such PRDs from different mammalian lineages confirmed that
they are derived from a single event rather than from many
independent integration events at the same location, since the
phylogenetic topology was consistent with mammal evolution
(9, 39) (Fig. 5C). Considering mammal phylogeny, this finding
demonstrated that these PRDs must have appeared in an an-
cestor of living placental mammals that diverged from primi-
tive placental mammals in the late Cretaceous period (9, 39).
This implies that parvoviruses have coexisted with mammals
for an evolutionary history stretching at least 98 million years
(9). Given that the nucleotide substitution rate of ssDNA vi-
ruses is close to that of RNA viruses, it is remarkable that the
similarity between these PRDs and related exogenous viruses
could still be recognized (Fig. 5D). We could not identify genes
orthologous to human PRD in the rodent lineage (Fig. 5B),
suggesting that lineage-specific deletions have occurred during evolution.
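The dating argument above rests on a simple rule: an orthologous insertion shared by several lineages is at least as old as their most recent common ancestor, and any descendant of that ancestor lacking the insertion implies a later loss. The sketch below illustrates this rule on a deliberately simplified mammal tree. The tree topology, the group names, and the function names are assumptions made for illustration; the tree is not one inferred in this study, and no divergence times are encoded.

```python
# Simplified child -> parent relationships (illustrative assumption).
PARENT = {
    "primates": "Euarchontoglires",
    "rodents": "Euarchontoglires",
    "carnivores": "Laurasiatheria",
    "ungulates": "Laurasiatheria",
    "dolphins": "Laurasiatheria",
    "Euarchontoglires": "Boreoeutheria",
    "Laurasiatheria": "Boreoeutheria",
    "elephant": "Afrotheria",
    "Boreoeutheria": "Placentalia",
    "Afrotheria": "Placentalia",
}
TIPS = ["primates", "rodents", "carnivores", "ungulates", "dolphins", "elephant"]


def lineage(taxon):
    """Return the path from a taxon up to the root of the tree."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(taxa):
    """Most recent common ancestor of the given taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # The first shared node met when walking up from any tip is the MRCA.
    return next(node for node in lineage(taxa[0]) if node in shared)


def inferred_losses(carriers, tips):
    """Tips that descend from the carriers' MRCA but lack the insertion."""
    ancestor = mrca(carriers)
    return sorted(t for t in tips if t not in carriers and ancestor in lineage(t))


# Groups reported in the text to carry the orthologous PRD.
carriers = {"primates", "carnivores", "ungulates", "dolphins"}
oldest_shared_ancestor = mrca(carriers)
losses = inferred_losses(carriers, TIPS)
```

Under this toy tree the insertion maps to the common ancestor of the carrier groups, the rodent lineage is flagged as a lineage-specific loss, and the elephant lies outside the clade defined by the carriers.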
FIG. 4. Phylogenetic trees of exogenous parvoviruses and animal PRDs. (A and B) NS and CP trees of vertebrate parvoviruses and their related
PRDs, respectively. The trees were rooted with the densovirus-like Penaeus monodon hepatopancreatic parvovirus. The node of the orthologous
PRD clade is marked by a red diamond, and the relevant hosts are indicated by a blue arc in the middle of the tree (B). (C and D) NS and CP trees
of arthropod parvoviruses and their related PRDs, respectively. The trees were rooted with the parvovirus Aleutian mink disease virus. Only P values
of the approximate likelihood ratios (SH test) of ≥0.5 (50%) are indicated. All scale bars correspond to 0.5 amino acid substitution per site. The
PRD branches are printed in red. The taxon names of PRDs possibly derived from recent integration events are shaded in green (see details in
text). Animals belonging to the same group are indicated to the right. The sequence accession number is given for each sequence.
Expression of PRDs in animal nuclear genomes. We found
numerous parvovirus-related cDNA sequences in various or-
ganisms by mining the NCBI EST database. Through subsequent
sequence comparisons, some of these ESTs were regarded as
contaminating sequences from exogenous incidental viruses be-
cause they shared high nucleotide identity with sequences of
known parvoviruses but lacked sequence similarity to animal
genomes (see Data Set S2 in the supplemental material). Some
other ESTs exhibited low amino acid identity compared to
sequences of PRDs or exogenous parvoviruses (see Data Set
S2 in the supplemental material), and several also contained
frameshifts or internal stop codons. Such characteristics are
very similar to those of the PRDs detected from nuclear ge-
nomes, implying that these ESTs are most likely expressed
PRDs from nuclear genomes of animals. However, whether
they were truly expressed PRDs remains to be established, be-
cause the genome sequences of relevant animals are not currently available.
FIG. 5. Identification of a syntenic PRD locus in mammal genomes. (A) Schematic representation of the human limbin gene structure. Vertical
blue bars indicate putative exons; arcs indicate putative introns. The region of the PRD and flanking sequence is marked with a red rectangular
box. (B) The PRD and flanking sequence in human genome were aligned with the orthologous regions of other mammals using BLASTn. Colored
bars indicate the similarity level between human sequences and other mammal sequences as measured by BLAST scores. Asterisks indicate that
sequences are truncated due to sequencing gaps. Note that African savanna elephant and mouse did not contain PRDs. See Table S2 in the
supplemental material for accession numbers and positions of mammal sequences used for analysis. SINE, short interspersed repetitive element.
(C) Phylogenetic tree of orthologous PRD regions in mammal genomes. The phylogenetic tree was constructed by the neighbor-joining method
using the maximum composite likelihood substitution model with the pairwise deletion option in MEGA4 (45). The bootstrap probability is
indicated for each interior branch. The scale bar indicates the number of nucleotide substitutions per site. The tree is midpoint rooted, and its
topology is consistent with the phylogeny of mammals (9, 39). (D) Alignment of putative amino acid sequences of human PRD and its best-matched
virus. The default color scheme for ClustalW alignment in the Jalview program was used. Percent amino acid identity is indicated. The red asterisks
and triangle indicate predicted stop codons and frameshift sites in human PRD, respectively. NHP_AAV_CP, capsid protein of nonhuman primate
adeno-associated virus (AAO88189.1).
We were convinced that the parvovirus-related cDNA se-
quences from some invertebrates (sea squirt, fruit fly, pea
aphid, black-legged tick, and salmon louse) were expressed
PRDs from nuclear genomes (see Data Set S3 in the supple-
mental material) because most of these have high sequence
identity to PRDs as well as to flanking cellular sequences over
full-length sequences (Fig. 6). Although a corresponding full-
length sequence of a parvovirus-related cDNA (accession no.
EW905967) was not found in the black-legged tick genome, it
occurred in its Trace-WGS records (Fig. 6G). This suggests
that some trace records of black-legged tick containing ex-
pressed PRDs remain to be assembled into genomic contigs. In
addition, two parvovirus-related cDNAs (DY223558 and
DY224604) of pea aphid did not have corresponding full-
length sequences in either its genome or its Trace-WGS re-
cords, and the parvovirus-related cDNAs of salmon louse con-
tained rearranged structures relative to genomic sequences
(Fig. 6H and I). This finding could possibly be because
genomic sequencing is not yet complete or did not cover some
regions containing PRDs.
Remarkably, the only endogenous parvoviral sequence de-
tected in the fruit fly (Drosophila sechellia) has a distinct viral
genome organization (Fig. 7). The arrangement and sequence
of its NS-like open reading frame (ORF) are similar to those of
Dendrolimus punctatus densovirus (DpDNV), whereas the ar-
rangement and sequence of its CP-like ORF are similar to
those of Periplaneta fuliginosa densovirus (PfDNV). More than
300 EST sequences were detected for the CP-like gene.
Through EST assembly and sequence comparison, we found
that the CP-like gene was expressed as multiple transcript
variants through alternative splicing involving two additional
overlapping ORFs near the 3? end of this gene. The genome
structure and alternatively spliced CP transcripts of this en-
dogenous parvoviral counterpart are different from those of
any known parvovirus. Currently, we do not know whether it
functions as a viral or a cellular gene. It will be interesting to
determine if it could be activated as an episomal virus under
certain conditions in the host.
FIG. 6. Schematic representation of some PRDs and their expressed sequences. Colored boxes with arrowheads and swallowtails indicate ORFs
with and without start codons, respectively. Red, nonstructural proteins; blue, structural proteins. Green rectangular boxes indicate transposable
elements. Wavy and vertical lines within boxes indicate sequences containing frameshifts and stop codons compared with viral genes, respectively.
Similar regions of expressed sequences are identified, and the percent nucleotide identity with PRDs is indicated. Note that the full-length
sequence corresponding to a parvovirus-related cDNA (EW905967) containing repeated sequences was not identified in the genomic database but
occurred in the Trace-WGS database (G), suggesting that some trace records containing expressed PRDs remain to be assembled into genomic
contigs. In addition, two parvovirus-related cDNAs (DY223558 and DY224604) containing repeated sequences did not have corresponding
full-length sequences in either the genomic database or the Trace-WGS database (H), and the parvovirus-related cDNAs contained rearranged
structures relative to genomic sequences (I), which could possibly be due to incomplete genomic sequencing or because sequencing did not cover
some regions containing PRDs.
Most of the expressed PRDs were truncated or contained
frameshifts or internal stop codons (Fig. 6), suggesting that
these no longer generate functional proteins. However, the
possibility that these PRDs function at the RNA level cannot
be ruled out. We did not detect expressed sequences for any
vertebrate PRDs. This is consistent with the observation that
most vertebrate PRDs showed multiple defects and therefore
may not be functional.
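As an illustration of what an "internal stop codon" means in practice, the following minimal Python sketch scans a nucleotide sequence in a chosen reading frame and reports in-frame stop codons that occur before the final codon. It is not the procedure used in this study. The function names are hypothetical, the standard genetic code (TAA, TAG, TGA) is assumed, and frameshifts are not detected, because identifying them requires alignment against a reference viral gene.

```python
# Illustrative sketch only: hypothetical helper names, standard genetic code assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(seq, frame=0):
    """Return 0-based nucleotide offsets of in-frame stop codons,
    ignoring the last complete codon (a terminal stop is expected)."""
    seq = seq.upper()
    # Split the sequence into complete codons starting at the given frame offset.
    codons = [(i, seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]
    # Any stop codon before the last complete codon counts as "internal".
    return [i for i, codon in codons[:-1] if codon in STOP_CODONS]


def looks_intact(seq, frame=0):
    """True if no internal stop codon is found in the given frame."""
    return not internal_stop_positions(seq, frame)
```

A sequence that passes this check is not necessarily functional; the check only flags one of the defect types described above.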
The discovery of PRDs unequivocally demonstrates that
parvoviruses are capable of invading diverse animal genomes.
Given the sequence divergence between the extant viruses and
the integrated sequences as well as the limited number of sequenced
animal genomes, the endogenous parvoviruses are likely to be
more widely dispersed than described here. Moreover, integration
of these endogenous viruses must have occurred in cellular
tissues that were subsequently able to contribute to the
germ line, and the insertions must then have become fixed in a given species. Therefore, the
integration events that actually occurred in the somatic cells
might be much more frequent and widespread. Integration of
parvoviruses probably involved illegitimate recombination, re-
arrangements, insertions, deletions, and intragenomic prolifer-
ation. Numerous PRDs are adjacent to TEs or repeated ele-
ments. The genomic regions containing these elements are
unstable and prone to form double-strand breaks, which could
have facilitated viral integration (8, 34, 35).
Although incorporation of parvoviral sequences is likely to
be random and incidental as was integration of AAV recom-
binant vectors (33), the presence of these viral sequences in
animal genomes could have functional implications for the
virus-cell interactions. On the one hand, integration of parvo-
viral sequences may confer selective advantages to the host.
Recently, a similar antiviral immunity mechanism has been
proposed to underlie the integration of different viruses in
plants, fungi, and animals (4, 7, 16, 28, 31). DNA double-strand
break repair functions have been reported to serve as a defense
response against parvovirus infection (47). Parvoviruses can
preferentially target genetically unstable, transformed (tumor)
cells, which are deficient in DNA repair mechanisms (43).
Infection with minute virus of mice (MVM) is known to trigger
an innate antiviral response in normal but not transformed
mouse cells (19). Furthermore, we found that the host species
which carry endogenous parvoviruses were generally not
amenable to invasion by their genetically most closely related
exogenous counterparts, which were commonly found
in other species, and the virus-infected species appear not to
be subject to integration by the invading viral sequences.
These observations are consistent with a previous report
that shrimp (Penaeus monodon) populations containing in-
tegrated IHHNV sequences were not infected by IHHNV
(46). Our own findings combined with previous reports suggest
that the host can capture parvovirus-specific sequences during
the repair of double-stranded DNA (dsDNA) breaks and sub-
sequently use them as a defense against the virus. Moreover,
this “integration-based immunity” (28) could be heritable if
the viruses integrated into the germ line and were vertically
transmitted to offspring.
On the other hand, integration of parvoviral sequences may
be associated with parvovirus pathogenesis and persistence.
Parvoviruses are species specific and appear to have evolved
with their host species to such an extent that infection usually
remains subclinical, whereas infections causing lethal disease
may be an uncommon or aberrant situation (32, 42). In con-
trast, species jumping of parvoviruses could result in acute
infections (23, 40, 49). Hence, parvoviruses may circumvent the
host immune reaction and maintain an unapparent persistent
infection in their host. However, when these viruses are transferred
to a new host species and the precise relationship with
the host has yet to be established, this situation might trigger a host
antiviral innate immune response resulting in integration. As a
consequence of suppressed viral propagation by PRD-mediated
immunity, parvoviruses persistently infect their hosts at a
low level. Although the host has subclinical illness or is asymptomatic
under these conditions, the virus could still be transmissible
to host offspring or to naive individuals.
FIG. 7. Structural and expression analysis of an endogenous densovirus in the genome of Drosophila sechellia. Colored arrowhead boxes
indicate virus-like ORFs. Red, nonstructural proteins; blue, structural proteins; other colors, hypothetical proteins. Gray sectors connect corresponding
homologous regions detected by BLASTp; percent amino acid identities are indicated. Black arrows indicate primers which were used
to amplify and validate the connections. The sequence of the transposable element–densovirus-like gene boundary is shown above the diagram at
the left. Blue bars represent the matched regions of expressed sequences of the endogenous densovirus; arcs indicate introns.
Although some animal lineages (such as fishes, tunicates,
and flatworms) are not known to be infected by parvoviruses,
we have found many endogenous parvoviral sequences in their
genomes. This suggests that these species can also be infected
by parvoviruses, or at least could have been in the past. We
identified an endogenous parvoviral sequence in the human
genome. Genomic synteny (orthology) and phylogenetic analysis
of this integrated sequence in mammal species date this
endogenization event back to at least 98 million years ago, so
that it coincided with the rise of the mammals. As far as we
know, this is the oldest “viral fossil” reported. Some of the
parvovirus-related genes were conserved and transcribed, suggesting
that these viral genes are also functional in the host.
Our studies also have a potential impact on gene therapy. A
possible consequence of using viral vectors for human gene
therapy is the inadvertent introduction of foreign DNA into
recipient germ cells, causing the introduction of heritable
changes into the offspring of patients (18). This could cause
profound and far-reaching ethical problems. Parvoviruses, es-
pecially AAV, are currently being used for human somatic
gene therapy (10, 13). Previous studies on the use of AAV as
a gene therapy vector suggest that it did not transduce the
germ cells (2, 24). Our findings, however, clearly suggest that
germ line integration of parvoviruses is possible, raising the
concern of germ line transmission of gene therapy vectors.
Hence, the potential risk of germ line integration using AAV
vectors during human gene therapy should be experimentally
assessed before clinical applications.
In summary, our study provided convincing evidence that
parvoviruses have been endogenized into the host genomes
and that this endogenization is widespread and has occurred in
diverse animal genomes. This discovery extends the host range
of parvoviruses and provides fossil records of past viral inva-
sions, and it thereby will help shed light on the evolutionary
history of parvoviruses and hosts, as well as advance our knowl-
edge of host-virus interactions. Furthermore, the capture and
functional assimilation of exogenous viral genes may represent
an important force in animal evolution.
This research was supported in part by the National Basic Research
Program (2006CB101901), the Commonweal Specialized Research
Fund of China Agriculture (3-21), the Program for New Century Ex-
cellent Talents in University (NCET-06-0665), and the Huazhong Ag-
ricultural University Scientific & Technological Self-Innovation Foun-
1. Anisimova, M., and O. Gascuel. 2006. Approximate likelihood-ratio test for
branches: a fast, accurate, and powerful alternative. Syst. Biol. 55:539–552.
2. Arruda, V. R., et al. 2001. Lack of germline transmission of vector sequences
following systemic administration of recombinant AAV-2 vector in males.
Mol. Ther. 4:586–592.
3. Belyi, V. A., A. J. Levine, and A. M. Skalka. 2010. Sequences from ancestral
single-stranded DNA viruses in vertebrate genomes: the Parvoviridae and
Circoviridae are more than 40 to 50 million years old. J. Virol. 84:12458–
4. Belyi, V. A., A. J. Levine, and A. M. Skalka. 2010. Unexpected inheritance:
multiple integrations of ancient bornavirus and ebolavirus/marburgvirus se-
quences in vertebrate genomes. PLoS Pathog. 6:e1001030.
5. Bergoin, M., and P. Tijssen. 2008. Parvoviruses of arthropods, p. 76–85. In
B. W. J. Mahy and M. H. V. Van Regenmortel (ed.), Encyclopedia of
virology 3rd ed., vol. 4. Elsevier, Oxford, United Kingdom.
6. Berns, K., and C. R. Parrish. 2007. Parvoviridae, p. 2437–2477. In D. M.
Knipe and P. M. Howley (ed.), Fields virology, 5th ed. Lippincott-Williams
& Wilkins Publishers, Philadelphia, PA.
7. Bertsch, C., et al. 2009. Retention of the virus-derived sequences in the
nuclear genome of grapevine as a potential pathway to virus resistance. Biol.
8. Bill, C. A., and J. Summers. 2004. Genomic DNA double-strand breaks are
targets for hepadnaviral DNA integration. Proc. Natl. Acad. Sci. U. S. A.
9. Bininda-Emonds, O. R., et al. 2007. The delayed rise of present-day mam-
mals. Nature 446:507–512.
10. Blechacz, B., and S. J. Russell. 2004. Parvovirus vectors: use and optimisa-
tion in cancer gene therapy. Expert Rev. Mol. Med. 6:1–24.
11. Clark, D. A., et al. 2006. Transmission of integrated human herpesvirus 6
through stem cell transplantation: implications for laboratory diagnosis. J.
Infect. Dis. 193:912–916.
12. Corsini, J., J. Tal, and E. Winocour. 1997. Directed integration of minute
virus of mice DNA into episomes. J. Virol. 71:9008.
13. Daya, S., and K. I. Berns. 2008. Gene therapy using adeno-associated virus
vectors. Clin. Microbiol. Rev. 21:583–593.
14. Essbauer, S., and W. Ahne. 2001. Viruses of lower vertebrates. J. Vet. Med.
B Infect. Dis. Vet. Public Health 48:403–475.
15. Fauquet, C. M., M. A. Mayo, J. Maniloff, U. Desselberger, and L. A. Ball.
2005. Virus taxonomy: eighth report of the International Committee on
Taxonomy of Viruses. Elsevier Academic Press, San Diego, CA.
16. Flegel, T. W. 2009. Hypothesis for heritable, anti-viral immunity in crusta-
ceans and insects. Biol. Direct. 4:32.
17. Goff, S. P. 1992. Genetics of retroviral integration. Annu. Rev. Genet. 26:
18. Gordon, J. W. 1998. Germline alteration by gene therapy: assessing and
reducing the risks. Mol. Med. Today 4:468–470.
19. Grekova, S., et al. 2010. Activation of an antiviral response in normal but not
transformed mouse cells: a new determinant of minute virus of mice onco-
tropism. J. Virol. 84:516.
20. Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to
estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
21. Hordijk, W., and O. Gascuel. 2005. Improving the efficiency of SPR moves
in phylogenetic tree search methods based on maximum likelihood. Bioin-
22. Hu, W. S., and H. M. Temin. 1990. Retroviral recombination and reverse
transcription. Science 250:1227–1233.
23. Hueffer, K., and C. R. Parrish. 2003. Parvovirus host range, cell tropism and
evolution. Curr. Opin. Microbiol. 6:392–398.
24. Jakob, M., et al. 2005. No evidence for germ-line transmission following
prenatal and early postnatal AAV-mediated gene delivery. J. Gene Med.
25. Kapitonov, V. V., and J. Jurka. 2008. A universal classification of eukaryotic
transposable elements implemented in Repbase. Nat. Rev. Genet. 9:411–
412. (Author reply, 9:414.)
26. Kapoor, A., P. Simmonds, and W. I. Lipkin. 2010. Discovery and character-
ization of mammalian endogenous parvoviruses. J. Virol. 84:12628–12635.
27. Kerr, J. R., and N. Boschetti. 2006. Short regions of sequence identity
between the genomes of human and rodent parvoviruses and their respective
hosts occur within host genes for the cytoskeleton, cell adhesion and Wnt
signalling. J. Gen. Virol. 87:3567–3575.
28. Koonin, E. V. 2010. Taming of the shrewd: novel eukaryotic genes from RNA
viruses. BMC Biol. 8:2.
29. Kotin, R. M., et al. 1990. Site-specific integration by adeno-associated virus.
Proc. Natl. Acad. Sci. U. S. A. 87:2211–2215.
30. Le, S. Q., N. Lartillot, and O. Gascuel. 2008. Phylogenetic mixture models
for proteins. Philos. Trans. R. Soc. Lond. B Biol. Sci. 363:3965–3976.
31. Liu, H. Q., et al. 2010. Widespread horizontal gene transfer from double-
stranded RNA viruses to eukaryotic nuclear genomes. J. Virol. 84:11876–
32. Lukashov, V. V., and J. Goudsmit. 2001. Evolutionary relationships among
parvoviruses: virus-host coevolution among autonomous primate parvovi-
ruses and links between adeno-associated and avian parvoviruses. J. Virol.
33. McCarty, D. M., S. M. Young, Jr., and R. J. Samulski. 2004. Integration of
adeno-associated virus (AAV) and recombinant AAV vectors. Annu. Rev.
34. Miller, D. G., L. M. Petek, and D. W. Russell. 2004. Adeno-associated virus
vectors integrate at chromosome breakage sites. Nat. Genet. 36:767–773.
35. Miller, D. G., et al. 2005. Large-scale analysis of adeno-associated virus
vector integration sites in normal human cells. J. Virol. 79:11434–11442.
36. Orend, G., A. Linkwitz, and W. Doerfler. 1994. Selective sites of adenovirus
(foreign) DNA integration into the hamster genome: changes in integration
patterns. J. Virol. 68:187.
37. Papadopoulos, J. S., and R. Agarwala. 2007. COBALT: constraint-based
alignment tool for multiple protein sequences. Bioinformatics 23:1073–1079.
38. Parrish, C. R. 2008. Parvoviruses of vertebrates, p. 85–90. In B. W. J. Mahy
and M. H. V. Van Regenmortel (ed.), Encyclopedia of virology 3rd ed., vol.
4. Elsevier, Oxford, United Kingdom.
39. Prasad, A. B., M. W. Allard, E. D. Green, and N. C. S. Program. 2008.
Confirming the phylogeny of mammals by use of large comparative sequence
data sets. Mol. Biol. Evol. 25:1795–1808.
40. Roekring, S., et al. 2002. Comparison of penaeid shrimp and insect parvo-
viruses suggests that viral transfers may occur between two distantly related
arthropod groups. Virus Res. 87:79–87.
41. Samulski, R. J., et al. 1991. Targeted integration of adeno-associated virus
(AAV) into human chromosome 19. EMBO J. 10:3941–3950.
42. Shadan, F., and L. P. Villarreal. 1993. Coevolution of persistently infecting
small DNA viruses and their hosts linked to host-interactive regulatory
domains. Proc. Natl. Acad. Sci. U. S. A. 90:4117.
43. Shadan, F. F., and L. P. Villarreal. 2000. Parvovirus-mediated antineoplastic
activity exploits genome instability. Med. Hypotheses 55:1–4.
44. Srivastava, A., E. W. Lusby, and K. I. Berns. 1983. Nucleotide sequence and
organization of the adeno-associated virus 2 genome. J. Virol. 45:555–564.
45. Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol.
46. Tang, K. F., and D. V. Lightner. 2006. Infectious hypodermal and hemato-
poietic necrosis virus (IHHNV)-related sequences in the genome of the
black tiger prawn Penaeus monodon from Africa and Australia. Virus Res.
47. Tauer, T. J., M. H. Schneiderman, J. K. Vishwanatha, and S. L. Rhode. 1996.
DNA double-strand break repair functions defend against parvovirus infec-
tion. J. Virol. 70:6446–6449.
48. Thomson, B. J., S. Efstathiou, and R. W. Honess. 1991. Acquisition of the
human adeno-associated virus type-2 rep gene by human herpesvirus type-6.
49. Villarreal, L. P. 2005. Viruses and the evolution of life. American Society for
Microbiology, Washington, DC.
50. Wentzensen, N., S. Vinokurova, and M. von Knebel Doeberitz. 2004. Sys-
tematic review of genomic integration sites of human papillomavirus ge-
nomes in epithelial dysplasia and invasive cancer of the female lower genital
tract. Cancer Res. 64:3878–3884.
51. Yang, W., and J. Summers. 1999. Integration of hepadnavirus DNA in
infected liver: evidence for a linear precursor. J. Virol. 73:9710–9717.