Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53
Ting Wang*, Jue Zeng†, Craig B. Lowe*, Robert G. Sellers*‡, Sofie R. Salama*‡, Min Yang†, Shawn M. Burgess§, Rainer K. Brachmann†¶‖, and David Haussler*‡‖
*Center for Biomolecular Science and Engineering, and ‡Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064; †Division of Hematology/Oncology, Departments of Medicine and Biological Chemistry, University of California, Irvine, CA 92697; and §Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892
Edited by Eric H. Davidson, California Institute of Technology, Pasadena, CA, and approved September 26, 2007 (received for review April 27, 2007)
The evolutionary forces that establish and hone target gene networks of transcription factors are largely unknown. Transposition of retroelements may play a role, but its global importance, beyond a few well described examples for isolated genes, is not clear. We report that LTR class I endogenous retrovirus (ERV) retroelements considerably impact the transcriptional network of the human tumor suppressor protein p53. A total of 1,509 of ≈319,000 human ERV LTR regions have a near-perfect p53 DNA binding site. The LTR10 and MER61 families are particularly enriched for copies with a p53 site. These ERV families are primate-specific and transposed actively near the time when the New World and Old World monkey lineages split. Other mammalian species lack these p53 response elements. Analysis of published genomewide ChIP data for p53 indicates that nearly one-third of in vivo occupied p53 binding sites are accounted for by ERV copies with a p53 site. ChIP and expression studies for individual genes indicate that human ERV p53 sites are likely part of the p53 transcriptional program and directly regulate p53 target genes. These results demonstrate how retroelements can significantly shape the regulatory network of a transcription factor in a species-specific manner.
Deciphering gene regulatory networks in the postgenomic era will provide pivotal insights into genome function and human disease, but it will require a much improved knowledge of the evolutionary forces that shape transcriptional networks. One key will be understanding the ≈5% of the human genome that is under purifying selection and hence likely to contain functional segments (1, 2). Two-thirds of these segments do not code for protein, and some of them are derived from transposable elements and thus lie in the 45% of our genome once deemed “junk DNA.” Although there are examples where transposable elements played important roles in the evolution of gene regulation (3–5), and certain families have deposited putative regulatory elements that are now subject to purifying selection (6–10), it is unclear how extensively these mobile elements have shaped gene regulatory networks.
Human endogenous retroviruses (ERVs), remnants of exogenous retroviruses, make up ≈8% of the human genome (1). Approximately 10% contain sequences that once coded for retroviral proteins flanked by two LTRs; the rest are solitary LTRs (solo LTRs). ERVs are less often found in gene-rich regions, consistent with an effect on gene regulation (11, 12). Despite this general selection against ERV insertions near genes, the number of identified promoters and enhancers derived from ERVs is steadily increasing (13). The evolutionary process of “exaptation” of noncoding functional elements from viruses and transposons to benefit the host is not well understood beyond a few examples (14).
The tumor suppressor protein p53 is a sequence-specific transcription factor that responds to cellular stresses by coordinating expression of genes involved in cell-cycle arrest, senescence, and apoptosis (15). p53 regulates genes of diverse biological pathways and is considered a pleiotropic master regulator. Intense computational and experimental efforts have determined p53 DNA binding specificity, mapped many genomic binding sites, and identified numerous target genes (16–19). However, no studies have examined the relationship between p53 and transposable elements.
We report that human ERVs actively shape the p53 transcriptional network in a species-specific manner. p53 sites are highly enriched in LTRs of a few ERV subfamilies. These p53 site-containing LTRs are in vivo binding sites for p53 and account for ≈30% of p53 sites found in a genomewide ChIP analysis (16). Expression of many genes close to these LTRs is regulated by p53, based on published data and our experimental validation. These ERVs likely entered the primate ancestral genome and transposed within it ≈25 Mya to 63 Mya. Their proviruses were probably responsible for introducing a p53 site. In general, ERV insertions near genes (including those with p53 sites) were selected against (11, 12), but a significant fraction of p53 site-containing ERVs may have been exapted as regulatory sequences to expand the p53 regulatory network. In one case, an ERV with a p53 site appears to have changed the transcriptional landscape of its surrounding genomic area and to have been instrumental in creating a new gene, TP53AP1, that became part of the human-specific p53 regulatory network.
Author contributions: T.W. and J.Z. contributed equally to this work; T.W., R.K.B., and D.H. designed research; T.W., J.Z., and C.B.L. performed research; T.W., J.Z., C.B.L., R.G.S., S.R.S., M.Y., S.M.B., and D.H. contributed new reagents/analytic tools; T.W., J.Z., R.K.B., and D.H. analyzed data; and T.W., R.K.B., and D.H. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
Abbreviations: ERV, endogenous retrovirus; 5-FU, 5-fluorouracil.
¶Present address: Genentech BioOncology, South San Francisco, CA 94080.
‖To whom correspondence may be addressed. E-mail: email@example.com or haussler@
This article contains supporting information online at www.pnas.org/cgi/content/full/
© 2007 by The National Academy of Sciences of the USA
November 20, 2007 | vol. 104 | no. 47
p53 Sites Are Enriched in LTRs of Several Human ERV Subfamilies. A genomewide yeast-based screen identified certain ERV LTR elements with a p53-responsive site (J.Z. and R.K.B., unpublished data). We therefore searched the human genome for p53 sites in ERV LTR elements. Using RepeatMasker (44), 319,106 ERV LTR fragments were identified, accounting for 5% of the human genome and belonging to ≈500 families and subfamilies of LTR-containing retroelements defined in RepBase (21). Only 1,509 fragments had a near-perfect p53 site based on our stringent criteria [see Materials and Methods and supporting information (SI) Text]. Copies with a p53 site were strikingly overrepresented in the LTR10 and MER61 families of class I ERV elements, accounting for 9–53% of the copies in individual subfamilies, far more than the number of sites expected by chance (SI Table 4).
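The stringent site criteria used for this search are given in the SI and are not reproduced here. As a rough illustration only, the following Python sketch scores 20-bp windows against two abutting copies of the widely used p53 half-site consensus RRRCWWGYYY and counts IUPAC mismatches. The zero-length spacer, the mismatch cutoff, and all function names are assumptions for illustration, not the authors' criteria.

```python
# Illustrative p53 site scan. ASSUMPTIONS (not the paper's SI criteria):
# two abutting RRRCWWGYYY half-sites (0-bp spacer) and a simple mismatch cutoff.

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "W": "AT"}
HALF_SITE = "RRRCWWGYYY"
FULL_SITE = HALF_SITE + HALF_SITE  # 20 bp, no spacer (assumed)
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def mismatches(window: str, pattern: str = FULL_SITE) -> int:
    """Count positions whose base is not allowed by the IUPAC code in `pattern`."""
    if len(window) != len(pattern):
        raise ValueError("window and pattern must have the same length")
    return sum(base not in IUPAC[code] for base, code in zip(window.upper(), pattern))


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq: str, max_mismatches: int = 1):
    """Yield (start, mismatch_count) for windows within the hypothetical cutoff."""
    k = len(FULL_SITE)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # RRRCWWGYYY is its own reverse complement, so both strands score the same;
        # checking both keeps the scan correct if an asymmetric pattern is used.
        best = min(mismatches(window), mismatches(reverse_complement(window)))
        if best <= max_mismatches:
            yield start, best


# Reconstructed ancestral LTR10B1 site from the text; it deviates from the
# consensus only at its first position (T instead of a purine).
ancestral = "TGACATGCCCAGACATGCCT"
print(mismatches(ancestral))
print(list(find_sites("AAAA" + ancestral + "AAAA")))
```

A fuller version would also scan spacer lengths between the half-sites, since the canonical p53 consensus is usually described as two half-sites separated by 0–13 bp.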
p53 Binds in Vivo to ERV LTRs with a p53 Site. Recently, a study identified 327 high-confidence binding regions for DNA damage-activated p53 (PET3+ loci) using ChIP followed by paired-end di-tag sequencing (ChIP–PET) (16). We found a 446-fold enrichment of ERV LTRs with p53 sites in these ChIP–PET-confirmed regions (89 of 1,509 LTRs with a p53 site by our criteria) compared with ERV LTRs without a p53 site (42 of 317,597) (P ≈ 6 × 10^…). The enrichment was concentrated in six LTR10 and MER61 subfamilies (Table 2). LTRs in these six subfamilies accounted for 72 of the 89 overlaps between PET3+ regions and LTRs with p53 sites. By our site definition, 250 of the 327 PET3+ loci contained a p53 site (16). Thus, the 89 ERV LTR fragments accounted for one-third of PET3+ loci with predicted p53 sites (Table 3). When we further compared the distribution of repetitive elements within PET3+ loci with their genomewide distribution, we found clear enrichment of ERV LTR copies in PET3+ loci (18.3%; SI Table 5).
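The Table 2 legend defines enrichment as a ratio of two overlap fractions, with a P value from the hypergeometric distribution. The Python sketch below recomputes the 446-fold figure from the Table 2 counts. How the authors parameterized the hypergeometric test is not spelled out in this section, so the setup in the code (population, successes, draws) is an assumption, and its P value should not be read as the reported one.

```python
from math import comb


def fold_enrichment(hits_with: int, total_with: int, hits_without: int, total_without: int) -> float:
    """Ratio of the two overlap fractions, as defined in the Table 2 legend."""
    return (hits_with / total_with) / (hits_without / total_without)


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Python's arbitrary-precision integers keep the binomial sums exact;
    only the final division is done in floating point.
    """
    top = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return numerator / comb(population, draws)


# Counts from Table 2 (all ERV LTR fragments).
WITH_SITE, WITHOUT_SITE = 1_509, 317_597
HITS_WITH, HITS_WITHOUT = 89, 42

fold = fold_enrichment(HITS_WITH, WITH_SITE, HITS_WITHOUT, WITHOUT_SITE)

# ASSUMED test setup: population = all fragments, successes = fragments with a
# p53 site, draws = fragments overlapping any PET3+ locus.
p_value = hypergeom_upper_tail(
    HITS_WITH, WITH_SITE + WITHOUT_SITE, WITH_SITE, HITS_WITH + HITS_WITHOUT
)
print(f"fold enrichment: {fold:.0f}")
print(f"one-sided hypergeometric P: {p_value:.1e}")
```

Exact integer arithmetic is used because tail probabilities this small are easy to lose to rounding when each term is computed in floating point.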
To verify these findings, we chose four genomic ERV p53 sites close to or in the introns of genes for validation. All selected loci, near DHX37, Neogenin, PTPRM, and TMEM12, showed increased p53 binding by ChIP in 5-FU-treated cells (Fig. 1A and SI Table 6).
ERV LTRs with a p53 Site Exhibit p53-Regulatory Potential. We collected published data for 392 genes with p53-dependent regulation. This list was compared with the set of 440 genes that are closest to, and no further than 1 Mb from, each of the 497 LTR10 and MER61 LTRs with a p53 site. Thirty-one of the 392 known p53 targets were among the 440 genes associated with an LTR-derived p53 site (P ≈ 8 × 10^−11) (SI Table 6).
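The gene-assignment step above can be sketched as follows. This is an illustrative Python version, assuming each gene is represented by a single transcription start site (the text does not say which gene coordinate was used) and that every LTR contributes at most its single nearest gene within 1 Mb; the gene names and coordinates in the example are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

MAX_DISTANCE = 1_000_000  # 1 Mb cutoff from the text


def closest_genes(ltrs, genes, max_distance=MAX_DISTANCE):
    """Return names of genes that are the nearest gene (by TSS) to at least one LTR.

    ltrs:  iterable of (chrom, position) for LTRs carrying a p53 site
    genes: iterable of (chrom, tss, name); using the TSS is an assumption
    """
    index = defaultdict(list)
    for chrom, tss, name in genes:
        index[chrom].append((tss, name))
    starts = {}
    for chrom, entries in index.items():
        entries.sort()
        starts[chrom] = [tss for tss, _ in entries]

    selected = set()
    for chrom, pos in ltrs:
        entries = index.get(chrom)
        if not entries:
            continue
        i = bisect_left(starts[chrom], pos)
        # The nearest TSS sits immediately left or right of the insertion point.
        neighbours = [entries[j] for j in (i - 1, i) if 0 <= j < len(entries)]
        tss, name = min(neighbours, key=lambda entry: abs(entry[0] - pos))
        if abs(tss - pos) <= max_distance:
            selected.add(name)
    return selected


# Hypothetical toy data, not from the paper.
genes = [("chr1", 100_000, "GENE_A"), ("chr1", 2_500_000, "GENE_B"), ("chr2", 50_000, "GENE_C")]
ltrs = [("chr1", 150_000), ("chr1", 4_000_000), ("chr2", 900_000), ("chr3", 10)]
known_targets = {"GENE_C", "GENE_D"}

nearby = closest_genes(ltrs, genes)
print(sorted(nearby), sorted(nearby & known_targets))
```

The overlap between such a set and a list of known p53 targets could then be assessed with a hypergeometric test, which additionally requires a background gene count not given in this section.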
Some of these known targets have a previously identified p53 binding site, but it was not previously noted that in some cases the p53 site is from an ERV LTR. For example, a p53 ChIP study identified a binding site near a gene that is up-regulated by p53; this binding site is provided by an LTR10D copy (22). Confirmatory ChIP for individual genomic sites of the ChIP–PET study included a site close to IFNAR. We determined that this p53 site is provided by an LTR10B1 copy (16).
We investigated the p53-dependent regulatory potential of the four p53-binding ERV LTRs we confirmed (Fig. 1A) and of an additional LTR near the TP53AP1 gene (discussed below) by assessing expression levels of nearby genes and the enhancer capacity of these LTRs in a reporter gene assay in response to various stress treatments. We performed quantitative RT-PCR of the nearby genes in p53-proficient and p53-deficient cells after 5-fluorouracil (5-FU), doxorubicin, or UV treatments. All genes showed p53-dependent activation, albeit to varying degrees, depending on the type of DNA-damaging agent (Fig. 1B). Under similar cellular conditions, all LTR fragments tested showed clear p53-dependent enhancer activity in a luciferase reporter gene assay (Fig. 1C). Although other factors are likely involved in the transactivation of these target genes, these results strongly suggest that these ERVs can contribute to p53-dependent expression of nearby genes.
Evolutionary History of ERVs with p53 Sites. LTR10 families are related to provirus HERVIP10, and MER61 families are related to provirus HUERS-P3B (23). We used three independent methods to estimate the age of these ERV elements.
First, we looked for the presence of sequences related to LTR10 and MER61 families in extant species by searching all nucleotide sequences in National Center for Biotechnology Information databases, including trace archives. Copies of LTR10 and MER61 were found in New World monkeys (including squirrel monkey and marmoset) and in catarrhines (including rhesus macaque, chimpanzee, and human) but not in Strepsirrhini prosimians, such as lemurs and galagos, and not in tree shrews (Fig. 2A). The tarsier is thought to
Table 1. Distribution of p53 sites in ERV/LTR elements
ERV category | Fragment no. | Copies with p53 sites | Percentage | Sites/fragment | Sites/kb
All ERV LTRs | 319,106 | 1,509 | 0.47
Predicted p53 sites within repetitive sequences of the human genome. The columns contain the following: names of selected LTR
families defined in RepBase (21); total number of fragments identified in the human genome; total number of fragments that contain
a predicted p53 site; percentage of fragments that contain a p53 site; average number of sites per fragment; average number of sites
per thousand bases. Only selected LTR10 and MER61 subfamilies are listed. For a complete table see SI Table 4.
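The derived columns of Table 1 follow directly from per-fragment counts. A minimal Python sketch, assuming each RepeatMasker fragment is summarized by its length and its number of predicted p53 sites; the record type and the toy values are hypothetical, and only the final percentage uses counts from the table.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical per-fragment record: family name, length, predicted p53 sites."""
    family: str
    length_bp: int
    n_sites: int


def table1_row(fragments):
    """Compute the Table 1 columns for one LTR family (or for all fragments)."""
    n = len(fragments)
    with_site = sum(1 for f in fragments if f.n_sites > 0)
    total_sites = sum(f.n_sites for f in fragments)
    total_kb = sum(f.length_bp for f in fragments) / 1000
    return {
        "fragments": n,
        "copies_with_site": with_site,
        "percentage": 100 * with_site / n,
        "sites_per_fragment": total_sites / n,
        "sites_per_kb": total_sites / total_kb,
    }


toy = [
    Fragment("toy", 500, 0),
    Fragment("toy", 500, 2),
    Fragment("toy", 1000, 1),
    Fragment("toy", 2000, 0),
]
print(table1_row(toy))

# Percentage column for all ERV LTR fragments, from the counts in the table.
print(round(100 * 1_509 / 319_106, 2))
```

Note that "copies with p53 sites" counts fragments, whereas the per-fragment and per-kb columns count sites, so a fragment with two sites contributes once to the first and twice to the others.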
Table 2. Overlap between ERV/LTR and PET3+ loci
ERV category | ERVs without p53 sites that overlap PET3+ loci | ERVs with p53 sites that overlap PET3+ loci | Enrichment for ERVs with p53 sites (fold)
All ERV LTRs | 42 of 317,597 (0.0132%) | 89 of 1,509 (5.9%) | 446
… | 0 of 169 (0%) | 12 of 77 (15.6%)
… | 1 of 465 (0.22%) | 1 of 47 (2.1%)
… | 0 of 155 (0%) | 11 of 35 (31.4%)
… | 2 of 183 (1.1%) | 14 of 69 (20.3%)
… | 1 of 137 (0.73%) | 15 of 157 (9.6%)
… | 1 of 204 (0.49%) | 19 of 112 (17.0%)
ERV LTRs are separated into those that contain p53 sites and those that do not, and overlap with PET3+ loci is determined for each set. Enrichment is determined as the ratio of the two overlap fractions. P values were calculated from the hypergeometric distribution. N/A, not applicable.
www.pnas.org/cgi/doi/10.1073/pnas.0703637104 Wang et al.
be more closely related to anthropoids than to other prosimians, forming a clade with anthropoids called Haplorrhini. Searching the tarsier genome sequence at three times coverage, we found further evidence for this phylogenetic relationship in 790 matches to the anthropoid MER61E element. However, we cannot confidently identify a tarsier MER61E element that is in the orthologous location in an anthropoid, so we cannot rule out the possibility that these matches are caused by an independent endogenization of a similar, but not identical, retrovirus. None of the five other p53 site-containing LTR subfamilies are found in tarsier. Thus, at least five of these six ERV families did not show activity in Haplorrhini before the split between anthropoids and tarsier, but do show wide activity in anthropoids. This finding places the time of original activity for these ERVs roughly between 40 Mya and 63 Mya. This estimate is consistent with the observations that human and rhesus share the majority, but not all, of these LTRs at orthologous sites (some differences possibly caused by lineage-specific losses), and that human and chimpanzee share essentially all. These estimates are similar to results based on analyzing coding sequence divergence in the respective ERVs (23, 25).
Second, ≈7% of LTRs in each family are linked to an almost intact ERV internal structure with both flanking LTRs. Knowing that the 5′ and 3′ LTRs were identical at the time of insertion into
Table 3. ERVs with a p53 site enriched in PET loci
Locus category | No. of loci | Loci with p53 sites | Loci with p53 sites in ERV
PET1 (low confidence)
PET2 (medium confidence)
PET3+ (high confidence) | 327 | 250 | …
PET1, PET2, and PET3+ were defined by ref. 16 as potential p53 binding regions with low, medium, and high confidence, respectively. The number of loci in each category is given in the second column. The third column contains the number of loci that have a predicted, near-perfect p53 site. The last column indicates the number and fraction of these predicted p53 sites residing in ERVs.
Fig. 1. Experimental validation of selected candidates. (A) ChIP of p53 with semiquantitative PCR for four LTR elements near genes, after treatment with 5-FU (375 μM for 6 h). (B) Reverse transcriptase quantitative real-time PCR for four genes close to the four LTR elements of A and TP53AP1, after treatment with 5-FU, doxorubicin, or UV (60 J/m²). (C) p53 reporter gene assays for five firefly luciferase constructs containing the tested LTR fragments. Error bars in B and C represent 95% confidence intervals (±2 SDs). Relative expression levels were scaled relative to the mean of p53 (+/+) with no treatment (designated as 1). p21 served as a positive control.
Fig. 2. Estimated age of p53 site-containing ERV families. (A) Insertion time of … (B) Ages of near-complete ERVs, estimated by comparing their 5′ and 3′ LTR sequences, assuming a rate of 2.3 × 10^−9 and 5 × 10^−9 substitutions per site per year, respectively. (C) Substitutions per site of individual LTRs relative to the consensus, calculated by using the Jukes–Cantor formula (26). Fragments were grouped based on the presence of a p53 site and on whether they are solo LTRs, and the average and SD were calculated for each group. Branch lengths are taken from ref. 27.
the ancestral genome, and assuming that ERVs accumulate mutations at a rate of 2.3 to 5 × 10^−9 substitutions per site per year (25), we estimated that LTR10 and MER61 LTRs were transposed ≈40 Mya (Fig. 2B), around the split between New World monkeys and Catarrhini.
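The LTR-pair dating above rests on a simple relation: because both LTRs start identical and then diverge independently, the age is roughly the divergence between them divided by twice the substitution rate. The Python sketch below applies it with the two rates quoted in the text. Applying a Jukes–Cantor correction to the 5′–3′ difference is an assumption here (the text names that formula only for the consensus-based method), and the sequences are toy inputs.

```python
import math

RATES = (2.3e-9, 5e-9)  # substitutions per site per year, as quoted in the text


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing bases over aligned positions where neither has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes–Cantor corrected substitutions per site: d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes–Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_pair_age(five_prime: str, three_prime: str, rate: float) -> float:
    """Years since insertion; the factor 2 reflects independent divergence of both LTRs."""
    return jukes_cantor(p_distance(five_prime, three_prime)) / (2 * rate)


# Toy aligned LTR pair differing at 20 of 100 positions (hypothetical).
five_ltr = "A" * 80 + "C" * 20
three_ltr = "A" * 100
for rate in RATES:
    print(f"rate {rate:.1e}: {ltr_pair_age(five_ltr, three_ltr, rate) / 1e6:.1f} Myr")
```

With these toy inputs the two rates give roughly 23 and 51 million years, which illustrates how strongly the assumed rate drives the estimate.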
Lastly, assuming that the consensus sequence is a good replica of what was originally inserted for each copy and using the Jukes–Cantor formula (26), we calculated the number of substitutions per site for each individual LTR (Fig. 2C). Using branch lengths calculated as in ref. 27, we estimated that most were transposed around the time of the split between Old World and New World monkeys. Combining the above analyses, we conclude that these ERVs were active in the primate lineage in a time window of ≈25 Mya to 63 Mya.
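The consensus-based method can be sketched in the same spirit. This Python example assumes each copy is already aligned to its subfamily consensus; it computes Jukes–Cantor substitutions per site for each copy and then the mean and SD per group, with groups defined by p53-site presence and solo-LTR status as in the Fig. 2C legend. Because the consensus stands in for the ancestral insertion, each copy's divergence reflects a single lineage, so no factor of 2 is applied. The conversion to time via branch lengths (ref. 27) is not reproduced, and the sequences are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def jc_from_consensus(copy: str, consensus: str) -> float:
    """Jukes–Cantor substitutions per site between an aligned copy and the consensus."""
    pairs = [(a, b) for a, b in zip(copy.upper(), consensus.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes–Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def group_divergence(records, consensus):
    """records: iterable of (aligned_copy, has_p53_site, is_solo_ltr).

    Returns {(has_p53_site, is_solo_ltr): (mean, sd, n)}; sd is 0.0 for single-copy groups.
    """
    groups = defaultdict(list)
    for seq, has_site, is_solo in records:
        groups[(has_site, is_solo)].append(jc_from_consensus(seq, consensus))
    return {
        key: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0, len(vals))
        for key, vals in groups.items()
    }


# Hypothetical aligned copies of a 10-bp toy consensus.
consensus = "ACGTACGTAC"
records = [
    ("ACGTACGTAC", True, True),
    ("ACGTACGTAA", True, True),  # one substitution
    ("ACGTACGTAC", False, False),
]
for group, (avg, sd, n) in sorted(group_divergence(records, consensus).items()):
    print(group, round(avg, 4), round(sd, 4), n)
```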
The p53 Sites Were Likely Present in Progenitor LTRs. LTR elements in each of these subfamilies derive from retroviruses that were active long ago. One of two main scenarios likely explains p53 sites in a subset of elements in each subfamily. (i) The p53 site was present in the LTRs of the founder retroviruses or proviruses, and some copies lost the p53 site over time. (ii) The founder provirus had no p53 site, and a later copy acquired a p53 site through mutations, started to propagate, and created a subset of ERV insertions with a p53 site.
Our analysis is most consistent with the first model. First, it is unlikely that individual LTRs evolved a p53 site independently, as the consensus sequence of each of the six subfamilies clearly contains a p53 site. Second, for each of the six subfamilies, sequence comparison shows that LTRs with a p53 site do not form a group separate from LTRs without one, arguing against a single ancestor for LTRs with a p53 site that is distinct from all other LTRs. Third, the estimated ages of individual LTRs with a p53 site span a wide range and are slightly biased toward older ages compared with LTRs without a p53 site, except for MER61E (Fig. 2C), indicating that p53 site-containing ERVs occurred relatively early in the evolution of these ERV families.
To further support the hypothesis that the p53 site was present in the progenitor LTR, we aligned all genomic copies of each LTR family and found that many copies without a p53 site likely lost it through deletions. For LTR10B1, we took all genomic copies and clustered them based on the presence of a p53 site and percentage identity with the consensus (Fig. 3A). The horizontal stretch identifies the reconstructed ancestral p53 site. The reconstructed site (TGACATGCCCAGACATGCCT) is an almost perfect p53 site. Only the first position, “T,” differs from the “R” (purine) in the canonical consensus (28), a difference that is not likely to be significant (18). Sequences above the blue line in Fig. 3A have a p53 site, whereas sequences below do not. More than 10% of LTR10B1 fragments are almost free of large deletions or insertions and contain a p53 site (Fig. 3A, cluster II). Another 20% have a p53 site but display deletions at the 5′ end that stop right before the site (Fig. 3A, cluster I). LTRs without a p53 site have either a small deletion around the p53 site (Fig. 3A, cluster III) or a large deletion from the 5′ end to just 3′ of the p53 site (Fig. 3A, cluster IV). This trend is summarized in Fig. 3B by plotting the coverage of each consensus position by the genomic copies; the average percentage identity of each consensus base is shown in Fig. 3C. The percentage identity within the p53 site is no better than the background because the binding site allows similar levels of degeneracy. The preponderance of large deletions in LTR10B1 copies without a p53 site suggests that the progenitor sequence had an intact p53 site.
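Panels B and C of Fig. 3 summarize an alignment column by column. A minimal Python sketch of those two summaries, assuming every copy has already been aligned to the consensus with insertions removed (so column i of each copy corresponds to consensus position i and '-' marks a deletion); the toy sequences are hypothetical.

```python
import math


def column_profiles(consensus, aligned_copies):
    """Per consensus position, return (coverage, identity).

    coverage: fraction of copies with a base (not '-') at the position
    identity: fraction of those bases that match the consensus base (NaN if none)
    """
    n = len(aligned_copies)
    if any(len(c) != len(consensus) for c in aligned_copies):
        raise ValueError("copies must be aligned to the consensus length")
    profiles = []
    for i, ref in enumerate(consensus.upper()):
        bases = [c[i].upper() for c in aligned_copies if c[i] != "-"]
        coverage = len(bases) / n
        identity = sum(b == ref for b in bases) / len(bases) if bases else math.nan
        profiles.append((coverage, identity))
    return profiles


def mean_coverage(profiles, start, end):
    """Mean coverage over consensus positions [start, end), e.g. a p53 site window."""
    window = profiles[start:end]
    return sum(cov for cov, _ in window) / len(window)


copies = ["ACGT", "AC-T", "--GA"]  # hypothetical aligned copies
profiles = column_profiles("ACGT", copies)
print(profiles)
print(mean_coverage(profiles, 0, 2))
```

Comparing mean coverage inside the reconstructed p53 site window with flanking windows is one way to quantify the deletion pattern described for clusters I to IV.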
Impact of ERVs with p53 Sites on the Dynamics of Genome Regulation. ERV insertions tend to be distant from genes, and those fixed in introns are preferentially oriented antisense to the enclosing gene (11, 12). We observed the same for LTR10 and MER61 families. Compared with LTRs without a p53 site, LTRs with a p53 site are found even less frequently close to genes and show a stronger strand bias, suggesting overall selection against insertions that could inappropriately influence gene expression (SI Text and SI Fig. 4).
Fig. 3. (A) … have higher sequence similarity. The nucleotides are color-coded: A, green; C, yellow; G, red; T, blue. The image was created with Jalview (20). (B) Frequency of coverage of the consensus sequence by the genomic copies. (C) Average percentage identity of each base in the consensus sequence that is aligned to multiple genomic copies.
However, our data indicate that in a subset of LTRs, which have a p53 site positioned correctly to affect gene regulation, the site is likely to be bound by p53 and to affect regulation of the nearby gene. For example, for 21 of the 31 known p53 target genes associated with an ERV-derived p53 site, the ERV p53 site is the closest predicted p53 site to the gene. Similarly, for the genes we identified as being close to an LTR with a p53 site, the LTR p53 site is also the closest one to the gene. Nevertheless, additional functional p53 sites may exist between the ERV p53 site and the candidate gene, or in the vicinity, especially when the ERV is relatively far away from the gene. The contribution of ERV p53 sites to such regulation awaits further investigation.
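The closest-site comparison above reduces to a nearest-neighbour lookup per gene. A minimal Python sketch, assuming each gene is represented by its TSS and each predicted p53 site by a position and a flag for whether it lies in an ERV LTR; the genes and coordinates are hypothetical, and ties are resolved in favour of the first listed site.

```python
def closest_site_is_erv(gene_tss, sites):
    """Return True if the predicted p53 site nearest the TSS lies in an ERV LTR.

    sites: list of (position, is_erv) on the same chromosome as the gene.
    Ties go to the first site listed, because min() keeps the first minimum.
    """
    if not sites:
        raise ValueError("no predicted p53 sites for this gene")
    _, is_erv = min(sites, key=lambda site: abs(site[0] - gene_tss))
    return is_erv


def count_erv_closest(genes):
    """genes: mapping name -> (tss, sites). Returns (genes whose closest site is ERV, total)."""
    hits = sum(closest_site_is_erv(tss, sites) for tss, sites in genes.values())
    return hits, len(genes)


# Hypothetical example with two genes.
genes = {
    "GENE_X": (1_000, [(1_500, True), (3_000, False)]),
    "GENE_Y": (1_000, [(1_500, True), (900, False)]),
}
print(count_erv_closest(genes))
```

Applied to the 31 known targets, a count of this kind corresponds to the "21 of 31" statistic, although the actual site predictions and gene coordinates are not reproduced here.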
Given the importance of p53 regulation in all mammals, we expect that the already established and fine-tuned parts of the p53 regulatory network were not substantially disturbed when ERVs with a p53 site populated the genome ≈40 Mya. More likely, the introduction of ERVs resulted in new lineage-specific subnetworks of p53 regulation (29). Indeed, when we analyzed Gene Ontology annotation of potential new p53 target genes, we found that genes related to well known and conserved p53 functions, such as response to DNA damage, are completely absent from our list of genes close to ERVs with a p53 site (P ≈ 6 × 10^−13). In contrast, cell adhesion-related genes are enriched (30/413, P ≈ 7 × 10^−8). This finding gives indirect evidence that p53 adopted a new role in regulating cell adhesion-related processes in the primate lineage and that ERVs with a p53 site may have contributed to this process. Interestingly, cell adhesion genes are known to be enriched for exapted elements and to evolve rapidly in the human lineage (6, 9, 30).
Our model predicts that insertion of an ERV with a p53 site has the potential to significantly change the dynamics of regional transcriptional activity. In one case, such ERV activities may even have led to the creation of a new gene, TP53AP1 (see SI Fig. 5A for details of the genomic locus). TP53AP1 is known to be up-regulated by p53 in human cells (31, 32) (Fig. 1 B and C), but its exact function is unknown. Its p53 site is within the MER61E LTR element of a near-intact HUERS-P3B ERV copy that inserted ≈40 Mya. This copy is shared among human, chimpanzee, and rhesus, but is absent from outgroup species such as galago and tree shrew.
The predicted protein sequence of TP53AP1 is not homologous to any other protein outside of the primate lineage. Thus, it is possible that the gene is not translated but rather functions through its RNA product. We compared the genomic regions in chimpanzee, rhesus, galago, tree shrew, mouse, and rat that are orthologous to the predicted human ORF (SI Fig. 5C). The nucleotide sequences of these species share modest similarity, but their coding potential is quite different. Whereas the human and chimpanzee sequences are 100% identical, the rhesus sequence contains 14 nucleotide substitutions in the presumed ORF region, 11 resulting in amino acid changes and 1 in a premature stop codon (SI Fig. 5C). Assuming that TP53AP1 is indeed a protein-coding gene, this ORF probably arose in the ape lineage to form the novel protein that exists in apes today.
Regardless of whether TP53AP1 is translated, there is a large difference in transcript distribution in this area. Although TP53AP1 has a gene structure suggestive of nonsense-mediated decay, in humans GenBank mRNA evidence for TP53AP1 is as abundant as for the nearby gene CROT. For the orthologous mouse region, ample transcripts exist for CROT, but none for the region corresponding to TP53AP1 (SI Fig. 5). Thus, at this locus the insertion of an ERV with a p53 site coincides with a reshaping of the transcriptional landscape in its vicinity, creating a transcript that is now part of the p53 regulatory network.
When Barbara McClintock (34) first discovered transposable elements in maize ≈50 years ago, she called them “controlling elements” because they altered gene expression. Roy Britten and Eric Davidson (35) then proposed that coordinated regulatory systems in animal genomes are encoded by networks of repetitive sequence relationships and that this presents an attractive evolutionary scenario for gene regulatory networks. This idea is now supported by the discovery of numerous promoters and enhancers derived from transposable elements. Recent studies have also shown that a substantial proportion of constrained nonexonic elements derive from transposons, pointing to transposons as a major creative force in the evolution of mammalian gene regulation (9, 10).
Our study brings p53, a pleiotropic transcription factor and one
of the most important master regulators, into this paradigm. We
discovered a unique distribution pattern of p53 sites within repet-
itive sequences of the human genome, and several ERV families
emerged as being substantially enriched for p53 sites in their LTRs.
Whole-genome ChIP data (16) revealed that p53 occupies such
LTR p53 sites in vivo, and our targeted ChIP analysis for four LTR
p53 sites confirmed this assessment.
Our data indicate that LTRs with a p53 site have strong p53-dependent regulatory potential. Our five chosen LTRs all exhibited p53-dependent enhancer activity in reporter gene assays, and their nearby genes showed p53-dependent expression. The p53 effect depended on the type of DNA-damaging agent, consistent with a crucial contribution of damage-specific cofactors to p53 activity. The impact of ERV p53 sites on gene expression is likely to be modulated by cofactors, long-range interactions, and local chromatin structure, as well as additional genomic p53 sites. The details of such impact await more detailed investigation.
Our evolutionary analysis indicates that ERVs with a p53 site populated the ancestral primate genome ≈40 Mya. Insertions and deletions of these elements created a turnover of p53 sites adjacent to many genes. In this manner, the spread of ERVs may have accelerated the evolution of the host genome (36). Specifically, by depositing p53 sites throughout the genome, ERVs may have recruited and even created new target genes to be part of the primate-specific p53 regulatory network. Other sites not directly driving expression of nearby genes may also play an important role in the p53 network by sequestering activated p53 en masse, thereby titrating the amount of active p53 needed for the appropriate response to a cellular stress.
The p53 site may also have benefited the retrovirus. Retroviral endogenization is a parasitic process that allows retroviruses to survive and pass on their own genetic material. A p53 site in their LTRs may have given them a transcriptional advantage under cellular stress, a condition known to activate ERVs in some species (37). Thus, the interaction between what was once foreign genetic information and the host genome may have been beneficial in multiple ways, and the impact may have been extensive enough that the relationship between primates and ERVs may ultimately have to be viewed as symbiotic in a generalized sense.
Our findings provide support for Britten and Davidson’s theory of regulatory network evolution through mobile elements. Several interesting corollaries stem from this hypothesis. First, the general mechanism could operate independently of any specific lineage or evolutionary time frame. ERVs or other mobile elements may thus have mediated the expansion of the p53 network at other times as well, and it is tempting to speculate that such activity, much too distant in the past for us to recognize, may have contributed to the crowning of p53 as a master regulator.