Identification of an overprinting gene in Merkel cell
polyomavirus provides evolutionary insight into
the birth of viral genes
Joseph J. Carterᵃ,ᵇ,¹,², Matthew D. Daughertyᶜ,¹, Xiaojie Qiᵃ, Anjali Bheda-Malgeᵃ,³, Gregory C. Wipfᵃ,
Kristin Robinsonᵃ, Ann Romanᵃ, Harmit S. Malikᶜ,ᵈ, and Denise A. Gallowayᵃ,ᵇ,²
Divisions of ᵃHuman Biology, ᵇPublic Health Sciences, and ᶜBasic Sciences and ᵈHoward Hughes Medical Institute, Fred Hutchinson Cancer Research Center,
Seattle, WA 98109
Edited by Peter M. Howley, Harvard Medical School, Boston, MA, and approved June 17, 2013 (received for review February 24, 2013)
Many viruses use overprinting (alternate reading frame utiliza-
tion) as a means to increase protein diversity in genomes severely
constrained by size. However, the evolutionary steps that facili-
tate the de novo generation of a novel protein within an ancestral
ORF have remained poorly characterized. Here, we describe the
identification of an overprinting gene, expressed from an Alter-
nate frame of the Large T Open reading frame (ALTO) in the early
region of Merkel cell polyomavirus (MCPyV), the causative agent
of most Merkel cell carcinomas. ALTO is expressed during, but not
required for, replication of the MCPyV genome. Phylogenetic anal-
ysis reveals that ALTO is evolutionarily related to the middle T
antigen of murine polyomavirus despite almost no sequence sim-
ilarity. ALTO/MT arose de novo by overprinting of the second exon
of T antigen in the common ancestor of a large clade of mamma-
lian polyomaviruses. Taking advantage of the low evolutionary
divergence and diverse sampling of polyomaviruses, we propose
evolutionary transitions that likely gave birth to this protein. We
suggest that two highly constrained regions of the large T antigen
ORF provided a start codon and C-terminal hydrophobic motif nec-
essary for cellular localization of ALTO. These two key features,
together with stochastic erasure of intervening stop codons,
resulted in a unique protein-coding capacity that has been pre-
served ever since its birth. Our study not only reveals a previously
undefined protein encoded by several polyomaviruses including
MCPyV, but also provides insight into de novo protein evolution.
gene evolution|synonymous substitution|disordered motifs
The birth of new genes has fascinated biologists for decades.
Although the steps required to generate a new gene by gene
duplication or gene rearrangement have been characterized, less
is known about the birth of new genes de novo. One particularly
intriguing mechanism of de novo gene birth is via “overprinting,”
in which a novel overprinting gene is encoded as an alternate ORF
within an ancestral “overprinted” gene (1). Overprinting results in
two unrelated functional proteins encoded as overlapping ORFs
within the same DNA sequence. However, the origins of such
a complex evolutionary solution have remained elusive.
Viruses appear to be especially adept at this form of evolu-
tionary innovation. This frequent use of overprinting is likely the
result of the severe constraints imposed on viral genome size,
making gene innovation more likely to occur as overprinting
rather than within a noncoding region (2). Due to the numerous
examples of overprinting in single-stranded RNA viruses, a great
deal of research has focused in particular on this class of viruses
(3–6). However, small DNA viruses, such as adenoviruses, pap-
illomaviruses, and polyomaviruses, have a similar requirement to
maximize the coding capacity of their genomes. In this study, we
have taken advantage of our identification of an overprinting gene
born in the ancestor of a large clade of polyomaviruses to in-
vestigate the steps that allowed generation of this unique protein.
Polyomaviruses are nonenveloped viruses containing an ∼5-kb
circular double-stranded DNA genome that infect a wide range of
mammals and birds (7, 8). Polyomaviruses leverage alternative
splicing of the early region (ER) of the genome to generate pro-
tein diversity, including the large and small T antigens (LT and ST,
respectively) and the middle T antigen (MT) of murine poly-
omavirus (MPyV), which is generated by a novel splicing event and
overprinting of the second exon of LT. Some polyomaviruses can
drive tumorigenicity, and gene products from the ER, especially
SV40 LT and MPyV MT, have been extraordinarily useful models
to study the viral and host processes required for cellular trans-
formation (9–11). More recently, a new oncogenic polyomavirus,
Merkel cell polyomavirus (MCPyV), was discovered in Merkel cell
carcinoma (MCC), a rare but aggressive form of skin cancer,
providing the first established case of a human cancer stemming
from a polyomavirus infection (12, 13).
In comparison with other human polyomaviruses, the MCPyV
LT protein contains an expanded, highly divergent region of
∼200 amino acids encompassing two conserved short linear
motifs (Fig. 1A and SI Appendix, Fig. S1). Given the precedent of
protein diversity generated from the ER of polyomaviruses, we
investigated the implications of this highly divergent MCPyV LT
region. We identified an alternate T antigen ORF, hereafter
referred to as ALTO, overprinting within this region in MCPyV
LT (Fig. 1A) in the +1 frame relative to the second exon of LT.
ALTO is expressed during, but is not required for, viral genome
replication. Interestingly, despite almost no sequence similarity,
ALTO most probably shares a common evolutionary origin with
the other known case of overprinting in polyomaviruses, the
second exon of MT from MPyV. Indeed, we find that preserva-
tion of the overprinting ALTO/MT ORF defines a large mono-
phyletic clade of mammalian polyomaviruses. By comparing LT
genes in a phylogenetic context among polyomaviruses within
and outside this clade, we propose an evolutionary model for the
generation of a previously undiscovered viral gene de novo by overprinting.
Author contributions: J.J.C., M.D.D., H.S.M., and D.A.G. designed research; J.J.C., M.D.D.,
X.Q., A.B.-M., G.C.W., and K.R. performed research; A.B.-M. and G.C.W. contributed new
reagents/analytic tools; J.J.C., M.D.D., X.Q., A.R., H.S.M., and D.A.G. analyzed data; and
J.J.C., M.D.D., H.S.M., and D.A.G. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
¹J.J.C. and M.D.D. contributed equally to this work.
²To whom correspondence should be addressed. E-mail: email@example.com.
³Present address: Institute for Systems Biology, Seattle, WA 98109.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.
PNAS | July 30, 2013 | vol. 110 | no. 31 | www.pnas.org/cgi/doi/10.1073/pnas.1303526110
A Previously Uncharacterized Overprinting ORF in MCPyV. We noted
that the expanded, highly divergent region of MCPyV LT could
encode a previously undefined protein, which we named ALTO.
ALTO is the result of an overprinting ORF that is +1 frameshifted
relative to the second exon of LT (Fig. 1A and SI Appendix,
Fig. S1). All four previously mapped T antigen mRNAs (14)
have the capacity to encode ALTO (Fig. 1B). Based on conser-
vation of the AUG codon at genome position 880 among closely
related polyomaviruses (SI Appendix, Fig. S2), we predicted that
ALTO encodes a 248- or 250-residue protein depending on the
3′ splice variant (Fig. 1 B and C). The only recognizable features
of the predicted ALTO protein are proline-rich segments and
a C terminus containing a basic motif just upstream of a hydro-
phobic motif (Fig. 1C).
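To make the +1 frame relationship concrete, the following minimal Python sketch translates one DNA string in the reference frame and in the +1 frame, then reports the longest methionine-initiated ORF in the +1 frame. The sequence `toy` is invented for illustration and is not MCPyV sequence, and the helper names `translate` and `longest_orf` are hypothetical rather than taken from the study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate the complete codons of dna, starting at the given offset."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def longest_orf(protein):
    """Longest M-initiated stretch that runs to a stop codon or the sequence end."""
    candidates = [protein[i:].split("*")[0] for i, aa in enumerate(protein) if aa == "M"]
    return max(candidates, key=len, default="")


# Invented toy sequence (NOT viral sequence), readable in both frames.
toy = "GCTTATGGCTCTCCACCTCGACTGAGCGAA"
frame0 = translate(toy, 0)  # stands in for the ancestral, overprinted frame
frame1 = translate(toy, 1)  # stands in for the +1, overprinting frame
print(frame0, frame1, longest_orf(frame1))
```

In this toy case the same nucleotides yield two unrelated peptides, and the +1 frame contains a short ORF that begins at an internal ATG and ends at a stop codon that is invisible to the reference frame.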
Overprinted ORFs often display decreased divergence at
synonymous DNA sites, because a synonymous site change in
the ancestral, overprinted ORF (i.e., LT) will likely result in
a nonsynonymous change in the overprinting ORF (i.e., ALTO).
In fact, decreased synonymous divergence is a means to identify
new overprinting ORFs (3, 15–18) and indicates that both ORFs
are being evolutionarily preserved and protected from pre-
sumably deleterious nonsynonymous changes. We investigated
whether the region of LT that overlaps ALTO displays decreased
synonymous site variation. We first compared divergence at
synonymous sites (dS) in LT genes in sequenced isolates of
MCPyV and found that the dS within the ALTO-encoding re-
gion is 2.5 times lower than the rest of the gene (dS = 0.048 for
the region of overlap vs. 0.12 for nonoverlap). We found a sim-
ilar pattern when we compared LT genes between MCPyV and
its closely related polyomaviruses (ChimpPyV1a, ChimpPyV2a,
and GorillaPyV), which also have the capacity to encode ALTO.
In all pairwise comparisons, synonymous divergence is decreased
directly overlapping the ALTO-encoding region (Fig. 1D), con-
sistent with both reading frames being maintained throughout
viral evolution. Interestingly, the ALTO coding sequence was
intact in all MCPyV sequences deposited in GenBank from
healthy individuals (n = 31) and from nonlesional skin from
MCC patients (n = 3) but was predicted to have truncating or
frameshift mutations in 19 of 46 MCC tumors (SI Appendix,
Table S1). These same mutations have also been previously
shown to truncate LT (14).
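The reasoning that opens this analysis, that a synonymous change in the overprinted frame will usually be nonsynonymous in the overprinting frame, can be illustrated with a toy enumeration. The sketch below is not the dS estimation used in the study; it simply counts, for an invented sequence, how many single-nucleotide substitutions are silent in frame 0 and how many of those are also silent in the +1 frame. The sequence and the function name `silent_counts` are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate the complete codons of dna, starting at the given offset."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def silent_counts(dna):
    """Return (silent in frame 0, silent in both frames) over all single-base changes."""
    ref0, ref1 = translate(dna, 0), translate(dna, 1)
    # Score only positions that lie inside a complete codon in both frames.
    last = min(3 * (len(dna) // 3), 1 + 3 * ((len(dna) - 1) // 3))
    silent0 = silent_both = 0
    for pos in range(1, last):
        for new in BASES:
            if new == dna[pos]:
                continue
            mutant = dna[:pos] + new + dna[pos + 1:]
            if translate(mutant, 0) == ref0:
                silent0 += 1
                if translate(mutant, 1) == ref1:
                    silent_both += 1
    return silent0, silent_both


# Invented toy sequence (NOT viral sequence).
toy = "GCTTATGGCTCTCCACCTCGACTGAGCGAA"
silent0, silent_both = silent_counts(toy)
print(silent0, silent_both)
```

The gap between the two counts reflects a general property of a +1 overlap: third-codon positions in frame 0 coincide with second-codon positions in the +1 frame, where a substitution almost always changes the encoded amino acid.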
ALTO Is Expressed in Cells During Viral DNA Replication. To validate
that ALTO is a bona fide MCPyV expressed protein, we trans-
fected an intact circular MCPyV genome (SI Appendix, Fig. S3)
into human HEK293 cells and monitored protein expression and
viral DNA replication. Under these conditions, the MCPyV ge-
nome has been shown to replicate its DNA for several days (19,
20). Lysates from transfected cells were analyzed by immunoblot
using immune rabbit serum that had been generated against
a mixture of two ALTO peptides (Fig. 1C). We detected ALTO
at the expected size (27.6 kDa; Fig. 2A) in cells transfected with
wild-type DNA. Importantly, the ALTO protein was not detected
using an MCPyV genome bearing a point mutation that
eliminated the predicted ALTO initiation codon without af-
fecting the LT encoded protein sequence. These results con-
firmed that ALTO is a bona fide protein expressed during
MCPyV replication and validated that ALTO translation ini-
tiates at position 880 in the genome.
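The logic of a start-codon knockout that leaves the overlapping protein untouched can be sketched as a small search: change one base of the ATG in the +1 frame and keep only those changes that are silent in frame 0. The example below uses an invented 12-nt sequence, not the MCPyV genome, and the function name `silent_start_knockouts` is hypothetical; the actual 880ko mutation is described in the SI Appendix and is not reproduced here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate the complete codons of dna, starting at the given offset."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def silent_start_knockouts(dna, atg_pos):
    """Single-base changes inside the ATG at atg_pos that keep the frame-0 protein intact."""
    if dna[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    ref0 = translate(dna, 0)
    hits = []
    for pos in range(atg_pos, atg_pos + 3):
        for new in BASES:
            if new == dna[pos]:
                continue
            mutant = dna[:pos] + new + dna[pos + 1:]
            if translate(mutant, 0) == ref0:
                hits.append((pos, dna[pos], new))  # (0-based position, old base, new base)
    return hits


# Invented toy sequence: frame 0 reads A-Y-G-S; the +1 frame has an ATG at index 4.
toy = "GCTTATGGCTCT"
hits = silent_start_knockouts(toy, 4)
print(hits)
```

For this toy input only one substitution qualifies, which shows how tightly the two frames constrain each other at the initiation codon.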
To measure the effect that ALTO had on DNA replication, we
harvested low molecular weight DNA 2 d posttransfection and
digested it with DpnI and BamHI. DpnI cleaves bacterially
derived (methylated) input DNA multiple times, but not
HEK293-derived (unmethylated) replicated DNA, allowing us to
distinguish the two DNA species (SI Appendix, Fig. S3). We
detected newly replicated MCPyV genome using ethidium bro-
mide-stained gels and Southern blots. The MCPyV genome
migrated at ∼5 kb following cleavage with BamHI to linearize
the genome (Fig. 2 B and C). Interestingly, loss of ALTO
resulted in no measurable difference in viral DNA replication in
HEK293 cells (Fig. 2 B and C). Thus, we conclude that ALTO
does not play an essential role in MCPyV genome replication in
tissue culture but may play an accessory role in the viral life
cycle, as has been observed for many overprinting genes (4).
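The way the DpnI/BamHI digest separates input from replicated DNA can be sketched in a few lines. The sketch assumes only the textbook site specificities (DpnI cuts GA^TC when the adenine is Dam-methylated; BamHI cuts G^GATCC regardless of Dam methylation) and treats methylation as all-or-nothing. The 30-nt circular "plasmid" is invented and far smaller than any real genome, and the function names are hypothetical.

```python
def find_cuts(seq, site, offset):
    """Cut positions for a recognition site on a circular sequence."""
    n = len(seq)
    wrapped = seq + seq[:len(site) - 1]  # lets sites span the origin
    return {(i + offset) % n for i in range(n) if wrapped.startswith(site, i)}


def digest_circular(seq, methylated):
    """Sorted fragment lengths after BamHI plus DpnI digestion of a circular molecule."""
    n = len(seq)
    cuts = find_cuts(seq, "GGATCC", 1)      # BamHI: G^GATCC
    if methylated:
        cuts |= find_cuts(seq, "GATC", 2)   # DpnI: GA^TC, methylated DNA only
    cuts = sorted(cuts)
    if not cuts:
        return [n]                          # uncut circle
    # Distance from each cut to the next one around the circle.
    return sorted(((b - a) % n) or n for a, b in zip(cuts, cuts[1:] + cuts[:1]))


# Invented toy circle with one BamHI site and three GATC sites.
toy_plasmid = "AAGGATCCTTTTGATCAAAACCCGATCTTT"
replicated = digest_circular(toy_plasmid, methylated=False)
input_dna = digest_circular(toy_plasmid, methylated=True)
print(replicated, input_dna)
```

Unmethylated (replicated) DNA is only linearized and runs as a single full-length band, whereas methylated (input) DNA is cut into several smaller fragments, which is what allows the two species to be told apart on a gel.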
We observed that ALTO localized to foci in the cytoplasm in
cells transfected with the MCPyV ER (Fig. 2D). No staining was
observed in cells transfected with vector only, or in cells treated
with preimmune rabbit serum (SI Appendix, Fig. S4). An ALTO
variant lacking an intact C-terminal hydrophobic motif (residues
1–234) appeared to be distributed diffusely throughout the cy-
toplasm. Thus, deletion of the hydrophobic C terminus alters the
subcellular localization of ALTO.
Overprinting ORF Defines a Clade of Polyomaviruses. ALTO has not
been previously described in polyomaviruses, despite extensive
investigation of prototype members such as SV40, BKPyV, and
JCPyV. We therefore wished to investigate the evolutionary origin
of ALTO, especially in light of the fact that all polyomaviruses
Fig. 1. ALTO is a unique protein encoded by the MCPyV ER. (A) Sliding
window plot (10-aa window size) of amino acid identity between MCPyV
and eight other human polyomaviruses. The arrows depict the LT and ALTO
ORFs; gray boxes indicate functional domains (14) (DNAj, activators of DNAk
chaperones; OBD, origin binding domain; YGS/T and LxCxE, conserved linear
motifs). (B) Four transcripts from the early region (ER) are present in MCPyV
positive tumors (14). ORFs encoding ST, LT, 57 kT, and ALTO are shown as
rectangles with colors indicating reading frames. Start sites, termination
sites, and splice sites are shown (accession no. HM355825). (C) ALTO protein
sequence is shown. Underlined peptides were used to generate immune
rabbit sera used in Fig. 2. Blue and red boxes indicate the C-terminal basic
and hydrophobic motifs, respectively. (D) Sliding window plot (100-aa window
size) of the synonymous site divergence (dS) between the indicated LT
ortholog pair. Below is the schematic of LT and ALTO ORFs as in A.
encode the overprinted (original) LT gene. Initial similarity
searches [using tblastn (21)] indicated that ALTO was exclusively
present in MCPyV and three other hominoid polyomaviruses
(ChimpPyV1, ChimpPyV2, and GorillaPyV1). However, we also
noted that, like ALTO, the second exon of MPyV MT is the
result of overprinting onto the LT gene (22). We thus considered
whether ALTO and the second exon of MT are evolutionarily
related, representing a single gene birth.
We found striking parallels between ALTO and MT despite
finding little significant sequence similarity between the two
proteins. Both ALTO and MT are produced as a result of
overprinting of the LT second exon in the same +1 frame, and both
contain a basic motif just upstream of a hydrophobic C terminus,
which is immediately followed by a stop codon (Figs. 1 and 3).
These C-terminal motifs are located in almost exactly the same
region in both MCPyV and MPyV genomes, overlapping LT’s
origin binding domain (OBD) that is highly conserved among
polyomaviruses (Fig. 1A). These similarities raised the possibility
that ALTO and MT derived from a common ancestor.
To delineate the evolutionary origins of ALTO and MT, we
performed a comprehensive survey of all polyomaviruses to as-
sess whether they could encode a putative overprinting ORF,
which terminated in the same region as ALTO and MT. Phy-
logenies based on LT alone (Fig. 3A and SI Appendix, Fig. S5A)
or the entire genome (SI Appendix, Fig. S5B) were similar and
consistent with recently published phylogenetic analyses of LT
genes from polyomaviruses (23–25). Strikingly, we found that
one lineage of polyomaviruses that includes MCPyV and MPyV,
as well as other polyomaviruses from primates, raccoons, rodents,
bats, and one other human PyV, Trichodysplasia spinulosa-asso-
ciated polyomavirus (TSPyV), all share the property of potentially
encoding an overprinting ALTO/MT-like gene (Fig. 3A). More-
over, for all viruses with an intact ORF, the predicted proteins
encode a hydrophobic C terminus similar to both ALTO and MT
(Fig. 3B). In contrast, polyomaviruses outside this lineage, in-
cluding the closely related human HPyV9 and Monkey B-Lym-
photropic papovavirus (LPV), contain several stop codons in this
frame and therefore cannot encode an ALTO/MT-like protein
(Fig. 3 A and C). We can thus delineate this clade of poly-
omaviruses, which we suggest calling Almipolyomaviruses (for
ALTO or middle T containing polyomaviruses), by virtue of their
conserved utilization of this alternate ORF (Fig. 3 and
SI Appendix, Fig. S3A). Taking advantage of the homology be-
tween the helicase domains of LT from polyomaviruses and E1
from papillomaviruses, we propose a root for the LT tree that
further supports a single monophyletic clade of Almipolyoma-
viruses (SI Appendix, Fig. S6). We conclude that ALTO/MT was
born de novo specifically in the common ancestor of Almipo-
lyomaviruses, via the utilization of an alternate overprinting ORF
in the second exon of LT to encode a unique protein (Fig. 3).
Following its birth, the overprinting ALTO/MT protein has
been preserved in all members of the Almipolyomaviruses. In
addition to a lack of stop codons, we observed additional evi-
dence for selection acting to preserve this ORF in several branches
of the Almipolyomavirus clade. As described above, divergence at
synonymous sites is depressed specifically in the LT region that
overlaps ALTO in MCPyV and related hominoid PyVs (Fig. 1D).
We observed a similar pattern when we compared the other hu-
man Almipolyomavirus, TSPyV, to its closely related primate PyV
(OrangutanPyVBo). In contrast, a pairwise comparison between
two non-Almipolyomaviruses, HPyV9 and LPV, showed no such
depression of synonymous divergence in the region that would
correspond to ALTO (SI Appendix, Table S3). Such data indicate
that, following overprinting of ALTO/MT onto LT, selection has
acted to preserve it in the Almipolyomaviruses.
Polyomaviruses have evolved a variety of strategies to maximize
is evident in the use of alternate translation start sites in the late
region. In contrast, all previous early region polyomavirus
proteins were believed to initiate at the same start codon in the
first exon of LT but were capable of producing multiple products
via alternative splicing of a primary transcript (8). Here, we have
identified a unique protein (ALTO) that is translated from an
internal start site in an alternate ORF of the second exon of LT
and expressed during MCPyV viral genome replication. Using the
unique position and frame of ALTO in the MCPyV genome, we
found that ALTO was a member of a family of proteins that
included distantly related ALTO homologs and the second exon
of MT from MPyV. Although these proteins have little recog-
nizable sequence identity, they share several key features in-
cluding a termination codon at the same location in the genome,
a hydrophobic C terminus flanked by one or more basic amino
acids, and a relatively high proportion of prolines.
The ALTO/MT proteins we have identified could encode
novel functions by which Almipolyomaviruses can manipulate
the host. Indeed, other genes encoded by the early region are
critical for cellular transformation and tumorigenicity (9–11).
For instance, SV40 uses large and small T antigens (LT and ST),
produced by alternative splicing of the same gene, to inactivate
tumor suppressors. Binding partners of SV40 LT first led to the
identification of the tumor suppressor protein p53 (26, 27) and
revealed mechanisms to disrupt the pRb and p130 pathways (10).
The main transforming activity of MPyV is encoded by the
middle T antigen (MT), which splices the first exon corresponding
Fig. 2. ALTO is expressed from the MCPyV early region during viral genome
replication. (A) Cells (HEK293) were transfected with religated wild-type
MCPyVw156 DNA (MCPyV) or MCPyV genome harboring a point mutation
disrupting the ATG start codon of ALTO at nt 880 (880ko). An equal amount
of an unrelated plasmid was used as a negative control. Immunoblots for
ALTO and GAPDH were performed on protein lysates collected 48 h after
transfection or with a lysate from cells transfected with an ER expression
plasmid (using 50-fold less lysate). (B) Low molecular weight DNA was iso-
lated from transfected cells and digested with BamHI and DpnI. Digestion
products were separated on an agarose gel and visualized by staining with
ethidium bromide or (C) Southern blotting using a radioactive MCPyV DNA
probe. The input control plasmid contained 350 bp of the MCPyV genome
overlapping the region used for hybridization. Input and replicated MCPyV
DNAs are indicated (SI Appendix, Fig. S3). (D) Cells (U2OS) were transfected
with expression plasmids containing the MCPyV ER, ER with a C-terminally
truncated ALTO (ALTO 1–234), or an empty vector control. After 48 h, the
cells were fixed and stained for ALTO (green), MCPyV LT (red), and DNA
(blue). Individual and merged images are shown. (Scale bar: 30 μm.)
to ST to the overprinting frame in exon 2. Studies of MT in
MPyV have led to discoveries of phosphotyrosine kinases (PTK)
and phosphoinositide 3-kinase (PI3K) signaling (9, 22). Although
ALTO does not appear to be essential for viral genome repli-
cation, we find that it is mutated in many cancer tissues and that
elucidation of its function may uncover new roles for ALTO in
MCPyV manipulation of human cell signaling and perhaps in
The origin of de novo genes has been of great interest and has
principally been studied in eukaryotes where studies have shown
that novel genes can arise by stochastic removal of frameshift or
stop codons from previously noncoding regions or long noncoding
RNA (28, 29). RNA viruses, because of their genome size limi-
tations, are especially proficient at generating novel genes by
frame (3). However, the steps required to generate an intact,
functional overprinting ORF have been difficult to discern. This is
because it is often not possible to find closely related viruses that
branched just before the origin of overprinting. The relatively low
evolutionary divergence of polyomaviruses, their broad sampling,
and our phylogenetic tracing of the birth of ALTO/MT proteins to
a single, monophyletic lineage of Almipolyomaviruses therefore
provide an unprecedented opportunity to dissect the de novo
evolutionary origins of an overprinting gene.
By comparing the ALTO/MT reading frames of Almipolyo-
maviruses with the lack of equivalent ORFs in non-Almipolyo-
maviruses, we investigated what preexisting constraints and novel
innovations may have led to overprinting of ALTO/MT onto LT.
Our ability to study LT and ALTO status along several points of
the well-resolved polyomavirus phylogeny allows us to propose
a series of steps that could have plausibly given rise to ALTO/
MT (Fig. 4). First, we observed that, compared with non-Almi-
polyomaviruses, Almipolyomaviruses generally possess longer
LT proteins (SI Appendix, Fig. S7A) that derive from an exten-
sion of the linker region between the start of exon 2 and the
conserved OBD (SI Appendix, Fig. S7B). Placed in the context of
the phylogenetic tree, there is an obvious increase in the size of
Fig. 3. A single clade of polyomaviruses has a predicted alternate ORF. (A) Genomic sequences from several polyomaviruses (accession nos. in SI Appendix,
Table S2) were translated as a +1 frameshift from the coding region of LT exon 2. Predicted stop codons are shown in red. MCPyV ALTO is shown in blue, and
exon 2 of MT from MPyV is in gray. Due to length differences and lack of similarity between exon 2 sequences, translations are aligned by the conserved
region of the genome overlapping the OBD. The phylogeny shown here is based on an amino acid alignment of LT (SI Appendix, Fig. S5A) but is consistent
with a larger phylogeny of polyomaviruses based on the entire genomic sequence (SI Appendix, Fig. S5B). Asterisks indicate >75% bootstrap support in both
phylogenies. (B) The C terminus of the indicated alternate ORFs is shown aligned to their conserved hydrophobic region. Conservation data and hydro-
phobicity plots were generated using Geneious software (40). Stop codons are shown as red asterisks. (C) Alignment of the non-Almipolyomaviruses as in B.
Fig. 4. A model for polyomavirus de novo gene birth by overprinting. ORFs
corresponding to the LT and the ALTO/MT alternate frame are shown as in
Fig. 1. Dark blue denotes the region of ALTO/MT that encodes the hydro-
phobic domain and overlaps the evolutionarily conserved OBD of LT. Red
asterisks denote stop codons. Each schematic represents a proposed step
along the evolutionary pathway that led to the current repertoire of LT,
ALTO, and MT ORFs. An extant viral example of each schematic is also given,
with the exception of the inferred ancestor to all Almipolyomaviruses, which
gave birth to all current ALTO/MT-containing viruses. Although MCPyV
encodes ALTO wholly within exon 2, it remains to be experimentally de-
termined whether other Almipolyomaviruses besides MPyV and HamsterPyV
have splice variants that encode an MT-like protein. If all basally branching
Almipolyomaviruses were found to encode an MT protein, we would revise
this model to suggest that the initial ALTO/MT innovation was actually
MT-like and later became ALTO-like due to the use of the downstream
methionine (conserved due to the LT YGS/T motif).
this region that exactly coincides with the region of overprinting
(SI Appendix, Fig. S7B). Interestingly, this increase in size is al-
ready evident in the clade of viruses most closely related to
Almipolyomaviruses (represented by HPyV9 and LPV), whose
LT region is the same size as the most basally branching Almi-
polyomaviruses (represented by TSPyV and OrangutanPyVBo).
These results suggest that expansion of LT likely preceded and
may have been necessary to accommodate the new ORF al-
though alternate models are plausible (Fig. 4). Although the LT
expansion could be the result of neutral drift, it may have been
selected for to facilitate additional protein–protein interactions.
Consistent with this hypothesis, the expanded region in MCPyV
LT is required for interactions with hVam6p, a cytoplasmic
protein involved in lysosomal processing (30).
A second feature that likely facilitated the birth of ALTO/MT
was the presence of two highly conserved regions of LT that
could encode important functional elements in the ALTO reading
frame. First, the experimentally determined start codon of ALTO
(Fig. 2) overlaps exactly with a conserved YGS/T motif located
near the Rb-binding motif of LT (SI Appendix, Fig. S8). Al-
though the functional significance of the YGS/T motif is not
understood, our observation that it is highly conserved supports
structural evidence for its importance for proper folding of the
DNAj domain (31). Interestingly, this potential to encode a start
codon in the beginning of exon 2 is conserved across every pol-
yomavirus containing the YGS/T motif, including many non-
Almipolyomaviruses, suggesting that the start codon of ALTO
was in place before ALTO was born. We observe a similar
phenomenon with the highly conserved LT OBD, which we hy-
pothesize resulted in the potential to encode a hydrophobic C
terminus in the alternate frame (ALTO) immediately upon birth
of ALTO/MT. Again, the precursor to this alternate frame hy-
drophobic domain can also be seen in non-Almipolyomaviruses
even before the birth of ALTO although, in most cases, this
domain contains stop codons (Fig. 3C). An intact hydrophobic
domain is the most conserved feature of the ALTO/MT proteins
that we identified, and we have now shown that this domain is
required for the subcellular distribution of ALTO (Fig. 2D),
similar to its role in MT localization and function (32). As a re-
sult, we expect that this hydrophobic motif and likely the cellular
targeting that it confers are critical for ALTO/MT and may have
immediately provided the newly born ALTO/MT with func-
tionality. Intriguingly, in an independent case of overprinting in
parvoviruses, the single most recognizable feature of the over-
printing SAT protein is its transmembrane segment that is re-
quired for correct localization (33).
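The claim that a conserved Tyr-Gly dipeptide in LT can carry a start codon in the alternate frame follows directly from the standard genetic code, and a short enumeration makes this explicit. The sketch below lists every codon pair that encodes Tyr-Gly and checks which pairs contain ATG in the +1 frame. It illustrates a property of the code only; it makes no claim about which codons any particular polyomavirus uses, and the helper name `codons_for` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


# Every 6-nt sequence that encodes Tyr-Gly in the reference frame.
yg_pairs = [y + g for y in codons_for("Y") for g in codons_for("G")]

# Those in which the +1 frame (offset 1) reads ATG.
with_start = [pair for pair in yg_pairs if pair[1:4] == "ATG"]

print(len(yg_pairs), with_start)
```

Half of the possible Tyr-Gly encodings, those that use TAT for tyrosine, place an ATG in the +1 frame, so conservation of the dipeptide alone keeps a potential alternate-frame start codon within one synonymous change of being available.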
Thus, preexisting constraints on the LT ORF predisposed the
alternate frame with these two key features, which bookended the
ALTO overprinting ORF even before its birth. We posit that
stochastic erasure of intervening stop codons then allowed the
generation of an intact overprinting ORF (Fig. 4). Interestingly,
because there can be no selective pressure for stop codon removal
before the generation of an intact ALTO, erasure of stop codons
is likely both the rate-limiting step in terms of de novo gene birth,
as well as the step that is driven almost entirely by stochasticity.
We propose that the combination of these three events—LT gene
expansion, the fact that the highly conserved YGS/T motif and
OBD had the potential to encode the N-terminal start codon and
C-terminal hydrophobic domain in the alternate (ALTO) frame,
and stochastic erasure of stop codons—allowed the LT gene to be
overprinted by ALTO/MT. Importantly, the retention of an intact
ALTO/MT ORF in all Almipolyomaviruses and the decrease in
synonymous divergence in the LT region that overlaps ALTO
indicate that this alternate ORF is functionally conserved in
Almipolyomaviruses, and likely has been since its birth.
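A rough sense of why stop-codon erasure is plausibly rate limiting comes from a deliberately crude null model. The sketch below assumes that codons in the alternate frame are independent and uniformly distributed over the 64 triplets, 3 of which are stops. That assumption is ours, not the study's, and it ignores base composition and the constraint imposed by the overlapping LT frame. Under it, the probability that a stretch of n codons is free of stops is (61/64)^n.

```python
def p_no_stop(n_codons, p_stop=3 / 64):
    """Probability that n independent, uniformly random codons contain no stop codon."""
    return (1.0 - p_stop) ** n_codons


# 248 codons matches the shorter predicted ALTO length quoted in the text.
for n in (50, 100, 248):
    print(n, p_no_stop(n))
```

The probability falls off geometrically with length, so an open alternate frame of ALTO's size is very unlikely to exist by chance under this model, consistent with the argument that once such a frame arises, its continued preservation points to selection.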
After the ALTO/MT-like ORF invention, there was further
innovation of this ORF, including very rapid divergence of pri-
mary sequence, additional gene expansion, and/or changes in
splicing patterns (Fig. 4). In fact, there is very little sequence
relatedness among predicted ORFs in Almipolyomaviruses out-
side of the hydrophobic C terminus and an immediately adjacent
patch of polar residues (Fig. 3B). However, conserved features
are evident in ALTO comparisons among closely related Almi-
polyomaviruses (SI Appendix, Alignment S2) and reveal that, in
the region of overlap, neither ALTO nor LT appears to have
consistently higher sequence conservation, making it unclear
which reading frame has been more highly constrained during
evolution (SI Appendix, Table S4). Interestingly, unlike some
structured overprinting/overprinted pairs, including the recently
discovered influenza A PA-X protein and the PA protein it
overprints (15), neither ALTO nor the overprinted region of LT
is predicted to contain substantial protein structure (SI Appendix,
Fig. S1). Together, these findings are consistent with studies
showing that overprinting genes of viruses frequently encode
rapidly evolving, disordered proteins (3, 4). However, we hy-
pothesize that ALTO/MT proteins likely perform related func-
tions despite this low sequence and structural conservation,
similar to many examples of disordered proteins that can ac-
complish very similar biological tasks (34). Consistent with this
possibility of similar function, the second exons of MT proteins
from MPyV and HamsterPyV are less than 20% identical, yet
both MTs use short linear sequences, such as phosphatidylinositol-3
kinase motifs. The location of these motifs is not conserved between
the two proteins, but their retention has been shown to
be important for the functions of both proteins (22). We have also
identified putative short linear motifs among closely related ALTO
proteins of MCPyV-related viruses (SI Appendix, Alignment S2).
Beginning with our identification of a unique protein, ALTO,
overprinting the LT gene in MCPyV, we have found that the
presence of an overprinting ORF defines a single clade of poly-
omaviruses. In addition to MCPyV and another human virus,
TSPyV, the Almipolyomavirus clade also includes MPyV, in which
the overprinting sequence encodes the second exon of the MT
oncogene. Because of the strict conservation of LT in all poly-
as the subsequent lineage-specific subfunctionalization that oc-
curred after this protein was born. Thus, our studies not only reveal
a previously undefined protein expressed from the early region in
also provide insight into the evolutionary steps required to achieve
protein novelty in genomes so constrained by size.
Materials and Methods
Cell Culture. Osteosarcoma (U2OS) and adenovirus-transformed human em-
10% (vol/vol) FBS and penicillin–streptomycin (Life Technologies). All cell
lines were maintained at 37 °C in 5% CO2.
Plasmids and Cloning. For expression of MCPyV proteins in tissue culture, the
MCPyVw156 early region DNA was amplified by PCR and cloned into the
(Clontech). The ALTO start site mutation (880ko) and truncation mutation
(ALTO 1–234) were generated by site-directed mutagenesis. Details of
cloning and mutagenesis, including primer sequences, are described in SI
Appendix, Table 1.
Generation of Anti-ALTO Antibodies. Antibodies to ALTO were generated
using a combination of two synthetic peptides (C-GMGPSQRPRLQTPSPED and
C-DPVAERRPPIQEENPAH) (ProSci), which were used to immunize two rabbits.
Both rabbits had equivalent responses, and serum from the final bleed was
used at a dilution of 1:10,000. This study was reviewed and approved by the
Institutional Animal Care and Use Committee (IACUC) at Fred Hutchinson
Cancer Research Center prior to implementation.
MCPyV Replication Assay. Cells (10⁶; HEK293) were transfected with 1 μg of
religated MCPyV genomic DNA, either MCPyVw156 or its ALTOko mutant, or
an equal amount of pUC.M plasmid DNA as negative control. Cells were
harvested 48 h posttransfection, and low molecular weight DNA was
extracted by the modified Hirt extraction method (20). One hundred
nanograms of DNA was digested, visualized by staining with ethidium
bromide, and transferred onto Hybond N+ nylon membrane (GE Healthcare
Biosciences) for Southern blot analysis (35). In an MCPyV PCR using primers
751F and 1110R (SI Appendix, Table S5), ³²P-dATP was incorporated in the
product to prepare the probe specific for MCPyV DNA. The blot was exposed
to BioMax MR autoradiography film (Kodak).
Transient Transfections. For transfections using expression plasmids, cells were
grown in six-well dishes for immunoblots or on poly-L-lysine–coated coverslips
(Sigma Chemical) for immunofluorescence. The total amount of DNA in
all transfections was 750 ng per well and was kept constant using empty
vector DNA. All transfections used Fugene HD (Roche Diagnostics) following
the manufacturer’s protocol.
Immunoblotting. Cells were harvested 48 h posttransfection, and proteins
were extracted as described by Neumann et al. (19). Briefly, cells were
resuspended in lysis buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 1% Nonidet
P-40, 0.5% Na-Deoxycholate, 5 mM EDTA, 0.1% SDS, and proteinase in-
hibitor mixture; Roche), and concentrations were determined using the DC
protein assay (Bio-Rad). Proteins (50 μg) were separated on 8–16% SDS/PAGE
(Life Technologies) gels and transferred to Immobilon-P membrane (EMD
Millipore). Membranes were blocked in 2.5% nonfat dry milk–PBS solution
and incubated at 4 °C for 1 h with primary antibody [rabbit serum or GAPDH
mAb (EMD Millipore)], followed by horseradish peroxidase-conjugated
secondary antibody. Proteins were detected with Lumi-Light (Roche) and
exposed to Kodak XAR-5 film (Fisher Scientific).
Immunofluorescence Staining. Transfected U2OS cells were washed with PBS,
fixed with 4% paraformaldehyde for 10 min, permeabilized with 0.05%
Triton X-100, washed three times with PBS, and blocked with 5% normal
goat serum. Primary antibodies were either immune or preimmune rabbit
using a Deltavision confocal microscope (Applied Precision), and the fluo-
rescent images were created using softWoRx 5.0 (Applied Precision).
Phylogenetic Analysis. Publicly available polyomavirus sequences (SI Appendix,
Table S2) were used for all analyses. LT protein sequences and
complete viral genomes were aligned using MUSCLE (36), and maximum
likelihood phylogenetic trees were constructed using PhyML (37) imple-
mented at www.phylogeny.fr (38). Trees were visualized using FigTree
(http://tree.bio.ed.ac.uk/software/figtree/). dS calculations on available MCPyV
isolates (SI Appendix, Table S1) with intact LT and ALTO genes and sliding
window calculations were performed using codeml (39).
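A sliding-window analysis of this kind can be sketched in a few lines of Python. The example below is a deliberately crude stand-in: it only labels each differing codon between two aligned sequences as synonymous or nonsynonymous in a chosen reading frame, then counts those labels per window. It does not reproduce the maximum-likelihood dS estimates from codeml, and the function names, window size, step, and toy sequences are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_differences(seq_a, seq_b, frame=0):
    """Label each codon pair: '-' identical, 'S' synonymous, 'N' nonsynonymous,
    '?' if either codon is not a standard codon (gaps, ambiguous bases)."""
    calls = []
    for i in range(frame, min(len(seq_a), len(seq_b)) - 2, 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if a == b:
            calls.append("-")
        elif a not in CODON_TABLE or b not in CODON_TABLE:
            calls.append("?")
        elif CODON_TABLE[a] == CODON_TABLE[b]:
            calls.append("S")
        else:
            calls.append("N")
    return calls

def sliding_counts(calls, window=3, step=1):
    """Return (window start codon index, synonymous count, nonsynonymous count)."""
    return [
        (start,
         calls[start:start + window].count("S"),
         calls[start:start + window].count("N"))
        for start in range(0, max(len(calls) - window, 0) + 1, step)
    ]

# Toy aligned sequences (hypothetical, not MCPyV isolates).
seq_a = "ATGGCTAAAGGTCTG"
seq_b = "ATGGCCAAGGATCTG"
for frame in (0, 1):
    calls = codon_differences(seq_a, seq_b, frame)
    print(frame, calls, sliding_counts(calls))
```

Running the same pair of sequences in two frames shows why overlapping genes are informative: a substitution that is synonymous in one frame can change the encoded amino acid in the other.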
ACKNOWLEDGMENTS. We appreciate helpful discussions and comments
from members of the D.A.G., H.S.M., A. Dusty Miller, and Paul Nghiem
laboratories and Christopher Buck (National Cancer Institute). We ac-
knowledge assistance from Julio Vasquez and Elizabeth Jensen (Fred
Hutchinson Cancer Research Center Shared Resources). This work was sup-
ported by a Cancer Research Institute Irvington Fellowship (to M.D.D.),
a National Science Foundation CAREER award (to H.S.M.), and National Institutes
of Health Grants CA064795 and CA042792 (to D.A.G.). H.S.M. is an Early Career
Scientist of the Howard Hughes Medical Institute.
1. Keese PK, Gibbs A (1992) Origins of genes: “Big bang” or continuous creation? Proc
Natl Acad Sci USA 89(20):9489–9493.
2. Chirico N, Vianelli A, Belshaw R (2010) Why genes overlap in viruses. Proc Biol Sci
3. Sabath N, Wagner A, Karlin D (2012) Evolution of viral proteins originated de novo by
overprinting. Mol Biol Evol 29(12):3767–3780.
4. Rancurel C, Khosravi M, Dunker AK, Romero PR, Karlin D (2009) Overlapping genes
produce proteins with unusual sequence properties and offer insight into de novo
protein creation. J Virol 83(20):10719–10736.
5. Firth AE, Atkins JF (2010) Candidates in Astroviruses, Seadornaviruses, Cytorhabdoviruses
and Coronaviruses for +1 frame overlapping genes accessed by leaky scanning. Virol J
6. Pavesi A, De Iaco B, Granero MI, Porati A (1997) On the informational content of
overlapping genes in prokaryotic and eukaryotic viruses. J Mol Evol 44(6):625–631.
7. Schowalter RM, Pastrana DV, Pumphrey KA, Moyer AL, Buck CB (2010) Merkel cell
polyomavirus and two previously unknown polyomaviruses are chronically shed from
human skin. Cell Host Microbe 7(6):509–515.
8. Cole CN, Conzen SD (2001) Fields Virology, Polyomaviridae: The Viruses and Their
Replication, eds Knipe DM, Howley PM (Lippincott Williams & Wilkins, Philadelphia),
9. Cheng J, DeCaprio JA, Fluck MM, Schaffhausen BS (2009) Cellular transformation by
Simian Virus 40 and Murine Polyoma Virus T antigens. Semin Cancer Biol 19(4):218–228.
10. DeCaprio JA (2009) How the Rb tumor suppressor structure and function was revealed
by the study of Adenovirus and SV40. Virology 384(2):274–284.
11. Ahuja D, Sáenz-Robles MT, Pipas JM (2005) SV40 large T antigen targets multiple
cellular pathways to elicit cellular transformation. Oncogene 24(52):7729–7745.
12. Feng H, Shuda M, Chang Y, Moore PS (2008) Clonal integration of a polyomavirus in
human Merkel cell carcinoma. Science 319(5866):1096–1100.
13. Becker JC, et al. (2009) MC polyomavirus is frequently present in Merkel cell carci-
noma of European patients. J Invest Dermatol 129(1):248–250.
14. Shuda M, et al. (2008) T antigen mutations are a human tumor-specific signature for
Merkel cell polyomavirus. Proc Natl Acad Sci USA 105(42):16272–16277.
15. Jagger BW, et al. (2012) An overlapping protein-coding region in influenza A virus
segment 3 modulates the host response. Science 337(6091):199–204.
16. Lin MF, et al. (2011) Locating protein-coding sequences under selection for additional,
overlapping functions in 29 mammalian genomes. Genome Res 21(11):1916–1928.
17. Firth AE, Brown CM (2006) Detecting overlapping coding sequences in virus genomes.
BMC Bioinformatics 7:75.
18. Sabath N, Landan G, Graur D (2008) A method for the simultaneous estimation of
selection intensities in overlapping genes. PLoS ONE 3(12):e3996.
19. Neumann F, et al. (2011) Replication, gene expression and particle production by
a consensus Merkel Cell Polyomavirus (MCPyV) genome. PLoS ONE 6(12):e29112.
20. Schowalter RM, Pastrana DV, Buck CB (2011) Glycosaminoglycans and sialylated gly-
cans sequentially facilitate Merkel cell polyomavirus infectious entry. PLoS Pathog
21. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res 25(17):3389–3402.
22. Fluck MM, Schaffhausen BS (2009) Lessons in signaling and tumorigenesis from pol-
yomavirus middle T antigen. Microbiol Mol Biol Rev 73(3):542–563.
23. Fagrouch Z, et al. (2012) Novel polyomaviruses in South American bats and their re-
lationship to other members of the family Polyomaviridae. J Gen Virol 93(Pt 12):
24. Lim ES, et al. (2013) Discovery of STL polyomavirus, a polyomavirus of ancestral re-
combinant origin that encodes a unique T antigen by alternative splicing. Virology
25. Tao Y, et al. (2013) Discovery of diverse polyomaviruses in bats and the evolutionary
history of the Polyomaviridae. J Gen Virol 94(Pt 4):738–748.
26. Lane DP, Crawford LV (1979) T antigen is bound to a host protein in SV40-trans-
formed cells. Nature 278(5701):261–263.
27. Reich NC, Levine AJ (1982) Specific interaction of the SV40 T antigen-cellular p53
protein complex with SV40 DNA. Virology 117(1):286–290.
28. Kaessmann H (2010) Origins, evolution, and phenotypic impact of new genes. Ge-
nome Res 20(10):1313–1326.
29. Xie C, et al. (2012) Hominoid-specific de novo protein-coding genes originating from
long non-coding RNAs. PLoS Genet 8(9):e1002942.
30. Liu X, et al. (2011) Merkel cell polyomavirus large T antigen disrupts lysosome clus-
tering by translocating human Vam6p from the cytoplasm to the nucleus. J Biol Chem
31. Kim HY, Ahn BY, Cho Y (2001) Structural basis for the inactivation of retinoblastoma
tumor suppressor by SV40 large T antigen. EMBO J 20(1-2):295–304.
32. Zhou AY, et al. (2011) Polyomavirus middle T-antigen is a transmembrane protein
that binds signaling proteins in discrete subcellular membrane sites. J Virol 85(7):
33. Zádori Z, Szelei J, Tijssen P (2005) SAT: A late NS protein of porcine parvovirus. J Virol
34. Dunker AK, Silman I, Uversky VN, Sussman JL (2008) Function and structure of in-
herently disordered proteins. Curr Opin Struct Biol 18(6):756–764.
35. Southern E (2006) Southern blotting. Nat Protoc 1(2):518–525.
36. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res 32(5):1792–1797.
37. Guindon S, et al. (2010) New algorithms and methods to estimate maximum-likeli-
hood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol 59(3):307–321.
38. Dereeper A, et al. (2008) Phylogeny.fr: Robust phylogenetic analysis for the non-
specialist. Nucleic Acids Res 36(Web Server issue):W465-9.
39. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol
40. Drummond AJ, et al. (2010) Geneious version 5.0 created by Biomatters. Available at
Carter et al. PNAS | July 30, 2013 | vol. 110 | no. 31