Whole Genome Sequencing and Evolutionary Analysis of
Human Papillomavirus Type 16 in Central China
Min Sun1., Lei Gao1., Ying Liu1, Yiqiang Zhao1, Xueqian Wang1, Yaqi Pan1, Tao Ning1, Hong Cai1,
Haijun Yang3, Weiwei Zhai2*, Yang Ke1*
1Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China, 2Center for
Computational Biology and Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China,
3Anyang Cancer Hospital, Anyang, Henan, China
Human papillomavirus type 16 plays a critical role in the neoplastic transformation of cervical cancers. Molecular variants of
HPV16 existing in different ethnic groups have shown substantial phenotypic differences in pathogenicity, immunogenicity
and tumorigenicity. In this study, we sequenced the entire HPV16 genome of 76 isolates originating from Anyang, central
China. Phylogenetic analysis of these sequences identified two major variants of HPV16 in the Anyang area, namely the
European prototype (E(p)) and the European Asian type (E(As)). These two variants show a high degree of divergence
between groups, and the E(p) comprised higher genetic diversity than the E(As). Analysis with two measurements of genetic
diversity indicated that viral population size was relatively stable in this area in the past. Codon based likelihood models
revealed strong statistical support for adaptive evolution acting on the E6 gene. Bayesian analysis identified several
important amino acid positions that may be driving adaptive selection in the HPV 16 population, including R10G, D25E,
L83V, and E113D in the E6 gene. We hypothesize that the positive selection at these codons might be a contributing factor
responsible for the phenotypic differences in carcinogenesis and immunogenicity among cervical cancers in China based on
the potential roles of these molecular variants reported in other studies.
Citation: Sun M, Gao L, Liu Y, Zhao Y, Wang X, et al. (2012) Whole Genome Sequencing and Evolutionary Analysis of Human Papillomavirus Type 16 in Central
China. PLoS ONE 7(5): e36577. doi:10.1371/journal.pone.0036577
Editor: Zhi-Ming Zheng, National Institute of Health-National Cancer Institute, United States of America
Received November 11, 2011; Accepted April 10, 2012; Published May 4, 2012
Copyright: © 2012 Sun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work is supported in part by Natural Science Foundation of China 30872937, 30930102, 31000957 and 91131011, "973" Project of National Ministry
of Science and Technology Grant 2011CB504301, 2012CB316505, "863" Key Projects of National Ministry of Science and Technology Grant 2006AA2Z467, Charity
Project of National Ministry of Health 200902002, and Natural Science Foundation of Beijing 7100001. The funders had no role in study design, data collection and
analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: email@example.com (YK); firstname.lastname@example.org (WZ)
. These authors contributed equally to this work.
Human papillomaviruses (HPVs) are common and are clinically
important pathogens. Infection with high-risk types of HPV is a
necessary factor for the development of precancerous lesions and
cervical cancer [2,3,4]. Of those that can infect human beings,
over 120 different types have been isolated, and among these
around 20 types are classified as high-risk HPV types (HR-HPV)
based on their established association with cancer [1,5,6]. Among
these high risk HPV types, HPV16 has been found to be the most
prevalent and shows the strongest association with invasive
cervical cancer [7,8].
It is now generally accepted that HPV has co-existed with its
human host over a very long period of time and has evolved into
multiple evolutionary lineages [9,10]. Intratypic variants of
HPV16 have been identified from different geographic locations
and are classified according to their host ethnic groups as
European (including prototypes and Asian types), Asian American,
African and North American [11,12]. Through epidemiological
and in-vitro experimental studies, natural variants of HPV16 have
shown substantial differences in pathogenicity, immunogenicity
and tumorigenicity. These variants may reflect the evolution of the
viral population as it has adapted to local human ethnic groups.
By studying molecular evolution of the viral genomes,
patterns of this evolutionary history can be identified and
important molecular variants responsible for viral pathogenicity
and carcinogenesis may be characterized.
There has been a paucity of HPV 16 population studies in
China. Most previous studies have focused on studying the two
[15,16,17,18,19,20,21,22,23,24]. The major goal of these studies
has been to explore existing variants in the viral population.
Although cataloging extant mutations is a necessary step in
understanding HPV16 evolution, prioritizing the functional
importance of these identified changes by examining their
evolutionary pattern is potentially much more informative. In this
work, we want to expand upon previous studies by characterizing
the genome wide pattern of genetic diversity, and more
importantly we want to pinpoint major genes/variants that are
driving the adaptation of the virus to the human populations in
central China. These evolutionarily important mutants may be
used for further epidemiological and experimental studies where
the functional consequences associated with these variants may be
investigated and vaccines targeting these sites can be developed.
PLoS ONE | www.plosone.org1 May 2012 | Volume 7 | Issue 5 | e36577
The nonsynonymous to synonymous rate ratio dN/dS in protein
coding regions has provided an important means for studying
molecular evolution of genes, and the use of this method has
gained increasing popularity in recent years. The basic
rationale of this method is that synonymous mutations do not
change the encoded protein sequence and are therefore assumed to be
largely unaffected by natural selection. The synonymous substitution rate dS
thus provides a natural measurement for the rate of evolution under
neutral processes. Since nonsynonymous mutations alter the
encoded protein sequence and can be affected by natural
selection, the magnitude of the nonsynonymous substitution
rate dN relative to the synonymous rate dS provides a good means for
studying natural selection. Specifically, dN/dS > 1 represents
positive selection, dN/dS = 1 indicates neutral evolution, and dN/dS
< 1 implies purifying selection (or negative selection).
Thus, the nonsynonymous to synonymous rate ratio dN/dS
provides a proxy for studying natural selection acting on coding
genes, and many statistical methods have been developed to look
for genes which are under the influence of natural selection,
particularly Darwinian positive selection [28,29].
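The interpretation of dN/dS described above can be sketched in a few lines of Python. This is only an illustration of the three regimes: the function name `classify_omega`, the tolerance, and the input values are hypothetical and not taken from the study, and in practice estimates of dN and dS carry statistical uncertainty that a simple threshold ignores.

```python
import math


def classify_omega(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Classify the selection regime implied by the ratio dN/dS.

    dn and ds are point estimates of the nonsynonymous and synonymous
    substitution rates; tol is an illustrative tolerance for "equal to 1".
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = dn / ds
    if math.isclose(omega, 1.0, abs_tol=tol):
        return "neutral"
    # omega > 1: positive selection; omega < 1: purifying selection
    return "positive" if omega > 1.0 else "purifying"


# Hypothetical rate estimates, for illustration only
print(classify_omega(0.02, 0.01))   # dN/dS = 2
print(classify_omega(0.005, 0.01))  # dN/dS = 0.5
```

A gene-wide ratio averages over all codons, so a few positively selected sites can be masked by many conserved ones, which is the motivation for the codon-based models discussed next.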
Recent development of codon based substitution models has
provided a natural extension of previous methods by allowing
different codons to have different dN/dS values [30,31]. Statistical
methods such as the likelihood ratio test can be employed to
determine whether patterns of molecular evolution at a certain
gene can be explained with models without invoking positive
selection [32,33]. Upon rejecting the null hypothesis in favor of the
alternative model where positive selection is explicitly allowed,
special codon positions under adaptive evolution can be identified
using a Bayesian based approach [33,34]. These methods have
been widely applied to many datasets, including some multiple
whole genome sequences.
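As a minimal sketch of the likelihood ratio test mentioned above, the following Python snippet compares the maximized log-likelihoods of a null model (no positive selection) and an alternative model (positive selection allowed). It assumes a chi-square reference distribution with 2 degrees of freedom, as is commonly used for nested site-model pairs; the excerpt does not state which model pairs or degrees of freedom the authors used, and the log-likelihood values below are hypothetical.

```python
import math
from typing import Tuple


def likelihood_ratio_test(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return (test statistic, p-value) for two nested models.

    The statistic is 2 * (lnL_alt - lnL_null). The p-value assumes a
    chi-square distribution with 2 degrees of freedom (an assumption of
    this sketch), whose survival function is exactly exp(-x / 2).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)  # guard against small negative values from optimizer noise
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-994.0)
print(f"2*delta_lnL = {stat:.2f}, p = {p:.4g}")
```

Rejecting the null model in this test only indicates that some codons are better explained by dN/dS > 1; identifying which codons requires the separate Bayesian step described in the text.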
In this work, we took a whole genome approach and sequenced
76 HPV16 isolates from Henan Province, China (located in
central China, see Figure S1). We wanted to determine whether
any of the genes in the HPV16 genome is driven by positive
selection. In addition, we sought to identify those codon positions
and associated amino acid changes responsible for the adaptive
evolution in this viral population.
Materials and Methods
HPV often occurs at low concentrations in normal tissues
and is difficult to amplify. In this study, ninety-four paraffin-
embedded blocks of cervical cancer samples were collected to
extract the viral genomes from the human population. Of these
ninety-four samples, seventy-six tested HPV16 positive and were
used for subsequent sequence analysis. These tissue specimens
were collected from women with cervical cancer during their
primary treatment between 2005 and 2007 at Anyang Cancer
Hospital, Henan Province, China (Figure S1). None of the patients
received chemotherapy before surgery. The tumor samples
in this study represent a small proportion of the patients at
this hospital, namely those for whom surgery (i.e. removal of the uterus) was chosen as an
effective treatment; patients with later-stage cancer proceed directly to
radiation therapy without surgery. The clinical stage and
associated age information for these patients are presented in
the supplementary information (Table S1). Official approval was obtained from
the Institutional Review Board of Peking University School of
Oncology, and informed consent was signed by each patient
before sample collection.
Paraffin sections (5 µm) of formalin-fixed tissue were deparaffinized
in xylene and washed with 100%, 95%, and 75%
ethanol. The tissue was pelleted, air dried and digested with
proteinase K (200 µg/ml) at 55°C overnight. DNA was isolated from 200 µl of this
material using an H.Q. & Q. Tissue DNA Kit (U-
GENE BIOTECHNOLOGY CO., LTD, Anhui, China). DNA
was re-suspended in a final volume of 100 µl of 10 mM Tris. The
DNA concentration was determined using a NanoDrop
(NanoDrop Technologies, Wilmington, Delaware, USA). A full
description of sample processing and DNA extraction is
presented in the supplementary materials (Text S1).
Our experimental work followed strict quality control to avoid
possible contamination from the laboratory environment. As described in
detail in a previous study, DNA extraction, PCR
and DNA electrophoresis were done in separate rooms,
and specimens moved in one direction only. Laboratory personnel
were instructed to wear gloves when handling the samples, and the
experimental area was regularly cleaned before beginning work.
In addition, the surfaces of the experimental area were routinely
inspected: a cotton bud was first applied to various surfaces
(e.g. lab benches) and then soaked in deionized
water overnight, and HPV detection was applied to the supernatant.
Experiments were allowed only when negative results were
observed. We also used mouse liver tissue as an
internal control together with the cancer samples. Experiments
proceeded only when negative results were observed from these
internal controls (Text S1; see also our previous study).
HPV DNA Detection and HPV16 DNA identification
A modified set of primers, SPF1/GP6+, which amplifies an L1
fragment of approximately 184 bp, was used. The polymerase
chain reaction was carried out as follows. Qiagen Hot Start Taq
DNA polymerase mixture was used with 4 mM MgCl2 and
10 pmol of each primer. The enzyme was activated
at 95°C for 15 minutes, followed by 40 amplification cycles of
95°C for 40 seconds, 49°C for 50 seconds and 72°C for 30 seconds,
and a final extension at 72°C for 5 minutes.
The presence of HPV16 DNA in the L1-positive samples was
evaluated by type-specific PCR, which amplified a 335 bp (nt 231 to
565) fragment of HPV16 E6. PCR was performed at 95°C for
15 minutes, followed by 40 amplification cycles of 95°C for
40 seconds, 57°C for 40 seconds and 72°C for 40 seconds, and a final
extension at 72°C for 5 minutes. The experimental conditions and
amplification regions are presented in the supplementary materials
(Text S1 and Tables S2, S3 and S6).
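The two cycling programs above can be written down as simple records to check the arithmetic. The sketch below is illustrative only: the class `PcrProgram`, its field names and the helper `amplicon_length` are hypothetical, the temperatures are omitted, and the computed time covers only the programmed hold steps, not the instrument's ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """Hold times (in seconds) of a three-step PCR program."""
    activation_s: int
    cycles: int
    denature_s: int
    anneal_s: int
    extend_s: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Total programmed hold time, excluding temperature ramping."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.activation_s + self.cycles * per_cycle + self.final_extension_s


def amplicon_length(start_nt: int, end_nt: int) -> int:
    """Length of a fragment given inclusive 1-based coordinates."""
    return end_nt - start_nt + 1


# L1 screening PCR (SPF1/GP6+): 15 min activation, 40 x (40 s, 50 s, 30 s), 5 min
l1_screen = PcrProgram(15 * 60, 40, 40, 50, 30, 5 * 60)
# HPV16 E6 type-specific PCR: 15 min activation, 40 x (40 s, 40 s, 40 s), 5 min
e6_typing = PcrProgram(15 * 60, 40, 40, 40, 40, 5 * 60)

print(l1_screen.hold_time_s() / 60)  # minutes of programmed hold time
print(e6_typing.hold_time_s() / 60)
print(amplicon_length(231, 565))     # E6 fragment, nt 231 to 565 inclusive
```

Both programs have the same per-cycle hold time of 120 seconds, and the inclusive coordinates nt 231 to 565 give the 335 bp fragment stated in the text.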
PCR and Sanger sequencing
PCR primers were designed to cover the HPV genome.
Platinum Taq DNA polymerase High Fidelity (Invitrogen Co.,
Carlsbad, CA, USA) was used for PCR experiments (Table S5
and S6). PCR products were purified using a PCR clean-up gel
extraction column (MACHEREY-NAGEL GmbH & Co, Düren,
Germany) according to the manufacturer’s instructions and were
directly sequenced using a capillary sequencer (ABI Prism 3100).
For this study, in addition to the quality control listed above, five
specimens were chosen to repeat the experimental procedures
(including sample processing, Text S1). A different primer set
was used to amplify the HPV genome (Table S4). The PCR
products were purified and ligated into the pEASY-T1 vector
(Transgen Biotech Co. LTD, Beijing, China) and 3-5 colonies per
Adaptive Evolution of HPV16 in Central China