The recombinant origin of emerging human norovirus GII.4/2008:
intra-genotypic exchange of the capsid P2 domain
Tommy Tsan-Yuk Lam(a,b,c)*, Huachen Zhu(b,c), David K. Smith(b,c), Yi Guan(b,c), Edward C. Holmes(d,e) and Oliver G. Pybus(a)
a Department of Zoology, University of Oxford, Oxford, UK.
b International Institute of Infection and Immunity, Shantou University Medical College, Shantou, China.
c State Key Laboratory of Emerging Infectious Diseases, Li Ka Shing Faculty of Medicine, The
University of Hong Kong, Hong Kong SAR, China.
d Center for Infectious Disease Dynamics, Department of Biology, The Pennsylvania State University, USA.
e Fogarty International Center, National Institutes of Health, USA.
Running title: Recombinant GII.4/2008 Norovirus
Dr. Tommy Tsan-Yuk Lam
Department of Zoology, University of Oxford, South Park Road, OX1 3PS, UK;
JGV Papers in Press. Published January 11, 2012 as doi:10.1099/vir.0.039057-0
JGV Papers in Press. Published January 13, 2012 as doi:10.1099/vir.0.039057-0
Type of article: Short Communication
Content Category: Animal, Positive-strand RNA viruses
Words in Summary: 138
Words in Main text: 2,340
Number of references: 36
Number of figures and tables: 3 figures
Number of supplementary figures and tables: 3 figures, 1 table
Keywords: norovirus, GII.4, 2008 variant, recombination, P2 domain
GII.4 noroviruses are a major cause of acute gastroenteritis in humans. A new variant of
GII.4, the 2008 variant, has recently increased its prevalence on a global scale. A previous study
of this variant in Japan suggested it may be of recombinant origin, with a breakpoint at the
ORF1/ORF2 junction. Here, examination of the evolutionary origin of the 2008 variant based on
a larger sample of worldwide GII.4 norovirus sequences revealed a more complex pattern of
recombination between the 2006a and 2006b-like variants of genotype GII.4, involving the P2
antigenic domain. Double (termed ‘2008i’) and triple (termed ‘2008ii’) recombinant forms of
2008 variants were identified. This study highlights the possible importance of intra-genotypic
recombination across antigenic regions in driving norovirus evolution, and suggests a process
analogous to the antigenic shift of influenza A virus caused by reassortment.
Noroviruses, a major etiologic agent of gastroenteritis in humans (Atmar & Estes, 2006), are
classified into five genogroups (GI to GV), which are in turn divided into a number of genotypes
(Zheng et al., 2006). Surveillance since the 1990s has confirmed that genotype 4 of the GII
genogroup (denoted GII.4) dominates most worldwide outbreaks of norovirus-related
gastroenteritis (Jin et al., 2008; Kroneman et al., 2008; Nayak et al., 2009; Park et al., 2010;
Siebenga et al., 2009).
Over the last twenty years, GII.4 norovirus has evolved a series of genetic variants, some of
which persisted and replaced the previously circulating variants (Reuter et al., 2008; Siebenga et
al., 2010). The continuous evolution of the viral capsid VP1 protein has been proposed as a key
mechanism by which new antigenic variants are generated. This is supported by observations of
temporal changes in histo-blood group antigen binding characteristics, antigenic relatedness, and
genome composition among GII.4 norovirus variants (Lindesmith et al., 2008).
From 2006-07 onwards the GII.4 ‘2006b variant’ dominated GII.4 norovirus outbreaks
worldwide (Chan-It et al., 2011; Chung et al., 2010; Eden et al., 2010; Jin et al., 2008; Kittigul et
al., 2010). However, the ‘2008 variant’ has been reported increasingly frequently since its first
isolation in 2008 (Belliot et al., 2010; Motomura et al., 2010) and, at the time of writing, has
been identified in 12 countries on four continents (Belliot et al., 2010; Eden et al., 2010; Han et
al., 2011; Mans et al., 2010; Motomura et al., 2010; Pang et al., 2010; Schenk et al., 2010).
Previous phylogenetic analysis showed the 2008 variant emerged from the Hunter-2006a
ancestral lineage, with which it shares 88% nucleotide similarity (Eden et al., 2010). However,
the phylogenetic position of the 2008 variant differed according to the genome region
investigated (Belliot et al., 2010; Eden et al., 2010; Mans et al., 2010; Motomura et al., 2010),
with a recombination breakpoint identified at the ORF1/ORF2 junction of the Japanese 2008
variants (Motomura et al., 2010).
To better understand the role of recombination in the evolutionary origin of the 2008 variant,
partial and complete genome sequences of GII.4 noroviruses from GenBank were examined.
After alignment using Muscle v3.8 (Edgar, 2004), a maximum likelihood (ML) phylogeny of
GII.4 ORF2 sequences (1600nt; n=600) was estimated using PhyML v3.0 (Guindon & Gascuel,
2003). Topological robustness was assessed by ML bootstrap analysis (1,000 pseudo-replicates,
using PhyML) and by Bayesian phylogenetic sampling (5,000 trees sampled every 2,000 states, using
MrBayes 3.1.2; Ronquist & Huelsenbeck, 2003). This revealed a distinct monophyletic lineage of
variant strains (n=35; Figs 1 and 2), previously defined as the '2008 variant' (Belliot et al., 2010;
Mans et al., 2010) and also termed the '2008a variant' by Motomura et al. (2010). Other known
GII.4 variants are also identified in Fig. 1.
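ML bootstrap support is obtained by resampling alignment columns with replacement (Felsenstein's nonparametric bootstrap) and re-estimating the tree for each pseudo-replicate. As a minimal sketch of the resampling step only, assuming an alignment held as a list of equal-length strings, the Python below generates pseudo-replicate alignments. The function names are hypothetical, and tree estimation itself (performed with PhyML in this study) is not reproduced.

```python
import random
from typing import List


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # Draw n_sites column indices uniformly with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Rebuild every sequence from the same sampled columns so rows stay aligned.
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_replicates(alignment: List[str], n_reps: int = 1000,
                         seed: int = 1) -> List[List[str]]:
    """Generate n_reps pseudo-replicates (1,000 in the analysis described above)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_reps)]
```

Each replicate would then be passed to the tree-building program; the bootstrap value of a clade is the percentage of replicate trees that contain it.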
Systematic screening for recombination was undertaken using the suite of statistical tests
implemented in RDP v3.41 (Martin & Rybicki, 2000). Sequences of 2008 variant viruses (1620nt;
n=35; Table S1) consistently exhibited a mosaic fragment about 300-500nt long in the ORF2 P2
domain (statistically significant in >2 RDP tests, after Bonferroni correction). These results suggest
that the 2006b and 2006a variants were the ancestors of the small and large ORF2 fragments,
respectively. Next, analysis of the complete genomes of 2008 variants (7509bp; n=11; Table S1)
revealed similar breakpoints in the P2 region. However, all 2008 variants (except
Orange/NSW001P/2008/AU, denoted ‘NSW001P’) also exhibited an additional breakpoint near
the ORF1/ORF2 junction (statistically significant in >4 RDP tests, after Bonferroni correction).
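The support criteria above (">2" or ">4" RDP tests significant after Bonferroni correction) can be made concrete with a small sketch. It assumes each detection method returns a raw p-value for a candidate recombination event and that the Bonferroni correction multiplies that p-value by the number of comparisons (capping at 1). The method names, p-values and number of comparisons are hypothetical placeholders rather than values from this study, and RDP's internal handling of multiple comparisons may differ in detail.

```python
from typing import Dict


def bonferroni(p: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value: p times the number of comparisons, capped at 1."""
    return min(1.0, p * n_comparisons)


def n_significant_methods(raw_p: Dict[str, float], n_comparisons: int,
                          alpha: float = 0.05) -> int:
    """Count detection methods whose adjusted p-value falls below alpha."""
    return sum(1 for p in raw_p.values() if bonferroni(p, n_comparisons) < alpha)


# Hypothetical raw p-values from four detection methods for one candidate event.
example = {"method_A": 1e-6, "method_B": 2e-4, "method_C": 3e-4, "method_D": 0.2}
count = n_significant_methods(example, n_comparisons=100)
supported = count > 2  # mirrors the ">2 tests" criterion used for the P2 breakpoints
```

Raising the number of comparisons makes the correction stricter, so fewer methods pass for the same raw p-values.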
Further analysis using similarity plots in Simplot v3.5.1 (Lole et al., 1999) gave concordant
results (Fig. S1a,b). Using the best recombination model selected by the phylogenetic GARD
method (Kosakovsky Pond et al., 2006), the recombination breakpoints were estimated to be at
nucleotide positions 5398, 5866 (5932 for NSW001P) and 6477 (Fig. S1a,b; right axis).
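Similarity plots slide a window along the alignment and record how similar the query is to each putative parent; a switch in the most similar parent marks a candidate breakpoint. The sketch below is a simplified illustration assuming uncorrected percent identity (SimPlot also offers distance-corrected models) and hypothetical window and step sizes; gapped columns are ignored when scoring a window.

```python
from typing import Dict, List, Tuple

Scan = List[Tuple[int, Dict[str, float]]]


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def similarity_scan(query: str, parents: Dict[str, str], window: int = 200,
                    step: int = 20) -> Scan:
    """Return (approximate 1-based window midpoint, identity to each parent) per window."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {name: percent_identity(query[start:end], seq[start:end])
                  for name, seq in parents.items()}
        results.append((start + window // 2 + 1, scores))
    return results


def closest_parent_changes(scan: Scan) -> List[Tuple[int, str, str]]:
    """List window midpoints where the most similar parent switches."""
    changes = []
    prev = None
    for mid, scores in scan:
        best = max(scores, key=scores.get)  # ties go to the first parent listed
        if prev is not None and best != prev:
            changes.append((mid, prev, best))
        prev = best
    return changes
```

Candidate breakpoints from such a scan are only approximate (their resolution is limited by the step size) and would still need formal testing, as was done here with GARD.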
The estimated breakpoints thus define three non-recombinant genomic regions of the 2008
variant: region 1 (1-5397nt), region 2 (5398-5898nt plus 6478-7509nt) and region 3 (5899-
6477nt) (Fig. 2). Separate phylogenies of these three regions were estimated using both ML (in
PhyML) and Bayesian (in MrBayes) methods (Fig. 2). The GTR+I+Γ4 substitution model was
used and 1,000 bootstrap replicates were computed. The MrBayes analysis was performed using
10⁷ steps, sampled every 2,000. In the phylogeny of region 1 (mainly ORF1; Fig. 2 left),
NSW001P clustered with the 2006a variants while the other 2008 variant viruses (n=10; Table
S1) clustered with the Hunter variant. For region 2, all 2008 variants clustered with 2006a
variants (Fig. 2 center) whereas in region 3 they clustered with the 2006b variants (Fig. 2 right).
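Because region 2 is discontinuous (5398-5898nt plus 6478-7509nt), its alignment has to be assembled by concatenating two segments. A minimal sketch, assuming sequences already aligned to the 7,509-nt coordinate system used here, 1-based inclusive coordinates, and the single set of region boundaries given in the text (the NSW001P breakpoint differs slightly), is shown below; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Region boundaries from the text (1-based, inclusive); region 2 has two segments.
REGIONS: Dict[str, List[Tuple[int, int]]] = {
    "region1": [(1, 5397)],
    "region2": [(5398, 5898), (6478, 7509)],
    "region3": [(5899, 6477)],
}


def extract_region(seq: str, segments: List[Tuple[int, int]]) -> str:
    """Concatenate the 1-based inclusive segments of an aligned sequence."""
    return "".join(seq[start - 1:end] for start, end in segments)


def split_alignment(alignment: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return {region: {sequence name: region sequence}} for every region."""
    return {
        region: {name: extract_region(seq, segs) for name, seq in alignment.items()}
        for region, segs in REGIONS.items()
    }


def region_of(position: int) -> str:
    """Name of the region containing a 1-based genome position."""
    for region, segs in REGIONS.items():
        if any(start <= position <= end for start, end in segs):
            return region
    raise ValueError(f"position {position} is outside 1-7509")
```

The three region lengths (5,397, 1,533 and 579nt) sum to the full 7,509-nt genome, so every position is assigned to exactly one region.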
Most groupings are supported by high bootstrap scores and Bayesian posterior probabilities (Fig.
2). Notably, in region 3 the 2008 variant group diverged from a node (ML bootstrap/Bayesian
posterior support 97/1, or 53/0.84 for the synonymous-site data; rightmost tree of Fig. 2) that is
genetically distant (0.129 substitutions/site) from the most recent common ancestor of the 2006b
variants. This suggests that region 3 of the 2008 variant may have a much older origin than
regions 1 and 2, which derive from the Hunter and 2006a variants. Hereafter, we use the term
'2006b-like' to reflect this more distant relationship between the 2006b lineage and the 2008
variant. In region 3, Osaka sequences of the Cairo variant cluster with, but are ancestral to, the
2008 and 2006b variants (Fig. S3).
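A distance such as 0.129 substitutions/site between two nodes is a patristic distance: the sum of branch lengths along the tree path connecting them. A minimal sketch, assuming a rooted tree stored as a child-to-parent map with branch lengths (a hypothetical representation; the node names and lengths in any example are made up, not taken from Fig. 2), finds the most recent common ancestor of the two nodes and sums the branches on either side.

```python
from typing import Dict, Optional, Tuple

# child -> (parent, branch length in substitutions/site); the root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]


def path_to_root(tree: Tree, node: str) -> Dict[str, float]:
    """Map node and each of its ancestors to their distance from node."""
    dist = {node: 0.0}
    total = 0.0
    current = node
    while tree[current][0] is not None:
        parent, length = tree[current]
        total += length
        dist[parent] = total
        current = parent
    return dist


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Sum of branch lengths on the path from a to b through their MRCA."""
    up_a = path_to_root(tree, a)
    current, walked = b, 0.0
    # Walk up from b until reaching an ancestor of a; that node is the MRCA.
    while current not in up_a:
        parent, length = tree[current]
        walked += length
        current = parent
    return walked + up_a[current]
```

For example, with a hypothetical node X ancestral to both a 2006b ancestor (branch 0.04) and a 2008 node (branch 0.06), the distance between those two nodes is 0.10 substitutions/site.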
Phylogeny-based analyses of recombination may be affected by convergent evolution leading
to similar sets of amino acid substitutions in independent lineages. To examine such effects, we
aligned the consensus sequences of different GII.4 variants, and removed the codon positions
that are non-synonymous across variant lineages from the alignment. ML and Bayesian
phylogenies were then estimated from these synonymous-site sequences (4710bp, 1250bp and
412bp in length for the regions 1, 2 and 3, respectively). The resultant tree topologies are largely
congruent with the region 1, 2 and 3 phylogenies obtained from the full data (Fig. 2). Clustering
of 2008 and 2006b variants in the region 3 synonymous-site phylogeny has lower ML
bootstrap support (53%) but a fairly high Bayesian posterior probability (0.84).
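The synonymous-site filtering described above keeps only codon positions at which all variant consensus sequences encode the same amino acid. A minimal sketch, assuming in-frame, aligned coding sequences and the standard genetic code, is shown below. Codons containing gaps or ambiguous bases are simply excluded; that handling is an assumption for illustration, not a detail given in the text.

```python
from itertools import product
from typing import List, Optional

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon: str) -> Optional[str]:
    """Amino acid for a codon, or None if it contains gaps or ambiguous bases."""
    return CODON_TABLE.get(codon.upper().replace("U", "T"))


def synonymous_sites(seqs: List[str]) -> List[str]:
    """Drop codons whose encoded amino acid differs among sequences or is untranslatable."""
    length = len(seqs[0])
    if length % 3 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, in frame, and a multiple of 3 long")
    kept = []
    for i in range(0, length, 3):
        aas = {translate_codon(s[i:i + 3]) for s in seqs}
        # Keep the codon only if every sequence encodes the same translatable residue.
        if len(aas) == 1 and None not in aas:
            kept.append(i)
    return ["".join(s[i:i + 3] for i in kept) for s in seqs]
```

Codons that differ only synonymously (e.g. TTT and TTC, both Phe) are retained, so the filtered alignment still carries phylogenetic signal while excluding sites that convergent amino acid changes could affect.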
Thirty-five viral isolates fall inside the 2008 variant lineage in the ORF2 phylogeny (Fig. 1).
Only 11 of these have complete genomes, which were confidently classified into two recombinant
forms (Figs 2 and 3): (1) a 2006a/2006b-like double recombinant and (2) a 2006a/2006b-like/Hunter
triple recombinant, which we term '2008i' and '2008ii', respectively. The remaining 24 isolates
showed a recombinant pattern in the P2 domain similar to that observed in the 11 complete
genomes; hence these 24 are, at least, double recombinants like 2008i. Phylogenetic analyses of
six of them (those for which sufficient sequence before nucleotide position 5397 is available to
distinguish between 2008i and 2008ii; see Fig. S2a) suggest that three have the 2008i
recombination structure (orange without asterisk in Fig. 1b) and three have the 2008ii structure
(yellow without asterisk in Fig. 1b). Furthermore, Han et al. reported five 2008 variants in Korea
based on their partial sequences at the ORF1/ORF2 junction (accession numbers HM635099-HM635103;
the available ORF2 region is too short to be included in the ORF2 phylogeny of Fig. 1;
Han et al., 2011). Our re-analysis suggests that three of them might have the 2008ii genome
structure, one might have the 2008i structure, and one (Seoul/0654/2009/KOR) might be a
Hunter variant, because it branched from a node closer to the Hunter lineage than to the 2008ii
lineage, although it could also derive from the early ancestor of 2008ii (Fig. S2a). Complete
genomes of these isolates would help to confirm these tentative assignments.
A simple recombination history may explain the emergence of the norovirus GII.4 2008 variants
(Fig. 3). An ancestral 2006b-like virus might have recombined with an ancestral 2006a virus,
contributing a fragment of its P2 domain and creating the 2008i recombinant. Some of these
recombinants continue to circulate (e.g. NSW001P). Others may have subsequently acquired a 5′
genomic region (5397nt in length) from an ancestral Hunter variant to form the triple recombinant
2008ii. This scenario involves the fewest recombination steps, but conflicts with the observation
that the 2008i lineage is slightly more distant than the 2008ii lineage from the most recent
common ancestor of the 2008 lineage (Fig. 1b). To resolve the order of recombination with more
certainty, full-length genome sequences of early 2008 variants (such as 8483/2008/ZAF and
2405/2008/ZAF) and molecular dating analyses (Lam et al., 2008; Tee et al., 2009) are needed.
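The two-step scenario can be expressed as operations on a mosaic genome map. The sketch below is illustrative only: it assumes the region boundaries estimated above apply to both recombinant forms (the NSW001P breakpoint actually differs slightly) and uses lineage names as hypothetical labels. It builds 2008i by inserting a 2006b-like P2 fragment into a 2006a backbone, then 2008ii by replacing positions 1-5397 with Hunter-derived sequence.

```python
from typing import List, Tuple

# A mosaic genome: ordered, non-overlapping (start, end, parent) segments, 1-based inclusive.
Mosaic = List[Tuple[int, int, str]]


def merge(mosaic: Mosaic) -> Mosaic:
    """Join adjacent segments that come from the same parent."""
    merged: Mosaic = []
    for s, e, parent in mosaic:
        if merged and merged[-1][2] == parent and merged[-1][1] + 1 == s:
            merged[-1] = (merged[-1][0], e, parent)
        else:
            merged.append((s, e, parent))
    return merged


def recombine(recipient: Mosaic, start: int, end: int, donor: str) -> Mosaic:
    """Replace positions start..end of the recipient with sequence from donor."""
    out: Mosaic = []
    for s, e, parent in recipient:
        if e < start or s > end:      # segment untouched by this event
            out.append((s, e, parent))
            continue
        if s < start:                 # keep the part left of the donor fragment
            out.append((s, start - 1, parent))
        if e > end:                   # keep the part right of the donor fragment
            out.append((end + 1, e, parent))
    out.append((start, end, donor))
    return merge(sorted(out))


def parent_at(mosaic: Mosaic, position: int) -> str:
    """Parental lineage contributing a given 1-based position."""
    return next(p for s, e, p in mosaic if s <= position <= e)


GENOME_LENGTH = 7509
ancestral_2006a: Mosaic = [(1, GENOME_LENGTH, "2006a")]
# Step 1: a 2006b-like P2 fragment (region 3) enters a 2006a backbone, giving 2008i.
v2008i = recombine(ancestral_2006a, 5899, 6477, "2006b-like")
# Step 2: the 5' region (1-5397nt) is replaced by Hunter sequence, giving 2008ii.
v2008ii = recombine(v2008i, 1, 5397, "Hunter")
```

In this representation 2008i has two parental lineages and 2008ii has three, matching the double and triple recombinant forms; the alternative ordering of events could be encoded the same way for comparison.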
Further recombinants have since arisen between the 2008 variant and other GII.4 variants. Recently,
Motomura et al. (2010) reported a recombinant group, called 2008b, that is 2008-like in ORF1
and 2006b-like in other genomic regions. Our phylogenetic analysis (Fig. S2b) suggests that the
ORF1 of 2008b most likely originated from 2008ii: the 2008b group diverged before (but quite
close to) the 2008ii lineage in a phylogeny of positions 1-5096nt, and is placed inside the 2008ii
lineage in a phylogeny of positions 4264-5096nt (Fig. S2b). There are also two sporadic Japanese
recombinant strains: Iwate5/2007/JP (termed 2007b in Motomura et al., 2010), a recombinant of
2008b and 2006b (Fig. S2c), and Toyama5/2008/JP, a recombinant of 2006b and 2008ii (Fig. S2d).
A South African isolate (4638/2008/ZAF), for which only an ORF2 sequence is available, may be a
mosaic of Hunter and 2008i variants, but with a breakpoint location (position 5966nt) different
from that of the 2008ii strain (Fig. S2e).
Both mutation and recombination are important mechanisms in norovirus evolution (Bull &
White, 2011; Rohayem et al., 2005). As the entire clade of 2008 variant viruses appears to have a
recombinant origin, intra-genotypic recombination may be more important in the emergence of
new norovirus GII.4 variants than previously suggested (Lindesmith et al., 2008; Siebenga et al.,
2007). By exchanging the P2 domain, which contains numerous receptor-binding and antigenic
sites (Donaldson et al., 2010; Lindesmith et al., 2008; Prasad et al., 1999; Siebenga et al., 2007;
Tan et al., 2003), between two GII.4 lineages, the virus may be able to move more rapidly across
its fitness landscape (Burke, 1997) in the face of high levels of immunity in the host population.
Incorporation by recombination of a P2 domain from a minority strain that exhibits epitopes
unfamiliar to host immunity may result in fitter recombinant progeny. This scenario may
explain the emergence of the 2008 variant, whose 2006b-like parent may represent a minority
circulating lineage that has not previously been detected. The putative antigenic change caused
by recombination within the P2 domain of norovirus may resemble the antigenic shift caused by
reassortment in human influenza virus, whereby gene segments encoding viral surface proteins
are replaced by those of other (usually zoonotic) origin that are antigenically novel to humans.
However, such an analogy must remain speculative until a robust way of classifying
noroviruses by antigenicity becomes available.
Although the pathogenicity and transmissibility of the 2008 variant remain to be studied
experimentally, its novel intra-genotypic recombinant nature and widespread global distribution
suggest it should be a surveillance target. Although the prevalence of the 2008 variant among
GII.4 infections varies geographically (~1% in Canada (Pang et al., 2010), ~8% in France
(Belliot et al., 2010), ~22% in Korea (Han et al., 2011), and up to 80% in South Africa (Mans et
al., 2010)), its increasing prevalence (Belliot et al., 2010; Motomura et al., 2010) and
involvement in other recombinants (such as the 2008b variant (Motomura et al., 2010)) raise a
global public health concern. Despite this, we note that the emergence of new GII.4 variants does
not always cause dramatic outbreaks, e.g. during the 2009-2010 winter in the USA, when the 2008
variant first appeared there (Yen et al., 2010). In general, recombination of the norovirus
antigenic region could have major implications for the design and effectiveness of norovirus
vaccines.
TTL is supported by a Newton International Fellowship from the Royal Society. MHZ, DKS and YG
are supported by the Li Ka Shing Foundation and the Research Grants Council of Hong Kong SAR
(HKU 765809M). OGP is supported by the Royal Society, UK. Additional bioinformatic and
computational resources were provided by the Computer Centre at The University of Hong Kong
(HKU). We thank the two anonymous reviewers for useful suggestions. We also thank W. K.
Kwan, Frankie Cheung and Lilian Chan (HKU) for technical support.
Atmar, R. L. & Estes, M. K. (2006). The epidemiologic and clinical importance of norovirus
infection. Gastroenterol Clin North Am 35, 275-290.
Belliot, G., Kamel, A. H., Estienney, M., Ambert-Balay, K. & Pothier, P. (2010). Evidence of
emergence of new GGII.4 norovirus variants from gastroenteritis outbreak survey in
France during the 2007-to-2008 and 2008-to-2009 winter seasons. J Clin Microbiol 48,
Bull, R. A. & White, P. A. (2011). Mechanisms of GII.4 norovirus evolution. Trends Microbiol
Burke, D. S. (1997). Recombination in HIV: an important viral evolutionary strategy. Emerg
Infect Dis 3, 253-259.
Chan-It, W., Thongprachum, A., Okitsu, S., Nishimura, S., Kikuta, H., Baba, T.,
Yamamoto, A., Sugita, K., Hashira, S. & other authors (2011). Detection and genetic
characterization of norovirus infections in children with acute gastroenteritis in Japan,
2007-2009. Clin Lab 57, 213-220.
Chung, J. Y., Han, T. H., Park, S. H., Kim, S. W. & Hwang, E. S. (2010). Detection of GII-
4/2006b variant and recombinant noroviruses in children with acute gastroenteritis, South
Korea. J Med Virol 82, 146-152.
Donaldson, E. F., Lindesmith, L. C., Lobue, A. D. & Baric, R. S. (2010). Viral shape-shifting:
norovirus evasion of the human immune system. Nat Rev Microbiol 8, 231-241.
Eden, J. S., Bull, R. A., Tu, E., McIver, C. J., Lyon, M. J., Marshall, J. A., Smith, D. W.,
Musto, J., Rawlinson, W. D. & other authors (2010). Norovirus GII.4 variant 2006b
caused epidemics of acute gastroenteritis in Australia during 2007 and 2008. J Clin Virol
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res 32, 1792-1797.
Guindon, S. & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Syst Biol 52, 696-704.
Han, T. H., Kim, C. H., Chung, J. Y., Park, S. H. & Hwang, E. S. (2011). Emergence of
norovirus GII-4/2008 variant and recombinant strains in Seoul, Korea. Arch Virol 156,
Jin, M., Xie, H. P., Duan, Z. J., Liu, N., Zhang, Q., Wu, B. S., Li, H. Y., Cheng, W. X., Yang,
S. H. & other authors (2008). Emergence of the GII4/2006b variant and recombinant
noroviruses in China. J Med Virol 80, 1997-2004.
Kittigul, L., Pombubpa, K., Taweekate, Y., Diraphat, P., Sujirarat, D., Khamrin, P. &
Ushijima, H. (2010). Norovirus GII-4 2006b variant circulating in patients with acute
gastroenteritis in Thailand during a 2006-2007 study. J Med Virol 82, 854-860.
Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. (2006).
GARD: a genetic algorithm for recombination detection. Bioinformatics 22, 3096-3098.
Kroneman, A., Verhoef, L., Harris, J., Vennema, H., Duizer, E., van Duynhoven, Y., Gray,
J., Iturriza, M., Bottiger, B. & other authors (2008). Analysis of integrated virological
and epidemiological reports of norovirus outbreaks collected within the Foodborne
Viruses in Europe network from 1 July 2001 to 30 June 2006. J Clin Microbiol 46, 2959-
Lam, T. Y., Hon, C. C., Wang, Z., Hui, R. K., Zeng, F. & Leung, F. C. (2008). Evolutionary
analyses of European H1N2 swine influenza A virus by placing timestamps on the
multiple reassortment events. Virus Res 131, 271-278.
Lindesmith, L. C., Donaldson, E. F., Lobue, A. D., Cannon, J. L., Zheng, D. P., Vinje, J. &
Baric, R. S. (2008). Mechanisms of GII.4 norovirus persistence in human populations.
PLoS Med 5, e31.
Lole, K. S., Bollinger, R. C., Paranjape, R. S., Gadkari, D., Kulkarni, S. S., Novak, N. G.,
Ingersoll, R., Sheppard, H. W. & Ray, S. C. (1999). Full-length human
immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India,
with evidence of intersubtype recombination. J Virol 73, 152-160.
Mans, J., de Villiers, J. C., du Plessis, N. M., Avenant, T. & Taylor, M. B. (2010). Emerging
norovirus GII.4 2008 variant detected in hospitalised paediatric patients in South Africa.
J Clin Virol 49, 258-264.
Martin, D. & Rybicki, E. (2000). RDP: detection of recombination amongst aligned sequences.
Bioinformatics 16, 562-563.
Motomura, K., Yokoyama, M., Ode, H., Nakamura, H., Mori, H., Kanda, T., Oka, T.,
Katayama, K., Noda, M. & other authors (2010). Divergent evolution of norovirus
GII/4 by genome recombination from May 2006 to February 2009 in Japan. J Virol 84,
Nayak, M. K., Chatterjee, D., Nataraju, S. M., Pativada, M., Mitra, U., Chatterjee, M. K.,
Saha, T. K., Sarkar, U. & Krishnan, T. (2009). A new variant of Norovirus GII.4/2007
and inter-genotype recombinant strains of NVGII causing acute watery diarrhoea among
children in Kolkata, India. J Clin Virol 45, 223-229.
Pang, X. L., Preiksaitis, J. K., Wong, S., Li, V. & Lee, B. E. (2010). Influence of novel
norovirus GII.4 variants on gastroenteritis outbreak dynamics in Alberta and the Northern
Territories, Canada between 2000 and 2008. PLoS One 5, e11599.
Park, K. S., Jeong, H. S., Baek, K. A., Lee, C. G., Park, S. M., Park, J. S., Choi, Y. J., Choi,
H. J. & Cheon, D. S. (2010). Genetic analysis of norovirus GII.4 variants circulating in
Korea in 2008. Arch Virol 155, 635-641.
Prasad, B. V., Hardy, M. E., Dokland, T., Bella, J., Rossmann, M. G. & Estes, M. K. (1999).
X-ray crystallographic structure of the Norwalk virus capsid. Science 286, 287-290.
Reuter, G., Pankovics, P. & Szucs, G. (2008). Genetic drift of norovirus genotype GII-4 in
seven consecutive epidemic seasons in Hungary. J Clin Virol 42, 135-140.
Rohayem, J., Munch, J. & Rethwilm, A. (2005). Evidence of recombination in the norovirus
capsid gene. J Virol 79, 4977-4990.
Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under
mixed models. Bioinformatics 19, 1572-1574.
Schenk, S., Petzold, A., Hoehne, M., Adam, R., Schroten, H. & Tenenbaum, T. (2010).
Severe gastroenteritis with secondary fever in a 10-month-old boy. J Clin Virol 47, 107-
Siebenga, J. J., Lemey, P., Kosakovsky Pond, S. L., Rambaut, A., Vennema, H. &
Koopmans, M. (2010). Phylodynamic reconstruction reveals norovirus GII.4 epidemic
expansions and their molecular determinants. PLoS Pathog 6, e1000884.
Siebenga, J. J., Vennema, H., Renckens, B., de Bruin, E., van der Veer, B., Siezen, R. J. &
Koopmans, M. (2007). Epochal evolution of GGII.4 norovirus capsid proteins from
1995 to 2006. J Virol 81, 9932-9941.