A hepatitis C virus cis-acting replication element forms a long-range RNA-RNA interaction with upstream RNA sequences in NS5B.
ABSTRACT The genome of hepatitis C virus (HCV) contains cis-acting replication elements (CREs) comprised of RNA stem-loop structures located in both the 5' and 3' noncoding regions (5' and 3' NCRs) and in the NS5B coding sequence. Through the application of several algorithmically independent bioinformatic methods to detect phylogenetically conserved, thermodynamically favored RNA secondary structures, we demonstrate a long-range interaction between sequences in the previously described CRE (5BSL3.2, now SL9266) with a previously predicted unpaired sequence located 3' to SL9033, approximately 200 nucleotides upstream. Extensive reverse genetic analysis both supports this prediction and demonstrates a functional requirement in genome replication. By mutagenesis of the Con-1 replicon, we show that disruption of this alternative pairing inhibited replication, a phenotype that could be restored to wild-type levels through the introduction of compensating mutations in the upstream region. Substitution of the CRE with the analogous region of different genotypes of HCV produced replicons with phenotypes consistent with the hypothesis that both local and long-range interactions are critical for a fundamental aspect of genome replication. This report further extends the known interactions of the SL9266 CRE, which has also been shown to form a "kissing loop" interaction with the 3' NCR (P. Friebe, J. Boudet, J. P. Simorre, and R. Bartenschlager, J. Virol. 79:380-392, 2005), and suggests that cooperative long-range binding with both 5' and 3' sequences stabilizes the CRE at the core of a complex pseudoknot. Alternatively, if the long-range interactions were mutually exclusive, the SL9266 CRE may function as a molecular switch controlling a critical aspect of HCV genome replication.
JOURNAL OF VIROLOGY, Sept. 2008, p. 9008–9022
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Vol. 82, No. 18
A Hepatitis C Virus cis-Acting Replication Element Forms a
Long-Range RNA-RNA Interaction with Upstream
RNA Sequences in NS5B▿†
Sinéad Diviney,1 Andrew Tuplin,1 Madeleine Struthers,1 Victoria Armstrong,2
Richard M. Elliott,2 Peter Simmonds,3 and David J. Evans1*
Department of Biological Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom1; Department of
Biomolecular Sciences, University of St. Andrews, Fife KY16 9ST, United Kingdom2; and Centre for
Infectious Diseases, University of Edinburgh, Summerhall, Edinburgh EH9 1QH, United Kingdom3
Received 26 October 2007/Accepted 2 July 2008
Hepatitis C virus (HCV), a flavivirus in the genus Hepacivi-
rus, possesses a positive (mRNA)-sense genome of approxi-
mately 9.6 kb encoding a single polyprotein. This polyprotein is
cleaved co- and posttranslationally to generate proteins that
form the enveloped virus particle and those that replicate the
genome. Polyprotein translation is initiated within a highly
structured internal ribosome entry site (IRES) occupying
much of the 5' noncoding region (5' NCR). The 5' NCR also
contains sequences required for genome replication (9, 19, 26),
and like functionally analogous regions in the 3' NCR, these
form defined stem-loop structures that operate in cis and are
known or suspected to recruit cellular or viral proteins (5, 10,
30). In addition to these cis-acting replication elements
(CREs) in the noncoding extremes of the genome, there is
evidence that additional RNA structures exist within the coding
regions. These structures are of two types: phylogenetically
conserved, well-defined structures occupying the 5' and 3'
regions of the sense strand of the coding region of HCV (23,
31, 32, 36) and a less well-characterized but much more exten-
sive set of RNA secondary structures, collectively designated
genome-scale ordered RNA structure (GORS), that spans the
entire coding region of HCV (29).
The potential functional role(s) of phylogenetically con-
served RNA secondary structures in coding regions has been
analyzed extensively by reverse genetic analysis, predominantly
using antibiotic resistance or a luciferase-encoding subgenomic
replicon system (18), and more recently, in analysis of struc-
tures in the core-encoding region using the HCV replication
system (17, 23, 25, 34). Several groups have reported that a
short stem-loop structure in the NS5B coding region, variously
designated 5BSL3.2 or SL-V (16, 36), has a clearly defined
function in genome replication. This structure, henceforth des-
ignated SL9266 (see Materials and Methods for details of a
unified numbering scheme), forms a stem-loop with two short
base-paired helices, separated by an 8-nucleotide (nt) bulge
loop on the 3' side, and capped with a 12-nt terminal loop (16,
36). Extensive mutagenesis has demonstrated that the struc-
tural integrity of the element must be retained for replication.
In addition, substitutions within the two unpaired loop or bulge
regions can also be deleterious, which implies that these re-
gions also contribute important functions during replication.
SL9266 therefore forms a cis-acting replication element,
though its precise function during genome replication has yet
to be determined. SL9266 is the penultimate of five phyloge-
netically conserved RNA structures in the region encoding
NS5B. Limited mutagenesis of the upstream adjacent structure
(SL9217), which has also been designated SL-VII (16) or
5BSL3.1 (36), has produced contradictory results, and further
studies are required to unequivocally demonstrate a role in
replication.
* Corresponding author. Mailing address: Department of Biological
Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom.
Phone: 44 24765 74183. Fax: 44 24765 23701. E-mail: d.j.evans
† Supplemental material for this article may be found at http://jvi
▿ Published ahead of print on 9 July 2008.
Functional analysis of the SL9266 CRE and related RNA
structures in the NS5B coding region necessitates the intro-
duction of mutations that leave the underlying coding se-
quence intact. The restriction of mutagenesis to synonymous
substitutions naturally places some limits on the substitutions
that can be tested. However, Friebe and colleagues (8) have
demonstrated that SL9266 can be functionally moved to the 3'
NCR, albeit with a reduction in replication efficiency. This
suggests that the function of this structure is at least partially
position dependent but does allow more extensive mutagenic
studies. The position dependence could be due to a require-
ment for a spatially dependent interaction with another region
of the virus genome; indeed, they have demonstrated a func-
tionally required kissing loop (tertiary RNA structure) inter-
action between the terminal unpaired loop of SL9266 and SL2
in the X tail of the 3' NCR (8).
We have developed novel bioinformatic strategies to detect
phylogenetically conserved long-range RNA-RNA interac-
tions. These approaches are based upon well-established and
accepted thermodynamic methodologies but extend them to
take advantage of the wealth of sequence data available for
HCV. Using this information, we have investigated the struc-
ture and function of SL9266. We demonstrate that the rela-
tively weak prediction of SL9266 using standard bioinformatic
methods can be explained by the structure adopting an addi-
tional alternative and potentially metastable pairing with se-
quences situated approximately 200 nt upstream. Mutagenesis
of the two interacting sequences provides genetic support for
the interaction and also demonstrated some sequence specificity
within SL9266. Duplex formation with the upstream sequences
and the 3' X tail involves distinct regions of SL9266,
and the revised model presented here does not preclude the
existence of a combined kissing loop interaction with SL2 in
the 3' untranslated region and a pseudoknot interaction of the
CRE bulge sequence upstream to form a complex long-range
MATERIALS AND METHODS
Sequence alignments. Sequence data sets were initially compiled from all
available epidemiologically unlinked variants of all six genotypes (those showing
>2% sequence divergence from each other) that were >95% complete between
nucleotide positions 9001 and 9377 and a second set between nucleotide posi-
tions 8204 and 9377. Nucleotides were numbered with reference to the H77
complete genome sequence, GenBank accession no. AF011753 (15). Represen-
tative subsets of sequences within each alignment were used for RNA structure
determination. GenBank accession numbers of analyzed sequences are provided
in the supplemental material.
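The selection criteria described above can be sketched in code. The following is a minimal illustration only, assuming that divergence is measured as the uncorrected proportion of differing sites (p-distance) over alignment columns resolved in both sequences, and that completeness is the fraction of columns holding an unambiguous base. The thresholds (2% and 95%) are taken from the text; the function names and the greedy selection order are hypothetical and are not the procedure used to compile the data sets.

```python
BASES = "ACGTU"


def completeness(seq):
    """Fraction of alignment columns that hold an unambiguous base."""
    return sum(base in BASES for base in seq.upper()) / len(seq)


def p_distance(a, b):
    """Proportion of differing sites among columns resolved in both sequences."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in BASES and y in BASES]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def select_unlinked(aligned, min_divergence=0.02, min_complete=0.95):
    """Greedily keep sequences that are sufficiently complete and that
    diverge from every sequence already kept (assumed strategy)."""
    kept = {}
    for name, seq in aligned.items():
        if completeness(seq) <= min_complete:
            continue  # too many gaps or ambiguous positions
        if all(p_distance(seq, other) > min_divergence
               for other in kept.values()):
            kept[name] = seq
    return kept
```

Because the selection is greedy, the retained subset depends on the input order; a clustering-based choice of representatives would be an equally reasonable assumption.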
Stem-loop nomenclature. Several methods have been used to describe stem-
loops in NS5B and elsewhere in the HCV genome (16, 32, 36). Following the
adoption of a standardized system for numbering HCV sequences (15), it had
been proposed that stem-loops are numbered based on the position of the first
5' paired base in the structure (16a). Accordingly, stem-loops previously referred
to as 5BSL1, 5BSL2, 5BSL3.1 to 5BSL3.3 (36), SLIV to SLVII (16) or SL8828,
SL8926, SL9011, SL9061, and SL9118 (16, 31, 32) are redesignated as SL9033,
SL9132, SL9217, SL9266, and SL9324, respectively, in the current study. Likewise,
SL2 in the 3' X tail is renumbered SL9571.
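For readers cross-referencing earlier reports, the renaming listed above can be written as a simple lookup. This is a sketch for convenience only: the dictionary reproduces the pairings given in the text, while the helper names and the key used for the X-tail stem-loop are hypothetical. The SL-IV to SL-VII series is omitted because the text does not give its full one-to-one mapping.

```python
# Previous designation -> unified designation (position of the first
# 5' paired base in the H77 reference numbering), as listed in the text.
RENAMED = {
    "5BSL1": "SL9033",
    "5BSL2": "SL9132",
    "5BSL3.1": "SL9217",
    "5BSL3.2": "SL9266",
    "5BSL3.3": "SL9324",
    "SL8828": "SL9033",
    "SL8926": "SL9132",
    "SL9011": "SL9217",
    "SL9061": "SL9266",
    "SL9118": "SL9324",
    "3'X SL2": "SL9571",  # hypothetical key for SL2 in the 3' X tail
}


def unified_name(previous):
    """Return the unified name, or the input unchanged if it is not listed."""
    return RENAMED.get(previous, previous)


def first_paired_position(unified):
    """Extract the H77 coordinate encoded in a unified name such as 'SL9266'."""
    return int(unified[2:])
```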
RNA structure prediction. RNA structures were predicted using MFOLD
through the web interface at http://frontend.bioinfo.rpi.edu/applications/mfold/.
Automated analysis of most energetically stable RNA structures was performed
using the program StructureDist v. 1.3 (available at http://www.picornavirus.org/).
SFOLD analysis was conducted using the program Srna on the server at
http://sfold.wadsworth.org/srna.pl. PFOLD analysis used the web interface at
http://www.daimi.au.dk/~compbio/rnafold/. All programs were run with default
parameters.
Cell culture, plasmids, and mutagenesis. Monolayers of the human hepatoma
cell line Huh7 (kindly provided by R. Bartenschlager) were maintained in Dul-
becco’s modified minimal essential medium (DMEM) (Invitrogen) supple-
mented with 10% fetal bovine serum, 1% nonessential amino acids, 100 U
penicillin/100 µg streptomycin, and 2 mM L-glutamine (Invitrogen) (DMEM
P/S). Cells were passaged after treatment with trypsin-EDTA and seeded at a
dilution of 1:3 to 1:5.
The parental, genotype 1b, neomycin-encoding replicon, designated
pFK-I389neo/NS3-3'/wt, was generously provided by R. Bartenschlager and has been
fully described by Lohmann et al. (18). The cDNA was modified by the intro-
duction of a previously described cell culture adaptive change of serine for
isoleucine as residue 2204 of the polyprotein (1). A derivative replicon, desig-
nated pFKnt341-sp-PI-lucEI3420-9605/5.1, expressing a firefly luciferase re-
porter gene (kindly provided by GlaxoSmithKline, United Kingdom) consisted
(5' to 3') of the HCV 5' NCR, a 63-nt spacer, the poliovirus IRES, and luciferase
gene, followed by an encephalomyocarditis virus IRES, the NS3-NS5B coding
region, and 3' NCR of HCV. Derivatives of both replicons carrying substitutions
(GDD to GND) of the active site of the NS5B RNA-dependent RNA polymer-
ase were used as controls where appropriate.
All site-directed mutagenesis was conducted on a unique SpeI-XhoI fragment
(nt 5582 to 8005), subcloned in pBluescript II SK(?), using Stratagene
QuikChange site-directed mutagenesis. All mutations were detected and con-
firmed by sequencing, rebuilt into the appropriate subgenomic replicon, and
Substitution of SL9266 with the analogous sequence of other HCV genotypes
was achieved using a cassette system. Briefly, a 528-nt KpnI-SpeI fragment
spanning SL9266 was subcloned into pBluescript II SK(?) (Invitrogen) and used
as a template for PCR with primers BsmBI-1F (GCGTCTCTGTTCATGTGG
TGCCTACTCC) and BsmBI-2R (GCGTCTCTTAACCAGCAACGAACCAG
CT). The blunt ends of the reaction product were ligated to create a plasmid in
which SL9266 was precisely replaced with a stuffer fragment containing two
BsmBI restriction sites. This cassette vector was cleaved with BsmBI and ligated
with complementary oligonucleotides for the stem-loop sequences from other
genotypes. The sequences are illustrated below in Fig. 6A. After sequencing, the
KpnI-SpeI fragment was rebuilt into pFK-I389neo/NS3-3'/wt.
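The design of the cassette primers can be checked with a short script. This sketch assumes the standard cut geometry of BsmBI, a type IIS enzyme that recognizes 5'-CGTCTC-3' and cuts 1 nt downstream on that strand and 5 nt downstream on the complementary strand, leaving a 4-nt 5' overhang. The primer sequences are those given above; the function name is hypothetical, and the reported overhangs are derived from that geometry rather than stated in the text.

```python
BSMBI_SITE = "CGTCTC"

PRIMERS = {
    "BsmBI-1F": "GCGTCTCTGTTCATGTGGTGCCTACTCC",
    "BsmBI-2R": "GCGTCTCTTAACCAGCAACGAACCAGCT",
}


def bsmbi_overhang(primer):
    """Return (site start index, 4-nt overhang read 5'->3' on the primer
    strand), or None if the primer carries no BsmBI site."""
    start = primer.find(BSMBI_SITE)
    if start < 0:
        return None
    # The cut on this strand falls 1 nt past the recognition site;
    # the next 4 nt form the single-stranded overhang.
    cut = start + len(BSMBI_SITE) + 1
    return start, primer[cut:cut + 4]


for name, sequence in PRIMERS.items():
    print(name, bsmbi_overhang(sequence))
```

Because the recognition site lies outside the cut position, digestion removes the sites together with the stuffer, which is consistent with the precise replacement described above.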
In vitro RNA transcription and replicon analysis. One microgram of ScaI-
linearized replicon cDNA was used as the template for the production of RNA
in vitro using a T7 MEGAscript kit (Ambion) according to the manufacturer’s
instructions. RNA was purified using RNeasy (Qiagen), the integrity of the RNA
was confirmed by agarose gel electrophoresis, and the RNA was quantified
Huh7 cells were transfected by electroporation. Briefly, 400 µl of trypsinized,
washed Huh7 cells at 1 × 10⁷ cells/ml in phosphate-buffered saline (PBS) was
mixed with 5 µg in vitro-transcribed RNA in a prechilled 4-mm cuvette, pulsed
once (25 milliseconds, 250 V, 950 µF, square wave) using a Bio-Rad Gene Pulser
Xcell unit, and transferred into 100-mm dishes with 10 ml of DMEM P/S added.
After 24 h of culture at 37°C, the medium was replaced with medium supplemented
with 500 µg/ml G418 (Geneticin, G418 sulfate; Invitrogen), and the
medium was changed at 2- to 3-day intervals for the duration of the selection
period. G418-resistant colonies were washed with PBS, fixed with 4% formalde-
hyde, and visualized with Giemsa stain after about 3 weeks.
Luciferase-encoding replicon RNA (10 µg) was transfected into Huh7 cells as
described previously and transferred into 20 ml of DMEM P/S, and 4 ml was
placed in five wells of a six-well dish. At each time point (4, 24, 48, and 72 h
posttransfection), cells in one well were washed with PBS, lysed with 0.5 ml
Glo lysis buffer (Promega) and stored frozen before analysis using the Bright-
Glo luciferase assay system (Promega) and quantified on a Turner TL-20
RESULTS
Synonymous substitutions within SL9266 define a cis-acting
replication element. We investigated the role of base pairing
within the SL9266 stem-loop structure by introducing a limited
number of nucleotide substitutions to the region. For each of the
six mutants, designated SL9266mut1 to SL9266mut6 and hereaf-
ter referred to as mut1 to mut6, respectively, the modifications
were at synonymous sites and were generated in a neomycin-
encoding subgenomic replicon (Fig. 1). MFOLD analysis (data
not shown) indicated that the mutations introduced in mut1,
mut3, mut4, and mut6 probably disrupted the predicted structure
of SL9266, but that introduction of the mutations in mut2 and
mut5 had no structural consequences, being restricted to the un-
paired terminal loop region. RNA generated in vitro was trans-
fected into Huh7 cells and analyzed in a G418 transduction assay.
Two of the six mutants analyzed, mut3 and mut5, generated col-
ony numbers consistent with replication levels at, or near, that of
the positive control. The remaining four mutants (mut1, mut2,
mut4, and mut6) failed to yield significant numbers of colonies
substitutions involving the 5' side of the lower duplex in SL9266
(mut1 [Fig. 1C]) or the terminal loop (mut2) were lethal, presumably
reflecting a requirement for stable base pairing in the former
or interaction with the 3' X tail in the latter, and were in agree-
Substitution of A9281U (mut5) alone did not impair replication,
again consistent with other studies (see A68C in Fig. 7 of refer-
ence 36), and appeared to complement the otherwise lethal sub-
stitution of U9296A (compare mut3, mut5, and mut6 in Fig. 1C
and D). Although the nonviable phenotype of mut4 could prob-
ably be ascribed to the U9296A mutation disrupting the upper
duplex of SL9266, two other substitutions in this mutant were
located in the unpaired 3' bulge loop (Fig. 1B and C). A potential
functional role for this region of SL9266, also hinted at by the
currently unexplained lack of viability of replicons bearing mutations
of C9303 and/or A9305 (designated C90A and A92G in Fig. 7
of reference 36), prompted us to investigate additional features of
SL9266 and possible interactions of the unpaired regions of the
CRE with flanking RNA sequences.
FIG. 1. SL9266 is a cis-acting replication element in hepatitis C virus. (A) The genetic organization of the hepatitis C subgenomic replicon
expressing either a luciferase reporter gene or neomycin selection marker is shown, together with an indication of the location of SL9266 in the
region encoding the C terminus of NS5B. EMCV, encephalomyocarditis virus. (B) The thermodynamically predicted structure of SL9266.
(C) Genetic analysis of synonymous mutations introduced to subgenomic replicons. The sequence of SL9266 is shown with the third “wobble”
position of each triplet underlined. Underneath the top sequence, the locations of individual mutations (mut1 to mut6) are shown, together with
their phenotype (+, growth; −, no growth) after G418 selection. The shaded boxes joined by horizontal brackets and lines indicate the duplex
regions (lower [pale shading] and upper [dark shading]) of SL9266. (D) The phenotypes of SL9266 neomycin-encoding replicon mutants mut1 to
mut6 in a G418 selection assay. +ive, positive control; pol−, defective polymerase negative control.
RNA secondary structure prediction. Previous comparative
analysis of minimum free energy structures of the NS5B region of
HCV revealed a series of evolutionarily conserved stem-loops
spanning the terminal 700 bases of the coding sequence (Fig. 2A)
(16, 31, 32, 36). Using an automated method (StructureDist) to
quantify the frequencies of concordant and discordant pairings at
individual sites for pairwise comparison of structure predictions
for each sequence (31), substantial variability in the degree of
FIG. 2. Stem-loop structures in the NS5B-encoding region of HCV. (A) Predicted RNA secondary structures in the terminal 350 bases of the HCV
coding sequence (in NS5B). Structures were numbered according to their position in the H77 reference sequence, using standard nomenclature for
stem-loops (see Materials and Methods). (B) Frequencies of concordant pairing (left-hand y axis) predictions and predicted unpaired bases (right-hand
y axis) at each nucleotide position (x axis) in pairwise comparisons of the most energetically favored RNA structures predicted by MFOLD (38) for a set
of 150 sequences representative of HCV genotypes 1 to 6. Frequencies were compiled using StructureDist v.1.3 (31). The location of each of the five
predicted stem-loop structures is indicated above the graph. The location of the alternative upstream paired region is indicated as a black bar labeled Alt.
conservation of the stem-loops was found between HCV geno-
types (Fig. 2B). Similar variability was observed within sets of
approximately equally energetically favored structure predictions
for individual sequences (data not shown). The most highly con-
served predicted stem-loop was SL9324, while SL9266 was the
least conserved. Lack of conservation of the latter structure was
unexpected and relevant to the investigation of its demonstrated
role as a CRE (8, 16, 36).
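The idea of scoring concordant pairings across pairwise comparisons of structure predictions can be illustrated as follows. This is a simplified sketch, not the StructureDist implementation: it assumes that each prediction is supplied as a dot-bracket string of equal length on shared alignment coordinates, that a site is concordant in a comparison when it is paired to the same partner in both structures, and that pseudoknots are absent. All function names are hypothetical.

```python
from itertools import combinations


def pair_table(dot_bracket):
    """Map each position to the index of its partner, or None if unpaired."""
    partners = [None] * len(dot_bracket)
    stack = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partners[i], partners[j] = j, i
    return partners


def concordance_profile(structures):
    """Return per-site (concordant pairing frequency, unpaired frequency).

    Concordance is averaged over all pairwise comparisons of structures;
    the unpaired frequency is averaged over the structures themselves.
    """
    tables = [pair_table(s) for s in structures]
    length = len(tables[0])
    comparisons = list(combinations(tables, 2))
    concordant = [0] * length
    for a, b in comparisons:
        for pos in range(length):
            if a[pos] is not None and a[pos] == b[pos]:
                concordant[pos] += 1
    concordant_freq = [count / len(comparisons) for count in concordant]
    unpaired_freq = [sum(t[pos] is None for t in tables) / len(tables)
                     for pos in range(length)]
    return concordant_freq, unpaired_freq
```

A region such as SL9266, whose predicted pairings differ between sequences, would show low concordance in such a profile without necessarily showing a high unpaired frequency, since the bases may simply be paired to different partners.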
To investigate whether there were alternative RNA structures
or pairings underlying this observed lack of conservation of
SL9266, RNA structures were predicted for 26 NS5B nucleotide
sequences, with each representing different (up to four) subtypes
within the six genotypes of HCV using the program SFOLD. This
generates a statistical sample of secondary structures from the
Boltzmann ensemble of RNA secondary structures using Turner
free energy rules (7). The relatedness of structures to each other was
determined using the Diana method for cluster analysis followed
by calculation of the Calinski and Harabasz index to determine
the optimal number of centroids for which consensus structures
can be calculated, as previously described (6). Each sequence
submitted to SFOLD analysis generated between two and six
centroids, whose consensus structure predictions were compared
to the previously described RNA structure for the NS5B region
(Fig. 3). Despite the variability in RNA pairings between cen-
troids for individual sequences of the wide range of genotypes
analyzed, four of the five stem-loops were frequently (SL9033,
SL9132, and SL9217) or invariably (SL9324) found among sam-
pled structures, generally containing equivalent pairings to the
predicted structures for HCV genotype 1a (black filled boxes) or
pairings restricted to bases around the terminal loop (gray filled
boxes). However, consistent with the more variable structure pre-
dictions in this region visualized by StructureDist (Fig. 2B), only
around a third (26) of the 71 consensus structures of the centroids
contained pairings that matched those of SL9266 (Fig. 3). Alter-
native structures for this region frequently retained the pairing of
the terminal stem and loop (bases 9274 to 9297), but with a partially
overlapping longer range pairing of bases forming the bulge
with upstream predicted unpaired regions in NS5B. In approxi-
mately one half of these alternative structures (labeled A in Fig.
3), bases between 9296 and 9306 formed a duplex with the pre-
dicted unpaired bases between structures SL9033 and SL9132
(bases 9106 to 9123). Analogous alternative conformations were
found in predicted structures for all six genotypes of HCV and
frequently alternated with the standard structure in the Boltz-
mann ensemble for individual sequences. Similar frequencies of
standard and alternative pairings were observed when longer sequences
spanning position 8301 to the 3' NCR were analyzed by
SFOLD (data not shown).
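Why two conformations can alternate within a Boltzmann ensemble follows from the weighting itself: a structure with free energy ΔG contributes in proportion to exp(−ΔG/RT), so two structures of similar stability are sampled at comparable frequencies. The sketch below illustrates this for a two-state case. The free energy values are hypothetical and are not taken from the SFOLD output, the temperature is assumed to be 37°C, and a real ensemble sums over all possible structures rather than two.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(free_energies, temperature_k=310.15):
    """Convert free energies (kcal/mol) into Boltzmann probabilities,
    normalized over only the conformations supplied."""
    rt = R_KCAL * temperature_k
    lowest = min(free_energies.values())
    # Shifting by the lowest energy avoids overflow and leaves the
    # normalized probabilities unchanged.
    weights = {name: math.exp(-(dg - lowest) / rt)
               for name, dg in free_energies.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical energies for illustration only
example = {"standard SL9266": -20.0, "alternative pairing": -19.5}
for name, probability in boltzmann_probabilities(example).items():
    print(f"{name}: {probability:.2f}")
```

With a difference of only 0.5 kcal/mol, neither state dominates, which is the situation the clustering into multiple centroids is designed to expose.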
StructureDist and SFOLD use free energy minimization al-
gorithms (e.g., MFOLD) to predict candidate RNA structures.
Given the poor resolution of RNA structure in the HCV CRE,
we used an independent, non-energy-minimizing algorithm
that makes better use of the substantial comparative sequence
information available for HCV (14, 24). The method, imple-
mented as PFOLD, combines an explicit evolutionary model of
RNA sequences with a probabilistic model for secondary struc-
tures. A stochastic context-free grammar is used to produce a
prior probability distribution of RNA structures. For the anal-
ysis, a set of 40 NS5B sequences between positions 9001 and
9377 from genotype 1b and further sets of 20 sequences from
genotypes 1, 2, 3, 4, and 6 containing as diverse a range of
subtypes as possible, were analyzed individually and in combi-
nation by PFOLD (Fig. 4 and 5). For the set of genotype 1b
sequences, pairing predictions corresponded to those of the
standard structure, with robust prediction of SL9266 (upper
left half in Fig. 4) and the four other stem-loops predicted for
NS5B. Similar results were obtained for pairing predictions of
alignments of each genotype individually (Fig. 5A). Intrigu-
ingly, analyzing the combined data set of all five genotypes
produced a distinct pairing for the HCV CRE corresponding
to the alternative pairing found by SFOLD (lower right, la-
beled Alt in Fig. 4). By analyzing alignments of each combi-
nation of two, three, and four genotypes, a relationship was
found between sequence diversity and frequency of detection
of standard and alternative RNA structures (Fig. 5A). Repre-
sentative comparisons of duplexes formed in the alternative
pairing for a range of HCV genotypes are shown in Fig. 5B.
The region of maximum potential interaction (positions 9121
to 9107 with positions 9291 to 9305, see the bar graph at the
bottom of Fig. 5B) can be divided into two areas, a less well
conserved region (on the left in Fig. 5B) involving sequences
already implicated in forming the 3' side of the upper duplex of
SL9266 and a highly conserved block of five nucleotides cen-
tered around positions 9110 and 9302. To functionally test the
relevance of the predicted alternative pairing, we undertook
further mutagenesis studies.
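The potential for duplex formation between two regions of equal length, such as the two 15-nt stretches described above, can be scored by aligning them antiparallel and counting permitted base pairs. This is a minimal sketch under stated assumptions: Watson-Crick and G-U wobble pairs are both counted, no bulges or mismatches are modeled, and the sequences shown are hypothetical placeholders rather than HCV sequence. The function names are illustrative.

```python
# Watson-Crick pairs plus the G-U wobble pair
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"),
           ("C", "G"), ("G", "U"), ("U", "G")}


def duplex_pairs(upstream, downstream):
    """Pair the upstream region (5'->3') against the downstream region
    read 3'->5', returning (base, base, can_pair) for each position."""
    if len(upstream) != len(downstream):
        raise ValueError("regions must be the same length")
    return [(x, y, (x, y) in ALLOWED)
            for x, y in zip(upstream.upper(), reversed(downstream.upper()))]


def paired_fraction(upstream, downstream):
    """Fraction of positions able to form a permitted base pair."""
    pairs = duplex_pairs(upstream, downstream)
    return sum(ok for _, _, ok in pairs) / len(pairs)


# Hypothetical sequences for illustration only
print(paired_fraction("GGCAU", "AUGCC"))
print(paired_fraction("GGCAU", "AUGCA"))
```

Applied across an alignment, the same count identifies covariant sites, where both partners differ between genotypes but pairing is preserved.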
Substitution of SL9266 with the analogous region of alter-
nate genotypes. Of the two previously defined interactions of
SL9266, one is local, forming the interrupted base pairing of
FIG. 3. SFOLD analysis of HCV NS5B sequences. Numbers of
consensus structures in 72 centroids generated by SFOLD from a total
of 26 HCV NS5B sequences (positions 9001 to 9377) corresponding to
standard stem-loop structures (Fig. 2A) (filled black) or containing
partial structure (filled gray). The frequencies of alternative pairings of
the 3' side of SL9266 to upstream sequences are shown by the Alternative
(A) and Other (O) boxes.
the CRE (16, 36), whereas the second is long-range, involving
an interaction with the X tail SL2 (8). Within SL9266, the
nucleotides in the terminal loop that base pair with the 3' NCR
are very highly conserved (8). Similarly, sequences occupying
the bulge loop of SL9266 are highly conserved, whereas those
forming the upper and lower duplexes show more variability.
This accounts for the different levels of conservation of base
pairing between the left- and right-hand sides of the interaction
with the upstream sequences depicted in Fig. 5B.
Assuming SL9266 folds similarly in each genotype of HCV, we
reasoned that replacement of SL9266 in the subgenomic rep-
licon (genotype 1b) with the analogous structures from other
HCV genotypes might allow us to determine whether just some
or all of the sequences between positions 9291 and 9305 were
also involved in the alternative pairing we predict.
Using a BsmBI-based cassette system (see Materials and
Methods), we precisely replaced the regions between nucleo-
tides 9266 and 9312 with complementary oligonucleotides cor-
responding to the analogous sequences of other genotypes of
HCV. Inevitably, due to the sequence variation inherent in
HCV, this strategy resulted in changes to the encoded NS5B
polypeptide sequence (Fig. 6A). All modifications were made
in a neomycin-expressing replicon that, in parallel with appro-
priate controls, was independently transfected into Huh7 cells
and selected with G418. Of the eight substitutions made, five
were tolerated well, generating approximately equivalent col-
ony numbers to the positive control after G418 selection. The
remaining three substitutions of genotypes 3b (Tr), 4a (ED43),
and 6g (JK046) produced markedly reduced colony numbers,
indicating that the modifications introduced within SL9266
were incompatible with replication.
It seemed unlikely that the differences in the replication
phenotypes of the chimeric replicons were due to introduction
of incompatible residues into the NS5B polypeptide, with the
possible exception of the genotype 3b (Tr) sequence. The latter
contains two amino acid substitutions (G558N and P569S; Fig.
6A) not present in the other sequences analyzed. In the re-
maining genotype swaps, amino acid substitutions were re-
stricted to just three residues of NS5B, with both viable and
nonviable chimeric replicons containing the same changes, im-
plying that they alone do not account for the phenotype. For
example, the replication-deficient replicon containing geno-
type 4a (ED43) sequences has substitutions at positions 556,
564, and 566; of these, S556G is in genotype 2b (HCJ8), L564M
is in genotype 5a (EUH1480), and R566H is in genotype 1a
(HP-H), all of which are replication competent. Therefore,
unless particular combinations of these changes are deleteri-
ous, it seemed probable that the poor replication of genotypes
6g (JK046) and 4a (ED43) must be mediated at the level of
RNA, either by disruption of an RNA-RNA interaction, or
alteration of a sequence motif bound by a cellular or viral
protein.
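The reasoning in this paragraph is essentially a set comparison. As a minimal sketch, using only the substitutions named in the text (the complete residue lists for each chimera are in Fig. 6A and are not reproduced here; the function and variable names are illustrative assumptions), one can check that each NS5B change in the nonviable genotype 4a (ED43) chimera also occurs in at least one viable chimera:

```python
# Substitutions named in the text only; each chimera may carry other
# changes (see Fig. 6A) that are deliberately not guessed at here.
nonviable_ed43 = {"S556G", "L564M", "R566H"}
viable_chimeras = {
    "2b (HCJ8)": {"S556G"},
    "5a (EUH1480)": {"L564M"},
    "1a (HP-H)": {"R566H"},
}


def tolerated_elsewhere(substitutions, viable):
    """Map each substitution to the viable chimeras that also carry it."""
    return {
        sub: sorted(name for name, subs in viable.items() if sub in subs)
        for sub in sorted(substitutions)
    }


shared = tolerated_elsewhere(nonviable_ed43, viable_chimeras)
# True when every substitution is individually present in a viable chimera.
all_tolerated = all(shared.values())

for sub, hosts in shared.items():
    print(sub, "also found in", ", ".join(hosts))
print("each change tolerated individually:", all_tolerated)
```

This check only addresses the substitutions one at a time; as noted in the text, it cannot exclude a deleterious combination of the three changes.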
Replication competence of the chimeric replicon did not
correlate directly with either invariant or covariant (underlined
in Fig. 6A) base pairing within the upper duplex region of
SL9266 or the covariation within the alternative interaction
with the upstream sequence (in bold type in Fig. 6A). For
example, genotype 1a (HP-H) and 4a (ED43) replicons were
identical to the control 1b replicon in the upper duplex of
SL9266, but only the former could replicate. Similarly, the
genotype 6g (JK046) replicon contains two compensating
changes in the upper duplex but cannot replicate, whereas
genotype 6a (EUHK2) and 5a (EUH1480) replicons had the
same covariance in the upper duplex and were replication
competent. Within the region forming the bulge loop of
SL9266, none of the chimeras changed the highly conserved
5′-GCCCG motif. However, of the six that contained variation
within this region of SL9266 (namely, genotype 1a [GLA],
genotype 1a [HP-H], genotype 2b [HCJ8], genotype 4a
[ED43], genotype 6a [EUHK2], and genotype 6g [JK046]), two
of the nonviable chimeras with genotype 4a (ED43) and geno-
type 6g (JK046) lacked any covariant changes within this re-
gion, whereas the genotype 1a (GLA), genotype 1a (HP-H),
and genotype 6a (EUHK2) chimeras all contained at least one
covariant substitution that could be involved in base pairing to
the upstream sequence (highlighted in bold type in Fig. 6A).
All chimeras also introduced covariant changes at C9291 (to A
or G), the 5′ nucleotide within the SL9266 sequence that could
pair with U9121 (Fig. 5B and 6A), though there was no
correlation between the viability of the replicon and the
particular substitution at this position.
Results obtained with the chimeric replicons suggested that
the RNA-RNA interactions within SL9266 and the proposed
alternative upstream pairing were nontrivial. We therefore
specifically examined the upstream interaction in a more fo-
cused manner by further site-directed mutagenesis.
Critical interactions between SL9266 and the upstream se-
quence. Mutations were introduced singly or in combination
into SL9266 or the upstream sequence located around nt 9110.
FIG. 4. PFOLD analysis of HCV NS5B sequences. Coordinates
(dot plot) of pairing predictions for consensus structures predicted for
alignments of HCV genotype 1b sequences (top left) or HCV geno-
types 1 to 6 (bottom right) using PFOLD. The size of the dot depicts
the reliability of the pairing prediction. The positions of standard
predicted structures and base pairing forming the alternative RNA
structure (Alt) are shown as gray filled ellipses.
VOL. 82, 2008 LONG-RANGE CRE INTERACTION IN HCV 9013
In each instance, substitutions were selected to leave the en-
coded NS5B polypeptide unchanged, thereby excluding the
possibility that the resulting phenotype was due to the intro-
duction of an incompatible amino acid into the virus polymer-
ase. The majority of the mutations introduced were within the
SL9266 subterminal bulge loop or the complementary se-
quence around nt 9110, though additional changes were also
made in the sequences implicated in forming the 3′ side of the
upper duplex in SL9266. These, or the complementary changes
3′ to nt 9110, were designed to test the extent of the alternative
interaction proposed by our bioinformatic analysis.
In the upstream sequence (Fig. 7A, left-hand panel),
substitutions at C9108 and G9110 were incompatible with replication,
whereas U9107C, C9113A, and the combined changes A9114C and
A9116U (with or without C9113A) were tolerated well. Within the
sequences that contribute to the upper duplex or bulge loop of
SL9266 (Fig. 7A, right-hand panel), substitutions of U9296A, alone
or in combination with U9299G and C9303A, prevented replication.
This phenotype is presumably attributable to the change at U9296,
which disrupts the stability of the upper duplex. Of the other
single substitutions constructed, only U9299G had no impact on
replication, with changes of C9302 and C9303 all preventing
colony formation in the G418 transduction assay.
Mutations in the upstream and SL9266 regions were also
combined to test whether complementary substitutions could
restore the replication phenotype to resemble that of the pa-
rental replicon (Fig. 7B). In addition, combinations of substi-
tutions were introduced to determine the influence of increas-
ing the potential hydrogen bonding between the upstream
region and SL9266 sequences. Of the combinations
constructed, four that restored the predicted ability to base pair
G9110 and C9302 all generated significant numbers of
G418-resistant colonies after transduction and selection. The
demonstration that individual substitutions of G9110 or C9302 that
disrupted the predicted base pairing prevented replication,
whereas all but one in which duplex formation could occur
(summarized in Fig. 7C) were replication competent, provides
strong support for the interaction of these regions. Double
substitution of nucleotides G9110U and C9303A did not restore
replication capacity. Furthermore, all combinations of
mutations that included U9296A were incapable of replicating (Fig.
7B); this included substitutions at nt 9113, 9114, and 9116, the
addition of which significantly increased the potential for
hydrogen bonding between the upstream and SL9266 sequences.
This result suggested that disruption of the upper duplex of
SL9266 by U9296A could not be compensated for by
strengthening the predicted interaction with upstream sequences.
FIG. 5. Alternative interactions of SL9266 sequences in a range of HCV genotypes. (A) Frequencies of RNA structure prediction by PFOLD
corresponding to the standard model or containing the alternative pairing. The x axis records the number of different genotypes in each alignment;
the numbers above the bars record the number of different genotype combinations tested by PFOLD. For example, there are 10 possible
combinations of the five genotypes tested, all of which were analyzed, and these results are presented in the second column (the column with 2
for the number of genotypes) of the graph. (B) Comparison of duplexes formed in the alternative pairing for representative sequences of HCV
genotypes 1 to 6. Genomic numbering for upstream and downstream bases is shown at the top and bottom of the figure, respectively. The locations
of known interactions of genotype 1b SL9266 are indicated at the top of the figure; KL indicates the location of sequences forming a kissing loop
interaction with the 3′ X tail (8), and SL9266 Upper and SL9266 Lower indicate the 3′ side of the upper and lower duplexes of SL9266. The gray
block highlights the area of maximal conserved base pairing (nucleotides 9291 to 9305 and 9121 to 9107; indicated in a simple bar chart at the bottom
of the figure, each bar representing a single nucleotide in the aligned sequences) forming the predicted alternative interaction of sequences within
SL9266 and the upstream region.
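The compensatory-mutation logic applied to nt 9110 and 9302 can be sketched in a few lines. This is an illustration only: it classifies the potential pair left by a given set of substitutions and makes no prediction about replication. The wild-type bases (G9110, C9302) and the mutant names are taken from the text; the function names and the separate "wobble" category for G-U pairs are assumptions made for the sketch.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}

# Wild-type bases at the two positions tested (genotype 1b replicon).
WILD_TYPE = {9110: "G", 9302: "C"}


def classify_pair(upstream, downstream):
    """Return the kind of pair two bases could form, if any."""
    if (upstream, downstream) in WATSON_CRICK:
        return "watson-crick"
    if (upstream, downstream) in WOBBLE:
        return "wobble"
    return "none"


def pair_after(mutations):
    """Apply substitutions such as 'G9110U' and classify the 9110/9302 pair."""
    bases = dict(WILD_TYPE)
    for name in mutations:
        ref, pos, alt = name[0], int(name[1:-1]), name[-1]
        if bases.get(pos) != ref:
            raise ValueError(f"{name} does not match the wild-type base")
        bases[pos] = alt
    return classify_pair(bases[9110], bases[9302])


for mutant in ([], ["G9110U"], ["C9302A"], ["G9110U", "C9302A"]):
    print(mutant or "wild type", "->", pair_after(mutant))
```

Single substitutions such as G9110U or C9302A leave no potential pair, whereas the G9110U C9302A double mutant restores a U-A pair, which is the pattern the colony assays were designed to test.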
FIG. 6. Exchange of SL9266 with the analogous region of other genotypes of HCV. (A) The SL9266 nucleotide sequence is shown (left)
together with the nucleotide differences introduced by exchange with the sequences from a range of genotypes indicated. At the top and
emphasized with a dark shaded box is the kissing loop interaction between the terminal loop of SL9266 and the 3′ NCR (8). At the bottom and
highlighted by a pale shaded box is the predicted interaction between SL9266 and upstream sequences centered around nt 9110. Underlined
nucleotides in the SL9266 or upstream sequences indicate the third base “wobble” position of codons. The upper and lower duplexes that form
SL9266 are indicated by horizontal joined brackets (see also Fig. 1C). Nucleotides underlined in the alternative genotype sequences retain the
ability to form these duplexes. Nucleotides in bold type within the dark shaded box retain (or acquire) the potential to base pair with the upstream
sequence. The phenotypes (+, growth; −, no growth) and genotypes (genotype 1b, the parental positive control [+ive]) are shown to the right of
the SL9266 nucleotide sequences. The NS5B amino acid sequences altered by exchange of SL9266 with the analogous region from other genotypes
are indicated on the right-hand side of the panel. (B) G418 selection assay of SL9266 substitutions for the sequences from the genotypes indicated
(genotypes 1a, 2b, 3b, 4a, 5a, 6a, and 6g).
FIG. 7. Mutational analysis of the alternative interaction of sequences within SL9266. (A) Phenotypes of neomycin-encoding replicons
containing mutations within the upstream region (nt 9107 to 9121; left panel) or within the sequences that form part of SL9266 (nt 9291 to 9305;
right panel). For each named mutant, a photograph of a stained dish after G418 selection is shown next to the sequence indicating the impact on
the alternative interaction predicted bioinformatically. For consistency with other figures, the upstream sequence is the lower sequence depicted.
Substitutions are indicated in bold type, as are additional or changed hydrogen bonding interactions. The total number of hydrogen bonds that
could form between the sequences shown is indicated in the column labeled H. The regions of the SL9266 sequence that form the 3′ side of the
upper duplex of SL9266 are underlined. The positive-control replicon is shown at the top of the figure. GND indicates a control replicon containing
active site mutations within the NS5B polymerase (see Materials and Methods). (B) Phenotypes of neomycin-encoding replicons containing
substitutions in both upstream and SL9266 sequences. (C) Summary of changes made at nt 9110 and 9302. A plus sign indicates a replication
phenotype similar to that of a positive control, a minus sign indicates no apparent replication, and nd indicates that the change was not done.
(D) Replication phenotypes of luciferase-encoding subgenomic replicons bearing mutations at nucleotides 9110, 9113, 9114, 9296, 9299, 9302, 9303,
and combinations thereof. The average of two or three independent repeats at each time point is plotted. Con1b +ve, Con1b, the parental
positive-control replicon; Pol−, defective-polymerase replicon.
The majority of mutations constructed in the neomycin-
encoding replicon were also rebuilt into a replicon carrying a
luciferase reporter gene. Huh7 cells were transfected, and a
time course experiment of luciferase activity over 3 days was
performed (Fig. 7D). Of those tested, the mutants could be
divided into three broadly defined groups. With the exception
of single mutations involving nucleotides G9110, C9302, or C9303,
all the replicons harboring mutations that prevented replica-
tion in the G418 colony-forming assay (Fig. 7A) exhibited a
phenotype similar to that of the negative control (which lacks
an NS5B active site). This group included replicons with the
mutation of U9296A, the double mutations of C9113A plus
U9296A, and all the triple mutations tested. In contrast, repli-
cons that had generated colony numbers similar to that of the
parental 1b replicon (positive control) generated luciferase
activities indistinguishable from the parental luciferase-encod-
ing replicon. These included C9113A, U9299G, and the double
mutant A9114C A9116U. Significantly, this group also included
the double mutant G9110U C9302A (Fig. 7D). The final group
had intermediate phenotypes, exhibiting a steady decline of
luciferase activity over the second and third day of the time
course experiment but at a lower rate than that of the replicons
that resembled the defective-polymerase negative control. Al-
though we tested only a limited representative range of sub-
stitutions predicted to be involved in the highly conserved (Fig.
4 and 5B) upstream interaction, it was notable that all those
exhibiting an intermediate phenotype were from this group.
This included G9110U, C9302A, and C9303A (Fig. 7D). One
explanation for this could be an increase in RNA stability.
However, since this phenotype was observed only in mutants in
which the RNA structure was destabilized, we suspect that the
enhanced translation may be explained by some factor other
than an increase in RNA stability.
Many viral proteins are multifunctional, for example con-
trolling aspects of the virus replication cycle and the intracel-
lular milieu. Increasingly, studies are demonstrating that the
virus genome also has multiple functions, particularly in the
small RNA and DNA viruses where coding capacity is limited.
In the case of the small positive-strand RNA viruses, the ge-
nome must act as a template for both translation and replica-
tion. At least on the input genome, before a pool of progeny
genomes have been generated, these are mutually exclusive
processes. In certain examples, additional functions ascribed to
the RNA genome include subversion of the innate immune
response, temporal and spatial control of the replication pro-
cess, and encapsidation (12, 28, 35, 37). Functional specificity is
provided by the evolutionary conservation of binding determi-
nants, often in a structural context. The accurate prediction of
stem-loop and higher-order structures therefore provides
primary information on key functional domains of the virus
genome.
Well-established thermodynamic methods to predict two-
dimensional RNA structure (e.g., MFOLD; see references 20
and 38) exist; we have extended these methods and imple-
mented them in the program StructureDist to extract the ad-
ditional information present in large data sets of related se-
quences. Using this and an alternative thermodynamic
approach, SFOLD (7), we investigated structures in the termi-
nal 700 nt of the HCV coding region, an area of the genome in
which we had previously identified at least five well-conserved
stem-loop elements (31). One of the five structures predicted,
an interrupted stem-loop starting at nt 9266 (SL9266) shown in
previous studies to be a cis-acting replication element, was only
poorly predicted. An alternative nonthermodynamic method
(PFOLD [see references 14 and 24]) robustly predicted
SL9266 in genotype 1, but analysis of all six genotypes of HCV
indicated a hitherto unsuspected interaction of sequences
within SL9266 and a region located approximately 200 nt in a
5′ direction (Fig. 4).
The finding of poor RNA structure conservation of the HCV
replication element among alternative structures showing
similar folding free energies (StructureDist and SFOLD) may
arise either from an incorrect structure prediction for the HCV
CRE using thermodynamic methods or because there is more
than one (metastable) RNA structure in this region. The
observation that the alternative folding better accommodates
sequence variability between genotypes using PFOLD, even
though the standard structure was predicted for individual
genotypes, provides further evidence for possible alterations in
RNA structure in this genome region. Unfortunately, none of
the structure prediction methods are able to incorporate ter-
tiary RNA structure interactions, such as pseudoknots or kiss-
ing loop interactions, in predicted structure models. These
interactions may have significant stabilizing or destabilizing
influences on the two predicted structures for the HCV CRE.
Variability in prediction outcomes in this study may therefore
result from incomplete prediction of potential pairings in this
region of the HCV genome.
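The limitation regarding pseudoknots can be illustrated with a toy example. The sketch below is a generic Nussinov-style dynamic programme that maximizes the number of nested base pairs; it is not the algorithm used by MFOLD, SFOLD, PFOLD, or StructureDist, and the 12-nt sequence is an invented example, not HCV sequence. It shows only that a recursion restricted to nested pairs cannot count two stems whose pairs cross.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_nested_pairs(seq, min_loop=3):
    """Nussinov-style maximum number of nested (non-crossing) base pairs.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Toy sequence: CCC could pair with GGG and AAA with UUU, but the two
# stems cross (a pseudoknot), so a nested model can count only one of them.
toy = "CCCAAAGGGUUU"
print(max_nested_pairs(toy))  # 3 nested pairs, although 6 crossing pairs exist
```

In this toy case each stem of three pairs is found on its own, but the two stems together are outside the search space, which is the sense in which long-range crossing interactions are missed by nested-structure predictions.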
We investigated the relevance of the two predicted confor-
mations of SL9266 to HCV replication by site-directed mu-
tagenesis of a subgenomic replicon encoding either a neomycin
resistance marker or luciferase reporter gene. The definition of
SL9266 as a functional CRE was supported by limited site-
directed mutagenesis (Fig. 1C and D and Fig. 7A and B).
Disruption of the lower duplex (in mut1) or the sequences
(mut2) implicated in the “kissing loop” interaction with SL2
(now SL9571 [see reference 15]) in the 3′ X tail prevented
replication in agreement with the published results of other
studies (8, 16, 36). Three of the mutants (mut3, mut4, and
mut6) had substitutions of U9296A, a substitution that in our
more extensive mutagenic analysis (Fig. 7) was always incom-
patible with replication. However, our results suggest that the
additional presence of A9281U (compare mut3, mut5, and mut6
in Fig. 1C and D) could somehow compensate for the other-
wise lethal substitution of U9296A. Our present understanding
of SL9266, together with knowledge of interactions of SL9266
with the 3′ untranslated region or the upstream sequences
demonstrated here, does not explain how substitution of 9281
(unpaired in the terminal loop of SL9266) compensates for a
mutation that destabilizes the upper duplex of the CRE.
More extensive modification of SL9266 was achieved by
substituting the entire structure with the analogous region of
other genotypes of HCV. These modifications were intended
to allow the distinction between the importance of interactions
within the SL9266 structure and those involving more distant
sequences. Of the representative genotypes chosen, the se-
quence variation was unevenly distributed within the SL9266
structure, presumably reflecting evolutionary conservation of
certain features. Significantly, all of the introduced sequences
were invariant between nucleotides 9284 and 9290 (inclusive)
in the terminal loop, thereby excluding the possibility that the
resulting phenotypes of the chimeric replicons were due to
disruption of the “kissing loop” interaction with SL9571 in the
3′ NCR (8). Other regions of significant conservation existed
within the 3′ side of the bulge loop (nt 9300 to 9304) and the
central region of each of the two duplexes on either side of the
bulge loop. Unsurprisingly, considering the predicted structure
of SL9266, there was good evidence for covariation within the
region (underlined in Fig. 6A), in particular at nt 9267/9312
and 9275/9296. All but one of the chosen sequences included
an A9281U substitution, and all also carried a change at nt 9291
that created the potential to interact with U9121 in the
upstream region. The resulting phenotypes of replicons in the
G418 transduction assay (Fig. 6B) indicated that there was a
good correlation between the overall level of retained base
pairing—both within SL9266 and between SL9266 and the
upstream sequence around position 9110—and viability of the
chimeric replicon. Chimeras either generated good numbers of
colonies, broadly equivalent in number to the unmodified rep-
licon, or very limited numbers of G418-resistant colonies; the
latter phenotype is consistent with the introduced mutation
being grossly suboptimal for replication, with the appearance
of a limited number of colonies due to the acquisition of one or
more compensatory mutations that restore replicative capacity.
These are considered nonviable without the adaptive changes.
The nonviable chimeras exhibited only 43% (genotype 6g
[JK046]), 40% (genotype 4a [ED43]), or 30% (genotype 3b
[Tr]) covariation, whereas all viable chimeras contained ≥50%
covariation. For example, 70% of the 10 nucleotide changes
between the genotype 1b parental replicon and the genotype
6a (EUHK2) chimera were covariant—5 within duplex regions
of SL9266, at nt 9267, 9268, 9275, 9296 and 9311, and a further
2, at nt 9291 and 9299, with regard to the upstream alternative
interaction proposed here. Although based on a limited sample
size, these results suggest that both the SL9266 CRE and the
interaction of SL9266 sequences with the upstream region
were important for replication. These studies also demon-
strated that there was no absolute requirement for a U at nt
9296; the viable chimeric replicons with genotypes 2b (HCJ8),
5a (EUH1480), and 6a (EUHK2) all had a substitution at nt
9296 but also carried a covariant change at nt 9275 that re-
tained the base pairing in the upper stem of SL9266 (Fig. 6A).
However, base pairing of nt 9275/9296, for example in geno-
type 6g (JK046), was alone not sufficient for replication. In this
chimera, encoding an NS5B polypeptide identical to that en-
coded by the viable genotype 1a HP-H construct (Fig. 6A), it is
presumed that the overall reduced level of conserved base
pairing within SL9266 and between the bulge loop of SL9266
and the upstream sequences rendered the chimera nonviable.
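The covariation percentage quoted for the genotype 6a (EUHK2) chimera follows directly from the counts given in the text. A minimal worked example, assuming only those counts (the variable names are illustrative):

```python
# Counts stated in the text for the genotype 6a (EUHK2) chimera.
total_changes = 10
covariant_in_duplexes = [9267, 9268, 9275, 9296, 9311]  # within SL9266 duplexes
covariant_with_upstream = [9291, 9299]  # alternative upstream interaction

covariant = len(covariant_in_duplexes) + len(covariant_with_upstream)
percent_covariant = 100 * covariant / total_changes

print(f"{covariant} of {total_changes} changes covariant "
      f"({percent_covariant:.0f}%)")
```

The same calculation applied to the nonviable chimeras gives the lower values reported above (43%, 40%, and 30%); the underlying per-nucleotide counts for those chimeras are in Fig. 6A and are not restated here.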
Despite demonstrating that replicons chimeric for the
SL9266 CRE exhibiting divergence of >20% in this region
were still replication competent, the distribution of substitu-
tions within the replaced sequence meant that further site-
directed mutagenesis was required to determine the contribu-
tion of individual nucleotides to the predicted RNA-RNA
interactions with the upstream region. Individual substitutions
of U9107, C9113, and U9299 were not detrimental to replicon
activity, whether determined by luciferase activity or the
generation of G418 resistance (Fig. 7). Of these, C9113 and U9299
are juxtaposed in the predicted long-range interaction but are
not complementary in the majority of sequences. In contrast, a
possible base pair between nt 9107 and 9305 is highly con-
served but apparently not necessary for replication (see U9107C
in Fig. 7A and Fig. 5B). Although substitution of nt 9107 had
no apparent effect, modification of A9305 in isolation in a
previous study (A92G/C/U in Fig. 7 of reference 36) generated a
wild-type phenotype when the A was converted to C, reduced
colony numbers when it was changed to U, and no colonies
when it was converted to G. This suggests qualitative differ-
ences between the potential A-U or G-U pairing of nt 9107/
9305 or, more likely, that nt 9305 is possibly involved in an-
other RNA or protein interaction that has yet to be defined.
Although covariation of nt 9275/9296 (Fig. 6A) could be
accommodated without destroying replication, all individual
substitutions of U9296 or combinations of mutations that
included a change of U9296 were incapable of replicating (Fig. 7A
and B). This included the combination of U9296A with
substitutions at nt 9113, 9114, and 9116. The latter were designed to
increase potential hydrogen bonding between sequences within
SL9266 and the upstream region. We interpret this to mean
that additional bonding between these more distant regions
cannot compensate for disruption of the upper duplex of
SL9266.
The remaining substitutions involved the highly conserved
5-nt 5′-GCCCG motif occupying the subterminal bulge loop of
SL9266 and its perfect complementarity to a 5′-CGGGC
sequence centered on nt 9110. Individual synonymous
substitutions in both regions, of C9108A, of G9110 to U, A, or C, and of
C9303A or C9302 to U, A, or G, all prevented colony formation
in the G418 transduction assay. Of these, only C9302U was
predicted to retain any capacity to base pair with the upstream
region. Interestingly, despite using standardized transfection
conditions as with the chimeric SL9266 exchanges, point mu-
tations in this region did not generate any colonies in our
assays. Although not tested, this implies these mutants were
incapable of generating revertant colonies under G418 selec-
tion. We went on to investigate the effect of substitutions in
both parts of the predicted interacting sequence. In every case,
dual mutations that restored the potential for base pairing
between positions 9110 and 9302 resulted in a replication-
competent phenotype (Fig. 7B and C). Individually, both nt
9110 and 9302 were substituted for each possible alternative
nucleotide, indicating no sequence specificity at either posi-
tion. It was perhaps surprising therefore that the single substi-
tution of C9302U, which left a potential interaction with G9110,
was incapable of replicating when a G9110A C9302U double
mutant was viable. This strongly implies that a canonical
Watson-Crick pairing may be essential in this position to en-
sure the interaction of the two interacting regions. This con-
clusion is supported by the results of analysis of a large data set
of divergent HCV sequences, corresponding to available com-
plete genome sequences of all six genotypes of HCV, in which
none were identified with a G-U at this position (the distribu-
tion was 12% A-U and 88% G-C; data not shown). The re-
quirement to retain synonymous substitutions prevented an
individual mutation being introduced to restore complemen-
tarity between nt 9303 and 9109 (which, respectively, form the
first and second nucleotides in codons coding for arginine and
Our results strongly support a long-range interaction be-
tween highly conserved sequences located in the subterminal
bulge loop of SL9266 and a similarly conserved upstream re-
gion around nt 9110 that is not implicated in any evolutionary
conserved RNA structure. Additional supporting data for the
importance of this interaction comes from the study by Friebe
et al., who constructed a G9300A substitution (designated
bulge-G3A [see reference 8]) in a replicon with a duplication
of SL9266 sequences and the flanking regions within the 3′
NCR. This substitution rendered the replicon nonviable and,
because G9300 was now noncoding, this could not be attributed
to a defect in NS5B. In one construct, P1-ins3.2 (8), SL9266
alone was duplicated in the 3′ NCR of a replicon bearing
synonymous substitutions that disrupted the native SL9217,
SL9266, and SL9324 structures in the NS5B coding region.
Although this replicon exhibited 10- to 15-fold-lower
replication activity than the wild type did, it implies that the distance
separating sequences around nt 9110 and the complementary
functional SL9266 sequences is not absolutely critical for
FIG. 8. Proposed structure of a complex pseudoknot in hepatitis C virus. (A) The solid black horizontal lines above and below a linear
representation of the HCV genome (broken line) indicate the interactions involved in formation of SL9266 (above) and the long-range interactions
(below) with sequences located 5′ and 3′ to SL9266. The positions of evolutionarily conserved stem-loop structures in the NS5B coding region and
the X tail in the 3′ NCR are also indicated. (B) Schematic of a complex pseudoknot involving SL9266 and long-range interactions between the
subterminal bulge loop and sequences centered on nucleotide 9110 and the SL9266 terminal loop and complementary sequences in SL9571.
The data available from our analysis and reinterpretation of
previous studies of SL9266 (8, 16, 36) cannot unequivocally
demonstrate whether formation of SL9266 and either or both
of the upstream and downstream interactions are mutually
exclusive events or could occur simultaneously. A number of
scenarios are possible; the rather weak (as evidenced by the
poor bioinformatic prediction) SL9266 structure could be
stabilized by interaction with either or both sequences around nt
9110 and SL9571 to form a complex extended pseudoknot
containing four duplexed regions. Alternatively, interaction of
sequences normally not paired within SL9266 with the 3′ NCR
and the 9110 region could destabilize or prevent formation of
SL9266, thereby forming a molecular switch capable of
adopting at least two conformations. Intermediates between these
two examples, separately involving the 3′ NCR or the upstream
region, are also possible. Further mutagenic and functional
studies will be needed to distinguish between these various
possibilities. Considering the available data, we currently favor
a model in which SL9266 interacts, at least some of the time,
with both the upstream and downstream sequences to form an
extended pseudoknot structure, as illustrated in Fig. 8. In our
model, we define the upstream interaction as involving
complementarity between 5′-CGGGC and 5′-GCCCG
sequences centered on nt 9110 and 9302, respectively. Good
evidence to support this interpretation includes the primary
involvement of single-stranded regions of SL9266 in the long-
range interactions. Furthermore, the phenotype exerted by the
majority of substitutions introduced to SL9266 in this and
previous studies can be interpreted as affecting either SL9266
per se or one or other of the long-range interactions. Se-
quences within the region from nt 9108 to 9112/9300 to 9304
are highly conserved; of 192 divergent HCV sequences
analyzed, all exhibited G9109 to C9303 and C9112 to G9300 pairings.
There was a single, presumably unpaired, variant of C9108 to
A9304, the remainder being C9108 to G9304, and another
singleton of G9111 to U9301, with all others in the data set being G to
C pairs at this position (data not shown). The variation of nt
9110 and 9302 is listed above. This conservation of Watson-
Crick pairings presumably explains the inhibition of replication
mediated by the C9303A substitution constructed by You and
colleagues (for their substitutions of C90, see reference 36).
Overall, there is less variation or covariation in the unpaired
regions of SL9266, compared with the lower and upper du-
plexes of the stem-loop (36; data not shown). The lack of
covariation in the pentanucleotide motif forming the upstream
interaction described here is presumably a consequence of the
juxtaposition of the third base “wobble” position of the codons
in these regions; almost all variation is restricted to
substitution of a G9110-C9302 pair by an A-U pair in genotype 6
sequences.
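The complementarity of the two pentanucleotide motifs, and the position-by-position pairings listed in this paragraph, can be checked with a short sketch. The sequences and coordinates (5′-CGGGC at nt 9108 to 9112 and 5′-GCCCG at nt 9300 to 9304) are taken from the text; the helper name is an illustrative assumption.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


upstream = "CGGGC"  # nt 9108 to 9112, written 5' to 3'
bulge = "GCCCG"     # nt 9300 to 9304, written 5' to 3'

# Antiparallel pairing: nt 9108 faces nt 9304, 9109 faces 9303, and so on.
pairs = [
    (9108 + i, upstream[i], 9304 - i, bulge[-1 - i])
    for i in range(len(upstream))
]

print(reverse_complement(bulge) == upstream)  # True: perfect complementarity
for up_pos, up_base, down_pos, down_base in pairs:
    print(f"{up_base}{up_pos} - {down_base}{down_pos}")
```

The listing reproduces the pairs named above (C9108-G9304, G9109-C9303, G9110-C9302, G9111-C9301, and C9112-G9300) for the genotype 1b sequence.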
Many viruses are known to possess pseudoknots that con-
tribute essential functions during the replication cycle. In most
viruses, pseudoknots located within coding regions are
primarily involved in translational control, in particular −1
frameshifting (2). However, there is no evidence for such a role in
HCV, and the previously demonstrated positional indepen-
dence of SL9266 would argue strongly against any such func-
tion. Instead, it seems likely that the RNA structure forming
SL9266, together with interactions of the unpaired loop
sequences of SL9266 and both upstream and downstream
regions, has one or more functions in genome replication.
Precedents exist in bacteriophages, several plant viruses, and some
animal RNA viruses. The first identified pseudoknot, the
tRNA-like sequence (TLS) of turnip yellow mosaic virus (27),
has multiple functions, including recruitment of a nucleotidyl-
transferase for genome completion and genome circularization
(or at least juxtaposition of the 5′ and 3′ ends) probably via
interaction with eEF1A and consequent enhancement of translation.
The TLS is also implicated in the switch from translation
of the input genome to replication by competitive binding
with newly synthesized viral polymerase and may also have a
role in late replication functions, such as encapsidation (4, 11,
21, 22). Genome circularization by the TLS is probably protein
mediated, but long-range RNA-RNA interactions that form
pseudoknots can critically influence the global folding of RNA.
Such interactions form the core of the ribosome (reviewed in
reference 2) and are also known to occur in virus genomes. In
bacteriophage Qβ, a pseudoknot spanning 1.2 kb of the genome
recruits the 3′ end of the genome to the internally bound
viral replicase (13). Similarly, recruitment of the replicase to
the 3′ end of porcine reproductive and respiratory syndrome
virus requires a long-range (~300-nt) pseudoknot (33).
Considering the important role in replication of the complex
pseudoknot proposed here, it is perhaps unsurprising that the
RNA structures in the 3′ end of the HCV coding region (3)
and SL9266, forming the core of the pseudoknot, interact with
NS5B in in vitro assays (16). Although further investigation is
required to define the function(s) of this complex RNA struc-
ture in the translation and replication of the HCV genome, our
demonstration of important 5′ interactions with the subterminal
bulge loop of SL9266 provides a structural basis on which
these studies can be based.
We thank R. Bartenschlager for the neomycin-encoding replicon
and GlaxoSmithKline for the luciferase-encoding replicon.
We thank the Medical Research Council for financial support
(D.J.E. and P.S.) and MRC/GlaxoSmithKline for a CASE Ph.D. stu-
dentship (to R.M.E.) for V.A.
1. Blight, K. J., A. A. Kolykhalov, and C. M. Rice. 2000. Efficient initiation of
HCV RNA replication in cell culture. Science 290:1972–1974.
2. Brierley, I., S. Pennell, and R. J. Gilbert. 2007. Viral RNA pseudoknots:
versatile motifs in gene expression and replication. Nat. Rev. Microbiol.
3. Cheng, J. C., M. F. Chang, and S. C. Chang. 1999. Specific interaction
between the hepatitis C virus NS5B RNA polymerase and the 3′ end of the
viral RNA. J. Virol. 73:7044–7049.
4. Choi, Y. G., and A. L. N. Rao. 2003. Packaging of brome mosaic virus RNA3
is mediated through a bipartite signal. J. Virol. 77:9750–9757.
5. Clerte, C., and K. B. Hall. 2006. Characterization of multimeric complexes
formed by the human PTB1 protein on RNA. RNA 12:457–475.
6. Ding, Y., C. Y. Chan, and C. E. Lawrence. 2005. RNA secondary structure
prediction by centroids in a Boltzmann weighted ensemble. RNA 11:1157–
7. Ding, Y., and C. E. Lawrence. 2003. A statistical sampling algorithm for RNA
secondary structure prediction. Nucleic Acids Res. 31:7280–7301.
8. Friebe, P., J. Boudet, J. P. Simorre, and R. Bartenschlager. 2005. Kissing-loop
interaction in the 3′ end of the hepatitis C virus genome essential for
RNA replication. J. Virol. 79:380–392.
9. Friebe, P., V. Lohmann, N. Krieger, and R. Bartenschlager. 2001. Sequences
in the 5′ nontranslated region of hepatitis C virus required for RNA replication.
J. Virol. 75:12047–12057.
10. Fukushi, S., M. Okada, T. Kageyama, F. B. Hoshino, K. Nagai, and K.
Katayama. 2001. Interaction of poly(rC)-binding protein 2 with the 5′-terminal
stem loop of the hepatitis C-virus genome. Virus Res. 73:67–79.
11. Giege, R. 1996. Interplay of tRNA-like structures from plant viral RNAs with
partners of the translation and replication machineries. Proc. Natl. Acad. Sci.
12. Han, J.-Q., H. L. Townsend, B. K. Jha, J. M. Paranjape, R. H. Silverman,
and D. J. Barton. 2007. A phylogenetically conserved RNA structure in the
poliovirus open reading frame inhibits the antiviral endoribonuclease RNase
L. J. Virol. 81:5561–5572.
13. Klovins, J., V. Berzins, and J. van Duin. 1998. A long-range interaction in
Qbeta RNA that bridges the thousand nucleotides between the M-site and
the 3′ end is required for replication. RNA 4:948–957.
14. Knudsen, B., and J. Hein. 1999. RNA secondary structure prediction using
stochastic context-free grammars and evolutionary history. Bioinformatics
15. Kuiken, C., C. Combet, J. Bukh, I. T. Shin, G. Deleage, M. Mizokami, R.
Richardson, E. Sablon, K. Yusim, J. M. Pawlotsky, and P. Simmonds. 2006.
A comprehensive system for consistent numbering of HCV sequences, pro-
teins and epitopes. Hepatology 44:1355–1361.
16. Lee, H., H. Shin, E. Wimmer, and A. V. Paul. 2004. cis-Acting RNA signals
in the NS5B C-terminal coding sequence of the hepatitis C virus genome.
J. Virol. 78:10865–10877.
16a. Lemon, S. M., C. Walker, M. J. Alter, and M. Yi. 2007. Hepatitis C virus, p.
1253–1304. In D. M. Knipe, P. M. Howley, D. E. Griffin, R. A. Lamb, M. A.
Martin, B. Roizman, and S. E. Straus (ed.), Fields virology, 5th ed. Lippin-
cott Williams & Wilkins, Philadelphia, PA.
17. Lindenbach, B. D., M. J. Evans, A. J. Syder, B. Wolk, T. L. Tellinghuisen,
C. C. Liu, T. Maruyama, R. O. Hynes, D. R. Burton, J. A. McKeating, and
C. M. Rice. 2005. Complete replication of hepatitis C virus in cell culture.
18. Lohmann, V., F. Körner, J. Koch, U. Herian, L. Theilmann, and R. Bartenschlager.
1999. Replication of subgenomic hepatitis C virus RNAs in a
hepatoma cell line. Science 285:110–113.
19. Luo, G., S. Xin, and Z. Cai. 2003. Role of the 5′-proximal stem-loop structure
of the 5′ untranslated region in replication and translation of hepatitis
C virus RNA. J. Virol. 77:3312–3318.
20. Markham, N. R., and M. Zuker. 2005. DINAMelt web server for nucleic acid
melting prediction. Nucleic Acids Res. 33:W577–W581.
21. Matsuda, D., and T. W. Dreher. 2004. The tRNA-like structure of turnip
yellow mosaic virus RNA is a 3′-translational enhancer. Virology 321:36–46.
22. Matsuda, D., S. Yoshinari, and T. W. Dreher. 2004. eEF1A binding to
aminoacylated viral RNA represses minus strand synthesis by TYMV RNA-
dependent RNA polymerase. Virology 321:47–56.
23. McMullan, L. K., A. Grakoui, M. J. Evans, K. Mihalik, M. Puig, A. D.
Branch, S. M. Feinstone, and C. M. Rice. 2007. Evidence for a functional
RNA element in the hepatitis C virus core gene. Proc. Natl. Acad. Sci. USA
24. Pedersen, J. S., I. M. Meyer, R. Forsberg, P. Simmonds, and J. Hein. 2004.
A comparative method for predicting and folding RNA secondary structures
within protein-coding regions. Nucleic Acids Res. 32:4925–4936.
25. Pietschmann, T., A. Kaul, G. Koutsoudakis, A. Shavinskaya, S. Kallis, E.
Steinmann, K. Abid, F. Negro, M. Dreux, F. L. Cosset, and R. Barten-
schlager. 2006. Construction and characterization of infectious intrageno-
typic and intergenotypic hepatitis C virus chimeras. Proc. Natl. Acad. Sci.
26. Reusken, C. B., T. J. Dalebout, P. Eerligh, P. J. Bredenbeek, and W. J.
Spaan. 2003. Analysis of hepatitis C virus/classical swine fever virus chimeric
5′NTRs: sequences within the hepatitis C virus IRES are required for viral
RNA replication. J. Gen. Virol. 84:1761–1769.
27. Rietveld, K., R. van Poelgeest, C. W. A. Pleij, J. H. Van Boom, and L. Bosch.
1982. The tRNA-like structure at the 3′ terminus of turnip yellow mosaic
virus RNA. Differences and similarities with canonical tRNA. Nucleic Acids
28. Sasaki, J., and K. Taniguchi. 2003. The 5′-end sequence of the genome of
Aichi virus, a picornavirus, contains an element critical for viral RNA en-
capsidation. J. Virol. 77:3542–3548.
29. Simmonds, P., A. Tuplin, and D. J. Evans. 2004. Detection of genome-scale
ordered RNA structure (GORS) in genomes of positive-stranded RNA
viruses: implications for virus evolution and host persistence. RNA 10:1337–
30. Spangberg, K., and S. Schwartz. 1999. Poly(C)-binding protein interacts with
the hepatitis C virus 5′ untranslated region. J. Gen. Virol. 80:1371–1376.
31. Tuplin, A., D. J. Evans, and P. Simmonds. 2004. Detailed mapping of RNA
secondary structures in core and NS5B-encoding region sequences of hep-
atitis C virus by RNase cleavage and novel bioinformatic prediction methods.
J. Gen. Virol. 85:3037–3047.
32. Tuplin, A., J. Wood, D. J. Evans, A. H. Patel, and P. Simmonds. 2002.
Thermodynamic and phylogenetic prediction of RNA secondary structures
in the coding region of hepatitis C virus. RNA 8:824–841.
33. Verheije, M. H., R. C. L. Olsthoorn, M. V. Kroese, P. J. M. Rottier, and
J. J. M. Meulenberg. 2002. Kissing interaction between 3′ noncoding and
coding sequences is essential for porcine arterivirus RNA replication. J. Vi-
34. Wakita, T., T. Pietschmann, T. Kato, T. Date, M. Miyamoto, Z. Zhao, K.
Murthy, A. Habermann, H. G. Krausslich, M. Mizokami, R. Bartenschlager,
and T. J. Liang. 2005. Production of infectious hepatitis C virus in tissue
culture from a cloned viral genome. Nat. Med. 11:791–796.
35. Wang, S., and K. A. White. 2007. Riboswitching on RNA virus replication.
Proc. Natl. Acad. Sci. USA 104:10406–10411.
36. You, S., D. D. Stump, A. D. Branch, and C. M. Rice. 2004. A cis-acting
replication element in the sequence encoding the NS5B RNA-dependent
RNA polymerase is required for hepatitis C virus RNA replication. J. Virol.
37. Yu, L., and L. Markoff. 2005. The topology of bulges in the long stem of the
flavivirus 3′ stem-loop is a major determinant of RNA replication competence.
J. Virol. 79:2309–2324.
38. Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization
prediction. Nucleic Acids Res. 31:3406–3415.
9022 DIVINEY ET AL. J. VIROL.