Crystal structure of the Escherichia coli 23S rRNA:m5C methyltransferase RlmI (YccW) reveals evolutionary links between RNA modification enzymes.
ABSTRACT Methylation is the most common RNA modification in the three domains of life. Transfer of the methyl group from S-adenosyl-L-methionine (AdoMet) to specific atoms of RNA nucleotides is catalyzed by methyltransferase (MTase) enzymes. The rRNA MTase RlmI (rRNA large subunit methyltransferase gene I; previously known as YccW) specifically modifies Escherichia coli 23S rRNA at nucleotide C1962 to form 5-methylcytosine. Here, we report the crystal structure of RlmI refined at 2 Å to a final R-factor of 0.194 (Rfree=0.242). The RlmI molecule comprises three domains: the N-terminal PUA domain; the central domain, which resembles a domain previously found in RNA:5-methyluridine MTases; and the C-terminal catalytic domain, which contains the AdoMet-binding site. The central and C-terminal domains are linked by a β-hairpin structure that has previously been observed in several MTases acting on nucleic acids or proteins. Based on bioinformatics analyses, we propose a model for the RlmI–AdoMet–RNA complex. Comparative structural analyses of RlmI and its homologs provide insight into the potential function of several structures that have been solved by structural genomics groups and furthermore indicate that the evolutionary paths of RNA and DNA 5-methyluridine and 5-methylcytosine MTases have been closely intertwined.
Crystal Structure of the Escherichia coli 23S rRNA:m5C
Methyltransferase RlmI (YccW) Reveals Evolutionary
Links between RNA Modification Enzymes
S. Sunita1, Karolina L. Tkaczuk2,3, Elzbieta Purta2,4,5, Joanna M. Kasprzak6, Stephen Douthwaite5, Janusz M. Bujnicki2,6 and J. Sivaraman1⁎
1 Department of Biological Sciences, National University of Singapore, 14 Science Drive, Singapore 117543
2 Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, 02-109 Warsaw, Poland
3 Institute of Technical Biochemistry, Technical University of Lodz, B. Stefanowskiego 4/10, 90-924 Lodz, Poland
4 Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5a, 02-106 Warsaw, Poland
5 Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
6 Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 98, 61-614 Poznan, Poland
Received 8 May 2008;
received in revised form
19 August 2008;
accepted 21 August 2008
Available online
29 August 2008
Edited by J. Doudna
Methylation is the most common RNA modification in the three domains of life. Transfer of the methyl group from S-adenosyl-L-methionine (AdoMet) to specific atoms of RNA nucleotides is catalyzed by methyltransferase (MTase) enzymes. The rRNA MTase RlmI (rRNA large subunit methyltransferase gene I; previously known as YccW) specifically modifies Escherichia coli 23S rRNA at nucleotide C1962 to form 5-methylcytosine. Here, we report the crystal structure of RlmI refined at 2 Å to a final R-factor of 0.194 (Rfree=0.242). The RlmI molecule comprises three domains: the N-terminal PUA domain; the central domain, which resembles a domain previously found in RNA:5-methyluridine MTases; and the C-terminal catalytic domain, which contains the AdoMet-binding site. The central and C-terminal domains are linked by a β-hairpin structure that has previously been observed in several MTases acting on nucleic acids or proteins. Based on bioinformatics analyses, we propose a model for the RlmI–AdoMet–RNA complex. Comparative structural analyses of RlmI and its homologs provide insight into the potential function of several structures that have been solved by structural genomics groups and furthermore indicate that the evolutionary paths of RNA and DNA 5-methyluridine and 5-methylcytosine MTases have been closely intertwined.
© 2008 Elsevier Ltd. All rights reserved.
Keywords: crystal structure; rRNA modification; methyltransferase; YccW; RlmI
*Corresponding author. E-mail address: dbsjayar@nus.edu.sg.
Abbreviations used: AdoMet, S-adenosyl-L-methionine; MTase, methyltransferase; RFM, Rossmann-fold methyltransferases; m5C, 5-methylcytosine; SelMet, L-selenomethionine; AUC, analytical ultracentrifugation; PDB, Protein Data Bank; r-protein, ribosomal protein; AdoHcy, S-adenosylhomocysteine; ITC, isothermal titration calorimetry; m5U, 5-methyluridine.
doi:10.1016/j.jmb.2008.08.062
J. Mol. Biol. (2008) 383, 652–666
Available online at www.sciencedirect.com
0022-2836/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
Introduction
S-Adenosyl-L-methionine (AdoMet) is a ubiquitous methyl donor in biological systems, and AdoMet-dependent methyltransferases (MTases) represent a large group of enzymes that show considerable diversity with respect to structure and mechanism of action. Functions mediated by these enzymes include signal transduction, metabolism, biosynthesis, gene expression, and protein trafficking and sorting;1,2 substrates for these enzymes include various small molecules, proteins, lipids, and nucleic acids.3 At least seven different three-dimensional folds are exhibited by AdoMet-dependent MTases.4 The largest superfamily, the Rossmann-fold methyltransferases (RFM), is characterized by the presence of an AdoMet-binding/catalytic domain that resembles that of Rossmann-fold oxidoreductases.5 MTases often contain additional domains, sometimes with elaborations of the common fold, and these are often involved in the recognition and binding of substrates or in mediation of oligomerization.1
RNA molecules carry more than 100 different types of nucleoside modification,† more than half of which involve methylation.6–8 Most of the modifications are found in tRNAs,9 and many of these have been connected to roles that include maintenance of translational efficiency and fidelity, regulation of cell cycle transitions, and tRNA–protein interaction.10 In rRNAs, nucleoside methylations, together with their MTases, play crucial roles in the assembly, maturation, and regulation of the protein synthetic cellular machinery,11–13 and can additionally confer antibiotic resistance.14 The types of modified nucleosides found in RNA are, by now, fairly comprehensively charted and, in many cases, the enzymes that are responsible for these modifications are also known. However, our understanding of the structures and catalytic mechanisms of these RNA-specific MTases has conspicuously lagged behind.
Here we report the crystal structure and bioinformatics/phylogenetic study of the recently characterized 5-methylcytosine (m5C) RNA MTase RlmI (rRNA large subunit methyltransferase gene I; formerly known as YccW). RlmI specifically methylates nucleotide C1962 in Escherichia coli 23S rRNA.15 RlmI belongs to the COG1092 family, which comprises a number of functionally uncharacterized putative MTases with similar domain compositions and also includes longer proteins such as the 23S rRNA:m2G2445 MTase RlmL (formerly known as YcbY).16 A comparative structural analysis of RlmI enables us to predict the potential function of several presently uncharacterized MTases. The relationship between the higher-order structure and catalytic function of RlmI further reveals important clues about the evolution of MTases that modify the C5 atom of pyrimidines in DNA and RNA.
Results and Discussion
Structure of RlmI
The structure of recombinant E. coli RlmI was solved by the multiwavelength anomalous dispersion method17 from synchrotron data using L-selenomethionine (SelMet)-labeled protein. The model was refined to a final R-factor of 0.194 (Rfree=0.242) at 2.0 Å resolution, with good stereochemical parameters (Fig. 1a, Table 1). The RlmI protein consists of 396 amino acids and an N-terminal (His)6 tag, and the crystallographic model clearly shows the path of
†http://www.modomics.genesilico.pl/modification_list
Fig. 1. Structure of RlmI. (a) Ribbon diagram of the
RlmI monomer. (b) Ribbon diagram of the RlmI dimer.
The catalytic RFM domain is shown in blue, the β-hairpin
structure is shown in green, the EEHEE domain is shown
in red, and other nonconserved structural elements are
shown in gray. The second subunit in the dimer is shown
in yellow.
Structure of YccW/RlmI
the polypeptide chain from Ser2 to Met396. The N-terminal residue Met1 and the (His)6 affinity tag had no interpretable electron density and were not modeled.
RlmI was found to exist as a homodimer in solution with an apparent mass of 100 kDa, as determined by analytical ultracentrifugation (AUC) experiments (Supplementary Fig. 1). This is consistent with the observation of a dimer in the asymmetric unit of the crystal (Fig. 1b). Each subunit of the RlmI dimer comprises three domains: the N-terminal PUA domain (residues Met1-Asp75), the central domain (residues Ile76-Tyr205), and the C-terminal RFM domain (residues Leu206-Met396). The N-terminal PUA domain assumes a predominantly β-stranded structure with one α-helix (β1↑ α1 β2↓ β3↑ β4↓ β5↑).18 The central domain has five β-strands and three α-helices (α2 β6↑ β7↓ β8↑ α3 β9↑ α4 β10↓), and is linked to the C-terminal domain by a β-hairpin loop (β11 and β12). The C-terminal domain exhibits a typical RFM fold5 consisting of a seven-stranded β-sheet surrounded by α5–α7, on one side, and α8–α10, on the other side (α5 β13↑ α6 β14↑ α7 β15↑ α8 β16↑ α9 β17↑ α10 β19↓ β18↑). Most of the interdomain contacts within the monomer are between the N-terminal domain and the central domain, which include Trp21, Ser56, Ser59, and Arg64 from the N-terminal domain, and Glu108 and Glu165 from the central domain.
Each monomer has dimensions of approximately 70 Å (length) and 35 Å (width) (Fig. 1a). The two monomers of the dimeric RlmI exhibit a high degree of complementary packing, where the N-terminal region of one monomer packs against the C-terminal region of the other. Residues in the monomers are at distances that would maintain the dimer by 11 strong hydrogen-bonding contacts (<3.2 Å) and several hydrophobic interactions. In the dimer form, the residues Arg17, Arg18, Lys88, Trp92, and Asp97 from the N-terminal domain of one protomer have hydrogen-bonding contacts with the residues Arg217, Lys222, Asp355, Ile358, Asp363, and Arg394 from the C-terminal domain of the other protomer. The crystallographic dimerization interface shows a buried surface area of ∼3666 Å², which covers 13% of the total surface area of one monomer, as calculated using the CNS program.19
A DALI search20 for globally similar proteins was performed within the Protein Data Bank (PDB). Significant structural similarity was found between RlmI and four other members of COG1092, which are all
Table 1. Crystallographic data collection and refinement statistics

Data set                                                    Peak        Inflection
Cell parameters
  a (Å)                                                     64.31       64.44
  b (Å)                                                     80.88       80.96
  c (Å)                                                     66.16       66.23
  β (°)                                                     98.91       98.92
Space group                                                 P2₁         P2₁
Data collection
  Resolution range (Å)                                      50–2.0      50–2.0
  Wavelength (Å)                                            0.9791      0.9795
  Observed reflections >1σ                                  320,168     320,547
  Unique reflections                                        84,024      84,387
  Completeness (%)                                          99.9        100
  Overall (I/σI)                                            18.6        16.7
  I/σI for the highest-resolution shell (2.1–2 Å)           5.9         4.5
  Overall Rsym^a (%)                                        5.6         5.8
  Rsym^a (%) for the highest-resolution shell (2.1–2 Å)     16.7        22.1
Refinement and quality
  Resolution range (Å)                                      45–2.0
  Rwork^b (number of reflections)                           0.194 (74,448)
  Rwork^b for the highest-resolution shell (2.1–2 Å)        0.22
  Rfree^c (number of reflections)                           0.242 (8199)
  Rfree^c for the highest-resolution shell (2.1–2 Å)        0.28
  Cross-validated estimated coordinate error (Å)
    ESD from C–V Luzzati plot                               0.28
    ESD from C–V SIGMAA                                     0.18
  RMSD bond lengths (Å)                                     0.01
  RMSD bond angles (°)                                      1.24
  Average B-factors^d (Å²)
    Main chain                                              23.26
    Side chains                                             26.41
  B-RMSD main chain (Å²)                                    0.92
  B-RMSD side chain (Å²)                                    1.83
  Ramachandran plot
    % residues in favored regions (number of residues)      95.6 (751)
    % residues in allowed regions (number of residues)      99.4 (781)
    Nonglycine residues in disallowed regions (%)           0.0

ESD, Estimated Standard Deviation; C–V, Cross validated.
^a $R_{\mathrm{sym}} = \sum |I_i - \langle I \rangle| / \sum |I_i|$, where $I_i$ is the intensity of the ith measurement and $\langle I \rangle$ is the mean intensity for that reflection.
^b $R_{\mathrm{work}} = \sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure factor amplitudes, respectively.
^c Rfree as for Rwork, but for 9% of the total reflections chosen at random and omitted from refinement.
^d Individual B-factor refinements were calculated.
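The residual defined in the Table 1 footnotes for Rwork can be written out as a short calculation. The following is a minimal sketch only: the function name `r_factor` and the amplitudes are hypothetical toy values rather than data from this study, and real refinement programs apply scaling and other corrections that are omitted here. Rfree uses the same formula, evaluated only on the reflections that were set aside from refinement.

```python
def r_factor(f_obs, f_calc):
    """Return sum(|Fobs - Fcalc|) / sum(|Fobs|) for paired amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(abs(f) for f in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical structure factor amplitudes (illustration only, not real data).
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 20.0]

toy_r = r_factor(toy_f_obs, toy_f_calc)
print(f"R = {toy_r:.3f}")
```

For the toy lists the absolute differences sum to 25 and the observed amplitudes sum to 200, so the residual is 0.125.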
uncharacterized proteins solved by structural genomics consortia, thus far without published functional analyses (Fig. 2). The highest structural similarity was observed between RlmI and PH1915 from Pyrococcus horikoshii (PDB code 2as0),21 followed by TTHA1280 from Thermus thermophilus (PDB code 1wxx)22 and SMU776 from Streptococcus mutans (PDB code 2b78; Nan, J., Wang, K. T. & Su, X. D., manuscript in preparation). These three proteins form dimers similar to RlmI. A dimeric structure in solution has also been confirmed for TTHA1280 (unpublished AUC experiments cited in Pioszak et al.22). RlmI also shows a more remote structural similarity to the protein AGC592 from Agrobacterium tumefaciens (PDB code 2igt; Kim, Y., Joachimiak, A., Xu, X., Gu, J., Edwards, A. & Savchenko, A., manuscript in preparation), a diverged member of COG1092 that lacks the N-terminal PUA domain and forms a trimer in the crystal. Overall, the tertiary structures of all COG1092 members superimpose well, with only slight differences in the positions of corresponding domains (Fig. 2).
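The RMSD values quoted for these superpositions (Fig. 2) measure the root-mean-square distance between equivalent Cα atoms. The sketch below illustrates only that final distance calculation; it assumes the two coordinate sets have already been superimposed and paired residue by residue, which is the part that tools such as DALI actually solve. The function name `ca_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the input units) between two paired lists of (x, y, z) tuples.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared_sum += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical Calpha positions in angstroms: chain_b is chain_a shifted 1 A along z.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

toy_rmsd = ca_rmsd(chain_a, chain_b)
print(f"RMSD = {toy_rmsd:.2f} A")
```

Because every atom pair in the toy example is exactly 1 Å apart, the RMSD is 1 Å. The number of atom pairs matters when comparing values: an RMSD over 373 Cα atoms and one over 248 Cα atoms are not directly equivalent measures of similarity.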
Evolutionary relationship of RlmI to other protein structures
The RlmI family is composed of orthologs that presumably perform the same functions and paralogs that would have different functions, and one has to discriminate between these subgroups before functionally important residues can be identified and interpreted within structural contexts. As would be expected, alignments reveal conservation of the AdoMet-binding site (motifs I, II, and III) among all members of the COG1092 family (Supplementary Fig. 2; representative sequences are shown in Fig. 3). The conserved sequence D-P-P-X-(F/Y/h) in motif IV [X is any residue and h is a hydrophobic residue] displays similarity to the (D/N/S)-P-P-(F/Y/W/H) motif of MTases that act on exocyclic nitrogen groups in nucleic acids.23 In the COG1092 motif, one residue has been inserted between the D-P-P sequence and the aromatic/hydrophobic residue; these two elements stabilize the target base by hydrogen-bonding and stacking interactions, respectively. Similarities are also observed in the structurally important motif V. However, the sequences of the COG1092 family diverge in C-terminal motifs VI–VIII, which are involved in substrate binding and catalysis; this suggests that these proteins should be divided into subgroups. In particular, there are differences in motifs VI and VIII where catalytic residues of RFM are often located.
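The two motif IV consensus patterns compared above can be expressed as regular expressions, which makes the single-residue insertion explicit. This is an illustrative sketch: the set of residues counted as hydrophobic ("h") is an assumption made here (A, V, I, L, M), since the text does not list them, and the test sequence and the helper `find_motif` are hypothetical.

```python
import re

# ASSUMPTION: "h" (hydrophobic) is taken as A, V, I, L or M; the text does not define the set.
HYDROPHOBIC = "AVILM"

# COG1092 motif IV: D-P-P-X-(F/Y/h), where X is any residue.
COG1092_MOTIF_IV = re.compile(rf"DPP.[FY{HYDROPHOBIC}]")

# Motif IV of MTases acting on exocyclic nitrogens: (D/N/S)-P-P-(F/Y/W/H).
N_MTASE_MOTIF_IV = re.compile(r"[DNS]PP[FYWH]")


def find_motif(pattern, sequence):
    """Return (1-based start position, matched residues) for each match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


# Hypothetical peptide, not taken from any COG1092 member.
toy_sequence = "MKTLDPPAFGGSNPPYA"

cog1092_hits = find_motif(COG1092_MOTIF_IV, toy_sequence)
n_mtase_hits = find_motif(N_MTASE_MOTIF_IV, toy_sequence)
print(cog1092_hits)
print(n_mtase_hits)
```

In the toy sequence, DPPAF matches only the COG1092 pattern because of the extra residue (A) between D-P-P and the aromatic F, whereas NPPY matches only the four-residue pattern.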
A phylogenetic tree of the COG1092 family based on the alignment (Fig. 4) reveals five main branches corresponding to proteins with different features. The first and largest branch groups true orthologs of RlmI from Gram-negative and Gram-positive bacteria, as well as from archaea. These proteins are characterized by the presence of Cys in motif VI and an N-terminal PUA domain; the structures of three members have been solved by crystallography (RlmI, 2as0, and 1wxx). The second branch, closely related to the RlmI branch, exclusively contains Gram-positive members exemplified by YwbD from Bacillus subtilis and a single crystal structure (2b78). These proteins
Fig. 2. Stereo Cα superposition of RlmI with its close homologs from COG1092. RlmI (black), PH1915 (PDB ID 2as0; yellow; RMSD of 1.8 Å for 373 Cα atoms; 34% sequence identity with RlmI), TTHA1280 (PDB ID 1wxx; cyan; RMSD of 2.8 Å for 373 Cα atoms; 38% sequence identity), SMU776 (PDB ID 2b78; green; RMSD of 2.7 Å for 371 Cα atoms; 28% sequence identity), and AGC592 (PDB ID 2igt; orange; RMSD of 2.7 Å for 248 Cα atoms; 17% sequence identity).
share the PUA domain and the YLK motif with the RlmI branch, but they have Asn in motif VI where the presumably catalytic Cys residue of RlmI is located. The third branch contains two proteins from Deinococcus radiodurans and Pseudomonas aeruginosa, both of which lack the PUA domain, have only the LK dipeptide conserved in motif VIII, and have Asn in motif VI instead of Cys.
The fourth and yet more distantly related branch
groups the C-terminal segments of RlmL (YcbY)
Fig. 3. Sequence alignment of the COG1092 family, including RlmI and its homologs of known structure. A full
version of this alignment is available in Supplementary Fig. 2; here, only individual members of each lineage are shown.
Amino acids are colored according to the similarity of their physicochemical properties. The secondary structural
elements for E. coli RlmI are shown above the sequence. Approximate boundaries of PUA and EEHEE domains and of the
β-hairpin structure at the N-terminus of the RFM domain are indicated below the alignment. Structural motifs that
characterize RFM domains (X and I–VIII) are also indicated.
together with their free-standing counterpart from Neisseria meningitidis. RlmL contains two RFM domains, where the C-terminal RlmI-like module (the COG1092 part) is fused to another MTase module (member of COG0116) consisting of a putative N-MTase domain23 and a THUMP domain.24 The proteins in this fourth branch lack PUA, have Asn in motif VI, and have preserved residues and motifs that are absent in other subfamilies. No member of the fourth branch has yet been structurally characterized. The fifth branch comprises proteins from α-Proteobacteria, together with chlamydiae; they lack PUA and have the least sequence conservation with the central domains of other COG1092 members (this branch is represented by the 2igt structure).
The dispersion of different taxa throughout the COG1092 branches suggests that these proteins evolved through multiple gene duplications and losses. Particularly noteworthy are the recent duplications of an RlmI ortholog within Methanococcus jannaschii (MJ1653 and MJ1649), and three paralogs in Ps. aeruginosa and D. radiodurans. Establishing the substrate specificities of these paralogs and relating these to their sequence differences will give us a clearer understanding of their evolution. Phylogenetic analysis suggests that different subfamilies within COG1092 have preserved a common AdoMet-binding site while adapting to carry out distinct reactions through the strongly diversified catalytic part of their RFM domain. With this consideration in mind, we mapped only the sequences most closely related to RlmI, and thus presumably true orthologs, onto the RlmI structure. The analysis revealed a strongly conserved groove containing the putative AdoMet-binding site and substrate-binding/catalytic site (Fig. 5a). These putative sites on the RlmI protein match well with the surface distribution of electrostatic charges: negative charges are concentrated in the AdoMet-binding site (the cofactor has a net positive charge), and a large surface of positive charge is positioned to interact with the backbone of the RNA substrate (Fig. 5b). The rest of the protein surface, including the dimerization interface, shows no particular groups of invariant or strongly conserved residues. In the dimer, the positively charged region spans both subunits, forming a very large
Fig. 4. A phylogenetic tree of the COG1092 family. Five main branches referred to in the text are shown as thick lines
and numbered. Sequences are indicated by their National Center for Biotechnology Information GI number, protein name
from the COG database, and abbreviated genus and species name (e.g., Esccol for E. coli). The PDB accession number is
indicated in cases where the three-dimensional structures are known. Values at the nodes indicate statistical support for
the particular branches, according to the bootstrap test. The most pertinent sequence features that distinguish different
families are indicated in brackets.
potential RNA-binding site (Fig. 5d) surrounding the two clefts containing conserved active sites (Fig. 5c).
PUA domain of RlmI
PUA domains have been implicated in several
functions, the most common of which is RNA
binding.25The architecture of the PUA domain
within the N-terminal region of RlmI is typical of
this type of structure and consists of two short
helices and a β-sheet with six mostly antiparallel
β-strands.18The topological similarities of known
PUA domains were analyzed using the DALI
server and revealed that the PUA domain of RlmI
is structurally similar not only to the N-terminal
domains of COG1092 orthologs (2as0, 1wxx, and
2b78) but also to the PUA domains of various
RNA-binding proteins such as a human homolog
of Nip7p (PDB code 1t5y; RMSD of 1.3 Å for 62 Cα atoms), archaeosine tRNA guanine transglycosylase (PDB code 1it8; RMSD of 1.4 Å for 61 Cα atoms), and pseudouridine synthase TruB (PDB code 1r3f; RMSD of 1.9 Å for 49 Cα atoms). Superimposition of PUA domains is shown in Supplementary Fig. 3.
Notably, RlmI shows no close structural similarity to
the PUA domain of RsmF,26which is also an E. coli
rRNA m5C MTase. The PUA domain of RlmI is more
similar to other known PUA domains, while the
PUA domain of RsmF has diverged markedly (data
not shown).
The sequence and structural variation among PUA domains probably reflects their function in recognizing distinct and specific substrates. In particular, the
sequence and structural divergence among the
COG1092 PUA domains suggest that they may
interact with different sequences within structurally
conserved rRNA substrates. To date, the crystal
structures of only three PUA-containing RNA
Fig. 5. Surface representations of RlmI. (a) RlmI in the surface representation, colored according to sequence
conservation among genuine RlmI orthologs. The color spectrum ranges from deep blue (invariant residues) to cyan
(conserved), to yellow/red (highly variable). A highly conserved blue patch at the bottom of the large cleft indicates the
cofactor-binding site and the catalytic site. (b) RlmI in the surface representation, colored according to the electrostatic
potential. The color spectrum ranges from deep red (−5 kT) to deep blue (+5 kT). The predicted binding pockets exhibit a
charge that is complementary to that of the ligands: the AdoMet-binding pocket is negatively charged (red), while the
predicted RNA-binding site is positively charged (blue). (c and d) The surface of the RlmI dimer.
modification enzymes have been solved in complex with their RNA substrate: the transglycosylase ArcTGT with full-length tRNA(Val),27 the pseudouridine synthase TruB with a tRNA(Phe) fragment,28 and Cbf5 in complex with H/ACA RNA.29 No structures are presently available for rRNA-targeting PUA MTases in complex with their substrate. The substrate-recognition elements seem to be distributed throughout the PUA domain18 and include, in particular, the Gly-containing loop located between α-helix 1 and β-strand 2, and residues from β-strand 6. This structural arrangement enables the PUA domain to recognize and interact with a double-stranded RNA stem, and variations within the sequences of different PUA domains facilitate the recognition of distinct RNAs (for a review, see Perez-Arellano et al.18).
Central domain of RlmI
Searching the PDB for structures homologous to the central domain of RlmI predictably reveals that the highest similarity is observed with other members of COG1092 (including PDB IDs 2as0, 1wxx, 2b78, and 2igt) with RMSDs of 1.7–3.4 Å. An interesting corollary is the structural similarity between the central RlmI domain and the corresponding region of RlmD (formerly RumA; PDB ID 2bh2; RMSD of 2.5 Å for 67 Cα atoms); RlmD specifically methylates 23S rRNA nucleotide U1939 to form m5U1939.30 In addition, there is similarity to domains in PH0793 from P. horikoshii (PDB ID 2frn; RMSD of 3.3 Å for 77 Cα atoms), which is an archaeal ortholog of the eukaryotic tRNA modification enzyme TYW2 (Yml005w in Saccharomyces cerevisiae). Notably, although TYW2
Fig. 6. The EEHEE domain is common to various RNA-modification enzymes. (a) 3c0k (E. coli RlmI); (b) 2igt; (c) 2bh2;
(d) 2frn. The catalytic RFM domain is shown in blue, the β-hairpin structure is shown in green, the EEHEE domain is
shown in red, and other nonconserved structural elements are shown in gray. AdoHcy and Fe-S cluster are shown in
black, and RNA is shown in magenta. This figure was prepared using PyMol [http://www.pymol.org].
acts very similarly to MTases, it transfers the α-amino-α-carboxypropyl group of AdoMet (instead of the methyl group) to the C-7 position of the hypermodified yW base (wybutosine).31 Therefore, all enzymes that possess a homolog of the central domain of RlmI appear to be involved in RNA modification. Superposition of these domains reveals a common core with four extended β-strands and one α-helix (in RlmI β7↓ β8↑ α3 β9↑ β10↓) and variable elements in the N-terminus. The 2frn domain corresponds particularly well with this common core, suggesting that it represents the ancestral structural unit. We refer to this domain as the “EEHEE domain,” where E represents a β-strand and H represents an α-helix.
Each variant of the EEHEE domain is structurally stabilized in a different manner (Fig. 6); for instance, in RlmD, this is achieved through a unique Fe–S cluster in the variable N-terminal part. In RlmD, the EEHEE domain is involved in specific recognition of the RNA substrate, but the Fe–S cluster does not directly participate in this process or in catalysis. In RlmI, 2frn, and 2igt, the surface region of the EEHEE domain, which in RlmD binds the RNA substrate, is exposed at the 5′ side of the methylated base. This region of the rRNA substrate is single-stranded in the RlmD–RNA complex, and this contrasts with its structure within the ribosome.30 The distribution of electrostatic potential on the protein surfaces suggests that a similar region of the EEHEE domain in RlmI, 2frn, and 2igt has a high propensity to bind RNA. This fits well with the bioinformatic predictions of RNA-binding residues (details in Materials and Methods) showing that the size of the cleft in all three proteins is compatible with binding a single-stranded, but not a double-stranded, RNA, and leads us to speculate that they recognize their substrates in a manner similar to that observed for RlmD.30 This would explain why RlmI acts preferentially on naked rRNA (which can be more easily unfolded), but not on rRNA that is already engaged in interactions with ribosomal proteins (r-proteins).15
Catalytic domain of RlmI
The C-terminal domain of RlmI was found to exhibit the highest degree of structural similarity with its orthologs from COG1092 (PDB codes 2as0, 1wxx, 2b78, and 2igt), as well as with a number of other RFM enzymes, including PDB codes 2bh2 and 2frn (Supplementary Table 1). All these structures possess an additional β-hairpin at the N-terminus (β11 and β12 in RlmI) that is also found in MTases such as RsmC,32 which modifies nitrogen atoms of nucleic acid bases, or PrmC,33 which modifies amino acid side chains in proteins. However, with the exception of RlmI, this element has not been observed in an m5C MTase. The β-hairpin structure provides a scaffold for an aromatic residue that stacks upon and stabilizes the target base. This residue has been identified as F263 in the RlmD–RNA crystal structure,30 and Y205 probably fulfils this role in RlmI (as illustrated by the docking model; see below).
Thermodynamics of cofactor binding by RlmI
Attempts to cocrystallize RlmI with the methyl group donor AdoMet cofactor [or the reaction product S-adenosylhomocysteine (AdoHcy)] did not reveal electron density from which the cofactor could be modeled. However, the interaction of RlmI with AdoMet and AdoHcy was experimentally observed using isothermal titration calorimetry (ITC; Fig. 7), and these experiments indicate a single binding site for both compounds. The thermodynamic binding parameters for RlmI were estimated as follows: for AdoHcy, Ka = 1.3×10⁶ M⁻¹ (±1.3×10⁵), ΔH = −9.6 kcal/mol (±0.15), and N = 0.96±0.01; for AdoMet, Ka = 3.4×10⁵ M⁻¹ (±1.9×10⁴), ΔH = −3.2 kcal/mol (±0.05), and N = 1.01±0.01 (where Ka is the association constant, ΔH is the change in enthalpy, and N is the number of binding sites). RlmI thus exhibits a higher binding affinity for AdoHcy than for AdoMet, consistent with most of the RFM enzymes studied.
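The reported association constants and enthalpies can be converted into dissociation constants and free energies using the standard relations Kd = 1/Ka, ΔG = −RT ln Ka, and ΔG = ΔH − TΔS. The sketch below is a worked illustration, not part of the original analysis: the temperature of 298.15 K is an assumption (the experimental temperature is not stated in this section), and the function name `binding_thermodynamics` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_ASSUMED = 298.15    # kelvin; ASSUMPTION, the ITC temperature is not given in this section


def binding_thermodynamics(ka, delta_h, temperature=T_ASSUMED):
    """Return (Kd in M, deltaG in kcal/mol, -T*deltaS in kcal/mol)."""
    kd = 1.0 / ka
    delta_g = -R_KCAL * temperature * math.log(ka)
    minus_t_delta_s = delta_g - delta_h  # from deltaG = deltaH - T*deltaS
    return kd, delta_g, minus_t_delta_s


# Ka (M^-1) and deltaH (kcal/mol) as reported in the text.
ligands = {"AdoHcy": (1.3e6, -9.6), "AdoMet": (3.4e5, -3.2)}

results = {name: binding_thermodynamics(ka, dh) for name, (ka, dh) in ligands.items()}
for name, (kd, dg, mtds) in results.items():
    print(f"{name}: Kd = {kd * 1e6:.2f} uM, dG = {dg:.2f} kcal/mol, -TdS = {mtds:.2f} kcal/mol")
```

The dissociation constants do not depend on the assumed temperature: about 0.77 µM for AdoHcy and about 2.9 µM for AdoMet, a difference of roughly fourfold in affinity. The free energy and entropy terms shift with temperature and should be read as estimates under the stated assumption.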
Preliminary model of the RlmI–ligand complex
In order to identify potential enzyme–ligand inter-
actions, we constructed a model of the RlmI–
AdoMet–RNA complex. First, a complex of RlmI–
AdoMet was modeled by copying the AdoHcy
coordinates from the superimposed 2cww structure,
by adding a methyl group, and by energy-minimiz-
ing the structure. The model agrees with the align-
ment-based prediction that N226, D250, and D279
are likely to coordinate the methionine, ribose, and
adenine moieties of the cofactor (Fig. 8). Subse-
quently, we attempted computational docking of the
RNA substrate with a number of low-resolution and
high-resolution methods (e.g., GRAMM and HAD-
DOCK). No reasonable solutions were obtained
when the RNA was constrained in the vicinity of the
target base to maintain the secondary structure
found in the ribosome; computational removal of r-
proteins and placement of the target base into the
active site of the enzyme did not improve the result.
A different modeling approach was therefore
taken using as reference RlmD, the most closely
related MTase for which an RNA–complex structure
is available.30 A hybrid was constructed by superimposing
the RlmD and RlmI EEHEE domains and
by transferring the 5′ part of the RNA substrate from
the RlmD–RNA structure, taking care to avoid steric
clashes with RlmI. This model was then used to infer
the RNA-binding site on RlmI and the size of the
RNA that would be involved in the interaction. The
pentanucleotide AGACC, corresponding to 23S
rRNA nucleotides 1958–1962 where 3′-C is the
RlmI target nucleotide, was chosen. A well-packed
model was obtained by docking the pentanucleotide
onto RlmI while maintaining the same orientation as
in the RlmD–RNA complex (Fig. 8). A pentanucleo-
tide used for docking is too short to extend beyond a
single subunit in the RlmI dimer. Thus, although we
cannot exclude that the native RNA substrate makes
contacts with both subunits, the current model reflects
only protein–RNA contacts within the immediate
vicinity of only one active site. Although speculative,
this model agrees with the key experimental observation
that RlmI requires a naked RNA substrate whose
conformation is not constrained by r-proteins.15 We
predict that RlmI unfolds its substrate RNA into a
single-stranded conformation that flips C1962 into the
active site, in a manner similar to the mechanism
employed by RlmD.30 By analogy with RNA substrate
binding by RlmD, the RlmI–RNA interaction
would involve R64 from the PUA domain of RlmI,
R103 and Q127 from its EEHEE domain, and K201
and Y387 from the RFM domain. Residue Y205 is also
likely to be involved by stacking onto and stabilizing
the target cytosine. Residue D300 from motif IV is
conserved in many RFM enzymes acting on various
molecules, where it typically functions in stabilizing
the cofactor and the substrate.4
Based on our model, we predict that RlmI carries
out the first step of the methylation reaction by
forming a covalent intermediate between residue
Cys340 and the target base C1962. Following
attachment of the methyl group, a general base is
required to abstract the proton at C5 and to resolve
the covalent complex. However, RlmI lacks the
counterpart of conserved residues postulated to
function as the general base in functionally related
enzymes. It possesses neither the second Cys
residue in motif IV as in RNA:m5C MTases
RsmB34 and RsmF26 nor the second carboxylate
residue equivalent, for example, to D363 in RlmD30
(Supplementary Fig. 4). The only conserved carbox-
ylate in the active site of RlmI, other than on D300,
is at D207; however, this residue is located ∼10 Å
from the predicted position of the C5 atom. In the
absence of other appropriate candidates, we postu-
late that D300 may function as a general base.
Alternatively, abstraction of the proton may be
carried out by a hydroxyl ion from the solvent, at
10⁻⁷ mole fraction (pH 7), as postulated for the
DNA:m5C MTase HhaI based on extensive compu-
ter simulations.35
Evolutionary implications for C5
pyrimidine MTases
In this article, we have reported the crystal struc-
ture of RlmI and the preliminary model of its com-
plex with AdoMet and the RNA substrate. Each of
the three domains of RlmI has structural homologs
that have been functionally characterized in other
proteins, although, in its entirety, the RlmI structure
is most similar to proteins that still await functional
classification. Through comparisons with the RlmD–
RNA complex, we have identified an EEHEE domain
in RlmI, which is a structure that RlmD uses to
specifically recognize its RNA substrate. The presence
of the EEHEE domain, in addition to the shape
and charge distribution of the substrate-binding cleft
in RlmI, indicates that the mechanism by which RlmI
recognizes its substrate may be similar to that of
RlmD.

Fig. 7. ITC profiles. ITC spectra for RlmI titrated against the cofactor AdoMet (left) and the cofactor product AdoHcy
(right). The raw ITC data for injections of AdoMet and AdoHcy into the sample cell containing RlmI are indicated in the
upper panels of ITC profiles (left) and (right), respectively (with the baseline subtracted). The peaks were normalized to
the ligand:protein molar ratio and were integrated as shown in the bottom panels. Solid dots indicate the experimental
data, and their best fit was obtained from a nonlinear least squares method using a one-site binding model depicted by a
continuous line.

The comparative sequence and structural analyses
presented here demonstrate a clear evolutionary relationship
between the DNA and RNA C5 pyrimidine
MTases. It was speculated previously that RNA:5-methyluridine
(m5U) MTases are an evolutionary
intermediate for RNA:m5C and DNA:m5C MTases.36
Since then, new structures have been solved for RNA:m5C
MTases and the DNA:m5C MTase complex of
DNMT3a and DNMT3L.37 Moreover, the function of
the putative DNA:m5C MTase DNMT2 has been
reassigned as a tRNA:m5C MTase.38 Finally, RlmI has
been shown to be an RNA:m5C MTase and not an
RNA:m5U MTase;15 this latter observation presumably
also holds for the COG1092 homologs most
closely related to RlmI. Using these new data, we
investigated the relationship between pyrimidine C5
MTases using structural comparisons. We calculated
evolutionary distances based on the structural diver-
gence of catalytic domains of all known pyrimidine
C5 MTase structures and their homologs (not all of
which have been functionally characterized). As an
outgroup, we used structures of two N-MTases:
RsmC, which acts on RNA,32 and PrmC, which methylates
proteins.33 A distance matrix of Q(h) scores
calculated from optimal pairwise comparisons was
used as input for the neighbor-joining algorithm;
the resulting tree is shown in Fig. 9.
The structure-based phylogenetic analysis, while
consistent with the hypothesis that m5C MTases
most likely evolved with m5U MTases from a com-
mon ancestor, also reveals new details. The most
parsimonious scenario inferred from the current tree
is that the ancestral MTase, which predated all
enzymes analyzed herein, contained an N-terminal
β-hairpin. This hairpin was not previously observed
in enzymes of this type because it has been lost in the
RNA:m5C MTases (e.g., RsmB) and all the DNA:m5C
MTases that have been characterized to date. This
loss most likely resulted from steric incompatibility
between the β-hairpin and the new substrate-
binding domains that replaced EEHEE in the lineage
containing RNA:m5C MTases RsmB and RsmF and
all DNA:m5C MTases. Finally, DNMT2 is rather
remotely related to all other RNA:m5C MTases and is
the only enzyme of this type in a branch that other-
wise exclusively contains DNA:m5C MTases. In con-
clusion, we predict that a common ancestor of these
enzymes was most likely active on DNA, but could
have also retained a latent ability to act on RNA,
which has been later restored in DNMT2 (Fig. 9).
With a continuous influx of new data, phyloge-
netic inferences are constantly being revised; hence,
the current scenario, although much more detailed
than the one proposed previously, remains a working
model and will be subjected to further refinement
as new data on structures and functions of related
enzymes become available. In this context, the func-
tional characterization of those members of
COG1092 lacking Cys in motif VI will be of
particular interest. The most obvious candidates for
such analyses appear to be the C-terminal domain of
RlmL, the SMU776 protein (2b78), and the AGC592
protein (2igt).
Materials and Methods
Protein expression, purification, and
crystallization
The yccW (RlmI) gene within the pCA24N vector that
encodes a noncleavable N-terminal (His)6 tag was
obtained from the ASKA recloned library (E. coli; NBRP
NIG, Japan). This plasmid was used to transform the BL21
(DE3) strain of E. coli for protein expression. The E. coli
cells were cultured in 1 L of LB medium at 37 °C and
induced with 100 μM IPTG when the optical density at
600 nm was 0.5–0.6. The culture was then allowed to grow
overnight at 20 °C. After the cell pellet had been harvested,
it was resuspended in 40 ml of lysis buffer [50 mM Tris–HCl
(pH 8.8), 200 mM NaCl, 5% (vol/vol) glycerol, 0.5%
(vol/vol) Triton X-100, and 5 mM β-mercaptoethanol
containing 1 Complete™ ethylenediaminetetraacetic acid-free
protease inhibitor tablet (Roche Diagnostics)]. The cell
lysate obtained after sonication was centrifuged at
18,000 rpm for 30 min at 4 °C (JA-25.50 fixed-angle rotor,
Beckman Coulter centrifuge). The supernatant was left on
a Ni-NTA (Qiagen) agarose column for 1 h at 4 °C, and the
column was subsequently washed with wash I [50 mM Tris–HCl
(pH 8.8), 200 mM NaCl, 5% (vol/vol) glycerol, 0.5% (vol/vol)
Triton X-100, 5 mM β-mercaptoethanol, and 5 mM
imidazole], wash II [50 mM Tris–HCl (pH 8.8), 1 M NaCl,
5% (vol/vol) glycerol, 10 mM imidazole, and 5 mM
β-mercaptoethanol], and wash III [50 mM Tris–HCl (pH 8.8),
200 mM NaCl, 5% (vol/vol) glycerol, 5 mM β-mercaptoethanol,
and 10 mM imidazole]. Protein was eluted from
the affinity column with 50 mM Tris–HCl (pH 8.8),
200 mM NaCl, 5% (vol/vol) glycerol, 5 mM β-mercaptoethanol,
and 200 mM imidazole. The protein was loaded
onto a Superdex 75 column (Amersham Biosciences)
equilibrated with 50 mM Tris–HCl (pH 8.8), 200 mM
NaCl, 5% (vol/vol) glycerol, 5 mM β-mercaptoethanol,
and 10 mM MgCl2. The peak fractions were collected and
concentrated by ultrafiltration to a final concentration of
4 mg/ml.

Fig. 8. Docking model of RlmI with AdoMet and the RNA substrate. The protein backbone is shown as a gray
Cα trace. The RNA is shown in yellow, with the target C1962 residue shown in orange. AdoMet is shown in
magenta. Conserved residues predicted to be important for RNA binding, AdoMet binding, and catalysis are
shown in blue, green, and red, respectively.

SelMet-substituted RlmI was obtained by growing cells
under conditions of endogenous methionine synthesis
inhibition in M9 medium.39SelMet protein was purified as
described for the native protein. The presence of SelMet in
the protein was verified by matrix-assisted laser deso-
rption ionization time-of-flight mass spectrometry.
Crystallization conditions for the protein were screened
using Hampton Research Screens (screens 1 and 2) using
the hanging-drop vapor-diffusion technique. Crystals of
the native protein were observed with 30% (wt/vol)
polyethylene glycol 4000, 0.1 M Tris–HCl (pH 8.5), and
0.2 M sodium acetate (Hampton Research screen 1,
condition 22). Small diamond-shaped crystals formed
within 4–5 days at a protein concentration of 4 mg/ml and
grew to diffraction quality after a week. The SelMet
crystals formed under conditions similar to those of the
native crystals.
Data collection, structure solution, and
refinement
The crystals were taken directly from the drop and flash-cooled
in a N2 cold stream at 100 K. The native crystals
diffracted up to 2.3 Å resolution using an R-AXIS IV++
image plate detector mounted on a RU-H3RHB rotating
anode generator (Rigaku Corp., Tokyo, Japan). Synchro-
tron data were collected at beamlines X12C and X29
(National Synchrotron Light Source, Brookhaven National
Laboratory) for the SelMet protein. A complete multi-
wavelength anomalous dispersion dataset was collected
(Table 1) using a Quantum 4 CCD detector (Area Detector
Systems Corp., Poway, CA, USA) up to 2.0 Å resolution.
Data were processed and scaled using the program
HKL2000.40
The BnP program located 16 of the expected 20 selenium
sites in the asymmetric unit.41 The initial phases were
further improved by density modification using
RESOLVE.42 Density modification was followed by iterative
model building and refinement until approximately
90% of the molecule had been built with side chains. The
remainder was built manually using the program Coot.43
Further cycles of model building alternating with refinement
using the program CNS19 resulted in the final model,
with an R-factor of 0.194 (Rfree=0.242) refined to 2.0 Å
resolution with no σ cutoff. The final model comprises 790
residues (chains A and B) (Ser2-Met396) and 463 water
molecules. MolProbity44 analysis does not show any
residues other than glycine in the disallowed regions of
the Ramachandran plot (Table 1). Figure 10 shows the
representative electron density maps.
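For readers less familiar with the statistic, the crystallographic R-factor quoted above compares observed and calculated structure-factor amplitudes. The sketch below is illustrative only: the amplitudes are invented toy numbers, the scale factor between observed and calculated data is assumed to have been applied already, and this is not the procedure as implemented in CNS. Rfree is the same statistic evaluated over a test set of reflections that is excluded from refinement.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over a set of reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, for illustration only (not data from this study)
toy_f_obs = [100.0, 80.0, 60.0, 40.0]
toy_f_calc = [90.0, 85.0, 66.0, 36.0]
print(f"R = {r_factor(toy_f_obs, toy_f_calc):.3f}")
```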
Fig. 9. Evolutionary tree of pyrimidine C5 MTases (with N-MTases as an outgroup). Colors indicate chemically similar
reaction products, with gray reserved for proteins that have apparently lost their enzymatic function or whose functionality
is uncertain. Broken lines indicate proteins that are functionally uncharacterized and lack functionally
characterized close homologs. Arrows with letters indicate predicted timing of the following events: (A) appearance of Cys
residue in motif VI, development of the C-MTase activity on the N-MTase scaffold; (B) N-terminal fusion with the “EEHEE”
domain; (C) loss of N-terminal β-hairpin, appearance of additional Cys residue in motif IV; (D) disappearance of Cys
residue from motif VI, circular permutation: the N-terminal helix is transferred to the C-terminus; (E) switch in DNA MTase
specificity to act on RNA; (F) inactivation of the ancestor of DNMT3L as a DNA MTase to become a regulatory subunit.

Isothermal titration calorimetry
The protein was extensively dialyzed (for about 14 h)
against a 500-fold excess volume of the buffer containing
20 mM Na-Hepes (pH 8.0), 5% (vol/vol) glycerol, 10 mM
MgCl2, 0.2 M NaCl, and 10 mM DTT for the titration
experiments. AdoMet (Sigma) and S-adenosyl-L-homocysteine
(AdoHcy; Sigma) solutions were prepared
in the same buffer. ITC experiments were carried out using
a VP-ITC calorimeter (Microcal, LLC) at 20 °C using
0.02 mM protein in the sample cell and 0.4 mM AdoMet or
AdoHcy in the injector. All samples were thoroughly
degassed and then centrifuged to remove precipitates.
Four-microliter volumes per injection were used for the
different experiments. Consecutive injections were separated
by at least 5 min to allow the peak to return to
baseline. ITC data were analyzed with a single-site fitting
model using Origin 7.0 (OriginLab Corp.) software.
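The titration conditions can be related to the fitted parameters through the dimensionless Wiseman c value (c = N × Ka × protein concentration), which is commonly used to judge how well a binding isotherm is defined. The sketch below is a simplified illustration, assuming the Ka and N values reported in this article and ignoring dilution of the cell contents during injections; the function names are hypothetical and this is not the Origin fitting routine.

```python
import math

PROTEIN_M = 0.02e-3  # 0.02 mM RlmI in the sample cell (from the text)


def wiseman_c(n_sites, ka_per_molar, protein_molar):
    """Dimensionless c value: c = N * Ka * [protein]."""
    return n_sites * ka_per_molar * protein_molar


def bound_ligand(protein_molar, ligand_molar, ka_per_molar, n_sites=1.0):
    """Bound ligand concentration (M) for identical, independent sites.

    Solves the mass-action quadratic for total (not free) concentrations.
    """
    kd = 1.0 / ka_per_molar
    sites = n_sites * protein_molar
    b = sites + ligand_molar + kd
    return (b - math.sqrt(b * b - 4.0 * sites * ligand_molar)) / 2.0


for name, n, ka in (("AdoHcy", 0.96, 1.3e6), ("AdoMet", 1.01, 3.4e5)):
    c = wiseman_c(n, ka, PROTEIN_M)
    bound = bound_ligand(PROTEIN_M, PROTEIN_M, ka, n)
    print(f"{name}: c = {c:.1f}; at equal total ligand and protein, "
          f"{bound / PROTEIN_M:.2f} of the protein concentration is bound")
```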
Analytical ultracentrifugation
The oligomeric state of RlmI was investigated using a
Beckman Coulter XL-I analytical ultracentrifuge fitted with a
four-hole AN-50 Ti rotor and double-sector centerpieces. The
reference sector was loaded with 0.43 ml of Tris–HCl
(pH 8.8). The sample sector contained 0.41 ml of the protein
sample to be analyzed, at a concentration of 1.0 mg/ml (in
the same buffer). The sedimentation velocity profiles were
collected by monitoring the absorbance at 280 nm as the
samples were centrifuged at 40,000 rpm at 20 °C. Multiple
scans at different time points were fitted to a continuous size
distribution by using the SEDFIT program.45
Bioinformatics methods
Searches for similar structures in PDB were carried out
with DALI.20 Multiple-sequence alignment of the COG1092
protein family was calculated using PROMALS46 with
default parameters, followed by manual correction based
on the result of structure superposition. Mapping of se-
quence conservation onto the model was performed via
the COLORADO3D server,47using the Rate4Site method,
with the JTT substitution matrix and ML model for rate
inference. Prediction of RNA-binding residues was carried
out using RNABindR.48 Protein–RNA docking was performed
with HADDOCK.49 Figures were rendered using
PyMol‡ and ESPript.50,51
Fig. 10. Electron density maps. (a) Stereo view of the experimental electron density map (RESOLVE density-modified
map). The map is contoured at a level of 1.0σ. (b) Simulated annealing Fo−Fc omit map in the active-site region of
RlmI. Ser339 and all atoms within 2.5 Å were omitted prior to refinement. The cysteine in the active site is
labeled. The map is contoured at a level of 3.0σ. These figures were prepared using PyMol [http://www.pymol.org].
‡http://www.pymol.org
A phylogenetic tree based on sequences was calculated
from the sequence alignment with MEGA4,52 using the
neighbor-joining method, with the JTT model of substitu-
tions and pairwise deletions. The stability of individual
nodes was calculated using the bootstrap test (1000
replicates). A phylogenetic tree based on structures (for
selected RFM proteins) was calculated using the methodology
developed by O'Donoghue and Luthey-Schulten.53
First, a multiple structure alignment was calculated for a
set of related structures; then, for each pair of structures, a
measure of structural similarity Q(h) was calculated. Q(h)
considers the structural distances of both the aligned core
of structures and the unaligned variable regions by
estimating the effects of gaps and how insertions perturb
the aligned core structure, according to the following
formula: $Q(h) = X^{-1}(q_{aln} + q_{gap})$, where $X$ is a normalization
index, $q_{aln}$ counts structurally aligned regions by computing
the unnormalized fraction of Cα–Cα pair distances
that are similar between two aligned structures, and $q_{gap}$
corresponds to structural deviations introduced by insertions
in each protein in an aligned pair. A pairwise distance
matrix based on Q(h) scores was used as an input to
compute a phylogenetic tree using the neighbor-joining
approach.
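The tree-building step can be illustrated with a compact neighbor-joining routine operating on a pairwise distance matrix. This is a minimal sketch of the standard algorithm, not the software used in this study; the taxon labels and distances are a toy example, and how Q(h) similarity scores are converted into distances (for instance as 1 − Q(h)) is an assumption that the text does not specify.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Standard neighbor joining on a symmetric distance matrix.

    Returns (joins, final_edge). Each join is
    (child_a, child_b, branch_a, branch_b, new_node); final_edge connects
    the last two remaining nodes as (node_a, node_b, length).
    """
    nodes = list(names)
    dist = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other active nodes
        totals = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimizes (n - 2) * d(a, b) - total(a) - total(b)
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
        )
        branch_a = 0.5 * dist[a][b] + (totals[a] - totals[b]) / (2.0 * (n - 2))
        branch_b = dist[a][b] - branch_a
        new = f"node{len(joins)}"
        dist[new] = {new: 0.0}
        for c in nodes:
            if c in (a, b):
                continue
            # Distance from the new internal node to every remaining node
            d_new = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
            dist[new][c] = d_new
            dist[c][new] = d_new
        nodes = [c for c in nodes if c not in (a, b)] + [new]
        joins.append((a, b, branch_a, branch_b, new))
    last_a, last_b = nodes
    return joins, (last_a, last_b, dist[last_a][last_b])


# Toy distance matrix for five hypothetical structures (not data from this study)
toy_names = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_joins, toy_last = neighbor_joining(toy_names, toy_matrix)
for join in toy_joins:
    print(join)
print(toy_last)
```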
PDB accession code
Coordinates and structure factors have been deposited
with the RCSB PDB54with the code 3C0K.
Acknowledgements
We are grateful to the Biomedical Research Council
of Singapore, A*STAR, for providing a research
grant (R154000362305) in support of this study. We
thank Dr. Anand Saxena for assistance with data
collection and Irina Tuszynska for assistance with
protein–RNA docking. The authors also acknowl-
edge the assistance of Jobichen Chacko and Nilofer
Husain in the AUC experiments. Data for this study
were measured at beamlines X12C and X29 of the
National Synchrotron Light Source, Brookhaven
National Laboratory. J.M.B., K.L.T., J.M.K., and E.P.
were supported by grant N301 2396 33 from the
Polish Ministry of Science; K.L.T. was additionally
supported by a PhD grant (N301 105 32/3599). S.D.
gratefully acknowledges support from the Danish
Research Agency (FNU rammebevilling 272-07-
0613), the Nucleic Acid Center of the Danish Grund-
forskningsfond, and NAC-DRUG under the FP6
Marie Curie Initial Training Networks (to E.P.). We
thank the ASKA recloned library (E. coli; NBRP, NIG,
Japan) for the RlmI clone. S. Sunita is a PhD student in
receipt of a research scholarship from the National
University of Singapore.
Supplementary Data
Supplementary data associated with this article
can be found, in the online version, at
doi:10.1016/j.jmb.2008.08.062
References
1. Martin, J. L. & McMillan, F. M. (2002). SAM (dependent)
I AM: the S-adenosylmethionine-dependent methyl-
transferase fold. Curr. Opin. Struct. Biol. 12, 783–793.
2. Schubert, H. L., Blumenthal, R. M. & Cheng, X. (2003).
Many paths to methyltransfer: a chronicle of conver-
gence. Trends Biochem. Sci. 28, 329–335.
3. Loenen, W. A. (2006). S-Adenosylmethionine: jack of all
trades and master of everything? Biochem. Soc. Trans.
34, 330–333.
4. Kozbial, P. Z. & Mushegian, A. R. (2005). Natural his-
tory of S-adenosylmethionine-binding proteins. BMC
Struct. Biol. 5, 19.
5. Bujnicki, J. M. (1999). Comparison of protein structures
reveals monophyletic origin of the AdoMet-dependent
methyltransferase family and mechanistic conver-
gence rather than recent differentiation of N4N6-
adenine DNA methylation. In Silico Biol. 1, 175–182.
6. Rozenski, J., Crain, P. F. & McCloskey, J. A. (1999). The
RNA modification database: 1999 update. Nucleic
Acids Res. 27, 196–197.
7. McCloskey, J. A. & Rozenski, J. (2005). The small
subunit rRNA modification database. Nucleic Acids
Res. 33, D135–D138.
8. Dunin-Horkawicz, S., Czerwoniec, A., Gajda, M. J.,
Feder, M., Grosjean, H. & Bujnicki, J. M. (2006).
MODOMICS: a database of RNA modification path-
ways. Nucleic Acids Res. 34, D145–D149.
9. Björk, G. R., Durand, J. M., Hagervall, T. G., Leipu-
viene, R., Lundgren, H. K., Nilsson, K. et al. (1999).
Transfer RNA modification: influence on translational
frameshifting and metabolism. FEBS Lett. 452, 47–51.
10. Hopper, A. K. & Phizicky, E. M. (2003). tRNA transfers
to the limelight. Genes Dev. 17, 162–180.
11. Lapeyre, B. (2005). Conserved ribosomal RNA modi-
fication and their putative roles in ribosome bioge-
nesis and translation. In Fine-Tuning of RNA Functions
by Modification and Editing (Grosjean, H., ed), Fine-
Tuning of RNA Functions by Modification and Editing,
vol. 12, pp. Springer-Verlag, Berlin.
12. Chow, C. S., Lamichhane, T. N. & Mahto, S. K. (2007).
Expanding the nucleotide repertoire of the ribosome
with post-transcriptional modifications. ACS Chem.
Biol. 2, 610–619.
13. Das, G., Thotala, D. K., Kapoor, S., Karunanithi, S.,
Thakur, S. S., Singh, N. S. & Varshney, U. (2008). Role
of 16S ribosomal RNA methylations in translation
initiation in Escherichia coli. EMBO J. 27, 840–851.
14. Douthwaite, S., Fourmy, D. & Yoshizawa, S. (2005).
Nucleotide methylations in rRNA that confer resis-
tance to ribosome-targeting antibiotics. In Fine-Tuning
of RNA Functions by Modification and Editing (Grosjean,
H., ed), Fine-Tuning of RNA Functions by Modification
and Editing, vol. 12, pp. 287–309. Springer-Verlag, Berlin.
15. Purta, E., O'Connor, M., Bujnicki, J. M. & Douthwaite,
S. (2008). YccW is the m5C methyltransferase specific
for 23S rRNA nucleotide 1962. J. Mol. Biol. 383,
641–651; JMB-D-08-00572.
16. Lesnyak, D. V., Sergiev, P. V., Bogdanov, A. A. &
Dontsova, O. A. (2006). Identification of Escherichia
coli m2G methyltransferases: I. The ycbY Gene
Encodes a Methyltransferase Specific for G2445 of
the 23S rRNA. J. Mol. Biol. 364, 20–25.
17. Hendrickson, W. A., Horton, J. R. & LeMaster, D. M.
(1990). Selenomethionyl proteins produced for analysis
by multiwavelength anomalous diffraction (MAD):
a vehicle for direct determination of three-dimen-
sional structure. EMBO J. 9, 1665–1672.
18. Perez-Arellano, I., Gallego, J. & Cervera, J. (2007). The
PUA domain—a structural and functional overview.
FEBS J. 274, 4972–4984.
19. Brünger, A. T., Adams, P. D., Clore, G. M., DeLano,
W. L., Gros, P., Grosse-Kunstleve, R. W. et al. (1998).
Crystallography and NMR system: a new software
suite for macromolecular structure determination.
Acta Crystallogr. Sect. D, 54 (Pt 5), 905–921.
20. Holm, L. & Sander, C. (1993). Protein structure com-
parison by alignment of distance matrices. J. Mol. Biol.
233, 123–138.
21. Sun, W., Xu, X., Pavlova, M., Edwards, A. M.,
Joachimiak, A., Savchenko, A. & Christendat, D.
(2005). The crystal structure of a novel SAM-depen-
dent methyltransferase PH1915 from Pyrococcus hor-
ikoshii. Protein Sci. 14, 3121–3128.
22. Pioszak, A. A., Murayama, K., Nakagawa, N., Ebihara,
A., Kuramitsu, S., Shirouzu, M. & Yokoyama, S. (2005).
Structures of a putative RNA 5-methyluridine methyl-
transferase, Thermus thermophilus S-adenosyl-L-homo-
cysteine. Acta Crystallogr. Sect. F, 61, 867–874.
23. Bujnicki, J. M. (2000). Phylogenomic analysis of 16S
rRNA:(guanine-N2) methyltransferases suggests new
family members and reveals highly conserved motifs
and a domain structure similar to other nucleic acid
amino-methyltransferases. FASEB J. 14, 2365–2368.
24. Aravind, L. & Koonin, E. V. (2001). THUMP—a pre-
dicted RNA-binding domain shared by 4-thiouridine,
pseudouridine synthases and RNA methylases. Trends
Biochem. Sci. 26, 215–217.
25. Aravind, L. & Koonin, E. V. (1999). Novel predicted
RNA-binding domains associated with the translation
machinery. J. Mol. Evol. 48, 291–302.
26. Hallberg, B. M., Ericsson, U. B., Johnson, K. A.,
Andersen, N. M., Douthwaite, S., Nordlund, P. et al.
(2006). The structure of the RNA m5C methyltransferase
YebU from Escherichia coli reveals a C-terminal RNA-
recruiting PUA domain. J. Mol. Biol. 360, 774–787.
27. Ishitani, R., Nureki, O., Nameki, N., Okada, N.,
Nishimura, S. & Yokoyama, S. (2003). Alternative tertiary
structure of tRNA for recognition by a post-transcriptional
modification enzyme. Cell, 113, 383–394.
28. Hoang, C. & Ferre-D'Amare, A. R. (2001). Cocrystal
structure of a tRNA Psi55 pseudouridine synthase:
nucleotide flipping by an RNA-modifying enzyme.
Cell, 107, 929–939.
29. Liang, B., Xue, S., Terns, R. M., Terns, M. P. & Li, H.
(2007). Substrate RNA positioning in the archaeal H/
ACA ribonucleoprotein complex. Nat. Struct. Mol.
Biol. 14, 1189–1195.
30. Lee, T. T., Agarwalla, S. & Stroud, R. M. (2005). A
unique RNA fold in the RumA–RNA–cofactor ternary
complex contributes to substrate selectivity and
enzymatic function. Cell, 120, 599–611.
31. Noma, A., Kirino, Y., Ikeuchi, Y. & Suzuki, T. (2006).
Biosynthesis of wybutosine, a hyper-modified nucleo-
side in eukaryotic phenylalanine tRNA. EMBO J. 25,
2142–2154.
32. Sunita, S., Purta, E., Durawa, M., Tkaczuk, K. L.,
Swaathi, J., Bujnicki, J. M. & Sivaraman, J. (2007).
Functional specialization of domains tandemly dupli-
cated within 16S rRNA methyltransferase RsmC.
Nucleic Acids Res. 35, 4264–4274.
33. Schubert, H. L., Phillips, J. D. & Hill, C. P. (2003).
Structures along the catalytic pathway of PrmC/
HemK, an N5-glutamine AdoMet-dependent methyl-
transferase. Biochemistry, 42, 5592–5599.
34. Foster, P. G., Nunes, C. R., Greene, P., Moustakas, D. &
Stroud, R. M. (2003). The first structure of an RNA
m5C methyltransferase, Fmu, provides insight into
catalytic mechanism and specific binding of RNA
substrate. Structure (Cambridge), 11, 1609–1620.
35. Zhang, X. & Bruice, T. C. (2006). The mechanism of
M.HhaI DNA C5 cytosine methyltransferase enzyme: a
quantum mechanics/molecular mechanics approach.
Proc. Natl Acad. Sci. USA, 103, 6148–6153.
36. Bujnicki, J. M., Feder, M., Ayres, C. L. & Redman,
K. L. (2004). Sequence–structure–function studies of
tRNA:m5U methyltransferases. Nucleic Acids Res. 32,
2453–2463.
37. Jia, D., Jurkowska, R. Z., Zhang, X., Jeltsch, A. &
Cheng, X. (2007). Structure of Dnmt3a bound to
Dnmt3L suggests a model for de novo DNA methyla-
tion. Nature, 449, 248–251.
38. Goll, M. G., Kirpekar, F., Maggert, K. A., Yoder, J. A.,
Hsieh, C. L., Zhang, X. et al. (2006). Methylation of
tRNAAsp by the DNA methyltransferase homolog
Dnmt2. Science, 311, 395–398.
39. Doublie, S. (1997). Preparation of selenomethionyl
proteins for phase determination. Methods Enzymol.
276, 523–530.
40. Otwinowski, Z. & Minor, W. (1997). Processing of X-
ray diffraction data collected in oscillation mode.
Methods Enzymol. 276, 307–326.
41. Weeks, C. M. & Miller, R. (1999). The design and imple-
mentation of SnB v2.0. J. Appl. Crystallogr. 32, 120–124.
42. Terwilliger, T. C. (2000). Maximum-likelihood density
modification. Acta Crystallogr. Sect. D, 56, 965–972.
43. Emsley, P. & Cowtan, K. (2004). Coot: model-building
tools for molecular graphics. Acta Crystallogr. Sect. D,
60, 2126–2132.
44. Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, J. N.,
Kapral, G. J., Wang, X. et al. (2007). MolProbity: all-
atom contacts and structure validation for proteins
and nucleic acids. Nucleic Acids Res. 35, W375–W383.
45. Schuck, P. (2000). Size-distribution analysis of macro-
molecules by sedimentation velocity ultracentrifugation
and Lamm equation modeling. Biophys. J. 78, 1606–1619.
46. Pei, J. & Grishin, N. V. (2007). PROMALS: towards
accurate multiple sequence alignments of distantly
related proteins. Bioinformatics, 23, 802–808.
47. Sasin, J. M. & Bujnicki, J. M. (2004). COLORADO3D, a
web server for the visual analysis of protein struc-
tures. Nucleic Acids Res. 32, W586–W589.
48. Terribilini, M., Sander, J. D., Lee, J. H., Zaback, P., Jernigan,
R. L., Honavar, V. & Dobbs, D. (2007). RNABindR: a
server for analyzing and predicting RNA-binding sites in
proteins. Nucleic Acids Res. 35, W578–W584.
49. van Dijk, M., van Dijk, A. D., Hsu, V., Boelens, R. &
Bonvin, A. M. (2006). Information-driven protein–
DNA docking using HADDOCK: it is a matter of
flexibility. Nucleic Acids Res. 34, 3317–3325.
50. DeLano, W. L. (2002). The PyMOL Molecular Graphics
System. DeLano Scientific, San Carlos, CA, USA.
51. Gouet, P., Robert, X. & Courcelle, E. (2003). ESPript/
ENDscript: extracting and rendering sequence and 3D
information from atomic structures of proteins.
Nucleic Acids Res. 31, 3320–3323.
52. Tamura, K., Dudley, J., Nei, M. & Kumar, S. (2007).
MEGA4: Molecular Evolutionary Genetics Analysis
(MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599.
53. O'Donoghue, P. & Luthey-Schulten, Z. (2005). Evolu-
tionary profiles derived from the QR factorization of
multiple structural alignments gives an economy of
information. J. Mol. Biol. 346, 875–894.
54. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G.,
Bhat, T. N., Weissig, H. et al. (2000). The Protein Data
Bank. Nucleic Acids Res. 28, 235–242.