The chemistries and consequences of DNA and RNA methylation and demethylation
Franziska R. Traube and Thomas Carell
Department of Chemistry, Ludwig-Maximilians-Universität München, Butenandtstrasse, Munich, Germany
Received 10 January 2017
Revised 4 April 2017
Accepted 6 April 2017
Chemical modification of nucleobases plays an important role in the control of gene expression on
different levels. This includes the modulation of translation by modified tRNA bases and the silencing and
reactivation of genes by methylation and demethylation of cytosine in promoter regions. In particular,
dynamic methylation of adenine and cytosine is essential for cells to adapt to their environment and for the
development of complex organisms from a single cell. Errors in the cytosine methylation pattern are
associated with most types of cancer, and bacteria use methylated nucleobases to resist antibiotics. This
Point of View aims to shed light on the known and potential chemistry of DNA and RNA methylation and
demethylation. Understanding the chemistry of these processes on a molecular level is the first step
towards a deeper knowledge of their regulation and function and will help us to find ways to
manipulate nucleobase methylation to treat diseases.
Cytosine modifications; DNA
modifications; Epigenetics;
methyltransferases; RNA
modifications; TET enzymes
Since the discovery of (deoxy)adenosine (dA, A), (deoxy)cyti-
dine (dC, C), (deoxy)guanosine (dG, G), (deoxy)thymidine (dT,
T) and uracil (U) in the early 20th century as the information
carrying building blocks, which form the basis for RNA and
DNA, various modifications of these nucleosides were discov-
ered (Fig. 1).
Particularly in transfer RNA (tRNA) but also in ribosomal RNA (rRNA), modified bases are central
elements, needed to fine-tune the translation of the genetic code. In rRNA of bacterial pathogens,
many methylated bases are present to block binding of small molecules that work as translation
inhibitors, resulting in resistance against antibiotics such as aminoglycosides. More recently it was
discovered that messenger RNA (mRNA) also contains modified bases. Although it is not yet fully
understood what the functions of these bases are, it was revealed that the modification chemistry is
to some extent reversible. This suggests that the modification and de-modification chemistry has a
novel and as yet unexplored regulatory function. In this regard N6-methylated adenine (m6A) is the
best analyzed modification, but most recently the reversible formation of N6,2'-O-dimethyladenosine
(m6Am) was also discovered. According to current knowledge, reversible chemistry on modified RNA
bases is limited to methyl groups, which are introduced by methyltransferases and removed by
demethylases. DNA, in contrast, as the prime carrier of genetic information in the biosphere, is
structurally less complex and only a few modified bases are known. Most prominent is the methylated
base 5-methyl deoxycytosine (5mdC). Ideas about the potential chemistry of methylation and
demethylation are the focus of this review. For other aspects, the following excellent reviews can
be consulted.
5mdC is the most abundant modified base in genomic DNA
of eukaryotes and also present in the DNA of prokaryotes.
In mammals, 5mdC typically reaches global levels between 1
and 5% in genomic DNA.
Methylated adenine (6mA),
which is the DNA equivalent to m6A in RNA, is another DNA
modification that is under intensive investigation at the
moment. Whereas 6mA is a well-characterized modification in
bacterial DNA, its presence was only recently shown in several
higher eukaryotic organisms.
In Caenorhabditis elegans,
where 5mdC is not detectable, 6mA is dynamically regulated
and linked to other epigenetic marks
and in early embryos of Drosophila melanogaster, 6mA levels are high, but decrease fast
during development, resulting in very low 6mA levels in adults. In the unicellular green alga
Chlamydomonas reinhardtii, 6mA was discovered in 84% of the genes, where it is
mainly located at transcription start sites.
Recently, it was
reported that mammalian DNA, including human and mouse,
also contains 6mA.
There, 6mA seems to be distributed
across the genome, but absent in gene exons,
and 6mA-
demethylation in mouse embryonic stem cell (mESC) DNA
was shown to correlate with ALKBH1 depletion.
These findings question the previous paradigm that DNA modifications
in the mammalian genome are limited to cytosine residues. However,
when our group tried to confirm these results by a novel
ultrasensitive UHPLC-MS method, we were not able to detect
6mA in mESC DNA or DNA from mouse tissue, whereas Chla-
mydomonas DNA, which served as a positive control, delivered
the expected positive result.
These observations suggest that
6mA might be present at defined time points in mammalian
DNA, but is not an epigenetic mark. In the coming years, the
question whether 6mA is a relevant modification in mammalian
DNA or not will thus certainly be under intensive investigation.
2017, VOL. 14, NO. 9, 1099–1107
Chemistry of RNA and DNA base methylation
The addition of the methyl group to DNA and RNA bases
(Fig. 2) is catalyzed by DNA- and RNA-methyltransferases that
use S-adenosyl-L-methionine (SAM) as an active methyl-group
donor. While the methyltransferases that methylate RNA
bases are now under extensive investigation, the enzymes that
catalyze the methylation of dC in DNA are well characterized.
In mammalian cells, 3 active DNA-methyltransferases (DNMTs:
DNMT1, DNMT3a and DNMT3b) exist. DNMT3a and 3b
are de novo DNMTs, which methylate canonical dC bases. In
contrast, DNMT1 maintains the methylation status during cell
division. DNMT1 operates on hemi-methylated DNA during
replication, where the template strand is already methylated, but
the newly synthesized strand is lacking methylation.
As such,
DNMT1 converts the methylation of dC into an inheritable
modification that can be transferred during reproduction.
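The maintenance logic described here can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which each strand is a list of booleans (one per CpG site, `True` meaning 5mdC); the names `replicate` and `maintain` are hypothetical and chosen only for this illustration, and the model ignores enzyme kinetics, fidelity and sequence context.

```python
from typing import List, Tuple

# Hypothetical encoding: one boolean per CpG site, True = 5mdC present.
Strand = List[bool]


def replicate(template: Strand) -> Tuple[Strand, Strand]:
    """Semi-conservative replication: the nascent strand starts unmethylated."""
    nascent = [False] * len(template)
    return list(template), nascent


def maintain(template: Strand, nascent: Strand) -> Strand:
    """DNMT1-like step (simplified assumption): methylate the nascent strand
    only at sites where the template strand is already methylated."""
    return [new or old for old, new in zip(template, nascent)]


parent = [True, False, True, True]
template, nascent = replicate(parent)   # hemi-methylated duplex
daughter = maintain(template, nascent)  # pattern restored on the new strand
print(daughter == parent)
```

In this toy model the daughter strand ends up with the same pattern as the parent strand, which is the sense in which maintenance methylation makes the mark inheritable.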
DNMTs, and thus cytosine methylation, are essential in those
multicellular organisms in which they exist. The presence or absence
of 5mdC is associated with various important cellular functions,
such as transcription control, X-chromosome silencing and
genomic imprinting. A global deletion of only one of the 3
DNMTs leads to severe cellular aberrations and is therefore
lethal in early embryogenesis (DNMT1 and 3b) or postnatally
(DNMT3a). During differentiation the “methylome” is
highly dynamic and a cell-type-characteristic 5mdC pattern is
established during this process. While 5mdC is confined to a
CpG-dinucleotide context in the majority of somatic cells, non-CpG
methylation is also present in embryonic stem cells, many
pluripotent progenitor cells and adult brain. However, CpG
methylation dominates here as well. Methylation
in vertebrates occurs in all types of DNA sequence contexts,
including repetitive and regulatory sequences, genes and transposable
elements, in contrast to invertebrates, where mostly
repetitive sequences are methylated. The majority of cytosines
in a CpG context, depending on the cell type up to 80%, are
methylated, leaving the so-called CpG islands (CGIs) of actively
transcribed genes as unmethylated regions.
Figure 1. Examples of methylated and oxidized bases found in RNA and DNA.
Figure 2. Mechanism of methylation leading to the formation of het-CH3 and C-CH3 connectivities in RNA and DNA.
CGIs are regions of high CpG frequency, compared with the bulk genomic DNA, over a length
of at least 500 base pairs. They are found in 40% of promoter regions in the mammalian genome, with
even higher levels (60%) in the human genome. Methylation of CpG:GpC islands is consequently
a hallmark of silenced genes.
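As a rough illustration of what "high CpG frequency over at least 500 base pairs" can mean in practice, the sketch below computes the GC content and the observed/expected CpG ratio of a sequence. Only the 500 bp minimum length comes from the text; the GC-content and observed/expected thresholds used as defaults, as well as the function names, are assumptions made for this example and are not taken from this article.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count under independence is c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq: str, min_len: int = 500,
                   min_gc: float = 0.55, min_obs_exp: float = 0.65) -> bool:
    """Heuristic check; min_gc and min_obs_exp are assumed thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


print(cpg_stats("CGCGCGCG"))
print(looks_like_cgi("CG" * 250))
```

A real analysis would slide such a window along the genome; this sketch only evaluates one candidate region at a time.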
The enzymatic mechanism by which methyltransferases methylate DNA and RNA bases is shown in Fig. 2.
Centers with a certain nucleophilicity, like the amino group of the RNA base A, can
attack the SAM coenzyme directly, leading to immediate methylation. This type of direct
methylation is certainly operating for the formation of 6mA, 4mC or m6Am. SAM as nature’s
“methyl iodide” is hence reactive enough to methylate even weakly
nucleophilic centers such as the exocyclic amino group of A,
whose sp2-hybridized N-atom features only a very weakly
nucleophilic lone pair. This type of direct methylation
creates bases which possess the methyl group attached to a
heteroatom, establishing a het-CH3 system. This will be important
in the context of active demethylation (vide infra).
In contrast to the formation of het-CH3 connections, methylation
of the dC base in DNA at position C5 is far more complex.
The C5 center features no nucleophilicity at all, making
direct methylation impossible. Nature solves this problem by
exploiting a helper nucleophile (R-SH, Fig. 2). The DNMT
enzymes attack the dC base first with a nucleophilic thiol in a
1,6 addition reaction. This establishes a nucleophilic enamine
substructure (green in Fig. 2), which can subsequently be methylated
with the SAM cofactor. Importantly, the helper nucleophile
is subsequently eliminated, thereby re-establishing the
aromatic system. This more complex enzymatic transformation
allows nature to methylate non-nucleophilic carbon atoms to
create C-CH3 connectivities, which feature a strong and stable
C-C single bond.
Chemistry of demethylation
To establish the reversibility needed for switching biochemical
processes, nature needs a way to remove the attached methyl groups.
Removal of het-CH3 groups, found predominantly in RNA, was
found to occur with the help of α-ketoglutarate (α-KG) dependent
oxidases. These proteins contain a reactive Fe(II) center,
which reacts with oxygen to a strongly oxidizing Fe(IV)=O species
under concomitant decarboxylation of α-KG to succinate
(Fig. 3). The Fe(IV)=O species is able to abstract an H-atom
from the het-CH3 group to form a het-stabilized het-CH2 radical,
which reacts with the Fe-bound hydroxyl radical to form a
het-CH2-OH hemiaminal/acetal functionality.
However, these structures are unstable. In water, they
decompose spontaneously under loss of formaldehyde
to give the unmethylated compound. It is interesting that
formaldehyde is formed as a byproduct of this reaction because
it is typically a rather toxic compound. It remains to be seen how
this molecule is detoxified in the context of the demethylation
reaction. Particularly well studied is the removal of the N6-methyl
group from m6A to revert to the canonical RNA base
A. So far 2 α-KG dependent oxidases were found to catalyze
the oxidation. One is the fat mass and obesity-associated
protein (FTO) and the second is ALKBH5. It was shown
that knockdown of FTO led to increased amounts of m6A and
in turn overexpression of FTO resulted in decreased m6A levels.
Alkbh5 deficiency in mice had a similar effect to FTO
knockdown in human cells and resulted in increased m6A levels of
the mRNA. The demethylation activity of both proteins is
comparable, although ALKBH5 shows direct demethylation,
whereas FTO-mediated demethylation is supposed to create
hm6A and f6A as intermediates.
In 2009 it was found that 5mdC is also further enzymatically
oxidized in a stepwise fashion to give first 5-hydroxymethyldeoxycytosine
(5hmdC), followed by 5-formyldeoxycytosine
(5fdC) and 5-carboxydeoxycytosine (5cadC). “Ten-eleven translocation”
(TET) enzymes, which are Fe2+/α-KG dependent dioxygenases,
were discovered to catalyze this iterative 5mdC
oxidation reaction. Regarding the first oxidation step that
transforms 5mdC to 5hmdC, the Fe2+/α-KG catalyzed reaction
generates a stable C-CH2-OH connectivity, which is, as a primary
alcohol, stable in water (Fig. 3). 5hmdC is consequently a
Figure 3. Oxidation of m6A followed by decomposition of the hemiaminal to A and oxidation of 5mdC to stable 5hmdC.
stable DNA base modification and it was suggested that the
base has indeed epigenetic functions. For example, 5hmdC con-
stitutes 0.6% of all nucleotides in Purkinje neurons, a special
neural cell type of the cerebellum, and 0.032% of all nucleotides
in embryonic stem (ES) cells.
The highest 5hmdC levels in
fully differentiated tissues were found in the brain with up to
1% of all cytosines.
Evidence accumulates that 5hmdC in a
given gene is able to accelerate transcription and it is not sur-
prising that 5hmdC is mainly present in the promoter of
actively transcribed genes.
TET enzymes are in this sense required to orchestrate the
transcriptional activity of genes. In vertebrates, TET proteins
exist in 3 different types (TET1–TET3) that do not differ
regarding their chemistry, but seem to have different spatio-temporal
activity. Whereas TET1 is mostly expressed in stem
cells, TET3 is upregulated during differentiation and is the most
abundant TET enzyme in fully differentiated cells. A global
TET3 knockout is lethal in embryogenesis, because it prevents
epigenetic reprogramming during differentiation. Interestingly,
the presence of 5hmdC in mammalian DNA was
first described as early as 1972. It took more than 30 years to confirm
that 5hmdC is really present in substantial amounts that
depend strongly on the cell and tissue type.
The further oxidized bases 5fdC and especially 5cadC
(Fig. 4) have not yet been associated with distinct cellular functions,
but for 5fdC it was reported that it might have regulatory
purposes and is also a stable epigenetic mark. In accordance
with these previous findings, a recently reported single-cell
5fdC-sequencing method called CLEVER-seq revealed that the
generation of 5fdC in promoter regions precedes the upregulation
of gene expression. Despite this faint evidence for
epigenetic functions, 5fdC and 5cadC are currently mainly considered
to be intermediates of an active DNA
demethylation process. DNA demethylation is a crucial process
of cell development. Especially during fertilization (paternal
part of the genome), early embryogenesis (maternal part of the
genome) and the development of germ cells, DNA demethylation
takes place in a genome-wide manner, allowing a broad
reprogramming of the fertilized oocyte and the cells in the early
embryo. DNA demethylation occurs not only during development, but also
at specific sites of the genome in fully differentiated cells. In
brain, for example, locus-specific DNA demethylation and de
novo methylation is induced by neural activation, arguing that
DNA demethylation is important for normal brain function,
including memory formation and learning.
DNA demeth-
ylation can take place either actively, which means replication-
independent, or passively when DNMT1 does not methylate
the nascent DNA strand in hemi-methylated DNA after repli-
cation. Passive demethylation occurs, when DNMT1 is absent
or blocked during the replication process, which happens for
example during early embryogenesis to ensure the demethyla-
tion of the maternal genome.
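The dilution effect behind passive demethylation can be made concrete with a short calculation. The sketch assumes an idealized case in which maintenance methylation is completely absent and no de novo methylation or active demethylation takes place, so that the fraction of strands carrying the methyl mark at a given site halves with every round of replication; the function name is hypothetical.

```python
def methylated_strand_fraction(initial: float, rounds: int) -> float:
    """Fraction of DNA strands still methylated at a site after `rounds`
    replications, assuming no maintenance or de novo methylation at all."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    if not 0.0 <= initial <= 1.0:
        raise ValueError("initial must be a fraction between 0 and 1")
    # Each replication pairs every old strand with a new, unmethylated one.
    return initial / (2 ** rounds)


for n in range(4):
    print(n, methylated_strand_fraction(1.0, n))
```

Under these assumptions, a fully methylated site drops to one eighth of its original strand-level methylation after three cell divisions.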
Interestingly, 6mA demethylation
in Drosophila is catalyzed by Drosophila’s TET homolog
(DMAD or dTet). DMAD depletion results in higher 6mA levels,
but unchanged 5mdC patterns, and is lethal at the pupal stage
or shortly after. DMAD and TET possess similar catalytically
active Cys-rich and DSBH domains; however, 6mA-demethylation
activity has not yet been observed for mammalian TET enzymes.
Although oxidation of 5hmdC to 5fdC and 5cadC creates
stable molecules due to the lack of a heteroatom in β-position, it
is discussed that both could be turned into unstable structures
Figure 4. Potential mechanism of chemically induced active demethylation with a potential immediate re-methylation.
upon further chemical manipulation. A chemically attractive
mechanism requires that 5fdC and 5cadC are attacked by a
helper nucleophile, preferentially a thiol group, at the C6 position
in a Michael-type reaction (Fig. 4). Hydration of 5fdC
and tautomerization of the reacted 5fdC and 5cadC allow us
to formulate a “β-imino-type” substructure that is prone to
deformylation and decarboxylation (red arrows in Fig. 4).
Indeed, we could show that reaction of 5fdC and 5cadC with a
thiol nucleophile leads to spontaneous deformylation and
decarboxylation, showing that the suggested chemistry is feasible.
There is currently no evidence that this type of chemistry
occurs in vivo, but we could show that stem cell lysates feature a
decarboxylating activity. Interesting is the observation that
deformylation and decarboxylation of 5fdC and 5cadC after
reaction with a thiol nucleophile lead to a reaction intermediate
(boxed in Figs. 2 and 4) that is the key intermediate already observed
during methylation of dC to 5mdC by the DNMTs. It is
therefore tempting to speculate that DNMT enzymes are
involved in the deformylation and decarboxylation, maybe followed
by immediate re-methylation. Although this reaction
sequence would follow chemical logic, it needs to be clarified in
the near future whether such reactions indeed occur in nature. It was,
however, shown that C5-DNA-methyltransferases are indeed
able to remove formaldehyde from 5hmdC, converting 5hmdC
directly to dC, therefore supporting these ideas.
In this context, it is interesting to note that 5hmC and 5fC
were also discovered in RNA. In human cells at tRNA position
C34, the oxidation of the corresponding RNA base 5mrC to
5frC is catalyzed by the Fe2+/α-KG dependent enzyme
ALKBH1, which is also responsible for m1A demethylation in
mammalian tRNA. Interestingly, 5hmrC was not detected
as an intermediate in the ALKBH1-dependent 5mrC oxidation.
In Drosophila, 5hmrC was discovered in polyadenylated
RNA and is associated with restoring mRNA translation efficiency
to its normal level after 5mrC has lowered it.
Surprisingly, the oxidation reaction is catalyzed by
Drosophila’s TET homolog dTet, which is also responsible for
6mA demethylation, but does not oxidize 5mdC.
There is evidence that TET enzymes are also responsible for
5mrC oxidation,
but at the moment it is not clear whether
TET-mediated 5hmrC or 5frC formation yields stable or rather
transient modifications.
In contrast to the chemical mechanism of active demethylation
discussed above, strong evidence exists that active demethylation
via formation of 5fdC and 5cadC is also linked to base
excision repair (BER), which also repairs mismatches caused by
deamination of 5hmdC to 5hmdU (Fig. 5). This mechanism
includes excision of 5fdC or 5cadC and subsequent activation of
BER. The dG/dT mismatch specific thymine DNA glycosylase
(TDG) recognizes dG/dT mismatches, but with an even higher
activity it excises 5fdC or 5cadC, but not 5mdC and 5hmdC, in
vitro. This reactivity was not observed for other DNA glycosylases.
Evidence that TDG excises 5fdC and 5cadC also in vivo is
given by the fact that 5fdC and 5cadC levels are
increased 5–10-fold in TDG-deficient ES cells compared with
the wildtype. However, TET/TDG-mediated demethylation is
very unlikely to be the only demethylation mechanism. It
occurs at defined promoter regions in the genome rather than in a
genome-wide manner. First, TDG activity causes abasic sites. If
this happened genome-wide, it might impair genomic stability,
which is crucial for correct development. Second, TDG knockout
does not become lethal before embryonic day 12.5 and TDG levels
are very low in the zygote, where the paternal genome is demethylated.
Most recently it was suggested that nature may not need to
oxidize 5mdC to 5fdC and 5cadC for demethylation and that a
third TET-independent pathway has to exist. In the zygote, the
most drastic demethylation occurs when 5mdC is globally
erased from the paternal part of the genome, while the maternal
part is shielded from demethylation. DNA-demethylation of
Figure 5. Active demethylation via base excision repair. Two possibilities are discussed: a direct removal of 5fdC and 5cadC in xdC:dG base pairs or removal of a deaminated 5hmdU in a 5hmdU:dG mismatch by a BER glycosylase.
the paternal pronuclei is replication- and TET-independent,
since 5hmdC levels increase after 5mdC levels have dropped
and global demethylation can be detected in Tet3-deficient
zygotes. It might be that deamination of genomic 5mdC to
dT and subsequent dT/dG mismatch repair are the mechanism
behind this observation.
However, this would also impair
genomic stability.
Implication of misguided methylation and demethylation
Whereas the distribution of 5mdC and 5hmdC is tightly regulated
to ensure the anticipated functionality of a cell and its
response to DNA damage, one hallmark of cancer cells is their
completely different methylation and hydroxymethylation
patterns. In many cancer types, the global methylation levels
are decreased, while promoter regions of important regulatory
and tumor suppressor genes are hypermethylated and therefore
silenced. One example is the hypermethylation of the promoter
region of HIC1, which is a transcriptional repressor of
SIRT1, a survival protein (proto-oncogene) that is consequently
upregulated. It was also shown that certain CGIs are coordinately
methylated in some tumor cells, which is called the “CpG
island methylator phenotype.” Since TET1 was initially discovered
in 2002 as a fusion protein to MLL1 H3K4 methyltransferase
in patients with acute lymphoblastic leukemia,
which is characterized by mutations in the MLL1 protein, it
was considered to be an oncogene. Only after the biological
function of the TET enzymes was elucidated in 2009 was it
proven, a few years later, that TET enzymes are actually tumor
suppressors that are silenced in various types of tumors.
Decreased levels and activity of TET1 and therefore reduced
5hmdC levels are associated with haematopoietic malignancies and with
colon, breast, prostate, liver and lung cancers, which show
greater levels of proliferation and, for breast cancer, increased
invasion rates as a direct consequence of TET1 downregulation.
TET2 mutations and consequently decreased 5hmdC
levels occur in various myeloid malignancies, including chronic
myelomonocytic leukemia, myeloproliferative neoplasms (MPN) and
acute myeloid leukemia (AML). Mutations in the genes of isocitrate
dehydrogenase (IDH) 1 and 2 lead to the production of
D-2-hydroxyglutarate (D-2HG), a metabolite that inhibits
TET2 activity, for example in AML and MPN, but also in malignant
gliomas, resulting in a dramatic decrease of 5hmdC levels.
Interestingly, IDH1/2 and TET2 mutations seem to be
mutually exclusive in these types of tumor, with IDH1/2 mutations
being the ones with higher oncogenic potential.
Recent results show that not only mutations in TET genes or
their inhibition by cancer metabolites are important for tumorigenesis,
but also that tumor hypoxia is responsible for reduced TET activity.
There is more and more evidence that epigenetics and metabolism
are closely connected, not only via D-2HG in cancer metabolism,
but also in normal cells. As an intermediate in the tricarboxylic
acid (TCA) cycle and part of nitrogen catabolism through
deamination of glutamate, α-KG is one of the key metabolites.
Since it is the co-substrate of TET enzymes and other dioxygenases
involved in epigenetic regulation, such as histone lysine demethylases,
it links epigenetics directly to metabolism. Levels of α-KG are
rate limiting for TET activity and higher α-KG levels result in
higher TET activity with direct impact on differentiation processes.
Depending on the cell type and status, α-KG can either
promote self-renewal or induce differentiation. In brown adipose
tissue (BAT) development, for example, TET3 mediates cell
commitment to BAT by demethylating the Prdm16 promoter.
AMP-activated protein kinase α1 (AMPKα1) influences α-KG levels
positively and therefore increases TET3 activity.
Glutamine metabolism increases α-KG levels, leading to self-renewal
in pluripotent mouse embryonic stem cells, while succinate
supply leads to differentiation. Additionally, succinate and
fumarate, two further intermediates of the TCA cycle, show an inhibitory
effect on TET enzymes in vitro.
In the future, it will be challenging not only to prove the
existence, but also to reveal the distinct biological functions of
the various DNA and RNA modifications that exist. The roles of
the modified bases in mRNA are currently under extensive
investigation and for DNA, especially the functions of 5hmdC
in regulatory and learning processes in the brain, but also during
development and in cancer cells, are of great interest.
FRT thanks the Boehringer Ingelheim Fonds for a PhD fellowship. We
thank the Deutsche Forschungsgemeinschaft for financial support through
the programs: SFBs 646, 749 and 1032, as well as the SPP1784. Further
support is acknowledged from the Excellence Cluster CiPSM (Center for
Integrated Protein Science).
