Nucleic Acids Research, 2023 1
https://doi.org/10.1093/nar/gkad312
Led-Seq: ligation-enhanced double-end
sequence-based structure analysis of RNA
Tim Kolberg1,†, Sarah von Löhneysen2,†, Iuliia Ozerova2, Karolin Wellner1,
Roland K. Hartmann3, Peter F. Stadler2,4,5,6,7 and Mario Mörl1,*
1Institute for Biochemistry, Leipzig University, Brüderstr. 34, 04103 Leipzig, Germany, 2Bioinformatics Group,
Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstr. 16–18,
04107 Leipzig, Germany, 3Institute for Pharmaceutical Chemistry, Philipps University Marburg, Marbacher Weg
6, 35037 Marburg, Germany, 4Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103
Leipzig, Germany, 5Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien,
Austria, 6Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia and 7Santa Fe Institute, 1399
Hyde Park Rd., Santa Fe, NM 87501, USA
Received January 24, 2023; Revised March 21, 2023; Editorial Decision April 11, 2023; Accepted April 13, 2023
ABSTRACT
Structural analysis of RNA is an important and versatile tool to investigate the function of this type of molecule in the cell as well as in vitro. Several robust and reliable procedures are available, relying on chemical modification inducing RT stops or nucleotide misincorporations during reverse transcription. Others are based on cleavage reactions and RT stop signals. However, these methods address only one side of the RT stop or misincorporation position. Here, we describe Led-Seq, a new approach based on lead-induced cleavage of unpaired RNA positions, where both resulting cleavage products are investigated. The RNA fragments carrying 2′,3′-cyclic phosphate or 5′-OH ends are selectively ligated to oligonucleotide adapters by specific RNA ligases. In a deep sequencing analysis, the cleavage sites are identified as ligation positions, avoiding possible false positive signals based on premature RT stops. With a benchmark set of transcripts in Escherichia coli, we show that Led-Seq is an improved and reliable approach based on metal ion-induced phosphodiester hydrolysis to investigate RNA structures in vivo.
INTRODUCTION
RNA is an extremely versatile molecule, despite its limited number of building blocks. It performs various tasks in many biological processes, ranging from encoding genetic information as mRNAs to delivering amino acids to the ribosome as tRNAs, catalyzing chemical reactions, regulating gene expression and shielding viral RNA from degradation by host RNases, to name just a few examples (1–3). The functional diversity of RNA is based on helical arrangements comprising stacking interactions and base pairing that form both local structural motifs and long-range interactions (4).
RNA structure formation is energetically dominated by canonical (Watson–Crick and GU wobble) pairs forming helical stem regions separated by unpaired stretches of nucleotides. Such secondary structures also appear as intermediates during the folding process before additional interactions stabilize the three-dimensional structure (5). RNA secondary structure, i.e. the arrangement of canonical base pairs, can be computed based on an energy model that considers sequence-specific stacking of adjacent base pairs and entropy-driven, destabilizing contributions of loops (6). Efficient dynamic programming algorithms have been devised that compute minimum free energy (MFE) structures (7) and partition functions (8). Several factors limit the accuracy of the thermodynamic model:
- deviations from this simple nearest-neighbor model;
- inaccurate energy parameters, in particular for large and multi-branched loops;
- tertiary interactions, including pseudoknots;
- the limited understanding of the effects of salt concentration and temperature.
Additional factors such as interactions with proteins, metal ions and other ligands, as well as the cellular localization of RNA molecules, likewise have an impact on the conformational states (9,10). This makes computational prediction difficult and highlights the need for additional data to infer reliable RNA structures. Chemical probing methods provide information on the base pairing status of individual nucleotides. This information alone is well known to be insufficient to uniquely determine a secondary structure. It can, however, be readily combined with thermodynamic prediction algorithms in the form of position-specific constraints or ‘bonus energies’ to guide the reconstruction of the biologically relevant structure (11–13). It is crucial, therefore, to develop and improve methods to determine RNA structure in vivo, to deepen our understanding of how it forms and how it can be predicted.
*To whom correspondence should be addressed. Tel: +49 341 9736911; Fax: +49 9736919; Email: mario.moerl@uni-leipzig.de
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Rapid progress has been made over the last two decades in determining RNA secondary and tertiary structure (reviewed in (14–16)). Biophysical methods such as X-ray crystallography, NMR and cryo-EM can provide detailed insights into RNA structure, but they are limited to in vitro applications. The structural investigation of the entirety of RNAs in vivo was pioneered with the development of SHAPE (17). Over the years, this method and its derivatives have used different reagents, such as NMIA, 1M7 or BzCN, to modify RNA depending on its structure (18). SHAPE and other previously established structure probing methods were first coupled with capillary electrophoresis and later with high-throughput sequencing. This allowed massively parallel investigation of RNA structures and enabled a global assessment of the ‘in vivo structurome’ (19–21). Most methods target nucleotides in single-stranded and conformationally flexible RNA regions, which are more accessible to chemical modification than base-paired regions. Chemical modification gives rise to either reverse transcription (RT) stops (-seq approaches) or mutation signals (-MaP approaches) in cDNA. These signals can be read out as a reactivity score and used to infer RNA structure (10). Structure-dependent cleavage is an attractive alternative to the covalent modification strategy. Enzymatic cleavage is to date limited to in vitro applications (22,23), whereas divalent metal ions are a promising alternative for in vivo investigations.
In particular, Pb2+ is well established in RNA structure probing both in vitro and in vivo (24). Briefly, this method relies on structure-dependent cleavage catalyzed by hydrated Pb2+ ions, which abstract the proton from the ribose 2′-OH group. In an in-line configuration, the 2′-O− group attacks the phosphorus in the sugar-phosphate backbone. This cleaves the RNA into fragments carrying 2′,3′-cyclic phosphate (2′,3′-cP) and 5′-OH ends. Flexible regions are more likely to adopt the in-line conformation, so the reaction occurs predominantly in single-stranded regions. Enhanced cleavage can also occur at metal ion binding sites (25,26).

Lead-seq (27) combines the established in vivo Pb2+ probing approach (28) with next-generation sequencing (NGS) to allow its use in a high-throughput setting. It serves as a promising starting point to broaden the spectrum of methods to investigate the RNA structurome in vivo. Incorporating a lead cleavage score as ‘bonus energy’ into the minimum free energy structure computation distinctly improved the predicted structures of several tRNAs. However, the protocol has two drawbacks:
- Like most NGS-coupled probing approaches, Lead-seq uses random fragmentation to generate a more homogeneously sized pool of RNAs. This introduces additional strand breaks, which may obscure the actual cleavage signal; a negative control can reduce this effect only to a certain degree.
- Before adapter ligation, the RNAs are 5′-phosphorylated and 3′-dephosphorylated with T4 polynucleotide kinase. This erases the information on the original 5′ and 3′ phosphorylation status of the ligated RNAs. As a result, all cellular RNAs are potentially included in the libraries, which negatively impacts the specificity of the method.
Lead-seq is thus a promising concept with ample potential for improvement.
Here, we propose a modified procedure that improves the specificity and validity of this method. Cleavage by divalent metal ions introduces strand breaks that generate fragments with distinct end groups, 2′,3′-cP and 5′-OH. To capture these fragments specifically, we exploited the unique features of two RNA ligases, which mark the cleavage positions by ligating sequencing adapters.
- The 5′ fragments, which carry a 2′,3′-cP, are captured by a variant of the Arabidopsis thaliana tRNA ligase (Ath RNL). The wild-type enzyme can ligate 2′,3′-cP and 5′-OH ends via its 2′,3′-cyclic phosphodiesterase and 5′ kinase activities (29,30). To ensure specific ligation of a pre-adenylated adapter to the substrate RNA, two mutations were introduced that inactivate the enzyme's ability to phosphorylate and adenylate 5′-OH groups (31). This Ath RNL AA double mutant recently proved to retain its specificity for 2′,3′-cP-carrying substrates in a ribozyme activity screen (32).
- The corresponding 3′ fragments, which carry 5′-OH ends, are captured in a similar manner using Escherichia coli (Eco) RtcB. This ligase and its homologs can also directly ligate 5′-OH and 2′,3′-cP ends. However, they use a unique mechanism that does not require phosphorylation of the RNA 5′ end (33). RtcB has also proved applicable to library preparation in previous studies (34,35).
Combining two separate ligation approaches and analyzing cleavage positions from both sides allows mutual validation of the identified cleavage sites. Moreover, using reads from both the 5′ and the 3′ side ensures that one library remains informative close to each transcript end, where the reads from the other library are too short for unambiguous mapping.
MATERIALS AND METHODS
Protein purification
Ath RNL K152A D726A was expressed with 6xHisTag from a pET28a vector, kindly gifted by Christina Weinberg, in E. coli BL21 (DE3) codon PLUS RIPL cells. Saccharomyces cerevisiae Tpt1 was expressed from a pET-24b vector in E. coli BL21 (DE3) cells. Both enzymes were expressed and purified as described (32). T4 RNL 2 truncated KQ and TS2126 RNL expression plasmids were gifted by Jan Medenbach, and the proteins were expressed and purified as described previously (36,37). Eco RtcB was expressed with 6xHisTag from a pET-53 vector (Addgene #51282) in E. coli BL21 (DE3) according to (38). Briefly, transformed cells were cultivated in TB medium with 100 µg/ml ampicillin at 37°C to an OD600 of 0.6. Cultures were chilled on ice for 30 min before induction with 0.1 mM IPTG and addition of ethanol to a final concentration of 2%. The induced cells were cultivated for 18 h at 16°C and harvested by centrifugation. Pellets were stored at −80°C until use. Cell pellets were resuspended in 25 ml ice-cold lysis buffer (50 mM Tris–HCl, pH 7.4, 250 mM NaCl, 10% (w/v) sucrose, 0.2 mg/ml lysozyme) and incubated for 1 h at 4°C.
Then Triton X-100 was added to a final concentration of 0.1%. Cells were disrupted by sonication at 70% intensity, 7 × 10 s with 20 s breaks. The lysate was centrifuged, and the supernatant was sterile-filtered and used for purification. All further purification steps were carried out on an ÄKTA pure protein purification system, starting with metal ion affinity chromatography on a HisTrap FF 1 ml column (Cytiva). To this end, the column was equilibrated with 5 column volumes (CV) of binding buffer (50 mM Tris–HCl, pH 7.4, 150 mM NaCl, 10% glycerol) containing 25 mM imidazole. After loading the sample, the column was washed with 10 CV binding buffer containing 25 mM imidazole, followed by a second wash step with 10 CV wash buffer (50 mM Tris–HCl, pH 7.4, 2 M KCl). Step-wise elution was carried out using 5 CV binding buffer containing 100, 300 and 500 mM imidazole. Fractions containing Eco RtcB ligase were identified via SDS-PAGE and Coomassie staining and pooled. They were then further purified by size exclusion chromatography on a HiLoad 16/60 Superdex 75 pg column with SEC buffer (10 mM Tris–HCl, pH 8.0, 350 mM NaCl, 1 mM DTT). Fractions containing the desired protein were identified via SDS-PAGE, pooled, concentrated with a Vivaspin 6 column (MWCO 10 kDa) and stored at −80°C until use.
T4 polynucleotide kinase R38A was expressed with 6xHisTag from a pET-53 vector in E. coli BL21 (DE3) according to (39). Transformed cells were grown in TB medium containing 100 µg/ml ampicillin at 37°C to an OD600 of 0.6. The cultures were chilled on ice for 30 min before induction with 0.3 mM IPTG and cultivation at 16°C for 18 h. Cells were harvested via centrifugation and stored at −80°C until use. The cells were resuspended in 25 ml ice-cold lysis buffer (50 mM Tris–HCl, pH 7.5, 1.2 M NaCl, 15 mM imidazole, 10% (v/v) glycerol, 0.2 mM phenylmethylsulphonyl fluoride, 1 mg/ml lysozyme) and incubated for 1 h at 4°C. Triton X-100 was added to a final concentration of 0.1%. Cells were disrupted by sonication at 70% intensity, 7 × 10 s with 20 s breaks. The lysate was centrifuged, and the supernatant was sterile-filtered and used for purification. Metal ion affinity chromatography was carried out on an ÄKTA pure protein purification system using a HisTrap FF 1 ml column (Cytiva). The binding buffer contained 50 mM Tris–HCl, pH 7.5, 200 mM NaCl and 10% glycerol. For step-wise elution, 5 CV of binding buffer additionally containing 125, 300 and 500 mM imidazole were applied. Fractions containing T4 PNK R38A were identified by SDS-PAGE, pooled and dialyzed twice against 1 l dialysis buffer (10 mM Tris–HCl, pH 7.4, 50 mM KCl, 1 mM DTT). Glycerol mix was added to reach storage conditions (10 mM Tris–HCl, pH 7.5, 50% (v/v) glycerol, 0.2 mM EDTA, 1 mM DTT, 50 mM KCl, 0.2 µM ATP). Aliquots of the protein solution were stored at −80°C. Plasmids encoding the enzymes used in this work are available either from Addgene (addgene.org) or from the authors upon request.
Oligonucleotide preparation
Oligonucleotides were ordered from biomers (Ulm, Germany) and Microsynth (Balgach, Switzerland). The 3′ adapter, RT-Primer and circularization RT-Primer were 5′-labeled using [γ-32P]-ATP and T4 PNK (NEB) and purified via denaturing PAGE and ethanol precipitation. Non-radioactively labeled adapters and RT-Primers were prepared in the same way with ATP and T4 PNK, using T4 PNK (3′ phosphatase minus) for the 5′ adapter. The 5′-phosphorylated and 3′-blocked 3′ adapter was pre-adenylated with TS2126 RNL. In one 20 µl reaction, 2.5 µM TS2126 RNL, 100 pmol adapter (10 pmol end-labeled with 32P) and 0.5 mM ATP were incubated in 1× adenylation buffer (50 mM MOPS, pH 7.5, 10 mM KCl, 5 mM MgCl2) supplemented with 2.5 mM MnCl2 and 1 mM DTT for 1 h at 60°C. The reaction was stopped by heat inactivation for 10 min at 80°C, followed by preparative denaturing PAGE and ethanol precipitation. For details on oligomers, see Supplementary Information.
Sample preparation for in vivo lead probing
In vivo lead probing was performed as described (27,28,40). Briefly, LB medium was inoculated with an E. coli DH5α overnight culture to a starting OD600 of 0.06, and cells were grown to an OD600 of 0.5 at 37°C. Lead(II) acetate solutions were freshly prepared by mixing 3 volumes of a lead(II) acetate stock solution with 1 volume of 4× LB medium and pre-warming the mixture to 37°C. After the cultures reached the desired density (OD600 0.5), 20 ml of each main culture were mixed with LB-lead(II) acetate solution to a final concentration of 75 mM and incubated for 7 min. In Pb2+ (−) samples, lead(II) acetate was replaced by autoclaved, deionized water (dH2O). The reaction was stopped by adding 10 ml ice-cold 500 mM EDTA. Cells were immediately pelleted. RNA was isolated using peqGOLD TriFast® (VWR) according to the manufacturer's instructions and precipitated from the aqueous phase by adding twice the volume of ice-cold isopropanol. The pellet was resuspended in dH2O and incubated with 2 U DNase I (NEB). RNA was recovered by phenol/chloroform extraction and ethanol precipitation (41). Recovered RNA was redissolved in dH2O and stored at −80°C.
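Mixing 3 volumes of stock with 1 volume of 4× LB keeps the medium at 1× LB and dilutes the lead stock to 75% of its concentration. The volume of this solution needed to reach a final 75 mM in a 20 ml culture then follows from a simple mass balance. The sketch below shows the arithmetic. The stock concentration is not stated here, so the 1 M stock is a hypothetical value, and the function name is illustrative.

```python
def lead_solution_volume_ml(culture_ml, final_mM, stock_mM, stock_fraction=0.75):
    """Volume (ml) of LB-lead(II) acetate solution to add to a culture.

    stock_fraction=0.75 reflects mixing 3 volumes of stock with 1 volume of
    4x LB, which keeps the medium at 1x LB. Solves the mass balance
    solution_mM * V = final_mM * (culture_ml + V) for V.
    """
    solution_mM = stock_mM * stock_fraction
    if solution_mM <= final_mM:
        raise ValueError("LB-lead solution is too dilute to reach the target")
    return final_mM * culture_ml / (solution_mM - final_mM)


# Hypothetical 1 M (1000 mM) lead(II) acetate stock; not stated in the protocol.
volume = lead_solution_volume_ml(culture_ml=20, final_mM=75, stock_mM=1000)
print(f"add {volume:.2f} ml LB-lead solution to 20 ml culture")
```

With a 1 M stock, about 2.2 ml of the LB-lead solution would be required. A less concentrated stock raises the required volume and eventually makes the target unreachable, which the function reports as an error.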
Specific adapter ligation
Total RNA was used for two mapping strategies:
- 2′,3′-cP mapping, via specific adapter ligation with Ath RNL K152A D726A;
- 5′-OH mapping, via 3′ dephosphorylation with T4 PNK R38A and subsequent 5′-OH-specific adapter ligation with Eco RtcB.
2′,3′-cP capture. 35 ng total RNA were pre-incubated with 20 pmol pre-adenylated 3′ adapter for 5 min at 65°C and immediately put on ice for at least 1 min. Subsequently, the mixture was incubated in 1× reaction buffer (20 mM Tris–HCl, pH 7.5, 5 mM MgCl2, 2.5 mM spermidine, 100 µM DTT) and 20% (v/v) PEG8000 with 12 pmol Ath RNL K152A D726A for 2 h at 25°C in a volume of 16 µl. During ligation, the 2′,3′-cP is converted into a 2′-P group, which can interfere with reverse transcription. To remove this obstacle, 10 pmol S. cerevisiae tRNA 2′-phosphotransferase Tpt1 and 1 mM NAD were added to the ligation mixture. The volume was adjusted to 20 µl with dH2O and reaction buffer, and the samples were incubated for 30 min at 30°C. The ligated and 2′-dephosphorylated RNA was recovered using the Monarch® RNA Cleanup Kit (NEB) and used as template for reverse transcription.
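The ligation is specified in picomoles and microliters. Converting these amounts to concentrations makes the stoichiometry of the reaction explicit. The helper below is a hypothetical convenience, not part of the published workflow. It relies only on the identity 1 pmol/µl = 1 µM and uses the amounts given above.

```python
def micromolar(pmol, volume_ul):
    """Concentration in µM: 1 pmol per µl = 1e-12 mol / 1e-6 l = 1 µM."""
    return pmol / volume_ul


# 2',3'-cP capture, amounts taken from the protocol above
adapter_uM = micromolar(20, 16)  # pre-adenylated 3' adapter in the 16 µl ligation
ligase_uM = micromolar(12, 16)   # Ath RNL K152A D726A in the same reaction
tpt1_uM = micromolar(10, 20)     # Tpt1 after adjusting the volume to 20 µl

print(f"adapter {adapter_uM} µM, ligase {ligase_uM} µM, Tpt1 {tpt1_uM} µM")
```

The adapter (1.25 µM) is therefore present in a roughly 1.7-fold molar excess over the ligase (0.75 µM). The same helper applies to the adenylation reaction, where 100 pmol adapter in 20 µl corresponds to 5 µM.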
5′-OH capture. 70 ng total RNA were incubated in 1× PNK buffer (NEB) with 1 mM ATP and 10 pmol T4 PNK R38A for 30 min at 37°C in 15 µl total volume. The 3′-dephosphorylated RNA was recovered with the Monarch® RNA Cleanup Kit (NEB), using the protocol that binds RNA down to 15 nt in length. The RNA was eluted in 6 µl dH2O. Then 3 µl of the eluate were pre-incubated with 20 pmol 5′ adapter for 5 min at 65°C and put on ice for 1 min. Subsequently, the samples were incubated in 50 mM Tris–HCl (pH 7.4), 2 mM MnCl2, 100 µM GTP and 20% (v/v) PEG8000 with 50 pmol Eco RtcB for 1 h at 37°C in 20 µl total volume. The ligated RNA was recovered again with the Monarch® RNA Cleanup Kit and used for 3′ adapter ligation. To this end, the ligated RNA was mixed with 20 pmol pre-adenylated 3′ adapter, pre-incubated for 5 min at 65°C and put on ice for 1 min. The mixture was incubated in 1× T4 ligase buffer (NEB) and 20% (v/v) PEG8000 with 20 pmol T4 RNL 2 truncated KQ for 2 h at 25°C. The ligated RNA was extracted using the Monarch® RNA Cleanup Kit and used as template for reverse transcription.
Reverse transcription
The ligated and recovered RNA from both strategies was reverse transcribed with Superscript IV reverse transcriptase (Thermo). Reactions of 20 µl were set up according to the manufacturer, using RT-Primer for 5′-OH samples. For 2′,3′-cP samples, the circularization RT-Primer and a biotin-dNTP mix were used. The mix had final concentrations of 500 µM each of dATP, dGTP and dTTP, 350 µM dCTP and 150 µM biotin-16-dCTP (Jena Bioscience). For both strategies, 20 pmol of the respective primer were used, containing trace amounts of the same 5′-labeled primer for product detection. After incubation at 55°C for 10 min, the template RNA in the reaction mixes was degraded by adding NaOH to a final concentration of 250 mM and incubating for 3 min at 95°C. The reaction mix was neutralized with 250 mM HCl. The cDNA was then extracted and size-selected via preparative denaturing PAGE. A 32P-labeled size standard was used to identify cDNA longer than the RT-Primer + UMI sequence. The cDNA was eluted from the gel and precipitated with ethanol and 10 µg/ml LPA (Thermo) as carrier. Samples of the 2′,3′-cP strategy were circularized (see below). The 5′-OH cDNA was redissolved in 20 µl dH2O and directly used for amplification and introduction of flow cell linkers and indices.
Circularization of cDNA and streptavidin bead cleanup
The cDNA for 2′,3′-cP library construction, redissolved in 20 µl dH2O, was incubated for 2 h at 60°C in 1× adenylation buffer supplemented with 2.5 mM MnCl2, 10 mM DTT and 50 µM ATP using 2.5 µM TS2126 RNL (42). The mixture was heat-inactivated at 80°C for 10 min and adjusted to 100 µl with 1× wash/binding buffer (20 mM Tris–HCl, pH 7.5, 1 mM EDTA, 0.5 M NaCl). It was then directly purified with Hydrophilic Streptavidin Magnetic Beads (NEB). Per reaction, 16 µl beads were prepared. The beads were washed three times by resuspending them in 160 µl wash/binding buffer each time and removing the supernatant on a magnetic rack. The washed beads were resuspended in the circularization reaction mixture and incubated for 20 min at room temperature, with careful mixing every 5 min. After incubation, samples were spun down, supernatants were discarded and the beads were resuspended by pipetting. The beads were then washed three times with 500 µl wash/binding buffer. Each wash included the following steps:
1. resuspension of the beads by pipetting;
2. brief centrifugation in a desktop centrifuge;
3. another careful resuspension of the bead pellet by pipetting;
4. transfer of the sample to a new tube (this step turned out to be crucial to avoid carry-over of non-biotinylated cDNA);
5. removal of the supernatant on a magnetic rack.
The beads were finally resuspended in 20 µl dH2O each and directly used for amplification and introduction of flow cell linkers and indices.
Introduction of flow cell linkers and indices
Flow cell binding sequences and indices were introduced via PCR. One reaction mix contained, in a volume of 25 µl:
- 2.5 µl of the cDNA template solution;
- 1× Phusion HF buffer (Thermo);
- 200 µM dNTPs;
- 0.5 µM each of Illumina PCR Primer and Index Primer;
- 0.02 U/µl Phusion® high-fidelity polymerase (Thermo).
For samples containing circularized cDNA templates, 5 µM of a PNA clamp were added. The clamp reduces the accumulation of a side product that arises when residual, prolonged circularization RT-Primer serves as template in this reaction. Cycling conditions were as follows: initial denaturation at 98°C for 30 s, followed by 15 (5′-OH strategy) or 18 (2′,3′-cP strategy) cycles of 98°C for 10 s, 80°C for 20 s, 60°C for 20 s and 72°C for 20 s. The additional annealing step at 80°C was introduced to ensure optimal PNA binding in the 2′,3′-cP libraries (43). The amplified libraries were purified by preparative native PAGE, excised and eluted from the gel, and precipitated with ethanol and 10 µg/ml LPA as carrier. The final libraries were sequenced on an Illumina NovaSeq 6000 (Azenta). The experimental protocol is illustrated in Figure 1.
Read preprocessing and mapping
Sequencing quality of paired reads was evaluated using MultiQC (44). The following preprocessing steps were performed with cutadapt v2.10 (45):
- 3′ adapter sequences were removed from read1 and read2;
- read pairs were filtered for a correct UMI sequence in read2;
- a minimum read length of 12 nt was required to enable unambiguous mapping;
- an additional fixed sequence between the UMI and the RNA insert was removed from the 5′ end of read2 and, if necessary, from the 3′ end of read1.
Preprocessed reads were mapped to the E. coli genome (NZ_CP026085.1) with segemehl v0.3.4 (46,47). Libraries were deduplicated with umi_tools v1.0.1 (48) and filtered for primary hits of properly mapped read pairs. After an initial analysis of sample composition, selected (multi-copy) genes were masked by substitution with ‘N’, and one copy of each was appended to the end of the genome to facilitate unique mapping of reads. Because we wanted to focus our analysis on highly represented transcripts, this procedure included all tRNA and rRNA genes. For details on the generated genome and the corresponding transcriptome annotation file, see Supplementary Information. The mapping and deduplication steps were then repeated on the modified genome.
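The masking step can be sketched as follows. Each copy of a multi-copy gene is replaced by ‘N’ of the same length, so the coordinates of all other annotated genes remain valid. One representative copy is then appended to the genome, giving reads from that gene family a single mapping target. This is a simplified illustration, not the authors' code. It assumes identical copies on the same strand. The ‘N’ spacer between genome and appended copy is a hypothetical detail, since the text only states that one copy was attached to the end of the genome.

```python
def mask_and_append(genome, copies, spacer="N" * 50):
    """Mask all copies of one multi-copy gene with 'N' and append one copy.

    genome: genome sequence as a string
    copies: 0-based, half-open (start, end) intervals of identical copies on
            the same strand (a simplifying assumption)
    spacer: hypothetical N-linker placed before the appended copy
    Returns the new genome and the 0-based start of the appended copy,
    which would be needed for the new annotation entry.
    """
    start0, end0 = copies[0]
    representative = genome[start0:end0]
    seq = list(genome)
    for start, end in copies:
        seq[start:end] = "N" * (end - start)  # same length: coordinates stay valid
    new_start = len(genome) + len(spacer)
    return "".join(seq) + spacer + representative, new_start


masked, pos = mask_and_append("AAACCCGGGCCCTTT", [(3, 6), (9, 12)], spacer="NN")
print(masked, pos)
```

For several gene families, the function would be applied once per family. Each call appends one further copy and reports its start coordinate for the updated annotation.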
Figure 1. Preparation of Led-Seq Illumina libraries.
Left: Mapping of 2′,3′-cP-carrying cleavage fragments via specific ligation of a pre-adenylated RNA 3′ adapter using Ath RNL AA. The 3′ adapter carries an 8 nt unique molecular identifier (UMI) to account for PCR bias. After ligation, RNA is reverse transcribed using the circularization RT-Primer and biotinylated dCTP (B-dCTP). The resulting cDNA is circularized by TS2126 RNL and extracted with hydrophilic streptavidin magnetic beads (brown circular symbol denoted ‘A’), which removes residual circularization RT-Primer. Circularized, bead-bound cDNA is used as a PCR template to introduce flow cell linkers (P5, P7) and 6 nt indices. A PNA PCR clamp is used to minimize the amplification of circularized products without cDNA insert.
Right: Mapping of 5′-OH-carrying cleavage fragments via specific ligation of a biphosphorylated RNA 5′ adapter using Eco RtcB. The RNA was first 3′-dephosphorylated with T4 PNK R38A to prevent ligation of RNA fragments with 2′,3′-cP to 5′-OH ends. The 3′ adapter is subsequently ligated to the RNA using T4 RNL II KQ truncated, followed by reverse transcription and amplification via PCR.
Intensity of the probing signal
We filtered for reads that mapped uniquely in proper pairs. We then considered only hits that mapped to non-overlapping annotated regions (bedtools v2.27.1 (49)) to ensure an unambiguous signal. In the 2′,3′-cP libraries, the last nucleotide of a ligated RNA fragment represents the position immediately upstream of the cleavage site in the RNA backbone. Correspondingly, the first nucleotide of a fragment in 5′-OH libraries represents the position downstream of the cleavage site. The raw probing signal was obtained for each nucleotide of the transcriptome by counting the number of read starts (or start − 1, respectively) at that position. Normalization of the raw signal was performed separately for each transcript. Following Low and Weeks (50), we divided the raw read count by the average count of the 90th to 98th percentile of the signal. The largest 2% of the signals are thereby treated as outliers. We denote the normalized signal for position i by S_i. Its range is limited to the interval [0, 7] because very high outlier values were capped at 7. Where applicable, mean values of replicates were used for all downstream analyses. The workflow for computing the normalized probing signal from raw sequencing reads is implemented as a Snakemake v3.13.3 pipeline (51), which is available at github.com/xamiiii/Led-Seq.
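The counting and normalization steps can be sketched in a few lines of NumPy. This is an illustration, not the published Snakemake pipeline. Two points are assumptions based on the description above and on the 11/12-nt boundaries given further below:
- both libraries are indexed at the nucleotide 5′ of the cut, i.e. the end of a 2′,3′-cP fragment and the start − 1 of a 5′-OH fragment;
- the 90th–98th percentile window is taken by rank, skipping the top 2% of values and averaging the next 8%.

```python
import numpy as np


def raw_signal(length, cp_fragment_ends, oh_fragment_starts):
    """Per-nucleotide cleavage counts for one transcript (1-based coordinates in).

    Assumption: both libraries are indexed at the nucleotide 5' of the cut.
    A 2',3'-cP fragment ending at k and a 5'-OH fragment starting at k + 1
    both report a cleavage between k and k + 1, i.e. position k.
    """
    cp = np.zeros(length, dtype=int)
    oh = np.zeros(length, dtype=int)
    for k in cp_fragment_ends:
        cp[k - 1] += 1
    for s in oh_fragment_starts:
        if s > 1:  # a fragment starting at nucleotide 1 marks no cleavage
            oh[s - 2] += 1
    return cp, oh


def normalize(raw, cap=7.0):
    """2-8% normalization: skip the top 2% as outliers, average the next 8%."""
    raw = np.asarray(raw, dtype=float)
    ranked = np.sort(raw)[::-1]              # descending order
    n_out = (2 * raw.size) // 100            # largest 2% = outliers
    n_win = max(1, (8 * raw.size) // 100)    # the 90th-98th percentile window
    factor = ranked[n_out:n_out + n_win].mean()
    if factor == 0:
        return np.zeros_like(raw)
    return np.minimum(raw / factor, cap)     # remaining outliers capped at 7


cp, oh = raw_signal(10, cp_fragment_ends=[4, 4, 7], oh_fragment_starts=[5, 8, 1])
S_cp = normalize(cp)
```

Because normalization divides by a robust high-end average rather than the maximum, a single very strong cleavage site cannot compress all other signals toward zero. Such a site is simply capped at 7.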
Estimating the probability to be unpaired
We use a Bayesian approach to estimate the probability q_i that position i is unpaired, based on the normalized signal S_i. To this end, we employ a collection of reference structures comprising 32 RNAs of 74–682 nt in length. This set includes non-coding RNAs that belong to Rfam families (52) and that are sufficiently represented by our data (see coverage criteria below). The small-subunit rRNA (16S) and the large-subunit rRNA (23S) were divided into smaller domains as described (53,54). Secondary structures for the resulting 40 sequences were taken from the RNAcentral database (see Supplementary Table S1 and Supplementary Information for full details).
Figure 2. Conversion of normalized probing signals S to probabilities of being unpaired. Top: One-dimensional function p(S), estimated separately for 2′,3′-cP (left) and 5′-OH libraries (right). Different fits were used for cleavage at CA dinucleotide positions (orange). Grey lines indicate the intervals (‘bins’) within which the signals were pooled. Below: Two-dimensional fit p(S_cP, S_OH) combining the information of both libraries. Again, CA dinucleotides (right) were treated separately.
Let $n_u(S)$ and $n(S)$ denote the number of unpaired positions and the total number of positions, respectively, in the calibration set whose normalized signal falls within a bin centered at $S$. A bin is an interval of signal values. The probability that a position with a signal in this bin is unpaired can then be estimated by
$$p(S) := \mathbb{P}[\text{unpaired} \mid S] \approx \frac{n_u(S)}{n(S)} \qquad (1)$$
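Equation (1) is a per-bin frequency count over the calibration set. The sketch below is a minimal NumPy version of that count. The function name and the treatment of values outside the outer edges (assigned to the end bins) are illustrative assumptions. Empty bins yield NaN rather than a spurious probability.

```python
import numpy as np


def binned_unpaired_probability(signals, unpaired, edges):
    """Estimate p(S) = n_u(S) / n(S) per bin, as in Equation (1).

    signals:  normalized signals S_i of calibration positions
    unpaired: True where the reference structure leaves position i unpaired
    edges:    bin boundaries; values beyond the outer edges fall into the end bins
    Returns bin centres, estimated probabilities (NaN for empty bins) and counts.
    """
    signals = np.asarray(signals, dtype=float)
    unpaired = np.asarray(unpaired, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = edges.size - 1
    idx = np.digitize(signals, edges[1:-1])            # bin index 0 .. n_bins-1
    n = np.bincount(idx, minlength=n_bins)              # n(S)
    n_u = np.bincount(idx, weights=unpaired, minlength=n_bins)  # n_u(S)
    centers = (edges[:-1] + edges[1:]) / 2
    with np.errstate(invalid="ignore", divide="ignore"):
        p = n_u / n                                     # 0/0 -> NaN for empty bins
    return centers, p, n


centers, p, n = binned_unpaired_probability(
    [0.1, 0.2, 1.5, 1.6, 1.7, 3.0],
    [False, False, True, True, False, True],
    edges=[0, 1, 2, 7],
)
```

The two-dimensional estimate of Equation (2) follows the same pattern on a grid of bins, for example with `numpy.histogram2d` applied once with and once without the unpaired indicator as weights.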
In a more elaborate model, we combine the two signals $S_{cP}$ and $S_{OH}$ of the 2′,3′-cP and 5′-OH libraries. We then consider
$$p(S_{cP}, S_{OH}) := \mathbb{P}[\text{unpaired} \mid S_{cP}, S_{OH}] \approx \frac{n_u(S_{cP}, S_{OH})}{n(S_{cP}, S_{OH})} \qquad (2)$$
where $n_u(S_{cP}, S_{OH})$ and $n(S_{cP}, S_{OH})$ are the counts of unpaired and of all positions, respectively, in the calibration set whose 2′,3′-cP and 5′-OH signals lie within intervals centered at $S_{cP}$ and $S_{OH}$, respectively. To reduce the effects of inaccuracies in the reference structures and of other noise in the data, we approximate $p(S)$ and $p(S_{cP}, S_{OH})$ by fitting sigmoidal functions of the form
$$p(S) = \left(1 + e^{-aS + b}\right)^{-1} + c, \qquad p(S_{cP}, S_{OH}) = \left(1 + e^{-a_1 S_{cP} - a_2 S_{OH} + b}\right)^{-1} + c \qquad (3)$$
to the binned data (see Figure 2). To estimate the parameters $a, b, c$ and $a_1, a_2, b, c$, respectively, we used the function curve_fit of the Python (v3.6.12) library SciPy (v1.5.2). Cleavage sites in CA dinucleotides were found to behave differently from those in other dinucleotides, so separate parameters were fitted for this special case. As a control, we randomized the sequence positions and, as expected, obtained a flat response curve (see Supplementary Figure S1).
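Fitting the two sigmoids of Equation (3) with `scipy.optimize.curve_fit` might look as follows. This is a sketch, not the authors' script. The starting values `p0` are illustrative guesses, and empty bins (NaN) are dropped before fitting. The two-dimensional model receives both signals stacked as a 2 × N array, which `curve_fit` passes through unchanged to the model function.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid_1d(S, a, b, c):
    """p(S) = (1 + exp(-a*S + b))^-1 + c, Equation (3), left."""
    return 1.0 / (1.0 + np.exp(-a * S + b)) + c


def sigmoid_2d(X, a1, a2, b, c):
    """p(S_cP, S_OH) = (1 + exp(-a1*S_cP - a2*S_OH + b))^-1 + c, Equation (3), right."""
    s_cp, s_oh = X
    return 1.0 / (1.0 + np.exp(-a1 * s_cp - a2 * s_oh + b)) + c


def fit_1d(centers, p, p0=(1.0, 1.0, 0.0)):
    ok = np.isfinite(p)                        # skip empty bins (NaN)
    params, _ = curve_fit(sigmoid_1d, centers[ok], p[ok], p0=p0)
    return params                              # a, b, c


def fit_2d(cp_centers, oh_centers, p, p0=(1.0, 1.0, 1.0, 0.0)):
    ok = np.isfinite(p)
    X = np.vstack([cp_centers[ok], oh_centers[ok]])
    params, _ = curve_fit(sigmoid_2d, X, p[ok], p0=p0)
    return params                              # a1, a2, b, c


# Synthetic, noise-free check: recover known parameters from binned values.
S = np.linspace(0.25, 6.75, 14)
a, b, c = fit_1d(S, sigmoid_1d(S, 1.5, 2.0, 0.05))
```

The CA-specific parameter sets would be obtained by calling the same functions on the subset of calibration positions at CA dinucleotides.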
The 2′,3′-cP library produces sequence fragments that are too short for reliable mapping close to the 5′ end of each transcript. Consequently, no signal $S_{cP}$ exists for the first 11 nucleotides of each transcript. Analogously, the 5′-OH library is uninformative close to the 3′ end (last 12 nt). The probability model is therefore chosen by position:
- positions $i$ near the 5′ end use $p(S^{OH}_i)$;
- positions near the 3′ end use $p(S^{cP}_i)$;
- interior positions use $p(S^{cP}_i, S^{OH}_i)$.
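The position-dependent choice of model is easy to make explicit. In the sketch below, the fitted models are passed in as callables, and the function and parameter names are hypothetical. The 11/12-nt boundaries are those stated in the text. The sketch assumes the transcript is longer than 23 nt, which holds for the calibration set (minimum 74 nt).

```python
def unpaired_probabilities(s_cp, s_oh, p_cp, p_oh, p_both, n5=11, n3=12):
    """q_i for one transcript, choosing the model by position.

    s_cp, s_oh: normalized signals per position (0-based, equal length)
    p_cp, p_oh: fitted one-dimensional models p(S) for each library
    p_both:     fitted two-dimensional model p(S_cP, S_OH)
    n5, n3:     positions without 2',3'-cP signal at the 5' end and without
                5'-OH signal at the 3' end (11 and 12 nt in the text)
    """
    n = len(s_cp)
    q = []
    for i in range(n):
        if i < n5:                   # no S_cP near the 5' end
            q.append(p_oh(s_oh[i]))
        elif i >= n - n3:            # no S_OH near the 3' end
            q.append(p_cp(s_cp[i]))
        else:
            q.append(p_both(s_cp[i], s_oh[i]))
    return q


q = unpaired_probabilities([0.0] * 30, [0.0] * 30,
                           p_cp=lambda s: 0.2, p_oh=lambda s: 0.1,
                           p_both=lambda a, b: 0.3)
```

A CA-specific model would be chosen in the same loop, by checking whether positions i and i + 1 form a CA dinucleotide.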
Secondary structure prediction with probing data
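The conversion of the probability of being unpaired, i.e. $q_i = p(S_i)$ or $q_i = p(S^{cP}_i, S^{OH}_i)$, into a pseudo-energy essentially follows the scheme proposed by Zarringhalam et al. (55). However, we associate pseudo-energies only with the unpaired nucleotides. This is sufficient as long as one is not interested in the absolute value of the partition function (see (12)). Pseudo-energies may be associated either only with paired or only with unpaired nucleotides. Because every nucleotide is either paired or unpaired, a bonus $\varepsilon_i$ on unpaired positions differs from a penalty $-\varepsilon_i$ on paired positions only by the structure-independent constant $\sum_i \varepsilon_i$. This constant rescales the partition function but leaves all structure probabilities unchanged.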
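The invariance argument can be checked numerically on a toy ensemble. The structures, free energies and pseudo-energies below are made-up values, chosen only to illustrate the algebra. RT is an approximate value at 37 °C. Adding $\varepsilon_i$ to unpaired positions and adding $-\varepsilon_i$ to paired positions yields identical Boltzmann probabilities. Only the partition functions differ, by the factor $e^{-\sum_i \varepsilon_i / RT}$.

```python
import math

RT = 0.6163  # kcal/mol at 37 °C (approximate)


def boltzmann(energies):
    """Return Boltzmann probabilities and the partition function Z."""
    weights = [math.exp(-e / RT) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights], Z


# Hypothetical toy ensemble for an 8-nt RNA: (free energy, unpaired positions)
ensemble = [
    (-2.0, {2, 3, 4, 5}),         # pairs (0,7), (1,6)
    (-1.0, {0, 1, 3, 4, 5, 6}),   # pair (2,7)
    (0.0, set(range(8))),         # open chain
]
eps = [0.3, -0.2, -0.5, -0.4, -0.6, 0.1, 0.2, 0.0]  # hypothetical pseudo-energies

# Variant A: eps_i added for every unpaired position i
E_A = [E + sum(eps[i] for i in U) for E, U in ensemble]
# Variant B: -eps_i added for every paired position i
E_B = [E - sum(eps[i] for i in range(8) if i not in U) for E, U in ensemble]

p_A, Z_A = boltzmann(E_A)
p_B, Z_B = boltzmann(E_B)
# E_A - E_B = sum(eps) for every structure, so only Z changes.
```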
To incorporate the probing data into secondary structure prediction algorithms, we converted $q_i$ into pseudo free energy terms that can be interpreted as log-likelihoods for nucleotides being unpaired (56). In addition, we compensate for the fact that the a priori probability $p_0$ that a base is unpaired differs from 1/2. This yields

$$\Delta G_{\mathrm{Pb}^{2+},i} = -RT \cdot c \cdot \left[\ln\frac{q_i}{1-q_i} - \ln\frac{p_0}{1-p_0}\right] \qquad (4)$$

where $R$ is the gas constant, $T$ is the absolute temperature, and $c$ is a constant that allows one to tune the relative importance of, or trust in, the probing data. Throughout this contribution, we used $c = 1.2$. The value $p_0 = 0.42$ was determined from the calibration set. The position-dependent pseudo-energies are used as soft constraints (12) in the program RNAfold of the ViennaRNA package v2.4.15 (57) as described (11). A detailed structural analysis requires sufficient cleavage signal across the transcript. We require rather stringent conditions: (i) at least 75% of a transcript must be represented by reads, and (ii) there must be at least 2.5 read starts per position on average.
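The two coverage criteria can be expressed as a simple filter. In this sketch (function and argument names are hypothetical), we assume that a position counts as "represented by reads" when at least one read overlaps it, and that per-position read starts, i.e. mapped cleavage events, have already been tallied; the exact operational definitions used for Led-Seq may differ.

```python
def meets_coverage_criteria(coverage, read_starts,
                            min_covered_fraction=0.75,
                            min_mean_read_starts=2.5):
    """Check both coverage criteria for one transcript (hypothetical helper).

    coverage[i]    -- number of reads overlapping position i (assumed input)
    read_starts[i] -- number of mapped read ends (cleavage events) at position i
    """
    n = len(coverage)
    if n == 0 or len(read_starts) != n:
        raise ValueError("coverage and read_starts must be non-empty and equally long")
    covered_fraction = sum(1 for c in coverage if c > 0) / n   # criterion (i)
    mean_read_starts = sum(read_starts) / n                     # criterion (ii)
    return (covered_fraction >= min_covered_fraction
            and mean_read_starts >= min_mean_read_starts)
```

Both thresholds are exposed as parameters so that a less stringent setting can be tried without touching the logic.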
To assess prediction quality, we calculated the positive predictive value (PPV), sensitivity (SEN) and the Matthews correlation coefficient (MCC) (for definitions, see Supplementary Information). Plots were generated using the Python package matplotlib v3.3.1. Secondary structures with mapped pseudo-energies were visualized using the forna web server (58).
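For orientation, the three metrics can be computed from dot-bracket strings as below. This is a sketch under stated assumptions: base pairs must match exactly, pseudoknot brackets are not handled, and MCC is approximated by the geometric mean of PPV and SEN, a convention common in RNA structure benchmarking. The definitions actually used are given in the Supplementary Information and may differ; function names are hypothetical.

```python
import math


def base_pairs(dot_bracket):
    """Return the set of 1-based (i, j) pairs of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))  # raises IndexError if brackets are unbalanced
    return pairs


def prediction_quality(predicted, reference):
    """PPV, SEN and an MCC approximation for a predicted vs. a reference structure."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)                    # correctly predicted base pairs
    ppv = tp / len(pred) if pred else 0.0   # TP / (TP + FP)
    sen = tp / len(ref) if ref else 0.0     # TP / (TP + FN)
    mcc = math.sqrt(ppv * sen)              # geometric-mean approximation of MCC
    return ppv, sen, mcc
```

For example, predicting `(....)` against the reference `((..))` recovers one of two reference pairs with no false positives, giving PPV = 1.0, SEN = 0.5 and MCC ≈ 0.707.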
RESULTS
Lead probing protocol
Metal ion cleavage is an established approach to probe RNA structure (28,59). Recently, Twittenhoff et al. showed that in vivo probing with lead(II) ions coupled with next-generation sequencing is suitable to investigate RNA structures on a transcriptome-wide level (27). Here, we present a novel lead-based approach where NGS adapters are selectively ligated to the resulting fragments of lead-induced cleavage (see Figure 1). Accordingly, both cleavage ends (2′,3′-cP and 5′-OH) are converted into sequencing libraries.

Figure 3. Proportion of reads mapping to indicated RNAs in 2′,3′-cP (left) and 5′-OH libraries (right) for samples treated with (+) or without (−) Pb2+. nc = non-coding, n/a = not annotated.
During the development of this method, we noticed two major side products in the sequencing data of the 2′,3′-cP libraries, both identified as results of the circularization reaction with TS2126 RNL. These originated from leftover 3′ adapter and circularization RT-Primer in the reverse transcription reaction, which subsequently gave rise to three RT-Primer-based DNA species. In addition to circularization RT-Primer+UMI+cDNA, circularization RT-Primer+UMI and RT-Primer alone were also present, despite a size selection via preparative denaturing PAGE after reverse transcription. These side products caused a considerable loss of usable sequences.
To reduce these side products, we combined two strategies. First, cDNA was internally labeled with biotin-dCTP and extracted using magnetic streptavidin beads after circularization to eliminate unused circularization RT-Primer. Second, as PNA oligonucleotides exhibit very tight binding to complementary DNA sequences with high thermal duplex stability (43,60), we designed a PNA clamp binding to the constant 3′-region of the UMI-containing oligonucleotide and to the 5′ adapter-derived cDNA sequence. These regions are only adjacent if the circularization RT-Primer elongated by the sequence of the 3′ adapter was circularized without insert. Hence, the PNA clamp blocked the amplification of this side product. Both strategies led to a drastically reduced formation of by-products (see Supplementary Figures S2 and S3).
We generated two independent biological replicates of
both Pb2+(+) libraries. In addition, negative controls in the
absence of lead (Pb2+(–)) were prepared and sequenced.
One of these controls is shown here as a representative ex-
ample. After read preprocessing, mapping and removal of
duplicated reads, we obtained 4.8–16.8 million uniquely
mapping reads per library (see Supplementary Table S2).
Figure 3 summarizes the composition of mapped read ends in terms of the annotated biotypes. As expected, the majority of reads originate from rRNA and tRNAs. The distributions are similar for lead-treated and negative control libraries and also differ only moderately between the 2′,3′-cP and 5′-OH libraries.

Figure 4. Distribution of normalized probing signal S at unpaired (orange) and paired (blue) positions of the calibration set. Signal is higher at unpaired sites in the 2′,3′-cP (left) and 5′-OH libraries (right).
Between 221 and 465 RNAs in the Pb2+(+) libraries (and 104 to 171 RNAs in the Pb2+(–) libraries) are covered sufficiently by reads to meet our criteria for structural analysis (described in the Methods section). Of these, we set up a benchmark set of 32 transcripts that we analyzed in more detail. We found that probing signals are highly reproducible, with a Pearson correlation coefficient of 0.83 for the two 2′,3′-cP libraries and 0.80 for the 5′-OH libraries (calculated over all sufficiently covered transcripts).
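The replicate correlation can be illustrated with a plain Pearson coefficient over pooled positions. In this minimal sketch the helper names are hypothetical, and we assume that the per-position normalized signals of all sufficiently covered transcripts are concatenated before correlating, skipping positions that lack a value in either replicate.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1, rep2, transcripts):
    """Pool per-position signals of the given transcripts, then correlate them.

    rep1, rep2  -- dicts mapping transcript name -> list of signals (None = missing)
    transcripts -- names of sufficiently covered transcripts to include
    """
    x, y = [], []
    for name in transcripts:
        for a, b in zip(rep1[name], rep2[name]):
            if a is not None and b is not None:  # skip positions without signal
                x.append(a)
                y.append(b)
    return pearson(x, y)
```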
Signal corresponds to structure
Based on reference structures, we investigated the distributions of the signal for paired and unpaired nucleotides separately (Figure 4). Consistent with our expectations, unpaired positions display significantly higher cleavage levels (two-sided Mann–Whitney U test, $p = 3.0 \times 10^{-210}$ for 2′,3′-cP and $p = 2.9 \times 10^{-200}$ for 5′-OH). We also observed that most positions exhibit low signal regardless of the structure. High signal values are therefore informative for unpaired positions. However, not all nucleotides that are unpaired in the reference structures are associated with high cleavage activity. Thus, low signals do not provide a reliable predictor for pairedness.
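The test reported above compares two independent samples of normalized signals, those at unpaired and those at paired positions. A dependency-free sketch of a two-sided Mann–Whitney U test is shown below. It uses the large-sample normal approximation without tie or continuity correction, so its p-values are approximate and may differ slightly from those of statistics packages such as SciPy; the function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Tied values receive average ranks; the variance is not tie-corrected and
    no continuity correction is applied, so p is approximate.
    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return u_x, p
```

Called as `mann_whitney_u(unpaired_signals, paired_signals)`, a U value well above $n_1 n_2 / 2$ indicates that unpaired positions tend to carry the larger signals.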
To visualize the correlation between probing intensity and structure, the profile for tRNALeu(CAG) is displayed in Figure 5. The D-loop, anticodon loop, variable loop and T-loop, as well as the unpaired 3′ end consisting of the CCA sequence and the discriminator base, are reflected in the peaks of the 2′,3′-cP as well as the 5′-OH libraries. Since we capture both RNA cleavage fragments in our protocol, the information for the entire transcript is retained even though both library types have one ‘blind’ end of 11 (or 12) nucleotides. To further investigate and quantify the structural information within the probing signal, we converted the scores into probabilities $q_i$ to be unpaired, see Figure 2 in the Methods section.
Since the mapped reads determine the exact cleavage position, we considered the possibility of sequence-specific biases. We indeed found a strong bias towards CA dinucleotides and a milder bias towards UA in 2′,3′-cP libraries (Figure 6). This observation cannot be explained by genomic overrepresentation of these dinucleotides or the
influence of only a few outliers (see Supplementary Figure S4). Cleavage between C and A (but not between U and A) is clearly less correlated with an unpaired structural state than cleavage at all other dinucleotides. We account for this effect by estimating the probability of unpairedness $q_i$ separately for CA and all other dinucleotides, see Figure 2 above. The deviations observed for UA in the 2′,3′-cP libraries were small enough to be neglected.

Figure 5. The probing signal profile of tRNALeu(CAG) reflects its secondary structure. tRNALeu(CAG) signal profile from (top) 2′,3′-cP and (bottom) 5′-OH libraries. Both biological replicates are depicted as dots, together with a Savitzky–Golay smoothing curve for visualization only. Colors (orange or blue) indicate unpaired or paired status within the structural reference.

Figure 6. Read fractions (in %) sorted by dinucleotide composition at the cleavage site in 2′,3′-cP and 5′-OH libraries. A, adenine; U, uracil; G, guanine; C, cytosine.
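Tallying cleavage events by the flanking dinucleotide, as in Figure 6, can be sketched as follows. We assume a hypothetical convention in which each event is given as the 1-based position of the nucleotide on the 5′ side of the cleaved bond; for a 2′,3′-cP read this would be its mapped 3′ end, and for a 5′-OH read the position just before its mapped 5′ end. The function name is hypothetical.

```python
from collections import Counter


def dinucleotide_fractions(sequence, cleavage_positions):
    """Percentage of cleavage events per dinucleotide flanking the cleaved bond.

    cleavage_positions -- 1-based positions of the nucleotide 5' of each cleaved
                          phosphodiester bond (hypothetical convention)
    """
    counts = Counter()
    for pos in cleavage_positions:
        if 1 <= pos < len(sequence):  # a bond needs a 3' neighbor
            counts[sequence[pos - 1:pos + 1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dinuc: 100 * n / total for dinuc, n in counts.items()}
```

To separate a cleavage bias from simple dinucleotide abundance, these fractions would be compared with the dinucleotide frequencies of the mapped transcripts themselves, as done for Supplementary Figure S4.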
Comparison of Pb2+(+) and Pb2+(–) libraries
The signal profiles of Pb2+-treated and non-treated samples in a subregion of the non-coding RNase P RNA are shown as an example in Figure 7. RNase P RNA harbors several high-affinity metal ion binding sites that were identified in previous studies (26,61,62). Two major (Ia and Ib) and one minor (IIa) high-affinity metal ion binding sites present in the illustrated subregion yielded strong signals in the Pb2+-treated but not in the H2O control libraries (Figure 7A, B; see Supplementary Figure S6 and accompanying supplementary text for further details). Likewise, Figure 7C illustrates that there are many pronounced peaks in the profiles after lead treatment that are absent in the control samples. These findings emphasize the successful application and utility of lead-induced cleavage in combination with end-specific ligation reactions to determine RNA secondary structure. The high quality of the data is illustrated in detail in Supplementary Figure S6B.

Figure 7. RNase P RNA signal profile (nucleotide positions 101–200) comparing samples treated with (+) or without (–) Pb2+ from (A) 2′,3′-cP and (B) 5′-OH libraries. The profiles for Pb2+-treated samples (Pb2+(+), replicates 1 and 2) and for the H2O control (Pb2+(–)) are superimposed in different colors. Paired and unpaired nucleotides are indicated by small cyan (paired) and large ocher (unpaired) spheres below the profiles according to the structure shown in Supplementary Figure S6A; Roman numerals marking prominent cleavage sites and the nucleotide numbering (x-axis) are as in Supplementary Figure S6A. (C) Correlation between probing signals of Pb2+(–) and Pb2+(+) 5′-OH libraries in the benchmark set. (D) The function p(S) of Pb2+(–) libraries reveals their structural content.
We noticed that the Pb2+(–) libraries were also highly correlated with both the Pb2+(+) signal and the unpaired positions of the known reference structures (Figure 7). Overall, the distribution of normalized signals in the H2O-treated sample is similar to that of the Pb2+-treated samples (Supplementary Figure S5A), although the separation of unpaired and paired positions is less pronounced than in Figure 4. We find that the patterns are close matches for both the 2′,3′-cP libraries (Figure 7A) and the 5′-OH libraries (Figure 7B). The normalized signals are also highly correlated across the entire data set (Figure 7C, Supplementary Figure S5B). This similarity between Pb2+-treated and H2O-treated libraries persists when considering the unpaired positions of our calibration set. As shown in Figure 7D (Supplementary Figure S5C), the Pb2+(–) signal can also be converted into a probability $q_i$ that behaves similarly to the one obtained from the Pb2+-treated samples. We therefore tested whether an additional ‘bonus energy’ based on Pb2+(–) as an independent source of information improves secondary structure inference over the use of Pb2+(+) only. Since this was not the case, the negative control libraries were not used in further downstream analysis.
Experimental signal improves RNA secondary structure
prediction
By definition, probing data offer only information on whether a given nucleotide is paired or unpaired and therefore provide experimental constraints on RNA secondary structures rather than determining the structure unambiguously. We therefore assessed the usefulness of the Led-Seq data by testing whether they are capable of improving secondary structures over computational predictions based on the rules of the thermodynamic standard model. To this end, we calculated pseudo-energy contributions for positions with $q_i$ greater than the average probability to be unpaired, $p_0$. These values served as soft constraints for the computation of MFE structures with RNAfold.
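Equation (4), restricted to positions with $q_i > p_0$, can be sketched in a few lines. Assumptions: energies are in kcal/mol, the temperature is 37 °C (the RNAfold default), and the helper names are hypothetical; $c = 1.2$ and $p_0 = 0.42$ are the values given in the Methods.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol K)


def pseudo_energy(q, p0=0.42, c=1.2, temperature_c=37.0):
    """Pseudo-energy of Equation (4) for an unpaired probability 0 < q < 1."""
    rt = R_KCAL * (temperature_c + 273.15)
    return -rt * c * (math.log(q / (1 - q)) - math.log(p0 / (1 - p0)))


def unpaired_soft_constraints(q_values, p0=0.42, c=1.2, temperature_c=37.0):
    """Map 1-based positions to pseudo-energies, keeping only positions with q_i > p0.

    Positions without a probability (None) or with q_i >= 1 are skipped.
    """
    return {
        i: pseudo_energy(q, p0, c, temperature_c)
        for i, q in enumerate(q_values, start=1)
        if q is not None and p0 < q < 1
    }
```

Because only $q_i > p_0$ is kept, every returned value is negative and therefore favors the unpaired state; at $q_i = p_0$ the term vanishes. The resulting position-to-energy map could then be passed to ViennaRNA as unpaired soft constraints, for instance via `vrna_sc_add_up()` in its C library; the exact route used here follows (11) and is not reproduced.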
We observed that for the vast majority of the transcripts in our benchmark set there is indeed an improvement. More precisely, the MFE structures computed with the experimentally determined pseudo-energies are close to the reference structures (Figure 8). By construction, neither the reference structures from RNAcentral nor the predictions include pseudoknots. Pseudoknots therefore appear as ‘unpaired’ in the reference structures, see e.g. Supplementary Figure S6.
Table 1 summarizes the results quantitatively in terms of the quality metrics PPV, SEN and MCC. Notably, applying the experimental signal of only one type of probing library (2′,3′-cP or 5′-OH) already yields substantially increased prediction accuracy. This demonstrates the high performance of both libraries individually. As expected, the best accuracy is achieved by incorporating both libraries. In particular, improvements close to both ends of a transcript require both libraries, since each library type is informative only for either the 5′ or the 3′ terminal region. Figure 9 gives three illustrative examples (5S rRNA, 23S rRNA (domain IV) and tRNAIle(GAU)) showing how the incorporation of experimental pseudo-energies guides structure prediction. In particular, positions engaged in false-positive (stacked) base pairs predicted by the thermodynamic model are rearranged towards an open structure upon inclusion of experimental evidence by means of pseudo-energy contributions.

Figure 8. Matthews correlation coefficient (MCC) for structures predicted with or without experimental signal in the form of soft constraints based on both libraries. Known references served as ‘correct’ structures. Each dot represents one transcript of the benchmark set. Positions above the diagonal correspond to improved structure prediction. For 6S RNA and tRNATyr (labeled by *), the positions indicate a lower correlation of the mapped structures with the reference structures. This, however, is the result of structural artifacts in the references that do not correspond to experimental findings.

Table 1. Assessment of secondary structure prediction accuracy. Minimum free energy (MFE) structures were calculated for all benchmark RNA sequences with and without experimental data as soft constraints. Applying only the signal from 2′,3′-cP or 5′-OH libraries yields higher prediction quality. Incorporating probing data from both library types achieves the highest precision. PPV, positive predictive value; SEN, sensitivity; MCC, Matthews correlation coefficient.

Condition                              PPV    SEN    MCC
No constraints                         0.62   0.70   0.66
Soft constraints, 2′,3′-cP libraries   0.77   0.78   0.77
Soft constraints, 5′-OH libraries      0.79   0.79   0.79
Soft constraints, both libraries       0.82   0.81   0.81

Prediction accuracy decreases in only three cases (Figure 8, Supplementary Figure S7). In all three cases, major
parts of the mapped structure agree with the reference. Differences are confined to additional base pairs at positions without cleavage signal and to unpaired regions in our predictions that are paired in the reference but show large cleavage signals. Considering that the reference structures are derived from computationally obtained consensus structures (63), these three cases can be attributed mostly to issues with the reference structures. In the case of 6S RNA, it is known that upon transcription this molecule is restructured and forms a hairpin between positions 132 and 152 (64–66) (Supplementary Figure S8), which is clearly visible in our probing analysis. The reference structure, however, does not represent this element. For tRNATyr, Led-Seq shows an open conformation for the variable loop, while it is base-paired in the reference. As the resulting terminal loop consists of only three residues, it very likely imposes considerable tension on the short helical part, which, due to the central U-A pair, is not highly stable. As a consequence, this region probably shows a certain fraying, resulting in a single-stranded conformation that is recognized by Led-Seq. For more details, see Supplementary Information.

Figure 9. Effect of Led-Seq data on secondary structure prediction of three examples from the calibration data set: (A) domain IV of the 23S rRNA, (B) 5S rRNA and (C) tRNAIle(GAU). In each case we show the structure predicted by the thermodynamic model (left), the structure obtained by incorporating the probing data as pseudo-energies in RNAfold (middle) and the reference structure taken from the RNAcentral database (right). In each case, the incorporation of the probing data pushes the structure closer to the reference. Note that negative pseudo-energies stabilize the unpaired state of nucleotide positions and thus lead to more open secondary structures. Base pairs that deviate from the reference structure are highlighted in red.
mRNA analysis
The secondary structures of mRNAs in the vicinity of the translation start site have repeatedly been reported to be subject to selective pressures. Such effects can be detected by considering patterns of accessibility, i.e. the profile of the position-specific unpairedness $q_i$, see e.g. (23,27,67,68).
We collected all mRNAs for which a stretch of 50 nucleotides upstream of the AUG start codon does not intersect another annotated gene and that are sufficiently covered by the probing data (n = 89 for 2′,3′-cP libraries and n = 146 for 5′-OH libraries). Sequences were aligned at AUG start codons, and mean values were calculated for every position in the 5′ untranslated region (UTR, positions −48 to −1) and the beginning of the coding sequence (CDS, positions 1 to 180, i.e. 60 codons). As there is no valid signal for the first 11 nucleotides within the 2′,3′-cP libraries, these positions were excluded. The resulting profiles are
displayed in Figure 10. The signal shows a pronounced peak at position −1, i.e. just before the start codon, indicating an open conformation at this site. There is a (local) minimum around position −9. This area coincides with the location of the ribosomal binding site (Shine-Dalgarno sequence). Further upstream, around position −17, we observe a local maximum. The first 60 nt of CDSs appear to be less structured than 5′-UTRs (2′,3′-cP libraries: mean 5′-UTR = 0.38, mean CDS 1–60 = 0.45, two-sided t-test $p = 0.05$; 5′-OH libraries: mean 5′-UTR = 0.43, mean CDS 1–60 = 0.48, two-sided t-test $p = 0.001$). However, after this stretch of 20 codons, the signal drops again to a consistently lower level (2′,3′-cP libraries: mean CDS 61–180 = 0.35, two-sided t-test $p = 2.0 \times 10^{-9}$; 5′-OH libraries: mean CDS 61–180 = 0.42, two-sided t-test $p = 6.3 \times 10^{-11}$). We further assessed the mean structural signal of the first, second and third positions within codons. Triplets were found to exhibit a significant periodicity in CDSs. No periodicity is detectable in the 5′-UTRs. Both libraries suggest that, on average, last codon positions are less structured than first codon positions (see Supplementary Figure S9).

Figure 10. Mean normalized probing signal of coding sequences (CDS) and 5′ untranslated regions (5′-UTR) calculated for mRNAs aligned at the start codon from (top) 2′,3′-cP libraries (n = 89) and (bottom) 5′-OH libraries (n = 146). Included positions range from −48 to 180, with position 1 corresponding to the 5′-position of the AUG start codon. Dashed lines represent mean values for the respective regions.
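The start-codon-aligned averaging and the codon-position comparison can be sketched as below. Assumptions: each mRNA is supplied as a list of normalized signals together with the 0-based index of the A of its AUG; missing values (e.g. the 11 blind 5′ nucleotides of the 2′,3′-cP library) are `None`; and there is no position 0, following the numbering of Figure 10. Computing codon-position means from the averaged profile is a simplification, the t-tests are not reproduced, and all function names are hypothetical.

```python
def metagene_profile(transcripts, upstream=48, downstream=180):
    """Mean signal per position relative to the start codon.

    transcripts -- iterable of (signal, start_index) pairs, where start_index is
                   the 0-based index of the A of the AUG in `signal`
    Positions run from -upstream..-1 and 1..downstream (no position 0).
    """
    positions = list(range(-upstream, 0)) + list(range(1, downstream + 1))
    sums = dict.fromkeys(positions, 0.0)
    counts = dict.fromkeys(positions, 0)
    for signal, start in transcripts:
        for r in positions:
            idx = start + r - 1 if r > 0 else start + r  # position 1 is the A of AUG
            if 0 <= idx < len(signal) and signal[idx] is not None:
                sums[r] += signal[idx]
                counts[r] += 1
    return {r: sums[r] / counts[r] for r in positions if counts[r]}


def codon_position_means(profile, first=1, last=180):
    """Average a metagene profile over codon positions 1, 2 and 3 of the CDS."""
    groups = {1: [], 2: [], 3: []}
    for r in range(first, last + 1):
        if r in profile:
            groups[(r - 1) % 3 + 1].append(profile[r])
    return {k: sum(v) / len(v) for k, v in groups.items() if v}
```

Region means such as "5′-UTR" or "CDS 1–60" follow by averaging the profile over the corresponding keys.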
DISCUSSION
Utility and limitations of probing methods for RNA structure
elucidation
Here we show that Led-Seq is capable of probing RNA secondary structures in vivo with highly reproducible results between biological replicates. A major advantage of the method is that the interrogation of both cleavage products in the 2′,3′-cP and 5′-OH libraries not only increases the reliability of the data and provides an internal quality control, but also ensures that the full length of transcripts can be probed in a high-throughput setting.
As with other probing methods, including SHAPE,
lead cleavage does not unambiguously distinguish between
paired and unpaired positions but provides quantitative
evidence that can be converted into a probability that a
nucleotide is unpaired. We emphasize that this is not a
methodological shortcoming but an inevitable consequence
of the fact that RNAs form a free-energy weighted ensemble
of structures rather than a single, unambiguous secondary
structure (8,69,70). Indeed, methods that deconvolve multiple representative structures from a probing signal have recently become available (70–72). The ‘known’ reference structures are therefore necessarily approximations rather than a perfect gold standard.
The probing signal is confounded further by the fact that the chemical reactions underlying a probing method are influenced by the detailed local conformation of the target, metal ion or protein binding, and other factors that go beyond the pairing status of nucleotides. It is therefore not surprising that the lead probing signal does not distinguish paired from unpaired positions in an all-or-nothing fashion. Instead, one observes distributions of signals S that are biased towards larger signals for unpaired positions. The distributions of lead signals in Figure 4 are indeed very similar to corresponding distributions for SHAPE data, see e.g. (73).
A general issue of probing methods is that low signals, here for paired nucleotides, cannot be distinguished from missing data due to other causes, such as limited accessibility to the reagent, see e.g. (74,75). Because of this ambiguity, we cannot safely use low values of S, and thus of p(S), as support for pairedness. We therefore include pseudo-energies only if $p(S) > p_0$, i.e. if the signal provides evidence that a position is unpaired. Indeed, lifting this restriction did not improve the prediction quality.
We observed that the incorporation of probing data did not result in improved predictions for a subset of the benchmark structures, and in a few cases the prediction accuracy even decreased, albeit only slightly. This is not unexpected. First, in many cases the thermodynamic model produces rather accurate structures even without additional experimental evidence (76,77). In these cases, additional experimental data confirm rather than modify the structure. Moreover, the reference data from RNAcentral have been obtained with the help of templates that, in turn, are either informed by in vitro structures obtained from NMR or X-ray data, or have been constructed as consensus structures over a large set of phylogenetically related molecules (63). They cannot account for fluctuations in the structures of actual RNAs with (temporarily) open segments, as implied by the probing data of the transcripts with decreased folding accuracy. Taken together, it is reassuring that inclusion of the probing data increases consistency with the reference models. At the same time, there is no reason to expect the probing data to reproduce the reference structure perfectly. Figure 8 can thus be interpreted as strong support for the proper functioning and usefulness of Led-Seq.
Negative controls as information source in Led-Seq
We observed that negative controls without lead treatment (Pb2+(–)) produce a signal that is similar to that of the Pb2+(+) libraries. This observation can be explained by the reactive intracellular environment. For example, [Zn(H2O)5OH]+ and [Cu(H2O)5OH]+, with pKa values of 8 to 9 (78), were also shown to be able to hydrolyze E. coli RNase P RNA at neutral pH (79). Considering that Zn2+ and Cu2+ are natural trace elements, it is reasonable to assume that RNA fragments generated by endogenous transition metal ion hydrates entered the 2′,3′-cP and 5′-OH libraries of the H2O control samples. Although lead-treated libraries are more informative owing to the identification of additional cleavage sites and a better separation of paired and unpaired signals, the Pb2+(–) libraries also convey structural information. This implies that the Pb2+(–) libraries are not genuine background controls, thus questioning the concept of determining differences or ratios such as $S_{\mathrm{Pb}^{2+}}/S_{\mathrm{H_2O}}$ for quantifying the results of in vivo lead probing experiments. We emphasize that the lack of a ‘control’, or rather a background signal, is not an issue that might invalidate the method. In fact, genome-wide data sets often include sufficient information on the background already in the foreground data to render negative controls redundant. In the case of Led-Seq, the relevant information on the background signal is implicitly provided by the distribution of normalized signals for paired positions in the reference structures. We therefore advocate utilizing the control libraries in Led-Seq experiments as an additional source of RNA structure information, and assessing the quality and integrity of Led-Seq data by computing the distributions of normalized signals and the functions p(S) using a few reference structures, preferably well-characterized non-coding RNAs.
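A quick quality check along these lines could use a binned, empirical estimate of p(S) from a few reference structures. This is a deliberately simple stand-in, not the fitted function described in the Methods (Figure 2): for each signal bin, it reports the fraction of positions that are unpaired in the reference. Names and binning are hypothetical.

```python
import bisect


def empirical_p_unpaired(signals, unpaired, bin_edges):
    """Fraction of reference-unpaired positions per signal bin.

    signals   -- normalized signal per position (None = missing)
    unpaired  -- True if the position is unpaired in the reference structure
    bin_edges -- increasing edges; the last bin is closed on the right
    Returns one value per bin, or None for empty bins.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for s, is_unpaired in zip(signals, unpaired):
        if s is None or s < bin_edges[0] or s > bin_edges[-1]:
            continue                                   # outside the binned range
        b = min(bisect.bisect_right(bin_edges, s) - 1, n_bins - 1)
        totals[b] += 1
        hits[b] += int(is_unpaired)
    return [h / t if t else None for h, t in zip(hits, totals)]
```

Bins whose fraction clearly exceeds the overall unpaired fraction of the reference set (the analogue of $p_0$) indicate that high signals carry structural information; a flat curve, as obtained for randomized positions, would not.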
Bias for cleavage between pyrimidine and adenosine in 2′,3′-cP libraries
In general, RNA cleavage induced by Pb2+ has been reported to have no overall sequence bias (27). Nevertheless, a considerable bias towards the representation of cleavage sites between CA, and to a lesser extent UA, dinucleotides was evident in our 2′,3′-cP libraries. At present, we can only speculate about the origin of this effect. In several small RNA sequencing libraries, ligation biases were described, and the secondary structures of the ligation partners seem to represent a major cause of such biases (80). In our libraries, however, the CA (UA) bias is presumably not induced by secondary structures, because the remaining three dinucleotides starting with C or U did not show this behavior. In addition, we did not observe any sequence-related effect in ligation efficiency. Theoretically, the observed pattern could also be caused by an endonuclease with a specific recognition sequence that leaves 2′,3′-cP residues after cleavage. E. coli expresses such enzymes, most notably MazF (81,82). In a previous study targeting cyclic phosphate-containing RNAs in mice, a similar effect was observed predominantly in mRNA and attributed to a hypothetical enzymatic cleavage event (83). In subsequent studies, the authors reported a similar effect in human cell lines and explained it as the product of ANG cleavage (84). In Bombyx mori cells, cleavage by BmRNase (85) was identified as a possible cause. If a similar mechanism were at work on E. coli RNA, we would expect to observe the overrepresentation of the CA dinucleotide in both the 2′,3′-cP and the 5′-OH libraries. This, however, is not the case.
Hence, the molecular basis for this phenomenon is currently unclear. To our knowledge, no bacterial enzyme is known that cleaves RNAs specifically at pyrimidine-A dinucleotides leaving 2′,3′-cP but not 5′-OH groups (82). The effect is also not caused by an increased occurrence of the CA dinucleotide in the E. coli transcriptome, and our data do not contain one or a few ‘hotspot’ RNAs whose overrepresentation might explain the CA bias. Interestingly, UA and CA phosphodiester bonds have been described as more susceptible to hydrolysis in general. However, this effect was later reported to be too small and unsystematic, and also highly dependent on the other neighboring nucleotides in the investigated oligomers. Eventually, it was attributed to other structural effects, such as stacking interactions, that enhance or reduce cleavage (86–89). Further investigation is needed to determine the cause of this dinucleotide bias in 2′,3′-cP libraries. From another perspective, our observation of a CA bias in the 2′,3′-cP libraries, but essentially not in the 5′-OH libraries, again illustrates the strength of our dual Led-Seq approach. The observed discrepancy suggests unresolved technical reasons and prevented us from drawing premature conclusions on the biological significance of this finding.
Analysis of mRNA structures
Led-Seq can also be used to evaluate mRNA structure. Lead treatment of cells resulted in a larger relative abundance of reads mapping to mRNAs in both libraries (Figure 3). The coverage of mRNAs (and of non-coding RNAs other than rRNA) could be improved further by implementing an rRNA depletion step. Since we aimed to demonstrate the general applicability of Led-Seq in this work, we did not include any selection step for certain RNAs, especially since rRNAs and tRNAs are also part of our benchmark set. In first experiments on the applicability of commercially available rRNA depletion kits, we already observed that both RNase H-based (NEBNext®, NEB) and bead-based (riboPOOLs, siTOOLs) kits are compatible with our approach and allow a considerable reduction of the rRNA content (data not shown). As the probes can be individually adapted to other targets, such as tRNAs, we expect these kits to be generally applicable for the depletion of a variety of RNAs. Nevertheless, the application of these methods can also bring disadvantages. For example, RNase H-based depletion methods have already been shown to display off-target effects that can negatively
impact ribosome profiling data (90).

Averaging the normalized probing signals of mRNAs aligned at the start codon (position 1) revealed a local minimum around −9 nt and a local maximum around −17 nt in both libraries. These signals are remarkably consistent with earlier reports based on parallel analysis of RNA structure (PARS) probing data (67). Del Campo et al. postulated that the unstructured region 20 nt upstream of the start codon serves as a nonspecific docking site of the 30S ribosomal subunit and described it as a general feature of E. coli genes. They interpreted the low signal near nucleotide position −10 as an effect of the Shine-Dalgarno sequence. We also observed a substantially increased signal immediately preceding the start of the CDS, implying an open conformation. This observation also conforms to previous findings (67,68,91). Comparison of the average signal intensity of UTRs and CDSs showed a significant increase in the first 60 nt of the CDS. We also noticed a periodicity of the signal in the mRNA coding regions, while such a signal is absent in the UTR. Interestingly, the 2′,3′-cP mapping shows a significantly higher signal for the third position of every codon. In contrast, 5′-OH data show no difference between the second and third positions, but a significantly lower signal for the first position. Taken together, this leads to the conclusion that the
third codon position is more susceptible to Pb2+-induced
cleavage than the rst position and thus appears less likely
to be ‘structured’. The same effect was already observed in
PARS data of E. coli (67), as well as in lead-based struc-
ture analysis in Yersinia pseudotuberculosis (27). Other stud-
ies also recognized a periodic pattern in eukaryotic mRNA.
However, in DMS-seq data of A. thaliana (68), PARS data
for S. cerevisiae (23), and CIRS-seq data for mice (92), the
rst nucleotide was least likely to be ‘structured’. Intrigu-
ingly, no periodicity was observed in SHAPE-MaP data for
E. coli (93). The authors provided several explanations for
this discrepancy, which we would like to address with an
emphasis on our approach: The authors argued that the
RNases used in the PARS approach are known to exhibit a
certain sequence bias in their cleavage efciency. In contrast,
Pb2+-induced cleavage occurs sequence-independent (27).
Another potential cause is artifactual signals generated by either methodological or cellular processes. While we cannot rule out a methodological cause entirely, we would not expect such a cause to produce a periodic pattern restricted to coding regions. Moreover, cotranslational decay by exonucleases as described by Pelechano et al. (94) would result in fragments that are not mapped by our approach: to our knowledge, no known exonuclease leaves 2′,3′-cP and 5′-OH ends. The periodicity observed in our data also cannot be attributed to in vitro probing conditions, as we used an in vivo probing approach. The only suggested explanation for this discrepancy between the methods that applies to Led-Seq is that Led-Seq relies on the enzymatic ligation of adapters to cleavage fragments to infer structural information. It is therefore, in principle, prone to a bias arising from the ligation properties of the enzymes used. Apart from the enhanced CA dinucleotide cleavage described above, for which a ligation bias can be excluded, we could not identify a ligation bias matching the average nucleotide composition of the investigated mRNAs. The periodicity of the genetic code is a known feature: coding sequences do not have a truly random nucleotide/triplet composition because of the inherent bias of amino acid-coding triplets (95). Our results suggest that the observed structural periodicity is in fact caused by this intrinsic feature of coding sequences in DNA and, therefore, in mRNA. It is conceivable that the small size of lead ions allows them to enter actively translating ribosomes, where they might have access to the mRNA codons engaged in tRNA binding. The weak wobble interaction at the third codon position would then leave this position less protected from cleavage, resulting in the observed periodicity pattern. Further investigation is needed to explain the discrepancy between the SHAPE-MaP results and other previous findings concerning CDS structural periodicity.
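The codon-position comparison discussed above can be illustrated with a short Python sketch. It is a hypothetical helper, not the authors' analysis: it assumes per-nucleotide cleavage counts for each transcript and a 0-based, half-open CDS interval spanning whole codons, sums the signal at codon positions 1, 2 and 3, and averages per-transcript fractions so that highly covered transcripts do not dominate the pooled profile (the per-transcript normalization is a design choice made for this sketch).

```python
from typing import Iterable, Sequence, Tuple


def codon_position_totals(counts: Sequence[float], cds_start: int, cds_end: int) -> list[float]:
    """Sum cleavage counts at codon positions 1, 2 and 3 of one CDS.

    cds_start/cds_end are 0-based and half-open and must cover whole codons
    (a hypothetical convention chosen for this sketch).
    """
    length = cds_end - cds_start
    if cds_start < 0 or cds_end > len(counts) or length <= 0 or length % 3 != 0:
        raise ValueError("CDS must lie within the counts and span whole codons")
    totals = [0.0, 0.0, 0.0]
    for offset, count in enumerate(counts[cds_start:cds_end]):
        totals[offset % 3] += count  # offset 0, 3, 6, ... is codon position 1
    return totals


def mean_codon_profile(transcripts: Iterable[Tuple[Sequence[float], int, int]]) -> list[float]:
    """Average per-transcript fractions of CDS signal at codon positions 1-3.

    Each transcript is given as (counts, cds_start, cds_end); transcripts
    without any signal in their CDS are skipped.
    """
    profiles = []
    for counts, start, end in transcripts:
        totals = codon_position_totals(counts, start, end)
        total = sum(totals)
        if total > 0:
            profiles.append([t / total for t in totals])
    if not profiles:
        raise ValueError("no transcript with signal in its CDS")
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(3)]


example = [
    ([1, 1, 2], 0, 3),  # fractions 0.25, 0.25, 0.5
    ([0, 0, 3], 0, 3),  # fractions 0.0, 0.0, 1.0
]
print(mean_codon_profile(example))
```

Under this sketch, a higher mean fraction at position 3 than at position 1 would correspond to the pattern described above for E. coli.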
CONCLUSION
The Led-Seq approach described here offers the unique advantage that both sides of the metal ion-induced cleavage position in RNA are mapped, increasing the reliability of the observed signals. Furthermore, the interrogation of both cleavage products allows for the structural analysis of RNA regions close to the 5′ and 3′ ends of transcripts, as the two separate libraries can mutually compensate for the information loss at transcript ends that is caused by inaccurate mapping of short cDNAs to the genome. This is an advantage over other sequencing-based methods that lose the 3′-end information because they lack such a compensation option. While mutational profiling approaches elegantly circumvent this information loss, RT reactions generally represent an error-prone enzymatic step, and inherent RT stops and nucleotide misincorporations may result in a loss of information. Using our double-end approach, we minimize artificially introduced signals through the redundant design of the method. Nonetheless, Led-Seq and SHAPE-based approaches complement each other well, as they involve entirely different chemistries. An additional advantage of metal ion-based cleavage is its potential applicability to the in vivo investigation of the structurome of psychrophiles and thermophiles, as the exploited probing reaction is theoretically suitable for a wide range of temperatures. Taken together, the double-end structure investigation of Led-Seq represents a very useful approach to characterize RNA structures in vivo as well as in vitro, expanding our technical arsenal to investigate structure–function relations of RNA.
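The end-compensation argument can be sketched as follows. A Pb²⁺-induced cleavage between nucleotides i and i + 1 of a transcript of length N yields a 5′ product (nucleotides 1 to i, carrying a 2′,3′-cyclic phosphate) and a 3′ product (nucleotides i + 1 to N, carrying a 5′-OH). The Python sketch below assumes a single cleavage in an otherwise intact transcript and a hypothetical minimum mappable fragment length (not a value taken from the Led-Seq pipeline); it reports which of the two libraries could recover each possible site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCoverage:
    cleavage_after: int  # cleavage between nucleotides i and i + 1 (1-based)
    cp_library: bool     # 5' product (nt 1..i, 2',3'-cyclic phosphate) long enough to map
    oh_library: bool     # 3' product (nt i+1..N, 5'-OH) long enough to map


def site_coverage(transcript_length: int, min_mappable: int) -> list[SiteCoverage]:
    """Report which library can recover each possible cleavage site.

    Assumes a single cleavage in an otherwise intact transcript and that a
    fragment maps only if it is at least `min_mappable` nucleotides long
    (a hypothetical threshold used for illustration).
    """
    if transcript_length < 2 or min_mappable < 1:
        raise ValueError("need a transcript of at least 2 nt and a positive threshold")
    return [
        SiteCoverage(
            cleavage_after=i,
            cp_library=i >= min_mappable,
            oh_library=transcript_length - i >= min_mappable,
        )
        for i in range(1, transcript_length)
    ]


sites = site_coverage(transcript_length=40, min_mappable=15)
n_cp = sum(s.cp_library for s in sites)                        # sites seen via 2',3'-cP ends
n_oh = sum(s.oh_library for s in sites)                        # sites seen via 5'-OH ends
n_both = sum(s.cp_library and s.oh_library for s in sites)     # redundantly confirmed sites
n_either = sum(s.cp_library or s.oh_library for s in sites)    # sites seen by at least one
print(n_cp, n_oh, n_both, n_either, len(sites))
```

With these assumed values, each library alone recovers 25 of the 39 possible sites (the 2′,3′-cP library misses sites near the 5′ end, the 5′-OH library those near the 3′ end), 11 sites are reported by both, and every site is recovered by at least one library.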
DATA AVAILABILITY
The data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB58715, see www.ebi.ac.uk/ena/browser/view/PRJEB58715. A computational pipeline is accessible at github.com/xamiiii/Led-Seq and https://doi.org/10.5281/zenodo.7821447.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We thank Maja Etzel and Christian Lorenz for pilot studies, Tobias Friedrich for expert technical assistance and Christina E. Weinberg, V. Janett Olzog and Leif A. Kirsebom for valuable scientific discussion. Special thanks go to K. Moon and J. Page for acronym inspiration.
FUNDING
This work was supported by Deutsche Forschungsgemeinschaft [MO 634/18-1, STA 850/48-1, GRK 2355 to RKH]. Funding for open access charge: Open Access Publishing Fund of Leipzig University supported by the Deutsche Forschungsgemeinschaft within the program Open Access Publication Funding.
Conflict of interest statement. None declared.
REFERENCES
1. Guerrier-Takada,C., Gardiner,K., Marsh,T., Pace,N. and Altman,S. (1983) The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell, 35, 849–857.
2. Kieft,J.S., Rabe,J.L. and Chapman,E.G. (2015) New hypotheses derived from the structure of a flaviviral Xrn1-resistant RNA: conservation, folding, and host adaptation. RNA Biol., 12, 1169–1177.
3. Serganov,A. and Patel,D.J. (2007) Ribozymes, riboswitches and beyond: regulation of gene expression without proteins. Nat. Rev. Genet., 8, 776–790.
4. Vicens,Q. and Kieft,J.S. (2022) Thoughts on how to think (and talk) about RNA structure. Proc. Natl. Acad. Sci. U.S.A., 119, e2112677119.
5. Thirumalai,D., Lee,N., Woodson,S.A. and Klimov,D.K. (2001) Early events in RNA folding. Annu. Rev. Phys. Chem., 52, 751–762.
6. Turner,D.H. and Mathews,D.H. (2010) NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res., 38, D280–D282.
7. Zuker,M. and Stiegler,P. (1981) Optimal computer folding of larger RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9, 133–148.
8. McCaskill,J.S. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105–1119.
9. Sun,L., Fazal,F.M., Li,P., Broughton,J.P., Lee,B., Tang,L., Huang,W., Kool,E.T., Chang,H.Y. and Zhang,Q.C. (2019) RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol., 26, 322–330.
10. Wang,X.-W., Liu,C.-X., Chen,L.-L. and Zhang,Q.C. (2021) RNA structure probing uncovers RNA structure-dependent biological functions. Nat. Chem. Biol., 17, 755–766.
11. Lorenz,R., Luntzer,D., Hofacker,I.L., Stadler,P.F. and Wolfinger,M.T. (2016) SHAPE directed RNA folding. Bioinformatics, 32, 145–147.
12. Lorenz,R., Hofacker,I.L. and Stadler,P.F. (2016) RNA folding with hard and soft constraints. Alg. Mol. Biol., 11, 8.
13. Spasic,A., Assmann,S.M., Bevilacqua,P.C. and Mathews,D.H. (2018) Modeling RNA secondary structure folding ensembles using SHAPE mapping data. Nucleic Acids Res., 46, 314–323.
14. Gilmer,O., Quignon,E., Jousset,A.-C., Paillart,J.-C., Marquet,R. and Vivet-Boudou,V. (2021) Chemical and enzymatic probing of viral RNAs: From infancy to maturity and beyond. Viruses, 13, 1894.
15. Mailler,E., Paillart,J.-C., Marquet,R., Smyth,R.P. and Vivet-Boudou,V. (2019) The evolution of RNA structural probing methods: From gels to next-generation sequencing. Wiley Interdiscipl. Rev. RNA, 10, e1518.
16. Strobel,E.J., Yu,A.M. and Lucks,J.B. (2018) High-throughput determination of RNA structures. Nat. Rev. Genet., 19, 615–634.
17. Merino,E.J., Wilkinson,K.A., Coughlan,J.L. and Weeks,K.M. (2005) RNA structure analysis at single nucleotide resolution by selective 2’-hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc., 127, 4223–4231.
18. Lee,B., Flynn,R.A., Kadina,A., Guo,J.K., Kool,E.T. and Chang,H.Y. (2017) Comparison of SHAPE reagents for mapping RNA structures inside living cells. RNA, 23, 169–174.
19. Kwok,C.K., Tang,Y., Assmann,S.M. and Bevilacqua,P.C. (2015) The RNA structurome: transcriptome-wide structure probing with next-generation sequencing. Trends Biochem. Sci., 40, 221–232.
20. Lu,Z. and Chang,H.Y. (2016) Decoding the RNA structurome. Curr. Opin. Struct. Biol., 36, 142–148.
21. Westhof,E. and Romby,P. (2010) The RNA structurome. High-throughput probing. Nat. Methods, 7, 965–967.
22. Underwood,J.G., Uzilov,A.V., Katzman,S., Onodera,C.S., Mainzer,J.E., Mathews,D.H., Lowe,T.M., Salama,S.R. and Haussler,D. (2010) FragSeq. Transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods, 7, 995–1001.
23. Kertesz,M., Wan,Y., Mazor,E., Rinn,J.L., Nutter,R.C., Chang,H.Y. and Segal,E. (2010) Genome-wide measurement of RNA secondary structure in yeast. Nature, 467, 103–107.
24. Forconi,M. and Herschlag,D. (2009) Metal ion-based RNA cleavage as a structural probe. In: Herschlag,D. (ed). Biophysical, Chemical, and Functional Probes of RNA Structure, Interactions and Folding. Academic Press/Elsevier, San Diego, CA, Vol. 468, pp. 91–106.
25. Rubin,J.R. and Sundaralingam,M. (1983) Lead ion binding and RNA chain hydrolysis in phenylalanine tRNA. J. Biomol. Struct. Dyn., 1, 639–646.
26. Ciesiolka,J., Hardt,W.D., Schlegl,J., Erdmann,V.A. and Hartmann,R.K. (1994) Lead-ion-induced cleavage of RNase P RNA. Eur. J. Biochem., 219, 49–56.
27. Twittenhoff,C., Brandenburg,V.B., Righetti,F., Nuss,A.M., Mosig,A., Dersch,P. and Narberhaus,F. (2020) Lead-seq: transcriptome-wide structure probing in vivo using lead(II) ions. Nucleic Acids Res., 48, e71.
28. Lindell,M., Romby,P. and Wagner,E.G.H. (2002) Lead(II) as a probe for investigating RNA structure in vivo. RNA, 8, 534–541.
29. Englert,M. and Beier,H. (2005) Plant tRNA ligases are multifunctional enzymes that have diverged in sequence and substrate specificity from RNA ligases of other phylogenetic origins. Nucleic Acids Res., 33, 388–399.
30. Schutz,K., Hesselberth,J.R. and Fields,S. (2010) Capture and sequence analysis of RNAs with terminal 2’,3’-cyclic phosphates. RNA, 16, 621–631.
31. Remus,B.S. and Shuman,S. (2013) A kinetic framework for tRNA ligase and enforcement of a 2’-phosphate requirement for ligation highlights the design logic of an RNA repair machine. RNA, 19, 659–669.
32. Olzog,V.J., Gärtner,C., Stadler,P.F., Fallmann,J. and Weinberg,C.E. (2021) cyPhyRNA-seq: a genome-scale RNA-seq method to detect active self-cleaving ribozymes by capturing RNAs with 2’,3’ cyclic phosphates and 5’ hydroxyl ends. RNA Biol., 18, 818–831.
33. Chakravarty,A.K., Subbotin,R., Chait,B.T. and Shuman,S. (2012) RNA ligase RtcB splices 3’-phosphate and 5’-OH ends via covalent RtcB-(histidinyl)-GMP and polynucleotide-(3’)pp(5’)G intermediates. Proc. Natl. Acad. Sci. U.S.A., 109, 6072–6077.
34. Peach,S.E., York,K. and Hesselberth,J.R. (2015) Global analysis of RNA cleavage by 5’-hydroxyl RNA sequencing. Nucleic Acids Res., 43, e108.
35. Solayman,M., Litfin,T., Zhou,Y. and Zhan,J. (2022) High-throughput mapping of RNA solvent accessibility at the single-nucleotide resolution by RtcB ligation between a fixed 5’-OH-end linker and unique 3’-P-end fragments from hydroxyl radical cleavage. RNA Biol., 19, 1179–1189.
36. Viollet,S., Fuchs,R.T., Munafo,D.B., Zhuang,F. and Robb,G.B. (2011) T4 RNA ligase 2 truncated active site mutants: improved tools for RNA analysis. BMC Biotech., 11, 72.
37. Blondal,T., Thorisdottir,A., Unnsteinsdottir,U., Hjorleifsdottir,S., Ævarsson,A., Ernstsson,S., Fridjonsson,O.H., Skirnisdottir,S., Wheat,J.O., Hermannsdottir,A.G. et al. (2005) Isolation and characterization of a thermostable RNA ligase 1 from a Thermus scotoductus bacteriophage TS2126 with good single-stranded DNA ligation properties. Nucleic Acids Res., 33, 135–142.
38. Tanaka,N. and Shuman,S. (2011) RtcB is the RNA ligase component of an Escherichia coli RNA repair operon. J. Biol. Chem., 286, 7727–7731.
39. Wang,L.K. and Shuman,S. (2001) Domain structure and mutational analysis of T4 polynucleotide kinase. J. Biol. Chem., 276, 26868–26874.
40. Ivanova,N., Lindell,M., Pavlov,M., Holmberg Schiavone,L., Wagner,E.G.H. and Ehrenberg,M. (2007) Structure probing of tmRNA in distinct stages of trans-translation. RNA, 13, 713–722.
41. Sambrook,J. and Russell,D.W. (2006) Purification of nucleic acids by extraction with phenol:chloroform. CSH Protoc., 2006, 1.
42. Seidl,C.I. and Ryan,K. (2011) Circular single-stranded synthetic DNA delivery vectors for microRNA. PLOS ONE, 6, e16925.
43. Bender,M., Holben,W.E., Sørensen,S.J. and Jacobsen,C.S. (2007) Use of a PNA probe to block DNA-mediated PCR product formation in prokaryotic RT-PCR. BioTechniques, 42, 609–614.
44. Ewels,P., Magnusson,M., Lundin,S. and Käller,M. (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32, 3047–3048.
45. Martin,M. (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J., 17, 1.
46. Hoffmann,S., Otto,C., Kurtz,S., Sharma,C., Khaitovich,P., Vogel,J., Stadler,P.F. and Hackermüller,J. (2009) Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLOS Comp. Biol., 5, e1000502.
47. Hoffmann,S., Otto,C., Doose,G., Tanzer,A., Langenberger,D., Christ,S., Kunz,M., Holdt,L.M., Teupser,D., Hackermüller,J. et al. (2014) A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and fusion detection. Genome Biol., 15, R34.
48. Smith,T.S., Heger,A. and Sudbery,I. (2017) UMI-tools: modelling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res., 27, 491–499.
49. Quinlan,A.R. and Hall,I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
50. Low,J.T. and Weeks,K.M. (2010) SHAPE-directed RNA secondary structure prediction. Methods, 52, 150–158.
51. Petereit,J. (2022) Pipeline automation via snakemake. Methods Mol. Biol., 2443, 181–196.
52. Kalvari,I., Nawrocki,E., Argasinska,J., Quinones-Olvera,N., Finn,R., Bateman,A. and Petrov,A.I. (2018) Non-coding RNA analysis using the Rfam database. Curr. Protoc. Bioinform., 62, e51.
53. Jaeger,J.A., Turner,D.H. and Zuker,M. (1989) Improved predictions of secondary structures for RNA. Proc. Natl. Acad. Sci. U.S.A., 86, 7706–7710.
54. Mathews,D.H., Sabina,J., Zuker,M. and Turner,D.H. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288, 911–940.
55. Zarringhalam,K., Meyer,M.M., Dotu,I., Chuang,J.H. and Clote,P. (2012) Integrating chemical footprinting data into RNA secondary structure prediction. PLOS ONE, 7, e45160.
56. Cordero,P., Kladwang,W., Vanlang,C.C. and Das,R. (2012) Quantitative dimethyl sulfate mapping for automated RNA secondary structure inference. Biochemistry, 51, 7037–7039.
57. Lorenz,R., Bernhart,S.H., Höner zu Siederdissen,C., Tafer,H., Flamm,C., Stadler,P.F. and Hofacker,I.L. (2011) ViennaRNA Package 2.0. Alg. Mol. Biol., 6, 26.
58. Kerpedjiev,P., Hammer,S. and Hofacker,I.L. (2015) Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams. Bioinformatics, 31, 3377–3379.
59. Regulski,E.E. and Breaker,R.R. (2008) In-line probing analysis of riboswitches. Methods Mol. Biol., 419, 53–67.
60. Giesen,U., Kleider,W., Berding,C., Geiger,A., Ørum,H. and Nielsen,P.E. (1998) A formula for thermal stability (Tm) prediction of PNA/DNA duplexes. Nucleic Acids Res., 26, 5004–5006.
61. Hardt,W.D. and Hartmann,R.K. (1996) Mutational analysis of the joining regions flanking helix P18 in E. coli RNase P RNA. J. Mol. Biol., 259, 422–433.
62. Lindell,M., Brännvall,M., Wagner,E.G. and Kirsebom,L.A. (2005) Lead(II) cleavage analysis of RNase P RNA in vivo. RNA, 11, 1348–1354.
63. Sweeney,B.A., Hoksza,D., Nawrocki,E.P., Ribas,C.E., Madeira,F., Cannone,J.J., Gutell,R., Maddala,A., Meade,C.D., Williams,L.D. et al. (2021) R2DT is a framework for predicting and visualising RNA secondary structure using templates. Nat. Commun., 12, 3494.
64. Chen,J., Wassarman,K.M., Feng,S., Leon,K., Feklistov,A., Winkelman,J.T., Li,Z., Walz,T., Campbell,E.A. and Darst,S.A. (2017) 6S RNA mimics B-form DNA to regulate Escherichia coli RNA polymerase. Mol. Cell, 68, 388–397.
65. Beckmann,B.M., Hoch,P.G., Marz,M., Willkomm,D.K., Salas,M. and Hartmann,R.K. (2012) A pRNA-induced structural rearrangement triggers 6S-1 RNA release from RNA polymerase in Bacillus subtilis. EMBO J., 31, 1727–1738.
66. Panchapakesan,S.S.S. and Unrau,P.J. (2012) E. coli 6S RNA release from RNA polymerase requires σ70 ejection by scrunching and is orchestrated by a conserved RNA hairpin. RNA, 18, 2251–2259.
67. Del Campo,C., Bartholomäus,A., Fedyunin,I. and Ignatova,Z. (2015) Secondary structure across the bacterial transcriptome reveals versatile roles in mRNA regulation and function. PLOS Genet., 11, e1005613.
68. Ding,Y., Tang,Y., Kwok,C.K., Zhang,Y., Bevilacqua,P.C. and Assmann,S.M. (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature, 505, 696–700.
69. Rogers,E. and Heitsch,C.E. (2014) Profiling small RNA reveals multimodal substructural signals in a Boltzmann ensemble. Nucleic Acids Res., 42, e171.
70. Aviran,S. and Incarnato,D. (2022) Computational approaches for RNA structure ensemble deconvolution from structure probing data. J. Mol. Biol., 434, 167635.
71. Li,T.J.X. and Reidys,C.M. (2020) On an enhancement of RNA probing data using information theory. Alg. Mol. Biol., 15, 15.
72. Morandi,E., Manfredonia,I., Simon,L.M., Anselmi,F., van Hemert,M.J., Oliviero,S. and Incarnato,D. (2021) Genome-scale deconvolution of RNA structure ensembles. Nat. Methods, 18, 249–252.
73. Kutchko,K.M. and Laederach,A. (2016) Transcending the prediction paradigm: novel applications of SHAPE to RNA function and evolution: Novel applications of SHAPE. WIREs RNA, 8, 1374.
74. Ingle,S., Azad,R.N., Jain,S.S. and Tullius,T.D. (2014) Chemical probing of RNA with the hydroxyl radical at single-atom resolution. Nucleic Acids Res., 42, 12758–12767.
75. Solayman,M., Litfin,T., Singh,J., Paliwal,K., Zhou,Y. and Zhan,J. (2022) Probing RNA structures and functions by solvent accessibility: an overview from experimental and computational perspectives. Brief. Bioinform., 23, bbac112.
76. Hajiaghayi,M., Condon,A. and Hoos,H.H. (2012) Analysis of energy-based algorithms for RNA secondary structure prediction. BMC Bioinformatics, 13, 22.
77. Xu,X. and Chen,S.-J. (2015) Physics-based RNA structure prediction. Biophys. Rep., 1, 2–13.
78. Jackson,V.E., Felmy,A.R. and Dixon,D.A. (2015) Prediction of the pKa’s of aqueous metal ion +2 complexes. J. Phys. Chem. A, 119, 2926–2939.
79. Kazakov,S. and Altman,S. (1991) Site-specific cleavage by metal ion cofactors and inhibitors of M1 RNA, the catalytic subunit of RNase P from Escherichia coli. Proc. Natl. Acad. Sci. U.S.A., 88, 9193–9197.
80. Fuchs,R.T., Zhiyi,S., Zhuang,F. and Robb,G.B. (2015) Bias in ligation-based small RNA sequencing library construction is determined by adaptor and RNA structure. PLOS ONE, 10, e0126049.
81. Zhang,Y., Zhang,J., Hara,H., Kato,I. and Inouye,M. (2005) Insights into the mRNA Cleavage Mechanism by MazF, an mRNA Interferase. J. Biol. Chem., 280, 3143–3150.
82. Bechhofer,D.H. and Deutscher,M.P. (2019) Bacterial ribonucleases and their roles in RNA metabolism. Crit. Rev. Biochem. Mol. Biol., 54, 242–300.
83. Shigematsu,M., Morichika,K., Kawamura,T., Honda,S. and Kirino,Y. (2019) Genome-wide identification of short 2’,3’-cyclic phosphate-containing RNAs and their regulation in aging. PLOS Genet., 15, e1008469.
84. Shigematsu,M. and Kirino,Y. (2020) Oxidative stress enhances the expression of 2’,3’-cyclic phosphate-containing RNAs. RNA Biol., 17, 1060–1069.
85. Shigematsu,M., Kawamura,T., Morichika,K., Izumi,N., Kiuchi,T., Honda,S., Pliatsika,V., Matsubara,R., Rigoutsos,I., Katsuma,S. et al. (2021) RNase κ promotes robust piRNA production by generating 2’,3’-cyclic phosphate-containing precursors. Nat. Commun., 12, 4498.
86. Kierzek,R. (1992) Hydrolysis of oligoribonucleotides: influence of sequence and length. Nucleic Acids Res., 20, 5073–5077.
87. Ciesiołka,J., Michałowski,D., Wrzesinski,J., Krajewski,J. and Krzyzosiak,W.J. (1998) Patterns of cleavages induced by lead ions in defined RNA secondary structure motifs. J. Mol. Biol., 275, 211–220.
88. Mikkola,S., Kaukinen,U. and Lönnberg,H. (2001) The effect of secondary structure on cleavage of the phosphodiester bonds of RNA. Cell Biochem. Biophys., 34, 95–119.
89. Kaukinen,U., Venäläinen,T., Lönnberg,H. and Peräkylä,M. (2003) The base sequence dependent flexibility of linear single-stranded oligoribonucleotides correlates with the reactivity of the phosphodiester bond. Org. Biomol. Chem., 1, 2439–2447.
90. Zinshteyn,B., Wangen,J.R., Hua,B. and Green,R. (2020) Nuclease-mediated depletion biases in ribosome footprint profiling libraries. RNA, 26, 1481–1488.
91. Burkhardt,D.H., Rouskin,S., Zhang,Y., Li,G.-W., Weissman,J.S. and Gross,C.A. (2017) Operon mRNAs are organized into ORF-centric structures that predict translation efficiency. eLife, 6, e22037.
92. Incarnato,D., Neri,F., Anselmi,F. and Oliviero,S. (2014) Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome. Genome Biol., 15, 491.
93. Mustoe,A.M., Busan,S., Rice,G.M., Hajdin,C.E., Peterson,B.K., Ruda,V.M., Kubica,N., Nutiu,R., Baryza,J.L. and Weeks,K.M. (2018) Pervasive regulatory functions of mRNA structure revealed by high-resolution SHAPE probing. Cell, 173, 181–195.
94. Pelechano,V., Wei,W. and Steinmetz,L. (2015) Widespread co-translational RNA decay reveals ribosome dynamics. Cell, 161, 1400–1412.
95. Shabalina,S.A., Ogurtsov,A.Y. and Spiridonov,N.A. (2006) A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res., 34, 2428–2437.