
Led-Seq: ligation-enhanced double-end sequence-based structure analysis of RNA

Abstract

Structural analysis of RNA is an important and versatile tool to investigate the function of this type of molecules in the cell as well as in vitro. Several robust and reliable procedures are available, relying on chemical modification inducing RT stops or nucleotide misincorporations during reverse transcription. Others are based on cleavage reactions and RT stop signals. However, these methods address only one side of the RT stop or misincorporation position. Here, we describe Led-Seq, a new approach based on lead-induced cleavage of unpaired RNA positions, where both resulting cleavage products are investigated. The RNA fragments carrying 2', 3'-cyclic phosphate or 5'-OH ends are selectively ligated to oligonucleotide adapters by specific RNA ligases. In a deep sequencing analysis, the cleavage sites are identified as ligation positions, avoiding possible false positive signals based on premature RT stops. With a benchmark set of transcripts in Escherichia coli, we show that Led-Seq is an improved and reliable approach based on metal ion-induced phosphodiester hydrolysis to investigate RNA structures in vivo.
Nucleic Acids Research, 2023
https://doi.org/10.1093/nar/gkad312
Tim Kolberg 1,†, Sarah von Löhneysen 2,†, Iuliia Ozerova 2, Karolin Wellner 1, Roland K. Hartmann 3, Peter F. Stadler 2,4,5,6,7 and Mario Mörl 1,*
1 Institute for Biochemistry, Leipzig University, Brüderstr. 34, 04103 Leipzig, Germany, 2 Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstr. 16–18, 04107 Leipzig, Germany, 3 Institute for Pharmaceutical Chemistry, Philipps University Marburg, Marbacher Weg 6, 35037 Marburg, Germany, 4 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, 5 Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria, 6 Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia and 7 Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
Received January 24, 2023; Revised March 21, 2023; Editorial Decision April 11, 2023; Accepted April 13, 2023
INTRODUCTION
RNA is an extremely versatile molecule, despite its lim-
ited number of building blocks. It performs various tasks
in many biological processes ranging from encoding genetic
information as mRNAs to delivering amino acids to the ri-
bosome as tRNAs, catalyzing chemical reactions, regulating
gene expression and shielding viral RNA from degradation
by host RNases, just to name examples (1–3). The func-
tional diversity of RNA is based on helical arrangements
comprising stacking interactions and base pairing that form
both local structural motifs and long range interactions (4).
RNA structure formation is energetically dominated by
canonical (Watson–Crick and GU-wobble) pairs forming
helical stem regions separated by unpaired stretches of nu-
cleotides. Such secondary structures also appear as inter-
mediates during the folding process before additional inter-
actions stabilize the three-dimensional structure (5). RNA
secondary structure, i.e. the arrangement of canonical base
pairs, can be computed based on an energy model that considers
sequence-specific stacking of adjacent base pairs and
entropy-driven, destabilizing contributions of loops (6). Efficient
dynamic programming algorithms have been devised
that compute minimum free energy (MFE) structures (7)
and partition functions (8). Deviations from this simple
nearest neighbor model, inaccuracies of energy parameters,
in particular for large and multi-branched loops, tertiary
interactions including pseudoknots, and the limited understanding
of the effects of salt concentrations and temperature
limit the accuracy of the thermodynamic model. Additional
factors such as interactions with proteins, metal ions and
other ligands as well as the cellular localization of RNA
molecules likewise have an impact on the conformational
states (9,10). That makes computational prediction difficult
and highlights the need for additional data to infer reliable
RNA structures. Chemical probing methods provide information
on the base pairing status of individual nucleotides.
While it is well known that this information alone is insufficient
to uniquely determine a secondary structure, it can
be readily combined with thermodynamic prediction algorithms
in the form of position-specific constraints or ‘bonus
energies’ to guide the reconstruction of the biologically
*To whom correspondence should be addressed. Tel: +49 341 9736911; Fax: +49 9736919; Email: mario.moerl@uni-leipzig.de
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkad312/7146347 by guest on 29 April 2023
relevant structure (11–13). It is crucial, therefore, to develop
and improve methods to determine RNA structure in vivo
to deepen our understanding of how this structure forms and how
it can be predicted.
A rapid development has taken place in the last two
decades regarding the determination of RNA secondary
and tertiary structure (reviewed in (14–16)). Although bio-
physical methods such as X-ray crystallography, NMR and
cryo-EM also have the potential to provide detailed insights
into RNA structure, they are limited to in vitro applica-
tions. The possibility to structurally investigate the entirety
of RNAs in vivo was pioneered with the development of
SHAPE (17). This method and its derivatives evolved over
the years utilizing different reagents, like NMIA, 1M7 or
BzCN, to modify RNA depending on its structure (18). The
coupling of SHAPE and other previously established struc-
ture probing methods with capillary electrophoresis and
later high throughput sequencing allowed massively parallel
investigation of RNA structures, enabling a global assessment
of the in vivo ‘structurome’ (19–21). Most methods
target nucleotides in single-stranded and conformationally
flexible RNA regions that are more accessible to chemical
modification than base-paired regions. Chemical modification
gives rise to either reverse transcription (RT) stop
(-seq approaches) or mutation signals (-MaP approaches)
in cDNA. These signals can be read out as a reactivity
score and be used to infer RNA structure (10). Structure-dependent
cleavage is an attractive alternative to the covalent
modification strategy. Compared to enzymatic cleavage,
which is to date limited to in vitro applications (22,23),
divalent metal ions promise to be an interesting alternative
for in vivo investigations.
In particular, Pb2+ is well established in RNA structure probing both in vitro and in vivo (24). Briefly, this method relies on structure-dependent cleavage catalyzed by hydrated Pb2+ ions that abstract the proton from the ribose 2'-OH group. In an in-line configuration, the 2'-O⁻ group attacks the phosphorus in the sugar-phosphate backbone, resulting in cleavage of the RNA with fragments carrying 2',3'-cyclic phosphate (2',3'-cP) and 5'-OH ends. As flexible regions are more likely to adopt the in-line conformation, this reaction occurs predominantly in single-stranded regions. Enhanced cleavage can also occur at metal ion binding sites (25,26). Lead-seq (27) combines the established in vivo Pb2+ probing approach (28) with next generation sequencing (NGS) to allow its use in a high-throughput setting. It serves as a promising starting point to broaden the spectrum of methods to investigate the RNA structurome in vivo. The incorporation of a lead cleavage score as ‘bonus energy’ in the minimum free energy structure computation resulted in a distinct improvement of the predicted structure of several tRNAs. Like most NGS-coupled probing approaches, Lead-seq uses random fragmentation to generate a more homogeneously sized pool of RNAs. This introduces additional strand breaks, which may obscure the actual cleavage signal even though a negative control can reduce this effect to a certain degree. The 5' phosphorylation and 3' dephosphorylation of the RNAs with T4 polynucleotide kinase prior to adapter ligation impedes the distinction of the 5' and 3' phosphorylation status of the ligated RNAs. In this manner, all cellular RNAs are potentially included in the libraries, which negatively impacts the specificity of the method. Lead-seq thus is a promising concept with ample potential for improvement.
Here we propose a modified procedure that improves the specificity and validity of this method. Cleavage by divalent metal ions introduces strand breaks that generate fragments with distinct end groups, 2',3'-cP and 5'-OH. For the specific capture of these fragments, we utilized the unique features of two RNA ligases to mark the cleavage positions via ligation of sequencing adapters. The 5' fragments, carrying a 2',3'-cP, are captured by an Arabidopsis thaliana tRNA ligase (Ath RNL) variant. The wild-type enzyme is able to ligate 2',3'-cP and 5'-OH via its 2',3'-cyclic phosphodiesterase and 5' kinase activity (29,30). To ensure specific ligation of a pre-adenylated adapter to the substrate RNA, two mutations were introduced that inactivate the enzyme’s ability to phosphorylate and adenylate 5'-OH groups (31). This Ath RNL AA double mutant recently proved to keep its specificity for 2',3'-cP carrying substrates in a ribozyme activity screen (32). The corresponding 3' fragments, carrying 5'-OH ends, are captured in a similar manner utilizing Escherichia coli (Eco) RtcB. This ligase and its homologs also have the ability to directly ligate 5'-OH and 2',3'-cP but exhibit a unique mechanism without the necessity of phosphorylating the RNA 5' end (33). It also proved to be applicable to library preparation in previous studies (34,35). The combination of two separate ligation approaches and analysis of cleavage positions from both sides allows mutual validation of the identified cleavage sites. Moreover, the use of reads from both the 5' and 3' side ensures that one of the libraries remains informative close to the transcript ends, where the reads from the other library are too short for unambiguous mapping.
MATERIALS AND METHODS
Protein purification
Ath RNL K152A D726A was expressed with 6xHisTag from a pET28a vector kindly gifted by Christina Weinberg in E. coli BL21 (DE3) codon PLUS RIPL cells. Saccharomyces cerevisiae Tpt1 was expressed from a pET-24b vector in E. coli BL21 (DE3) cells. Both enzymes were expressed and purified as described (32). T4 RNL 2 truncated KQ and TS2126 RNL expression plasmids were gifted by Jan Medenbach and the proteins were expressed and purified as described previously (36,37). Eco RtcB was expressed with 6xHisTag from a pET-53 vector (Addgene #51282) in E. coli BL21 (DE3) according to (38). Briefly, transformed cells were cultivated in TB medium with 100 µg/ml ampicillin at 37°C to an OD600 of 0.6. Cultures were chilled on ice for 30 min before induction with 0.1 mM IPTG and addition of ethanol to a final concentration of 2%. The induced cells were cultivated for 18 h at 16°C and harvested by centrifugation. Pellets were stored at –80°C until use. Cell pellets were resuspended in 25 ml ice-cold lysis buffer (50 mM Tris–HCl, pH 7.4, 250 mM NaCl, 10% (w/v) sucrose, 0.2 mg/ml lysozyme) and incubated for 1 h at 4°C. Then Triton-X100 was added to a final concentration of 0.1%. Cells were disrupted by sonication at 70% intensity, 7 × 10 s with 20 s breaks. The lysate was centrifuged, the supernatant sterile-filtered and used for purification. All
further purification steps were carried out on an ÄKTA pure protein purification system starting with metal ion affinity chromatography on a HisTrapFF 1 ml column (Cytiva). To this end, the column was equilibrated with 5 column volumes (CV) of binding buffer (50 mM Tris–HCl, pH 7.4, 150 mM NaCl, 10% glycerol) containing 25 mM imidazole. After loading the sample, the column was washed with 10 CV binding buffer with 25 mM imidazole, followed by a second wash step with 10 CV wash buffer (50 mM Tris–HCl, pH 7.4, 2 M KCl). Step-wise elution was carried out using 5 CV binding buffer containing 100, 300 and 500 mM imidazole. Fractions containing Eco RtcB ligase were identified via SDS-PAGE and Coomassie staining, pooled and subsequently purified by size exclusion chromatography on a HiLoad 16/60 Superdex 75 pg column with SEC buffer (10 mM Tris–HCl, pH 8.0, 350 mM NaCl, 1 mM DTT). Fractions containing the desired protein were identified via SDS-PAGE, pooled, concentrated with a Vivaspin 6 column (MWCO 10 kDa) and stored at –80°C until use.
T4 polynucleotide kinase R38A was expressed with 6xHisTag from a pET-53 vector in E. coli BL21 (DE3) according to (39). Transformed cells were grown in TB medium containing 100 µg/ml ampicillin at 37°C to an OD600 of 0.6. The cultures were chilled on ice for 30 min before induction with 0.3 mM IPTG and cultivation at 16°C for 18 h. Cells were harvested via centrifugation and stored at –80°C until use. The cells were resuspended in 25 ml ice-cold lysis buffer (50 mM Tris–HCl, pH 7.5, 1.2 M NaCl, 15 mM imidazole, 10% (v/v) glycerol, 0.2 mM phenylmethylsulphonyl fluoride, 1 mg/ml lysozyme) and incubated for 1 h at 4°C. Triton-X100 was added to a final concentration of 0.1%. Cells were disrupted by sonication at 70% intensity, 7 × 10 s with 20 s breaks. The lysate was centrifuged, the supernatant sterile-filtered and used for purification. Metal ion affinity chromatography was carried out on an ÄKTA pure protein purification system using a HisTrapFF 1 ml column (Cytiva). The binding buffer contained 50 mM Tris–HCl, pH 7.5, 200 mM NaCl and 10% glycerol. Step-wise elution was carried out with 5 CV of binding buffer additionally containing 125, 300 and 500 mM imidazole. The fractions with T4 PNK R38A were identified by SDS-PAGE, subsequently pooled and dialyzed twice against 1 l dialysis buffer (10 mM Tris–HCl, pH 7.4, 50 mM KCl, 1 mM DTT). Glycerol mix was added to reach storage conditions (10 mM Tris–HCl, pH 7.5, 50% (v/v) glycerol, 0.2 mM EDTA, 1 mM DTT, 50 mM KCl, 0.2 µM ATP). Aliquots of the protein solution were stored at –80°C. Plasmids encoding the enzymes used in this work are either available at Addgene (addgene.org) or from the authors upon request.
Oligonucleotide preparation
Oligonucleotides were ordered from biomers (Ulm, Germany) and Microsynth (Balgach, Switzerland). 3' adapter, RT-Primer and circularization RT-Primer were 5' labeled using [γ-32P]-ATP and T4 PNK (NEB) and purified via denaturing PAGE and ethanol precipitation. Non-radioactively labeled adapters and RT-Primers were prepared in the same way with ATP and T4 PNK, using T4 PNK (3' phosphatase minus) for the 5' adapter. The 5' phosphorylated and 3' blocked 3' adapter was pre-adenylated with TS2126 RNL. In one 20 µl reaction, 2.5 µM TS2126 RNL, 100 pmol adapter (10 pmol end-labeled with 32P) and 0.5 mM ATP were incubated in 1× adenylation buffer (50 mM MOPS, pH 7.5, 10 mM KCl, 5 mM MgCl2) supplemented with 2.5 mM MnCl2 and 1 mM DTT for 1 h at 60°C. The reaction was stopped by heat inactivation for 10 min at 80°C followed by preparative denaturing PAGE and ethanol precipitation. For details on oligomers, see Supplementary Information.
Sample preparation for in vivo lead probing
In vivo lead probing was performed as described (27,28,40). Briefly, LB medium was inoculated with an E. coli DH5α overnight culture to a starting OD600 of 0.06, and cells were grown to an OD600 of 0.5 at 37°C. Lead-II-acetate solutions were freshly prepared by mixing 3 volumes of a lead-II-acetate stock solution with 1 volume 4× LB medium and pre-warming to 37°C. After reaching the desired density (OD600 0.5), 20 ml of each main culture were mixed with LB-lead-II-acetate solutions to a final concentration of 75 mM and incubated for 7 min. In Pb2+ (–) samples, lead-II-acetate was replaced by autoclaved, deionized water (dH2O). The reaction was stopped by adding 10 ml ice-cold 500 mM EDTA. Cells were immediately pelleted and RNA was isolated using peqGOLD TriFast® (VWR) according to the manufacturer’s instructions and precipitated from the aqueous phase by adding twice the volume of ice-cold isopropanol. The pellet was resuspended in dH2O and incubated with 2 U DNase I (NEB). RNA was recovered by phenol/chloroform extraction and ethanol precipitation (41). Recovered RNA was redissolved in dH2O and stored at –80°C.
Specific adapter ligation
Total RNA was used for 2',3'-cP mapping via specific adapter ligation with Ath RNL K152A D726A and for 5'-OH mapping via 3' dephosphorylation with T4 PNK R38A and subsequent 5'-OH specific adapter ligation with Eco RtcB.
2',3'-cP capture. 35 ng total RNA were pre-incubated with 20 pmol pre-adenylated 3' adapter for 5 min at 65°C and immediately put on ice for at least 1 min. Subsequently, the mixture was incubated in 1× reaction buffer (20 mM Tris–HCl, pH 7.5, 5 mM MgCl2, 2.5 mM spermidine, 100 µM DTT) and 20% (v/v) PEG8000 with 12 pmol Ath RNL K152A D726A for 2 h at 25°C in a volume of 16 µl. In the ligation reaction, the 2',3'-cP is converted into a 2'-P group that can interfere with reverse transcription. To remove this obstacle, 10 pmol S. cerevisiae tRNA 2'-phosphotransferase Tpt1 and 1 mM NAD were added to the ligation mixture. The volume was adjusted to 20 µl with dH2O and reaction buffer and the samples were incubated for 30 min at 30°C. The ligated and 2'-dephosphorylated RNA was recovered using the Monarch® RNA clean up kit (NEB) and used as template for reverse transcription.
5'-OH capture. 70 ng total RNA were incubated in 1× PNK buffer (NEB) with 1 mM ATP and 10 pmol T4 PNK R38A for 30 min at 37°C in 15 µl total volume. The 3' dephosphorylated RNA was recovered with the Monarch® RNA clean up kit (NEB) using the protocol to bind RNA down to 15 nt length. The RNA was eluted in 6 µl dH2O and 3 µl thereof were pre-incubated with 20 pmol 5' adapter for 5 min at 65°C and put on ice for 1 min. Subsequently, the samples were incubated in 50 mM Tris–HCl (pH 7.4), 2 mM MnCl2, 100 µM GTP and 20% (v/v) PEG8000 with 50 pmol Eco RtcB for 1 h at 37°C in 20 µl total volume. The ligated RNA was recovered again with the Monarch® RNA clean up kit and used for 3' adapter ligation. To this end, the ligated RNA was mixed with 20 pmol pre-adenylated 3' adapter, pre-incubated for 5 min at 65°C and put on ice for 1 min. The mixture was incubated in 1× T4 ligase buffer (NEB) and 20% PEG8000 (v/v) with 20 pmol T4 RNL 2 truncated KQ for 2 h at 25°C. The ligated RNA was extracted using the Monarch® RNA cleanup kit and used as template for reverse transcription.
Reverse transcription
The ligated and recovered RNA from both strategies was reverse transcribed with Superscript IV reverse transcriptase (Thermo). Reactions of 20 µl were set up according to the manufacturer, using RT-Primer for 5'-OH samples. For 2',3'-cP samples, a biotin-dNTP-Mix (final concentration 500 µM dATP/dGTP/dTTP each, 350 µM dCTP, 150 µM biotin-16-dCTP (Jena Bioscience)) and circularization RT-Primer were used. For both strategies, 20 pmol of the respective primer were used containing trace amounts of the same 5' labeled primer for product detection. After incubation at 55°C for 10 min, the template RNA in the reaction mixes was degraded by adding NaOH to a final concentration of 250 mM and incubation for 3 min at 95°C. The reaction mix was neutralized with 250 mM HCl, cDNA was extracted and size-selected via preparative denaturing PAGE. A 32P-labeled size standard was used to identify cDNA above the size of the used RT-Primer + UMI sequence. The cDNA was eluted from the gel and precipitated with ethanol and 10 µg/ml LPA (Thermo) as carrier. Samples of the 2',3'-cP strategy were circularized (see below), whereas 5'-OH cDNA was redissolved in 20 µl dH2O and directly used for amplification and introduction of flow cell linkers and indices.
Circularization of cDNA and streptavidin bead cleanup
The cDNA for 2',3'-cP library construction, redissolved in 20 µl dH2O, was incubated for 2 h at 60°C in 1× adenylation buffer supplemented with 2.5 mM MnCl2, 10 mM DTT and 50 µM ATP using 2.5 µM TS2126 RNL (42). The mixture was heat-inactivated at 80°C for 10 min, adjusted to 100 µl with 1× wash/binding buffer (20 mM Tris–HCl, pH 7.5, 1 mM EDTA, 0.5 M NaCl) and directly used for purification with Hydrophilic Streptavidin Magnetic Beads (NEB). Per reaction, 16 µl beads were prepared: the beads were washed three times by resuspending them in 160 µl wash/binding buffer each time and removing the supernatant while placing the tubes on a magnetic rack. The washed beads were resuspended with the circularization reaction mixture and incubated for 20 min at room temperature with careful mixing every 5 min. After incubation, samples were spun down, supernatants were discarded and the beads were resuspended by pipetting. The beads were washed three times with 500 µl wash/binding buffer. Each wash included the following steps: resuspension of the beads by pipetting, brief centrifugation in a desktop centrifuge, again careful resuspension of the bead pellet by pipetting, transfer of the sample to a new tube (this step turned out to be crucial to avoid carry over of non-biotinylated cDNA), and use of a magnetic rack when removing the supernatants. The beads were finally resuspended in 20 µl dH2O each and directly used for amplification and introduction of flow cell linkers and indices.
Introduction of flow cell linkers and indices
Flow cell binding sequences and indices were introduced via PCR. One reaction mix contained 2.5 µl of the cDNA template solution, 1× Phusion HF buffer (Thermo), 200 µM dNTPs, 0.5 µM Illumina PCR and Index Primer and 0.02 U/µl Phusion® high-fidelity polymerase (Thermo) in a volume of 25 µl. To samples containing circularized cDNA templates, 5 µM of a PNA clamp were added to reduce the accumulation of a side product resulting from residual prolonged circularization RT-Primer as template in this reaction. Cycling conditions were as follows: initial denaturation at 98°C for 30 s, followed by 15 (5'-OH strategy) / 18 (2',3'-cP strategy) cycles of 98°C for 10 s, 80°C for 20 s, 60°C for 20 s and 72°C for 20 s. The additional annealing step at 80°C was introduced to ensure optimal PNA binding in the 2',3'-cP libraries (43). The amplified libraries were purified by preparative native PAGE, excised and eluted from the gel, and precipitated with ethanol and 10 µg/ml LPA as carrier. The final libraries were sequenced on an Illumina NovaSeq 6000 (Azenta). The experimental protocol is illustrated in Figure 1.
Read preprocessing and mapping
Sequencing quality of paired reads was evaluated using MultiQC (44). 3' adapter sequences were removed from read1 and read2, and pairs were filtered for a correct UMI sequence in read2 with cutadapt v2.10 (45). A minimum read length of 12 nt was set to enable unambiguous mapping. An additional fixed sequence between the UMI and RNA insert was removed from the 5' end of read2 and if necessary from the 3' end of read1 by cutadapt. Preprocessed reads were mapped to the E. coli genome (NZ_CP026085.1) with segemehl v0.3.4 (46,47). Libraries were deduplicated with umi_tools v1.0.1 (48) and filtered for primary hits of properly mapped read pairs. After an initial sample composition analysis, selected (multi-copy) genes were masked by substitution with ‘N’ and one copy was attached to the end of the genome to facilitate unique mapping of reads. As we wanted to focus our analysis on highly represented transcripts, this procedure included all tRNA and rRNA genes. For details on the generated genome and corresponding transcriptome annotation file, see Supplementary Information. Mapping and deduplication steps were repeated accordingly.
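The masking of multi-copy genes described above can be pictured with a short sketch. It is an illustration under assumptions, not the authors' script: the function name `mask_and_append`, the 0-based end-exclusive coordinates, and the choice to retain the first listed copy are hypothetical, and strand and annotation handling are left out.

```python
def mask_and_append(genome, copies):
    """Mask all copies of a multi-copy gene and append one copy to the end.

    genome: genome sequence as a string.
    copies: list of (start, end) tuples, 0-based, end-exclusive (assumed convention).
    Returns (new_genome, (start, end) of the appended copy).
    """
    if not copies:
        return genome, None
    first_start, first_end = copies[0]
    kept_copy = genome[first_start:first_end]  # the single copy that stays mappable

    masked = list(genome)
    for start, end in copies:
        # Overwrite every original occurrence with 'N' so reads cannot map there.
        masked[start:end] = "N" * (end - start)

    new_genome = "".join(masked) + kept_copy
    appended = (len(genome), len(genome) + len(kept_copy))
    return new_genome, appended


# Toy genome with two identical "gene" copies at 3..6 and 10..13.
genome = "AAACGTCCCACGTGG"
new_genome, appended = mask_and_append(genome, [(3, 6), (10, 13)])
print(new_genome)
print(appended)
```

Reads originating from either copy then map to the appended copy only, which is the effect the masking is meant to achieve.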
Figure 1. Preparation of Led-Seq Illumina libraries. Left: Mapping of 2',3'-cP carrying cleavage fragments via specific ligation of a pre-adenylated RNA 3' adapter using Ath RNL AA. The 3' adapter carries an 8 nt unique molecular identifier (UMI) to account for PCR bias. After ligation, RNA is reverse transcribed using the circularization RT-Primer and biotinylated dCTP (B-dCTP). The resulting cDNA is circularized by TS2126 RNL and extracted with hydrophilic streptavidin magnetic beads (brown circular symbol denoted ‘A’), removing residual circularization RT-Primer in the process. Circularized, bead-bound cDNA is used as a PCR template to introduce flow cell linkers (P5, P7) and 6 nt indices. A PNA PCR clamp is utilized to minimize the amplification of circularized products without cDNA insert. Right: Mapping of 5'-OH carrying cleavage fragments via specific ligation of a biphosphorylated RNA 5' adapter using Eco RtcB. RNA was previously 3' dephosphorylated with T4 PNK R38A to prevent ligation of RNA fragments with 2',3'-cP to 5'-OH ends. The 3' adapter is subsequently ligated to the RNA using T4 RNL II KQ truncated, followed by reverse transcription and amplification via PCR.
Intensity of the probing signal
We filtered for reads that mapped uniquely in proper pairs and subsequently considered only hits that mapped to non-overlapping annotated regions (bedtools v2.27.1 (49)) to ensure an unambiguous signal. In the 2',3'-cP libraries, the last nucleotide of a ligated RNA fragment represents the position immediately upstream of the cleavage site in the RNA backbone. Correspondingly, the first nucleotide of a fragment in 5'-OH libraries represents the position downstream of the cleavage site. The raw probing signal was obtained for each nucleotide of the transcriptome by counting the number of read starts (or start-1, respectively) at that position. Normalization of the raw signal was performed separately for each transcript. According to Low and Weeks (50), we divided the raw read count by the average count of the 90th to 98th percentile of the signal. This is motivated by considering the largest 2% of the signals as outliers. We denote the normalized signal for position $i$ by $S_i$. Its range is limited to the interval [0,7] because very high values of outliers were capped at 7. Where applicable, mean values of replicates were used for all downstream analyses. The workflow for the computation of the normalized probing signal from raw sequencing reads is implemented as a snakemake v3.13.3 pipeline (51), which is available at github.com/xamiiii/Led-Seq.
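The counting and normalization steps might look roughly as follows. This is a minimal sketch and not the published snakemake pipeline: the function names, the 1-based inclusive fragment coordinates, the reading that both libraries report the nucleotide 5' of the cleaved bond, and the rank-based percentile window are all assumptions made for illustration.

```python
def count_cleavage_signal(fragments, length, library):
    """Count fragment ends per position of one transcript.

    fragments: list of (start, end), 1-based inclusive transcript coordinates.
    library: "cP" assigns the signal to the fragment's last nucleotide,
             "OH" assigns it to the nucleotide before the fragment's first one,
             so both report the position upstream of the cleaved bond (assumed).
    """
    counts = [0] * length
    for start, end in fragments:
        pos = end if library == "cP" else start - 1
        if 1 <= pos <= length:
            counts[pos - 1] += 1
    return counts


def normalize_signal(counts, lower_pct=90, upper_pct=98, cap=7.0):
    """Divide by the mean of the values ranked between the 90th and 98th
    percentile, then cap the result at `cap`."""
    if not counts:
        return []
    ranked = sorted(counts)
    n = len(ranked)
    lo = n * lower_pct // 100
    hi = max(lo + 1, n * upper_pct // 100)  # keep at least one value in the window
    window = ranked[lo:hi]
    scale = sum(window) / len(window)
    if scale == 0:
        return [0.0] * n
    return [min(c / scale, cap) for c in counts]


cp_counts = count_cleavage_signal([(1, 5), (2, 5), (3, 8)], 10, "cP")
oh_counts = count_cleavage_signal([(6, 10), (6, 9)], 10, "OH")
print(cp_counts, oh_counts)

raw = [1] * 98 + [1000, 1000]  # two extreme outliers in the top 2%
normalized = normalize_signal(raw)
print(normalized[0], normalized[-1])
```

In the toy example both libraries place their signal at position 5, and the two outliers are capped at 7 while the bulk of the positions is scaled to 1.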
Estimating the probability to be unpaired
We use a Bayesian approach to estimate the probability $q_i$ that position $i$ is unpaired based on the normalized signal $S_i$. To this end, we employ a collection of reference structures comprising 32 RNAs of lengths 74 nt to 682 nt. This set includes non-coding RNAs that belong to Rfam families (52) and that are sufficiently represented by our data, see coverage criteria below. The small-subunit rRNA (16S) and the large-subunit rRNA (23S) were divided into smaller domains as described (53,54). Secondary structures for the resulting 40 sequences were taken from the RNAcentral database (see Supplementary Table S1 and Supplementary Information for full details).
Figure 2. Conversion of normalized probing signals $S$ to probabilities of being unpaired. Top: One-dimensional function $p(S)$ estimated separately for 2',3'-cP (left) and 5'-OH libraries (right). Different fits were used for cleavage at CA dinucleotide positions (orange). Grey lines indicate the intervals (‘bins’) within which the signals were pooled. Below: Two-dimensional fit $p(S^{\mathrm{cP}}, S^{\mathrm{OH}})$ combining the information of both libraries. Again CA dinucleotides (right) were treated separately.
Denote by $n_u(S)$ and $n(S)$ the number of unpaired positions and the total number of positions, respectively, that exhibit a normalized signal in the calibration set that falls within a bin, i.e. an interval of signal values, centered at $S$. The probability that a position with a signal that falls within this bin is unpaired can then be estimated by
$$p(S) := P[\text{unpaired} \mid S] \approx \frac{n_u(S)}{n(S)} \qquad (1)$$
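A minimal sketch of the binned estimate in Eq. (1) is given here. The bin width of 0.5 and the function name are assumptions chosen for illustration; the bins actually used are shown only graphically in Figure 2.

```python
def binned_unpaired_fraction(signals, unpaired, bin_width=0.5, max_signal=7.0):
    """Estimate p(S) = n_u(S) / n(S) for each signal bin.

    signals:  normalized signals of the calibration positions (0..max_signal).
    unpaired: booleans, True where the reference structure is unpaired.
    Returns a list of (bin_center, n_u, n, fraction) for all non-empty bins.
    """
    n_bins = int(round(max_signal / bin_width))
    total = [0] * n_bins
    n_unpaired = [0] * n_bins
    for s, u in zip(signals, unpaired):
        k = min(int(s / bin_width), n_bins - 1)  # the capped maximum joins the last bin
        total[k] += 1
        n_unpaired[k] += int(u)
    return [((k + 0.5) * bin_width, n_unpaired[k], total[k], n_unpaired[k] / total[k])
            for k in range(n_bins) if total[k] > 0]


signals = [0.1, 0.2, 0.3, 0.4, 3.1, 3.2, 7.0]
unpaired = [False, False, True, False, True, True, True]
estimate = binned_unpaired_fraction(signals, unpaired)
for center, n_u, n, fraction in estimate:
    print(center, n_u, n, fraction)
```

In this toy calibration set, one of four low-signal positions is unpaired, whereas all high-signal positions are unpaired.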
In a more elaborate model, we combine the two signals $S^{\mathrm{cP}}$ and $S^{\mathrm{OH}}$ of the 2',3'-cP and 5'-OH libraries. We then consider
$$p(S^{\mathrm{cP}}, S^{\mathrm{OH}}) := P[\text{unpaired} \mid S^{\mathrm{cP}}, S^{\mathrm{OH}}] \approx \frac{n_u(S^{\mathrm{cP}}, S^{\mathrm{OH}})}{n(S^{\mathrm{cP}}, S^{\mathrm{OH}})} \qquad (2)$$
where $n_u(S^{\mathrm{cP}}, S^{\mathrm{OH}})$ and $n(S^{\mathrm{cP}}, S^{\mathrm{OH}})$ are the counts of unpaired and all positions, respectively, in the calibration set that have signal values for the 2',3'-cP and 5'-OH libraries within intervals centered at $S^{\mathrm{cP}}$ and $S^{\mathrm{OH}}$, respectively. In order to reduce the effects of inaccuracies in the reference structures and other noise in the data we approximate $p(S)$ and $p(S^{\mathrm{cP}}, S^{\mathrm{OH}})$ by fitting sigmoidal functions of the form
$$\begin{aligned} p(S) &= \left(1 + \exp(-aS + b)\right)^{-1} + c \\ p(S^{\mathrm{cP}}, S^{\mathrm{OH}}) &= \left(1 + \exp(-a_1 S^{\mathrm{cP}} - a_2 S^{\mathrm{OH}} + b)\right)^{-1} + c \end{aligned} \qquad (3)$$
to the binned data, see Figure 2. To estimate parameters $a, b, c$ and $a_1, a_2, b, c$, respectively, we used the function curve_fit of the python v3.6.12 library scipy v1.5.2. Since cleavage sites in CA dinucleotides were found to behave differently compared to the other dinucleotides, different parameters were fitted for this special case. As a control we randomized the sequence positions and, as expected, obtained a flat response curve, see Supplementary Figure S1.
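The one-dimensional fit can be sketched with `scipy.optimize.curve_fit`, the function named above. The data below are synthetic and generated from arbitrarily chosen parameters, so the example shows only the mechanics of the fit; the bin centers, the starting values `p0`, and the sign convention inside the exponential are assumptions and not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(S, a, b, c):
    # p(S) = (1 + exp(-a*S + b))^-1 + c, the one-dimensional form of Eq. (3)
    return 1.0 / (1.0 + np.exp(-a * S + b)) + c


# Synthetic "binned" data: bin centers and unpaired fractions (illustration only).
centers = np.linspace(0.25, 6.75, 14)
true_params = (1.0, 2.5, -0.02)
fractions = sigmoid(centers, *true_params)

# Least-squares fit of a, b, c, starting from a rough initial guess.
params, covariance = curve_fit(sigmoid, centers, fractions, p0=(1.0, 2.0, 0.0))
a, b, c = params
print(a, b, c)

# The fitted curve converts any normalized signal into a probability estimate.
q = sigmoid(np.array([0.0, 3.5, 7.0]), a, b, c)
print(q)
```

With real, noisy bin fractions the recovered parameters carry uncertainty, which the returned covariance matrix can help to gauge. The two-dimensional fit works the same way once the two signals are passed as a stacked array.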
The 2',3'-cP library produces sequence fragments that are too short for reliable mapping close to the 5' end of each transcript. Consequently, no signal $S^{\mathrm{cP}}$ exists for the first 11 nucleotides of each transcript. Analogously, the 5'-OH library is uninformative close to the 3' end (last 12 nt). Thus we use $p(S^{\mathrm{OH}}_i)$ or $p(S^{\mathrm{cP}}_i)$ for positions $i$ at the ends of a transcript and $p(S^{\mathrm{cP}}_i, S^{\mathrm{OH}}_i)$ for its interior.
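The choice between the one- and two-dimensional estimates along a transcript can be written as a small helper. The function name and return labels are hypothetical; the window sizes of 11 and 12 nt are the ones stated above, and the behavior for very short transcripts is an assumption.

```python
def estimator_for_position(i, length, no_cp_head=11, no_oh_tail=12):
    """Return which fit applies to 1-based position i of a transcript.

    "OH":   only the 5'-OH signal exists (first `no_cp_head` nucleotides)
    "cP":   only the 2',3'-cP signal exists (last `no_oh_tail` nucleotides)
    "both": interior position, use the two-dimensional fit
    None:   neither library is informative (only possible for very short transcripts)
    """
    if not 1 <= i <= length:
        raise ValueError("position outside transcript")
    has_cp = i > no_cp_head
    has_oh = i <= length - no_oh_tail
    if has_cp and has_oh:
        return "both"
    if has_oh:
        return "OH"
    if has_cp:
        return "cP"
    return None


labels = [estimator_for_position(i, 100) for i in (1, 11, 12, 88, 89, 100)]
print(labels)
```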
Secondary structure prediction with probing data
The conversion of the probability of being unpaired, i.e. $q_i = p(S_i)$ or $q_i = p(S^{\mathrm{cP}}_i, S^{\mathrm{OH}}_i)$, into a pseudo-energy in essence follows the scheme proposed by Zarringhalam et al. (55). However, we associate pseudo-energies only with the unpaired nucleotides. It is not difficult to see that it suffices to associate pseudo-energies only with paired or only with unpaired nucleotides as long as one is not interested in the absolute value of the partition function, see (12).
To incorporate the probing data into secondary structure
prediction algorithms, we converted $q_i$ into pseudo free energy
terms that can be interpreted as log-likelihoods for nucleotides
being unpaired (56). In addition, we compensate
for the fact that the a priori probability $p_0$ that a base is
unpaired differs from 1/2. This yields
$$\Delta G_{\mathrm{Pb}^{2+},i} = -RT \cdot c \cdot \left[\ln\frac{q_i}{1-q_i} - \ln\frac{p_0}{1-p_0}\right] \qquad (4)$$
where $R$ is the gas constant, $T$ is the absolute temperature,
and $c$ is a constant that tunes the relative importance of,
or trust in, the probing data. Throughout this contribution,
we used $c = 1.2$. The value of $p_0 = 0.42$ was determined
from the calibration set. The position-dependent
pseudo-energies are used as soft constraints (12) in the program
RNAfold of the ViennaRNA package v2.4.15 (57) as
described (11). A detailed structural analysis requires sufficient
cleavage signal across the transcript. We require rather
stringent conditions: (i) at least 75% of a transcript must be
represented by reads, and (ii) there must be at least 2.5 read
starts per position on average.
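A minimal sketch of Eq. (4) and of the two coverage criteria is given below. The temperature of 310.15 K (37 °C) and the use of kcal/mol are assumptions made here for illustration, and the function names are hypothetical; the text specifies only $c = 1.2$ and $p_0 = 0.42$.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol K)


def pseudo_energy(q, p0=0.42, c=1.2, temperature_k=310.15):
    """Eq. (4): -RT * c * [ln(q/(1-q)) - ln(p0/(1-p0))], in kcal/mol.

    The temperature default (37 degrees C) is an assumption of this sketch.
    """
    log_odds = math.log(q / (1.0 - q)) - math.log(p0 / (1.0 - p0))
    return -R_KCAL * temperature_k * c * log_odds


def sufficiently_covered(read_starts, covered,
                         min_fraction=0.75, min_mean_starts=2.5):
    """Check the two coverage criteria for one transcript.

    read_starts: read-start counts per position
    covered: booleans, True where a position is represented by a read
    """
    n = len(read_starts)
    return (sum(covered) / n >= min_fraction
            and sum(read_starts) / n >= min_mean_starts)
```

With this sign convention, a position with $q_i > p_0$ receives a negative pseudo-energy, which favors the unpaired state, and $q_i = p_0$ contributes nothing.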
To assess prediction quality, we calculated positive predictive
value (PPV), sensitivity (SEN) and the Matthews
correlation coefficient (MCC) (for definitions see Supplementary
Information). Plots were generated using the python
package matplotlib v3.3.1. Secondary structures with
mapped pseudo-energies were visualized using the forna
web server (58).
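As an illustration of these quality metrics, the sketch below compares base pairs of a predicted and a reference structure in dot-bracket notation. The exact definitions used in the paper are given in its Supplementary Information and are not reproduced here; this sketch assumes the common base-pair-level definitions and approximates the MCC by the geometric mean of PPV and SEN, which is a widely used simplification for RNA structure comparison.

```python
import math


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs of a pseudoknot-free structure."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def accuracy(predicted, reference):
    """Return (PPV, SEN, approximate MCC) for two dot-bracket strings."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)   # predicted pairs present in the reference
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0
    # assumption: MCC approximated as sqrt(PPV * SEN)
    return ppv, sen, math.sqrt(ppv * sen)


ppv, sen, mcc = accuracy("(((......)))", "((((....))))")
```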
RESULTS
Lead probing protocol
Metal ion cleavage is an established approach to probe
RNA structure (28,59). Recently, Twittenhoff et al. showed
that in vivo probing with lead(II) ions coupled with next
generation sequencing is suitable to investigate RNA struc-
tures on a transcriptome-wide level (27). Here, we present
a novel lead-based approach where NGS adapters are se-
lectively ligated to the resulting fragments of lead-induced
cleavage (see Figure 1). Accordingly, both cleavage ends (2',3'-cP
and 5'-OH) are converted into sequencing libraries.
Figure 3. Proportion of reads mapping to indicated RNAs in 2',3'-cP (left)
and 5'-OH libraries (right) for samples treated with (+) or without (–)
Pb2+. nc = non-coding, n/a = not annotated.
During the development of this method, we noticed two
major side products in the sequencing data of the 2',3'-cP
libraries, both identified as results from the circularization
reaction with TS2126 RNL. These originated from leftover
3' adapter and circularization RT-Primer in the reverse
transcription reaction which subsequently gave rise to the
presence of three RT-Primer based DNA species. In addi-
tion to circularization RT-Primer+UMI+cDNA, also cir-
cularization RT-Primer+UMI and RT-Primer alone were
present, despite performing a size selection via preparative
denaturing PAGE after reverse transcription. These side
products caused a considerable loss in usable sequences.
To reduce these side products, we combined two strategies.
First, cDNA was internally labeled with biotin-dCTP and
extracted using magnetic streptavidin beads after circularization
to eliminate unused circularization RT-Primer. Second,
as PNA oligonucleotides exhibit a very tight binding to
complementary DNA sequences with high thermal duplex
stability (43,60), we designed a PNA clamp binding to the
constant 3'-region of the UMI-containing oligonucleotide
and to the 5' adapter-derived cDNA sequence. These regions
are only adjacent if the circularization RT-Primer
elongated by the sequence of the 3' adapter was circularized
without insert. Hence, the PNA clamp blocked the amplification
of this side product. Both strategies led to a drastically
reduced formation of by-products (see Supplementary
Figure S2, Supplementary Figure S3).
We generated two independent biological replicates of
both Pb2+(+) libraries. In addition, negative controls in the
absence of lead (Pb2+(–)) were prepared and sequenced.
One of these controls is shown here as a representative example.
After read preprocessing, mapping and removal of
duplicated reads, we obtained 4.8–16.8 million uniquely
mapping reads per library (see Supplementary Table S2).
Figure 3 summarizes the composition of mapped read ends
in terms of the annotated biotypes. As expected, the majority
of reads originates from rRNA and tRNAs. The distributions
are similar for lead-treated and negative control
libraries and also differ only moderately between the 2',3'-cP
and 5'-OH libraries.
Figure 4. Distribution of normalized probing signal S at unpaired (orange)
and paired (blue) positions of the calibration set. Signal is higher at
unpaired sites in the 2',3'-cP (left) and 5'-OH libraries (right).
Between 221 and 465 RNAs in the Pb2+(+) libraries (and
104 to 171 RNAs in the Pb2+(–) libraries) are covered sufficiently
by reads to meet our criteria for structural analysis
(described in Methods section). Of these, we set up a benchmark
set of 32 transcripts that we analyzed in more detail.
We found that probing signals are highly reproducible with
a Pearson correlation coefficient of 0.83 for the two 2',3'-cP
libraries and 0.80 for 5'-OH libraries (calculated over all
sufficiently covered transcripts).
Signal corresponds to structure
Based on reference structures, we investigated distributions
of the signal for paired and unpaired nucleotides separately
(Figure 4). Consistent with our expectations, unpaired positions
display significantly higher cleavage levels (two-sided
Mann–Whitney U test $p = 3.0 \times 10^{-210}$ for 2',3'-cP and
$p = 2.9 \times 10^{-200}$ for 5'-OH). We also observed that most
positions exhibit low signal regardless of the structure. High
signal values are therefore informative for unpaired posi-
tions. However, not all nucleotides that are unpaired in the
reference structures are associated with high cleavage activ-
ity. Thus, low signals do not provide a reliable predictor for
pairedness.
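A two-sided Mann–Whitney U test of this kind can be run with `scipy.stats.mannwhitneyu`. The sketch below uses synthetic, exponentially distributed signals purely for illustration; the distributions, sample sizes and variable names are assumptions and do not reproduce the values reported above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic signals (illustrative only): unpaired positions tend to be higher
rng = np.random.default_rng(1)
unpaired_signal = rng.exponential(scale=0.3, size=500)
paired_signal = rng.exponential(scale=0.1, size=500)

result = mannwhitneyu(unpaired_signal, paired_signal, alternative="two-sided")
p_value = result.pvalue
```

A rank-based test is a natural choice here because the signal distributions are strongly skewed, with most positions showing low signal regardless of pairing status.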
To visualize the correlation between probing intensity
and structure, the profile for tRNA^Leu(CAG) is displayed in
Figure 5. D-loop, anticodon loop, variable loop, T-loop as
well as the unpaired 3' end consisting of the CCA sequence
and the discriminator base are reflected in the peaks of 2',3'-cP
as well as 5'-OH libraries. Since we capture both RNA
cleavage fragments in our protocol, the information for the
entire transcript is retained even though both library types
have one ‘blind’ end of 11 (or 12) nucleotides. To further
investigate and quantify the structural information within the
probing signal, we converted the scores into probabilities to
be unpaired $q_i$, see Figure 2 in the Methods section.
Since the mapped reads determine the exact cleavage position,
we considered the possibility of sequence-specific
biases. We indeed found a strong bias towards CA dinucleotides
and a milder bias towards UA in 2',3'-cP libraries
(Figure 6). This observation cannot be explained
by genomic overrepresentation of these dinucleotides or the
influence of only a few outliers (see Supplementary Figure S4).
Cleavage between C and A (but not between U and
Figure 5. Probing signal profile of tRNA^Leu(CAG) reflects the secondary
structure. tRNA^Leu(CAG) signal profile from (top) 2',3'-cP and (bottom)
5'-OH libraries. Depicted are both biological replicates as dots and a
Savitzky-Golay smoothing curve for visualization only. Coloring in orange
or blue according to unpaired or paired status within the structural reference.
Figure 6. Read fractions (in %) sorted by dinucleotide composition at the
cleavage site in 2',3'-cP and 5'-OH libraries. A, adenine; U, uracil;
G, guanine; C, cytosine.
A) is clearly less correlated to an unpaired structural state
compared to all other dinucleotides. We account for this effect
by estimating the probability of unpairedness $q_i$ separately
for CA and all other dinucleotides, see Figure 2 above.
The deviations observed for UA in the 2',3'-cP libraries were
small enough to be neglected.
Comparison of Pb2+(+) and Pb2+ (–) libraries
The signal profiles of Pb2+-treated and non-treated samples
in a subregion of the non-coding RNase P RNA are
shown as an example in Figure 7. RNase P RNA harbors
several high affinity metal ion binding sites that were identified
already in previous studies (26,61,62). Two major (Ia
and Ib) and one minor (IIa) high affinity metal ion binding
site that are present in the illustrated subregion yielded
strong signals in the Pb2+-treated but not in the H2O control
libraries (Figure 7A, B; see Supplementary Figure S6 and
accompanying supplementary text for further details). Likewise,
Figure 7C illustrates that there are many pronounced
peaks in the profiles after lead treatment that are absent
in the control samples. These findings emphasize the successful
application and utility of lead-induced cleavage in
combination with end-specific ligation reactions to determine
RNA secondary structure. The high quality of the
data is illustrated in detail in Supplementary Figure S6B.
Figure 7. RNase P RNA signal profile (nucleotide positions 101-200)
comparing samples treated with (+) or without (–) Pb2+ from (A) 2',3'-cP and
(B) 5'-OH libraries. The profiles for Pb2+-treated samples (Pb2+(+),
replicate 1 and 2) and for the H2O control (Pb2+(–)) are superimposed in
different colors. Paired and unpaired nucleotides are indicated by small cyan
(paired) and large ocher (unpaired) spheres below the profiles according to
the structure shown in Supplementary Figure S6A; Roman numerals that
mark prominent cleavage sites and nucleotide numbering (x-axis) are as in
Supplementary Figure S6A. (C) Correlation between probing signals of
Pb2+(–) and Pb2+(+) 5'-OH libraries in the benchmark set. (D) Function
p(S) of Pb2+(–) libraries reveals their structural content.
We noticed that the Pb2+(–) libraries were also highly
correlated with both Pb2+(+) signal and the unpaired posi-
tions of the known reference structures (Figure 7). Overall,
the distribution of normalized signals in the H2O-treated
sample is similar to that of the Pb2+-treated samples (Sup-
plementary Figure S5A), although the separation of un-
paired and paired positions is less pronounced compared
to Figure 4. We find that the patterns are close matches for
both the 2',3'-cP libraries (Figure 7A) and the 5'-OH
libraries (Figure 7B). The normalized signals are also highly
correlated across the entire data set (Figure 7C, Supple-
mentary Figure S5B). This similarity between Pb2+-treated
and H2O-treated libraries persists when considering the
unpaired positions of our calibration set. As shown in
Figure 7D (Supplementary Figure S5C), the Pb2+(–) signal can
also be converted into a probability $q_i$ that behaves similarly
to the one obtained from the Pb2+-treated samples. We
therefore tested whether an additional ‘bonus energy’ based
on Pb2+(–) as independent source of information results in
an increase of secondary structure inference over the use of
Pb2+(+) only. Since this was not the case, the negative con-
trol libraries were not used in further downstream analysis.
Experimental signal improves RNA secondary structure
prediction
Probing data by definition offer only information on whether
a given nucleotide is paired or unpaired and therefore provide
experimental constraints on RNA secondary structures
rather than determining the structure unambiguously.
We therefore assessed the usefulness of the Led-Seq data
by testing whether they are capable of improving the sec-
ondary structures over computational predictions based on
the rules of the thermodynamic standard model. To this
end, we calculated pseudo-energy contributions for positions
with $q_i$ greater than the average probability to be
unpaired $p_0$. These values served as soft constraints for the
computation of MFE structures with RNAfold.
We observed that for the vast majority of the transcripts
in our benchmark set there is indeed an improvement. More
precisely, the MFE structures computed with the experi-
mentally determined pseudo-energies are close to the ref-
erence structures (Figure 8). All reference structures from
RNAcentral as well as predictions by construction do not
include pseudoknots. Pseudoknots therefore appear as ‘un-
paired’ in the reference structures, see e.g. Supplementary
Figure S6.
Table 1 summarizes the results quantitatively in terms of
the quality metrics PPV, SEN, and MCC. Notably, applying
the experimental signal of only one type of probing
library (2',3'-cP or 5'-OH) already yields substantially
increased prediction accuracy. This demonstrates the high
performance of both libraries individually. As expected,
best accuracy is achieved by incorporating both libraries. In
particular, improvements close to both ends of a transcript
require both libraries since each library type is informative
only for either the 5' or the 3' terminal region. Figure 9
gives three illustrative examples (5S rRNA, 23S rRNA (domain
IV) and tRNA^Ile(GAU)) showing how incorporation of
experimental pseudo-energies guides structure prediction.
Figure 8. Matthews correlation coefficient (MCC) for structures predicted
with or without experimental signal in the form of soft constraints based
on both libraries. Known references served as ‘correct’ structures. Each dot
reflects one transcript of the benchmark set. Positions above the diagonal
correspond to improved structure prediction. For 6S RNA and tRNA^Tyr
(labeled by *), the positions indicate a lower correlation of the mapped
structures with the reference structures. This, however, is the result of
structural artifacts in the references that do not correspond to experimental
findings.
Table 1. Assessment of secondary structure prediction accuracy. Minimum
free energy (MFE) structures were calculated for all benchmark RNA
sequences with and without experimental data as soft constraints. Applying
only the signal from 2',3'-cP or 5'-OH libraries yields higher prediction
quality. Incorporating probing data from both library types achieves
highest precision. PPV, positive predictive value; SEN, sensitivity; MCC,
Matthews correlation coefficient.
                                        PPV    SEN    MCC
No constraints                          0.62   0.70   0.66
Soft constraints - 2',3'-cP libraries   0.77   0.78   0.77
Soft constraints - 5'-OH libraries      0.79   0.79   0.79
Soft constraints - both libraries       0.82   0.81   0.81
In particular, positions engaged in false-positive (stacked)
base pairs predicted by the thermodynamic model are rearranged
towards an open structure upon inclusion of experimental
evidence by means of pseudo-energy contributions.
Prediction accuracy decreases in only three cases (Figure 8,
Supplementary Figure S7). In all three cases, major
parts of the mapped structure agree with the reference. Differences
are confined to additional base pairs at positions
without cleavage signal and unpaired regions in our predictions
that are shown as paired in the reference but show
large cleavage signals. Considering that the reference structures
are derived from computationally obtained consensus
structures (63), these three cases can be attributed mostly to
issues with the reference structures. In the case of 6S RNA,
it is known that upon transcription, this molecule is restruc-
tured and forms a hairpin between positions 132 and 152
(64–66) (Supplementary Figure S8), which is clearly visible
in our probing analysis. The reference structure, however,
does not represent this element. For tRNA^Tyr, Led-Seq
Figure 9. Effect of Led-Seq data on secondary structure prediction of three examples from the calibration data set: (A) domain IV of the 23S rRNA, (B) 5S
rRNA and (C) tRNA^Ile(GAU). In each case we show the structure predicted by the thermodynamic model (left), the structure obtained by incorporating the
probing data as pseudo-energies in RNAfold (middle) and the reference structure taken from the RNAcentral database (right). In each case, the incorporation
of the probing data pushes the structure closer to the reference. Note that negative pseudo-energies stabilize the unpaired state of nucleotide positions and
thus lead to more open secondary structures. Base pairs that deviate from the reference structure are highlighted in red.
shows an open conformation for the variable loop, while it is
base-paired in the reference. As the resulting terminal loop
consists of three individual residues, it is very likely that this
poses a considerable tension on the short helical part that,
due to the central U-A pair, is not highly stable. As a conse-
quence, this region probably shows a certain fraying, result-
ing in a single-stranded conformation that is recognized by
Led-Seq. For more details, see Supplementary Information.
mRNA analysis
The secondary structures of mRNAs in the vicinity of the
translation start site have repeatedly been reported to be
subject to selective pressures. Such effects can be detected
by considering patterns of accessibility, i.e. the profile of the
position-specific unpairedness $q_i$, see e.g. (23,27,67,68).
We collected all mRNAs for which a stretch of 50 nucleotides
upstream of the AUG start codon does not intersect
another annotated gene and that are sufficiently covered
by the probing data (n = 89 for 2',3'-cP libraries and n
= 146 for 5'-OH libraries). Sequences were aligned at AUG
start codons and mean values were calculated for every position
in the 5' untranslated region (UTR, positions -48
to -1) and the beginning of the coding sequence (CDS,
positions 1 to 180, 60 codons). As there is no valid signal
for the first 11 nucleotides within the 2',3'-cP libraries,
these positions were excluded. The resulting profiles are
displayed in Figure 10. The signal shows a pronounced peak
Figure 10. Mean normalized probing signal of coding sequences (CDS)
and 5' untranslated regions (5'-UTR) calculated for mRNAs aligned at the
start codon from (top) 2',3'-cP libraries (n = 89) and (bottom) 5'-OH
libraries (n = 146). Included positions range from -48 to 180 with position
1 corresponding to the 5'-position of the AUG start codon. Dashed lines
represent mean values for the respective regions.
at position -1, i.e. just before the start codon, indicating an
open conformation at this site. There is a (local) minimum
around position -9. This area coincides with the location
of the ribosomal binding site (Shine-Dalgarno sequence).
Further upstream around position -17, we observe a local
maximum. The first 60 nt of CDSs appear to be less
structured than 5'-UTRs (2',3'-cP libraries: mean 5'-UTR
= 0.38, mean CDS$_{1-60}$ = 0.45, two-sided t-test p = 0.05;
5'-OH libraries: mean 5'-UTR = 0.43, mean CDS$_{1-60}$ = 0.48,
two-sided t-test p = 0.001). However, after this stretch of 20
codons, the signal drops again to a consistently lower level
(2',3'-cP libraries: mean CDS$_{61-180}$ = 0.35, two-sided t-test
$p = 2.0 \times 10^{-9}$; 5'-OH libraries: mean CDS$_{61-180}$ = 0.42,
two-sided t-test $p = 6.3 \times 10^{-11}$). We further assessed the mean
structural signal of first, second and third positions within
codons. Triplets were found to exhibit a significant periodicity
in CDSs. No periodicity is detectable in the 5'-UTRs.
Both libraries suggest that, on average, last codon positions
are less structured than first codon positions (see Supplementary
Figure S9).
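The codon-position analysis can be sketched as an average of the signal over every third nucleotide of each coding sequence. The function name, the input layout (one signal vector per CDS, starting at the A of the AUG) and the treatment of missing values as NaN are assumptions of this sketch.

```python
import numpy as np


def codon_position_means(cds_signals):
    """Mean signal at codon positions 1, 2 and 3, pooled over all CDSs.

    cds_signals: iterable of per-nucleotide signal vectors, each starting
    at the first nucleotide of the start codon; NaN marks missing signal.
    """
    sums = np.zeros(3)
    counts = np.zeros(3)
    for signal in cds_signals:
        signal = np.asarray(signal, dtype=float)
        for phase in range(3):
            values = signal[phase::3]          # every third nucleotide
            values = values[~np.isnan(values)]
            sums[phase] += values.sum()
            counts[phase] += values.size
    return sums / counts


# Toy example with a higher signal at the third codon position
means = codon_position_means([
    [0.1, 0.2, 0.6, 0.1, 0.2, 0.6],
    [0.3, 0.2, 0.6],
])
```

A difference between the three means that is present in coding sequences but absent in the 5'-UTRs would correspond to the triplet periodicity described above.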
DISCUSSION
Utility and limitations of probing methods for RNA structure
elucidation
Here we show that Led-Seq is capable of probing RNA
secondary structures in vivo with highly reproducible re-
sults between biological replicates. A major advantage of
the method is that the interrogation of both cleavage products
in the 2',3'-cP and 5'-OH libraries not only increases
the reliability of the data and provides an internal quality
control, but also ensures that the full length of transcripts
can be probed in a high throughput setting.
As with other probing methods, including SHAPE,
lead cleavage does not unambiguously distinguish between
paired and unpaired positions but provides quantitative
evidence that can be converted into a probability that a
nucleotide is unpaired. We emphasize that this is not a
methodological shortcoming but an inevitable consequence
of the fact that RNAs form a free-energy weighted ensemble
of structures rather than a single, unambiguous secondary
structure (8,69,70). Indeed, recently methods have become
available that deconvolve multiple representative structures
from a probing signal (70–72). The ‘known’ reference struc-
tures are therefore necessarily approximations rather than a
perfect gold standard.
The probing signal is confounded further by the fact that
the chemical reactions underlying a probing method are influenced
by the detailed local conformation of the target,
metal ion or protein binding, and other factors that go be-
yond the pairing status of nucleotides. It is therefore not
surprising that the lead probing signal does not distinguish
paired from unpaired positions in an all-or-nothing fashion.
Instead, one observes distributions of signals S that are biased
towards larger signals for unpaired positions. The distributions
of lead signals in Figure 4 are indeed very similar
to corresponding distributions for SHAPE data, see e.g.
(73).
A general issue of probing methods is that low signals,
here for paired nucleotides, cannot be distinguished from
missing data due to other causes, such as limited accessibil-
ity to the reagent, see e.g. (74,75). Based on this ambiguity,
we cannot safely use low values of $S$ and thus $p(S)$ as support
for pairedness. We therefore include pseudo-energies
only if $p(S) > p_0$, i.e., if the signal provides evidence that a
position is unpaired. Lifting this restriction indeed did not
improve the prediction quality.
We observed that the incorporation of probing data did
not result in improved predictions in a subset of the benchmark
structures, and in a few cases the prediction accuracy
even decreased, albeit only slightly. This is not unexpected.
First, in many cases the thermodynamic model produces
rather accurate structures even without additional experimental
evidence (76,77). In these cases, additional experimental
data confirm rather than modify the structure.
Moreover, the reference data from RNAcentral have been
obtained with the help of templates that, in turn, are in-
formed either by in vitro structures obtained from NMR or
X-ray data, or have been constructed as consensus struc-
tures over a large set of phylogenetically related molecules
(63). They cannot account for fluctuations in the structures
of actual RNAs with (temporarily) open segments as im-
plied by probing data of the transcripts with decreased fold-
ing accuracy. Taken together, it is reassuring that inclusion
of the probing data increases consistency with the reference
models. At the same time there is no reason to expect the
probing data to reproduce the reference structure perfectly.
Figure 8 can thus be interpreted as strong support for the
proper functioning and usefulness of Led-Seq.
Negative controls as information source in Led-Seq
We observed that negative controls without lead treatment
(Pb2+(–)) produce a signal that is similar to the Pb2+(+) libraries.
This observation can be explained by the reactive intracellular
environment. For example, [Zn(H2O)5OH]+ and
[Cu(H2O)5OH]+ with pKa values of 8 to 9 (78) were also
shown to be able to hydrolyze E. coli RNase P RNA at neutral
pH (79). Considering that Zn2+ and Cu2+ are natural
trace elements, it is reasonable to assume that RNA fragments
generated by endogenous transition metal ion hydrates
entered the 2',3'-cP and 5'-OH libraries of the H2O
control samples. Although lead-treated libraries are more
informative owing to the identification of additional cleavage
sites and a better separation of paired and unpaired signals,
the Pb2+(–) libraries also convey structural information.
This implies that the Pb2+(–) libraries are not genuine
background controls, thus questioning the concept of determining
differences or ratios such as $S_{Pb^{2+}}/S_{H_2O}$ for quantifying
the results of in vivo lead probing experiments. We emphasize
that the lack of a ‘control’, or rather a background
signal is not an issue that might invalidate the method. In
fact, genome-wide data sets often include sufficient information
on the background already in the foreground data
to render negative controls redundant. In the case of Led-
Seq, the relevant information on the background signal is
implicitly provided by the distribution of normalized signals
for paired positions in the reference structures. We therefore
advocate to utilize the control libraries in Led-Seq experi-
ments as an additional source of RNA structure informa-
tion and to assess the quality and integrity of Led-Seq data
by computing the distributions of normalized signals and
the functions p(S) using a few reference structures, prefer-
ably well-characterized non-coding RNAs.
Bias for cleavage between pyrimidine and adenosine in 2',3'-cP libraries
In general, RNA cleavage induced by Pb2+ has been re-
ported to have no overall sequence bias (27). Nevertheless,
a considerable bias towards the representation of cleavage
sites between CA, and to a lesser extent UA, dinucleotides
was evident in our 2',3'-cP libraries. At present, we can
only speculate about the origin of this effect. In several
small RNA sequencing libraries, ligation biases were de-
scribed, and the secondary structures of the ligation part-
ners seem to represent a major cause of such biases (80).
In our libraries, however, the CA (UA) bias is presum-
ably not induced by secondary structures, because the re-
maining three dinucleotides starting with C or U did not
show this behavior. In addition, we did not observe any
sequence-related effect in ligation efficiency. Theoretically,
the observed pattern could also be caused by an endonuclease
with a specific recognition sequence that leaves 2',3'-cP
residues after cleavage. E. coli expresses such enzymes,
most notably MazF (81,82). In a previous study targeting
cyclic phosphate-containing RNAs in mice, a similar effect
was observed predominantly in mRNA and attributed to
a hypothetical enzymatic cleavage event (83). In following
studies, the authors reported a similar effect in human cell
lines and explained it as the product of ANG cleavage (84).
In Bombyx mori cells, cleavage with BmRNase (85) was
identified as a possible cause. If a similar mechanism is at
work on E. coli RNA, we would expect to observe the
overrepresentation of the CA dinucleotide in both 2',3'-cP and
5'-OH libraries. This, however, is not the case.
Hence, the molecular basis for this phenomenon is cur-
rently unclear. To our knowledge, no bacterial enzyme is
known that cleaves RNAs specifically at pyrimidine-A dinucleotides
leaving 2',3'-cP but not 5'-OH groups (82). The
effect is also not caused by an increased occurrence of the
CA dinucleotide in the E. coli transcriptome and our data
do not contain one or a few ‘hotspot’ RNAs whose over-
representation might explain the CA bias. Interestingly, UA
and CA phosphodiester-bonds have been described as more
susceptible to hydrolysis in general. However, this effect was
later reported to be too small and unsystematic and also
highly dependent on the other neighboring nucleotides in
the investigated oligomers. Eventually, it was attributed to
other structural effects, such as stacking interactions, that
enhance or reduce cleavage (86–89). Further investigation
is needed to determine the cause for this dinucleotide bias
in 2',3'-cP libraries. From another perspective, our observation
of a CA bias in the 2',3'-cP libraries, but essentially not
in the 5'-OH libraries, again illustrates the strength of our
dual Led-Seq approach. The observed discrepancy suggests
unresolved technical reasons and prevented us from drawing
premature conclusions on the biological significance of
this finding.
Analysis of mRNA structures
Led-Seq can also be used to evaluate mRNA structure.
Lead treatment of cells resulted in a larger relative abun-
dance of reads mapping to mRNAs in both libraries (Fig-
ure 3). The coverage of mRNAs (or other non-coding
RNAs) could be improved further by implementing an
rRNA depletion step. Since we aimed to demonstrate the
general applicability of Led-Seq in this work, we did not in-
clude any selection step for certain RNAs, especially since
rRNAs and tRNAs are also part of our benchmark set. In
first experiments on the applicability of commercially available
rRNA depletion kits we could already observe that
both RNase H-based (NEBNext®, NEB) and bead-based
(riboPOOLs, siTOOLs) kits are compatible with our ap-
proach and allow a considerable reduction of the rRNA
content (data not shown). With the possibility of an indi-
vidual adaptation of the used probes to other targets, such
as tRNAs, we expect a general applicability of these kits for
the depletion of a variety of RNAs. Nevertheless, the appli-
cation of these methods can also bring disadvantages. For
example, RNase H-based depletion methods have already
been shown to display off-target effects that can negatively
impact ribosome profiling data (90). Averaging the normalized
probing signals of mRNAs aligned at the start codon
(position 1) revealed a local minimum around -9 nt and a
local maximum around -17 nt in both libraries. These signals
are remarkably consistent with earlier reports based
on parallel analysis of RNA structure (PARS) probing data
(67). Del Campo et al. postulated that the unstructured region
20 nt upstream of the start codon serves as a non-specific
docking site of the 30S ribosomal subunit and described
it as a general feature of E. coli genes. They interpreted
the low signal near nucleotide position -10 as an
effect of the Shine-Dalgarno sequence. We also observed
a substantially increased signal immediately preceding the
start of CDS, implying an open conformation. This observation
also conforms to previous findings (67,68,91). Comparison
of the average signal intensity of UTRs and CDSs
showed a significant increase in the first 60 nt of the CDS.
We also noticed a periodicity of the signal in the mRNA
coding regions, while such a signal is absent in the UTR.
Interestingly, the 2',3'-cP mapping shows a significantly
higher signal for every 3rd position of a codon. In contrast,
5'-OH data show no difference between 2nd and
3rd position, but a significantly lower signal for the first position.
Taken together, this leads to the conclusion that the
third codon position is more susceptible to Pb2+-induced
cleavage than the first position and thus appears less likely
to be ‘structured’. The same effect was already observed in
PARS data of E. coli (67), as well as in lead-based struc-
ture analysis in Yersinia pseudotuberculosis (27). Other stud-
ies also recognized a periodic pattern in eukaryotic mRNA.
However, in DMS-seq data of A. thaliana (68), PARS data
for S. cerevisiae (23), and CIRS-seq data for mice (92), the
rst nucleotide was least likely to be ‘structured’. Intrigu-
ingly, no periodicity was observed in SHAPE-MaP data for
E. coli (93). The authors provided several explanations for
this discrepancy, which we would like to address with an
emphasis on our approach: The authors argued that the
RNases used in the PARS approach are known to exhibit a
certain sequence bias in their cleavage efciency. In contrast,
Pb2+-induced cleavage occurs sequence-independent (27).
Another potential cause is artifactual signals generated either
by methodological or cellular processes. While we cannot
rule out a methodological cause entirely, we would not expect to
see a periodic pattern restricted to coding regions. Moreover,
cotranslational decay by exonucleases as described by
Pelechano et al. (94) would result in fragments that are not
mapped by our approach. To our knowledge, no known exonuclease
leaves 2′,3′-cP and 5′-OH ends. The periodicity
effect observable in our data is also not attributable to in vitro
probing conditions, as we used an in vivo probing approach.
The only suggested explanation for the mentioned discrepancy
between the methods that applies to Led-Seq is that
it relies on the enzymatic ligation of adapters to cleavage
fragments to infer structural information. Therefore it is,
in principle, prone to a bias based on the ligation properties
of the enzymes used. Apart from the enhanced CA dinucleotide
cleavage described above, where a ligation bias can
be excluded, we could not identify a ligation bias matching
the average nucleotide composition of the investigated mRNAs.
The periodicity of the genetic code is a known feature
based on the fact that in coding sequences no truly random
nucleotide/triplet composition is present due to the inherent
bias of amino acid-coding triplets (95). Our results suggest
that the observed structural periodicity is in fact caused
by this intrinsic feature of coding sequences in DNA and
therefore in mRNA. It is conceivable that their small size allows
the lead ions to enter actively translating ribosomes,
where they might have access to the mRNA codons engaged
in tRNA binding. The weak wobble interaction at codon
position three would then be less protected from cleavage,
resulting in the observed periodicity pattern. Further investigation
is needed to explain the discrepancy between
SHAPE-MaP results and other previous findings concerning
CDS structural periodicity.
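The codon-position comparison discussed above can be illustrated with a minimal sketch. It assumes a per-nucleotide list of normalized probing values and known CDS boundaries; the function name `codon_position_means`, the input layout and the toy values are hypothetical and are not taken from the Led-Seq pipeline.

```python
from statistics import mean


def codon_position_means(signal, cds_start, cds_end):
    """Return the mean signal at codon positions 1, 2 and 3 of one CDS.

    signal    -- per-nucleotide normalized probing values (None = no data)
    cds_start -- 0-based index of the first nucleotide of the start codon
    cds_end   -- 0-based index one past the last nucleotide of the CDS
    (All names and the input layout are illustrative assumptions.)
    """
    buckets = ([], [], [])
    for i in range(cds_start, cds_end):
        value = signal[i]
        if value is not None:
            # Offset from the start codon modulo 3 gives the codon position.
            buckets[(i - cds_start) % 3].append(value)
    return [mean(b) if b else float("nan") for b in buckets]


# Toy profile (invented numbers): 3 nt of UTR followed by a 9-nt CDS
# in which every third codon position carries an elevated signal.
toy_signal = [0.2, 0.2, 0.2, 0.1, 0.3, 0.8, 0.1, 0.3, 0.8, 0.1, None, 0.8]
pos1, pos2, pos3 = codon_position_means(toy_signal, cds_start=3, cds_end=12)
print(pos1, pos2, pos3)
```

Applied separately to the 2′,3′-cP and 5′-OH libraries and pooled over many transcripts, such per-position means would allow the kind of comparison between codon positions described in the text; positions without data are simply skipped in this sketch.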
CONCLUSION
The Led-Seq approach described here offers the unique
advantage that both sides of the metal ion-induced cleavage
position in RNA are mapped, increasing the reliability
of the observed signals. Furthermore, the interrogation
of both cleavage products allows for the structural analysis of
RNA regions close to the 5′ and 3′ ends of transcripts, as
the two separate libraries can mutually compensate for information
loss at the ends of transcripts, which is caused by
inaccurate mapping of short cDNAs to the genome. This
is an advantage over other sequencing-based methods
that lose the 3′-end information because they lack such a
compensation option. While mutational profiling approaches
elegantly circumvent that information loss, RT reactions
generally represent an error-prone enzymatic step, and inherent
RT stops and nucleotide misincorporations may result
in a loss of information. Using our double-end approach,
we minimize artificially introduced signals by a redundant
design of the method. Nonetheless, Led-Seq and
SHAPE-based approaches complement each other well, as
they involve entirely different chemistries. An additional advantage
resulting from the use of metal ion-based cleavage
is its potential applicability for the in vivo investigation
of the structurome of psychrophiles and thermophiles, as
the exploited probing reaction is theoretically suitable for
a wide range of temperatures. Taken together, the double-end
structure investigation of Led-Seq represents a very useful
approach to characterize RNA structures in vivo as well
as in vitro, expanding our technical arsenal to investigate
structure–function relations of RNA.
DATA AVAILABILITY
The data for this study have been deposited in the European
Nucleotide Archive (ENA) at EMBL-EBI under accession
number PRJEB58715, see www.ebi.ac.uk/ena/browser/view/PRJEB58715.
A computational pipeline is accessible at
github.com/xamiiii/Led-Seq and https://doi.org/10.5281/zenodo.7821447.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We thank Maja Etzel and Christian Lorenz for pilot studies,
Tobias Friedrich for expert technical assistance and
Christina E. Weinberg, V. Janett Olzog and Leif A. Kirsebom
for valuable scientific discussion. Special thanks go to
K. Moon and J. Page for acronym inspiration.
FUNDING
This work was supported by Deutsche Forschungsgemeinschaft
[MO 634/18-1, STA 850/48-1, GRK 2355 to RKH].
Funding for open access charge: Open Access Publishing
Fund of Leipzig University supported by the Deutsche
Forschungsgemeinschaft within the program Open Access
Publication Funding.
Conflict of interest statement. None declared.
REFERENCES
1. Guerrier-Takada,C., Gardiner,K., Marsh,T., Pace,N. and Altman,S.
(1983) The RNA moiety of ribonuclease P is the catalytic subunit of
the enzyme. Cell,35, 849–857.
2. Kieft,J.S., Rabe,J.L. and Chapman,E.G. (2015) New hypotheses
derived from the structure of a flaviviral Xrn1-resistant RNA:
conservation, folding, and host adaptation. RNA Biol.,12, 1169–1177.
3. Serganov,A. and Patel,D.J. (2007) Ribozymes, riboswitches and
beyond: regulation of gene expression without proteins. Nat. Rev.
Genet.,8, 776–790.
4. Vicens,Q. and Kieft,J.S. (2022) Thoughts on how to think (and talk)
about RNA structure. Proc. Natl. Acad. Sci. U.S.A.,119,
e2112677119.
5. Thirumalai,D., Lee,N., Woodson,S.A. and Klimov,D.K. (2001) Early
events in RNA folding. Annu. Rev. Phys. Chem.,52, 751–762.
6. Turner,D.H. and Mathews,D.H. (2010) NNDB: the nearest neighbor
parameter database for predicting stability of nucleic acid secondary
structure. Nucleic Acids Res.,38, D280–D282.
7. Zuker,M. and Stiegler,P. (1981) Optimal computer folding of larger
RNA sequences using thermodynamics and auxiliary information.
Nucleic Acids Res.,9, 133–148.
8. McCaskill,J.S. (1990) The equilibrium partition function and base
pairing probabilities for RNA secondary structures. Biopolymers,29,
1105–1119.
9. Sun,L., Fazal,F.M., Li,P., Broughton,J.P., Lee,B., Tang,L., Huang,W.,
Kool,E.T., Chang,H.Y. and Zhang,Q.C. (2019) RNA structure maps
across mammalian cellular compartments. Nat. Struct. Mol. Biol.,26,
322–330.
10. Wang,X.-W., Liu,C.-X., Chen,L.-L. and Zhang,Q.C. (2021) RNA
structure probing uncovers RNA structure-dependent biological
functions. Nat. Chem. Biol.,17, 755–766.
11. Lorenz,R., Luntzer,D., Hofacker,I.L., Stadler,P.F. and
Wolfinger,M.T. (2016) SHAPE directed RNA folding. Bioinformatics,
32, 145–147.
12. Lorenz,R., Hofacker,I.L. and Stadler,P.F. (2016) RNA folding with
hard and soft constraints. Alg. Mol. Biol.,11,8.
13. Spasic,A., Assmann,S.M., Bevilacqua,P.C. and Mathews,D.H. (2018)
Modeling RNA secondary structure folding ensembles using SHAPE
mapping data. Nucleic Acids Res.,46, 314–323.
14. Gilmer,O., Quignon,E., Jousset,A.-C., Paillart,J.-C., Marquet,R. and
Vivet-Boudou,V. (2021) Chemical and enzymatic probing of viral
RNAs: From infancy to maturity and beyond. Viruses,13, 1894.
15. Mailler,E., Paillart,J.-C., Marquet,R., Smyth,R.P. and
Vivet-Boudou,V. (2019) The evolution of RNA structural probing
methods: From gels to next-generation sequencing. Wiley Interdiscipl.
Rev. RNA,10, e1518.
16. Strobel,E.J., Yu,A.M. and Lucks,J.B. (2018) High-throughput
determination of RNA structures. Nat. Rev. Genet.,19, 615–634.
17. Merino,E.J., Wilkinson,K.A., Coughlan,J.L. and Weeks,K.M. (2005)
RNA structure analysis at single nucleotide resolution by selective
2’-hydroxyl acylation and primer extension (SHAPE). J. Am. Chem.
Soc.,127, 4223–4231.
18. Lee,B., Flynn,R.A., Kadina,A., Guo,J.K., Kool,E.T. and Chang,H.Y.
(2017) Comparison of SHAPE reagents for mapping RNA structures
inside living cells. RNA,23, 169–174.
19. Kwok,C.K., Tang,Y., Assmann,S.M. and Bevilacqua,P.C. (2015) The
RNA structurome: transcriptome-wide structure probing with
next-generation sequencing. Trends Biochem. Sci.,40, 221–232.
20. Lu,Z. and Chang,H.Y. (2016) Decoding the RNA structurome. Curr.
Op. Struct. Biol.,36, 142–148.
21. Westhof,E. and Romby,P. (2010) The RNA structurome.
High-throughput probing. Nat. Methods,7, 965–967.
22. Underwood,J.G., Uzilov,A.V., Katzman,S., Onodera,C.S.,
Mainzer,J.E., Mathews,D.H., Lowe,T.M., Salama,S.R. and
Haussler,D. (2010) FragSeq. Transcriptome-wide RNA structure
probing using high-throughput sequencing. Nat. Methods,7,
995–1001.
23. Kertesz,M., Wan,Y., Mazor,E., Rinn,J.L., Nutter,R.C., Chang,H.Y.
and Segal,E. (2010) Genome-wide measurement of RNA secondary
structure in yeast. Nature,467, 103–107.
24. Forconi,M. and Herschlag,D. (2009) Metal ion-based RNA cleavage
as a structural probe. In: Herschlag,D. (ed). Biophysical, Chemical,
and Functional Probes of RNA Structure, Interactions and Folding.
Academic Press/Elsevier, San Diego, CA, Vol. 468, pp. 91–106.
25. Rubin,J.R. and Sundaralingam,M. (1983) Lead ion binding and
RNA chain hydrolysis in phenylalanine tRNA. J. Biomol. Struct.
Dyn.,1, 639–646.
26. Ciesiolka,J., Hardt,W.D., Schlegl,J., Erdmann,V.A. and
Hartmann,R.K. (1994) Lead-ion-induced cleavage of RNase P RNA.
Eur. J. Biochem.,219, 49–56.
27. Twittenhoff,C., Brandenburg,V.B., Righetti,F., Nuss,A.M., Mosig,A.,
Dersch,P. and Narberhaus,F. (2020) Lead-seq: transcriptome-wide
structure probing in vivo using lead(II) ions. Nucleic Acids Res.,48,
e71.
28. Lindell,M., Romby,P. and Wagner,E.G.H. (2002) Lead(II) as a probe
for investigating RNA structure in vivo. RNA,8, 534–541.
29. Englert,M. and Beier,H. (2005) Plant tRNA ligases are
multifunctional enzymes that have diverged in sequence and substrate
specificity from RNA ligases of other phylogenetic origins. Nucleic
Acids Res.,33, 388–399.
30. Schutz,K., Hesselberth,J.R. and Fields,S. (2010) Capture and
sequence analysis of RNAs with terminal 2’,3’-cyclic phosphates.
RNA,16, 621–631.
31. Remus,B.S. and Shuman,S. (2013) A kinetic framework for tRNA
ligase and enforcement of a 2’-phosphate requirement for ligation
highlights the design logic of an RNA repair machine. RNA,19,
659–669.
32. Olzog,V.J., Gärtner,C., Stadler,P.F., Fallmann,J. and Weinberg,C.E.
(2021) cyPhyRNA-seq: a genome-scale RNA-seq method to detect
active self-cleaving ribozymes by capturing RNAs with 2’,3’ cyclic
phosphates and 5’ hydroxyl ends. RNA Biol.,18, 818–831.
33. Chakravarty,A.K., Subbotin,R., Chait,B.T. and Shuman,S. (2012)
RNA ligase RtcB splices 3’-phosphate and 5’-OH ends via covalent
RtcB-(histidinyl)-GMP and polynucleotide-(3’)pp(5’)G
intermediates. Proc. Natl. Acad. Sci. U.S.A.,109, 6072–6077.
34. Peach,S.E., York,K. and Hesselberth,J.R. (2015) Global analysis of
RNA cleavage by 5’-hydroxyl RNA sequencing. Nucleic Acids Res.,
43, e108.
35. Solayman,M., Litfin,T., Zhou,Y. and Zhan,J. (2022) High-throughput
mapping of RNA solvent accessibility at the single-nucleotide
resolution by RtcB ligation between a fixed 5’-OH-end linker and
unique 3’-P-end fragments from hydroxyl radical cleavage. RNA Biol.,
19, 1179–1189.
36. Viollet,S., Fuchs,R.T., Munafo,D.B., Zhuang,F. and Robb,G.B.
(2011) T4 RNA ligase 2 truncated active site mutants: improved tools
for RNA analysis. BMC Biotech.,11, 72.
37. Blondal,T., Thorisdottir,A., Unnsteinsdottir,U., Hjorleifsdottir,S.,
Ævarsson,A., Ernstsson,S., Fridjonsson,O.H., Skirnisdottir,S.,
Wheat,J.O., Hermannsdottir,A.G. et al. (2005) Isolation and
characterization of a thermostable RNA ligase 1 from a Thermus
scotoductus bacteriophage TS2126 with good single-stranded DNA
ligation properties. Nucleic Acids Res.,33, 135–142.
38. Tanaka,N. and Shuman,S. (2011) RtcB is the RNA ligase component
of an Escherichia coli RNA repair operon. J Biol. Chem.,286,
7727–7731.
39. Wang,L.K. and Shuman,S. (2001) Domain structure and mutational
analysis of T4 polynucleotide kinase. J. Biol. Chem.,276,
26868–26874.
40. Ivanova,N., Lindell,M., Pavlov,M., Holmberg Schiavone,L.,
Wagner,E.G.H. and Ehrenberg,M. (2007) Structure probing of
tmRNA in distinct stages of trans-translation. RNA,13, 713–722.
41. Sambrook,J. and Russell,D.W. (2006) Purification of nucleic acids by
extraction with phenol:chloroform. CSH Protoc.,2006, 1.
42. Seidl,C.I. and Ryan,K. (2011) Circular single-stranded synthetic
DNA delivery vectors for microRNA. PLOS ONE,6, e16925.
43. Bender,M., Holben,W.E., Sørensen,S.J. and Jacobsen,C.S. (2007) Use
of a PNA probe to block DNA-mediated PCR product formation in
prokaryotic RT-PCR. BioTechniques,42, 609–614.
44. Ewels,P., Magnusson,M., Lundin,S. and Käller,M. (2016) MultiQC:
summarize analysis results for multiple tools and samples in a single
report. Bioinformatics,32, 3047–3048.
45. Martin,M. (2011) Cutadapt removes adapter sequences from
high-throughput sequencing reads. EMBnet J.,17,1.
46. Hoffmann,S., Otto,C., Kurtz,S., Sharma,C., Khaitovich,P., Vogel,J.,
Stadler,P.F. and Hackermüller,J. (2009) Fast mapping of short
sequences with mismatches, insertions and deletions using index
structures. PLOS Comp. Biol.,5, e1000502.
47. Hoffmann,S., Otto,C., Doose,G., Tanzer,A., Langenberger,D.,
Christ,S., Kunz,M., Holdt,L.M., Teupser,D., Hackermüller,J. et al.
(2014) A multi-split mapping algorithm for circular RNA, splicing,
trans-splicing, and fusion detection. Genome Biol.,15, R34.
48. Smith,T.S., Heger,A. and Sudbery,I. (2017) UMI-tools: modelling
sequencing errors in unique molecular identifiers to improve
quantification accuracy. Genome Res.,27, 491–499.
49. Quinlan,A.R. and Hall,I.M. (2010) BEDTools: a flexible suite of
utilities for comparing genomic features. Bioinformatics,26, 841–842.
50. Low,J.T. and Weeks,K.M. (2010) SHAPE-directed RNA secondary
structure prediction. Methods,52, 150–158.
51. Petereit,J. (2022) Pipeline automation via snakemake. Methods Mol.
Biol.,2443, 181–196.
52. Kalvari,I., Nawrocki,E., Argasinska,J., Quinones-Olvera,N., Finn,R.,
Bateman,A. and Petrov,A.I. (2018) Non-coding RNA analysis using
the Rfam database. Curr. Protoc. Bioinform.,62, e51.
53. Jaeger,J.A., Turner,D.H. and Zuker,M. (1989) Improved predictions
of secondary structures for RNA. Proc. Natl. Acad. Sci. U.S.A.,86,
7706–7710.
54. Mathews,D.H., Sabina,J., Zuker,M. and Turner,D.H. (1999)
Expanded sequence dependence of thermodynamic parameters
improves prediction of RNA secondary structure. J. Mol. Biol.,288,
911–940.
55. Zarringhalam,K., Meyer,M.M., Dotu,I., Chuang,J.H. and Clote,P.
(2012) Integrating chemical footprinting data into RNA secondary
structure prediction. PLOS ONE,7, e45160.
56. Cordero,P., Kladwang,W., Vanlang,C.C. and Das,R. (2012)
Quantitative dimethyl sulfate mapping for automated RNA
secondary structure inference. Biochemistry,51, 7037–7039.
57. Lorenz,R., Bernhart,S.H., Höner zu Siederdissen,C., Tafer,H.,
Flamm,C., Stadler,P.F. and Hofacker,I.L. (2011) ViennaRNA
Package 2.0. Alg. Mol. Biol.,6, 26.
58. Kerpedjiev,P., Hammer,S. and Hofacker,I.L. (2015) Forna
(force-directed RNA): simple and effective online RNA secondary
structure diagrams. Bioinformatics,31, 3377–3379.
59. Regulski,E.E. and Breaker,R.R. (2008) In-line probing analysis of
riboswitches. Methods Mol. Biol.,419, 53–67.
60. Giesen,U., Kleider,W., Berding,C., Geiger,A., Ørum,H. and
Nielsen,P.E. (1998) A formula for thermal stability (Tm) prediction of
PNA/DNA duplexes. Nucleic Acids Res.,26, 5004–5006.
61. Hardt,W.D. and Hartmann,R.K. (1996) Mutational analysis of the
joining regions flanking helix P18 in E. coli RNase P RNA. J. Mol.
Biol.,259, 422–433.
62. Lindell,M., Brännvall,M., Wagner,E.G. and Kirsebom,L.A. (2005)
Lead(II) cleavage analysis of RNase P RNA in vivo. RNA,11,
1348–1354.
63. Sweeney,B.A., Hoksza,D., Nawrocki,E.P., Ribas,C.E., Madeira,F.,
Cannone,J.J., Gutell,R., Maddala,A., Meade,C.D., Williams,L.D.
et al. (2021) R2DT is a framework for predicting and visualising
RNA secondary structure using templates. Nat. Commun.,12, 3494.
64. Chen,J., Wassarman,K.M., Feng,S., Leon,K., Feklistov,A.,
Winkelman,J.T., Li,Z., Walz,T., Campbell,E.A. and Darst,S.A. (2017)
6S RNA mimics B-form DNA to regulate Escherichia coli RNA
polymerase. Mol. Cell.,68, 388–397.
65. Beckmann,B.M., Hoch,P.G., Marz,M., Willkomm,D.K., Salas,M.
and Hartmann,R.K. (2012) A pRNA-induced structural
rearrangement triggers 6S-1 RNA release from RNA polymerase in
Bacillus subtilis. EMBO J.,31, 1727–1738.
66. Panchapakesan,S.S.S. and Unrau,P.J. (2012) E. coli 6S RNA release
from RNA polymerase requires σ70 ejection by scrunching and is
orchestrated by a conserved RNA hairpin. RNA,18, 2251–2259.
67. Del Campo,C., Bartholomäus,A., Fedyunin,I. and Ignatova,Z.
(2015) Secondary structure across the bacterial transcriptome reveals
versatile roles in mRNA regulation and function. PLOS Genet.,11,
e1005613.
68. Ding,Y., Tang,Y., Kwok,C.K., Zhang,Y., Bevilacqua,P.C. and
Assmann,S.M. (2014) In vivo genome-wide profiling of RNA
secondary structure reveals novel regulatory features. Nature,505,
696–700.
69. Rogers,E. and Heitsch,C.E. (2014) Profiling small RNA reveals
multimodal substructural signals in a Boltzmann ensemble. Nucleic
Acids Res.,42, e171.
70. Aviran,S. and Incarnato,D. (2022) Computational approaches for
RNA structure ensemble deconvolution from structure probing data.
J. Mol. Biol.,434, 167635.
71. Li,T. J.X. and Reidys,C.M. (2020) On an enhancement of RNA
probing data using information theory. Alg. Mol. Biol.,15, 15.
72. Morandi,E., Manfredonia,I., Simon,L.M., Anselmi,F., van
Hemert,M.J., Oliviero,S. and Incarnato,D. (2021) Genome-scale
deconvolution of RNA structure ensembles. Nat. Methods,18,
249–252.
73. Kutchko,K.M. and Laederach,A. (2016) Transcending the prediction
paradigm: novel applications of SHAPE to RNA function and
evolution: Novel applications of SHAPE. WIREs RNA,8, 1374.
74. Ingle,S., Azad,R.N., Jain,S.S. and Tullius,T.D. (2014) Chemical
probing of RNA with the hydroxyl radical at single-atom resolution.
Nucleic Acids Res.,42, 12758–12767.
75. Solayman,M., Litfin,T., Singh,J., Paliwal,K., Zhou,Y. and Zhan,J.
(2022) Probing RNA structures and functions by solvent accessibility:
an overview from experimental and computational perspectives.
Brief. Bioinform.,23, bbac112.
76. Hajiaghayi,M., Condon,A. and Hoos,H.H. (2012) Analysis of
energy-based algorithms for RNA secondary structure prediction.
BMC Bioinformatics,13, 22.
77. Xu,X. and Chen,S.-J. (2015) Physics-based RNA structure prediction.
Biophys. Rep.,1, 2–13.
78. Jackson,V.E., Felmy,A.R. and Dixon,D.A. (2015) Prediction of the
pKa’s of aqueous metal ion +2 complexes. J. Phys. Chem. A,119,
2926–2939.
79. Kazakov,S. and Altman,S. (1991) Site-specific cleavage by metal ion
cofactors and inhibitors of M1 RNA, the catalytic subunit of RNase
P from Escherichia coli. Proc. Natl. Acad. Sci. U.S.A.,88, 9193–9197.
80. Fuchs,R.T., Zhiyi,S., Zhuang,F. and Robb,G.B. (2015) Bias in
ligation-based small RNA sequencing library construction is
determined by adaptor and RNA structure. PLOS One,10, e0126049.
81. Zhang,Y., Zhang,J., Hara,H., Kato,I. and Inouye,M. (2005) Insights
into the mRNA Cleavage Mechanism by MazF, an mRNA
Interferase. J. Biol. Chem.,280, 3143–3150.
82. Bechhofer,D.H. and Deutscher,M.P. (2019) Bacterial ribonucleases
and their roles in RNA metabolism. Crit. Rev. Biochem. Mol. Biol.,
54, 242–300.
83. Shigematsu,M., Morichika,K., Kawamura,T., Honda,S. and
Kirino,Y. (2019) Genome-wide identication of short 2’,3’-cyclic
phosphate-containing RNAs and their regulation in aging. PLOS
Genet.,15, e1008469.
84. Shigematsu,M. and Kirino,Y. (2020) Oxidative stress enhances the
expression of 2’,3’-cyclic phosphate-containing RNAs. RNA Biol.,17,
1060–1069.
85. Shigematsu,M., Kawamura,T., Morichika,K., Izumi,N., Kiuchi,T.,
Honda,S., Pliatsika,V., Matsubara,R., Rigoutsos,I., Katsuma,S. et al.
(2021) RNase κ promotes robust piRNA production by generating
2’,3’-cyclic phosphate-containing precursors. Nat. Commun.,12,
4498.
86. Kierzek,R. (1992) Hydrolysis of oligoribonucleotides: influence of
sequence and length. Nucleic Acids Res.,20, 5073–5077.
87. Ciesiołka,J., Michałowski,D., Wrzesinski,J., Krajewski,J. and
Krzyzosiak,W.J. (1998) Patterns of cleavages induced by lead ions in
defined RNA secondary structure motifs. J. Mol. Biol.,275, 211–220.
88. Mikkola,S., Kaukinen,U. and Lönnberg,H. (2001) The effect of
secondary structure on cleavage of the phosphodiester bonds of
RNA. Cell Biochem. Biophys.,34, 95–119.
89. Kaukinen,U., Venäläinen,T., Lönnberg,H. and Peräkylä,M. (2003)
The base sequence dependent flexibility of linear single-stranded
oligoribonucleotides correlates with the reactivity of the
phosphodiester bond. Org. Biomol. Chem.,1, 2439–2447.
90. Zinshteyn,B., Wangen,J.R., Hua,B. and Green,R. (2020)
Nuclease-mediated depletion biases in ribosome footprint profiling
libraries. RNA,26, 1481–1488.
91. Burkhardt,D.H., Rouskin,S., Zhang,Y., Li,G.-W., Weissman,J.S. and
Gross,C.A. (2017) Operon mRNAs are organized into ORF-centric
structures that predict translation efficiency. eLife,6, e22037.
92. Incarnato,D., Neri,F., Anselmi,F. and Oliviero,S. (2014)
Genome-wide profiling of mouse RNA secondary structures reveals
key features of the mammalian transcriptome. Genome Biol.,15, 491.
93. Mustoe,A.M., Busan,S., Rice,G.M., Hajdin,C.E., Peterson,B.K.,
Ruda,V.M., Kubica,N., Nutiu,R., Baryza,J.L. and Weeks,K.M.
(2018) Pervasive regulatory functions of mRNA structure revealed by
high-resolution SHAPE probing. Cell,173, 181–195.
94. Pelechano,V., Wei,W. and Steinmetz,L. (2015) Widespread
co-translational RNA decay reveals ribosome dynamics. Cell,161,
1400–1412.
95. Shabalina,S.A., Ogurtsov,A.Y. and Spiridonov,N.A. (2006) A
periodic pattern of mRNA secondary structure created by the genetic
code. Nucleic Acids Res.,34, 2428–2437.
Downloaded from https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkad312/7146347 by guest on 29 April 2023
... It is also readily extended to aggregating evidence from different sources, e.g., from different probing experiments. This is used in practice, e.g., in the Led-Seq approach (Kolberg et al., 2023). There, each lead-induced cleavage at single-stranded positions is assayed both via the 2′,3′-cyclophosphate end and the 5′-OH end and modeled via a two-dimensional sigmoidal fit p(S 1 , S 2 ). ...
... Of course, it is also possible to stratify the signal for instance by the identity of the cleaved di-nucleotide. The function p(S) can be estimated by comparing the observed signal S with the Normalized intensities S obtained from probing experiments [in this example from a Led-Seq cP library (Kolberg et al., 2023)] can be converted to probabilities of a structural feature (here the probability of a sequence position to be unpaired) by relating the empirical signal in a bin [S, S + ΔS] to the frequency of the feature of interest (here the frequency of observing an unpaired position) in a reference set. Here we compare a manually curated set of 32 reference secondary structures from Escherichia coli (▽) to secondary structures predicted from the thermodynamic model using the ViennaRNA software (•) of all sequences with valid probing information from the same study (Kolberg et al., 2023). ...
... The function p(S) can be estimated by comparing the observed signal S with the Normalized intensities S obtained from probing experiments [in this example from a Led-Seq cP library (Kolberg et al., 2023)] can be converted to probabilities of a structural feature (here the probability of a sequence position to be unpaired) by relating the empirical signal in a bin [S, S + ΔS] to the frequency of the feature of interest (here the frequency of observing an unpaired position) in a reference set. Here we compare a manually curated set of 32 reference secondary structures from Escherichia coli (▽) to secondary structures predicted from the thermodynamic model using the ViennaRNA software (•) of all sequences with valid probing information from the same study (Kolberg et al., 2023). A smooth function p(S) is then obtained by fitting a sigmoidal curve to the empirical data. ...
... Dabei ist zu beachten, dass im Fall von Led-Seq, SHAPE und ähnlichen Protokollen nur hohe Signale als Hinweise auf ungepaarte Positionen verwendet werden können, während das Fehlen von Signal nicht notwendigerweise impliziert, dass die entsprechenden Nukleotide gepaart vorliegen [13]. Wahrscheinlichkeiten, die einen Schwellenwert übersteigen, können dann in Pseudoenergien umgerechnet und als externe Information in Sekundärstruktur-Vorhersage-Algorithmen genutzt werden, um deutlich verbesserte Strukturmodelle zu erzeugen [12]. Die positionsspezifischen Profile aus Probing-Experimenten sindunabhängig vom gewählten Protokoll -für sich allein genommen nicht ausreichend, um gute Strukturmodelle zu erzeugen. ...
... In der Referenzstruktur ungepaarte Nukleotide sind orange, gepaarte Nukleotide blau markiert, zur Visualisierung wurde das Signal zusätzlich als geglättete Savitzky-Golay-Kurve dargestellt. Abbildung modifi ziert nach Kolberg et al. 2023 [12]. ...
Article
Full-text available
With Next Generation Sequencing, technical advances in the field of nucleic acid sequencing enabled access to the entire transcriptome of organisms. However, the variety of information available in RNA-seq data exceeds mere sequence information. Specific RNA molecules, as well as their modification patterns or their spatial structure, can be addressed using customized biochemical and bioinformatic methods, providing deep insights into regulatory processes at transcriptome level.
... Wu et al. [34] developed a KARR-seq method to precisely capture stable RNA base-pairing contacts and to determine transient RNA-RNA interactions that reveal RNA functions. Kolberg et al. [35] proposed a Led-Seq approach, a new method based on lead-induced cleavage and ligation of RNA fragments. They have shown promising results through inclusion of all cellular RNAs in libraries that enable probing of full-length transcripts. ...
Article
Full-text available
RNA plays a crucial role in regulating gene expression, thereby ensuring the maintenance of genetic integrity. RNA can fold into various structures based on alternative intra-molecular base-pairings and plays a critical role in the production of functional proteins. Many experimental and computational techniques have revealed the secondary and tertiary structures of RNA molecules. With the increase in RNA data, structure predictions have evolved from conventional to advanced computational methods. Various bioinformatics methods have been developed to predict RNA structures to understand underlying molecular mechanisms. This review summarizes multiple methodologies for RNA structure prediction, encompassing biophysical techniques, probing methods, and computational approaches. These computational approaches include free energy minimization, comparative sequence analysis, deep learning algorithms, and hybrid methods. Since the current era is dedicated to deep learning techniques, the present review highlights the significance of these methods to provide better insights into RNA structures that can be further explored to discover novel therapeutic drug targets for diseases.
... In recent years, we have witnessed a rapid advancement in genome-wide methodologies for identifying novel RNA-Ts, exploiting their secondary structures. Essentially, transcriptome-wide probing integrates structure probing techniques with next-generation sequencing to uncover RNA structural alterations across various temperatures, defining the temperature-responsive in vivo RNA structurome of bacterial pathogens [99][100][101]. Despite the diversity and poor conservation of RNA-T nucleotide sequences and secondary structures, bioinformatic tools have been used to identify novel RNA-Ts across multiple genomes [102][103][104]. The recently developed Robo-Therm tool combines an adaptive and user-friendly in silico motif search with a wellestablished reporter system [105]. ...
Article
Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.
Article
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance of conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecule functions. In this chapter we provide a—necessarily incomplete—overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Chapter
Pseudo-energies are a generic method to incorporate extrinsic information into energy-directed RNA secondary structure predictions. Consensus structures of RNA families, usually predicted from multiple sequence alignments, can be treated as soft constraints in this manner. In this contribution we first revisit the theoretical framework and then show that pseudo-energies for the centroid base pairs of the consensus structure result in a substantial increase in folding accuracy. In contrast, only a moderate improvement can be achieved if only the information that a base is predominantly paired is utilized.
Article
Full-text available
Given the challenges for the experimental determination of RNA tertiary structures, probing solvent accessibility has become increasingly important to gain functional insights. Among various chemical probes developed, backbone-cleaving hydroxyl radical is the only one that can provide unbiased detection of all accessible nucleotides. However, the readouts have been based on reverse transcription (RT) stop at the cleaving sites, which are prone to false positives due to PCR amplification bias, early drop-off of reverse transcriptase, and the use of random primers in RT reaction. Here, we introduced a fixed-primer method called RL-Seq by performing RtcB Ligation (RL) between a fixed 5′-OH-end linker and unique 3′-P-end fragments from hydroxyl radical cleavage prior to high-throughput sequencing. The application of this method to E. coli ribosomes confirmed its ability to accurately probe solvent accessibility with high sensitivity (low required sequencing depth) and accuracy (strong correlation to structure-derived values) at the single-nucleotide resolution. Moreover, a near-perfect correlation was found between the experiments with and without using unique molecular identifiers, indicating negligible PCR biases in RL-Seq. Further improvement of RL-Seq and its potential transcriptome-wide applications are discussed.
Article
Recent events have pushed RNA research into the spotlight. Continued discoveries of RNA with unexpected diverse functions in healthy and diseased cells, such as the role of RNA as both the source and countermeasure to a severe acute respiratory syndrome coronavirus 2 infection, are igniting a new passion for understanding this functionally and structurally versatile molecule. Although RNA structure is key to function, many foundational characteristics of RNA structure are misunderstood, and the default state of RNA is often thought of and depicted as a single floppy strand. The purpose of this perspective is to help adjust mental models, equipping the community to better use the fundamental aspects of RNA structural information in new mechanistic models, enhance experimental design to test these models, and refine data interpretation. We discuss six core observations focused on the inherent nature of RNA structure and how to incorporate these characteristics to better understand RNA structure. We also offer some ideas for future efforts to make validated RNA structural information available and readily used by all researchers.
Article
The characterization of RNA structures and functions has mostly focused on 2D secondary and 3D tertiary structures. Recent advances in experimental and computational techniques for probing or predicting RNA solvent accessibility make this 1D representation of tertiary structures an increasingly attractive feature to explore. Here, we provide a survey of these recent developments, which indicate the emergence of solvent accessibility as a simple 1D property, adding to secondary and tertiary structures for investigating complex structure-function relations of RNAs.
Article
Self-cleaving ribozymes are catalytically active RNAs that cleave themselves into a 5′-fragment with a 2′,3′-cyclic phosphate and a 3′-fragment with a 5′-hydroxyl. They are widely applied for the construction of synthetic RNA devices and RNA-based therapeutics. However, the targeted discovery of self-cleaving ribozymes remains a major challenge. We developed a transcriptome-wide method, called cyPhyRNA-seq, to screen for ribozyme cleavage fragments in total RNA extract. This approach employs the specific ligation-based capture of ribozyme 5′-fragments using a variant of the Arabidopsis thaliana tRNA ligase we engineered. To capture ribozyme 3′-fragments, they are enriched from total RNA by enzymatic treatments. We optimized and enhanced the individual steps of cyPhyRNA-seq in vitro and in spike-in experiments. Then, we applied cyPhyRNA-seq to total RNA isolated from the bacterium Desulfovibrio vulgaris and detected self-cleavage of the three predicted type II hammerhead ribozymes, whose activity had not been examined to date. cyPhyRNA-seq can be used for the global analysis of active self-cleaving ribozymes, with the advantage of capturing both ribozyme cleavage fragments from total RNA. Especially in organisms harbouring many self-cleaving RNAs, cyPhyRNA-seq facilitates the investigation of cleavage activity. Moreover, this method has the potential to be used to discover novel self-cleaving ribozymes in different organisms.
Article
RNA molecules are key players in a variety of biological events, and this is particularly true for viral RNAs. To better understand the replication of those pathogens and try to block them, special attention has been paid to the structure of their RNAs. Methods to probe RNA structures have been developed since the 1960s; even if they have evolved over the years, they are still in use today and provide useful information on the folding of RNA molecules, including viral RNAs. The aim of this review is to offer a historical perspective on the structural probing methods used to decipher RNA structures before the development of the selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) methodology and to show how they have influenced the current probing techniques. Indeed, these technological breakthroughs, which involved advanced detection methods, were made possible thanks to the development of next-generation sequencing (NGS) but also to the previous work accumulated in the field of structural RNA biology. Finally, we will also discuss how high-throughput SHAPE (hSHAPE) paved the way for the development of sophisticated RNA structural techniques.
Article
In animal germlines, PIWI proteins and the associated PIWI-interacting RNAs (piRNAs) protect genome integrity by silencing transposons. Here we report the extensive sequence and quantitative correlations between 2′,3′-cyclic phosphate-containing RNAs (cP-RNAs), identified using cP-RNA-seq, and piRNAs in the Bombyx germ cell line and mouse testes. The cP-RNAs containing 5′-phosphate (P-cP-RNAs) identified by P-cP-RNA-seq harbor 5′-end positions highly consistent with those of the piRNAs and are loaded onto PIWI protein, suggesting their direct utilization as piRNA precursors. We identified Bombyx RNase Kappa (BmRNase κ) as a mitochondria-associated endoribonuclease which produces cP-RNAs during piRNA biogenesis. BmRNase κ-depletion elevated transposon levels and disrupted a piRNA-mediated sex determination in Bombyx embryos, indicating the crucial roles of BmRNase κ in piRNA biogenesis and embryonic development. Our results reveal a BmRNase κ-engaged piRNA biogenesis pathway, in which the generation of cP-RNAs promotes robust piRNA production.
Article
Non-coding RNAs (ncRNAs) are essential for all life, and their functions often depend on their secondary (2D) and tertiary structure. Despite the abundance of software for the visualisation of ncRNAs, few automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures. Here, we present R2DT, a method for predicting and visualising a wide range of RNA structures in standardised layouts. R2DT is based on a library of 3,647 templates representing the majority of known structured RNAs. R2DT has been applied to ncRNA sequences from the RNAcentral database and produced >13 million diagrams, creating the world’s largest RNA 2D structure dataset. The software is amenable to community expansion and is freely available at https://github.com/rnacentral/R2DT; a web server is available at https://rnacentral.org/r2dt.
Article
RNA structure probing experiments have emerged over the last decade as a straightforward way to determine the structure of RNA molecules in a number of different contexts. Although powerful, the ability of RNA to dynamically interconvert between, and to simultaneously populate, alternative structural configurations, poses a nontrivial challenge to the interpretation of data derived from these experiments. Recent efforts aimed at developing computational methods for the reconstruction of coexisting alternative RNA conformations from structure probing data are paving the way to the study of RNA structure ensembles, even in the context of living cells. In this review, we critically discuss these methods, their limitations and possible future improvements.
Chapter
With third-generation DNA sequencing and a general reduction of sequencing costs, the production of bioinformatic data has become easier than ever. Several pipeline automation tools have emerged to ease data processing through a multitude of steps. Here, we describe the setup and use of Snakemake, a pipeline automation tool derived from GNU Make.
Article
RNA molecules fold into complex structures that enable their diverse functions in cells. Recent revolutionary innovations in transcriptome-wide RNA structural probing of living cells have ushered in a new era in understanding RNA functions. Here, we summarize the latest technological advances for probing RNA secondary structures and discuss striking discoveries that have linked RNA regulation and biological processes through interrogation of RNA structures. In particular, we highlight how different long noncoding RNAs fold into distinct secondary structures that determine their modes of interaction with protein partners to realize their unique functions. These dynamic structures mediate RNA regulatory functions through altering interactions with proteins and other RNAs. We also outline current methodological hurdles and speculate about future directions for development of the next generation of RNA structure-probing technologies of higher sensitivity and resolution, which could then be applied in increasingly physiologically relevant studies. This Review summarizes the recent technical advances in probing RNA secondary structures and discusses their connection with RNA regulatory functions in various biological processes and future directions in RNA structure-probing methods.