Leszek P. Pryszcz’s research while affiliated with Barcelona Institute for Science and Technology and other places


Publications (90)


The RMaP challenge workflow
RMaP challenge overview. a The several affiliations that contributed to the RMaP challenge. b Data preparation pipeline for each sub-challenge in RMaP. Datasets were prepared in vitro and measured with ONT. c General overview of the three sub-challenges proposed in RMaP. Each of them proposed a different task for selected RNA modifications. d The results obtained by the new methods are analyzed and compared.
Method summary and results of RMaP challenge 1
(Top) Summary of methods used for challenge 1. (Bottom) Comparison of the performance of Methods 1 and 2 on m⁵C modification detection. A lower value is better for all metrics. The diagram shows the values of each metric used in this work. The metric values were obtained by comparing the two methods' predictions with expected values. Metric values can be found in Table 2.
Method summary and results of RMaP challenge 2
(Top) Summary of methods used for challenge 2. (Bottom) Method 3 performance on m⁶A modification detection. A lower value is better for all metrics. The diagram shows the values of each metric used in this work. Metric values can be found in Table 2.
Method summary and results of RMaP challenge 3
(Top) Summary of methods used for challenge 3. (Bottom) Comparison of the performance of Methods 4–7 on Ψ modification detection. The graph shows the values of each metric used in this work. The metric values were obtained by comparing the methods' predictions with expected values. Metric values can be found in Table 2.
Example bedRMod file
Text visualization of a bedRMod file.
The RMaP challenge of predicting RNA modifications by nanopore sequencing
  • Article
  • Full-text available

April 2025 · 113 Reads · Communications Chemistry · Anne Busch · [...] · Nicolo Alagna

The field of epitranscriptomics is undergoing a technology-driven revolution. Over the past decades, RNA modifications like N6-methyladenosine (m⁶A), pseudouridine (ψ), and 5-methylcytosine (m⁵C) have become recognized for playing critical roles in cellular processes. Direct RNA sequencing by Oxford Nanopore Technologies (ONT) enabled the detection of modifications in native RNA, by capturing properties of noncanonical RNA nucleosides in raw data. Consequently, the field’s cutting edge has a heavy component in computer science, opening new avenues of cooperation across the community, as exchanging data is as impactful as exchanging samples. Therefore, we seize the occasion to bring scientists together within the RNA Modification and Processing (RMaP) challenge to advance solutions for RNA modification detection and discuss ideas, problems and approaches. We show several computational methods to detect the most researched mRNA modifications (m⁶A, ψ, and m⁵C). Results demonstrate that a low prediction error and a high prediction accuracy can be achieved on these modifications across different approaches and algorithms. The RMaP challenge marks a substantial step towards improving algorithms’ comparability, reliability, and consistency in RNA modification prediction. It points out the deficits in this young field that need to be addressed in further challenges.


Schematic overview of the approaches that can be used to identify RNA modifications from direct RNA sequencing (DRS) data. A Overview of the methods used to detect modified sites from DRS data. Commonly used software tools for detecting RNA modifications rely on either: i) basecalling errors that are present in a wild type (WT) but not a knockout (KO)/control condition, or ii) altered current intensities when comparing WT and KO/control conditions. All these methods use default (modification-unaware) RNA basecalling models and require extensive post-processing after basecalling (mapping, resquiggling, feature extraction and statistical testing) to identify modified sites. The alternative option is to use a modification-aware RNA basecalling model that predicts modifications during the basecalling step, which provides m⁶A modification predictions with single nucleotide and single molecule resolution. B IGV visualisation of a BAM file where reads have been basecalled using the m⁶ABasecaller, allowing per-read analysis of m⁶A modifications in full-length reads. BAM files have m⁶A information encoded at per-read and per-nucleotide level in the form of modification probabilities. Colouring nucleotides based on their modification probability allows simple visualisation of m⁶A-modified sites (bright green) in a transcriptome-wide fashion. A ‘predicted m⁶A site’ is defined as a position that has at least 25 reads of coverage and ≥ 5% modification stoichiometry (i.e., a minimum of 2 modified reads supporting that site). A nucleotide in a read is defined as ‘modified’ if the modification probability is equal to or greater than 0.5 (shown as ‘predicted m⁶A sites’ at the bottom of the IGV snapshot)
Methodology to obtain a training dataset with high-confidence RNA modification status labels, implemented in NanoRMS2, used to train a modification-aware basecalling model. A Schematic representation of steps performed to obtain high-confidence labels based on the modification status of all reads included in the training dataset. First, a set of 9 features is retrieved for every base from every read. Then, the features are aggregated for each 7-mer from the entire genome/transcriptome (balancing the number of reads between the two samples and across the reference positions). Significant 7-mers are then identified by KS-test for the two most informative features. Reads are labelled as modified or unmodified for each significant 7-mer using 50% of the data (training set) in a 3-step procedure as follows: i) a Gradient Boosting classifier is trained assuming all KO reads are unmodified and all WT reads are modified, and is used to predict all reads from the training set as either modified or unmodified; ii) reads with low-confidence predictions are marked as unknown, and label propagation (with KNN kernel) is used to label them; and iii) a final (Gradient Boosting) classifier is trained with the labelled training data and all reads from the test data are predicted as unmodified or modified. All these steps are performed by the NanoRMS2 software. B Using the high-confidence labels obtained with the procedure depicted in panel A, a modification-aware basecalling model can be trained with reads labelled as modified or unmodified, in a 2-step procedure: i) in a first step, only unmodified reads are used to train a canonical basecaller; ii) in a second step, this model is refined to also call modified bases. The second training step can be restricted to specific k-mers that are reported by NanoRMS2
m⁶ABasecaller predicts m⁶A in synthetic and native RNA molecules, and shows strong overlap with m⁶A sites predicted using orthogonal methods. A In the left panel, IGV snapshot of individual reads basecalled with m⁶ABasecaller. The reads are centered at a known m⁶A site, both for synthetic m⁶A-modified curlcake reads (‘MOD’, upper panel) and their unmodified counterpart (‘UNM’, lower panel). In the right panel, reads mapping to the human RNF7 gene are shown, in HEK293T WT and METTL3 KO samples, as well as for reads from in vitro transcribed (IVT) human samples. Each row represents a distinct RNA read, and each base from each read has been coloured according to its modification probability. See also Additional File 1: Figure S5 for additional IGV snapshots. B Boxplot of per-site m⁶A frequencies in two independent replicates of: (i) HEK293T WT, (ii) HEK293T METTL3 KO and (iii) IVT human transcriptome. Only m⁶A sites detected in WT (≥ 5% m⁶A modification frequency) and with at least 25 reads of coverage in all replicates have been included in this analysis (N = 877). The horizontal dashed line indicates the 5% threshold for a site to be identified as “m⁶A-modified”. C Metagene plot depicting the distribution of m⁶A sites detected in HEK293T WT samples (N = 1270). In the upper left corner, the motif obtained with MEME analysing the 20 nt sequence context of all replicable sites in HEK293T WT (N = 1270) is shown. See also Additional File 1: Figure S6A,B for metagene plots in additional species. D Replicability of m⁶A modification frequency in sites with modification frequency greater than or equal to 5% and a minimum of 25 reads of coverage in both HEK293T WT replicates (Spearman’s ρ = 0.82). Dashed vertical and horizontal lines depict the 5% threshold applied for an m⁶A site to be called. Both axes are log10 scaled. E Scatterplot comparing per-site m⁶A frequencies in modified sites identified in HEK293T cells in WT and upon METTL3 KO (left panel), and in WT compared to IVT control (right panel). Dashed vertical and horizontal lines depict the 5% threshold applied for an m⁶A site to be called. Both axes are log10 scaled. F Overlap between m⁶A sites detected by m⁶ABasecaller in HEK293T cells and m⁶A sites predicted using Illumina-based orthogonal methods (m6ACE-seq, miCLIP and GLORI-seq) in HEK293T cells. To provide a comparison across all methods that is independent of sequencing coverage, the set of m⁶A sites predicted by each orthogonal method was reduced to those m⁶A sites for which there was sufficient coverage in the nanopore DRS dataset, i.e., only m⁶A sites with a minimum of 25 reads of coverage in the DRS dataset were included in the comparative analysis. G Comparison of m⁶A sites predicted by m⁶ABasecaller and those predicted by other nanopore-based methods (xPore and m6Anet), run on the same set of reads from HEK293T cells (pooled 2 replicates, see Additional File 2: Table S3)
m⁶ABasecaller accurately predicts m⁶A modification frequencies transcriptome-wide. A Density plots of m⁶A modification frequencies in mESC samples treated with different concentrations of STM2457 inhibitor (0, 2, 10 and 20 µM). Results are shown for two independent biological replicates. Dashed vertical lines indicate the median m⁶A frequency of each sample. B Boxplot of the distribution of the m⁶A frequency at different concentrations of STM2457 in sites with more than 25 reads of coverage in all replicates of all conditions (N = 81). C Scatterplots depicting the per-site m⁶A modification frequency in untreated samples (CTR) relative to STM2457-treated samples (2 µM, upper panel; 10 µM, middle panel; 20 µM, lower panel). A gradient from light to dark blue shows the increase in density of data points in the plot. Dashed diagonal black line indicates the x = y line in frequencies. Grey vertical and horizontal dashed lines indicate the 5% m⁶A frequency threshold used to identify a site as ‘m⁶A-modified’. Axes are log-scaled. D IGV snapshot depicting the decrease of m⁶A-modified reads with increasing concentration of STM2457. The number of reads containing bright green (high probability of m⁶A) diminishes with the increase of STM2457 dosage. The purple dots represent base insertions. E On the left side, a scheme depicting the generation of tamoxifen-inducible METTL3 KO mESC cell lines is shown. On the right, a Western Blot image depicting the loss of METTL3 upon tamoxifen treatment for 6 days (2 replicates) and 14 days (2 replicates), compared to MetOH-treated cells (CTR) for 6 and 14 days, respectively. GAPDH was used as loading control. F,G In the left panels, density plot distribution of the m⁶A frequency in the pooled replicates of mES cells treated with tamoxifen (KO) for 6 days (F, N = 57 sites) or 14 days (G, N = 213 sites), compared to those treated with MetOH (CTR). In the right panels, scatterplots depicting the modification frequency at m⁶A sites detected in the pooled control samples (CTR) compared to the corresponding frequency in pooled samples treated with tamoxifen for 6 days (F) or for 14 days (G). A gradient from light to dark blue shows the increase in density of data points in the plot. Dashed diagonal black line indicates the x = y line in frequencies. Grey vertical and horizontal dashed lines indicate the 5% m⁶A frequency threshold used to identify a site as ‘m⁶A-modified’. Axes are log-scaled. For F and G, a pseudocount of 0.001 was added to all values to allow logarithmic scaling of the values. H Quantification of m⁶A levels in polyA+ RNA from mESC cells treated with tamoxifen for 6 days or 14 days. m⁶A/A is computed as the ratio of m⁶A area vs A area in LC–MS/MS results. 6-day and 14-day tamoxifen treatment led to a ~15X and 90X reduction in m⁶A levels compared to untreated control samples, respectively. I IGV snapshot depicting the decrease of m⁶A-modified reads with increasing duration of tamoxifen treatment. The number of reads containing bright green (high probability of m⁶A) diminishes with METTL3 inhibition and longer tamoxifen exposure
Analysis of m⁶A modifications at per-read level. A Schematic overview of the distinct layers of information that can be studied with per-read resolution. B Distribution of polyA tail lengths in reads containing at least 1 m⁶A site (orange) and reads without m⁶A (grey). Dashed line indicates the median polyA tail length of each group. All reads for which tailfindr gave a prediction of tail length ≥ 10 nt were included in the analysis (N = 1,224,173 reads; median polyA tail length: 75 nt for reads without m⁶A; 90 nt for m⁶A-containing reads). See also Additional File 1: Figure S11 for density plots using subsets of reads for each bin. C Distribution of the distance between each m⁶A modification and the closest boundary of an exon (orange), compared to the distribution of the same distance calculated for a random subset of DRACH motifs in the same genes (purple). The x axis is log10-scaled. D IGV snapshot of reads mapping to the FAM32A gene. Bases have been coloured according to modification probability, showing that this gene contains two m⁶A modifications at positions chr19:16,191,317 and chr19:16,191,375, depicted with bright green colour. Reads have been binned depending on whether they contain one m⁶A-modified site (chr19:16,191,317), the other m⁶A-modified site (chr19:16,191,375), or both sites modified. The observed and expected co-occurrence values, given the individual per-site m⁶A modification frequencies, are also shown. Reads that had both positions unmodified are not shown. E Distribution of number of standard deviations (NSD) values, which quantify the co-occurrence of pairs of m⁶A sites (N = 1,101) in HepG2 cells, shown in red. As a control, a random distribution generated with the same number of data points and the same standard deviation, centered at 0, is also shown. F For each pair of m⁶A sites (N = 1,101) analysed, the number of standard deviations (NSD) is plotted against the genomic distance (log10-scaled) between the two sites. Each dot represents a pair of m⁶A sites. Spearman correlation is shown. G IGV snapshot depicting the modification frequency at position chr6:33,201,287 in both isoforms (SLC39A7-201 and SLC39A7-204) from gene SLC39A7. Modification frequency at per-isoform level is shown. H Scatterplot depicting the correlation of m⁶A frequencies at per-isoform level. Grey vertical and horizontal dashed lines indicate the 5% m⁶A frequency threshold used to identify a site as ‘m⁶A-modified’. Axes are log-scaled. I One-sided volcano plot depicting isoform-specific m⁶A modification patterns. In the y axis, the mean absolute difference in m⁶A modification frequencies across 2 isoforms (N = 167 comparisons) in two replicates is shown. To increase the statistical power of the analysis as well as the number of genes for which isoform-specific m⁶A analysis was possible, all HepG2 MinION reads were pooled as one replicate (N = 4,741,372 reads, see Additional File 2: Table S2). HepG2 reads from a PromethION run were used as a second replicate (N = 6,200,572). Only reads unambiguously assigned to a given isoform were kept for the analysis
De novo basecalling of RNA modifications at single molecule and nucleotide resolution

February 2025 · 70 Reads · 10 Citations · Genome Biology

RNA modifications influence RNA function and fate, but detecting them in individual molecules remains challenging for most modifications. Here we present a novel methodology to generate training sets and build modification-aware basecalling models. Using this approach, we develop the m⁶ABasecaller, a basecalling model that predicts m⁶A modifications from raw nanopore signals. We validate its accuracy in vitro and in vivo, revealing stable m⁶A modification stoichiometry across isoforms, m⁶A co-occurrence within RNA molecules, and m⁶A-dependent effects on poly(A) tails. Finally, we demonstrate that our method generalizes to other RNA and DNA modifications, paving the path towards future efforts detecting other modifications.
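The site-calling thresholds quoted in the figure captions above (at least 25 reads of coverage, at least 5% modification stoichiometry, and a per-read modification probability of 0.5 or more) can be illustrated with a minimal Python sketch. The flat `(site, probability)` input layout and the function name `call_sites` are assumptions made for illustration; they are not the m⁶ABasecaller output format or API.

```python
from collections import defaultdict

# Thresholds quoted in the figure captions above.
MIN_COVERAGE = 25          # reads covering the position
MIN_STOICHIOMETRY = 0.05   # fraction of reads called modified
MIN_READ_PROB = 0.5        # per-read probability to count a base as modified


def call_sites(per_read_probs):
    """Return {site: (coverage, modified_reads, frequency)} for called sites.

    per_read_probs: iterable of (site, probability) pairs, one per read base
    overlapping the site. This input layout is hypothetical.
    """
    coverage = defaultdict(int)
    modified = defaultdict(int)
    for site, prob in per_read_probs:
        coverage[site] += 1
        if prob >= MIN_READ_PROB:
            modified[site] += 1

    sites = {}
    for site, cov in coverage.items():
        freq = modified[site] / cov
        if cov >= MIN_COVERAGE and freq >= MIN_STOICHIOMETRY:
            sites[site] = (cov, modified[site], freq)
    return sites


# Toy data: "chr1:100" has 25 reads, 2 of them modified (8%), so it is called.
toy = [("chr1:100", 0.9)] * 2 + [("chr1:100", 0.1)] * 23
# "chr1:200" has 25 reads but only 1 modified (4%), so it is not called.
toy += [("chr1:200", 0.9)] * 1 + [("chr1:200", 0.1)] * 24
called = call_sites(toy)
print(called)
```

With 25 reads, a single modified read gives 4%, which is why the caption notes that at least 2 supporting reads are needed at the minimum coverage.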


Rapid and accurate demultiplexing of direct RNA nanopore sequencing datasets with SeqTagger

January 2025 · 14 Reads · 4 Citations · Genome Research

Nanopore direct RNA sequencing (DRS) enables direct measurement of RNA molecules, including their native RNA modifications, without prior conversion to cDNA. However, commercial methods for molecular barcoding of multiple DRS samples are lacking, and community-driven efforts, such as DeePlexiCon, are not compatible with newer RNA chemistry flowcells and the latest-generation GPU cards. To overcome these limitations, we introduce SeqTagger, a rapid and robust method that can demultiplex direct RNA sequencing datasets with 99% precision and 95% recall. We demonstrate the applicability of SeqTagger in both RNA002/R9.4 and RNA004/RNA chemistries and show its robust performance both for long and short RNA libraries, including custom libraries that do not contain standard poly(A) tails, such as Nano-tRNAseq libraries. Finally, we demonstrate that increasing the multiplexing up to 96 barcodes yields highly accurate demultiplexing models. SeqTagger can be executed in a standalone manner or through the MasterOfPores NextFlow workflow. The availability of an efficient and simple multiplexing strategy improves the cost-effectiveness of this technology and facilitates the analysis of low-input biological samples.
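To make the precision and recall figures above concrete, here is a small Python sketch of how such metrics could be computed for a demultiplexer that may leave reads unclassified. The abstract does not state how the metrics were computed, so the definitions below (precision over assigned reads, recall over all reads) and the function name are assumptions, not SeqTagger's evaluation code.

```python
def demux_precision_recall(truth, predicted):
    """Compute overall precision and recall for barcode assignments.

    truth: list of true barcode labels, one per read.
    predicted: list of assigned labels; None means the read was unclassified.
    Assumed definitions: precision = correct / assigned reads,
    recall = correct / all reads.
    """
    assigned = [(t, p) for t, p in zip(truth, predicted) if p is not None]
    correct = sum(1 for t, p in assigned if t == p)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Toy example: 5 reads, one left unclassified, one assigned to the wrong barcode.
truth = ["bc1", "bc1", "bc2", "bc2", "bc3"]
predicted = ["bc1", None, "bc2", "bc3", "bc3"]
precision, recall = demux_precision_recall(truth, predicted)
print(precision, recall)
```

Under these definitions, raising the confidence cutoff for assigning a barcode typically trades recall for precision, because more reads are left unclassified.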


Nano3P-seq: charting the coding and non-coding transcriptome at single molecule resolution

November 2024 · 51 Reads

RNA polyadenylation is crucial for RNA maturation, stability and function, with polyA tail lengths significantly influencing mRNA translation efficiency and decay. Here, we provide a step-by-step protocol to perform Nanopore 3’ end-capture sequencing (Nano3P-seq), a nanopore-based cDNA sequencing method to simultaneously capture RNA abundances, tail composition and tail length estimates at single-molecule resolution. Taking advantage of a template switching-based protocol, Nano3P-seq can sequence any RNA molecule from its 3’ end, regardless of its polyadenylation status, without the need for PCR amplification or RNA adapter ligation. We provide an updated Nano3P-seq protocol that is compatible with R10.4 flowcells, as well as compatible software for polyA tail length and content prediction, which we term PolyTailor. We demonstrate that PolyTailor provides accurate estimates of transcript abundances, tail lengths and content information, while capturing both coding and non-coding RNA biotypes, including mRNAs, snRNAs, and rRNAs. This method can be applied to any RNA sample of interest (e.g. poly(A)-selected, ribodepleted, total RNA), and can be completed in one day. The Nano3P-seq protocol can be performed by researchers with moderate experience in molecular biology techniques and nanopore sequencing library preparation, and basic knowledge of Linux bash syntax and R programming. This protocol makes Nano3P-seq accessible and easy to implement by future users aiming to study the tail dynamics and heterogeneity of both the coding and non-coding transcriptome in a comprehensive and reproducible manner.
Key Papers
Beğik O, Diensthuber G, Liu H, Delgado-Tejedor A, Kontur C, Niazi AM, Valen E, Giraldez AJ, Beaudoin JD, Mattick JS, Novoa EM. Nano3P-seq: transcriptome-wide analysis of gene expression and tail dynamics using end-capture nanopore cDNA sequencing. Nature Methods 20, 75–85 (2023). https://doi.org/10.1038/s41592-022-01714-w
Delgado-Tejedor A, Medina M, Begik O, Cozzuto L, Lopez J, Blanco B, Ponomarenko J, Novoa EM. Native RNA nanopore sequencing reveals antibiotic-induced loss of rRNA modifications in the A- and P-sites. Nature Communications 15, 10054 (2024). https://doi.org/10.1038/s41467-024-54368-x


SeqTagger, a rapid and accurate tool to demultiplex direct RNA nanopore sequencing datasets

October 2024 · 21 Reads · 1 Citation

Nanopore direct RNA sequencing (DRS) enables direct measurement of RNA molecules, including their native RNA modifications, without prior conversion to cDNA. However, commercial methods for molecular barcoding of multiple DRS samples are lacking, and community-driven efforts, such as DeePlexiCon, are not compatible with newer RNA chemistry flowcells and the latest-generation GPU cards. To overcome these limitations, we introduce SeqTagger, a rapid and robust method that can demultiplex direct RNA sequencing datasets with 99% precision and 95% recall. We demonstrate the applicability of SeqTagger in both RNA002/R9.4 and RNA004/RNA chemistries and show its robust performance both for long and short RNA libraries, including custom libraries that do not contain standard poly-(A) tails, such as Nano-tRNAseq libraries. Finally, we demonstrate that increasing the multiplexing up to 96 barcodes yields highly accurate demultiplexing models. SeqTagger can be executed in a standalone manner or through the MasterOfPores NextFlow workflow. The availability of an efficient and simple multiplexing strategy improves the cost-effectiveness of this technology and facilitates the analysis of low-input biological samples.


Predicting RNA modifications by nanopore sequencing: The RMaP challenge

October 2024 · 199 Reads

The field of epitranscriptomics is undergoing a technology-driven revolution. Over the past decades, RNA modifications like N6-methyladenosine (m⁶A), pseudouridine (ψ), and 5-methylcytosine (m⁵C) have become recognized for playing critical roles in gene expression regulation, RNA stability, and translation efficiency. Among modification-aware sequencing approaches, direct RNA sequencing by Oxford Nanopore Technologies (ONT) enabled the detection of modifications in native RNA, by capturing and storing properties of noncanonical RNA nucleosides in raw data. Consequently, the field's cutting edge has a heavy component in computer science, opening new avenues of cooperation across the community, as exchanging data is as impactful as exchanging samples. Therefore, we seize the occasion to bring scientists together within the RMaP challenge to advance solutions for RNA modification detection and discuss current ideas, problems and approaches. Here, we show several computational methods to detect the most researched mRNA modifications (m⁶A, ψ, and m⁵C). Results demonstrate that a low prediction error and a high prediction accuracy can be achieved on these modifications across different approaches and algorithms. The RMaP challenge marks a substantial step towards improving algorithms' comparability, reliability, and consistency in RNA modification prediction. It points out the deficits in this young field that need to be addressed in further challenges.


Enhanced detection of RNA modifications and read mapping with high-accuracy nanopore RNA basecalling models

September 2024 · 68 Reads · 12 Citations · Genome Research

In recent years, nanopore direct RNA sequencing (DRS) has become a valuable tool for studying the epitranscriptome, due to its ability to detect multiple modifications within the same full-length native RNA molecules. While RNA modifications can be identified in the form of systematic basecalling 'errors' in DRS datasets, N6-methyladenosine (m6A) modifications produce relatively low 'errors' compared to other RNA modifications, limiting the applicability of this approach to m6A sites that are modified at high stoichiometries. Here, we demonstrate that the use of alternative RNA basecalling models, trained with fully unmodified sequences, increases the 'error' signal of m6A, leading to enhanced detection and improved sensitivity even at low stoichiometries. Moreover, we find that high-accuracy alternative RNA basecalling models can show up to 97% median basecalling accuracy, outperforming currently available RNA basecalling models, which show 91% median basecalling accuracy. Notably, the use of high-accuracy basecalling models is accompanied by a significant increase in the number of mapped reads, especially in shorter RNA fractions, and increased basecalling error signatures at pseudouridine (Ψ) and N1-methylpseudouridine (m1Ψ) modified sites. Overall, our work demonstrates that alternative RNA basecalling models can be used to improve the detection of RNA modifications, read mappability, and basecalling accuracy in nanopore DRS datasets.


Figure 5. SUP improves the identification of m1Ψ-modified residues in synthetic mRNA vaccines. (A) IGV screenshots of the synthetic eGFP vaccine in unmodified (U) and modified (m1Ψ) molecules, basecalled with either default (red) or sup (cyan) basecalling models. The three zoomed regions showcase the increased U>C conversion rate at m1Ψ sites as well as the reduced background error produced by the sup model. The 20,000 longest, uniquely mapped reads were selected for this analysis. Positions at which the mismatch frequency exceeds 0.2 are colored while those with a mismatch frequency below 0.2 are shown in gray. (B) Comparison of mismatch frequency across all m1Ψ sites (n = 170) found in the synthetic eGFP vaccine. Statistical significance was assessed with the non-parametric Wilcoxon test, and p-values were corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure (ns: p > 0.05, *: p <= 0.05, **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001). (C) Comparison of U>C mismatch frequency for all 170 m1Ψ sites between sup and default. Points above the dotted line represent sites with higher U>C frequency in sup (n = 151), while points below the dotted line represent sites with higher U>C frequency in default (n = 19). Points are colored by the absolute difference between sup and default. (D) Comparison of U>C mismatch frequency across the most lowly modified sites in default (median = 3.45) demonstrates the improved sensitivity of sup (median = 19.43) in detecting m1Ψ. To compare the relationship between individual positions, a paired two-sided Wilcoxon test was performed. For Figures 5B and 5D the box is limited by the lower quartile Q1 (bottom) and upper quartile Q3 (top). Whiskers are defined as 1.5 * IQR with outliers removed.
Enhanced detection of RNA modifications and mappability with high-accuracy nanopore RNA basecalling models

November 2023 · 182 Reads

In recent years, nanopore direct RNA sequencing (DRS) has established itself as a valuable tool for studying the epitranscriptome, due to its ability to detect multiple modifications within the same full-length native RNA molecules. While RNA modifications can be identified in the form of systematic basecalling errors in DRS datasets, N6-methyladenosine (m6A) modifications produce relatively low errors compared to other RNA modifications, limiting the applicability of this approach to m6A sites that are modified at high stoichiometries. Here, we demonstrate that the use of alternative RNA basecalling models, trained with fully unmodified sequences, increases the error signal of m6A, leading to enhanced detection and improved sensitivity even at low stoichiometries. Moreover, we find that high-accuracy alternative RNA basecalling models can show up to 97% median basecalling accuracy, outperforming currently available RNA basecalling models, which show 91% median basecalling accuracy. Notably, the use of high-accuracy basecalling models is accompanied by a significant increase in the number of mapped reads, especially in shorter RNA fractions, and increased basecalling error signatures at pseudouridine (Ψ) and N1-methylpseudouridine (m1Ψ) modified sites. Overall, our work demonstrates that alternative RNA basecalling models can be used to improve the detection of RNA modifications, read mappability, and basecalling accuracy in nanopore DRS datasets.
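The Figure 5 caption for this work mentions correcting Wilcoxon p-values with the Benjamini-Hochberg procedure and mapping them to significance stars. The following is a minimal pure-Python sketch of that correction and of the star thresholds listed in the caption. It is an illustration of the standard procedure, not the authors' analysis code, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(p):
    """Map a p-value to the labels listed in the caption."""
    if p <= 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
labels = [stars(p) for p in adj]
print(adj, labels)
```

In this toy input, the raw p-values 0.03 and 0.04 would each pass a 0.05 cutoff on their own, but both adjust to about 0.053 and are labelled ns after correction.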


De novo basecalling of m6A modifications at single molecule and single nucleotide resolution

November 2023 · 60 Reads · 10 Citations

RNA modifications hold pivotal roles in shaping the fate and function of RNA molecules. Although nanopore sequencing technologies have proven successful at transcriptome-wide detection of RNA modifications, current algorithms are limited to predicting modifications at a per-site level rather than within individual RNA molecules. Herein, we introduce m6ABasecaller, an innovative method enabling direct basecalling of m6A modifications from raw nanopore signals within individual RNA molecules. This approach facilitates de novo prediction of m6A modifications with precision down to the single nucleotide and single molecule levels, without the need for paired knockout or control conditions. Using the m6ABasecaller, we find that the median transcriptome-wide m6A modification stoichiometry is ~10-15% in human, mouse and zebrafish. Furthermore, we show that m6A modifications affect polyA tail lengths, exhibit a propensity for co-occurrence within the same RNA molecules, and show relatively consistent stoichiometry levels across isoforms. We further validate the m6ABasecaller by treating mESC with increasing concentrations of STM2457, a METTL3 inhibitor, as well as in inducible METTL3 knockout systems. Overall, this work demonstrates the feasibility of de novo basecalling of m6A modifications, opening novel avenues for the application of nanopore sequencing to samples with limited RNA availability and for which control knockout conditions are unavailable, such as patient-derived samples.
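The co-occurrence claim above compares how often two m6A sites are modified on the same read with what independent per-site frequencies would predict. The sketch below shows that comparison in Python. The text here does not define how the number of standard deviations (NSD) is computed, so the binomial z-score used below is a stand-in assumption, and the function and parameter names are hypothetical.

```python
import math


def cooccurrence(n_reads, n_site1, n_site2, n_both):
    """Compare observed and expected co-occurrence of two modified sites.

    n_reads: reads spanning both sites.
    n_site1, n_site2: reads modified at each site (including doubly modified).
    n_both: reads modified at both sites.
    Returns (observed, expected, z), where z is a binomial z-score used here
    as an assumed stand-in for the NSD statistic.
    """
    f1 = n_site1 / n_reads
    f2 = n_site2 / n_reads
    expected = f1 * f2              # independence assumption
    observed = n_both / n_reads
    sd = math.sqrt(n_reads * expected * (1 - expected))
    z = (n_both - n_reads * expected) / sd if sd > 0 else 0.0
    return observed, expected, z


# Toy example: 200 spanning reads; sites modified in 40 and 50 reads; 20 in both.
observed, expected, z = cooccurrence(n_reads=200, n_site1=40, n_site2=50, n_both=20)
print(observed, expected, round(z, 2))
```

Here independence predicts 5% doubly modified reads while 10% are observed, giving a positive score, which is the direction described as a propensity for co-occurrence.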


Quantitative analysis of tRNA abundance and modifications by nanopore RNA sequencing

April 2023 · 411 Reads · 117 Citations · Nature Biotechnology

Transfer RNAs (tRNAs) play a central role in protein translation. Studying them has been difficult in part because a simple method to simultaneously quantify their abundance and chemical modifications is lacking. Here we introduce Nano-tRNAseq, a nanopore-based approach to sequence native tRNA populations that provides quantitative estimates of both tRNA abundances and modification dynamics in a single experiment. We show that default nanopore sequencing settings discard the vast majority of tRNA reads, leading to poor sequencing yields and biased representations of tRNA abundances based on their transcript length. Re-processing of raw nanopore current intensity signals leads to a 12-fold increase in the number of recovered tRNA reads and enables recapitulation of accurate tRNA abundances. We then apply Nano-tRNAseq to Saccharomyces cerevisiae tRNA populations, revealing crosstalks and interdependencies between different tRNA modification types within the same molecule and changes in tRNA populations in response to oxidative stress.


Citations (42)


... detection of canonical (A, C, G, U) and modified bases (m6A, m1A, I, and so on) in a single step avoiding re-squiggling and further models. A first example has been recently reported with the m6ABasecaller program, allowing the direct call of m6A bases [9]. Translating raw ionic currents into base sequences might be carried out by transformer models that are daily used in speech-to-text translation algorithms to convert audio tracks into natural language texts and are characterized by a wide vocabulary. ...

Reference:

Ab initio detection of multiple epitranscriptomic modifications from ONT direct RNA sequencing data
De novo basecalling of RNA modifications at single molecule and nucleotide resolution

Genome Biology

... All barcode balancing strategies led to a greater number of molecules being sampled, due to higher pore turnover (Fig. 4e, Supplementary Figs. 24-26). Importantly, the rejection of high abundance barcodes not only led to a more even barcode coverage (Supplementary Table 13), reducing the Gini coefficient from 0.44 to 0.15, but also to the enrichment of low abundance barcodes by 33-73% (Fig. 4f and Supplementary Fig. 25). ...

Rapid and accurate demultiplexing of direct RNA nanopore sequencing datasets with SeqTagger
  • Citing Article
  • January 2025

Genome Research

... Currently, ONT does not offer commercial barcoding kits specifically designed for DRS, which would enable the pooling of multiple samples into a single flowcell. As a result, barcoding initiatives have largely been driven by community efforts, including the development of tools such as DeePlexiCon [72], WarpDemuX [73], supporting 12 barcodes, and SeqTagger [123], which accommodates up to 96 barcodes. The ability to multiplex up to 96 samples within a single flowcell has the potential to significantly reduce sequencing costs per sample and mitigate batch effects, and represents a critical advancement in transitioning the DRS technology toward clinical applications. ...

SeqTagger, a rapid and accurate tool to demultiplex direct RNA nanopore sequencing datasets

... As each molecule passes through the nanopore, it causes characteristic disruptions in the electrical current, which are specific to the nucleotide sequence at that given moment. These current alterations are then translated back into nucleotide sequences using machine learning algorithms in a process known as "basecalling" [29]. RNA modifications can then be detected through three primary methods: (i) by measuring alterations in the current intensity signals during sequencing [30-32], (ii) by identifying systematic base-calling "errors" that are the result of RNA modifications at specific positions [33-36], or (iii) by using modification-aware base-calling models pretrained to detect specific types of RNA modifications [37] (Fig. 1B). ...

Enhanced detection of RNA modifications and read mapping with high-accuracy nanopore RNA basecalling models
  • Citing Article
  • September 2024

Genome Research

... These current alterations are then translated back into nucleotide sequences using machine learning algorithms in a process known as "basecalling" [29]. RNA modifications can then be detected through three primary methods: (i) by measuring alterations in the current intensity signals during sequencing [30-32], (ii) by identifying systematic base-calling "errors" that are the result of RNA modifications at specific positions [33-36], or (iii) by using modification-aware base-calling models pretrained to detect specific types of RNA modifications [37] (Fig. 1B). For a comprehensive comparison of these methods, the reader is directed to several excellent detailed reviews [38-40], as this topic lies beyond the scope of this review. ...

De novo basecalling of m6A modifications at single molecule and single nucleotide resolution
  • Citing Preprint
  • November 2023

... While an indirect measurement, when coupled with appropriate biochemical and/or genetic controls, these miscall data are informative about the positions and identities of modified bases. This includes conditions that affect the cellular growth state or RNA modifying enzyme activities (Thomas et al. 2021; Lucas et al. 2023; Sun et al. 2023; White et al. 2023; Shaw et al. 2024; White et al. 2024). These data have also promoted an in vivo systems-level understanding of modification interdependencies across dozens of tRNAs (Shaw et al. 2024). ...

Quantitative analysis of tRNA abundance and modifications by nanopore RNA sequencing

Nature Biotechnology

... A-to-I editing is also widespread in zebrafish and has an important effect on gene activity. The Adar enzyme in zebrafish, a direct homolog of mammalian ADAR1, is catalytically active and specifically recognizes and edits the adenosine RNA sites (Niescierowicz et al. 2022). Such editing during zebrafish embryonic development is also critical for establishing the anterior-posterior and dorso-ventral axes as well as pattern formation, suggesting an important biological function at early developmental stages (Niescierowicz et al. 2022). ...

Adar-mediated A-to-I editing is required for embryonic patterning and innate immune response regulation in zebrafish

... The Candida parapsilosis species complex is an example of such pathogenic hybrids. This complex comprises five closely related described species: C. margitis, C. parapsilosis, C. theae, C. orthopsilosis and C. metapsilosis [20,21,23-26]. Of these, the last four are opportunistic human pathogens and the last three are hybrids. ...

Genome analysis of five recently described species of the CUG-Ser clade uncovers Candida theae as a new hybrid lineage with pathogenic potential in the Candida parapsilosis species complex

DNA Research

... In reference to the upregulation of Fdh1p in response to heat stress (Sekova et al. 2021), and its implication in DNA demethylation, the impact of stress conditions on DNA methylation level in Y. lipolytica was studied recently (Kubiak-Szymendera et al. 2021a). Two types of stress factors were implemented in that study: repeated subculturing and heat shock (42 °C, 1 h). ...

Epigenetic Response of Yarrowia lipolytica to Stress: Tracking Methylation Level and Search for Methylation Patterns via Whole-Genome Sequencing

... Long-read sequencing technologies are now increasingly applied in population-scale (epigenetic) sequencing projects, requiring the development of software tools to accommodate large cohort sizes [7]. Several tools for visualization of nucleotide modification patterns in one or a limited number of individuals are available [8][9][10][11]. However, to our knowledge, no software is suitable for visualizing nucleotide modifications in larger cohorts. ...

ModPhred: an integrative toolkit for the analysis and storage of nanopore sequencing DNA and RNA modification data

Bioinformatics