Agreement with reference sets of m6A hits on RRACH+, accessible, and high-coverage bins. (A) Precision, recall and F1 score for each tool executed at default conditions on the mouse dataset on RRACH+ bins. Following Supplementary Table 1, GM and TM denote tools that require multiple conditions and work in genome (G) or transcriptome (T) space, respectively; GS and TS denote tools that require a single condition and work in genome (G) or transcriptome (T) space, respectively. (B) Precision and recall curves at different cut-off values for the tools indicated in (A) on the mouse dataset; for each tool, the default cut-off is indicated by a square; the performance of a random classifier is included. (C) As in (A) for DRACH+ bins outside of splice-site exclusion zones. (D) As in (B) for DRACH+ bins outside of splice-site exclusion zones. (E) As in (A) for bins with high coverage. (F) As in (B) for bins with high coverage.
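The metrics named in the caption can be illustrated with a minimal sketch. It assumes that each tool assigns a score to each bin and that a bin counts as a hit when its score is at or above the cut-off; the function names, bin identifiers and scores below are hypothetical and are not taken from the publication or from any of the benchmarked tools.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_f1(predicted: Set[str], reference: Set[str]) -> Tuple[float, float, float]:
    """Score a set of predicted bins against a reference set of bins."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def pr_curve(scores: Dict[str, float], reference: Set[str],
             cutoffs: List[float]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, precision, recall) per cut-off.

    Assumption: a bin is called a hit when its score is >= cutoff.
    """
    curve = []
    for cutoff in cutoffs:
        called = {bin_id for bin_id, score in scores.items() if score >= cutoff}
        precision, recall, _ = precision_recall_f1(called, reference)
        curve.append((cutoff, precision, recall))
    return curve


def random_precision(reference: Set[str], tested_bins: Set[str]) -> float:
    """Expected precision of a random classifier: the share of tested bins in the reference."""
    return len(reference & tested_bins) / len(tested_bins) if tested_bins else 0.0


# Hypothetical toy data: five bins, three of which are in the reference set.
scores = {"bin1": 0.95, "bin2": 0.80, "bin3": 0.60, "bin4": 0.30, "bin5": 0.10}
reference = {"bin1", "bin3", "bin5"}

curve = pr_curve(scores, reference, [0.9, 0.5, 0.0])
baseline = random_precision(reference, set(scores))
for cutoff, precision, recall in curve:
    print(f"cutoff={cutoff:.1f} precision={precision:.2f} recall={recall:.2f}")
print(f"random-classifier precision={baseline:.2f}")
```

Lowering the cut-off in this toy example trades precision for recall, which is the behaviour a precision-recall curve such as those in panels (B), (D) and (F) is meant to expose.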


Source publication
Article
N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) has emerged as a leading approach for its identification. Several software tools have been published for m6A detection and there is a strong need for independent...

Context in source publication

Context 1
... observed an increase in the number of hits with higher coverage for all the tools (Figure 6A), with the exception of DiffErr, which showed the opposite trend. We then evaluated the impact of sequencing depth on the F1 score for each tool's default conditions and observed marginally improved performances with higher sequencing coverage, except for EpiNano-SVM and DiffErr, which showed an opposite trend (Figure 6B and Supplementary Figure S4, see Supplementary Data available online at http://bib.oxfordjournals.org/, respectively). ...

Citations

... This is because generating ground truth data sets that can mimic high-complexity biological samples can itself be challenging. In fact, algorithms can show discrepancies in performance when used for synthetic or biological RNA data set analysis 65, pointing out that for further challenges the combination of synthetic (including IVT) and in vivo DRS RNA samples can be used for training and validation, respectively. This approach can highlight new algorithmic strategies for the analysis of biological samples. ...
Article
The field of epitranscriptomics is undergoing a technology-driven revolution. During the past decades, RNA modifications like N6-methyladenosine (m⁶A), pseudouridine (ψ), and 5-methylcytosine (m⁵C) became acknowledged for playing critical roles in cellular processes. Direct RNA sequencing by Oxford Nanopore Technologies (ONT) enabled the detection of modifications in native RNA by detecting the properties of noncanonical RNA nucleosides in raw data. Consequently, the field’s cutting edge has a heavy component in computer science, opening new avenues of cooperation across the community, as exchanging data is as impactful as exchanging samples. Therefore, we seize the occasion to bring scientists together within the RNA Modification and Processing (RMaP) challenge to advance solutions for RNA modification detection and discuss ideas, problems and approaches. We show several computational methods to detect the most researched mRNA modifications (m⁶A, ψ, and m⁵C). Results demonstrate that a low prediction error and a high prediction accuracy can be achieved on these modifications across different approaches and algorithms. The RMaP challenge marks a substantial step towards improving algorithms’ comparability, reliability, and consistency in RNA modification prediction. It points out the deficits in this young field that need to be addressed in further challenges.
... Although direct RNA sequencing has expanded our ability to study the epitranscriptome, few comprehensive, end-to-end pipelines are available, with the exception of MasterOfPores 204 and nf-core/nanoseq. Although some benchmarking studies have been conducted to compare m⁶A profiling methods 205, a comprehensive assessment of tools for single and simultaneous detection of other chemical RNA modifications using robust ground-truth data sets remains a marked unmet need. ...
Article
Transcriptome sequencing revolutionized the analysis of gene expression, providing an unbiased approach to gene detection and quantification that enabled the discovery of novel isoforms, alternative splicing events and fusion transcripts. However, although short-read sequencing technologies have surpassed the limited dynamic range of previous technologies such as microarrays, they have limitations, for example, in resolving full-length transcripts and complex isoforms. Over the past 5 years, long-read sequencing technologies have matured considerably, with improvements in instrumentation and analytical methods, enabling their application to RNA sequencing (RNA-seq). Benchmarking studies are beginning to identify the strengths and limitations of long-read RNA-seq, although there remains a need for comprehensive resources to guide newcomers through the intricacies of this approach. In this Review, we provide a comprehensive overview of the long-read RNA-seq workflow, from library preparation and sequencing challenges to core data processing, downstream analyses and emerging developments. We present an extensive inventory of experimental and analytical methods and discuss current challenges and prospects.
... The ability to directly sequence RNA using the Nanopore technology facilitates the discovery of RNA modifications that otherwise requires dedicated experimental protocols 54. Here, we used m6Anet 45 to obtain a set of candidate m⁶A positions (Fig. 6e and Supplementary Text Fig. 24). ...
Article
The human genome contains instructions to transcribe more than 200,000 RNAs. However, many RNA transcripts are generated from the same gene, resulting in alternative isoforms that are highly similar and that remain difficult to quantify. To evaluate the ability to study RNA transcript expression, we profiled seven human cell lines with five different RNA-sequencing protocols, including short-read cDNA, Nanopore long-read direct RNA, amplification-free direct cDNA and PCR-amplified cDNA sequencing, and PacBio IsoSeq, with multiple spike-in controls, and additional transcriptome-wide N⁶-methyladenosine profiling data. We describe differences in read length, coverage, throughput and transcript expression, reporting that long-read RNA sequencing more robustly identifies major isoforms. We illustrate the value of the SG-NEx data to identify alternative isoforms, novel transcripts, fusion transcripts and N⁶-methyladenosine RNA modifications. Together, the SG-NEx data provide a comprehensive resource enabling the development and benchmarking of computational methods for profiling complex transcriptional events at isoform-level resolution.
... RNA modifications can then be detected through three primary methods: (i) by measuring alterations in the current intensity signals during sequencing [30][31][32], (ii) by identifying systematic base-calling "errors" that are the result of RNA modifications at specific positions [33][34][35][36], or (iii) by using modification-aware base-calling models pretrained to detect specific types of RNA modifications [37] (Fig. 1B). For a comprehensive comparison of these methods, the reader is directed to several excellent detailed reviews [38][39][40], as this topic lies beyond the scope of this review. ...
Article
RNA molecules have garnered increased attention as potential clinical biomarkers in recent years. While short-read sequencing and quantitative polymerase chain reaction have been the primary methods for quantifying RNA abundance, they typically fail to capture critical post-transcriptional regulatory elements, such as RNA modifications, which are often dysregulated in disease contexts. A promising cutting-edge sequencing method that addresses this gap is direct RNA sequencing, offered by Oxford Nanopore Technologies, which can simultaneously capture both RNA abundance and modification information. The rapid advancements in this platform, along with growing evidence of dysregulated RNA species in biofluids, present a compelling clinical opportunity. In this review, we discuss the challenges and the emerging opportunities for the adoption of nanopore RNA sequencing technologies in the clinic, highlighting their potential to revolutionize personalized medicine and disease monitoring.
... In the last few years, several works have successfully shown that m⁶A RNA modifications (as well as other RNA modifications) can be detected using nanopore sequencing [39, 44-48, 51, 56, 58, 59, 65, 91, 92]. However, most methods developed so far often lack single-molecule resolution (providing m⁶A predictions at per-site level), require computationally intensive steps such as resquiggling, require the analysis of aggregated per-read information (so per-read predictions are not fully independent from other reads, and are affected by sequencing depth and per-site coverage), and/or have relatively high false positive and false negative rates [50,93]. Here we address these limitations with the development of a modification-aware basecalling model, the m⁶ABasecaller (Fig. 1), which can produce m⁶A predictions in individual reads during the basecalling step, thus allowing us to address questions regarding the mechanism of m⁶A deposition in mRNAs at an unprecedented resolution, such as deciphering the interplay between m⁶A modifications and polyA tail lengths (Fig. 5A,B), learning the rules of m⁶A deposition within same reads (Fig. 5D-F) with regards to intron-exon junctions (Fig. 5C) and across isoforms (Fig. 5G). ...
Article
RNA modifications influence RNA function and fate, but detecting them in individual molecules remains challenging for most modifications. Here we present a novel methodology to generate training sets and build modification-aware basecalling models. Using this approach, we develop the m⁶ABasecaller, a basecalling model that predicts m⁶A modifications from raw nanopore signals. We validate its accuracy in vitro and in vivo, revealing stable m⁶A modification stoichiometry across isoforms, m⁶A co-occurrence within RNA molecules, and m⁶A-dependent effects on poly(A) tails. Finally, we demonstrate that our method generalizes to other RNA and DNA modifications, paving the path towards future efforts detecting other modifications.
... Despite the increasing number of computational tools for detecting m6A modifications in nanopore dRNA-seq data, their relative performance varies significantly across datasets. A recent benchmarking study systematically evaluated 14 tools for m6A detection using diverse datasets, including synthetic RNA oligos, yeast, mouse, and human transcriptomes [27]. Their results highlighted substantial differences in precision and recall across tools, with multicondition methods (e.g., Nanocompore, ELIGOS) performing better in high-depth datasets like yeast, while deep-learning-based single-condition tools trained on human transcriptomes (e.g., m6Anet, DENA) excelled in complex transcriptomes like human and mouse. ...
... In this study, we aim to systematically perform evaluation and calibration of m6Anet [23] and Dorado [28], using in-vitro transcribed (IVT) RNA as a negative control and chemical mapping methods like GLORI [29] and eTAM-seq [30] as ground truth (Figure 1a). While several studies have compared computational tools for m6A detection from ONT direct RNA sequencing, they evaluated obsolete flowcell/chemistry designed for DNA sequencing and focused on approaches that rely on basecalled reads rather than ionic current signals [27,31]. A recent preprint examined the RNA004 chemistry with Dorado-based detection, and concluded that RNA004 chemistry "significantly improved the throughput, accuracy, and site-specific detection of modifications" [32]. ...
Preprint
Direct RNA sequencing from Oxford Nanopore Technologies (ONT) has become a valuable method for studying RNA modifications such as N6-methyladenosine (m6A), pseudouridine, and 5-methylcytosine (m5C). Recent advancements in the RNA004 chemistry substantially reduce sequencing errors compared to previous chemistries (e.g., RNA002), thereby promising enhanced accuracy for epitranscriptomic analysis. In this study, we benchmark the performance of two state-of-the-art RNA modification detection models capable of handling RNA004 data - ONT's Dorado and m6Anet - using two wild-type (WT) cell lines, HEK293T and HeLa, with respective ground truths from GLORI and eTAM-seq, and their paired in vitro transcribed (IVT) RNA as negative controls. We found that under default settings and considering sites with >=10% modification ratio and >=10X coverage, Dorado has higher recall (~0.92) than m6Anet (~0.51) for m6A detection. Among the overlapping methylated sites between ground truth and computational predictions, there are high correlations of site-specific m6A modification stoichiometry, with correlation coefficients of ~0.89 for the Dorado-truth comparison and ~0.72 for the m6Anet-truth comparison. However, combined assessment of WT and IVT datasets shows that while the per-site false positive rate (FPR) can be lower (~8% for Dorado and ~33% for m6Anet), both computational tools can have a high per-site false discovery rate (FDR) of m6A (~40% for Dorado and ~80% for m6Anet) due to the low prevalence of m6A in the transcriptome, with a similar trend observed for pseudouridine (~95% FDR for Dorado). Additional motif analysis reveals that both Dorado and m6Anet exhibit high heterogeneity of false positive calls across sequence contexts, suggesting that sequence contexts help determine the accuracy of specific modification calls. There is also a substantial overlap of false positive calls between the two IVT samples, suggesting a post-filtering strategy to improve modification calling by compiling a set of low-confidence sites with a probabilistic model from several IVT samples across diverse cells/tissues. Our analysis highlights key strengths and limitations of the current generation of m6A detection algorithms and offers insights into optimizing thresholds and interpretability. The IVT datasets generated by the RNA004 chemistry provide a publicly available benchmark resource for further development and refinement of computational methods.
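The statement that a low per-site FPR can coexist with a high FDR when the modification is rare follows from simple arithmetic, sketched below. The recall (0.92) and FPR (0.08) are the approximate Dorado values quoted in the abstract; the prevalence values and the function name are hypothetical, chosen only for illustration, and the abstract's figures were obtained under its own filters, so the printed numbers are not a reproduction of its results.

```python
def false_discovery_rate(recall: float, fpr: float, prevalence: float) -> float:
    """FDR = FP / (FP + TP), expressed per tested site.

    prevalence is the (hypothetical) fraction of tested sites that are truly modified.
    """
    true_pos = recall * prevalence        # modified sites that are called
    false_pos = fpr * (1.0 - prevalence)  # unmodified sites that are called
    called = true_pos + false_pos
    return false_pos / called if called else 0.0


# Same recall and FPR, decreasing (assumed) prevalence of modified sites.
for prevalence in (0.5, 0.1, 0.01):
    fdr = false_discovery_rate(recall=0.92, fpr=0.08, prevalence=prevalence)
    print(f"prevalence={prevalence:.2f}  FDR={fdr:.2f}")
```

With recall and FPR held fixed, the FDR rises as the assumed prevalence falls, because unmodified sites increasingly outnumber modified ones among the calls.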
... To predict the position of m⁶A in CEVd RNA, we applied several published algorithms (DRUMMER [26] v1.0, xPORE [27] v2.1 and Nanocompore [28] v1.0.4) on four datasets (cCEVd #1, HS-cRNA #1 and the negative controls IVT-CEVd #1 and #2). For each algorithm, we set parameter thresholds based on a recent benchmarking study [23,29]. We observed single-nucleotide polymorphisms (SNPs) due to the naturally occurring variability of viroid populations [30] and several artefacts, such as 'circularization scars', caused by different non-templated nucleotides at the 3' end of in vitro transcripts (Figure S4). ...
Article
Viroids, small circular non-coding RNAs, act as infectious pathogens in higher plants, demonstrating high stability despite consisting solely of naked RNA. The dependence of their replication on host machinery poses the question of whether RNA modifications play a role in viroid biology. Here, we explore RNA modifications in the avocado sunblotch viroid (ASBVd) and the citrus exocortis viroid (CEVd), representative members of viroids replicating in chloroplasts and the nucleus, respectively, using LC-MS and Oxford Nanopore Technology (ONT) direct RNA sequencing. Although no modification was detected in ASBVd, CEVd contained approximately one m⁶A per RNA molecule. ONT sequencing predicted three m⁶A positions. Employing the orthogonal SELECT method, we confirmed m⁶A in two positions, A353 and A360, which are highly conserved among CEVd variants. These positions are located in the left terminal region of the CEVd rod-like structure where RNA Pol II and TFIIIA-7ZF likely bind, thus suggesting a potential biological role of methylation in viroid replication.
... These methods typically predict RNA modifications either through the analysis of the raw signal features (e.g., current intensity and/or dwell time) 43,[52][53][54][55][56][57][58] or in the form of differential base-calling 'errors' [47][48][49][50][51]. Previous works have compared the performance of some software tools in detecting m⁶A modifications in DRS datasets, finding a relatively poor overlap between the predictions across tools 53,57,58,79,80. However, their comparative performance in detecting other RNA modification types (e.g., Ψ, Nm, ac⁴C) has so far not been assessed. ...
Article
The biological relevance and dynamics of mRNA modifications have been extensively studied; however, whether rRNA modifications are dynamically regulated, and under which conditions, remains unclear. Here, we systematically characterize bacterial rRNA modifications upon exposure to diverse antibiotics using native RNA nanopore sequencing. To identify significant rRNA modification changes, we develop NanoConsensus, a novel pipeline that is robust across RNA modification types, stoichiometries and coverage, with very low false positive rates, outperforming all individual algorithms tested. We then apply NanoConsensus to characterize the rRNA modification landscape upon antibiotic exposure, finding that rRNA modification profiles are altered in the vicinity of A and P-sites of the ribosome, in an antibiotic-specific manner, possibly contributing to antibiotic resistance. Our work demonstrates that rRNA modification profiles can be rapidly altered in response to environmental exposures, and provides a robust workflow to study rRNA modification dynamics in any species, in a scalable and reproducible manner.
... Trained model methods generally outperform the two-sample comparative approaches for detecting the target RNA modification (Maestri et al. 2024). However, due to the challenges of generating the training data, robust trained models are only available for a few well-characterized RNA modifications, such as m6A (Furlan et al. 2021). ...
... Nanocompore, a tool we developed, is a comparative tool that was recently reported to achieve good performance, especially in the case of high-coverage samples (Maestri et al. 2024). Moreover, it allows extracting modification probabilities at single-nucleotide resolution and at read level, a feature that is highly desirable for assessing the co-occurrence of multiple modifications on the same RNA molecule (Leger et al. 2021). ...
Preprint
RNA modifications are critical for transcript function and regulation. Direct RNA nanopore sequencing offers a unique advantage by observing these modifications through characteristic alterations in ionic current signals. Various computational tools have been developed to detect RNA modifications, typically by comparing a sample of interest with a control lacking the target modification, often achieved through enzyme knockdown or knockout. We have developed a robust in vitro transcription protocol to generate a modification-free copy of any input RNA to detect all modifications that alter the raw ionic current signal relative to the canonical nucleotides during nanopore sequencing. We generated an in vitro transcribed sample from K562 cells and used Nanocompore to detect 26,619 modified RNA sites across 2,520 transcript isoforms from 1,742 genes. Of these, 56% are consistent with the well-characterized m6A modification. By inferring the identities of these modifications, we assessed differential usage and correlation patterns, revealing a significant co-occurrence between m6A and m5C modifications within the same transcripts. Additionally, some modifications were non-randomly associated with alternative splicing events. This study provides a comprehensive survey of RNA modifications across the transcriptome, demonstrating the utility of in vitro transcription coupled with direct RNA nanopore sequencing to simultaneously detect multiple modifications without the need for additional independent biochemical assays. The protocol is compatible with more complex future experimental designs, for example for assessing differential modification usage between samples.
... 3 Total RNA is extracted from the cell samples by different methodologies like RNA extraction, reverse transcription, spectrophotometry, electrophoresis, PCR (Polymerase Chain Reaction), Sequencing, and gene expression analysis. 4,5 Total RNA molecule population-based epidemiological cohort correlates outcome exposure in a large dataset with a genetic background. 6 The current manuscript integrated high-throughput RNA sequencing (RNA-Seq) and transcriptomic data to provide insight into potential disease mechanisms, identifying specific RNA. ...
Article
A vast number of neurodegenerative disorders arise from neurotoxicity. In neurotoxicity, more than 250 RNA molecules are up- or downregulated. The manuscript investigates the effect of exposure to the organophosphate pesticide chlorpyrifos (COP) on total RNA in murine brain tissue in 4 genotypes, for in silico study of neurodegeneration development. The GSE58103 dataset from the Gene Expression Omnibus (GEO) database is subjected to data preprocessing, normalization, and quality control. Differential expression analysis uses the limma package in R to identify differentially expressed genes (DEGs). The study compared expression profiles from murine fetal brain tissues across four genotypes: PON-1 knockout (KO), tgHuPON1Q192 (Q-tg), tgHuPON1R192 (R-tg), and wild-type (WT). We analyze 60 samples, 15 samples per genotype, to identify DEGs. The significance criteria are an adjusted p-value < .05 and a |log2 fold change| > 1. The study identifies microRNA485 as a potential biomarker of COP toxicity using the GSE58103 dataset. Differential expression analysis shows significant differences for microRNA485 between the KO and WT groups. Moreover, graphical analysis shows sample relationships among genotype groups. MicroRNA485 represents a promising biomarker for developmental COP neurotoxicity identified through in silico analysis.