Figure - available from: Nature Methods
Nano3P-seq can be used to accurately estimate polyA tail lengths in individual molecules. a, PolyA tail length estimates of non-polyadenylated (curlcake 1) and polyadenylated (curlcake 2) synthetic RNAs sequenced with Nano3P-seq. See also Extended Data Fig. 1a–c. nt, nucleotides. b, Schematic overview of the standards used to assess the tail length estimation accuracy of Nano3P-seq. c, Box plots depicting tail length estimations of RNA and cDNA standards sequenced with Nano3P-seq. Values on box plots indicate the median polyA tail length estimation for each standard. d, PolyA tail length distribution of yeast, zebrafish, and mouse mRNAs represented as single-transcript values (left) and per-gene medians (right). e, PolyA tail length estimates across different RNA biotypes from mouse brain total RNA enriched in nuclear/mitochondrial RNA. Each dot represents a read. f, Replicability of median per-gene polyA tail length estimations of zebrafish embryonic mRNAs between two biological replicates for three different time points (2, 4, and 6 h.p.f.). g, Median per-gene polyA tail length distribution of zebrafish embryonic mRNAs across zebrafish developmental stages (2, 4, and 6 h.p.f., shown in blue, green, and red, respectively) in three biological replicates (shown as full lines, dashed lines, and dotted/dashed lines, respectively). h, Comparative analysis of mRNA abundances (shown as log10(RPM) counts) of zebrafish mRNAs binned according to their annotated decay mode (maternal decay, zygotic activation-dependent decay, miR-430-dependent decay, and no decay) during early embryogenesis (t = 2, 4, and 6 h.p.f.). i, Median per-gene polyA tail length estimations of zebrafish mRNAs binned according to their decay mode (maternal, miR-430, zygotic, and no decay) at 2, 4, and 6 h.p.f. For Fig. 3h,i, statistical analyses were performed using the Kruskal–Wallis test. c,e,h,i, The number of observations included in the analysis is shown below each box and violin plot.
Box plot limits are defined by lower (bottom) and upper (top) quartiles. The bar indicates the median, and whiskers indicate ±1.5× interquartile range.
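The per-gene medians and box plot conventions described in this legend can be illustrated with a minimal Python sketch. It assumes per-read tail length estimates are available as (gene, length) pairs. The function names `per_gene_medians` and `box_stats`, the example values, and the inclusive quartile method are illustrative assumptions, not the authors' pipeline. Whiskers are taken here as the most extreme observations lying within 1.5× the interquartile range of the quartiles, which is one common reading of the legend.

```python
from collections import defaultdict
from statistics import median, quantiles


def per_gene_medians(reads):
    """Collapse per-read tail lengths into one median per gene.

    `reads` is an iterable of (gene_id, tail_length_nt) pairs, one per read.
    This input layout is a hypothetical choice made for the example.
    """
    by_gene = defaultdict(list)
    for gene, tail in reads:
        by_gene[gene].append(tail)
    return {gene: median(tails) for gene, tails in by_gene.items()}


def box_stats(values):
    """Summarise values the way the legend describes a box plot.

    Box limits are the lower and upper quartiles, the bar is the median,
    and whiskers reach the most extreme points within 1.5 x IQR of the box.
    Needs at least two values.
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "n": len(data),  # number of observations shown below each box
    }


# Made-up tail lengths (nt) for two hypothetical genes.
example_reads = [("geneA", 20), ("geneA", 30), ("geneA", 100),
                 ("geneB", 0), ("geneB", 10)]
gene_medians = per_gene_medians(example_reads)
summary = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy input the value 100 falls outside the upper fence, so it would be drawn as an outlier and not reached by the whisker.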
Source publication
RNA polyadenylation plays a central role in RNA maturation, fate, and stability. In response to developmental cues, polyA tail lengths can vary, affecting the translation efficiency and stability of mRNAs. Here we develop Nanopore 3′ end-capture sequencing (Nano3P-seq), a method that relies on nanopore cDNA sequencing to simultaneously quantify RNA...
Citations
... Unlike other RNA sequencing methods, dRNA-seq does not require the conversion of RNA to cDNA. It therefore allows for the direct identification of RNA isoforms 2 and associated transcript features such as polyA tail length 3 . This not only provides a more comprehensive view of the transcriptome, it also preserves native RNA base modifications, offering a unique opportunity to study the dynamic epitranscriptome 4 . ...
... This integration enhances WarpDemuX's robustness and allows for precise tuning in diverse dataset conditions. The training data for this class includes fingerprints from outlier reads affected by signal irregularities, such as stalled adapters, blocked pores, or inaccurate DNA/RNA boundaries (see Supplementary Note 1 [3][4][5][6][7]. This superior performance was achieved using just 1% of the training data per barcode (400 instances vs. 40,000), and WarpDemuX's enhanced classification performance over DPC held true for all confidence thresholds tested (Supplementary Table 5). ...
Nanopore direct RNA sequencing (dRNA-seq) enables unique insights into RNA biology. However, applications are currently limited by the lack of accurate and cost-effective sample multiplexing. Here we introduce WarpDemuX, an ultra-fast and highly accurate adapter-barcoding and demultiplexing approach for dRNA-seq with SQK-RNA002 and SQK-RNA004 chemistries. WarpDemuX enhances speed and accuracy by fast processing of the raw nanopore signal, use of a light-weight machine-learning algorithm and design of optimized barcode sets. We demonstrate its utility by performing rapid phenotypic profiling of different SARS-CoV-2 viruses through multiplexed sequencing of longitudinal samples on a single flowcell, identifying systematic differences in transcript abundance and poly(A) tail lengths during infection. Additionally, integrating WarpDemuX into sequencing control software enables real-time enrichment of target molecules through barcode-specific adaptive sampling, which we demonstrate by enriching low abundance viral RNA. In summary, WarpDemuX represents a broadly applicable, high-performance, economical multiplexing solution for dRNA-seq, facilitating advanced (epi-) transcriptomic research.
... FLAM-seq tails poly(A)-selected RNA with guanosines and inosines for full poly(A) tail capture, before priming for retrotranscription and PCR amplification 60 . Nanopore 3′-end-capture sequencing (Nano3P-seq) uses template switching to initiate RT and capture deadenylated molecules and RNAs with other types of tails 61 (Fig. 2f). To avoid poly(A) annealing biases enriching 5′ degradation products 62 , CapTrap-seq modifies the 5′ cap of intact RNA molecules with biotin and uses streptavidin with oligo(dT) priming to detect 5′ capped full-length transcripts 38 (Fig. 2g). ...
Transcriptome sequencing revolutionized the analysis of gene expression, providing an unbiased approach to gene detection and quantification that enabled the discovery of novel isoforms, alternative splicing events and fusion transcripts. However, although short-read sequencing technologies have surpassed the limited dynamic range of previous technologies such as microarrays, they have limitations, for example, in resolving full-length transcripts and complex isoforms. Over the past 5 years, long-read sequencing technologies have matured considerably, with improvements in instrumentation and analytical methods, enabling their application to RNA sequencing (RNA-seq). Benchmarking studies are beginning to identify the strengths and limitations of long-read RNA-seq, although there remains a need for comprehensive resources to guide newcomers through the intricacies of this approach. In this Review, we provide a comprehensive overview of the long-read RNA-seq workflow, from library preparation and sequencing challenges to core data processing, downstream analyses and emerging developments. We present an extensive inventory of experimental and analytical methods and discuss current challenges and prospects.
... CapTrap-seq [16] or template switching, e.g. long read CAGE [17]) but rely on pA+ selection, and (ii) protocols that are not reliant on pA+ RNA 3′ ends but do not aim to accurately capture 5′ ends (e.g. Nano3P-seq [18]). A small subset of such methods is based on direct RNA-seq, e.g. ...
Analysis of transcript function is greatly aided by knowledge of the full-length RNA sequence. New long-read sequencing enabled by Oxford Nanopore and PacBio devices have the potential to provide full-length transcript information; however, standard methods still lack the ability to capture true RNA 5′ ends and select for polyadenylated (pA+) transcripts only. Here, we present a method that, by utilizing cap trapping and 3′-end adapter ligation, sequences transcripts between their exact 5′ and 3′ ends regardless of polyadenylation status and without the need for ribosomal RNA depletion, with the ability to characterize polyadenylation length of RNAs, if any. The method shows high reproducibility, can faithfully detect 5′ ends, 3′ ends and splice junctions, and produces gene-expression estimates that are highly correlated to those of short-read sequencing techniques. We also demonstrate that the method can detect and sequence full-length nonadenylated (pA−) RNAs, including long noncoding RNAs, promoter upstream transcripts, and enhancer RNAs, and present cases where pA+ and pA− RNAs show preferences for different but closely located transcription start sites. Our method is therefore useful for the characterization of diverse capped RNA species and analysis of relationships between transcription initiation, termination, and RNA processing.
... The direct RNA-seq protocol enables sequencing of native RNA, thereby avoiding the reverse transcription and amplification steps, as well as providing information about possible RNA modifications 25,26 . While several long-read RNA-seq datasets have been described, they are low throughput [23][24][25] , lack replicates [27][28][29][30] or cover single conditions 24,31 or individual protocols 25 ; thus, this limits the ability to comprehensively compare and evaluate the different RNA-seq protocols. Here we present the results from the Singapore Nanopore Expression (SG-NEx) project, a comprehensive benchmark dataset and systematic comparison of five different RNA-seq protocols. ...
The human genome contains instructions to transcribe more than 200,000 RNAs. However, many RNA transcripts are generated from the same gene, resulting in alternative isoforms that are highly similar and that remain difficult to quantify. To evaluate the ability to study RNA transcript expression, we profiled seven human cell lines with five different RNA-sequencing protocols, including short-read cDNA, Nanopore long-read direct RNA, amplification-free direct cDNA and PCR-amplified cDNA sequencing, and PacBio IsoSeq, with multiple spike-in controls, and additional transcriptome-wide N⁶-methyladenosine profiling data. We describe differences in read length, coverage, throughput and transcript expression, reporting that long-read RNA sequencing more robustly identifies major isoforms. We illustrate the value of the SG-NEx data to identify alternative isoforms, novel transcripts, fusion transcripts and N⁶-methyladenosine RNA modifications. Together, the SG-NEx data provide a comprehensive resource enabling the development and benchmarking of computational methods for profiling complex transcriptional events at isoform-level resolution.
... To increase the reads that could be used for tail length estimation, replicates of the PAIso-seq results were combined for subsequent analyses unless stated otherwise. We noticed that the tail length measured in TT2 cells appeared shorter than earlier results measured in other cell types (median of 46-47 nt for gene-level tail length in TT2 cells, compared to 87 nt in HeLaS3 49 , 111.5 nt in human iPSCs 49 , 92.3 nt in mouse brain 50 , 107.8 nt in NIH3T3 12 ). To verify the accuracy of our PAIso-seq results, we performed two additional analyses. ...
The mammalian early embryo development requires translation of maternal mRNA inherited from the oocyte. While poly(A) tail length influences mRNA translation efficiency during the oocyte-to-embryo transition (OET), molecular mechanisms regulating maternal RNA poly(A) tail length are not fully understood. In this study, we identified MARTRE, a previously uncharacterized protein family (MARTRE1-MARTRE6), as regulators expressed during mouse OET that modulate poly(A) tail length. MARTRE inhibits deadenylation through the direct interaction with the deadenylase CCR4-NOT, and ectopic expression of Martre stabilized mRNA by attenuating poly(A) tail shortening. Deletion of the Martre gene locus results in shortened poly(A) tails and decreased translation efficiency of actively translated mRNAs in mouse zygotes, but does not affect maternal mRNA decay. MARTRE proteins thus fine-tune maternal mRNA translation by negatively regulating the deadenylating activity of CCR4-NOT. Moreover, Martre knockout embryos show delayed 2-cell stage progression and compromised preimplantation development. Together, our findings highlight protection of long poly(A) tails from active deadenylation as an important mechanism to coordinate translation of maternal mRNA.
... As a result, analyzing the epitranscriptomic landscape for different types of RNA modifications with DRS remains a challenge 49 . Several recent studies have, for example, highlighted the importance of being able to simultaneously detect multiple RNA modifications 50,51 and the growing potential of direct RNA sequencing for investigating diverse RNA species and their modifications, including tRNA 52 , rRNA 53 , and other non-coding RNAs 54 . ...
RNA modifications play a crucial role in various cellular functions. Here, we present ModiDeC, a deep-learning-based classifier able to identify and distinguish multiple RNA modifications (N⁶-methyladenosine, inosine, pseudouridine, 2′-O-methylguanosine, and N¹-methyladenosine) using direct RNA sequencing. Alongside ModiDeC, we provide an extensive database of in vitro-transcribed and synthetic sequences generated with both the new RNA004 chemistry and the old RNA002 kit. We show that RNA modifications can be accurately recognized and distinguished across different sequence motifs using synthetic data as well as in HEK293T cells and human blood samples. ModiDeC comes with a graphical user interface that allows easy customization and adaptation to specific research questions, such as learning and classifying additional RNA modifications and further sequence motifs. The reproducibility across samples, together with the low rate of false positives, underscores the potential of ModiDeC as a powerful tool for advancing the analysis of epitranscriptomes and RNA modification.
... Consequently, it is unable to simultaneously capture information from both the coding and non-coding transcriptome while retaining the polyA tail length information. To overcome these limitations, we recently proposed an alternative RNA sequencing approach, which we termed Nanopore 3' End-capture sequencing (Nano3P-seq) 8 , which can capture both the coding and non-coding transcriptome, as well as provide accurate measurements of RNA abundances, tail lengths and tail composition and heterogeneity, with single molecule resolution, without the need of PCR amplification. ...
... Template switching is a reverse transcription (RT) mechanism by which group II intron-encoded reverse transcriptases, such as TGIRT 8,15 (Ingex) or Induro RT (NEB, cat. no. ...
... (bioRxiv preprint, doi: https://doi.org/10.1101/2024.11.20.624491) Upon PolyTailor, per-read predictions are expected to correlate well with tailfindR estimations used in the previous study 8 ...
RNA polyadenylation is crucial for RNA maturation, stability and function, with polyA tail lengths significantly influencing mRNA translation, efficiency and decay. Here, we provide a step-by-step protocol to perform Nanopore 3’ end-capture sequencing (Nano3P-seq), a nanopore-based cDNA sequencing method to simultaneously capture RNA abundances, tail composition and tail length estimates at single-molecule resolution. Taking advantage of a template switching-based protocol, Nano3P-seq can sequence any RNA molecule from its 3’ end, regardless of its polyadenylation status, without the need for PCR amplification or RNA adapter ligation. We provide an updated Nano3P-seq protocol that is compatible with R10.4 flowcells, as well as compatible software for polyA tail length and content prediction, which we term PolyTailor. We demonstrate that PolyTailor provides accurate estimates of transcript abundances, tail lengths and content information, while capturing both coding and non-coding RNA biotypes, including mRNAs, snRNAs, and rRNAs. This method can be applied to any RNA sample of interest (e.g. poly(A)-selected, ribodepleted, total RNA), and can be completed in one day. The Nano3P-seq protocol can be performed by researchers with moderate experience in molecular biology techniques and nanopore sequencing library preparation, and basic knowledge of Linux bash syntax and R programming. This protocol makes Nano3P-seq accessible and easy to implement for future users aiming to study the tail dynamics and heterogeneity of both the coding and non-coding transcriptome in a comprehensive and reproducible manner.
Key Papers
Beğik O, Diensthuber G, Liu H, Delgado-Tejedor A, Kontur C, Niazi AM, Valen E, Giraldez AJ, Beaudoin JD, Mattick JS, Novoa EM. Nano3P-seq: transcriptome-wide analysis of gene expression and tail dynamics using end-capture nanopore cDNA sequencing. Nature Methods 20, 75–85 (2023). https://doi.org/10.1038/s41592-022-01714-w
Delgado-Tejedor A, Medina M, Begik O, Cozzuto L, Lopez J, Blanco B, Ponomarenko J, Novoa EM. Native RNA nanopore sequencing reveals antibiotic-induced loss of rRNA modifications in the A- and P-sites. Nature Communications 15, 10054 (2024). https://doi.org/10.1038/s41467-024-54368-x
... One of the key limitations of current long-read sequencing techniques is their reliance on oligo(dT)-based RT, which preferentially captures polyadenylated mRNAs and neglects crucial non-polyadenylated transcripts, including lncRNAs, circRNAs and pathogen RNAs lacking poly(A) tails. Effective detection of these non-polyadenylated transcripts has emerged as a critical aspect of transcriptome analysis [48][49][50][51] . Based on our previous efforts in detecting circRNAs 14 , PROFIT-seq adapts a combinatorial RT strategy using double-stranded oligo(dT) and random primers as well as ssN, which enables the simultaneous detection of both non-polyadenylated and circular transcripts without biasing the expression levels of dominant mRNAs. ...
The high diversity and complexity of the eukaryotic transcriptome make it difficult to effectively detect specific transcripts of interest. Current targeted RNA sequencing methods often require complex pre-sequencing enrichment steps, which can compromise the comprehensive characterization of the entire transcriptome. Here we describe programmable full-length isoform transcriptome sequencing (PROFIT-seq), a method that enriches target transcripts while maintaining unbiased quantification of the whole transcriptome. PROFIT-seq employs combinatorial reverse transcription to capture polyadenylated, non-polyadenylated and circular RNAs, coupled with a programmable control system that selectively enriches target transcripts during sequencing. This approach achieves over 3-fold increase in effective data yield and reduces the time required for detecting specific pathogens or key mutations by 75%. We applied PROFIT-seq to study colorectal polyp development, revealing the intricate relationship between host immune responses and bacterial infection. PROFIT-seq offers a powerful tool for accurate and efficient sequencing of target transcripts while preserving overall transcriptome quantification, with broad applications in clinical diagnostics and targeted enrichment scenarios.
... This is essential for identifying cell types (Philpott et al., 2021; Shiau et al., 2023a, 2023b), understanding developmental mechanisms, and profiling transcriptomes. The focus is on differential transcript expression (Padilla et al., 2023), alternative splicing (Liu et al., 2023a; Wu et al., 2023), poly(A) tails (Begik et al., 2023; Liu et al., 2023b), RNA modifications (He et al., 2022), and novel transcript discovery (Glinos et al., 2022). This approach provides a detailed understanding of transcript structure and function. ...
Over the past decade, nanopore sequencing has experienced significant advancements and changes, transitioning from an initially emerging technology to a significant instrument in the field of genomic sequencing. However, as advancements in next-generation sequencing technology persist, nanopore sequencing also improves. This paper reviews the developments, applications, and outlook on nanopore sequencing technology. Currently, nanopore sequencing supports both DNA and RNA sequencing, making it widely applicable in areas such as telomere-to-telomere (T2T) genome assembly, direct RNA sequencing (DRS), and metagenomics. The openness and versatility of nanopore sequencing have established it as a preferred option for an increasing number of research teams, signaling a transformative influence on life science research. As nanopore sequencing technology advances, it provides a faster, more cost-effective approach with extended read lengths, demonstrating the significant potential for complex genome assembly, pathogen detection, environmental monitoring, and human disease research, offering a fresh perspective in sequencing technologies.
... Unlike other RNA sequencing methods, dRNA-seq does not require the conversion of RNA to cDNA. It therefore allows for the direct identification of RNA isoforms [2] and associated transcript features such as polyA tail length [3]. This not only provides a more comprehensive view of the transcriptome, it also preserves native RNA base modifications, offering a unique opportunity to study the dynamic epitranscriptome [4]. ...
... We randomized the RNA in-line barcode to RTA barcode assignment across replicates to circumvent any potential bias introduced by the in-line RNA barcode signal. The generated data sets (1–6) are detailed in Table S1. ...
Nanopore direct RNA sequencing (dRNA-seq) enables unique insights into (epi-)transcriptomics. However, applications are currently limited by the lack of accurate and cost-effective sample multiplexing. We introduce WarpDemuX, an ultra-fast and highly accurate adapter-barcoding and demultiplexing approach. WarpDemuX enhances speed and accuracy by fast processing of the raw nanopore signal, use of a light-weight machine-learning algorithm and design of optimized barcode sets. We demonstrate its utility by performing a rapid phenotypic profiling of different SARS-CoV-2 viruses, crucial for pandemic prevention and response, through multiplexed sequencing of longitudinal samples on a single flowcell. This identifies systematic differences in transcript abundance and poly(A) tail lengths during infection. Additionally, integrating WarpDemuX into sequencing control software enables real-time enrichment of target molecules through barcode-specific adaptive sampling, which we demonstrate by enriching low abundance viral RNA. In summary, WarpDemuX is a broadly applicable, high-performance, and economical multiplexing solution for nanopore dRNA-seq, facilitating advanced (epi-)transcriptomic research.