RNA-Seq - Science method

Questions related to RNA-Seq
  • asked a question related to RNA-Seq
Question
3 answers
I want to identify the different isoforms of a gene. I know that microarrays or RNA-seq can be used, but I would like clarification on these and on any other methods that could be used.
Relevant answer
Answer
In my view, to find the transcript isoforms of a gene you can use:
  1. RNA-Seq (Best method) – Provides high-resolution transcript isoform identification and quantification.
  2. Microarray – Not ideal, as it mainly detects known transcripts and lacks isoform-level resolution.
  3. qRT-PCR – Can validate specific isoforms if primers are designed accordingly.
  4. Isoform-Specific Databases – Use resources like Ensembl, RefSeq, or GENCODE for known isoforms.
Microarrays are not suitable for discovering novel isoforms but can be used to detect known ones. RNA-Seq is the preferred method.
  • asked a question related to RNA-Seq
Question
5 answers
Hello,
I have a basic question. I work with mouse hearts and I want to collect them to do bulk RNA-seq in the future. What is the best way to store them?
So far I have been advised to collect them, rinse them in RNase-free dH2O and store them in RNAlater at -80 ºC until ready for extraction.
Should I mince them before freezing or can I just freeze them as a whole?
Relevant answer
Answer
Airton Pereira e Silva Doing a test-run with samples that aren't worth much and minimizing risk wherever you can is always a good idea. In one of my experiments 12 mouse livers were worth £30k - so I put each of my samples in two separate RNAlater containers stored in two separate freezers... I extracted and analysed each sample twice independently...
Re good RNA integrity - many researchers blame the kit/storage when most likely it is their lab technique that leads to RNA degradation. Again, test-runs will help there...
Immediate submersion in RNAlater is important - I've seen people extract organs from all their mice, carry them to their lab and then put them all into RNAlater there, and also have done tests on how long it takes for degradation to set in if you delay submersion... 5 min was ok (= RIN>9), >30 min was borderline (= RIN~8), >1h was unusable. So I took prepared containers with RNAlater into the mouse unit and extracted each liver, cut them into pieces and immediately added them to the containers. At this stage there's no need to be RNase free as the RNases within the cells are much more likely to be the issue.
The time issue could partly explain how some people get RIN of >9 all the time and others don't... but apparently you also have to really avoid cutting the mouse gall bladder!
I also discussed the time issue with one of the human tissue bank coordinators - apparently they had consistently bad RNA... turns out a theatre nurse would just put the tissue sample or extracted organ from transplant patients on ice and then give it to the tissue bank collector AFTER the surgery finished, so as not to break the sterility of the operating theatre and herself... we solved it by making the collector scrub in to wait for the sample.
Random mouse hearts shouldn't be hard to come by - just ask the animal technician/other researchers if they have any that will be culled or are superfluous to breeding.
  • asked a question related to RNA-Seq
Question
3 answers
Hello everyone,
I am looking for comprehensive resources—books or review articles—that cover transcriptomics and RNA-Seq in detail. Specifically, I am interested in:
Principles and workflows for RNA-Seq experiments.
Methods for data analysis and visualization.
Applications of transcriptomics in research.
Any recommendations for beginner to advanced levels would be highly appreciated.
Thank you in advance for your suggestions!
Best regards
Relevant answer
Answer
Hello Seif,
I have been working on gene expression profiling for many years now. We have detected several biomarkers with our methods and, up to now, 4348 citations of our 231 Research articles. Try https://www.researchgate.net/profile/Joachim-Gruen/research.
  • asked a question related to RNA-Seq
Question
4 answers
I created a stem cell line with a BCR-ABL1 fusion transcript. Sanger sequencing at the DNA level confirms the fusion, but it could not be detected by RNA-seq. At the DNA level the fusion is located in intronic regions of both BCR and ABL1. Cell growth is higher in the mutated cell line with the BCR-ABL1 fusion than in its wild-type counterpart, very likely because of the presence of this cancer fusion gene. Is it possible that the fusion transcript cannot be detected in this case?
Relevant answer
Answer
If the reference genome used for read alignment does not contain the fused gene sequence, you can generate a reference from the expected fusion product and try aligning the reads to it.
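As a rough illustration of that idea, here is a minimal Python sketch that builds a junction "probe" from the expected fusion product and counts reads that span it. Everything in it is an assumption for illustration: the sequences are made-up toy strings (not real BCR or ABL1 exons), and the function names are hypothetical. Note that when the DNA breakpoints are intronic, the junction in the transcript would normally be expected at an exon-exon boundary after splicing, so the probe is built from exon ends rather than from the DNA breakpoint itself.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def junction_probe(upstream_exon, downstream_exon, flank=10):
    """Join the last `flank` bases of the upstream exon to the first
    `flank` bases of the downstream exon (the expected fusion junction)."""
    return upstream_exon[-flank:] + downstream_exon[:flank]


def count_junction_reads(reads, probe):
    """Count reads that contain the junction probe on either strand."""
    rc = reverse_complement(probe)
    return sum(1 for read in reads if probe in read or rc in read)


# Hypothetical toy sequences, NOT real BCR or ABL1 exon sequence.
upstream_exon = "ATGGCCTTAGGCATCGATCGGATTACAGGCTAAC"
downstream_exon = "GGTTCCAAGTCAGTCCATGACCGTTAGGCATTCA"

probe = junction_probe(upstream_exon, downstream_exon, flank=10)
reads = [
    upstream_exon[-14:] + downstream_exon[:14],                      # spans the junction
    reverse_complement(upstream_exon[-12:] + downstream_exon[:12]),  # spans it, other strand
    upstream_exon[:20],                                              # upstream exon only
]
junction_read_count = count_junction_reads(reads, probe)
print(probe, junction_read_count)
```

An exact-match scan like this ignores sequencing errors and alternative exon pairings, so it is only a quick sanity check before aligning against a custom reference as suggested above.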
  • asked a question related to RNA-Seq
Question
2 answers
So - my lab did a bunch of RNA isolations from tissue culture cells, for purposes of RNA-seq, and submitted them to a core facility. We got back a notification that the RINs were too low, and we should submit better quality RNAs. Now, before I sent them off I ran them on a gel, and to my naive eyes, because I don't run RNA gels very often, they looked okay - I could see rRNA bands, and there was no apparent degradation, i.e. smeary bands at the bottom of the lane.
BUT: since the last time we did RNA-seq, which was around the beginning of the pandemic, and now, we have changed our RNA isolation procedure. Previously we used QIAGEN RNeasy, with On-Column DNase, but I find that protocol very tedious and it's ridiculously expensive, so I moved to a method that I'd used in the dark ages (early 2000s), of (1) Trizol lysis -> (2) isopropanol precip -> (3) DNase digestion -> (4) ethanol precipitation. This gave much better yields than RNeasy, and worked great for RT-qPCR. Now, from reading online, I knew that RNA-seq requires higher purity than you get from a precipitation-based Trizol protocol, so for these samples I went from DNase digestion to a column purification kit (EZNA Total RNA Kit I, which multiple papers used for this purpose). The eluates from these are what I ran on the gel, and what I sent to the core facility.
Now, having been told that our RNA sucked, I redid some RNA isolations, doing the Trizol/DNase/EZNA method side-by-side with RNeasy/On-Column DNase, in parallel on the same cells. The image of the gel I ran is linked below, lanes as follows:
1. DNA ladder
2. RNA ladder
3. "positive control" total RNA (lol - don't buy Fisher 43-072-81 for this purpose)
4. one of my old RNA samples, made with Trizol/DNase/EZNA, which gave a RIN score of 4.0
5-6. newly-prepped RNA samples, using Trizol/DNase/EZNA method
7-8. newly-prepped RNA samples, using RNeasy
Looking at the gel, it's very likely that the RNeasy samples (lanes 7-8) would give a higher RIN score, because a big part of the RIN score is the 28S:18S ratio, and there's more 28S rRNA in those lanes. But when I look at lanes 4-6 (Trizol method, old samples vs new), I don't see a clear difference in overall quality from 7-8 - there's no increase in low m.w. RNA, and the overall smeariness looks comparable. Instead, it looks like there was preferential recovery of RNA below the size of the 28S rRNA, which would thus produce a lower RIN score.
So, tl;dr version: is there any reason to think that the Trizol method might favor lower-size RNAs, which could produce an artifactually low RIN score? In other words, could the difference between my samples be explained by size-biased recovery of RNA, rather than degradation? Has anyone here ever seen anything like this? And looking at the gel, would you say that we should re-isolate all our samples using RNeasy (which will be a pain, as there were 24 samples total from 4 different cell lines), or would you expect that RNA-seq should still be interpretable despite the low RINs? (I know, I know, we are already thawing the cells to redo it all because why spend the money when you're in doubt, but I've never been a big fan of the Bioanalyzer methodology and I hate QIAGEN.)
Relevant answer
Answer
I am facing the same problem with TRIzol lysis: I am getting a very low RIN. I also used the Qiagen RNeasy kit, but it had no effect on the RIN. Is TRIzol reagent too harsh for the cells?
  • asked a question related to RNA-Seq
Question
3 answers
In my RNA-seq data, I have on average a >70% duplicate read percentage (for all samples). Any suggestions on how to clean my data to eliminate this bias?
Relevant answer
Answer
Depending on the type of your experiment, read duplication is to be expected and deduplication is not necessarily beneficial. You might want to have a look at this article: https://www.nature.com/articles/srep25533
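To make the term concrete, here is a minimal sketch of a sequence-based duplicate fraction: the share of reads whose exact sequence has already been seen. The function name and the toy reads are assumptions for illustration, and real QC tools may estimate duplication differently (for example from alignment positions or from a subsample of reads).

```python
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads that are exact repeats of an earlier read."""
    counts = Counter(reads)
    return 1 - len(counts) / len(reads)


# Hypothetical toy reads: one highly expressed transcript dominates the library.
toy_reads = ["ACGTACGT"] * 7 + ["TTGACCAA", "GGCATCGA", "CCATGGAT"]
print(f"duplicate fraction: {duplicate_fraction(toy_reads):.0%}")
```

In the toy library a single abundant transcript produces most of the "duplicates", which is the usual reason a high duplication percentage in RNA-seq does not by itself indicate a technical problem.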
  • asked a question related to RNA-Seq
Question
3 answers
The proportion of ncRNAs (especially SRP RNA) in my RNA sample is too high, which hampers RNA sequencing. Is there any way to remove (digest or 'catch') the ncRNAs in the RNA sample?
Thanks for your suggestions in advance.
Relevant answer
Answer
There are two common ways to eliminate most ncRNAs.
1) Ribodepletion (depletion of rRNAs) is used to eliminate rRNAs, which constitute 80-85% of total RNAs. Although this approach eliminates the majority of rRNAs, the samples are still contaminated with tRNAs and other ncRNAs.
2) Enrichment for poly(A)+ transcripts is another approach which eliminates all poly(A)- transcripts from total RNA samples. This approach is highly desirable to study poly(A)+ transcripts, particularly mRNAs. However, one must keep in mind that some poly(A)+ kits do not eliminate all poly(A)- transcripts. Thus, make sure to check the efficiency of elimination. If one is interested in studying lncRNAs, it is important to keep in mind that some lncRNAs are polyadenylated whereas others are not. In this case, ribodepletion would be more appropriate.
  • asked a question related to RNA-Seq
Question
2 answers
I have RNA-seq data that I analysed using limma-voom, and I have extracted the gene IDs, log2FC and p-values. At p-value < 0.05 I have over 10,000 DEGs; however, when I run the GO analysis I get hardly any enriched GO terms at the set pvalue_cutoff of 0.05, because the p-values for the GO analysis are close to 1.00. Is it possible to have this many DEGs and hardly any GO terms? If I should relax the p-value cutoff, what value above 0.05 would still be acceptable?
Thank you!
Relevant answer
Answer
Hello Olatunde Omotoso,
yes, it is possible to find such a large number of significant differentially expressed genes (DEGs). It depends on which two groups you are comparing. Nevertheless, a DEG list based only on p-values, or even corrected p-values, is not very reliable. In chip data analyses we used a multi-parameter method (High-Performance Chip Data Analysis, HPCDA) that includes p-values but also a lot of other relevant data, as can be seen in my profile and the added links. You should analyze your RNA-Seq data in a similar way and then upload the resulting DEGs into GO. Surprisingly, a fold change (FC) cutoff was not advantageous for getting the list of significant DEGs with HPCDA. Using an FC of 4 (log2 FC of 2) could lose a lot of relevant genes, but this depends on the number of individual datasets in both groups.
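One possible reason a very long DEG list gives almost no GO terms can be sketched with the over-representation (hypergeometric) test that many GO tools are based on. The numbers below are assumptions for illustration only: a background of 20,000 genes and a GO term of 200 genes are hypothetical, and your GO tool may use a different test or background. The point is that when about half of all genes are called DEGs, a term needs a large overlap before it differs from what chance alone would give.

```python
from math import comb


def hypergeom_sf(k, n_background, n_term, n_deg):
    """P(overlap >= k) when n_deg genes are drawn at random from
    n_background genes, of which n_term belong to the GO term."""
    upper = min(n_term, n_deg)
    numerator = sum(
        comb(n_term, i) * comb(n_background - n_term, n_deg - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_background, n_deg)


# Hypothetical numbers: 20,000 background genes, a GO term with 200 genes.
# Long DEG list: 10,000 DEGs, 105 of them in the term (100 expected by chance).
p_long_list = hypergeom_sf(105, 20000, 200, 10000)
# Short DEG list: 500 DEGs, 15 of them in the term (5 expected by chance).
p_short_list = hypergeom_sf(15, 20000, 200, 500)

print(f"long DEG list:  p = {p_long_list:.3g}")
print(f"short DEG list: p = {p_short_list:.3g}")
```

If this is what is happening, relaxing the GO cutoff is unlikely to help; tightening the DEG definition (for example using adjusted p-values, or splitting up- and down-regulated genes) is a common alternative to consider.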
  • asked a question related to RNA-Seq
Question
4 answers
My lab is starting to use RNA-seq to characterize gene transcription changes in the gut microbiome, and I am wondering if it is at all feasible to assess transcripts using RT-qPCR? We are trying to assess changes in mRNA levels independent of species, and there are a couple of problems we're encountering.
For one, there is too much sequence divergence across species to use a single primer. So we have tried to design primers that work for individual phyla, but even then it appears as though some species are better suited to the primers than others.
The second problem is finding a suitable reference for quantification. There's no way we're going to make plasmids for 100+ genes, but I don't know if any small number of plasmids would be representative of overall mRNA levels.
Can anyone tell me whether they have thought about this before or successfully done this?  Is it easier/harder than I realize?  Anything I need to know about before I continue?
Relevant answer
Answer
RT-qPCR (Reverse Transcription Quantitative Polymerase Chain Reaction) can be used to analyze the gut microbiome, but there are specific considerations and steps involved in applying this technique to microbiome studies. Here’s how RT-qPCR can be utilized for gut microbiome analysis:
1. Overview of RT-qPCR for Microbiome Analysis
1.1 Purpose:
Quantitative Measurement: RT-qPCR can be used to quantify specific microbial populations or functional genes in the gut microbiome.
Gene Expression: It allows for the measurement of gene expression levels in microorganisms, providing insights into their activity and metabolic functions.
2. Steps for RT-qPCR in Gut Microbiome Studies
2.1 Sample Preparation:
Isolation of RNA: Extract RNA from gut microbiome samples (fecal samples, biopsies, etc.). This step is critical as the quality and purity of RNA can affect downstream results.
RNA Quality Control: Ensure that the RNA is of high quality and free of contaminants. RNA integrity can be assessed using a bioanalyzer or similar equipment.
2.2 Reverse Transcription:
Convert RNA to cDNA: Use reverse transcriptase to convert the extracted RNA into complementary DNA (cDNA). This step is necessary because RT-qPCR amplifies DNA, not RNA.
2.3 Design and Validation of Primers:
Target-Specific Primers: Design primers specific to the target genes or microbial taxa of interest. For gut microbiome studies, these targets might include genes involved in microbial functions or markers for specific microbial groups.
Primer Validation: Validate the specificity and efficiency of the primers using standard curves and melt curve analysis.
2.4 RT-qPCR Setup:
Prepare Reaction Mix: Set up the RT-qPCR reactions using the cDNA samples, primers, and appropriate qPCR master mix.
Thermal Cycling Conditions: Optimize thermal cycling conditions (denaturation, annealing, and extension) according to the primers and master mix used.
2.5 Data Analysis:
Quantitative Analysis: Analyze the qPCR data to quantify the expression levels of target genes or the abundance of specific microbial taxa.
Normalization: Normalize the data against housekeeping genes or other control genes to account for variations in sample input and efficiency.
3. Applications of RT-qPCR in Gut Microbiome Studies
3.1 Microbial Taxa Quantification:
Species Identification: Quantify specific bacterial species or genera in the gut microbiome by targeting unique genetic markers.
Abundance Measurement: Measure the relative abundance of specific microbial populations in different samples or conditions.
3.2 Functional Gene Analysis:
Gene Expression: Assess the expression of functional genes related to metabolic processes, pathogenicity, or antibiotic resistance.
3.3 Comparative Studies:
Disease Association: Compare the abundance or expression of microbial genes between healthy and diseased states to identify potential biomarkers or therapeutic targets.
4. Considerations and Limitations
4.1 RNA Stability:
Handling: RNA is less stable than DNA, so careful handling and storage are required to avoid degradation.
4.2 Primer Specificity:
Cross-Reactivity: Ensure primers are specific to the target organisms or genes to avoid cross-reactivity and inaccurate quantification.
4.3 Sensitivity:
Detection Limits: RT-qPCR may not detect low-abundance microbial populations effectively. Combining RT-qPCR with other techniques (e.g., metagenomics) can provide a more comprehensive view.
4.4 Normalization:
Reference Genes: Use appropriate reference genes or internal controls to normalize the data and account for variations in sample processing and cDNA synthesis.
Summary
RT-qPCR is a powerful tool for analyzing the gut microbiome, providing insights into microbial composition and gene expression. By carefully preparing samples, designing specific primers, and analyzing the data, you can obtain quantitative information about microbial populations and their functional roles. However, due to the complexity and diversity of the gut microbiome, it’s often useful to combine RT-qPCR with other methods like metagenomics for a more comprehensive understanding.
Perhaps this protocol list can give us more information to help solve the problem.
  • asked a question related to RNA-Seq
Question
3 answers
Recently, we observed that 99% of the sequences in our RNA-seq data corresponded to the E. coli genome. Despite multiple DNAse treatments after RNA extraction and ribosomal depletion, we were unable to eliminate the genomic E. coli contamination. We designed a primer targeting a highly abundant E. coli gene present in our RNA-seq data and checked for contamination after each step and reagent. We discovered that the RNase inhibitor from A&A Biotechnology was the source of the contamination. Therefore, I highly recommend against using this product for RNA-seq experiments.
Relevant answer
Answer
So basically this is not a question that requires an answer.
  • asked a question related to RNA-Seq
Question
4 answers
How to get the RNA-seq data for gene expression analysis from SRA data?
Relevant answer
Answer
You can use prefetch from the SRA Toolkit, provided you know the accession number.
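A minimal sketch of what that could look like when driven from Python, assuming the SRA Toolkit is installed and its prefetch and fasterq-dump tools are on your PATH. The accession SRR000001 and the output folder name are placeholders, and the helper function is hypothetical; the sketch only builds and prints the commands, and the line that would actually run them is left commented out.

```python
import shlex
import subprocess  # only needed if you uncomment the run step below


def sra_download_commands(accession, outdir="fastq"):
    """Build the two SRA Toolkit commands: fetch the run, then convert it to FASTQ."""
    return [
        ["prefetch", accession],
        ["fasterq-dump", accession, "--split-files", "--outdir", outdir],
    ]


# Placeholder accession; replace it with the run you actually need.
commands = sra_download_commands("SRR000001")
for cmd in commands:
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to execute (requires the SRA Toolkit)
```

The resulting FASTQ files are the starting point for the usual QC, alignment or pseudo-alignment, and quantification steps.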
  • asked a question related to RNA-Seq
Question
12 answers
Hi everyone,
It's been over 3 months that I have been trying to develop a script for variant calling and RNA-seq analysis for my project. I have attended quite a few workshops, but it feels like a scam. I have nobody who can guide me and I really want to learn the analysis. Can anybody tell me if there are currently any short-term courses for this?
Relevant answer
Answer
This is unfortunately the pain point of the bioinformatics journey, which I've been through myself. I can recommend this book - https://www.biostarhandbook.com/ & have a look at the companion course https://www.biostarhandbook.com/edu/course/6/.
For variant calling (germline/hereditary) look into the GATK workflow https://gatk.broadinstitute.org/hc/en-us/articles/360036194592-Getting-started-with-GATK4
For the Salmon and DESeq2 combination this tutorial will help, but you need to know some Linux & RStudio https://www.hadriengourle.com/tutorials/rna/#import-read-counts-using-tximport
Most freeware is optimized for or available on Linux. Ubuntu is not a bad start. If you need a virtual machine, have a look into Docker or https://www.virtualbox.org/
You should look into developing basic skills in Linux, RStudio and Python to run the tools. Nowadays, you don't really need Java or C-level languages to run the tools. Developing algorithms is a different story though. Keep your determination, I knew nothing in 2017 when I started as well :)
  • asked a question related to RNA-Seq
Question
9 answers
So I'm using TRIzol for RNA extraction, but suddenly I'm getting no pellet during the isopropanol step *even though I added glycogen.*
A few weeks ago I:
1. Took a large quantity of bacteria
2. Resuspended in TRIzol
3. Made a bunch of aliquots
4. Stored them all at -80
A couple of weeks ago, I was getting ~50 ug of RNA per aliquot (with good RINs, and they looked great on RNA gels).
Yesterday, I took another aliquot and tried to prep it using the exact same process, and got nothing. No pellet appeared during the isopropanol precipitation step even though I added GlycoBlue.
I continued the purification to see if the pellet was just hard to see: ~0 ng/uL by nanodrop
Did I just make a pipetting error? Was my isopropanol bad? No: I repeated the process *again* with completely new sample, completely new isopropanol, and still no pellet at all.
I'm confident I'm lysing my samples well. I optimized the lysis a while ago, but even before optimization, my yields were never this bad.
I'm sure I added the GlycoBlue. I watched it diffuse into my sample.
I'm sure I added isopropanol. I watched the alcohol/water mix and used brand new isopropanol.
I'm sure I mixed the isopropanol/aqueous phase. I watched it carefully.
I'm sure I actually spun my samples. I tried it over and over.
Protocol:
1. Take bacteria+TRIzol aliquot from -80 (which used to give ~50 ug)
2. Lyse via bead beating (same beads, same bead beater, same settings, same duration that gave 50 ug in the past)
3. Add 0.2 V chloroform
4. Centrifuge 12k xg for 15 min
5. Take upper, colorless aqueous phase
6. Add 1 V of RT isopropanol
7. Add ~30 ug GlycoBlue
8. Invert a bunch of times to mix well
9. Incubate 10 min RT
10. Centrifuge ~20k xg for 10 min
Nothing. No pellet at all. Even with glycogen.
I tried spinning again: No pellet
How is this possible?
Relevant answer
Answer
I’ve had similar issues recently isolating small quantities of RNA (ribosome protected fragments). I use 300 mM sodium acetate pH 5.2 and 5 mM magnesium chloride followed by ethanol precipitation (2.5 volumes) overnight at -20 C with addition of 1.5 uL GlycoBlue. Spin at 20,000 x g for 1 h at 4 C in Eppendorf low bind tubes. I see good pellets in some samples but not in others. I do invert tubes sufficiently after addition of ethanol. I wonder if vortexing and centrifuging again might help. In some samples it looks like several particles precipitated on the side of the tube and not the bottom. Perhaps such multi-localized pelleting is the issue. Otherwise my best guess is RNase contamination, although I am very careful about using fresh gloves and RNase Away on pipettes and gloves.
  • asked a question related to RNA-Seq
Question
5 answers
I've been trying to extract RNA from mouse lung tissues (normal and tumour) and slowly improving the yield and purity measured via Nanodrop and Qubit (I'm consistently getting 260/280 ratios of ~2 and 260/230 >2 + yields of 40-100 ng/uL). I'm using the Qiagen Mini kit with DNase I and have been using all the small tweaks I can find from here and papers to optimize my yield. I've also been trying to work with RNaseZap and maintain as clean of an environment as possible.
My trouble currently is with RNA degradation- on the Bioanalyzer my samples have come back with RIN values with a range of 5.4 to 6.9 (attached), which from what I've read is unfortunately too low for RNA-seq. I've been using an OMNI tissue homogenizer for 20 seconds x3, is this possibly leading to shearing of my RNA? Has anyone else used a rotor homogenizer on tissue and had good RIN values? I can't think of what else to optimize and I'm wondering if it's potentially a sample issue (the samples are 2-4 years old stored in the -80), but I want to maximize everything I can before coming to that conclusion.
Would another extraction method potentially lead to less degradation? Would Trizol + then running it through an RNeasy column potentially help with this issue?
Relevant answer
Answer
The reason you're getting less yield with gauge needles might be incomplete/partial mechanical breakdown. I see you said that you used 8 (perhaps you meant 18?); perhaps the 8 gauge isn't breaking down enough of the tissue clumps for the 21 and 25 gauge to finish off at a finer level. I would stress that the first gauge is crucial for the success of the finer needles. 21 and 25 are quite tough; I know the 21 gauge was giving me real difficulty/strain on my fingers.
  • asked a question related to RNA-Seq
Question
1 answer
Can anyone please suggest some commercially available kits specifically designed for sequencing RNA viruses? In particular, I am targeting a positive-sense ssRNA virus that does not have a poly(A) tail.
Your help and suggestions will be highly appreciated.
Relevant answer
Answer
  1. NEBNext Ultra II Directional RNA Library Prep Kit (NEB): catalog number E7760S (for 24 reactions)
  2. SMARTer Stranded Total RNA-Seq Kit v2 (Takara Bio): catalog number 634413 (for 96 reactions)
  • asked a question related to RNA-Seq
Question
6 answers
I would like help reading the quality and integrity of my RNA. This picture makes me think all my RNA seems degraded?
I have been trying to extract RNA from mouse lung tumor and normal tissue for the past month, with varied success in terms of concentration and purity from Nanodrop and Qubit (concentrations ranging from too low to measure up to 80 ng/uL). My tissue sizes are very small (about 7 to 20 mm3), so I've been trying to do all I can to maximize my yield, like using RNAlater-ICE and using RNaseZap on my equipment. I currently use an Omni tissue homogenizer for 40 s on my tissue in RLT buffer plus beta-mercaptoethanol. My extractions have been done with the Qiagen Mini kit and I've read all the posts I can find on here and several papers about how to optimize RNA yield with this kit; yet my yields are just averaging about 40 ng/uL of RNA.
I've decided to add the optional DNase digestion step to my extractions, and I wanted to check for gDNA contamination and assess the RNA quality of my extractions by gel electrophoresis since we do not have access to a Bioanalyzer. I've seen that RNA can be run on native gels, so these pictures are of a 1% agarose gel in 1X TBE run at 60 V for 60 minutes, with a DNA ladder to check running of the gel and my RNA samples +/- DNase; samples were mixed with 6x loading dye. Are these images indicative of RNA degradation, or do I need to run a denaturing gel (if so, how do I do that)? There's a dark pinkish band on the bottom half of the gel that's hard to see in the picture but very clear in normal lighting, and I'm not sure what it represents. I definitely don't see bands for 28S and 18S, so I'm feeling kind of hopeless that all my RNA quantities measured by Qubit represent poor quality RNA. I would like to send my RNA for RNA sequencing eventually (not specifically from these samples, which are more for practice).
Thanks so much in advance for reading through and offering any guidance.
Relevant answer
Answer
Yes you could combine the RNA from different columns together.
The Qiagen kit also says you should get 10-20ug of RNA from lungs, assuming 30mg. However, the binding capacity for the column is 100ug. So in theory, you could use 150-300mg tissue and still be under the max binding capacity.
Depending on how the samples were frozen years ago, flash freeze in liquid nitrogen then store at -80 vs just putting them in the -80 to freeze, could definitely contribute to degradation.
  • asked a question related to RNA-Seq
Question
6 answers
I have RNA-Seq data for A. thaliana and I would like to perform clustering based on accessions to observe grouping patterns. What steps should I take? Should I consider interactome, transcriptome, etc?
Thank you! :)
Relevant answer
Answer
Thank you!
  • asked a question related to RNA-Seq
Question
1 answer
Here are some examples of software that can be used for each step of RNA-seq data analysis:
  1. Quality Control: FastQC, PRINSEQ, Sickle
  2. Read Trimming: Trimmomatic, Cutadapt, AdapterRemoval
  3. Alignment: STAR, HISAT2, TopHat
  4. Quality Control of Alignment: Qualimap, RSeQC, Picard
  5. Assembly: Trinity, Oases, Trans-ABySS
  6. Quantification: RSEM, Kallisto, eXpress
  7. Differential Expression Analysis: DESeq2, edgeR, limma
  8. Functional Annotation: Blast2GO, KEGG, Reactome
  9. Pathway Analysis: KEGG Pathway, Reactome, Enrichr
  10. Network Analysis: Cytoscape, STRING, ClueGO
  11. Visualization: IGV, GenomeBrowse, JBrowse
  12. Interpretation: GSEA, DAVID, IPA
Relevant answer
Answer
For the alignment step, I think it's important to mention pseudo alignment (Salmon, Sailfish, Kallisto) and RUM for hybrid alignment to both genome and transcriptome.
  • asked a question related to RNA-Seq
Question
1 answer
Can I align RNA-seq reads to WES data instead of a reference genome to obtain raw read counts?
Relevant answer
Answer
Why do you want to align RNA data against WES? This does not make any sense. WES data is itself normally aligned against the whole genome.
Even if you do, it will take quite a long time, plus you need to take care of splice events as well, which will complicate an already complex step.
  • asked a question related to RNA-Seq
Question
1 answer
I have performed a virome analysis on a single plant (host) and found raw viral sequence data generated by High Throughput Sequencing (HTS). The reviewers of the journal have requested that I submit these raw reads of virome sequences to the NCBI sequence read archive (or comparable database). Could you guide how to deposit these raw reads into the SRA? I would greatly appreciate your assistance on this matter. Thank you
Relevant answer
Answer
Hi Mesele
You need to go to the SRA submission site (https://submit.ncbi.nlm.nih.gov/subs/), log in or create an account, and first create a BioProject. Once you have the BioProject, you'll need to create the samples, and lastly load the data with the SRA download kit. Take care to have a description ready to explain the project and fill in the forms, and be exact with the sample names; they are very case sensitive.
all the best
fred
  • asked a question related to RNA-Seq
Question
4 answers
Hello everyone,
I'm currently working on analyzing RNA seq data and downloaded a fastq file from the NCBI website for practice. After using trimmomatic to trim my data, I faced challenges while attempting to align it with the human genome (hg38) using either Hisat2 or the STAR tool. Specifically, I encountered issues when running the 'make' command to build the Hisat2 executable.
I would appreciate any advice or guidance on resolving these challenges.
Thank you!
Relevant answer
Answer
Use the Galaxy platform.
  • asked a question related to RNA-Seq
Question
7 answers
I have three RNA-Seq datasets of the same tissue and want to analyse them on Galaxy. My initial literature survey gave me the idea that I can merge the three datasets if they are from the same model and tissue followed by making two groups Control and Test and then run the analysis. Am I correct?
Can somebody with more experience elaborate on this?
Or it is a better idea to analyse the three datasets separately and find the common mRNAs?
Relevant answer
Answer
There will likely be significant batch effects. I would analyze each set separately to get a higher power (which will be the case when the variability between the sets is large and won't be compensated by the reduction of standard errors due to the larger sample size).
You might consider pooling the p-values according to Fisher's method, if you need a single p-value per gene.
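For reference, a minimal sketch of Fisher's method using only the Python standard library. The statistic is X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom when the k tests are independent and their null hypotheses hold; for even degrees of freedom the tail probability has a closed form, which is what the code uses. The function name and the example p-values are hypothetical. Keep in mind that the method assumes independent datasets and ignores the direction of the effect, so a gene that is up in one dataset and down in another can still get a small combined p-value.

```python
from math import exp, log


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (each > 0) with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p) and
    combined_p is the chi-square tail probability with 2 * len(pvalues) df.
    """
    k = len(pvalues)
    statistic = -2.0 * sum(log(p) for p in pvalues)
    half = statistic / 2.0
    # Closed form for even df: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total


# Hypothetical p-values for one gene in three separately analysed datasets.
stat, combined = fisher_combined_pvalue([0.04, 0.10, 0.02])
print(f"X = {stat:.2f}, combined p = {combined:.4g}")
```

After combining, the per-gene p-values would still need multiple-testing correction across all genes.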
  • asked a question related to RNA-Seq
Question
3 answers
We have extracted RNA from our fungal strain grown under three different carbon-source conditions. We have received the RNA-Seq data and have carried out differential gene expression analysis (DESeq) between any two of the growth conditions.
Now, apart from DESeq, we are interested in the absolute gene expression levels across all the growth conditions.
I have the raw hit-count files ready in a table: the first column is the gene ID, and columns 2-10 are the three replicates of condition 1, condition 2 and condition 3, respectively.
The next step would be normalization of the read counts and generate the absolute gene expression levels. However, I have limited knowledge of R, in this case, can I do it manually or have to use R to do that? Is there any package (including normalization) I can use? How can I generate a table or even a plot such as heatmap of the top 10 genes?
Thank you very much. Any hint or guide will be very much appreciated.
Relevant answer
Answer
Hello Tiantian Fu,
DESeq2 can generate such files. Usually the best one is the VST-normalized table, where you can plot each gene on its own with very little variation among samples.
The top 10 are the 10 genes with the lowest FDR values in the DESeq2 results table. You can then go back to the VST-normalized table and calculate the Z-scores of those genes. After that you can use GraphPad Prism to plot a heatmap.
Hope this helps
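If it helps, the "lowest FDR, then Z-score" step can be sketched in a few lines of plain Python. This is only an illustration: the gene names, adjusted p-values and VST values below are invented, and in practice both tables would be exported from DESeq2.

```python
from statistics import mean, stdev

def top_genes_by_fdr(padj, n=10):
    """Return the n gene IDs with the smallest adjusted p-values (missing values skipped)."""
    tested = {g: p for g, p in padj.items() if p is not None}
    return sorted(tested, key=tested.get)[:n]

def row_zscores(values):
    """Centre and scale one gene's values across samples (sample standard deviation)."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]

# Hypothetical inputs: adjusted p-values and VST values (three samples) per gene.
padj = {"geneA": 0.001, "geneB": 0.20, "geneC": None, "geneD": 0.0004}
vst = {"geneA": [8.1, 8.3, 10.2], "geneB": [5.0, 5.1, 5.2],
       "geneC": [2.0, 2.0, 2.1], "geneD": [12.0, 9.5, 9.4]}

# Rows of the heatmap: one Z-scored vector per selected gene.
heatmap_rows = {g: row_zscores(vst[g]) for g in top_genes_by_fdr(padj, n=2)}
for gene, z in heatmap_rows.items():
    print(gene, [round(x, 2) for x in z])
```

The resulting rows can be pasted into any plotting tool as the heatmap matrix.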
  • asked a question related to RNA-Seq
Question
2 answers
I downloaded an RNA-Seq dataset from a repository for analysis.
For one tissue, sequencing was done 12 times, so there are 12 different sets of reads and normalized counts. To plot a graph we need a single value per gene. How can we get a single normalized count value for a gene from the different sequencing runs?
Can we take the average or the sum of the CPM values of the individual runs?
Kindly give me your suggestions on this.
Relevant answer
Thank you Govindkumar Balagannavar, for clearing my query. It was really helpful.
  • asked a question related to RNA-Seq
Question
2 answers
Most RNA-seq analysis methods report a log2FoldChange, and qPCR analyzed via the ΔΔCT method also gives a fold change. So I wonder: under ideal conditions, should the two fold changes obtained from these different experiments be roughly the same, or at least agree with each other? If there are always discrepancies between RNA-seq and qPCR results, how big are they in normal cases?
My situation is that the fold change of my samples from qPCR is above 10, while that of the same sample from RNA-seq is only 1.3 (p<0.05, padj<0.05). They do show the same trend, but clearly it is not reasonable to say that they agree with each other. I just cannot understand what is going on and don't know what to do next.
Relevant answer
Answer
Ideally yes the RNAseq and PCR results are in agreement. However, there is often some variation. As you say, you are seeing the same trend of increased expression for your gene of interest. It is possible that the read coverage for your gene is low, or if there are other similar genes, reads may be mapped to homologs. Are you using paired tests of the same sample? There could also be sample to sample variability.
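When comparing the two, it is worth making sure both numbers are on the same scale, since RNA-seq tools usually report log2 fold change while ΔΔCT results are often quoted as linear fold change. A minimal sketch of the ΔΔCT arithmetic, assuming roughly 100% primer efficiency and using made-up Ct values:

```python
import math

def ddct_log2_fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """log2 fold change (test vs control) by the delta-delta-Ct method."""
    delta_test = ct_target_test - ct_ref_test   # target normalised to the reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    ddct = delta_test - delta_ctrl
    return -ddct                                # linear fold change = 2 ** (-ddCt)

# Hypothetical Ct values for a target gene and a reference gene.
log2fc_qpcr = ddct_log2_fold_change(22.0, 18.0, 25.5, 18.2)
print(f"qPCR: log2FC = {log2fc_qpcr:.2f}, fold change = {2 ** log2fc_qpcr:.1f}")
print(f"A 10-fold change corresponds to log2FC = {math.log2(10):.2f}")
```

If the RNA-seq value of 1.3 is a log2 fold change, it corresponds to about a 2.5-fold change on the linear scale, which narrows the gap somewhat but does not close it.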
  • asked a question related to RNA-Seq
Question
2 answers
I possess RNA-Seq data in fastq.gz format and I'm seeking guidance on transforming it into a suitable BigWig (.bw) format.
What steps or tools are recommended to convert FASTQ files to BigWig format for visualization and analysis purposes? Any insights or recommended protocols would be greatly appreciated
Relevant answer
Answer
Hi Mito
you'll need some expertise in bioinformatics to go down this road.
starting from FASTQ files, you'll need to quality-check the reads and align them to a genome, then sort and index the resulting BAM file... once that's done, you can compute a (normalized) coverage track from the alignments and write it out as a BigWig file.
all the best
fred
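For intuition about what ends up in a BigWig track: it stores a per-position signal, typically read coverage computed from the aligned reads. The toy sketch below computes CPM-scaled coverage in bedGraph-style rows from a handful of invented read intervals. It is not a replacement for real tools (deepTools `bamCoverage`, for example, produces a normalized BigWig directly from a sorted, indexed BAM); the function name and data are assumptions for illustration.

```python
def coverage_bedgraph(chrom, intervals, scale=1.0):
    """Turn half-open (start, end) read intervals into bedGraph rows of scaled coverage."""
    events = {}
    for start, end in intervals:
        events[start] = events.get(start, 0) + 1   # a read begins here
        events[end] = events.get(end, 0) - 1       # a read ends here
    rows, depth, prev = [], 0, None
    for pos in sorted(events):
        if prev is not None and depth > 0 and pos > prev:
            rows.append((chrom, prev, pos, depth * scale))
        depth += events[pos]
        prev = pos
    return rows

# Hypothetical aligned reads on a toy chromosome.
reads = [(0, 50), (25, 75), (60, 110)]
cpm_scale = 1e6 / len(reads)   # counts-per-million scaling by total mapped reads
for row in coverage_bedgraph("chr1", reads, cpm_scale):
    print(*row, sep="\t")
```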
  • asked a question related to RNA-Seq
Question
2 answers
I have a single cell RNA seq dataset with 9 clusters covering 3 different cell types. I want to get the information on Gene Ontology and Pathways. Should I do a pseudo-bulk analysis or can I go ahead with the differentially expressed genes (DEGs) obtained from contrasting the two groups in the scRNA seq analysis? Is any R script available for this?
Relevant answer
Answer
Analyzing differential gene expression in single-cell RNA sequencing (scRNA-seq) data involves several steps. Below is a generalized workflow for obtaining differentially expressed genes (DEGs) from a scRNA-seq dataset contrasting two groups:
  1. Data Preprocessing: Quality Control (QC): Assess the quality of your scRNA-seq data by checking metrics such as the number of genes detected per cell, the total number of reads, and the percentage of reads mapping to the genome. Filtering: Exclude low-quality cells and low-expression genes.
  2. Normalization: Perform normalization to account for variability in sequencing depth and other technical factors. Common normalization methods include library size normalization and methods like TPM (Transcripts Per Million) or scran.
  3. Data Transformation: Consider log-transformation of the data to stabilize variance and make it more amenable to downstream analysis.
  4. Identify Highly Variable Genes (HVGs): Identify genes with high variability across cells. HVGs are often more likely to be biologically relevant.
  5. Dimensionality Reduction: Perform dimensionality reduction techniques such as Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the complexity of the data and identify major sources of variation.
  6. Clustering: Cluster the cells to group similar cells together based on gene expression profiles.
  7. Cluster Differential Expression Analysis: For each cluster, perform differential expression analysis to identify genes that are differentially expressed between the two groups. Tools like Seurat, Scanpy, or Monocle can be used for this purpose.
  8. Statistical Testing: Use appropriate statistical tests (e.g., Wilcoxon rank-sum test for non-parametric analysis or t-test for parametric analysis) to determine which genes are significantly differentially expressed between the two groups.
  9. Adjust for Multiple Testing: Correct for multiple testing using methods such as the Benjamini-Hochberg procedure to control the false discovery rate (FDR).
  10. Annotation and Visualization: Annotate the identified DEGs with known gene information and pathway analysis tools to understand their biological significance. Visualize the results using volcano plots, heatmaps, or other visualization tools.
Here's a simplified example using the Seurat package in R:
    # Install and load the necessary library
    install.packages("Seurat")
    library(Seurat)

    # Load and preprocess the scRNA-seq data
    counts <- Read10X(data.dir = "path/to/data")
    seurat_object <- CreateSeuratObject(counts = counts)

    # Normalize, find highly variable genes, scale, and reduce dimensionality
    seurat_object <- NormalizeData(seurat_object)
    seurat_object <- FindVariableFeatures(seurat_object)
    seurat_object <- ScaleData(seurat_object)
    seurat_object <- RunPCA(seurat_object)

    # Cluster the cells
    seurat_object <- FindNeighbors(seurat_object)
    seurat_object <- FindClusters(seurat_object)

    # Differential expression between cluster 1 and cluster 2
    de_results <- FindMarkers(seurat_object, ident.1 = 1, ident.2 = 2)

    # Visualize the results (FindMarkers returns gene names as row names, not as a "gene" column)
    DoHeatmap(seurat_object, features = rownames(de_results), group.by = "orig.ident")
Note: This is a simplified example, and the actual analysis may vary based on the characteristics of your dataset and the tools you choose. Additionally, always consult the documentation of the specific tools you are using for detailed guidance.
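To illustrate step 9 of the workflow above, here is a small, self-contained sketch of the Benjamini-Hochberg adjustment in plain Python. The p-values are made up; in R the same adjustment is available through `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:<6} adjusted = {q:.3f}")
```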
  • asked a question related to RNA-Seq
Question
1 answer
Hi everyone,
Let me introduce my data:
I am analyzing the single-cell RNA seq dataset. I'm gonna find the differential expression (DEGs) from different conditions in each cell type.
In addition, I'm working with Seurat's pipeline. My data is not suitable for pseudo-bulk DEGs analysis, therefore, mixed-model (MAST) is now my choice!
My question is, why do we need to use the normalized data (Scran-normalized, Log-normalized data) as the input of the MAST test, although the MAST test itself has the normalized method for count-depth (by cellular detection rate - CDR)?
Relevant answer
Answer
I just wanna update that I have the answer to this question. Anyone interested in this question can discuss it further.
  • asked a question related to RNA-Seq
Question
1 answer
Dear ResearchGate community,
I am fairly new to RNASeq analysis & wanted to ask for your input regarding accounting for different sequencing depth across my samples. I am aware that there are several normalization techniques (e.g. TMM) for this case, however, some of my samples have considerably higher sequencing depths than others. Specifically, my samples (30) range from 20M to 46M reads/sample in sequencing depth (single-end). Can I still normalize this using the tools provided in the various packages (DESeq2, limma etc) or is it preferable to apply random subsampling of the fastq files prior to alignment (I am using kallisto)?
Many thanks in advance!
Best,
Luise
Relevant answer
Answer
It is preferable to avoid downsampling (randomly reducing the number of reads in some samples).
The normalization methods built into packages such as DESeq2, edgeR, and limma are preferred for handling differences in sequencing depth across samples, because they preserve the maximum amount of information. Downsampling equalizes sequencing depth across all samples but discards reads from the deeper samples and so may reduce statistical power. It may be considered if computational capacity is limited, the data size is very large, or read depths differ by an extreme amount. In most cases, normalization methods are preferable.
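As a rough illustration of how such normalization absorbs a depth difference, here is a plain-Python sketch in the spirit of the median-of-ratios size factors used by DESeq2. It is not the package's implementation, and the counts are invented: the "deep" sample is simply sequenced about twice as deeply as the "shallow" one.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict sample -> list of raw counts (same gene order in every sample)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        row = [counts[s][g] for s in samples]
        if min(row) == 0:
            continue  # genes with a zero count in any sample are skipped
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for s, c in zip(samples, row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of a sample's counts to the per-gene geometric mean.
    return {s: math.exp(median(log_ratios[s])) for s in samples}

# Hypothetical counts for four genes.
counts = {"shallow": [100, 200, 50, 0], "deep": [200, 400, 100, 3]}
sf = size_factors(counts)
for sample, values in counts.items():
    normalized = [round(v / sf[sample], 1) for v in values]
    print(sample, round(sf[sample], 3), normalized)
```

After dividing by the size factors, the first three genes have the same normalized value in both samples, without discarding any reads.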
  • asked a question related to RNA-Seq
Question
1 answer
Dear ResearchGate community,
I have a question regarding the possibility of a batch effect in my single-end bulk RNASeq data set: Some of my samples (10 out of 30) were sequenced 2x due to initial low read count (on two different days, same facility & instruments) and the reads were later concatenated prior to alignment. In your opinion, does this introduce a batch effect which ought to be accounted for?
Many thanks in advance.
Best,
Luise
Relevant answer
Answer
Generally you want to treat all your samples the same way, as minor things can introduce bias (a different person handling the samples, a different batch of reagents, a different flow cell, etc.). If you have any internal controls, use those for QC; if not, use a housekeeping gene to check for possible biases in those 10 samples.
  • asked a question related to RNA-Seq
Question
4 answers
Hello, does anyone have experience in this area? I'm getting a very high OD for my RNA. Is it still possible to do RNA sequencing?
Relevant answer
Answer
Thank you, Mansoor Hayat, for your explanation. Now I understand better.
  • asked a question related to RNA-Seq
Question
1 answer
I am trying to generate single-nuclei RNA-seq libraries from a non-model organism. We have a genome that is not well annotated and somewhat incomplete. I know that for single-cell RNA-seq it is possible to do the bioinformatics analyses with an assembled transcriptome, but I am not sure whether the same holds for single-nuclei RNA-seq.
Relevant answer
Answer
If the genome is incomplete or poorly annotated, a transcriptome-based analysis using a custom-assembled transcriptome is preferred. Hybrid approaches that combine information from both genome and transcriptome data may be considered.
  • asked a question related to RNA-Seq
Question
3 answers
I am not a bioinformatician and don't know much about RNA-seq data analysis. I know that Bulk RNA-seq data must be normalized before differential analysis because differences in starting materials for RNA-seq need to be corrected. The starting materials of scRNA-seq are single cells, and the current normalization method is to assume that the total count of all genes in each cell is equal. But in fact, the overall expression level of each cell is not necessarily equal. Therefore, when comparing the expression level of a gene in the same type of cells that have undergone different treatments, wouldn't it be more reflective of the actual level of the gene expression to directly compare the denoised raw data without normalization? For example, by analyzing the sequencing raw data of C. elegans muscle cell at different ages, we can see that the gene Cox-4 is significantly downregulated with age (see the attached picture), but after normalization, it may no longer show such an obvious downregulation. I'm not sure if I'm correct. I'd appreciate it if anyone could answer my question.
Relevant answer
Answer
In single-cell RNA sequencing (scRNA-seq) data analysis, the preprocessing steps, including denoising and normalization, are essential to ensure that the data is suitable for downstream differential expression analysis. While it may seem intuitive to directly use denoised raw data without normalization for differential analysis, this approach is generally not recommended for several reasons:
  1. Library Size Variation: In scRNA-seq data, cells can have different total read counts due to technical variations, cell size differences, or other factors. This variation in library size can lead to biases in differential expression analysis. Normalization is essential to adjust for these differences.
  2. Scaling Factors: Normalization methods, such as library size normalization (e.g., TPM or FPKM) or more specialized methods for scRNA-seq (e.g., scran or scater), take into account the differences in total read counts and apply scaling factors to make the data comparable between cells. This ensures that you are comparing expression levels relative to the total expression in each cell.
  3. Statistical Robustness: Normalization helps to reduce technical variability while preserving biological variability. It ensures that the statistical tests used for differential expression analysis are more robust and reliable. Directly comparing raw counts can lead to spurious results due to technical noise.
  4. Batch Effects: Even if you denoise the data, you may still have batch effects or other sources of systematic variation that need to be corrected through normalization to identify true biological differences.
  5. Biological Interpretation: Normalized data provides more meaningful results for biological interpretation. The goal of differential expression analysis is to identify genes that are differentially expressed in biologically meaningful ways, not just those that have different raw counts.
In your example with C. elegans muscle cells at different ages, it's important to normalize the data to account for differences in library size between cells. This ensures that any observed changes in gene expression are more likely to reflect true biological differences rather than technical artifacts.
That said, you should still perform differential expression analysis on the normalized data. If a gene like Cox-4 is significantly downregulated with age, the normalization should help highlight this effect by reducing the influence of technical factors.
In summary, while it may seem reasonable to directly use denoised raw data for differential analysis, it's generally recommended to perform proper normalization to account for technical variations and ensure the robustness and biological relevance of your results in scRNA-seq data analysis.
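As a concrete illustration of the library-size scaling discussed above, here is a minimal sketch of per-cell scaling followed by a log transform, similar in spirit to the default log-normalization in Seurat. The gene names and counts are made up: both cells have the same composition, but three times more molecules were captured from the second one.

```python
import math

def log_normalize(cell_counts, scale=10_000):
    """Scale one cell's counts to a common total, then apply log(1 + x)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {g: 0.0 for g in cell_counts}
    return {g: math.log1p(c / total * scale) for g, c in cell_counts.items()}

# Hypothetical cells with identical composition but different capture depth.
cell_a = {"cox-4": 10, "act-1": 40, "other": 950}
cell_b = {"cox-4": 30, "act-1": 120, "other": 2850}

norm_a, norm_b = log_normalize(cell_a), log_normalize(cell_b)
print("raw cox-4:       ", cell_a["cox-4"], cell_b["cox-4"])
print("normalized cox-4:", round(norm_a["cox-4"], 3), round(norm_b["cox-4"], 3))
```

The raw counts differ threefold while the normalized values are identical. The same arithmetic would also remove a genuine global change in total RNA per cell, which is the concern raised in the question.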
  • asked a question related to RNA-Seq
Question
1 answer
Hi all,
I was planning to send RNA samples collected from Arabidopsis root tissue for standard RNA sequencing. All of these samples were processed in the same way except in the DNase removal step. I used the RapidOut DNA Removal Kit (Thermo Scientific) according to the manufacturer’s instructions. To remove the enzyme, I used the DNase Removal Reagent (DRR) that came with the kit on 16 of the 24 samples (before realizing that I would not have enough for all the samples). For the remaining 8 samples, I used EDTA at a concentration of ~4.5 mM and heat-inactivated the DNase at 75 °C for 5 min. For all samples, my RNA concentrations were similar (170-230 ng/µL) and my 260/280 values were greater than 2.0 as measured with a NanoDrop. My 260/230 values were between 1.7 and 2.0 for the 16 samples processed with the DRR and slightly lower (~1.4-1.6) for the samples processed with EDTA + heat-inactivation (which was not surprising, as EDTA absorbs at 230 nm). My question is, will this affect the final outcome of the RNA-Seq in terms of comparing gene expression between samples? Any advice on what to do in this case?
thanks for your scientific insight!
Relevant answer
Answer
General answer: Everything will affect RNA-seq results. More specific answer: compare levels of housekeeping genes between the two methodologies and see if they show any discordance. If not, you are likely good to go with your analysis
  • asked a question related to RNA-Seq
Question
3 answers
Hi!
I have samples containing callus remains and bacteria that were cultured on a callus solution. I want to isolate only the bacterial RNA, and eventually only the bacterial mRNA, for RNA-seq. The problem is that the remaining callus appears to contaminate and dominate the RNA-seq data so far. We would therefore like to isolate the bacterial mRNA and remove all the mammalian material.
If anyone knows something, let me know! :)
Thanks in advance!
Relevant answer
  • asked a question related to RNA-Seq
Question
2 answers
I am writing to inquire about the low assignment ratio (19%) that I obtained using FeatureCounts in my RNA-seq analysis. I would like to confirm whether this is a normal result, and if possible, request your assistance in identifying possible reasons for this issue. To provide some context, I used HISAT2 to align paired-end stranded RNA-seq reads to the GRCh38 reference genome. The overall alignment rate by HISAT2 was 97%, with a multi-mapping ratio of 22% and a unique mapping rate of 72%. Based on this alignment result, I attempted to use FeatureCounts to obtain read counts from the BAM file generated by HISAT2. However, the successful assignment ratio was only about 19%. Thank you for your time and assistance.
Relevant answer
Answer
Samples are the key point in molecular biology and high-throughput technologies. Did you check the quality of your samples before going to library preparation? Low quality (RIN) will give low target assignment.
You can also check quality after sequencing using tools such as MultiQC, but by then the damage is done.
fred
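Before revisiting the wet-lab side, it may also help to see where the unassigned reads went. featureCounts writes a tab-separated `.summary` file next to its output, with one status category per row. The sketch below parses a table of that shape and reports the fractions; the numbers are made up for illustration and the function name is hypothetical.

```python
def assignment_fractions(summary_text):
    """Parse a featureCounts-style summary table (Status<TAB>count) into fractions."""
    counts = {}
    for line in summary_text.strip().splitlines()[1:]:   # skip the header line
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items() if c > 0}

# Made-up numbers, for illustration only.
example = "\n".join([
    "Status\tsample.bam",
    "Assigned\t1900000",
    "Unassigned_MultiMapping\t2200000",
    "Unassigned_NoFeatures\t5500000",
    "Unassigned_Ambiguity\t400000",
])

fractions = assignment_fractions(example)
for status, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{status:<28}{frac:6.1%}")
```

Which category dominates points to different things to check. A large `Unassigned_NoFeatures` share, for instance, is often worth checking against the strandedness setting and whether the annotation matches the reference genome used for alignment.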
  • asked a question related to RNA-Seq
Question
3 answers
I am looking for a tool to easily analyze the expression of different genes during the differentiation of mouse or human pluripotent stem cells to different derivatives, something similar to ChIP-atlas but for RNA-seq.
I know there are repositories with RAW RNA-seq data such as GEO or SRA, which sometimes include the tables with analyzed data. However, in many cases this needs some processing, and sometimes you only get the fastq files.
I wonder if there is some database that feeds from GEO/SRA, where I could look for classified experiments, such as "Neural differentiation of human pluripotent stem cells", and where I could easily plot the expression of a GOI in the different conditions/timepoints.
Thanks a lot!
Relevant answer
Answer
Here are some of them:
  • The Gene Expression Omnibus (GEO) database
  • The Sequence Read Archive (SRA) database
  • The Human Induced Pluripotent Stem Cell Initiative (HipSci) project
  • The Human Pluripotent Stem Cell Registry (hPSCreg) database
  • asked a question related to RNA-Seq
Question
1 answer
I need guidance from research fellows in bioinformatics who know how to obtain miRNA sequences from RNA-seq data from NCBI, for instance with software such as Geneious Prime, as I am a beginner at this (RNA-seq assembly for miRNA). Thank you.
Relevant answer
Answer
Hi Ahmad
before going to the software, be sure the samples were extracted with the right protocol and kits, so that miRNA was either selected for or retained in the whole samples.
fred
  • asked a question related to RNA-Seq
Question
13 answers
Recently, I did an experiment in which RNA was isolated with TRIzol reagent.
The final RNA concentration measured by NanoDrop was around 5-16 microgram/microliter.
The 260/280 ratios were in the range of 1.98-2.1 and the 260/230 ratios were in the range of 2.1-2.3.
Unfortunately, the quality analysis on the Qubit 3 fluorometer showed RIN values below 3, in the range of 1-1.8. I need to do RNA-Seq as the downstream analysis.
A gel was also run, but only a single band was obtained every time.
The concentrations and ratios were in a good range, so I don't understand where I am going wrong. Why did the samples fail QC on the Qubit?
Please help me find the reason and troubleshoot the problem.
Thanks
Relevant answer
Answer
@Can
I simply stored the sample in 50 microliter of RNase-free water at -80.
  • asked a question related to RNA-Seq
Question
7 answers
Hi All
Given the high cost of RNA-seq per sample, do you think it would be acceptable to pool three to four biological replicates and send the pool for RNA-seq?
Relevant answer
Answer
Dear all,
thanks for the insightful commentaries of my colleagues. However, I beg to differ in some detail as this problem goes a bit deeper. (Just my 0.02$)
1. Throwing three samples into one is, of course, a bad idea because you waste resources. Why not make three differently barcoded libraries and then send them for sequencing on one lane? You do not lose information and you can always ask for more reads if your sequencing depth is not sufficient. Thus you save on sequencing costs but keep all the options.
2. Do not make technical replicates. If you master the technique they will be +/- identical. If you have technical problems no replicate will help you anyway.
3. If you run biological replicates I wouldn't use the classical R programs. Most assume that there is a "true" value that you can't measure because of random variation in your method/sample. However, that is not exactly what happens in nature. Imagine you derive three transgenic cell lines with an inducible transcription factor to find target genes. Now you compare TFon/TFoff three times. You get the following values:
geneA       TFon    TFoff
sampleA:    1000    100
sampleB:    100     10
sampleC:    10      1
It is clear that geneA is very interesting. However, if you define samples A, B and C as a triplicate, most analysis programs will throw this gene out because the baseline expression varies more than the overall difference between on and off.
Alas, geneA may be a perfect and important target gene, as you do not control the overall concentration of the transcription factor in the transgenic cells. So it may be perfectly OK that you see this large variation in base expression. Fold change is what counts here. In biology a value very frequently depends on more than one factor, and not all of them can be controlled. Classical statistics fails in these cases.
Therefore I'd recommend running barcoded libraries, evaluating each one individually, and looking for the intersection of genes that come up as interesting in all three instances. Then follow up on these.
In the end no statistics can replace the good old biological confirmatory experiment anyway (although "wet" biology seems to be out of fashion nowadays).
Good luck with your experiment.
Best
Robert
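The geneA example from point 3 can be worked through numerically. The sketch below uses exactly the values from the table in the answer and compares the within-sample log2 fold change with the spread of the TFon values across samples; the function name is just illustrative.

```python
import math
from statistics import stdev

# (TFon, TFoff) values for geneA, taken from the table in the answer.
samples = {"sampleA": (1000, 100), "sampleB": (100, 10), "sampleC": (10, 1)}

def paired_log2fc(pairs):
    """log2(TFon / TFoff) computed within each sample."""
    return {name: math.log2(on / off) for name, (on, off) in pairs.items()}

log2fc = paired_log2fc(samples)
log2_on = [math.log2(on) for on, _ in samples.values()]

for name, value in log2fc.items():
    print(f"{name}: log2FC = {value:.2f}")
print(f"spread of log2(TFon) across samples (SD) = {stdev(log2_on):.2f}")
```

Every sample shows the same 10-fold induction, while the absolute levels differ between samples by as much as the induction itself. That is why treating the three samples as unpaired replicates can hide the effect, whereas looking at each on/off pair separately does not.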
  • asked a question related to RNA-Seq
Question
2 answers
Currently I'm working on a plant genome annotation project. For my plant, I don't have any RNA sequences or ESTs, and there are very few ESTs and little RNA-seq data from closely related species. Is it okay to use EST and RNA-seq data from both closely related and somewhat distantly related species for gene prediction?
If not, are there any other options I can follow instead?
(I'm planning to use protein sequences as well, along with the ESTs and RNA-seq data.)
Relevant answer
Answer
@ Kangkon Saikia Thanks so much for your answer. Yes, I'm using protein sequences as well. It will be really helpful if you can explain this part in your answer a bit more. "For this you have to perform gene prediction first followed by annotation of the CDS/transcripts."
Thanks again!
  • asked a question related to RNA-Seq
Question
1 answer
I am working on a multi-site clinical study and one of our study sites is in India. Blood samples from participants have been collected and stored at -80 in PAXgene blood RNA tubes. The study is nearing completion and I need to arrange for the blood samples (~1000 samples) to be shipped somewhere where RNA extraction can be performed and then a subset will need to undergo RNA-seq. Most of the companies who do RNA-seq will only accept RNA and they do not provide RNA extraction as a service so I will probably need to get the extraction done separately with a different service provider.
Relevant answer
Answer
You can reach out to vikram@aiims.edu; they have a BSL-3 facility.
  • asked a question related to RNA-Seq
Question
1 answer
Dear all,
I am wondering what the state of the art is for detecting m1Ψ in RNA sequences and how it can be differentiated from U. I am thinking of just using RNA-seq and then a separate assay to determine m1Ψ concentration, but ideally I would like to know if an individual base is m1Ψ or U.
Thank you!
Relevant answer
Answer
Oxford Nanopore announced that they are training their base caller to recognise m1Ψ in direct RNA sequencing data at their London Calling 2023 conference. I am unsure of its current availability. You may want to ask on the Oxford Nanopore sequencing Community page for the latest on this topic.
  • asked a question related to RNA-Seq
Question
3 answers
Hi - I'm currently working with two RNA-Seq studies; one has RNA extracted from whole blood, the other PBMCs. Eventually we want to combine these data and perform some cell-specific deconvolution to look at DEGs.
Are there any recommended methods for batch correcting these data from different sources?
Mari
Relevant answer
Answer
It is better to consider batch as a factor in the design formula. The tximport pipeline proposed by Michael Love himself offers the most useful solution. Please have a look.
  • asked a question related to RNA-Seq
Question
5 answers
I have multiple sets of RNA-seq data and I want to compare gene expression between control and treated groups. My interest extends beyond differentially expressed genes; I also want to identify non-differentially expressed genes. I understand that Log2FoldChange and p-adj are commonly used to define differentially expressed genes. Alternatively, genes that fail to meet the criteria for differential expression are considered non-differentially expressed.
However, classifying a gene as non-differentially expressed does not definitively indicate that the RNA-seq data confidently establishes the absence of changes in gene expression. For instance, this could be attributed to substantial within-group variation or low gene counts that hinder unambiguous determination of expression levels. So, how can I effectively distinguish truly non-differentially expressed genes from those exhibiting significant within-group variation or yielding very low counts? Are there any software packages available for this purpose? Alternatively, are there established statistical methods or standards that can guide me in this regard?
Relevant answer
Answer
Chat-GPT really is a jabber-box. Context must be critically reworked by a person knowing the topic. Answers like the one posted by Rana above are not only not useful but also potentially misleading, not to say wrong. But most relevantly, it has nothing to do with a scientific interaction of researchers.
  • asked a question related to RNA-Seq
Question
2 answers
I have a RNA-seq dataset without controls (or you can consider them all controls) and I am interested in an unsupervised ranking or clustering of samples with regard to how they are expressing pathways of interest. I am looking to stratify samples in terms of their pathway activity for specific pathways from the PROGENy resource: https://saezlab.github.io/progeny/ . Would you have any recommendations for how to run this analysis?
Relevant answer
Answer
Hi Jan,
You can try PROGENy scores directly for pathways. You can get pathway activity scores here: https://bioconductor.org/packages/release/bioc/html/progeny.html
You can also consider Gene Set Variation Analysis (GSVA): https://bioconductor.org/packages/devel/bioc/vignettes/GSVA/inst/doc/GSVA.html This can be used as a single-sample GSEA, and you can analyse how the samples differ in your favourite pathways (gene sets) instead of genes. You can find biological outliers or the response to each individual compound.
  • asked a question related to RNA-Seq
Question
2 answers
Hi everyone,
I’m using the RNeasy Plant Mini Kit from Qiagen on Arabidopsis. My concentration is good, and so is the 260/280 value, but I’m having trouble with the 260/230 ratio: in some of my samples it is 0.78 and 1.58. What can I do to improve my 260/230? These samples are to be sent for RNA-Seq.
Relevant answer
Answer
A simple sodium acetate precipitation of the RNA will clean up your A260/230 ratios.  Here is the method I use:
- Add ultrapure water to make up RNA samples to 100 µL in total
- Add 300 µL 100% EtOH and 10 µL 3 M sodium acetate per sample
- Vortex and incubate overnight at -20 °C
- Centrifuge at full speed at 4 °C for 30 min
- Pipette off supernatant.
- Add 500 µL 70% EtOH and mix
- Centrifuge at full speed at 4 °C for 15-20 min
- Pipette off supernatant and air dry for at least 15 min [check that no EtOH remains]
- Resuspend in 30 µL or 12 µL (depending on RNA concentration)
- Enjoy good A260/230 values!
Give it a go, it works really well.  Just remember the RNA pellets will be difficult to see (pellet paint can help).
  • asked a question related to RNA-Seq
Question
3 answers
Hi All,
I have the results of both qPCR CT values and RNA-seq TPM values. Now that I have 2 sets of data, is it proper to compare expression fold change (2^ of delta delta CT) with log2 of TPM values?
Thanks in advance,
Selim Rozyyev
#qPCR, #RNA-sequence analysis, #TPM.
Relevant answer
Answer
qPCR is considered the more sensitive and direct analysis. You use the RNA-seq to get a list of "interesting candidate genes" to then investigate using qPCR. You don't need to do any direct comparison to the original RNAseq data.
You can't know that the high TPM from RNA seq will also have high expression during qPCR. That's why you do the experiment.
  • asked a question related to RNA-Seq
Question
4 answers
My question is: how can I increase the quantity of RNA isolated from cells at the wound edge, in this case keratinocytes? I have done the experiment and the quantity of isolated RNA is still too low for RNA-seq. Does anyone have experience with this and could help me?
Best
Relevant answer
Answer
Yes with the kit, Agilent, and or Qiagen,
No, kit is ok
  • asked a question related to RNA-Seq
Question
2 answers
I have RNA-seq data prepared with PolyA selection and other RNA-seq data prepared from Total RNA. I want to combine these data to increase the sample size for some subtypes for which I have few samples.
How can I normalize these two data sets into one? Is there any method or process that makes it possible to combine PolyA and Total RNA-seq data?
I looked for this information in many articles that work with multiple types of data, but they don't detail how they did it.
Thanks
Relevant answer
Answer
Normalizing RNA-seq data generated from PolyA and Total RNA can be challenging, as they have different properties that can affect their library preparation and sequencing, resulting in differences in the sequencing depth and composition of the resulting reads.
One common method for normalizing RNA-seq data is to use the reads per kilobase per million mapped reads (RPKM) or transcripts per million (TPM) approach. These methods normalize for sequencing depth and gene length and provide a measure of the expression level of each gene relative to other genes within the same sample. However, RPKM and TPM normalization may not be suitable for comparing expression levels across samples with different RNA types, as they assume that the total transcriptome is being sequenced uniformly, which may not be true in the case of Total RNA.
Another method that can be used for normalizing RNA-seq data is to use the trimmed mean of M values (TMM) normalization approach, which is based on the assumption that most genes are not differentially expressed across the samples being compared. TMM normalization estimates a scaling factor for each sample based on the relative expression of a set of housekeeping genes, which are assumed to be stably expressed across all samples.
A related option is the "DESeq2" package in R, which estimates a size factor for each sample with the median-of-ratios method to adjust for differences in sequencing depth and library composition, and then fits a negative binomial generalized linear model. DESeq2 does not model the difference between PolyA and Total RNA libraries by itself; the library type would have to be included as a covariate in the design, and that only works if library type is not completely confounded with the groups being compared.
In summary, while RPKM or TPM normalization can be used for normalizing RNA-seq data generated from PolyA and Total RNA, it may not be suitable for comparing expression levels across different RNA types. TMM and the DESeq2 size factors are general between-sample normalization methods; neither removes the differences between library types on its own, so the library type should be treated as a batch variable when the two kinds of data are analyzed together.
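To make the size-factor idea concrete, here is a minimal sketch of the median-of-ratios calculation on which DESeq2's default size factors are based. It is a simplified pure-Python illustration, not the package's code; the function name and the toy count matrix are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Only genes with non-zero counts in every sample can be used,
    # because the geometric mean is computed on the log scale.
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in gene) / n_samples for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - lg for gene, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
toy_factors = size_factors(toy_counts)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, but only under the assumption that most genes do not differ between samples, which is exactly what may be violated when poly(A) and total RNA libraries are mixed.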
  • asked a question related to RNA-Seq
Question
2 answers
I am trying to use the Ovation RNA seq V2 kit from Nugen or Tecan to amplify low-input RNA (about 1ng). (https://lifesciences.tecan.com/ovation-low-input-rna-seq-kit-v2?p=tab--1).
The Protocol and workflow of the SPIA amplification process show linear amplification of cDNA, which is single-stranded (As per the schematic provided by them). However, the kit claims it generates double-stranded cDNA, which is not shown in the figure or explained how it is done!!!
Can somebody please help me understand this?
How does the Ovation RNA seq v2 kit generate double-stranded cDNA instead of single-stranded cDNA??!!
Nugen/ tecan suggests using their library preparation kits (for Illumina) which can utilize only dsDNA templates but not single-stranded.
They seem to work well, but I don't understand how the dsDNA is generated in the amplification step (RNA Seq V2 kit).
Please help.
Relevant answer
Answer
Thank you, Wilhelm,
That still did not address the main question, though. Please check again.
After 2nd strand synthesis step, there is an SPIA amplification step, SPIA primer can bind to only the second strand and amplify only that strand. The first strand does not have the SPIA primer binding site, so it does not get amplified.
This may lead to a high quantity of single-stranded cDNA (due to SPIA amplification) and very low double-stranded cDNA if at all there is any.
So the question is: how can the manufacturer claim that the product will be double-stranded cDNA, when what is shown in the diagram can generate only one strand?
  • asked a question related to RNA-Seq
Question
2 answers
I've generated an enhancer knock-out in mice. I then did qPCR of the gene it potentially regulates and saw that it is 25% down-regulated in the homozygous knock-out. So I decided to do RNA-Seq to analyze other genes, but in the RNA-Seq data the gene isn't differentially expressed. I don't understand why, or which technique I should believe. The samples used for the qPCR and RNA-Seq are not the same, but are the same genotype, same age, same tissue. The only technical difference is that I did Trizol extraction for the qPCR, but for the RNA-Seq I did RNA column extraction.
Relevant answer
Answer
As Péter Gyarmati said, the extraction method might introduce an important bias in the quantification results. How did you define "differentially expressed genes"? A 25% decrease corresponds to a log2 fold change of about -0.4, which will not pass the fold-change thresholds commonly applied in RNA-Seq analysis. In future experiments, it could be a good idea to add spike-in controls to the RNA-Seq samples to compare the results with RT-qPCR.
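A short worked example of the fold-change point, assuming a cutoff of |log2 fold change| >= 1 (a two-fold change). That cutoff is a common convention, not something stated in the question, and the function name is illustrative.

```python
import math


def log2_fold_change(treated, control):
    """log2 of the expression ratio treated/control."""
    return math.log2(treated / control)


# A 25% down-regulation means the knock-out expresses 0.75x the wild-type level.
lfc_knockout = log2_fold_change(0.75, 1.0)

# Assumed cutoff: |log2FC| >= 1, i.e. at least a two-fold change.
passes_cutoff = abs(lfc_knockout) >= 1.0
```

Here `lfc_knockout` is about -0.415, so the gene would be dropped by a two-fold cutoff even if the adjusted p-value were small.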
  • asked a question related to RNA-Seq
Question
4 answers
I want to ask a technical question. I have treated and non-treated male and female RNA-seq samples, and I want to analyze sex-biased gene expression. My concern is whether I should compare the male vs female samples, or male vs normal and female vs normal, when analyzing sex-biased gene expression using DESeq2.
Relevant answer
Answer
Honestly, DESeq2 takes almost no time to run: it's by far the easiest part of the entire pipeline.
So...do it all. All comparisons.
Male treated vs male untreated -> effects of treatment on males specifically
Female treated vs female untreated -> effects of treatment on females specifically
Male and female treated vs male and female untreated -> sex-agnostic effects of treatment
Male untreated vs female untreated -> sex-specific differences under normal conditions
Male treated vs female treated -> sex-specific differences under treated conditions
male (all) vs female (all) -> treatment-agnostic effect of sex
What you'd hope is that the same genes would crop up under the same sort of comparisons (i.e. the best hits for male treated vs male untreated would be the same as those for female treated vs female untreated), and any genes that bucked the trend by being highly altered in males but not females (or vice versa) would also pop out of the other datasets, allowing you to determine whether the differences reflect sex alone, or are a consequence of sex + treatment.
RNAseq data always gives you far, far more candidates than you can realistically handle, so you can be quite aggressive in your culling of "interesting but not that interesting" candidates, but making multiple comparisons in this manner allows you to interpret your results with more nuance. And it's really quick and easy to do.
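The comparisons listed above can be written as differences of log2 fold changes. The sketch below uses made-up mean expression values for a single gene to show how the treatment effect within each sex relates to the sex difference within each condition. All numbers and names are hypothetical.

```python
import math


def log2fc(a, b):
    """log2 fold change of a relative to b."""
    return math.log2(a / b)


# Hypothetical normalized mean expression of one gene in each group.
means = {
    ("male", "treated"): 400.0,
    ("male", "untreated"): 100.0,
    ("female", "treated"): 200.0,
    ("female", "untreated"): 100.0,
}

treatment_in_males = log2fc(means["male", "treated"], means["male", "untreated"])
treatment_in_females = log2fc(means["female", "treated"], means["female", "untreated"])
sex_untreated = log2fc(means["male", "untreated"], means["female", "untreated"])
sex_treated = log2fc(means["male", "treated"], means["female", "treated"])

# Sex-by-treatment interaction: how much more the gene responds in males.
interaction = treatment_in_males - treatment_in_females
```

In this toy case the gene shows no sex difference in untreated animals but a two-fold difference after treatment, and the interaction equals `sex_treated - sex_untreated`. That is the kind of gene that "bucks the trend" in the sense described above.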
  • asked a question related to RNA-Seq
Question
3 answers
RNA-seq analysis
Relevant answer
Answer
  • asked a question related to RNA-Seq
Question
1 answer
Hi everyone,
please bear with me, because I am a complete beginner with regard to any form of bioinformatics and I am trying to understand the best approach to my experiment.
I am currently trying to isolate cells and sequence them for further bioinformatic analysis, more precisely RNA-Sequencing.
We have, however, had issues with purity and while some samples we looked at reached a purity of >90% after isolation (we usually validate it by use of flow cytometry), some samples of different animal genotypes did not.
This leads me to my first question:
How important is cell purity for Bulk RNA-Seq?
Which purity should be reached for an adequate, reliable analysis?
If anyone has any recommendations for papers to look into regarding that subject, I would be most grateful, because I have no idea where to start and what to consider.
Further along in the story we surmised that maybe Single Cell RNA Sequencing might be the better option in cases of lower purity.
But again, the same question arose: how relevant is cell purity for the following analysis and is there a cut-off value not to be crossed?
Finally:
How advantageous would using both methods be?
Sure, Bulk gives a better general overview and Single Cell is more precise, but do they complement each other or is it essentially redundant information gained by doing both experiments?
And are there any disadvantages to using only SC, or do both methods complement each other when low purity levels are in question?
Thank you a lot in advance!!
Relevant answer
Answer
Welcome to RNA-seq! It's a crazy and wild world. You will find that responses will depend a great deal on what you're aiming to achieve. So take my responses with this in mind...it depends!
Get as close to 100% as possible, otherwise you'll have to perform a set of validation experiments to ensure that any interesting findings are due to changes in your cell type of interest and not in a "contaminating" cell type. Single cell RNA-seq may be suitable here since you'll be able to get some cell type resolution and identify the populations in which the change is occurring; of course you'll still need to validate. The strength of scRNA-seq is that you don't need to purify/enrich your population since these get resolved as part of the procedure/analysis. However, a drawback of scRNA-seq is that you will lose a lot of low-abundance transcripts; "dropout" is also a major issue. So, if you're comfortable losing some info on potentially valuable transcripts, then scRNA-seq may be the way to go. They do potentially complement each other, especially because with bulk you may get data about lowly expressed transcripts. But a big caveat: it all depends! You may consider identifying and collaborating with someone with expertise in RNA-seq (sample prep and data analysis) at your local institution.
Which papers? It depends. Start with papers that are answering a similar question to yours, then dig into what would be best for your study. You can consider reaching out to representatives of companies like 10X Genomics and Miltenyi Biotec...that's also a good starting point. Good luck!
  • asked a question related to RNA-Seq
Question
2 answers
Please share the protocol
Relevant answer
Answer
Just to confirm before sending it for library prep.
  • asked a question related to RNA-Seq
Question
4 answers
Hello everyone!
I have an interesting question asked by my professor and I could not find a relevant answer anywhere.
Why are we seeing an up-and-down pattern in transcript abundance? Example RNA-seq data for a gene from a rice transcriptome database is attached. The LOCUS ID is highlighted in yellow and the transcript abundance is shown below for three samples after drought treatment.
The question is: why is the signal level not uniform across exons? Is it due to low read signal? Why are there gaps or sudden drops in the signal (marked with red arrows)? How should I read and understand this? I know this is a common pattern in RNA-seq data, but I don't know why. It's an interesting question asked by my professor! Can any bioinformatician help me understand this? Thanks in advance.
Relevant answer
Answer
I am not an expert in this by any means, but I have read a lot and have seen this type of data and interpreted it before as well. I can give you what I know from my experience and others may chime in.
It is a readout of the reads that correspond to that particular site; it might be referred to as base-resolution expression of the particular sequence. Essentially, the higher the number of reads that coincide with that particular sequence, the higher the score. The dips could be areas that were difficult to resolve for all kinds of reasons. 1) The sequence has a lot of repeats. If that were the case, you would see the same pattern in the other two samples, but that does not seem to be the case. These areas might be resolved better if you increase the read depth of the study.
2) It may be more suggestive of a difficulty in reading them. These results may also be affected by post-transcriptional modification of the RNA.
This paper describes this in clinical samples, but the effect of such modifications is not restricted to humans:
Sci Adv. 2021 Aug; 7(32): eabd2605.
Published online 2021 Aug 4. doi: 10.1126/sciadv.abd2605
PMCID: PMC8336963
PMID: 34348892
Judging by the fact that the titles of the samples say drought, I would suspect a more epigenetic effect (modification of the RNA, maybe due to stress).
  • asked a question related to RNA-Seq
Question
4 answers
Current research often uses new, next-generation, "flashy" experimental techniques (i.e. single-cell RNA-seq) that have replaced some of the older, smaller, yet fundamental experimental techniques. Many of these new-age techniques seem overused and expensive when older techniques could be an adequate replacement. What are some good examples of these new "answer-all" techniques and how were these techniques done previously with smaller, fundamental techniques?
Relevant answer
Answer
I nominate DNA methylation clocks to measure aging. The problem is not that it's expensive, but that they don't work in the context of intervention. People need to double-check on whatever their methylation clock is saying with a good old life span study. But nobody has time for that anymore, and if they did it, they wouldn't like the result. Instead, they just take the clock on faith.
  • asked a question related to RNA-Seq
Question
3 answers
We want to perform a human RNA extraction from cell culture for an RNA-seq, but we have a viral RNA extraction kit (Quick-RNA™ Viral Kit-Zymo research) available. Therefore, we want to know if any methodological issues can interfere with the results if we use the viral kit.
Relevant answer
Answer
It is generally not recommended to use a viral RNA extraction kit for the isolation of human RNA, as the kit is specifically designed for the isolation of viral RNA and may not efficiently extract human RNA. Additionally, the reagents and protocols used in the kit may not be optimized for the isolation of human RNA and may lead to poor quality or quantity of RNA. It is recommended to use a kit specifically designed for the isolation of human RNA, such as the Quick-RNA™ MicroPrep Kit-Zymo Research.
  • asked a question related to RNA-Seq
Question
1 answer
I am doing total RNA extraction from PAXgene blood RNA tubes (6.9 ml of storage buffer + 2.5ml of collected blood in each tube) using the PAXgene blood RNA kit. I just want to extract the total RNA from a portion of the blood sample (around 4.5ml of the above combination) collected in PAXgene blood collection tubes. is there anyone who extracted total RNA from PAXgene blood RNA tubes? I will be glad if anyone has an answer for it.
Relevant answer
Answer
Yes, it is possible to extract total RNA from PAXgene blood RNA tubes using the PAXgene blood RNA kit. The PAXgene blood RNA kit is specifically designed for extracting RNA from blood samples that have been collected and stored in PAXgene blood collection tubes.
To extract total RNA from a portion of the blood sample collected in a PAXgene blood RNA tube, you will need to follow the manufacturer's instructions for using the PAXgene blood RNA kit. This typically involves the following steps:
  1. Thaw the PAXgene blood RNA tube: The first step is to thaw the PAXgene blood RNA tube. This should be done slowly at room temperature or in a water bath at 37°C.
  2. Remove the desired volume of blood: Once the PAXgene blood RNA tube is thawed, you can remove the desired volume of blood. For example, if you want to extract total RNA from a 4.5 ml portion of the blood sample, you can use a pipette to remove this volume of blood from the tube.
  3. Prepare the lysis buffer: Next, you will need to prepare the lysis buffer according to the manufacturer's instructions. The lysis buffer is used to break open the cells and release the RNA.
  4. Add the lysis buffer to the blood sample: Once the lysis buffer is prepared, you can add it to the blood sample that you removed from the PAXgene blood RNA tube. Mix the lysis buffer and the blood sample thoroughly to ensure that the cells are evenly lysed.
  5. Purify the RNA: The final step is to purify the RNA from the lysed cells using the PAXgene blood RNA kit. This typically involves a series of steps such as centrifugation, filtration, and precipitation to remove contaminants and purify the RNA.
It is worth noting that the specific procedures and reagents used in the PAXgene blood RNA kit may vary depending on the specific kit that you are using. Therefore,
  • asked a question related to RNA-Seq
Question
4 answers
I work in the field of cancer research and human disorders using a bioinformatics approach. These projects include the analysis of transcriptomic data such as microarray, RNA-seq and TCGA data, systems biology analysis, survival analysis, etc. Metagenomic analyses in the microbiome field are also conducted. Those interested in participating in analyses and writing articles are invited to send their CV to the email below.
Relevant answer
Answer
How does your institute financially support the applicants?
  • asked a question related to RNA-Seq
Question
3 answers
What is the difference between Hiseq and Novaseq RNA-seq data and how to analyze them together?
Relevant answer
Answer
Well, there is no method or procedure that can be used universally to normalize the data in the way you are looking for.
Normalization is done across the different samples/conditions within a dataset, not across or between multiple datasets. That is what the paper mentioned above deals with: a normalization methodology for a single dataset.
Further, it's not just that the datasets are different; they were generated on different platforms. Thus, in addition to the batch effect, there can be several other factors that should be considered before normalization.
  • asked a question related to RNA-Seq
Question
5 answers
Hello everyone, I am checking the quality of some RNA-seq data with FASTQC and I am getting results that are not clear to me. Is this kind of result normal?
Relevant answer
Answer
The plot shows that the average quality per base along your 150 bp reads is very high, which is good. This kind of result is normal when the sequencing has been outsourced. Most companies will give you prefiltered fastq files containing only high-quality reads. You can ask your sequencing provider if that was the case, although sometimes you can also find that info in the report they send together with your fastq files.
  • asked a question related to RNA-Seq
Question
8 answers
Hello everyone,
I am looking for an open dataset to verify the results obtained by analyzing the total RNA-seq of patients in the TARGET-AML project (GEO search was not successful).
The dataset should include:
1) bone marrow RNA-seq of pediatric patients with non-relapsed AML;
2)bone marrow RNA-seq of pediatric patients with relapsed AML (primary tumor BEFORE relapse);
3)clinical data (relapse-free time, for example) - optional.
If you know where to find a dataset like this, I would be very thankful.
  • asked a question related to RNA-Seq
Question
3 answers
While trimming adapters and low-quality bases from Illumina paired-end RNA-Seq reads with Trimmomatic, about 40 to 50% of the read pairs ended up as "forward only surviving". This study aims to estimate transcript abundance (DEGs) under various conditions. How should I continue?
1. USE singleton reads (R1-For only)
or
2. Only use both paired (survive) high quality reads (50% of the reads)
Any suggestion, Thanks in Advance
by, Ellango R.
Relevant answer
Answer
Thanks Abhijeet & Sonja.
After checking the data more closely: the forward reads have many overrepresented Illumina Nextera adapter/index sequences, and the reverse reads are highly overrepresented in poly(G) sequences. In the end we dropped this data and asked the vendor to redo the sequencing.
  • asked a question related to RNA-Seq
Question
7 answers
I've run RNAseq and qPCR on a set of genes, and while the log2 expression values are consistent between the tests for most of the genes in the set, there are a handful that appear to be upregulated according to the RNAseq results and downregulated according to the qPCR results (and vice versa). Is there any possible reason that could explain this, other than just human error?
Relevant answer
Answer
In NGS, reads have to be assigned to genes somehow. Reads do not always match a single unique gene, so how many reads are attributed to a gene partially depends on the mapping algorithm.
Further, genes may be expressed in different variants, and the presence or absence of an exon may impact the number of reads attributed to that gene.
For genes that are not regulated very strongly, the method used for normalization can impact the eventually observed direction of regulation.
qPCR is sensitive to the assay performance and the stability of the chosen reference genes. If amplification efficiencies are not ideal, results may change depending on the absolute Ct values of the gene. However, these are eventually "human errors" of not properly validating the qPCR assays and reference genes.
These are possible reasons that come into my mind. There may be other reasons.
  • asked a question related to RNA-Seq
Question
1 answer
Hello,
If you could help me make the R2 (R-squared) plot for RNA-seq, that would be greatly appreciated.
I have the numerical data, but I don't know how to plot it (I work in yeast).
Thanks
Relevant answer
Answer
U can mail me here nkrdas2@gmail.com
  • asked a question related to RNA-Seq
Question
4 answers
Hello everybody,
I am planning a RNA-seq experiment on neonatal rats ventricular myocytes cells (NRVMs), and I was wondering how many cells do I need to have per sample to extract a sufficient amount of RNA. I need 1ug of total RNA.
Thanks in advance!
Relevant answer
Answer
It depends on the cell type, but the average amount of RNA is 2-10 pg per cell...
all the best
fred
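As a back-of-the-envelope check, the 2-10 pg per cell figure can be turned into a cell number for the 1 µg target. The sketch below assumes a hypothetical `yield_fraction` parameter for extraction losses, which is not part of the answer above.

```python
import math


def cells_needed(target_ng, pg_per_cell, yield_fraction=1.0):
    """Minimum number of cells to obtain target_ng of total RNA.

    yield_fraction is an assumed extraction efficiency (1.0 = no loss).
    """
    target_pg = target_ng * 1000.0
    return math.ceil(target_pg / (pg_per_cell * yield_fraction))


# 1 ug = 1000 ng of total RNA.
cells_at_10_pg = cells_needed(1000, 10)   # RNA-rich cells
cells_at_2_pg = cells_needed(1000, 2)     # RNA-poor cells
cells_with_loss = cells_needed(1000, 2, yield_fraction=0.5)
```

With no losses this gives between 100,000 and 500,000 cells. If half of the RNA is lost at the low end, it gives one million, so planning for roughly a million cells per sample leaves a margin.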
  • asked a question related to RNA-Seq
Question
5 answers
Hello everyone,
I've been having some trouble isolating bacterial RNA from a gram-positive organism for an RNA-Seq analysis. My problem is that I always get a very intense "cloud band" on the agarose gel around the position where the 5S RNA band should be. I've tried several protocols and kits, with and without bead beating, Trizol, lysozyme, but it happens every time. The first idea was that these are degradation products, but then again the intensity of the 23S and 16S bands clearly remains very high. Also, on a Bioanalyzer this 5S band definitely does not look like degradation, but rather like a sharp peak around 127 nt. Does anyone have any experience with that? If this is in fact the 5S rRNA, why do I get such accumulation, how should I get rid of it, and would it interfere with my RNA-Seq results?
Thank you all in advance!
Relevant answer
Answer
Hello Antony, which protocols have you used? In all of them you get the same result shown in the photo?
  • asked a question related to RNA-Seq
Question
1 answer
Please tell me any open human databases with RNA sequencing and full genome/exome sequencing other than 1000 genomes. Preferably a healthy sample (not cancer patients).
Relevant answer
Answer
GTEx, gnomAD, COSMIC, TCGA, Human Protein Atlas, Ensembl
  • asked a question related to RNA-Seq
Question
6 answers
-RNA seq and bioinformatics were carried out by professionals.
- Gene in question shows ~700 fold differential regulation by qPCR in multiple independent cohort of experiments - not in RNA seq.
Please advise....
Relevant answer
Answer
The large fold-change indicates that the gene is likely not expressed at a high level. Under control conditions the expression can be almost zero, and a slightly higher expression under treatment conditions will result in a very large fold-change.
Low-expressed genes give only low or no counts in RNA seq. It might be that genes with no or very few counts are filtered out from the analysis, because the counts are not reliable. If the gene is not at all detected under control conditions (0 counts in all control samples), it is not possible to calculate any (finite) fold-change at all.
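The effect of near-zero control counts can be illustrated with a few lines of arithmetic. The pseudocount below is a common workaround that is assumed here for illustration; it is not something the answer prescribes, and the counts are invented.

```python
import math


def log2fc(treated, control, pseudocount=0.0):
    """log2 fold change, or None when the ratio is not finite."""
    t = treated + pseudocount
    c = control + pseudocount
    if t == 0 or c == 0:
        return None  # division by zero or log of zero: no finite fold change
    return math.log2(t / c)


undefined_lfc = log2fc(7, 0)                    # gene undetected in controls
shrunk_lfc = log2fc(7, 0, pseudocount=1)        # (7 + 1) / (0 + 1) = 8-fold
large_lfc = log2fc(700, 1)                      # ~700-fold from a single control count
```

A gene with one count in controls and 700 in treated samples shows a ~700-fold change, but so few control counts are noisy, which is why such genes are often filtered out before testing.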
  • asked a question related to RNA-Seq
Question
1 answer
A project in my lab involves single cell RNA-seq data analysis of mouse whole lung samples. However, when we analyzed and clustered the dataset and searched for markers to annotate the group, there appears to be few to no cells exhibiting the classical epithelial cell markers EPCAM and CDH1.
Each sample has around 8000 cells after filtering for mitochondrial content, and the overall quality seems fine. But over half of the cells appear to be immune cells and there were less than 100 epithelial cells for each sample.
The mouse models are established by a collaborating group, and whole lung samples are sent to a sequencing company (travel time about 3 hours minimum) to generate the scRNA data. Our collaborators have adjusted experimental protocols multiple times to increase cell viability (~85%), but we are having difficulty fixing this lack of epithelial cells.
Does anyone have some experience with this, or know why there would be so few epithelial cells in scRNA-seq data of mouse lung samples?
Both my lab and our collaborators are fairly new to handling scRNA-seq data, so any insight would be helpful.
Relevant answer
Answer
Hi Yeogha,
Firstly, what proportion of your cells do you expect to be epithelial? From my understanding of human lungs, a large proportion of the cell types are mesenchymal, immune, or endothelial - so your results might not be that unreasonable.
Secondly, have you tried any of the canonical markers of the epithelial subtypes? p63, KRT5, MUC5AC, SCGB1A1, FOXJ1 for example?
In saying that, I know very little about scRNA-seq too, so I'm sure someone else will have a much more sensible answer!
All the best,
Sam
  • asked a question related to RNA-Seq
Question
2 answers
I've done RNA-seq analysis on a dataset downloaded from GEO looking at immune gene expression in Asthmatic, COPD and normal epithelial lung cells. Trying to do a t-test for my statistical analysis, but I need to group my data into Asthmatic, Healthy and COPD samples/cells as it doesn't show up in R which samples belong to which group?
Relevant answer
Answer
Hi,
you need to compare the disease versus control samples when doing the statistical test. I didn't fully get whether the problem is that you lack the information about which samples belong to which category, or whether it is a coding problem. As for the former, datasets usually have a metadata file in which the sample names you find in the gene expression table are listed together with the treatment information. If it is a coding problem, you can index the sample names to divide the data into Asthmatic, Healthy and COPD.
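The indexing step could look like the sketch below. The sample IDs, condition labels and expression values are invented; in practice they would come from the metadata that accompanies the GEO series.

```python
from collections import defaultdict

# Hypothetical metadata: sample ID -> condition.
sample_metadata = {
    "GSM0001": "Asthmatic",
    "GSM0002": "Healthy",
    "GSM0003": "COPD",
    "GSM0004": "Asthmatic",
    "GSM0005": "Healthy",
}

# Hypothetical expression values of one gene, keyed by the same sample IDs.
gene_expression = {
    "GSM0001": 8.2, "GSM0002": 5.1, "GSM0003": 7.4, "GSM0004": 8.6, "GSM0005": 4.9,
}


def group_samples(metadata):
    """Return condition -> list of sample IDs."""
    groups = defaultdict(list)
    for sample, condition in metadata.items():
        groups[condition].append(sample)
    return dict(groups)


groups = group_samples(sample_metadata)
# Expression values split by group, ready for a disease-vs-healthy test.
values_by_group = {
    condition: [gene_expression[s] for s in samples]
    for condition, samples in groups.items()
}
```

The same grouping works in R by matching the column names of the expression table against the sample column of the metadata table.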
  • asked a question related to RNA-Seq
Question
3 answers
I need to ship some RNA samples overseas for RNA-seq.
I saw a paper that compares lyophilized RNA and non-lyophilized RNA. Also, I found a protocol that dries RNA with lithium + ethanol and ships it at RT.
Has anyone ever done it? Did it work?
Thanks a lot
Relevant answer
Answer
You can store RNA at room temperature as an ethanol precipitate. If you have RNA in solution, take some known amount - say 10-20 ug- and add 1/10 vol. NaAcetate and 2.5 volumes ethanol. The precipitate is stable and can be mailed. Upon receiving, spin down in a microcentrifuge and resuspend the RNA pellet in a specific volume of dH2O or whatever buffer you need.
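The volumes in this recipe are simple ratios. The sketch below assumes by default that the 2.5 volumes of ethanol are calculated from the starting sample volume; some protocols calculate them after the salt has been added, so that option is included as a flag. The function and parameter names are illustrative.

```python
def precipitation_volumes(sample_ul, include_salt_in_base=False):
    """Return (sodium acetate volume, ethanol volume) in microlitres.

    include_salt_in_base: if True, the ethanol volume is 2.5x the
    sample plus salt volume instead of 2.5x the sample alone (assumption).
    """
    salt_ul = sample_ul / 10.0
    base_ul = sample_ul + salt_ul if include_salt_in_base else sample_ul
    ethanol_ul = 2.5 * base_ul
    return salt_ul, ethanol_ul


salt_ul, ethanol_ul = precipitation_volumes(100)
salt_alt_ul, ethanol_alt_ul = precipitation_volumes(100, include_salt_in_base=True)
```

For a 100 µL sample this gives 10 µL of sodium acetate and 250 µL of ethanol, or 275 µL of ethanol if the salt volume is included in the base.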
  • asked a question related to RNA-Seq
Question
3 answers
I need a tutorial for analyzing RNA-Seq data in Ubuntu Linux, R and IGV.
I can't run the commands in Ubuntu Linux for aligning the data and mapping the reads. I need a tutorial for running the commands in Ubuntu to merge, sort and index my data using SAMtools, BamTools and BEDTools. I also need to analyze the RNA-Seq data in R and IGV.
Thank you
Relevant answer
Answer
Hmm, I'm not entirely sure how to run that myself, but since I lack coding experience I've been using the LatchBio platform. Through their RNA-seq and DESeq2 workflows, I've analyzed and visualized a lot of the data at my lab. Hope that helps!
  • asked a question related to RNA-Seq
Question
4 answers
I want to perform a phylotranscriptome analysis. For that I have downloaded RNA-seq data from SRA. In galaxy I have trimmed my data, then did assembly through trinity to get contigs from reads. Now what should I do, I need the complete transcriptome to put it into the MEGA for MSA and subsequent analysis.
Please help.
Thanks in Advance.
Sincerely-
Sunzid.
Relevant answer
Answer
Sheikk, I'll share this guide with you:
It is still under development, but I think it is pretty good.
I also think you should check the quality of your assembly. The guide I shared explains this clearly, but you can find an expanded explanation on Trinity's GitHub:
  • asked a question related to RNA-Seq
Question
3 answers
I performed RNA-seq and scRNA-seq on the same set of samples but the log2 fold change values are in very different range and I am not sure we can normalize it in any way. Please let me know if someone has performed a similar analysis.
Relevant answer
Answer
Hi Abhay,
You could try to 'pseudo-bulk' your colleague's scRNA-seq data and then compare that to your own.
Here, the expression values of all profiled cells in a single-cell experiment are bulked together for each replicate. This then can be processed using the same pipeline as your bulk data giving you a much more direct comparison.
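A minimal sketch of the pseudo-bulk step, assuming the single-cell counts are available as raw counts per cell and that each cell is annotated with its replicate. The data structures and gene names are invented for illustration; real pipelines would work on a sparse count matrix.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_replicate):
    """Sum raw counts over all cells of each replicate.

    cell_counts: cell ID -> {gene: raw count}.
    cell_to_replicate: cell ID -> replicate label.
    Returns replicate -> {gene: summed count}.
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, genes in cell_counts.items():
        replicate = cell_to_replicate[cell]
        for gene, count in genes.items():
            bulk[replicate][gene] += count
    return {replicate: dict(genes) for replicate, genes in bulk.items()}


# Hypothetical toy data: three cells from two replicates.
toy_cells = {
    "cell1": {"Actb": 5, "Gapdh": 2},
    "cell2": {"Actb": 3},
    "cell3": {"Actb": 1, "Gapdh": 4},
}
toy_replicates = {"cell1": "rep1", "cell2": "rep1", "cell3": "rep2"}
toy_bulk = pseudobulk(toy_cells, toy_replicates)
```

The summed counts per replicate can then go through the same normalization and differential expression steps as the bulk samples.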
One would expect concordant fold-changes to validate underlying biology (ko effect) but I wouldn't expect results to be identical though... Some differences would arise from different batches (replicates), library chemistry, sequencing runs, etc.
Also, if you still have some cells available you could run some qPCR on a few selected genes to tell you which dataset is better to trust.
  • asked a question related to RNA-Seq
Question
6 answers
I have a good R and statistical analysis background (also with machine learning). in addition, I'm a fresh biotechnology grad. I would like to try to replicate some Rna-seq analysis using R papers (with their provided data). Any SHORT (beginner-friendly) papers to recommend?
Relevant answer
Answer
I found this article quite helpful and self-explanatory for RNA seq analysis.
  • asked a question related to RNA-Seq
Question
11 answers
There are different programs, such as RStudio, Python, ..., for RNA-seq data analysis. According to your knowledge and experience, which one is better and more comprehensive? And is there another program?
Relevant answer
Answer
  • asked a question related to RNA-Seq
Question
9 answers
Hi all,
I have RNA seq data and I wanted to check the relative expression of selected targets based on RNA seq data. To validate this I have isolated RNA from a separate cohort and run the qPCR. However, the trend of my qPCR data is completely against the RNA seq data. The genes which are up-regulated in RNA seq is down-regulated in qPCR and vice versa. I do not know whether I am missing any variable here.
I know I have not written in detail but will be happy to discuss more if need any information.
Thanks
Relevant answer
Answer
What was the N for your RNA seq, and qPCR analysis? Surprisingly, the trend is the opposite, which shouldn't be the case, unless the primer isn't specific and amplifies some other segment.
  • asked a question related to RNA-Seq
Question
2 answers
I'm uploading RNA-seq data to NCBI. I have successfully completed step one, but in the second step an error occurs during the data submission process. Kindly guide me in detail if someone knows this well. Thanks for your cooperation.
Relevant answer
Answer
thanks Erik
  • asked a question related to RNA-Seq
Question
2 answers
I'm looking for any publicly available RNA-seq datasets covering all subtypes of breast cancer, to do preliminary research for a thesis project. Thank you all...
Relevant answer
Answer
Thanks Saubhik Sengupta...
  • asked a question related to RNA-Seq
Question
5 answers
Can the reads from multiple samples be aligned to the reference genome at once via the HiSat2 tool in RNA-Seq data analysis? Or should I run HiSat2 on each sample individually and then somehow combine them later?
Relevant answer
Answer
Run HiSat2 on each sample individually and then combine them using samtools
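A per-sample run can be scripted by building the same three commands for every sample. The sketch below only constructs the command lines (it does not execute them) and assumes paired-end files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` and an existing HISAT2 index prefix; those names are placeholders.

```python
def alignment_commands(sample, index_prefix, threads=4):
    """Build HISAT2 and samtools command lines for one paired-end sample."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Align the read pairs of this sample against the genome index.
        ["hisat2", "-p", str(threads), "-x", index_prefix,
         "-1", f"{sample}_R1.fastq.gz", "-2", f"{sample}_R2.fastq.gz",
         "-S", sam],
        # Sort the alignments by coordinate and write a BAM file.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Index the sorted BAM so it can be viewed in IGV or counted.
        ["samtools", "index", bam],
    ]


# Hypothetical sample names and index prefix.
samples = ["control_1", "control_2", "treated_1"]
all_commands = {s: alignment_commands(s, "genome_index") for s in samples}
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. For differential expression the sorted BAM files are usually kept separate and counted per sample, so merging them with samtools is only needed when technical replicates or lanes of the same sample have to be pooled.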
  • asked a question related to RNA-Seq
Question
4 answers
Hi,
I am working on a gene cluster from an Amycolatopsis strain that supposedly produces a glycopeptide antibiotic. It's a silent gene cluster at the minute.
I have sent it for RNA sequencing and the cluster is highly expressed, but there was no glycopeptide produced (checked via MS)
Any ideas as to why?
Thanks
Relevant answer
Answer
Katie A S Burnette I used 3 separate cultures. The gene cluster was expressed in 2/3. But nothing was detected via MS. And yeah I have tested standards and limit of detection on the MS prior to this experiment
  • asked a question related to RNA-Seq
Question
5 answers
I'm in the initial stages of planning a miRNA seq experiment using human cultured cells and decided on TRIzol extraction, Truseq small RNA prep kit, using an illumina HiSeq2500. The illumina webinar suggests 10-20 Million reads for discovery, the QandA support page suggests 2-5M, and I wrote the tech support to ask, who suggested I do up to 100M reads for rare transcripts. Exiqon guide to miRNA discovery manual says there is not really any benefit on going over 5M reads. I was hoping to save money by pooling more samples in a lane, so I was hoping someone with experience might be able to suggest a suitable number of reads.
Relevant answer
Answer
I am working on blood samples from cardiomyopathy patients and want to do miRNA sequencing. Can someone please suggest how many million reads I need to sequence, 20 million or 30 million? Please also suggest a platform.
  • asked a question related to RNA-Seq
Question
6 answers
I am studying a protein, and from imaging I can see that it is recruited to sites of DNA damage. I wish to UV irradiate HEK293 cells in culture prior to collection and analysis by RNA-seq and mass spectrometry. Does anyone have an idea of how to UV irradiate cells in culture for such studies (protocol and instrumentation)?
Relevant answer
Answer
How did this turn out? Curious to know if the UV light or hydrogen peroxide worked for causing DNA damage without killing HEK293 cells.
  • asked a question related to RNA-Seq
Question
1 answer
I have an analyzed RNA-seq data set. The analysis, including differential gene expression, clustering analysis and enrichment analysis, has been done, so I am aware that the bioinformatic part is complete. Could someone please guide me on how to extract the biological relevance from the data set? What should be the starting point for working with this data? Should I start by looking at the differentially expressed genes in the different comparisons, or start from the cluster analysis and look for the genes there?
Relevant answer
Answer
In recent years, RNA sequencing (in short RNA-Seq) has become a very widely used technology to analyze the continuously changing cellular transcriptome, i.e. the set of all RNA molecules in one cell or a population of cells. One of the most common aims of RNA-Seq is the profiling of gene expression by identifying genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This tutorial demonstrates a computational workflow for the detection of DE genes and pathways from RNA-Seq data by providing a complete analysis of an RNA-Seq experiment profiling Drosophila cells after the depletion of a regulatory gene.
Regards,
Shafagat
  • asked a question related to RNA-Seq
Question
4 answers
Hello everyone!
I have an RNA-seq dataset for two groups of mouse samples, knockout and wild type. I have the normalized quantification values for all datasets. Please guide me on how to perform PCA on the normalized values. I am not a bioinformatician, so kindly suggest methods that do not require coding.
Thanks in advance!
Relevant answer
Answer
Hi! I recommend using LatchBio for RNA seq data. I've used it several times, and it is super easy to use since it has a non-coding interface.
Hope that helps!
  • asked a question related to RNA-Seq
Question
3 answers
We plan to send total RNA samples from fish tissues for RNA-seq analysis. The total RNA samples will be TRIzol-extracted, DNase-treated, and cleaned using Zymo RNA Clean & Concentrator-5. For previous transcriptome profiling studies, we cleaned the total RNA samples using QIAGEN kits, so this would be our first time with the Zymo kit. The manufacturer states, "RNA is ready for all downstream applications including Next-Gen Sequencing, RT-qPCR, hybridization, etc."
Please let me know if you have any experience with Zymo-prepped RNA samples used for RNA-seq. Any feedback will be greatly appreciated.
Relevant answer
Answer
Thanks, Kyle and Mohamed, for your feedback. Our RNA purifications using this kit have shown excellent yield, A260/230 and A260/280 ratios, and integrity, but they have only been used for RT-qPCR. Based on your comments, they should also be suitable for RNA-seq.
  • asked a question related to RNA-Seq
Question
1 answer
If I am looking at a specific gene that comprises 3 exons or 2 protein-coding regions, and I find that some of my aligned reads are very small in proportion to the entire protein-coding region and located in only one of those protein-coding regions, should I consider this a "bad quality" alignment, generally speaking? Similarly, if the reads span the entirety of one protein-coding region but are largely absent in the other (1/2), how should I classify these alignments?
Relevant answer
Answer
Or maybe one transcript is present and the other is absent (not expressed).
Quality in RNA-seq is not assessed this way; FastQC is better suited for that.
  • asked a question related to RNA-Seq
Question
2 answers
Hello,
I have several single-end FASTQ files. Before trimming with Trimmomatic, FastQC reported TruSeq adapter sequences as a possible source of overrepresented sequences. However, after trimming, FastQC now reports Clontech SMART CDS Primer II A as the source of overrepresented sequences. What should I do about them? Can those sequences have any negative effects on downstream analysis?
Thanks in advance.
Relevant answer
Answer
Thank you Mehmet Tardu
  • asked a question related to RNA-Seq
Question
5 answers
We would like to know the best value for money commercial company for DNA sequencing as part of an RNA-seq study.
Thank you. Joe Duffy
Relevant answer
Answer
Perhaps the LatchBio platform might be of use? It does not sequence data but it's a great RNA-seq analysis pipeline.
  • asked a question related to RNA-Seq
Question
2 answers
Hi everyone,
I have a question and I was hoping to get some insight from you.
I ran RNA-seq on my samples without replicates. I used differentially expressed genes with a cut-off of 2-fold change for Preranked GSEA and got a list of pathways activated in each of my samples. My question is: can I use any of the values such as ES, NES, FDR, etc., or does it make no sense to use them since I have no replicates? Next, if I don't use these values, should I rank my pathways based on the number of genes they have in the gene set? If yes, is there a commonly used cut-off for that?
Thanks for the help.
Relevant answer
Answer
I would say that at least three replicates per sample are needed. Otherwise, you can rank the pathways but can't show statistical significance. But I feel manual ranking will be tedious.
  • asked a question related to RNA-Seq
Question
1 answer
Hello! I am looking to isolate cell nuclei from mouse brains (hypothalamus specifically) and have been considering several kits. There are two from Sigma - the Nuclei EZ Prep and Nuclei Pure Prep, as well as the Minute Single Nucleus isolation kit - offered both with and without detergent.
I have considered adapting a 'home-brew' protocol described in some recent papers, but due to time constraints, a kit would be ideal since there might be less troubleshooting and validation.
Does anyone have experience with these kits and do you have a recommendation as to which might work best?
Relevant answer
Answer
Hi Eugene, I was thinking the same after my own various in-house variations. I have used the Nuclei EZ kit, which is actually just a detergent-containing buffer, and not a true kit. I did get good results with it, but I was using it as part of a Fluorescence-Activated Nuclei Sorting protocol, so I was always guaranteed pure nuclei. I do have my eye on the Minute Single Nucleus products, so I would like to hear from anyone else who has used these.
Thanks,
RT
  • asked a question related to RNA-Seq
Question
1 answer
Hello, What would be the best methodology to perform RNA-seq with samples with low RIN? What do you recommend?
Thanks
Relevant answer
Answer
The NEBNext® Ultra™ II Directional RNA Library Prep Kit for Illumina® protocol has the ability to work with partially degraded FFPE samples.
Good luck!
  • asked a question related to RNA-Seq
Question
2 answers
If we want to use patient data, does RNA-seq include any potentially patient-identifying information that should be checked against the donor agreement?
Relevant answer
Answer
RNA sequence data is considered PHI (protected health information).
  • asked a question related to RNA-Seq
Question
3 answers
Hello everyone,
I've just started learning about the STAR aligner and I came across the terms primary assembly and patch release. My understanding so far is that a patch release is a minor version of a genome that comprises only the sequence(s) that have been updated, not the whole genome itself.
Therefore, for RNA-Seq studies and lncRNA characterization (from alignment to differential expression), the patch release would not be recommended. Instead, the primary assembly should be used. Is that right?
I would appreciate if anyone could share any insight, review or basic publications. Thanks.
Relevant answer
Answer
I think you should use the primary assembly. You can download the latest version, which at the time of writing is release 106 for GRCh38. It is available on the Ensembl FTP server.
  • asked a question related to RNA-Seq
Question
8 answers
I have read the definition below, and I still have difficulty understanding the term read depth. I would be glad if someone could explain it to me.
Read depth: The total number of sequencing reads obtained for a sample. This should not be confused with coverage, or sequencing depth, in genome sequencing, which refers to how many times individual nucleotides are sequenced.
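The distinction can be shown with a small worked calculation. This is a sketch under simple assumptions: coverage is estimated with the usual average formula (number of reads × read length ÷ genome size), and the read count, read length and genome size below are round illustrative numbers, not values from any real experiment.

```python
def read_depth(num_reads):
    """Read depth in the RNA-seq sense: simply the total number of reads for a sample."""
    return num_reads


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced (genome-sequencing sense)."""
    return num_reads * read_length / genome_size


if __name__ == "__main__":
    n_reads = 30_000_000           # illustrative: 30 million reads
    read_len = 100                 # illustrative: 100 bp reads
    genome = 3_000_000_000         # illustrative: a 3 Gb genome
    print("read depth:", read_depth(n_reads), "reads")
    print("mean coverage:", mean_coverage(n_reads, read_len, genome), "x")
```

The same sample therefore has a read depth of 30 million reads, while its average coverage in the genome-sequencing sense is a different quantity that also depends on read length and genome size.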
Relevant answer
Answer
  • asked a question related to RNA-Seq
Question
5 answers
Recently I did RNA-seq on Mycobacterium. I got the TPM data from one of our colleagues in the Bioinformatics department, who helped us analyze the raw data.
Then, I'm interested in making a volcano plot from the data. Do you think it's possible to get a p-value from TPM data?
Relevant answer
Answer
Hi Desak Nyoman Surya Suameitria Dewi, RNA-seq data should be analyzed with dedicated statistical methods, such as DESeq2 or edgeR (both available from Bioconductor). If you use p-values calculated with those methods, you can create volcano plots. The reason you cannot use more typical statistics, such as ANOVA, to get the p-values is that RNA-seq data are not normally distributed, and many common inferential statistical tests require normally distributed data.
Best!!
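Once such a results table exists, the volcano plot coordinates are straightforward to derive. The sketch below assumes a table exported from DESeq2 with its `log2FoldChange` and `padj` columns, represented here as a list of dictionaries; the gene names, values and cut-offs are made up for illustration. Note that DESeq2 and edgeR are designed to start from raw counts rather than TPM, so the counts would need to be requested from whoever processed the raw data.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (gene, x, y, significant) tuples for a volcano plot.

    x is the log2 fold change and y is -log10 of the adjusted p-value.
    Rows with a missing or zero adjusted p-value are skipped in this sketch.
    """
    points = []
    for row in results:
        padj = row["padj"]
        if padj is None or padj <= 0:
            continue  # NA values, or p-values that underflowed to zero
        x = row["log2FoldChange"]
        y = -math.log10(padj)
        significant = abs(x) >= lfc_cutoff and padj < padj_cutoff
        points.append((row["gene"], x, y, significant))
    return points


if __name__ == "__main__":
    # Hypothetical rows standing in for a DESeq2 results table.
    table = [
        {"gene": "geneA", "log2FoldChange": 2.5, "padj": 0.001},
        {"gene": "geneB", "log2FoldChange": -0.3, "padj": 0.6},
        {"gene": "geneC", "log2FoldChange": -1.8, "padj": None},
    ]
    for point in volcano_points(table):
        print(point)
```

The resulting x and y values can be passed to any scatter-plot tool, with the `significant` flag used to colour the points.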
AN
  • asked a question related to RNA-Seq
Question
3 answers
I am running RNA extractions on whole gut samples for downstream RNA-seq. For one individual I realized there was a length of gut tissue still in the original collection tube that I didn't add to the homogenization solution. I'm not sure what region of the gut it actually is or how large it is in proportion to the tissue that was homogenized (it is smaller), but I'm worried that if there are regional differences in RNA expression profiles, this will bias the RNA-seq data towards the already-processed portion of the gut.
Is this sample salvageable? If I extract RNA from the leftover tissue, could I just combine the total RNA sample volumes from both prior to sending in for sequencing? Alternatively, if we sequence both separately could we normalize and combine reads somehow? Are there any other strategies that would be more robust to prevent bias? Thanks in advance.
Relevant answer
Answer
Was the gut that didn't get homogenised effectively protected from RNases? If the leftover tissue was: stored in a reagent like RNAlater the whole time, OR frozen the whole time and never thawed, then I think it would be fine to extract the RNA from it now and add that RNA to the existing RNA from the same sample. But if the leftover tissue sat around in the collection tube at ambient temperatures for even a minute longer than the other samples - it's gone, let it go. Substantial differences in RNA quality between samples can be a major source of bias in RNAseq experiments, so deliberately adding RNA which is likely to be degraded is a bad idea. (If you are in doubt at all, do you have access to a BioAnalyzer or TapeStation or other way to check RNA integrity? You could extract the RNA from the leftover tissue, compare the RIN of the new extraction to the previous extraction, and mix them only if they are comparable.)
I don't think preparing a second library from the RNA made from the leftover tissue and then sequencing that would be worth the effort and expense. I don't think it would be possible to neatly integrate this data into your experiment.
Whatever you end up doing, this sample will be different in some way to the other samples in your experiment. So you should run an outlier analysis at the bioinformatic level, and seriously consider excluding this sample entirely if it looks different from the other samples. Building in some extra n into RNA-seq experiments to account for errors like this (they happen to us all!) is a very good idea when feasible.
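One simple form of outlier check is to compare each sample's expression profile with all the others and look for a sample that correlates poorly with the rest. The sketch below is one possible approach under stated assumptions, not a standard pipeline: it takes log-scale expression values for the same genes in the same order, and the sample names and numbers are invented for illustration.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def median_correlations(expr):
    """For each sample, the median correlation with every other sample."""
    scores = {}
    for name, values in expr.items():
        others = [pearson(values, v) for other, v in expr.items() if other != name]
        scores[name] = median(others)
    return scores


if __name__ == "__main__":
    # Hypothetical log-expression values for five genes in four samples.
    expr = {
        "gut_1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gut_2": [1.1, 2.0, 3.2, 3.9, 5.1],
        "gut_3": [0.9, 2.1, 2.9, 4.2, 4.8],
        "gut_4": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    for name, score in sorted(median_correlations(expr).items(), key=lambda kv: kv[1]):
        print(f"{name}: median correlation {score:.2f}")
```

A sample whose median correlation sits well below the others is a candidate for closer inspection or exclusion; real data would use thousands of genes, and PCA or hierarchical clustering are common alternatives.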