Illumina Sequencing - Science topic
Questions related to Illumina Sequencing
How can I effectively perform metagenomic assembly on large Illumina sequencing data from environmental samples, given that I have already completed quality control but encounter issues due to the data size?
I recently performed Illumina shotgun sequencing, and my FastQC results are attached.
Is this correct?

I'm using the NovaSeq 6000 and HiSeq 4000. Assembled alone, the HiSeq data has little missing data and many reads per sample; the NovaSeq data has more missing data but is still usable. When I assemble them together, the HiSeq individuals have few or no SNPs. I've checked trimming for both datasets, and that does not appear to be the issue.
I'm assembling with ipyrad, and I've tried both de novo assembly and mapping to a reference.
I am trying to map Illumina data to the rotavirus genome, but I noticed that mapping to different reference genomes results in some variation in the consensus sequence. Any suggestions? Should I discard genome regions with low coverage? If so, what is considered low coverage?
Hello!
I performed a PhiX validation run with the standard Illumina PhiX kit and v3 chemistry. I diluted the PhiX library to 10 pM and expected a cluster density of around 1000, but the run gave a very low cluster density (~130). Also, as you can see in the SAV plots, Phred scores dropped after cycle 100 in read 1 and cycle 40 in read 2, which resulted in a low percentage of bases >Q30 at the end of the run. What do you think caused this, and how can I fix it? Could it be due to the low cluster density? Could it be due to poor reagent storage or handling? I ran a system check and it passed. What are your suggestions?
I have attached SAV plots and thumbnail images from different cycles of the run. The higher-quality image is from cycle 17 and the lower-quality image is from cycle 436, both for the A nucleotide.
Thank you.





Hello,
I recently received metagenomic 16S rRNA gene sequence data from a company, which includes both raw reads, and clean data with barcodes removed. My goal is to analyze these sequences and obtain information on the taxonomic diversity and abundance of the species present in the sample.
Since I use a Windows system and cannot utilize Mac or Linux, I would greatly appreciate guidance on how to proceed with this analysis. Are there any web server-based applications available that can assist with this task?
Furthermore, if there are any researchers or experts interested in this project, I would be grateful to explore potential collaborations. Please feel free to reach out to me if you are interested or have any recommendations.
Thank you in advance for your assistance.
Best regards
We are trying to sequence HBV and HCV, but we don't have a specific protocol.
I have been using the DNeasy kit from Qiagen to extract DNA. In the final step of the extraction, the product guide recommends eluting the DNA in AE Buffer, which contains EDTA (0.5 mM). However, the Nextera protocol might be seriously affected by EDTA during tagmentation. Could I use TE buffer (0.1 mM EDTA) to store DNA samples stably for a long time? Or would I have to elute my samples in nuclease-free water for immediate use only?
In Illumina sequencing results, how do I rank genes on the basis of false discovery rate (FDR)? What FDR values should be considered significant?
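One common way to obtain FDR values is the Benjamini-Hochberg adjustment of per-gene p-values, after which genes can be sorted by adjusted value. The sketch below is only an illustration of that idea: the function names, the gene names, and the 0.05 cutoff are assumptions, not something fixed by any particular pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes_by_fdr(gene_pvalues, cutoff=0.05):
    """Sort genes by adjusted p-value; flag those at or below the cutoff.

    The cutoff is a choice made by the analyst (0.05 here is an assumption).
    """
    genes = list(gene_pvalues)
    fdr = benjamini_hochberg([gene_pvalues[g] for g in genes])
    ranked = sorted(zip(genes, fdr), key=lambda pair: pair[1])
    return [(gene, q, q <= cutoff) for gene, q in ranked]


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = {"geneA": 0.001, "geneB": 0.5, "geneC": 0.02}
    for gene, q, significant in rank_genes_by_fdr(example):
        print(gene, round(q, 4), significant)
```

The cutoff itself is a judgment call that depends on how many false positives the follow-up work can tolerate.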
I need to extract DNA from hundreds of soil samples for Illumina library preparation. In my lab, people have been working with the DNeasy PowerSoil Pro Kit (individual reactions). Because I need to speed up the process given the number of samples, I am considering the DNeasy PowerSoil HTP 96 Kit. Any opinions on the recovery/yield of this kit?
Please let me know about your experiences, any advice is appreciated.
I'd like to sequence the genome of the gopher tortoise. The genomes of congeners are ~2.4 Gb. I'm trying to decide how much coverage is necessary; we plan to run the sample on a portion of a NovaSeq run and on at least one PacBio SMRT Cell. I'm trying to evaluate the benefits of additional sequencing effort: my starting point would be something like 30x coverage from the 2x150 NovaSeq run and one PacBio SMRT Cell (~8x coverage with HiFi reads? Not so sure about this), but I'm wondering how necessary a second PacBio cell, or additional Illumina reads, would be for assembling a nice genome.
We don't have any tissues available for transcriptomics, and the immediate application will be to map whole-genome methylation seq reads to the genome.
I'm pretty new to all of this, so any suggestions or references to guidelines are most welcome! Thanks!
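The coverage figures in this question follow from simple arithmetic: coverage is total sequenced bases divided by genome size. A minimal sketch, using the ~2.4 Gb genome size and 2x150 read layout mentioned above; the function names are made up for illustration, and real yields will be lower after trimming and duplicate removal.

```python
import math


def read_pairs_needed(genome_size_bp, target_coverage, read_length_bp, paired=True):
    """Number of fragments (read pairs if paired) needed for a target coverage."""
    bases_needed = genome_size_bp * target_coverage
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return math.ceil(bases_needed / bases_per_fragment)


def expected_coverage(n_fragments, read_length_bp, genome_size_bp, paired=True):
    """Average coverage obtained from a given number of fragments."""
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return n_fragments * bases_per_fragment / genome_size_bp


if __name__ == "__main__":
    genome = 2_400_000_000  # ~2.4 Gb, taken from the question
    pairs = read_pairs_needed(genome, target_coverage=30, read_length_bp=150)
    print(pairs, "read pairs for 30x")
    print(expected_coverage(pairs, 150, genome), "x coverage")
```

The same function can be used to see what doubling the Illumina allocation buys in coverage before committing to a larger share of the run.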
Hello, I'm preparing libraries for Illumina sequencing and I would like to know whether I have to do DNA end repair immediately after shearing, or whether I can freeze the sheared DNA and do end repair a week or two later. Thanks
We recently received our sequencing data from a company that sequenced on a NovaSeq. As far as I have learnt, NovaSeq uses a simplified quality score system that bins the scores into only four levels. This of course causes problems when processing the data with quality-score-based pipelines, e.g. DADA2. I got really weird results with DADA2. We also tried UNOISE3, and the ASV table looks fine for the mock data. But since this is also based on quality, I'm not sure if it can be trusted.
I also searched for publications using NovaSeq for amplicon sequencing. They all used traditional similarity-based pipelines to generate OTUs. I wonder if this is the only way to do this.
I hope someone has experience dealing with this situation. Is there a way to use DADA2 or UNOISE3, with some modifications, to process the data?
Hello! I have an amplicon sequencing dataset (Illumina) of the D1/D2 region (NL1 and NL4 primers). I want to analyze the sequences using DADA2 in R or QIIME 2, but I am unsure which is the most comprehensive and up-to-date database for yeast identification. I appreciate any feedback on this. Thank you in advance.
I have designed degenerate primers for a gene. When I amplify it from genomic DNA, I get the desired band. To confirm it, I cloned the eluted band into the pGEM-T vector and then sequenced it by Illumina sequencing, but I didn't get the desired sequence. What could be the reason?
Dear all,
I'm currently working on a ChIP-Seq for a transcription factor and was preparing libraries using the Qiagen QIAseq Ultralow Input Library Kit, checking adapter ligation with the Kapa library quantification kit for Illumina sequencing and performing size selection with AMPure XP beads as described in the library kit protocol. The libraries were then checked on a Bioanalyzer High Sensitivity DNA chip.
The first replicate worked fine for me (some samples are included on the Bioanalyzer chips; sometimes the chip didn't run nicely, don't mind that). However, in the second replicate strange larger peaks appear on the Bioanalyzer chip, and there are multiple peaks/bands instead of a normal distribution.
The shearing of both replicates was fine (done mechanically with a Bioruptor) and targets could be recovered as checked in a ChIP-qPCR. Also, all samples that had been prepared in parallel looked weird (ChIPs and inputs), for both antibodies used and in replicate 1 for antibody 3, although all other samples were fine. When I purified those samples in parallel with a re-purification of an old sample, the old sample still looked fine, so I assume it has to do with the library preparation itself. Adapters were not reused and were diluted 1:10, as I was using 5 ng of input DNA. The number of cycles was chosen according to the Kapa library quantification, and the minimum number of cycles was used. For the PCR reaction I prepared a master mix and added it to the samples, so maybe that was a problem? Or do you think an enzyme could have gone bad? Is it just trash that got over-amplified, or could it be a problem with the adapter ligation? Or the simplest question: has anyone seen peaks like this before? And do you think it might be worth giving sequencing a shot nonetheless, or is it just trash? For the libraries with antibody 1: could it be that those are "just" over-amplified? And if so, could I try to sequence them, maybe after another round of size selection to remove the larger peaks?
The ChIPs were performed with primary cells (so I would be super happy if I could use any of the samples) using two different conditions in wt and KO mice, and the analysis therefore "just" should be the overlay between the different conditions (yes in wt, no in KO etc.), only peak calling, nothing quantitative. Is there a chance the sample quality might be sufficient for this setting? At least with antibody 1? Might the effect of this fragmentation into single peaks get lost once I pool the libraries for sequencing?
Thank you so much in advance.
Hi
I am working on a project targeting specific genomic segments. We selected about 18 regions from TB-positive samples where mutations associated with resistance to specific drugs can occur, and I want to sequence those regions on an Illumina MiniSeq. I extracted DNA from leftover samples using a Qiagen extraction kit, but when I quantified it with a Qubit fluorometer, I was surprised to find that the DNA concentration was very low, and I am not sure why. The yield after PCR of these samples is also low, even though the PCR itself worked, since the positive control result was perfect. If anyone knows the cause or has a solution, please help.
Thanks
Hello,
I have several single-end FASTQ files. Before trimming with Trimmomatic, FastQC reported TruSeq adapter sequences as a possible source of overrepresented sequences. However, after trimming, FastQC now reports Clontech SMART CDS Primer II A as the source of overrepresented sequences. What should I do about them? Can those sequences have any negative effects on downstream analysis?
Thanks in advance.

I am designing a custom NGS run and need to determine the sequence coverage. I am using the Illumina Sequence coverage calculator https://support.illumina.com/downloads/sequencing_coverage_calculator.html
How do I determine the genome/ region size?
Additional information:
Human gDNA will be used. I will have 4 amplicons about 250 bases each from 4 different genes.
Any help would be appreciated. Thanks, Nikhil.
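For an amplicon panel, one simplification (an assumption made here, not something stated by the Illumina calculator) is to take the region size as the summed length of the amplicons rather than the size of the human genome. The sketch below uses the four ~250-base amplicons from the question; the function names and the target depth of 1000 are hypothetical, and it assumes every read covers one amplicon and that amplicons are represented evenly.

```python
def target_region_size(amplicon_lengths):
    """Total targeted region, taken as the sum of the amplicon lengths."""
    return sum(amplicon_lengths)


def reads_for_depth(n_amplicons, depth_per_amplicon):
    """Reads needed per sample if each read covers exactly one amplicon
    and all amplicons are equally represented (both are assumptions)."""
    return n_amplicons * depth_per_amplicon


if __name__ == "__main__":
    amplicons = [250, 250, 250, 250]  # four amplicons of ~250 bases each
    print(target_region_size(amplicons), "bases targeted")
    print(reads_for_depth(len(amplicons), 1000), "reads per sample for 1000x")
```

Uneven amplification between the four genes would raise the number of reads needed to reach the same depth on the weakest amplicon.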
I am currently analysing Illumina sequences for a metabarcoding project. The primers used in the process have NOT been removed from the raw data. But after exploring the data, I found that ~5% of my raw reads have no primers.
What could explain this 5%? Should I discard these primerless sequences?
PS: The data was prepared according to the following protocol: 16S Metagenomic Sequencing Library Preparation, Part #15044223 Rev. B (copy and paste it into a search engine to find it)
Thanks
Dear all, I am trying to use CD-HIT to remove duplicates from the output file of Trinity (RNA-seq assembly).
I used the following parameters:
cd-hit-est -i in.fasta -o out_cdhit90.fasta -c 0.90 -n 9 -d 0 -M 0 -T 0
But the output file still contains lots of small or fragmented sequences in addition to the best one. How can I remove those small or fragmented duplicates by changing the parameters?
thanks
ZQ
Hello,
Does anyone have experience with Zymo-Seq RRBS Library Kit and subsequent Illumina sequencing? A part of my project that I work on should be "methylation profiling of brain tissues". I wanted to use Agilent's SureSelectXT Methyl-Seq Library Preparation Kit for targeted methylation sequencing, but it's much more expensive, and I don't think that for our purposes, it's necessary to target hotspot regions. I wanted to know whether you are satisfied with the kit and how do you sequence libraries? The manufacturer recommends at least 30 million reads and read length > 50 bp, so Illumina's NextSeq 500/550 Mid Output Kit v2.5 (150 Cycles) could be sufficient for four libraries?
I'm looking for some resources (articles, videos or courses) to understand how Illumina and other NGS platforms work. I have a lot of conceptual gaps and doubts.
Hello, I have done WGS of my bacterial strains and got some preprocessed Illumina sequencing files in .fna format. It has a format like this
>1 length=400016 depth=0.86x the sequence
>2 length=323455 depth=1.00x and so on to >102
I want to know how to deal with this .fna file and which analysis to run on it.
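A file with headers like these is an ordinary FASTA file of assembled contigs, so a reasonable first step is to summarise contig lengths and depths. The sketch below assumes every header follows the `>ID length=N depth=Dx` pattern shown above; the function names are made up for illustration.

```python
import re

# Matches headers such as ">1 length=400016 depth=0.86x"
HEADER_RE = re.compile(r"^>(\S+)\s+length=(\d+)\s+depth=([\d.]+)x")


def summarize_contigs(lines):
    """Return (contig_id, length, depth) for every matching header line."""
    contigs = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            contigs.append((match.group(1), int(match.group(2)), float(match.group(3))))
    return contigs


def n50(lengths):
    """Length of the contig at which half of the total assembly size is reached."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    example = [
        ">1 length=400016 depth=0.86x",
        "ACGT",
        ">2 length=323455 depth=1.00x",
        "ACGT",
    ]
    contigs = summarize_contigs(example)
    print(contigs)
    print("N50:", n50([length for _, length, _ in contigs]))
```

For a real file, pass an open file handle instead of the example list, for instance `summarize_contigs(open("assembly.fna"))`, where the file name is a placeholder.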
Fig. 2A of the paper shows that a tiling library of a gene was prepared, consisting of 50 bp fragments. The fragments span the entire gene sequence, each offset by about 7 bp from the next. This library was later used to study the sequence dependence of DNA bendability at genome scale. This is a very interesting study, but I am unable to figure out how the tiling library was prepared. Is it done by designing sequences for a primer pool covering every fragment and ordering them? Can we prepare a tiling library with any spacing between fragments? Please let me know if you have any idea.
Hello,
We are doing stool microbiome analysis for bacteria and fungi. We use the same protocol for bacterial and fungal samples, but we get zero yield after normalization, ligation and quantification in the fungal libraries only. We are using the SequalPrep™ Normalization Plate Kit, 96-well, and I usually load 10 µl of PCR product for bacteria and 20 µl for fungi (since they are usually less concentrated). Then I continue according to the protocol. After that we use the KAPA HyperPrep Kit (PCR-free) according to the protocol, only with an extended ligation time (1 hour). We have already tested shorter and longer ligation times, but the results are basically the same, so I don't see that as the issue. As adapters we use TruSeq DNA Single Indexes Set A (12 Indexes, 24 Samples). For quantification we use the KAPA Library Quantification Kit (Complete Universal Kit), and again I follow the protocol.
So do you have any idea at which step we are losing our fungal DNA? It's not the human factor either, because two different people did the same process and got the same zero results. Maybe some of you have more experience with fungi in stool (human, mouse).
Thank you in advance for any piece of advice!
I need the information about the companies in India offering services of environmental DNA for any organism. Please add website links.
I will sequence viral nucleic acids using Nextera XT library prep. I was informed by Illumina that the input must be dsDNA of at least 300 bp. I am wondering how to get it. I have a commercial kit to prepare the first strand using the viral RNA as template. My problem is synthesizing the second strand. I have seen a strategy that uses a number of enzymes (RNase H, ligase, polymerase), but I understand that this is more suitable for long eukaryotic mRNA. For the first strand, I will use random primers. Could I synthesize the second strand using random primers and only a polymerase (Klenow fragment)? In this case, how would I break the RNA/cDNA duplex? Is it possible (and necessary) to validate each step (first strand, second strand) with a Qubit fluorometer?
Also, should I eliminate the viral DNA before the reverse transcription? Or may I keep it, get the cDNA for the viral RNA, and use the same reaction tube, with DNA and cDNA, to prepare a single Nextera XT library?
I know that there are commercial kits that prepare both strands at once, however I just found very expensive options (such as SuperScript™ Double-Stranded cDNA Synthesis Kit). If anyone knows a less expensive alternative, I would appreciate the advice.
We are collecting human samples and thinking to use some storage buffer to ensure optimal preservation of RNA and DNA, among other molecules.
We saw several papers in which the authors mention Allprotect Tissue Reagent (QIAGEN) as a storage buffer, but we did not find anything about its composition. So I would like to ask whether anyone knows if Allprotect Tissue Reagent (QIAGEN) is a buffered salt solution like RNAlater (Life Technologies), since we need to take this into account for the subsequent treatment of the samples.
Thank you so much.
Nerea
Our lab sent rat cardiac tissue for sequencing and received FASTQ data files that we cannot make sense of. Is there software I can use to process these FASTQ files in order to obtain meaningful results?
I’m doing de novo genome assemblies on a set of bacterial samples. I already have Illumina data for most of them and I just submitted samples for Nanopore sequencing for all of them, but I have 9 samples that still need Illumina sequencing. My university’s DNA facility doesn’t pool samples for people and we don’t have anything else we need sequenced. We also don’t know anyone else who is sequencing anything soon.
Does anyone have any recommendations for a company or group that can do Illumina sequencing on a small number of samples for cheap? Or am I going to have to pay for a full lane no matter what?
Dear All,
Can I use DNA recovered by dissolving low-melting-point agarose for Illumina sequencing (without using a DNA purification kit)? I ask because the recovery rate of column-based purification for my DNA (~100 bp) is extremely low.
Thank you,
Hi
I am extracting genomic DNA from dust samples, and the 260/280 ratio is 1.4 whereas the 260/230 ratio is 1.35. I need to perform metagenomic sequencing, for which the recommended 260/280 ratio is greater than 1.8. Can anyone help me improve this? I have tried ethanol precipitation, but it causes a significant reduction in yield. As these are environmental samples, the yield of genomic DNA is already low.
Thanks
A probably silly question to phage experts.
Is it possible to determine whether a phage genome is linear, circular, or circularly permutated using Illumina sequencing data and genome assembly? Thank you very much!
Hello, I am using the KAPA HyperPrep Kit for library preparation for Illumina sequencing. I have a problem with adapter dimer formation. Does anyone have experience with this issue? How did you get rid of the dimers? We tried a shorter ligation time and also used half of the usual amount of adapters, but there was not much improvement.
Thank you in advance!
I have some primers that have been used successfully to amplify the V1-2 region of the bacterial 16S rRNA gene. I plan to run 16S rRNA sequencing, and I must now add a sequencing adapter sequence to each of my primers. Should I calculate the Tm based on the 16S-specific primer region only, or should I calculate the Tm with both parts of the primer (sequencing adapter + 16S-specific region)? The addition of the sequencing adapter increases the length of the primers and therefore raises the Tm significantly above recommended annealing temperatures.
Thanks in advance!
Can the size of a bacterial gene differ between sequencing and the band size on a gel? If yes, what is the maximum difference above which it cannot be the same gene?
Hello,
I recently performed a sequencing run on a MiniSeq. The experiment involves editing via viral transduction, where the construct contains Cas9 and the guide RNA. In the NGS analysis, the total number of reads seems good enough (sufficient depth). However, the number of reads with the indicator sequences in the given amplicon seems awfully low compared to the total reads. I bought new primers for NGS prep and repeated the NGS run. There was no contamination or primer dimer in the library prep. Still, I see the same low number of reads with indicator sequences in the given amplicon.
I think the transduction and random integration can be the reason for imperfect alignment of the amplicon, resulting in low number of reads with indicator sequences. Am I right? Could there be any other reason ?
The analysis was done via Rgen Cas9 analyzer website link http://www.rgenome.net/cas-analyzer/#!

Hi,
I tried both the E.Z.N.A.® Plant DNA Kit (Omega Bio-tek) and the Thermo Fisher ChargeSwitch kit (magnetic beads) for DNA extraction from freeze-dried plant leaves. The Omega kit seems to work best in terms of yield and PCR product. I want to use the Omega E.Z.N.A. kit for DNA extraction, followed by ITS amplicon library preparation for Illumina MiSeq sequencing.
Is there any problem with using the E.Z.N.A. kit for the Illumina MiSeq or HiSeq platform? Or can I use the Omega E.Z.N.A. plant kit for DNA extraction and then use the PCR products for Illumina sequencing?
Suggest me a low-cost DNA sequencing machine.
Please also mention the price if possible.
Thanks!
I would like to know the main limitations of nanopore sequencing. What are the most difficult steps?
What about support and available applications (are the apps easy to use and free of charge)?
Are the real costs of nanopore sequencing lower or higher than those of Illumina's solutions?
I have experience in Illumina sequencing, but I wonder whether nanopore technology is a real competitor or still a technology "in progress".
What are your general opinions? How about sequencing COVID?
I would appreciate any responses from the users :)
Hi everyone,
Illumina provides a list of primers for amplifying the ITS1 region with high taxonomic coverage for fungal sequencing, but I cannot find the exact amount of each primer that is needed. I understand, then, that I have to mix them in equimolar amounts; am I right? Has anyone used them?
Thx!
Looking for expert opinion...
I have collected marine sponge samples, which were shade-dried for two weeks. The sponge samples are now well dried and can be powdered directly by grinding. I would like to study the sponge-associated actinobacterial populations (uncultured) from the dried samples rather than fresh samples.
Here is my question:
If we grind the sponge and use the powder for metagenomic DNA extraction, will the DNA be damaged or sheared?
Or
can we directly use the dried sponge material (without grinding) for metagenomic DNA extraction?
Could someone kindly clarify this?
Thanks in Advance,
Siva
Hi,
I am working on some 16S sequencing data, and some of it seems to be of low quality.
I am not sure how to set the truncation values in DADA2.
qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 220 \
--p-trunc-len-r 180 \
--o-table table.qza \
--o-representative-sequences rep_seqs.qza \
--o-denoising-stats denoising_stats.qza
Plugin error from dada2:
No reads passed the filter. trunc_len_f (220) or trunc_len_r (180) may be individually longer than read lengths, or trunc_len_f + trunc_len_r may be shorter than the length of the amplicon + 12 nucleotides (the length of the overlap). Alternatively, other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.
What parameters should I choose?
Thanks for all your help !!
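The error message above lists conditions that can be checked by hand before rerunning. The sketch below is plain Python, not part of QIIME 2; the function name is hypothetical, and the read lengths (250) and amplicon lengths (460 and 253) are assumed example values that should be replaced with numbers from your own demultiplexing summary and primer design.

```python
def check_truncation(trunc_f, trunc_r, read_len_f, read_len_r,
                     amplicon_len, min_overlap=12):
    """Return a list of problems with the chosen truncation lengths.

    min_overlap=12 mirrors the overlap length quoted in the error message.
    """
    problems = []
    if trunc_f > read_len_f:
        problems.append("forward truncation is longer than the forward reads")
    if trunc_r > read_len_r:
        problems.append("reverse truncation is longer than the reverse reads")
    if trunc_f + trunc_r < amplicon_len + min_overlap:
        shortfall = amplicon_len + min_overlap - (trunc_f + trunc_r)
        problems.append(f"truncated reads are {shortfall} bases too short to overlap")
    return problems


if __name__ == "__main__":
    # Truncation values from the command above; other numbers are assumptions.
    print(check_truncation(220, 180, 250, 250, amplicon_len=460))
    print(check_truncation(220, 180, 250, 250, amplicon_len=253))
```

If no problem is reported, the remaining suspects named in the message are the quality-based filters such as max_ee and trunc_q.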

I tried to run the sitetest function from the IMA package to perform site-level differential methylation analysis, but I got an error message.
sitetestALL = sitetest(dataf,gcase="KO",gcontrol="WT",testmethod ="wilcox" ,Padj="BH", rawpcut = NULL,adjustpcut =NULL,betadiffcut = NULL,paired = FALSE) and I got this error message: Error in wilcox.test.default(x[1:length(lev1)], x[(length(lev1) + 1):(length(lev1) + : not enough (finite) 'x' observations
Can you help me solve this problem?
I am interested in the different tools that can be used to create custom databases for targeted sequencing and how to trim the databases based on the amplicon size? Also, should custom databases contain species not assigned to a species level?
I've been trying to figure out how to go about assembling raw sequences. I have 5 Lactobacillus strains and 1 Acetobacter strain whose genomes need to be assembled. I am trying to obtain a full-length 16S rRNA sequence using the 27F and 1492R primers before sending the samples off for Illumina MiSeq sequencing. What I am confused about is how to go about assembling and annotating the genome. I am a first-timer. Please guide me.
Which primer pair is better for amplifying endophytic fungal communities (metabarcoding with Illumina sequencing) in roots of trees while avoiding host amplification? I would greatly appreciate a suggestion of a suitable primer pair to identify endophytic fungal communities in tree roots using Illumina sequencing. Some sequencing facilities have the ITS1-1F-F / ITS1-1F-R available, while the ITS3mix / ITS4ngs is also suggested in the literature (Tedersoo et al 2015). Which is better? Is there any other option?
Any advice based on experience will be highly appreciated.
I ordered 16S primers according to the earth microbiome website, however, the researchers added GCT to the end of the Forward primer Illumina 5' Adapter to affect the melting temperature of the primer.
My laboratory decided to use these primers with barcoded Reverse primers to reduce overall costs by using dual index instead of single index seq. A sequencing company has informed me that this "GCT" extension can cause issues as seen below:
"Reason being is that your P5 adapter flowcell binding site is 3bp longer than the standard Illumina adapter. As a result, all of the Index 2 reads will start with “GCT”. This low diversity region may end up with quality drop or even run stop"
Has anyone tried this particular extension to the Illumina adapter with dual indices, and not just single? The problem was not flagged when the primers with the extension were used as single-index barcodes, or by a different sequencing company, so I do not know how serious this issue could be.
Thanks for your help
Kelly
I want to sequence the amplicons with two different library prep kits.
1. Nextera DNA Flex Library Prep. kit
2. QIAseq Ultralow Input Library Kit
Can I use the libraries from both preparation kits in a single cartridge simultaneously? If not, what is the possible reason?
Furthermore, is it possible to use the Adapters of one library prep. kit for another?
I filtered different volumes of sea water, and we did Illumina sequencing on the samples obtained. To be able to compare among samples, I would like to standardize them by calculating OTU reads per liter.
I found that QIIME suggests standardizing to the minimum obtained to reduce noise. But since I don't work with the FASTA format but with an Excel table of OTU read counts assigned to the different species I obtained, I am wondering whether calculating OTU reads per liter would be a standardization method that allows me to compare among my 33 samples.
How do you standardize your HTS data of environmental samples?
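A rough sketch of the per-liter calculation described above, assuming the Excel table has been loaded into a mapping from OTU name to read count for each sample; the function names and example numbers are hypothetical. Read counts also depend on how deeply each library was sequenced, so a relative-abundance version is shown next to it for comparison.

```python
def reads_per_liter(otu_counts, liters_filtered):
    """Scale raw OTU read counts by the volume of water filtered."""
    return {otu: count / liters_filtered for otu, count in otu_counts.items()}


def relative_abundance(otu_counts):
    """Express each OTU as a fraction of the sample's total reads."""
    total = sum(otu_counts.values())
    if total == 0:
        return {otu: 0.0 for otu in otu_counts}
    return {otu: count / total for otu, count in otu_counts.items()}


if __name__ == "__main__":
    sample = {"OTU_1": 100, "OTU_2": 300}  # made-up counts for one sample
    print(reads_per_liter(sample, liters_filtered=2.0))
    print(relative_abundance(sample))
```

Which of the two is appropriate depends on whether the comparison is about composition or about absolute amounts per volume.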
Many DNA extraction protocols warn against vortexing during sample preparation, as it tends to shear DNA into smaller fragments. This makes sense if the downstream application requires high molecular weight DNA, but vortexing is really handy. Illumina sequencing doesn't require high molecular weight DNA (shearing is part of the library preparation), but I notice people still taking pains to avoid vortexing during extractions that are intended for Illumina sequencing. Is there a compelling reason to do so, or is it more likely a case of not customizing a generalized protocol to the specific needs of the situation?
Hi there,
I am working on pooled sequencing samples of Drosophila. I have three populations. When I first submitted my samples for (paired-end) Illumina sequencing, the biotech center informed me that they messed up sequencing the third sample and would have to redo the run. I still received the data from the other two samples, which were fine. When they redid the sequencing for the third population, I also received another set of reads for populations 1 and 2 from the second run. My advisor suggested that I concatenate the FASTQ files from the original and redo runs for these populations to obtain more depth. I'm wondering a couple of things:
1. Has anyone else done something like this with their data?
2. Could this create any problems or caveats that I should be aware of when analyzing my data downstream? (I'm currently using PoPoolation2.)
Thank you
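Concatenating FASTQ files from two runs can be done by appending the files byte for byte, and this also works for gzip-compressed files because a series of gzip members is itself a valid gzip file. The sketch below uses made-up function and file names; for paired-end data, the R1 files and the R2 files must be concatenated in the same run order so that read pairs stay in sync.

```python
import gzip
import shutil


def concatenate_files(inputs, output):
    """Append the input files to the output file, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_fastq_records(path):
    """Count records, assuming the usual four lines per FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def merge_paired_runs(r1_files, r2_files, out_r1, out_r2):
    """Merge runs for one population and check that R1 and R2 still match."""
    concatenate_files(r1_files, out_r1)
    concatenate_files(r2_files, out_r2)
    n1, n2 = count_fastq_records(out_r1), count_fastq_records(out_r2)
    if n1 != n2:
        raise ValueError(f"R1 has {n1} records but R2 has {n2}")
    return n1
```

A call might look like `merge_paired_runs(["run1_R1.fastq.gz", "run2_R1.fastq.gz"], ["run1_R2.fastq.gz", "run2_R2.fastq.gz"], "pop1_R1.fastq.gz", "pop1_R2.fastq.gz")`, with the file names standing in for your own.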
Hi Everyone!
I am running a few population genomic studies on some invasive mammalian species, and I'm wanting to plan my study. I will shortly have some platinum phased chromosome level genome assemblies to scaffold everything on.
What I am wondering is what are the coverage levels required for various analyses? Obviously I have the trade off between samples and coverage.
Overall, I want to describe the levels of diversity across the population, look for genomic regions with higher and lower variability, and potentially look at some evolutionary historical questions such as founder number, origins and hybridization history, and maybe adaptation (this last one is probably very tricky given the demographic history). It's a little nebulous, but I hope to get a few different things out of the study, and I don't want to think in retrospect, 'damn, I wish I had done x coverage instead of y'.
My first thought was to do moderate coverage genomes (~20 - 30x ) for around 8 individuals sampled from different 'populations' in the introduced range, and ~4 individuals from the original range to compare diversity. I realise this is a small number from the original range, but it does make sense based on historical evidence. Then for genomic diversity at a landscape scale, do ~200 individuals with ddRAD. Get the best of both worlds.
I could alternatively do far more individuals at lower coverage.
It would be great to find out people's opinions on ideal coverage levels for different questions. Say, heterozygosity requires at least x coverage, or ROHs require y. Are there clear benefits for some questions in having 30x rather than 10x, or 10x rather than 1x, for instance?
Any thoughts would be greatly appreciated.
I have to send my sample for metagenomic sequencing, but my genomic DNA shows a clear band with a thin, long smear. It looks like my sample has degraded, but I'm not sure. I read somewhere that it's quite common for genomic DNA to show a smear. But I fear the sequencing lab will reject my sample.
So far, this is the best result I can get, and I'm on a tight schedule, so it's almost impossible for me to optimize another extraction method.
Do you think my samples qualify for sequencing? My samples are in lane 1 and lane 2 (I mistakenly cropped off the ladder, though).
I need your advice. Thank you so much!
Hi everyone! I'm currently optimizing a lab protocol for 16S sequencing of low-abundance samples, and I'm trying to deplete residual DNA from the Q5 polymerase used for library preparation using 8-methoxypsoralen. After that we perform a cleanup with SPRI beads, followed by indexing and a second cleanup. Is it possible that 8-methoxypsoralen could inhibit or crash an Illumina sequencing run?
I would like to find a tool (if one exists) to predict the proportion of r vs K strategists in samples from a metabarcoding study.
For example, the tool I need (an R package or equivalent) could work similarly to functional prediction tools such as Tax4Fun, FAPROTAX or PICRUSt, but instead of predicting functions, it would predict ecological strategy, such as r vs K as described by r/K selection theory.
Benoît.
Which Illumina-compatible sequencing kits are recommended for the preparation of multiplexed libraries that cover the ITS region of fungi?
How do the criteria for designing biotinylated probes differ from those for designing primers?
I am working on a project that aims to determine the influence of long-term fertilization on soil microbial communities. I am sampling both the rhizosphere and the bulk soil and hope to use the current best choice of primers for targeting bacterial and archaeal 16S rRNA genes. Initially I planned to use the primer pair 341F/785R, which targets the V3-V4 region of 16S and is reported to have good domain coverage for both bacteria and archaea. However, I now also have the option to use separate archaea-specific (956F/1401R) or bacteria-specific (969F/1406R) primers, which target the V6-V8 region of 16S. The benefits of the separate primers are better coverage for archaea and less eukaryotic sequence contamination, but the V3-V4 primers are the standard tool typically used in similar research. I am unsure which set I should proceed with, or whether there are other primer sets I should be considering.
Dear All,
Could someone please suggest a proper PhiX concentration for multiplexed Illumina sequencing using a MiSeq v2 kit and Nextera XT?
Samples #1-11 are highly similar, around 50-56% GC.
Sample #12 is significantly different from the others, somewhere around 33% GC.
Will 5% PhiX be OK?
thanks in advance,
Piotr
Does anybody have experience with a kit for library preparation from DNA,
for T and B cell immune repertoire analysis by Illumina sequencing?
BUT
the samples are from FFPE!!
I use LymphoTrack and it is great with fresh tissue or blood, but not with FFPE.
Which is better, V1-V3 or V3-V4 regions for 16S amplicons using 2x300 bp illumina sequencing? Are both good to target bacteria and archaea?
I am making libraries following a NEB protocol for Illumina sequencing.
The insert size is 350bp, which should yield 480bp long fragments in the library after amplification (insert+adaptor).
After amplification, I get longer fragments, with the library peak centred at 800 bp. Would anyone know why?


I am new to Illumina sequencing and its terminology, and I have received the raw sequence data and a report. Can anyone explain what it means that the sample was sequenced using "2x250bp paired-end reads"? What do the numbers refer to?
Hi,
It is our first trial and I'm not very happy... We performed a first experiment with eight fresh frozen samples (input 80 ng DNA).
We attempted a run and are experiencing some problems: the MiSeq seems to have detected the clusters (1086), but it gave us no further results, and none of the clusters passed the filter QC... strange...
Any idea?
Hi!
I've recently been trying to prepare a couple of DNA libraries for sequencing on the NovaSeq platform. In that regard, I've had to try several methods (I'm doing bacterial transposon insertion sequencing, where traditional prep methods don't work), and I realized that different kits apparently have different index primers.
For example, the NEBNext kits seem to rely on the following as rd2:
5' - AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
While Nextera kits rely on this as rd2:
5' - GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
How come? Is this not a problem on the platforms? How do they know which to expect, to make sure that the indices are read?
Looking forward to your answers!
I know the equation is Coverage = (read length) (number of reads) / (haploid genome length) but am a little uncertain on what to put in each field (namely read #).
I have an organism with a genome 5,000,000 bp long, it's the only DNA I would be sequencing and I want to calculate the average fold coverage.
I'm using the NextSeq500.
This is where I am currently, but this doesn't look at all right.
C = LN / G
L = 150 bp
N = 130 x 10^6
G = 5 x 10^6 bp
C = (150) (130,000,000) / 5,000,000
C = 3900
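The arithmetic in the post can be checked with a minimal Python sketch of C = LN / G. The function names are hypothetical, and the sketch assumes N counts individual reads of length L (for paired-end data each mate counts as one read) and that all reads come from the target genome.

```python
import math

def fold_coverage(read_length_bp, n_reads, genome_length_bp):
    """Average fold coverage, C = L * N / G."""
    return read_length_bp * n_reads / genome_length_bp

def reads_for_coverage(target_coverage, read_length_bp, genome_length_bp):
    """Number of reads needed to reach a target average coverage (rounded up)."""
    return math.ceil(target_coverage * genome_length_bp / read_length_bp)

# Values from the post: L = 150 bp, N = 130 x 10^6 reads, G = 5 x 10^6 bp
print(fold_coverage(150, 130_000_000, 5_000_000))  # 3900.0

# Hypothetical target of 100x coverage for the same genome and read length
print(reads_for_coverage(100, 150, 5_000_000))  # 3333334
```

So the value of 3900 follows from the inputs as given; the second function shows how few reads a 5 Mb genome would need for a more typical target depth.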
I want to go for whole genome sequencing using next-gen Illumina, but I have some problems.
Firstly, the organism is a mollicute with a small genome and is an obligate plant parasite with low titer. I cannot do a DNA extraction free from its host DNA. The best ratio I can possibly get is 1:10 bacterial to host DNA, optimistically. I can purify it using a NEBNext® Microbiome DNA Enrichment Kit, but that will still leave me with organelle DNA.
The problem is that the organism has very low GC content, estimated at around 26% GC, and I've read that Illumina sequencing has a GC bias.
" For example, Illumina sequencing of a Plasmodium falciparum genome, which is extremely GC-poor with a mean GC content less than 25%, was found to favor the more GC-balanced regions, leading to few or no reads from the many GC-poor regions [13]. "
This is an issue because we are afraid that Illumina sequencing will not only favor the GC-richer regions of the bacterium but also favor the host DNA with its normal GC content, which already outnumbers the bacterial DNA. We would then be left with a very low number of reads from our actual bacterial DNA.
Do you have any recommendations about this matter?
Also, if we cannot extract adequate amounts of DNA, we would have to resort to MDA. Does anyone have experience with MDA favoring higher-GC genomes, or favoring amplification of host DNA over bacterial DNA? As a strategy to target my organism, I could add oligomers to the MDA that are reported in the literature to be frequent in the organism. Would that be recommended?
I would appreciate some answers.
I got back results from Illumina sequencing in the form of 2 files. I trimmed them, and now I am trying to merge them and convert them to a FASTA file so that I can assemble with IDBA-UD. Here's my code and the error that I received:
$ fq2fa --merge C_S7_trim_R1_001.fastq.gz C_S7_trim_R2_001.fastq.gz C_S7_outmergedfile.fa
Error: terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted.
I have downloaded IDBA-UD to my home directory and am using the fq2fa program from it. Could anyone please tell me what is wrong with this? Any help is appreciated.
Thank you
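The error message alone does not identify the cause, but one thing worth ruling out is passing gzip-compressed files to a tool that expects plain-text FASTQ. As an illustration of what the merge step is meant to produce, here is a minimal Python sketch that reads two FASTQ files (gzip-compressed or not) and writes the mates alternately as FASTA. The function names are hypothetical and this is not the fq2fa implementation; it assumes standard 4-line FASTQ records in the same order in both files.

```python
import gzip
from itertools import zip_longest

def read_fastq(path):
    """Yield (name, sequence) tuples from a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line (not needed for FASTA)
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            yield header[1:], sequence

def interleave_to_fasta(r1_path, r2_path, out_path):
    """Write mates alternately (R1, R2, R1, R2, ...) as FASTA; return the pair count."""
    pairs = 0
    with open(out_path, "w") as out:
        for r1, r2 in zip_longest(read_fastq(r1_path), read_fastq(r2_path)):
            if r1 is None or r2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            out.write(f">{r1[0]}\n{r1[1]}\n>{r2[0]}\n{r2[1]}\n")
            pairs += 1
    return pairs

# Example call with the file names from the post:
# interleave_to_fasta("C_S7_trim_R1_001.fastq.gz",
#                     "C_S7_trim_R2_001.fastq.gz",
#                     "C_S7_outmergedfile.fa")
```

The explicit check on read counts matters because trimming each file separately can leave the two files out of sync, which breaks pairing for any downstream assembler.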
Hello! I have bacterial (16S V3-V4) and fungal (ITS2) MiSeq data from soils collected on different farms, and I'm hoping to compare microbial networks between farming types (conventional and organic). While OTUs have traditionally been used for network analyses, I'm wondering whether, with the push for ASVs (https://www.nature.com/articles/ismej2017119), ASVs should be used instead. Given the cautionary blog posts from Noah Fierer's lab (http://fiererlab.org/2017/10/09/intragenomic-heterogeneity-and-its-implications-for-esvs/), I'm a bit hesitant to use ASVs in network analyses, because strong positive correlations could simply result from having multiple sequence variants from the same organism. The only published example I've found with networks of ASVs (https://www.nature.com/articles/ismej201729) evaluates ASVs within OTUs, not just ASVs. Should only OTUs be used in the analysis of microbial networks? Or can ASVs be used instead, with the lack of published ASV networks simply due to the recent development of programs that produce ASVs? Many thanks for thoughts and ideas on this.
I have miRNA sequencing data in which the 3' adapter is followed by a random 12-nucleotide UMI sequence, then the reverse transcriptase primer and the universal adapter.
Overall sequence pattern = miRNA + 3' adapter + UMI + RT primer + universal adapter
I am trying to remove the UMI sequence using UMI-tools. I have also looked into the regex option in UMI-tools, but I am not sure it will work with the sequence pattern I have.
Please let me know if I have any chance of processing the sequences with UMI-tools.
I have attached an example of my sequence pattern.
Image annotation: the 3' adapter is colored in red,
the reverse transcriptase primer in orange,
and in between them are the 12 UMI bases.
After the reverse transcriptase primer comes the universal adapter.
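To illustrate that a read layout like this can be described with a regular expression using named groups, here is a minimal Python sketch. The adapter and primer sequences below are placeholders chosen for illustration (the post does not give the real ones), and the function name is hypothetical; the real kit sequences would need to be substituted.

```python
import re

# Placeholder sequences (assumptions): replace with the kit's real sequences.
ADAPTER_3P = "AACTGTAGGCACCATCAAT"
RT_PRIMER = "AGATCGGAAGAGC"
UMI_LENGTH = 12

# Layout described in the post: insert + 3' adapter + UMI + RT primer + ...
READ_PATTERN = re.compile(
    rf"^(?P<insert>[ACGTN]+?){ADAPTER_3P}(?P<umi>[ACGTN]{{{UMI_LENGTH}}}){RT_PRIMER}"
)

def split_read(sequence):
    """Return (miRNA insert, UMI) if the read matches the layout, else None."""
    match = READ_PATTERN.match(sequence)
    if match is None:
        return None
    return match.group("insert"), match.group("umi")

# Synthetic read built from the placeholders above
read = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER_3P + "ACGTACGTACGT" + RT_PRIMER + "TTTT"
print(split_read(read))  # ('TGAGGTAGTAGGTTGTATAGTT', 'ACGTACGTACGT')
```

This sketch requires exact adapter matches, so reads with sequencing errors in the adapter or primer would be dropped; a production workflow would need some tolerance for mismatches.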

I am trying to do some quality control checks on Illumina NGS paired-end reads using FastQC 0.11.5. I tried to add the primer sequences from the NEBNext multiplex library prep kit to the contaminant_list file. I already noticed that the file has non-Unix carriage return non-printable characters (^M or \r). Neither removing them nor making sure that all lines have them changes the error. Even with the original contaminant file I still get:
Option c is ambiguous (casava, contaminants)
Started analysis of myfile.fastq.gz
Failed to process /path/to/file/contaminant_list.txt
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
Curiously the sequence file being analyzed seems to be processed okay, so it is not corrupted:
Approx 5% complete for myfile.fastq.gz
Approx 10% complete for myfile.fastq.gz
Approx 15% complete for myfile.fastq.gz
Approx 20% complete for myfile.fastq.gz
etc....
Here are the commands I am using:
fastqc myfile.fastq.gz -o /path/to/fastqc_out -a path/to/adapter_list.txt --noextract -t 6 -j /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el6_10.x86_64/jre/bin/java -c /path/to/Configuration/contaminant_list.txt
Anybody know what is going on?
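The first output line, "Option c is ambiguous (casava, contaminants)", suggests the short -c flag was not resolved to --contaminants, so the contaminant file may have been treated as a sequence file, which would explain the "ID line didn't start with '@'" error for that path. Spelling out --contaminants is worth trying. Separately, the file itself can be sanity-checked with a small Python sketch like the one below. It assumes the layout used by the contaminant list shipped with FastQC (a name, one or more tabs, then the sequence, with # starting a comment line); the function name is hypothetical.

```python
def check_contaminant_lines(lines):
    """Return (entries, problems) for name<TAB>sequence lines.

    Carriage returns are stripped, and blank lines and '#' comments are skipped.
    """
    entries, problems = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Allow several tabs between the name and the sequence
        fields = [field for field in line.split("\t") if field.strip()]
        if len(fields) != 2:
            problems.append((number, "expected name<TAB>sequence"))
            continue
        name, sequence = fields[0].strip(), fields[1].strip().upper()
        if set(sequence) - set("ACGTN"):
            problems.append((number, "non-nucleotide characters in sequence"))
            continue
        entries.append((name, sequence))
    return entries, problems

# Example use on a file (path is a placeholder):
# with open("/path/to/Configuration/contaminant_list.txt", newline="") as fh:
#     entries, problems = check_contaminant_lines(fh)
```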
Hi, we have two Illumina-sequenced plant samples where one is a mutant of the other. Do you know a straightforward method to compare them in order to spot small mutations (SNPs, InDels) or maybe longer ones (LTR insertion; I don't think CNV), or do you suggest aligning both to a reference, calling mutations and then comparing?
Thanks
What exactly is the Nextera PCR Master Mix from the Nextera XT DNA Library Prep kit composed of, and at what concentrations, if possible?
I saw that this question was asked already, but the answers weren't specific. I already e-mailed tech support, but I thought I would ask here too because I am always enlightened in some way by you guys!
Thank you in advance!
I would like to study the metagenomic diversity of actinobacteria associated with marine sponges. In this context, I would like to use actinobacteria-specific primers to capture the complete phylum Actinobacteria rather than using universal primers. In particular, I would like to explore rare actinobacteria such as Salinispora.
I am aware that the actinobacteria-specific primers of Stach et al. 2003 (SC-Act-235aS20, SC-Act-878aA19) may be useful in my case. However, as the amplicon is 640 bp long, they cannot be used for Illumina sequencing.
Illumina can deliver less than 400 bp.
Blackwood et al., 2005 have also proposed phylum-specific primers. They reported that the Actinobacteria-specific primer "Act1159R" captures most of the actinobacteria they tested. Could it be useful in my case? Is the reverse primer alone sufficient to explore the phylum Actinobacteria?
Could someone kindly suggest the right primers for my purpose?
Thanks in Advance.
This is a video on Illumina Sequencing that I really like:
However, I still have some technical doubts, and I would appreciate your expert comments on any of these:
1) After the sample DNA is prepared, it is introduced into the lanes of the Illumina flow cell. How is it ensured that the DNA is uniformly distributed and does not bind preferentially around the point of injection?
2) My understanding is that the number of DNA bridge amplification cycles will affect the size of the clusters generated on the flow cell. How many cycles of bridge amplification are necessary? What impact does this have on the results?
3) With different fluorescent tags on the different nucleotides, can all nucleotides be present in the flow cell simultaneously? How is it then ensured that the polymerization of all the strands in a cluster occurs in a synchronized manner, so that a unique signal is measured?
Thank you very much for your help!
EDIT 29/10/2018: Added more details to bottom.
This might be a bit weird, but I thought I'd try here anyway.
Is it possible to configure Bowtie2, or similar programs, to match my query-sequence only from the end of the sequencing reads?
Query1:
AAATTTGGG
Read1:
NNNNNNNNNNNNAAATTTGGG
This query should give an exact match score for Read1, no matter what the N's contain, while
Query1:
AAATTTGGG
Read2:
NNNNNNNNNNNNAAATTTGGGNNN
Or anything else where the end of the read does not match the query exactly, would result in a mismatch.
I'm currently just using grep to match my reads in this way, but it is terribly slow.
EDIT:
Bit of the background. I'm sequencing an oligo library coding for short peptide sequences. The peptide length varies, so to make them all behave the same way in PCR etc., I have added filler sequence bringing them all to length of 200 bp. Each peptide coding sequence starts with Kozak (GCTAGCCCACC). So each oligo in the library looks like this:
NNNN...NNNN-GCTAGCCCACCATGACCACAGGAGACACCTAGCT
1bp----Filler---------Kozak-----------------peptide-coding-----------200bp
Because of errors in oligo synthesis, PCR, and Illumina sequencing, there can be SNPs/indels in these sequences in the sequencing reads. Any mismatches in the N-part (filler) are of no consequence to the peptide being produced, which is why I would like to match the sequencing reads to my library only starting from the Kozak sequence. The filler length varies from 0-150 bp, so I can't think of an easy way of trimming these from the reads.
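As a faster alternative to grep for the exact end-anchored case, the queries can be put in a hash-based index so that each read needs only one set lookup per distinct query length. The Python sketch below shows that idea, plus a helper that cuts a read at the Kozak sequence given in the post. The function names are hypothetical, and this only handles exact matches; it does not tolerate the SNPs/indels mentioned above, for which an aligner run on the Kozak-trimmed reads would still be needed.

```python
KOZAK = "GCTAGCCCACC"  # Kozak sequence given in the post

def build_suffix_index(queries):
    """Group query sequences by length: {length: set of queries}."""
    index = {}
    for query in queries:
        index.setdefault(len(query), set()).add(query)
    return index

def match_read_end(read, index):
    """Return the query that exactly matches the 3' end of the read, or None."""
    for length in sorted(index, reverse=True):  # prefer the longest match
        if length <= len(read) and read[-length:] in index[length]:
            return read[-length:]
    return None

def coding_part(read, anchor=KOZAK):
    """Return the read from the first anchor occurrence to its end, or None."""
    position = read.find(anchor)
    return None if position == -1 else read[position:]

index = build_suffix_index(["AAATTTGGG"])
print(match_read_end("CCCCCCCCCCCCAAATTTGGG", index))     # AAATTTGGG
print(match_read_end("CCCCCCCCCCCCAAATTTGGGCCC", index))  # None
```

Cutting each read with coding_part before alignment removes the variable-length filler, which sidesteps the trimming problem as long as the Kozak sequence itself is read without errors.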
The 454 GS FLX Titanium system is capable of generating 700 megabases (Mb) of sequence in 700 bp reads.
MiSeq reagents enable up to 15 Gb of output with 25 million sequencing reads and 2 × 300 bp read lengths.
In the first case, to sequence a total of 700 million bases, does it sequence 700 base pairs at a time in a run?
In the second case, to sequence 15 billion bases, what do the 25 million sequencing reads and the 2 × 300 bp read lengths mean?
And does the algorithm align these 300 bp sequences together to generate whole sequences?
Actually, I want to do forensic genetics NGS and need assistance in this regard.
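The quoted figures are consistent with simple arithmetic: total output is the number of sequenced fragments times the number of reads per fragment times the read length. The Python sketch below works this through, assuming that the "25 million sequencing reads" refers to fragments (clusters) that are each read from both ends; the function names are hypothetical.

```python
def run_output_bases(n_fragments, read_length_bp, reads_per_fragment=1):
    """Total bases produced = fragments x reads per fragment x read length."""
    return n_fragments * reads_per_fragment * read_length_bp

def reads_needed(total_bases, read_length_bp):
    """Number of reads of a given length that add up to total_bases."""
    return total_bases // read_length_bp

# MiSeq figures from the post: 25 million fragments, 2 x 300 bp
print(run_output_bases(25_000_000, 300, reads_per_fragment=2))  # 15000000000

# 454 figures from the post: 700 Mb in 700 bp reads
print(reads_needed(700_000_000, 700))  # 1000000
```

In other words, both platforms read many short fragments in parallel in a single run, about a million 700 bp reads in the first case and 25 million fragments read for 300 bp from each end in the second.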
We are looking for protocols for genomic DNA extraction from H. pylori that give the quality required for sequencing on the Illumina MiSeq system. Does anyone have suggestions?
Next-generation sequencing techniques are currently used in microbial profiling and other studies. Between NGS (specifically the Illumina MiSeq technique) and PLFA, which is the better method to employ in microbial studies? What are the pros and cons of each?
Or, besides these two, what other method would be best to employ in microbial studies?
Hi all,
As we all know, 16S amplicon data from Illumina sequencing is semi-quantitative, so I want to normalize my data output with 16S qPCR data. Can you please recommend a statistical method for doing so?
Thank you!
Arslan
Hi all,
I am trying to plot a rank abundance curve for my 16S rDNA data, in which I have different treatments. When I do so, I get only one curve describing the presence of each species. However, as I have different treatments, I want to plot multiple curves to show the relative comparison and species distribution. This is not working for me. I am attaching the plot which I got and would be thankful for suggestions / help!
The code I am using is given below:
# Convert counts to percentages within each sample
phyloTemp = transform_sample_counts(physeq, function(x) 1e+02 * x/sum(x))
# Melt to long format and drop zero abundances
clusterData = psmelt(phyloTemp)
clusterData = dplyr::filter(clusterData, Abundance > 0)
# Mean abundance per OTU (this averages over all samples, so treatments are pooled)
clusterAgg = aggregate(Abundance ~ OTU + Genus, data=clusterData, mean)
# Keep the 100 most abundant OTUs
clusterAgg = clusterAgg[order(-clusterAgg$Abundance),][1:100,]
ggplot(clusterAgg, aes(x=reorder(OTU,-Abundance), y=Abundance)) +
  geom_point(aes(color=Genus), size=3) +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
  scale_y_log10()

I work with Gram-positives (S. aureus), and lately we have done Illumina single-read WGS on different clinical and lab strains. The results are really good. I have also seen publications using Illumina single reads.
The problem is that we now have to do WGS on some Gram-negatives (Klebsiella and E. coli), and I have read that double (paired-end) reads are better for this kind of bacteria. I want to know if that is true or whether we can also work with single reads (cheaper), and if double reads are better, why (genome size? number of plasmids?).
Thanks!!
My samples seem to have similar levels of amplification, with no primer clouds or primer-dimers and no secondary amplification. Is there a reason I need to do a magnetic bead cleanup on each sample right after PCR instead of cleaning them all together after quantification/dilution/pooling?
Thanks!
We recently sequenced 96 samples on a NextSeq, prepared from single- and double-stranded DNA using the appropriate protocols.
It turns out that the samples prepared with the single-stranded protocol seemingly did not work. One of the things I can see is that we got a very high frequency of long poly-G sequences (~20%). Manually checking the fastq files shows both complete 76 bp poly-G sequences and, in other cases, a small poly-A tail (10 bp) following the indexing adapter, followed by the long poly-G tail. Do these poly-G tails mean no signal (considering the NextSeq's two-colour chemistry), and should they therefore be ignored and trimmed? Can they be the symptom of some library preparation problem (bad adapter fill-in, blunt-end repair)? Thanks in advance!
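In two-colour chemistry a G is called when neither channel gives signal, so a long run of G at the 3' end is typically treated as "no signal" and trimmed. A minimal Python sketch of such trimming follows; the threshold of 10 consecutive G's is an arbitrary assumption for illustration and the function name is hypothetical.

```python
import re

# A run of at least 10 G's reaching the 3' end of the read (threshold is an assumption)
POLY_G_TAIL = re.compile(r"G{10,}$")

def trim_poly_g(sequence, quality):
    """Remove a trailing poly-G run from a read and its quality string."""
    match = POLY_G_TAIL.search(sequence)
    if match is None:
        return sequence, quality
    return sequence[:match.start()], quality[:match.start()]

print(trim_poly_g("ACGT" + "G" * 12, "I" * 16))  # ('ACGT', 'IIII')
print(trim_poly_g("G" * 76, "#" * 76))           # ('', '')
```

Reads that become empty or very short after trimming, like the complete 76 bp poly-G reads described above, would then be discarded by a minimum-length filter.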
Hi all,
I am new to the Illumina system. When I did the PCR and ran the gel using the normal sets of primers, everything was nice and clean. However, when I used the primers with Illumina adapters, I got non-specific bands and a smear. Can someone help me please?
PCR conditions are the same for both with/without the adapters.
16S primers: 515F/926R
GTGBCAGCMGCCGCGGTAA
CCGYCAATTYMTTTRAGTTT
16S primers with adapters:
5'[TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG]GTGBCAGCMGCCGCGGTAA-3'
5’[GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG]CCGYCAATTYMTTTRAGTTT-3'
16S PCR condition:
95 C - 5 min
35 cycles of 95 C - 30 sec/ 55.5 C - 30 sec / 72 C - 30 sec
72 C - 10 min
18S primers: Euk1391f/EukBr
GTACACACCGCCCGTC
TGATCCTTCTGCAGGTTCACCTAC
with adapters:
5’[TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG]GTACACACCGCCCGTC-3'
5’[GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG]TGATCCTTCTGCAGGTTCACCTAC-3'
18S PCR condition:
94C - 3 min
35 cycles of 94C - 45 sec/ 57C - 60 sec/ 72C - 90 sec
72C - 10 min
These are all environmental DNA samples whose original extracted DNA concentrations are very low.
Thank you!
Kitty
Can ambiguous IUPAC nucleotides (R, S, Y, K...) ever appear in the raw reads from a fastq file?
Based on my experience I have only seen A, T, G, C and N.
Thanks
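One way to check a given dataset empirically is to count every base symbol other than A, C, G, T and N in the sequence lines. A minimal Python sketch follows; the function name is hypothetical, and it expects an iterable of sequence strings (for example, every second line of each 4-line FASTQ record).

```python
from collections import Counter

def unexpected_bases(sequences, expected="ACGTN"):
    """Count symbols in the reads that are not in the expected alphabet."""
    counts = Counter()
    for sequence in sequences:
        counts.update(base for base in sequence.upper() if base not in expected)
    return dict(counts)

# Synthetic example: R, Y and K are IUPAC ambiguity codes
print(unexpected_bases(["ACGTN", "ACRYT", "acgk"]))  # {'R': 1, 'Y': 1, 'K': 1}
```

An empty result for a real file would be consistent with the observation in the post that only A, T, G, C and N appear.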
I've just finalized the analysis of a vast transcriptomic study in maize with 8 conditions and n=5 (40 libraries sequenced with Illumina technology). The conclusions are very pleasing, robust statistically and complement nicely a parallel metabolomic analysis.
Most published Illumina RNA-Seq studies validate their sequencing results by conducting some random qPCRs on parallel samples. But they are vague about how they do it, and I realize that this will commit me to a massive amount of work: 40 samples x 12 genes (10 randomly selected genes plus 2 housekeeping control genes) x 3 technical replicates x 2 cDNA library dilutions = 2,880 qPCRs. This excludes the prior pilot studies needed on each gene to determine the appropriate dilutions at which the qPCRs should be conducted. This will cost as much as the Illumina sequencing!
Because of the high number of independent biological replicates (n=5) in my RNA-Seq dataset, is it really necessary to validate the results with some qPCRs? And if yes, is there a way to reduce the number of qPCRs?
Thank you in advance for any help on this.
I need to enrich my genomic DNA samples from bacterial strains with plasmid DNA extracted from the same strains using a specific kit. But I have not found any work to use as a reference, and I believe my genomic analysis will be more complete with this, since these strains have large plasmids that may be important.
Thank you for your attention !!
When exploring microbial diversity through high-throughput sequencing approaches such as 454 pyrosequencing or Illumina MiSeq sequencing, the terms "microbial community" and "microbial diversity" can be confused or seem similar in meaning. Could someone kindly explain the precise difference?
Hi,
I'm working on soil microbiology and I want to check the microbial diversity in the soil. I have 2 options: paired-end Illumina or 454 pyrosequencing. Which technique is better to use, and why?
Hi there,
I'm studying changes in the frequencies of approximately 40-50 SNP alleles within a gene of interest in a plant pathogen, in nine regions across a country. To gather this data, a large number of samples have been taken from each of the nine sites for six sequential years and Illumina sequenced in a pool. So I will have nine .bam and .vcf files with around a hundred reads (high coverage) each, giving me a number analogous to allele frequency for each SNP, for each site, in each year.
I am trying to determine an appropriate statistical analysis for detecting significant changes in allele frequency across this six-year time frame. So far I think a CMH test (probably using PoPoolation2) will be my best option, with year and allele frequency as the nominal variables and sites as replicates. However, I'm conscious of heterogeneity between sites, which given my previous results is unlikely, but possible.
Would anyone be able to suggest other options? I was also thinking of a GLM with Bonferroni correction, but cannot find a convenient way of implementing this with a six-level qualitative independent variable.
Hello everybody!
Can anyone help with, or recommend a laboratory or institution for, the bioinformatic analysis of Illumina MiSeq full gene region sequencing results?
we paid them! We need to sequence 3 types of diseases, roughly 10 to 50 genes each. Please send recommendations or solutions to my email: shalkar.bt@gmail.com
Hello everyone,
I was wondering if anyone has a good suggestion for an Archaeal 16S rRNA gene primer pair which is routinely used for paired end sequencing. I am having trouble in shortlisting a primer pair as most people are using the universal 16S primer (which I have already tested and it doesn't cover the diversity in my environmental sample) or the primers which amplify only shorter reads not covering the expected length (approx. 400 to 500 bp) that I am looking for.
Any suggestions would be really helpful!
Thanks in advance.
I am having some major challenges in PCR amplifying DNA from mixed community samples using degenerate primers that have inosine bases in them. The same samples amplify fine with a regular TAQ polymerase (lovely bright bands, good Sanger sequences), but as soon as I try the High Fidelity type, I have no amplification or extremely low levels with a range of DNA concentrations. I worry that the polymerases are mainly incompatible with the inosine bases in the primers, as some high fidelity polymerases explicitly state they are incompatible with inosines, but not all polymerases state this problem in their guides. Therefore I do not know if it is a universal problem with high fidelity polymerases or just problems with certain brands. So far my tests suggest it is the former.
So far I have tried:
Kapa HiFi (regular, and Hot start, Fidelity and GC buffer)
Phusion HiFi mastermix - did not amplify
Platinum SuperFI master mix - notes a problem with inosines, did not amplify
With Kapa HiFi (regular, not hot start), I was able to get a faint band after 40 cycles by adding 100 ng DNA/10 uL reaction, GC Buffer, and BSA. Adding less DNA = no bands. It seems crazy to add so much DNA and still get very little amplification, this is the mitochondrial COI gene (multiple copies). With a regular TAQ (MangoMix, and AmpliTaq Gold) I get strong amplification of my target sequence by adding only ~5 ng DNA/reaction, all other things remaining equal.
Ultimately I will do Illumina MiSeq on the indexed product to sequence diverse COI amplicons, so I should really have a high fidelity polymerase doing the amplification.
Thank you in advance for sharing your experience in this area.
I have not worked with inosine bases before and did not expect it to be such a problem! I am using someone's published primers, tried using their methods as much as possible, but they did not use a high fidelity polymerase in either of their publications (though they recommend doing so in the earlier of the two pubs!). Perhaps they tried and also failed - reasons there should be a journal of failed methods!
I am very much interested in Illumina sequencing of the 16S rRNA gene of gut microbiota.
Does anyone have experience assessing RNA quality after RNA-immunoprecipitation? My downstream goal is RNA-seq (RIP-seq), and I want to check the quality of the RNA before proceeding to library preparation.
I've run my eluted, DNase-treated, and purified (Zymo Clean & Concentrator) RNA on an Agilent Bioanalyzer RNA Pico chip.
Much to my dismay, my RIN was very low (~2), with very low concentrations.
Before trashing the very difficult to obtain samples, I just want to know if the Bioanalyzer is an accurate indicator of RNA quality when that RNA is isolated from an immunoprecipitation experiment. Any input would be great.
I just have a question regarding normalization for Illumina sequencing. We all have the problem of sequencing and ending up with unequal sequencing depth across samples.
To deal with it we compute relative abundance or log-transform the counts, and each approach has its own problems.
I am just wondering whether I could multiply the read counts of each sample by a certain factor so that every sample ends up with the same number of reads. For example, if I have 6 samples, a1 to a6, with read counts of 2K, 4K, 2.5K, 3K, 3.4K and 6K, then I would compute a1/1, a2/2, a3*0.8, a4*2/3, ... so that all of them have 2K reads.
Thank you so much!
Hanh
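The scaling described in the post can be written down in a few lines. The Python sketch below computes the per-sample factors for the depths given above and applies them to a count vector; the function names are hypothetical. Note that this is total-sum scaling, so it carries the same information as relative abundance multiplied by a constant and produces fractional counts, unlike rarefying, which subsamples reads at random.

```python
def scaling_factors(depths, target=None):
    """Factor per sample that brings its total read count to the target depth.

    By default the target is the smallest depth among the samples.
    """
    if target is None:
        target = min(depths)
    return [target / depth for depth in depths]

def scale_counts(counts, factor):
    """Multiply every taxon count of one sample by its scaling factor."""
    return [count * factor for count in counts]

# Depths from the post: a1..a6
depths = [2000, 4000, 2500, 3000, 3400, 6000]
factors = scaling_factors(depths)
print([round(f, 3) for f in factors])  # [1.0, 0.5, 0.8, 0.667, 0.588, 0.333]

# Hypothetical taxon counts for sample a2 (sums to 4000)
print(scale_counts([3000, 1000], factors[1]))  # [1500.0, 500.0]
```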