High Throughput Sequencing - Science method
Questions related to High Throughput Sequencing
I am currently studying fungal endophytes, using both culture-dependent and high-throughput methods.
When culturing, I understand it is common practice to use leaf-press controls and/or to plate out the wash water to confirm that surface sterilisation was successful and that anything that grows is in fact a fungal endophyte.
I have been wondering what controls people are using to check the same thing with high-throughput approaches. There doesn't seem to be much in the literature on this. I have started to collect "dirty" samples, which I will not sterilise but will sequence for fungi. I expect these dirty controls to contain more species than sterilised leaves, but this still doesn't feel like a true test of the surface sterilisation process.
Interested to hear what controls you may be using!
I have sent several dozen samples for full-length 16S PacBio sequencing. However, the sequencing center informed me that a few samples show a smear that will affect the sequencing results.
Attached are the gel electrophoresis images from QC (1% agarose gel, 1× TAE, 100 V, run for 30 min).
The sequencing center said samples 1-3 (Fig. 1) show a smear; among these, sample 3 also has a very low DNA concentration (<5 ng/μl). By contrast, samples 16-20 (Fig. 2) passed QC with no smear.
I wonder, what are the possible consequences if the DNA samples contain smear for PacBio seq?
Many thanks in advance!
If any references are available, kindly add them here for future work. Thanks in advance.
I am doing a screen of cell phenotypes. Because of the high throughput and the small cell number per well (500-1000 cells in 6-10 μl), it may be better to take the cell lysate straight into reverse transcription without RNA extraction (i.e. add the reaction mix directly to the lysate). Are there any optimised, commonly accepted recipes for this purpose?
Also, how should I design the volume ratio of cell suspension : lysis buffer : RT reaction mix to minimise the cost of the RT reaction mix (it adds up fast given the number of wells)? Heat-assisted lysis is also possible in this system (heating to 65 or even 80 °C to break the cells).
I need information about companies in India offering environmental DNA services for any organism. Please add website links.
Does anyone know how to assess the fidelity of drug-resistant strains by high-throughput sequencing? What are the relevant references? Thank you very much!
I have always come across ''Changes in bacterial composition were determined using high-throughput sequencing of the V3/V4-region of the 16S rRNA encoding gene''. Could anyone explain to me why V3/V4 region is normally used in this case? Why not other V regions?
Which tool is suitable for checking the quality of high-throughput NGS reads: FastQC, MultiQC, or another tool?
I've used the same normalization and chip to look up a gene's expression in the R2 Genomics Analysis and Visualization Platform, but I have compared very different cells (e.g. lung cancer and melanoma). So my doubt is: is it correct/safe to say that a gene is more expressed in lung cancer, for example, than in melanoma (p<0.05) based on comparisons made in R2 Genomics?
PS: my intention is to confirm this experimentally; the results obtained in R2 would serve as a guide for me.
Hi, I am starting the senior year of my BS in a few months. My major is molecular and cell biology, and I also have a decent background in computational biology tools used for analyzing high-throughput sequencing data.
I am interested in pursuing graduate studies in coral reef genomics, biotechnology of coral reef restoration, etc. The problem is that I am somewhat confused and do not know where to start. Can anyone give me advice that might help (recommending a quality lab that works in the field, contacts of a professor in the field who may need to recruit a master's or Ph.D. student, or an online course or textbook that would help me get the required knowledge)? Please provide me with anything you think may help. Thanks in advance.
I'm quite confused about using DESeq2 to find differentially abundant taxa in microbiome studies, especially when the factor has more than two groups. I know DESeq2 was originally developed for RNA-seq to detect changes in gene expression. It's easy to understand when there are only two groups, e.g. treated vs. untreated: we can easily say which taxa were up-regulated by looking at the sign of the log2 fold change (positive or negative).
BUT what about three groups, control vs. treat1 vs. treat2? DESeq2 can still handle this situation, but then I have no idea how to interpret the log2 fold change. If we detect taxa that are significantly different, how can we know in which group these taxa were up-regulated?
Although some people suggest doing pairwise comparisons, it's still unclear to me how to do this and how to interpret it. Does anyone have a good recommendation or idea about this?
Thanks in advance.
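On interpreting the sign with more than two groups: each pairwise contrast yields one log2 fold change whose sign refers to the first-named (numerator) group, so with three groups you read each contrast separately. A toy numeric illustration (made-up means and group names, plain Python rather than DESeq2 itself):

```python
import math

# Toy normalized mean abundances for one taxon in three groups
# (made-up numbers, for illustration only).
means = {"control": 40.0, "treat1": 80.0, "treat2": 10.0}

def log2fc(numerator, denominator):
    """log2 fold change; positive means higher in the numerator group."""
    return math.log2(means[numerator] / means[denominator])

print(log2fc("treat1", "control"))  # 1.0  -> 2x higher in treat1
print(log2fc("treat2", "control"))  # -2.0 -> 4x lower in treat2
print(log2fc("treat1", "treat2"))   # 3.0  -> 8x higher in treat1 than in treat2
```

In DESeq2 itself, as far as I know, the results() function lets you extract one such pairwise contrast at a time, and the reported log2 fold change follows the same numerator-over-denominator convention.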
We wish to set up a Hi-C protocol but are not sure how to tell whether we got it right, so we are looking for a well-characterized cell line or another type of model that we can run the method on for validation. Any suggestions?
I plan to buy primers in a 96-well plate format. After I resuspend the primers, I'd like to be able to periodically freeze-thaw the plate in order to take out aliquots. Are 96-well storage mats (e.g. FisherSci AB-0674) suitable for this application?
Any other recommendations?
I've done a field study investigating community composition (18S and 16S) along a contamination gradient. I'm creating graphs of my environmental variables and community composition using PCA and DistLM in PERMANOVA in PRIMER 7. The labels for the variables overlap, making them hard to read. Is there a way to spread them out more? Any solutions would be greatly appreciated.
I have a VCF file containing SNPs from parents and F2 offspring. I need to convert this file to either a .loc or a MAPMAKER .raw format for input into JoinMap. There are large numbers of both markers and individuals, so it is not practical to convert the genotype codes by hand. Is there a program that can do the VCF-to-JoinMap conversion automatically? If not, does anyone know how to do the conversion in Excel or from the command line? Many thanks!
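In case no ready-made converter turns up, the recoding itself is scriptable. Below is a deliberately simplified Python sketch (not a drop-in tool): it assumes biallelic SNPs, unphased diploid GT as the first FORMAT field, and the F2 a/h/b coding; a real conversion would also need to handle parental phase, other missing-data codes, and JoinMap's header lines.

```python
# Minimal sketch of VCF -> JoinMap-style genotype recoding for an F2
# population: a = homozygous reference, h = heterozygous, b = homozygous
# alternate, "-" = missing. Simplified illustration only.

GT_TO_CODE = {"0/0": "a", "0/1": "h", "1/0": "h", "1/1": "b", "./.": "-"}

def vcf_to_loc_rows(vcf_lines):
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        marker = fields[2] if fields[2] != "." else f"{fields[0]}_{fields[1]}"
        # GT is assumed to be the first colon-separated FORMAT field
        genotypes = [GT_TO_CODE.get(s.split(":")[0], "-") for s in fields[9:]]
        rows.append((marker, genotypes))
    return rows

vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
    "1\t100\tsnp1\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1",
]
print(vcf_to_loc_rows(vcf))  # [('snp1', ['a', 'h', 'b'])]
```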
Is there a DNA extraction method suitable for sequencing on the PacBio RS II platform? I am aiming for reads >2 kb.
I used adonis to test the difference between groups/categories based on 16S high-throughput sequencing data. I got a very significant result (p = 0.001), but the R2 is very low (<0.1). I know this means the model explains less than 10 percent of the variation in my data. In the PCoA plot there is some overlap between the groups. Does this overlap contribute to the low R2? I'm wondering if I can still say that the difference between groups is significant.
I also ran ANOSIM and the results are similar: the test statistic is very low (~0.1) but the p-value is 0.001. Is this the same issue I ran into with adonis?
Some discussion groups say there is no relationship between R2 and the p-value. Does significance hold no matter how small the R2 is?
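The p-value/effect-size split can be reproduced with a generic toy simulation (plain NumPy, not adonis itself): with enough samples, a small true difference gives a tiny permutation p-value while the fraction of variance explained by group membership, the analogue of the adonis R2, stays well below 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups with a small true shift but many samples: a permutation test
# on the mean difference is highly significant, yet group membership
# explains only a tiny fraction of the total variance.
n = 500
a = rng.normal(0.0, 1.0, n)
b = rng.normal(0.3, 1.0, n)          # small shift between groups

observed = abs(a.mean() - b.mean())
pooled = np.concatenate([a, b])
hits = 0
for _ in range(999):
    rng.shuffle(pooled)               # relabel at random
    if abs(pooled[:n].mean() - pooled[n:].mean()) >= observed:
        hits += 1
p_value = (hits + 1) / 1000

grand = pooled.mean()                 # shuffling does not change the grand mean
ss_between = n * ((a.mean() - grand) ** 2 + (b.mean() - grand) ** 2)
ss_total = ((pooled - grand) ** 2).sum()
r2 = ss_between / ss_total            # variance explained by grouping

print(f"p = {p_value}, R2-like effect size = {r2:.3f}")
```

The general point: the p-value answers "is the difference distinguishable from noise?", while R2 answers "how big is it?", and with large n the former can be tiny while the latter stays small.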
We are doing 16S high-throughput sequencing of endophytic bacteria. In a preliminary run we got few bacterial sequences and mostly plastid DNA.
Did anyone try to separate chloroplasts from bacterial genomes? Are there any suggestions for us as to how to increase the bacterial portion of the DNA?
I have written a book chapter of 3,500 words about how to carry out a successful High Throughput Sequencing experiment. Does anyone know anybody editing a book that may be interested?
Colleagues, I need help with Venn diagrams and transcriptomics. I have three lists of IDs (example: c58516_g4_i4), only the IDs, not the sequences. I need to make a Venn diagram to know which IDs are shared among the three lists, which are shared between only two of them, and which are present only in their original list. I could do it manually, but it's a huge number of IDs. Can you suggest software for Windows or a script for Linux? Thanks!
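For the set logic itself, plain Python covers it: the seven regions of a three-way Venn diagram are just set operations (the IDs below are invented for illustration; a package such as matplotlib-venn can then draw the diagram from these sets).

```python
# Three ID lists as sets (made-up IDs for illustration).
list1 = {"c58516_g4_i4", "c10_g1_i1", "c20_g1_i1", "c30_g1_i1"}
list2 = {"c10_g1_i1", "c20_g1_i1", "c40_g1_i1"}
list3 = {"c20_g1_i1", "c30_g1_i1", "c40_g1_i1"}

shared_all = list1 & list2 & list3          # in all three lists
only_1_2 = (list1 & list2) - list3          # in lists 1 and 2 only
only_1_3 = (list1 & list3) - list2
only_2_3 = (list2 & list3) - list1
unique_1 = list1 - list2 - list3            # only in list 1
unique_2 = list2 - list1 - list3
unique_3 = list3 - list1 - list2

print(shared_all)  # {'c20_g1_i1'}
print(unique_1)    # {'c58516_g4_i4'}
```

Reading each list from a text file (one ID per line) into a set makes this work at any scale.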
I've been trying to know more about pipelines for shotgun metagenomics data to use in my samples of water-borne bacteria, with a focus on rare cyanobacteria in particular.
Suggestions on what your favorite assemblers and functional annotation programs are most welcome!
Does anyone know of a source for a carefully constructed fungal mock community to use as a control in high-throughput sequencing?
I know there is a bacterial mock community available from BEI Resources, but haven't found a corresponding product for fungi. Thanks!
A recent run of a very short insert library (average insert size 32 bp) on a lane of HiSeq 3000 yielded only 50 million reads. Many clusters were excluded by the chastity filter due to mixed populations. We suspect the kinetic exclusion amplification step in the patterned flow cells used in HiSeq 3000 and 4000 works less well for such short inserts, hence more clusters are mixed and excluded. We wondered if lowering the input molarity could improve the yield for such libraries, and whether anybody has experience with this and could advise what molarity works best. Any help is much appreciated.
From Illumina's patent: "Accordingly, kinetic exclusion can be achieved in such embodiments by using a relatively slow rate of transport. For example, a sufficiently low concentration of target nucleic acids can be selected to achieve a desired average transport rate, lower concentrations resulting in slower average rates of transport."
(aDNA, ancient DNA, HTS, high-throughput sequencing, NGS, next-generation sequencing)
I sent a test plate to the sequencing facility we are working with for ddRAD sequencing of a non-model organism without a reference genome. These 50 samples came from 3 field sites in 1 region. They ran these 50 in one lane of an Illumina HiSeq. When the sequencing data came back, it looked fine in preliminary popgen analyses after de novo assembly, and I recovered around 8,000 SNPs after filtering.
I then sent a large batch of samples (300) to be sequenced following the same protocol. These were from ~15 sites across 2 regions (5 in one region, 10 in another). The facility ran my samples as 300 in a single Illumina HiSeq lane, but ran the single lane twice, as opposed to 100 in each of 3 lanes, once, as I thought they would do. They said 300 in one lane ran twice would provide better quality than 100 in each of 3 lanes, ran one time.
When I received the 300 back, I combined those samples with the 50 from the test plate and proceeded to filter and call SNPs together, getting significantly fewer SNPs, but that was expected given the reduced depth of coverage from running so many samples per lane.
The main issue, however, is that the combined 350-sample dataset produces two genetic clusters, both in Structure and in a Principal Coordinates Analysis (PCoA) in GenAlEx. One cluster is the 300 and the other is the 50. It is biologically implausible that either of these clusters is real, as there are field sites in the 50-sample group that are very close to field sites in the 300-sample group. What could be causing the genetic "clusters" to align with sequencing run?
I'm trying to figure out the best sequencing service to use in terms of price, service, quality of results, ease of sample collection, etc. What should I be considering when I select a sequencing service?
I am working with clinical strains isolated from the same hospital and from patients with similar backgrounds (all CF patients in a CF center).
We have the complete sequences of the strains and they are quite similar, but I could not find a statistic or threshold that can be used to demonstrate whether the strains are related or isogenic. By how many SNPs must two sequences differ to be considered different strains rather than one isogenic strain with some mutations?
Also, if anybody knows of an article or other resource on this specific problem (distinguishing isogenic from closely related strains), please let me know!
Hi, I would like to apply a random sampling technique for the distribution of genes in a collection of bacterial samples. I want to test 250 samples out of 500. Please suggest a suitable web-based tool for generating the random numbers. Also, is there a way to test whether the generated numbers are random? Thank you.
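As a baseline against any web tool, the draw itself is a one-liner in Python: random.sample draws without replacement, and a fixed seed (an arbitrary number here) makes the selection reproducible and auditable.

```python
import random

# Reproducible draw of 250 of 500 sample IDs, without replacement.
rng = random.Random(20240101)   # arbitrary fixed seed for auditability
chosen = sorted(rng.sample(range(1, 501), 250))

print(len(chosen), chosen[:5])

# A crude sanity check: roughly half the chosen IDs should fall in the
# lower half of the range. This is NOT a formal randomness test -- for
# that, use e.g. a chi-square or runs test on the generator's output.
low = sum(1 for x in chosen if x <= 250)
print(low)  # expect something near 125
```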
I am a masters student working on extending a DNA repair model. I want to be able to pool multiple cells, use targeted MiSeq sequencing on a particular gene, and calculate the proportions of different repair events. How would I go about doing the data analysis for this?
To analyse RNA-Seq data sets, one needs to know the most accurate and reliable pipeline to go with. Could you suggest such a pipeline?
Note: I have good experience with the Tuxedo package (Bowtie, TopHat, and CummeRbund), in addition to edgeR.
Identifying bacteria present in healthy versus diseased conditions in a given biological sample.
Recently, I learned about the mobile DNA sequencer launched by Oxford Nanopore (MinION).
1. What is its efficiency, and what reagents are required for sequencing?
2. What data analysis tools are used once we generate output from the device?
I need to remove human DNA reads from HiSeq metagenome sequencing data of the human gut. Is there any available script/software for this?
I would like to compare next generation sequencing (NGS) protocols for Ion Torrent, Illumina and Pacific Biosciences. It would much be appreciated if anyone can provide me with their laboratory protocols.
I am using QIIME to analyse my sequencing data. I have used it before with Illumina data with no problems, but now I have Illumina MiSeq data and four raw data files: two index files and two raw reads files. I understand the raw reads files (one forward and one reverse that I can merge with join_paired_ends.py), but I do not know how to handle the two index files. How do I construct the mapping file with the two indexes? How do I use these index files as input for join_paired_ends.py and split_libraries_fastq.py? Should I use the extract_barcodes.py script before joining the paired end reads?
Thank you very much!
After high-throughput sequencing of 16S rDNA, the sequencing depths of different samples usually vary a lot. Sequencing depth can affect alpha and beta diversity analyses, so we usually use rarefaction (randomly sub-sampling sequences from each sample) to equalize the number of sequences per sample. But when we analyse the diversity of functional genes (e.g. the amoA gene of ammonia-oxidizing microorganisms), we often use a clone library method because of the read-length limitation of NGS. As a result, we obtain only very limited numbers of sequences per sample (e.g. 50 to 100, varying among samples). If we randomly sub-sample as with the 16S rDNA data, we may lose nearly half of the sequences in some samples, which should strongly influence alpha and beta diversity. So, in this case, can we calculate alpha and beta diversity from the relative abundance of each OTU, i.e. first normalising each sample by dividing by its total sequence count before calculating diversity? Is this transformation reliable and scientifically sound? Has anyone used this method to calculate alpha or beta diversity? If you have related references, I would very much appreciate them.
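One point worth noting here: the Shannon index is scale-invariant, i.e. it gives the same value whether computed from raw counts or from the proposed relative abundances, because it depends only on proportions; richness-based metrics such as observed OTUs or Chao1 are not scale-invariant and do remain depth-sensitive. A minimal check of the invariance:

```python
import math

def shannon(abundances):
    """Shannon index H'; depends only on relative proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

counts = [50, 30, 20]                     # raw sequence counts per OTU
rel = [c / sum(counts) for c in counts]   # relative abundances

# Same value from counts, relative abundances, or a rescaled sample:
print(shannon(counts))
print(shannon(rel))
print(shannon([5, 3, 2]))
```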
After high-throughput sequencing, we obtained thousands of 16S rDNA reads for each sample. When we analysed these sequences, we first quality-filtered to remove low-quality reads, then randomly resampled to even out the samples, then defined OTUs at 97% similarity. Some OTUs contain only a single sequence, and we cannot tell whether they are very rare species or erroneous sequences arising for unknown reasons. Some papers retain these single-sequence OTUs, while others remove them. So, should these single-sequence OTUs (singletons) be removed, and why? I would be very grateful for any references on this question. Thank you very much.
High-throughput sequencing is now very popular in microbial ecology. I have a question about 16S rDNA data processing:
When we analyse 16S rDNA data after high-throughput sequencing, should the primer sequences be removed, and why?
I would be very grateful if you could recommend some reliable references. Thanks!
I am mapping the raw reads to the contigs generated by de novo assembly of the same reads and want to represent the read coverage in the form of a graph.
I'm looking to prepare a grant for some high-throughput work, but the Core Facility at my University does not offer the service. Any suggestions would help, as I know Cornell's GBS program is currently on hiatus (this is what my classmates have used in the past). Thanks in advance!
I have RNA sequencing results (Illumina HiSeq) in Excel format, with both p-values and adjusted p-values stated. How can I convert a p-value into an adjusted p-value?
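If the adjusted p-values were produced by the Benjamini-Hochberg procedure (the most common choice for RNA-seq, and what R's p.adjust(method = "BH") computes), the conversion can be reproduced with a short script. Note this assumes BH was used for your table; Bonferroni, Holm, and other methods give different values.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values.

    Standard step-up procedure: sort p-values ascending, multiply each
    by m/rank, then enforce monotonicity from the largest p downwards.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for pos in range(m - 1, -1, -1):                   # largest p first
        i = order[pos]
        rank = pos + 1
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.005, 0.03, 0.02]))
# -> [0.015, 0.03, 0.03] (up to floating point)
```

Note the conversion only goes one way: you can compute adjusted from raw p-values (given the full list), but not recover raw p-values from the adjusted ones.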
"Illumina’s “indexing” system differs from other sample barcoding methods for high-throughput sequencing in that the barcodes (“indexes”) are placed within one of the adapters rather than being directly attached to the ends of template molecules"
I have recently been reading about the Illumina sequencing technique. My question is whether it is necessary to place the index within the adapter when doing PCR.
If we send our PCR samples to sequencing companies with the index attached directly to the ends, will it harm the results?
Hello! :) I have an idea of comparing two related but different species to learn whether there are differences in their expression profiles. I used the BAM files from TopHat2 mapping in HTSeq to generate count tables, then chose the top most-expressed genes and compared the data. My question is: do I need to normalize the HTSeq counts in some way (if so, how?), or does that make no sense since the species are different? Thanks a lot in advance!
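On the normalization point: raw HTSeq counts scale with library size, so at a minimum they should be converted to a relative measure such as counts per million (CPM) before any comparison; cross-species comparisons would additionally need orthologue mapping and, ideally, a length-aware measure such as TPM, since gene lengths and annotation quality differ between species. A minimal CPM sketch (toy counts, for illustration):

```python
def cpm(counts):
    """Counts per million: removes library-size differences only.

    Does NOT correct for gene length or annotation differences, which
    matter when comparing across species.
    """
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

# Toy count table for three genes, library of 1000 reads total:
species_a = [120, 30, 850]
print(cpm(species_a))  # [120000.0, 30000.0, 850000.0]
```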
Recently, I did ChIP-seq (H3K4me1, H3K4me3, H3K27ac, H3K27me3, Pol II, p300) from 1×10^7 insect cells. But I got a poor yield (about 4 ng), while standard Illumina library preparation needs 10 ng, and I have tried many times. The cells I use are hard to culture, so I want to try a modified ChIP-seq protocol to get a higher output. There are a number of methods for doing ChIP-seq from small numbers of cells:
nano-ChIP-seq: Genome-wide chromatin maps derived from limited numbers of hematopoietic progenitors.
LinDA: Single-tube linear DNA amplification (LinDA) for robust ChIP-seq.
Small-scale ChIP: In vivo epigenomic profiling of germ cells reveals germ cell molecular signatures.
cChIP-seq: cChIP-seq: a robust small-scale method for investigation of histone modifications.
These methods will introduce potential PCR bias due to preamplification, and I don't know which to choose. cChIP-seq seems better, but it is not clear to me how the carrier works.
Should I use more cells to get a higher yield, or select one of the modified ChIP-seq protocols above? Are there any better methods? Can anyone give some suggestions?
Thank you very much!
I am trying to integrate a gene of interest into Kluyveromyces marxianus under the control of the GAL1 promoter from vector pYES2. The entire expression cassette would consist of the GAL1 promoter, the gene of interest, a terminator, and the URA3 marker gene. I would like to know whether the GAL1 promoter would work in K. marxianus as well.
Hello everyone, I'm doing ChIP in insect cells, following the Richard A. Young lab's 2006 Nature Protocols paper, "Chromatin immunoprecipitation and microarray-based analysis of protein location".
I tried to optimize sonication, and the result I got is attached (sonication.png); can I continue with ChIP, or do I need to optimize the sonication further?
Besides, I used H3K4me3 and H3K27me3 antibodies (Merck Millipore) for ChIP and got poor results: the final concentration is 0.1-0.3 ng/μl (volume 50 μl). The Agilent result is also not ideal (figure below).
Is this caused by the sonication, or by some other reason?
Can anybody give me some suggestions?
Thank you very much.
We plan to work with total DNA obtained from reference samples (buccal swabs).
I would like to study the gene expression levels of 22 genes in 1200 different samples.
Since normal qPCR is humanly not doable here (26,400 reactions), I would like to know if anyone knows of a high-throughput platform that would allow us to study a very large number of samples for only a few genes.
I have heard of microfluidic platforms. Does anyone have experience with these? Would they fit our experiment, and what are the average costs of such an approach?
I need to sequence many samples of human mitochondrial DNA. I do not know how much DNA (concentration, amount) I need for this.
I'm planning to design an in vivo assay to be used for an HTS. Does anyone have advice or know of literature on how to choose an appropriate cell line? I will have to establish a stably transfected cell line with at least two constructs. The readout will be changes in GFP expression.
I would like to determine the number of rare variants (MAF < 0.005-0.01) within a given stretch of chromosome (bp, kb, or Mb) in human or cattle sequencing data obtained by NGS. I would appreciate any information regarding the numbers of rare variants in human or cattle sequence data.
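If the variant positions and allele frequencies are already available (e.g. parsed from a VCF), counting rare variants per fixed-size window is straightforward. The sketch below assumes precomputed (position, MAF) pairs and a 1-Mb window; both the cutoff and window size are parameters you would set for your own data.

```python
from collections import Counter

def rare_variants_per_window(variants, window_bp=1_000_000, maf_cutoff=0.01):
    """Count variants with MAF below the cutoff in fixed-size windows.

    `variants` is an iterable of (position, maf) pairs; window k covers
    positions [k * window_bp, (k + 1) * window_bp).
    """
    counts = Counter()
    for pos, maf in variants:
        if maf < maf_cutoff:
            counts[pos // window_bp] += 1
    return dict(counts)

# Toy data: two rare variants land in window 1, one in window 0.
toy = [(100, 0.004), (500_000, 0.2), (1_200_000, 0.008), (1_900_000, 0.001)]
print(rare_variants_per_window(toy))  # {0: 1, 1: 2}
```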
I have been struggling with this problem, particularly for the similarly-expressed-genes graph, where I can see only the tracking IDs (XLOC...), which do not even match any of the IDs I got after performing k-means clustering (which I did to choose possible candidate genes).
It would be useful to know how to change this for other graphs too.
I am currently analyzing some huge metagenomic data obtained on Illumina's HiSeq. However, I would first like to classify the paired-end reads taxonomically, in order to keep only the sequences of certain organisms before assembly. I would like a recommendation for suitable software for this task, because I know there are several options available in the repositories. Thanks!
I am going to evaluate the effect of a herbal medicine extract and a probiotic on TNBS-induced colitis in mice via modulation of gut microbiota structure, and to find the most affected species by high-throughput sequencing. The fecal sampling scheme is therefore important.
I lean toward plan A, because the drug and probiotic then have sufficient time to adjust the mouse gut microbiota before TNBS administration, but I am confused about the collection time points. Most colitis gut-microbiome studies have just two time points, as in plan B (before and after treatment). Plan A has three time points: from day -21 to day 0 it is easy to follow the microbiota change under normal circumstances, but under TNBS, which time points should I compare (day -21, day 0, day 3)?
If I choose plan B, I wonder whether short-term treatment in the acute colitis model would produce a significant change in the microbial distribution. Can anyone give me some suggestions?
I'm wondering whether anyone has assembled a database of oomycetes and successfully used one of the high-throughput sequencing platforms to characterize the community structure of this group of organisms in soil.
SeqTar users: is it right to calculate the probability of the valid peak using the binomial probability density function as they tell in the paper (http://nar.oxfordjournals.org/content/early/2011/12/02/nar.gkr1092.full)?
Considering that the binomial PDF for a certain value returns the probability to get a peak *exactly the same height* of the one under analysis, it seems to me that the right thing to do would be instead to calculate 1- (the binomial cumulative distribution for height of the peak under analysis-1).
This would give the probability of getting a peak *the same height of the valid peak OR MORE* given the trials and the probability, as before. Indeed using the probability density function might create problems: if a gene is heavily covered by PARE reads and the valid peak has much less reads than expected, the probability might be very low while the peak is not outstanding with respect to the surrounding nucleotides.
Using the cumulative probability, the probability of getting higher peaks would approach one, as expected. Is it right?
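The numerical difference between the two choices is easy to demonstrate with exact binomial arithmetic (toy numbers, not SeqTar output): for a heavily covered gene where the valid peak falls below expectation, the PDF value is small while the upper-tail probability is close to one, which supports the point above.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed exactly via math.comb."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_sf(k, n, p):
    """Upper-tail probability P(X >= k), i.e. 1 - CDF(k - 1)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Hypothetical gene: 200 PARE reads, each landing on the valid-peak
# position with probability 0.05, so ~10 reads expected at the peak.
# An observed peak of only 4 reads is BELOW expectation:
n, p, k = 200, 0.05, 4
print(binom_pmf(k, n, p))  # small (< 0.05): the PDF alone looks "significant"
print(binom_sf(k, n, p))   # close to 1: a peak this high or higher is common
```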
We have used the RDP pipeline to analyse ribosomal RNA gene (rDNA) amplicon reads from high-throughput sequencing. Almost half of my data was categorized as unclassified, so I am thinking of moving to another, more up-to-date reference database.
Has anyone tried to use BAM files from the IonExpress (Life Technologies) that were mapped using the TMAP software? Just looking at the output, TMAP does not seem very splice-aware. I read that it uses Bowtie2 for aligning, but there is no mention of TopHat. Does anyone know if these files can be used for RNA-seq, or should I just start over from the FASTQ files?
I have a library of cells overexpressing tagged proteins, and I have total cellular extracts for all of them, but only in small amounts (around 500 ng of total protein each). I would like to avoid thawing the cell library to regrow the cells and harvest larger protein samples. I would like to set up a high-throughput interaction assay of all of these proteins against a single protein. I have thought about GST pull-downs, ELISA, or AlphaScreen technology. Also, the protein samples are stored at -20 °C; can freezing alter their capacity to interact? I am looking for the fastest reliable technique.
I have several data sets, but with different sequencing outputs. The data appear to contain the same microorganisms but in different numbers. Can anyone suggest an analysis for this? In addition, can anyone recommend statistics to run on DGGE bands?
I have sequenced the 16S region (563F-802R) of the rhizobacterial communities of 88 soil samples in one experiment; 48 samples were sequenced on 454 and the other 40 on MiSeq. I want to compare these data. What should I do, and when comparing results, are the two platforms equally representative?
I've done an HTS with siRNAs for the whole human genome. I obtained lots of hits, even though I'm being very stringent in my clean-up and selection process. I'm categorizing hits based on several criteria following people's advice and some publications, but this is a slow and impractical process. Do you have experience with this process? How do you know or choose which genes to follow?
I'm trying to make a genetic map with SLAF-seq for an F1 mapping population of a perennial woody plant. SLAF-seq is a genotyping-by-sequencing technique similar to GBS or ddRAD. But I now find that almost 70% of the F1 offspring have ~2% (1000-4000) of loci that do not match the corresponding loci of either parent. The average sequencing depth is ~4X.
For example, one locus is as follows:
Then I began to suspect that the parents were contaminated by pollen from other plants, but this should not go wrong on so large a scale given the strict controls.
Is anyone here familiar with this technique? Is it mutation, sequencing error, or something else? Is such a proportion normal? Has anyone met a similar situation with GBS or ddRAD data?
Your answers will be appreciated. Best regards.
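One low-coverage effect worth ruling out first: at ~4X average depth, allele dropout alone produces apparent parent-offspring mismatches. A back-of-the-envelope model (assuming unbiased 50/50 allele sampling at heterozygous sites and ignoring sequencing error and per-site depth variation, so only a rough lower bound on the genotyping error):

```python
# Probability that a truly heterozygous site is miscalled homozygous
# because all reads at that site happen to sample the same allele.
def dropout_prob(depth):
    """P(all `depth` reads come from one allele of a het site)."""
    return 2 * 0.5 ** depth if depth >= 1 else 1.0

for d in (2, 4, 6, 8):
    print(d, dropout_prob(d))
# At depth 4 the probability is 0.125: roughly 1 in 8 het sites can be
# miscalled, which is more than enough to make a few percent of offspring
# loci appear to disagree with both parents (parental calls at ~4X are
# subject to the same dropout).
```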
My initial compound library had 348,276 compounds. After applying various filters (functional-group filters, physicochemical filters, removal of duplicate compounds, removal of pan-assay interference compounds (PAINS), etc.), my final library size has been reduced by about 90 percent. I applied these filters on the FAF-Drugs server (http://mobyle.rpbs.univ-paris-diderot.fr), using the lead-like soft physicochemical filters, the functional-group filters available there, and all three PAINS filters (A, B, and C).
But 34,000 compounds is still a huge number for virtual screening (VS). My question is: are there any other online filters (freely available would be good!) that I can apply to reduce the number?
If I proceed with these numbers for VS, it will generate a huge amount of data. Therefore, I need your expert opinion on efficient post-virtual-screening analysis. What tools are available that can be used to analyse and interpret huge virtual screening outputs? Names of software or links to papers describing such work would also be a great start for me.
As a follow-up to a study where we analyzed several tag-SNPs in candidate genes, we sequenced the genomic areas around the tag-SNPs associated with the disease. The sequencing was done in pooled samples. But now we have trouble finding a good method to determine the allele frequencies in these pooled samples. Anyone with experience with this type of problem?
At which passage does the transcriptome of cancer cells cultured in vitro remain representative? If we want to assess the similarity of cancer cells to the original tumor tissue from which they were derived, early-passage cells are used for NGS, because for some cancer cell lines it has been shown that genomic and transcriptomic alterations occur after a number of passages (5-15). I am not sure which of the early passages is most suitable, because an NGS run is always associated with high costs.
Good-bye 454 in 2016! What is the next No. 1 platform: MiSeq, Ion Torrent? And once you get sequences, what next? Which pipeline do you use for sequence clustering and identification?
What do you do with your sequences?
In our company, we are working with HTS and HCS studies of natural ingredients and we are evaluating some commercial solutions to manage these data as well as to allow correlations considering phytochemical profile as well as botanical characteristics. I already have 5 companies that we are evaluating including: IDBS, Accelrys, Genedata, PerkinElmer, Dotmatics. If you have any experience with those platforms, could you share with us your impressions, advantages and disadvantages?
I want to plate mouse tissue homogenates containing bacteria on agar plates. For my experimental purposes I need to pool the bacteria from the plates. If I scrape off the bacteria, will there be tissue DNA along with them? Later I want to isolate DNA from that bacterial pool for Illumina HiSeq sequencing.
Is there a way to get rid of the animal tissue DNA on the plates or downstream, or should I not worry about it?