High Throughput Sequencing - Science method

Questions related to High Throughput Sequencing
  • asked a question related to High Throughput Sequencing
Question
2 answers
I am currently studying fungal endophytes, and I am utilising both culture-dependent and high-throughput methods to do so.
When culturing, I understand it is common practice to use leaf-press controls and/or to plate up wash water to make sure that the surface sterilisation was successful and that anything that grows is in fact a fungal endophyte.
I have been wondering what controls people are using to check the same thing when using high-throughput approaches? There doesn't seem to be much in the literature about this. I have started to collect "dirty" samples which I will not sterilise but will sequence for fungi. I expect that these dirty controls will contain more species than sterilised leaves, but this still doesn't feel like a true test of the surface sterilisation process.
Interested to hear what controls you may be using!
Relevant answer
Answer
Hi, thank you for your reply. I like the idea of using the saline solution to get an idea of what DNA was on the surface of the sample, especially since I assume that the surface sterilization method may kill any surface fungi but not necessarily remove their DNA. Out of interest, how long can you store the saline solution? And do you keep it in a fridge or freezer?
  • asked a question related to High Throughput Sequencing
Question
3 answers
I have sent dozens of samples for full-length 16S PacBio sequencing. However, the sequencing center informed me that a few of the samples contain a smear that will affect the sequencing results.
Attached are the gel electrophoresis images used for QC (1% agarose gel, 1X TAE, 100 V, run for 30 min).
The sequencing center said samples 1-3 (Fig. 1) contain a smear; among these, DNA sample 3 is at very low concentration (< 5 ng/μl). By contrast, samples 16-20 (Fig. 2) passed QC without a smear.
I wonder, what are the possible consequences if the DNA samples contain smear for PacBio seq?
Many thanks in advance!
Relevant answer
Answer
Commenting as per my observations of the gel images provided:
1. The DNA does not look degraded, but the yield in lanes 1 to 3 is pretty low. Since you are targeting the full-length gene, you cannot afford degradation.
2. You can ask to repeat the DNA extraction from those samples, or you can do it yourself and send the new samples.
  • asked a question related to High Throughput Sequencing
Question
4 answers
If any reference available kindly add here for future work. Thanks in advance
Relevant answer
Answer
As Jahad Soorni mentioned, if you have RNA-seq data then yes, you can obtain gene expression levels; that is what it is designed for. On the other hand, there is no way to get that kind of data from genomic sequencing data.
Stay safe,
fred
  • asked a question related to High Throughput Sequencing
Question
11 answers
Can any software make the Volcano plot for RNA sequencing data?
Relevant answer
Answer
You can use GraphPad Prism to create a volcano plot. Click on the link below.
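If you prefer a scripted route, the logic behind a volcano plot (log2 fold-change on x, -log10 p-value on y, points classified by both thresholds) can be sketched in pure Python. The gene names, fold-changes, and cut-offs below are invented for illustration:

```python
import math

# Hypothetical (gene, log2 fold-change, p-value) rows from a DE analysis.
results = [
    ("geneA",  2.5, 0.0001),   # strongly up, significant
    ("geneB", -1.8, 0.003),    # down, significant
    ("geneC",  0.3, 0.2),      # no real change
]

LFC_CUTOFF = 1.0   # |log2FC| threshold (assumed)
P_CUTOFF = 0.05    # p-value threshold (assumed)

def volcano_points(rows):
    """Return (gene, x, y, category) per gene: x = log2FC, y = -log10(p)."""
    points = []
    for gene, lfc, p in rows:
        y = -math.log10(p)
        if p < P_CUTOFF and lfc >= LFC_CUTOFF:
            cat = "up"
        elif p < P_CUTOFF and lfc <= -LFC_CUTOFF:
            cat = "down"
        else:
            cat = "ns"          # not significant / small change
        points.append((gene, lfc, y, cat))
    return points

for gene, x, y, cat in volcano_points(results):
    print(gene, round(x, 2), round(y, 2), cat)
```

In practice you would feed these (x, y) pairs to any plotting tool (Prism, matplotlib, ggplot2) and colour the points by category.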
  • asked a question related to High Throughput Sequencing
Question
4 answers
Hi!
I am doing a screening of cell phenotype. Because of the high throughput and low cell number (500-1000 cells in 6-10 µl) in each well, it may be better to take the cell lysate straight into reverse transcription without RNA extraction (directly adding the reaction mix to the lysate). Are there any optimised and commonly accepted recipes for this purpose?
Thanks!
-----------------------------------------------------------------------------------------------------------------------------------
Also, how should I design the volume ratio of cell suspension : lysis buffer : RT reaction mix to minimise the cost of the RT reaction mix (it accumulates fast considering the number of wells)? Heat-assisted lysis is possible in this system as well (heating up to 65 or even 80 ℃ to break the cells).
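When budgeting reagent per plate, it helps to write the per-well ratio down and multiply it out with a pipetting overage. A minimal sketch, with all volumes invented placeholders to be replaced by your own recipe:

```python
# Hypothetical per-well volumes (µl); adjust to your own recipe.
CELL_SUSPENSION = 8.0   # 500-1000 cells in 6-10 µl (question's range)
LYSIS_BUFFER = 4.0      # an assumed 2:1 cells:lysis ratio
RT_MIX = 6.0            # RT master mix added directly to lysate

def plate_usage(n_wells, overage=1.1):
    """Total reagent volumes (µl) for n_wells, with 10% pipetting overage."""
    return {
        "lysis_buffer": n_wells * LYSIS_BUFFER * overage,
        "rt_mix": n_wells * RT_MIX * overage,
        "per_well_total": CELL_SUSPENSION + LYSIS_BUFFER + RT_MIX,
    }

print(plate_usage(96))
```

This makes it easy to see how quickly the RT mix dominates the cost as the well count grows, and to test cheaper ratios before committing a plate.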
  • asked a question related to High Throughput Sequencing
Question
1 answer
I need information about companies in India offering environmental DNA services for any organism. Please add website links.
Relevant answer
Answer
Hi Dinakarsami, I'd be happy to help you with metagenomics and metabarcoding analysis. Please drop me a message.
Thiru
  • asked a question related to High Throughput Sequencing
Question
4 answers
Does anyone know how to assess the fidelity of drug-resistant strains by high-throughput sequencing? What are the references? Thank you very much!
Relevant answer
Answer
The question does not make sense the way it is formulated. Bacteria are not "faithful" to anything, anyway.
  • asked a question related to High Throughput Sequencing
Question
6 answers
I have always come across ''Changes in bacterial composition were determined using high-throughput sequencing of the V3/V4-region of the 16S rRNA encoding gene''. Could anyone explain to me why V3/V4 region is normally used in this case? Why not other V regions?
Thank you.
Relevant answer
Answer
The 16S rRNA gene is approximately 1600 base pairs long and includes nine hypervariable regions of varying conservation (V1-V9). More conserved regions are useful for determining the higher-ranking taxa, whereas more quickly evolving ones can help identify genus or species. The main reason for using the V3/V4 region in the majority of cases is that these regions contain the maximum nucleotide heterogeneity and display the maximum discriminatory power. However, it should be noted that no single region can differentiate among all bacteria. I agree with the comments given by Balig Panossian and also recommend the paper entitled “The effect of 16S rRNA region choice on bacterial community metabarcoding results” (https://www.nature.com/articles/sdata20197) for more details.
  • asked a question related to High Throughput Sequencing
Question
10 answers
Which tool is suitable for checking the quality of high-throughput NGS sequence data: FastQC, MultiQC, or some other tool?
Relevant answer
Answer
FastQC will process one sample at a time and give you an output report for each sample separately. MultiQC will combine all the outputs from the FastQC analysis and give you one QC report for all processed samples, making them more easily comparable.
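To see what the per-base quality numbers in those reports actually are, here is a pure-Python sketch that decodes Phred+33 quality strings from a FASTQ record and computes the mean quality per read. The two records are invented; real QC tools do this (and much more) across millions of reads:

```python
# A made-up two-record FASTQ file, Phred+33 encoded.
fastq_text = """@read1
ACGTACGT
+
IIIIHHHH
@read2
ACGTACGT
+
!!!!####
"""

def mean_qualities(text):
    """Return {read_id: mean Phred quality} for a Phred+33 encoded FASTQ."""
    lines = text.strip().split("\n")
    out = {}
    for i in range(0, len(lines), 4):        # FASTQ records are 4 lines each
        read_id = lines[i][1:]               # strip the leading '@'
        quals = [ord(c) - 33 for c in lines[i + 3]]
        out[read_id] = sum(quals) / len(quals)
    return out

print(mean_qualities(fastq_text))
```

Read 1 here is high quality (Q39-40) while read 2 is junk (mean Q1), which is the kind of difference FastQC's per-base quality plot makes visible at a glance.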
  • asked a question related to High Throughput Sequencing
Question
1 answer
I’ve used the same normalization and chip to search for a gene's expression in the R2 Genomics Analysis and Visualization Platform, but I have compared very different cells (like lung cancer and melanoma). So my doubt is: is it correct/safe to say that a gene is more expressed in lung cancer, for example, than in melanoma (p < 0.05) based on comparisons made in R2 Genomics?
PS: my intention is to prove this experimentally, and the results obtained in R2 would serve as a guide for me.
thank you!
Relevant answer
Answer
What is the normalization based on?
I think differences in gene expression between tissues are tricky on two levels.
In the first place, there is the normalization issue. In my experience, if you test for differential gene expression between tissues, a large proportion of the transcriptome comes out DEG, and it all depends on the normalizations used. So for a given gene: do you consider its expression relative to the rest of the transcriptome? Dangerous, because highly expressed genes in the background may "dilute" your transcript of interest, even if stoichiometrically, it is "upregulated" relative to the genes it interacts with. Or do you normalize relative to housekeeping genes? It is known that housekeeping genes exhibit tissue-specific expression. Do you consider it relative to the genes it interacts with? Interesting, but requires a lot of prior knowledge...
Secondly, there is also the question of what it actually means to be differentially expressed between tissues. Transcripts (and/or their corresponding products) hardly ever function on their own. They interact with the rest of the transcriptome/proteome, causing feedback to gene expression etc. These "backgrounds" are drastically different between cell types. Also, the same amount of mRNA may lead to different amounts of protein depending on the status of the cell (presence/activation of ribosomes, tRNA,...). As a result, the impact of a change in the number of copies of a transcript may be enormous in one tissue, versus negligible in the other.
These are actually considerations that to a large extent hold true even when comparing gene expression between conditions in the same tissue, but obviously the more different your samples are to start with, the more exaggerated the effects will be. So, my two cents is: be careful, it's a dangerous comparison. I would rather try healthy lung vs lung cancer and healthy skin vs melanoma.
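The "dilution" caveat above can be shown numerically with a counts-per-million (CPM) toy example: a gene with identical absolute counts appears down-regulated when a highly expressed background gene inflates the library total. All counts are invented:

```python
def cpm(counts):
    """Counts-per-million normalisation of a {gene: count} dict."""
    total = sum(counts.values())
    return {g: c / total * 1_000_000 for g, c in counts.items()}

# Same absolute geneX signal in both tissues, but a much bigger
# background gene in tissue B inflates the library size.
tissue_a = {"geneX": 100, "background": 9_900}
tissue_b = {"geneX": 100, "background": 99_900}

print(cpm(tissue_a)["geneX"])   # high CPM
print(cpm(tissue_b)["geneX"])   # tenfold lower CPM: "diluted", not repressed
```

The relative measure says geneX dropped tenfold, even though nothing about geneX itself changed, which is exactly the danger described above.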
  • asked a question related to High Throughput Sequencing
Question
2 answers
Hi, I am starting my BS senior year in a few months. My major is molecular and cell biology, and I also have a decent background in computational biology tools used for analyzing high-throughput sequencing data.
I am interested in pursuing my graduate studies in coral reef genomics, biotechnology of coral reef restoration, etc. The problem is that I am somewhat confused and do not know where to start. Can anyone give me any advice that can help (recommending a quality lab that works in the field, the contacts of a professor who works in the field and may need to recruit a master's or Ph.D. student, or an online course or textbook that would help me get the required knowledge)? Please provide me with anything that you think may help. Thanks in advance.
Relevant answer
Answer
Check out some of the marine science programs, such as the ones at the University of North Carolina in Wilmington. There are all sorts of programs along the coast. Find a university that has marine field stations, such as UNC, U, and College of Charleston, just to name a few. I am sure there are many more along the East Coast, Gulf of Mexico, and the West Coast.
  • asked a question related to High Throughput Sequencing
Question
10 answers
I'm quite confused about using DESeq2 to find the differential abundant taxa in microbiome studies, especially when there are more than two groups of the factor. I know DESeq2 was initially used for RNA-seq to detect the regulation of gene expressions. It's easy to understand when there are only two groups, e.g. treated vs. untreated. We can easily say which taxa was up-regulated by looking at the log2fold change (positive or negative).
BUT what about when there are three groups, control vs. treat1 vs. treat2? The DESeq2 can still handle this situation, but then I have no idea how to interpret the log2fold change. If we detected some taxa that were significantly different, how can we know in which group these taxa were up-regulated?
Although some people suggest doing pairwise comparisons, it's still unclear to me how to do that and how to interpret it. Does anyone have a good recommendation or idea about this?
Thanks in advance.
Relevant answer
Answer
This is definitely confusing, and the meaning of the fold-change returned automatically is a bit convoluted. Ultimately, the reported fold-change depends on the models (full vs. reduced) you provide DESeq2 when performing the likelihood ratio test (LRT), since the LRT goodness-of-fit tests how much more likely the data are to fit the full model (~treat1 + treat2) than the reduced (~1) model. This result has little biological meaning when examining multiple variables (though it is essentially the same as the fold-change from Wald when comparing between two).
It is therefore more useful, once you have computed the p-value statistic, to report the fold-changes of treat1-vs-control and treat2-vs-control using the following to examine the possible iterations: resultsNames(dds). It will give you something like: "Intercept" "Treatment_treat1_vs_control" etc... You can then select the appropriate fold-change when you generate the data table:
resdf <- as.data.frame(DESeq2::results(dds, format = "DataFrame", name = "Treatment_treat1_vs_control"))
I hope this helps!
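To make the interpretation of those pairwise contrasts concrete, here is a small pure-Python sketch, with invented group-mean abundances, of how a log2 fold-change of each treatment against the control is read. (DESeq2 itself fits a full model with dispersion estimation and can shrink these estimates, so this is only the arithmetic, not the statistics.)

```python
import math

# Invented normalised mean abundances of one taxon per group.
means = {"control": 50.0, "treat1": 200.0, "treat2": 25.0}

def log2fc(group, reference="control"):
    """log2 fold-change of `group` relative to `reference`."""
    return math.log2(means[group] / means[reference])

print(log2fc("treat1"))  # positive -> taxon is up in treat1 vs control
print(log2fc("treat2"))  # negative -> taxon is down in treat2 vs control
```

So a positive value in the "Treatment_treat1_vs_control" column means the taxon is more abundant in treat1 than in the control, and each treatment gets its own such column.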
  • asked a question related to High Throughput Sequencing
Question
1 answer
We wish to set up a Hi-C protocol, but we are not sure how we can tell if we got it right, so we are looking for a characterized cell line or other type of model that we can run the method on to verify it. Any suggestions?
Thanks
Relevant answer
Answer
I suppose you're working with human or mouse Hi-C data, right?
Of course, it depends on the experimental design you're running, but generally speaking, I think a good control would be embryonic stem cells (ESC). For example, Dixon et al. (Nature, 2015) compared ESC and four human ES-cell-derived lineages. Other work has compared mouse neural development stages with ESC as well (Bonev et al., Cell, 2017).
In our approach, under revision, we compared ESC with pro-B cells and B-cell lymphoma. We found very conserved structures, although the data were generated from different sources.
Have you tried comparing your data to public data?
Best regards,
Eijy
  • asked a question related to High Throughput Sequencing
Question
3 answers
I plan to buy primers in a 96-well plate format. After I resuspend the primers, I'd like to be able to periodically freeze-thaw the plate in order to take out aliquots. Are 96-well storage mats (i.e. FisherSci AB-0674) suitable for this application?
Any other recommendations?
Relevant answer
Answer
Hi,
We usually seal the plate with foil and cover it with a plastic seal. For usage, make a small hole in the foil on top of the well, sufficient for your 20 µl tips. After usage, seal it with a fresh plastic seal. When you thaw the plate again, do not forget to spin it before opening.
Best,
Michail
  • asked a question related to High Throughput Sequencing
Question
5 answers
Hello,
I've done a field study investigating the community composition (18S and 16S) along a contamination gradient. I'm creating graphs of my environmental variables and community composition using PCA and DistLM in PERMANOVA in PRIMER 7. The labels for the variables are overlapping making it hard to read them. Is there a way to make them spread out more? If you have a way to solve this it would be greatly appreciated.
Kind regards
Megan
Relevant answer
Answer
I've just been doing this for a paper. Primer and Permanova+ are not designed to produce final publication quality graphics and the figures often need a little work, although generally they are pretty good. The easiest way I find is to copy the figure (ctrl+C) and paste it into Powerpoint. You can then use drawing tools>Group>ungroup (you need to do this twice, the first time it tells you that it's not a drawing, asking if you want to convert it, so say yes, then do it again to ungroup). You can then tidy the figure, move labels around, change fonts, add information, and so on. The final step (for me) is to use Adobe Illustrator to produce a final tiff, but first get the figures the way you want them.
  • asked a question related to High Throughput Sequencing
Question
5 answers
I have a vcf file containing SNPs from parents and F2 offspring. I need to convert this file to either a .loc or a MAPMAKER .raw format to input into JoinMap. There are a large number of both markers and individuals, so it is not practical to convert the genotype codes by hand. Is there a program that can do the vcf-->joinmap conversion automatically? If this is not possible, does anyone know how to do the conversion in Excel or on the command line? Many thanks!
Relevant answer
Answer
Hi,
I have received a lot of questions lately about how to make this conversion work so I'm updating a solution here. The solution I'm using right now is to go through TASSEL (http://www.maizegenetics.net/tassel). It's a free software and I trust it more than scripts I find on github (not that they don't work). You can import vcf into Tassel, then export it in a number of formats. It offers nice visualization of your vcf file which always help.
If your downstream analysis is linkage mapping using Joinmap as many of you have requested, you can export as .hapmap format. It's basically a plain text format you can edit and filter in any text editor or Excel, and it can be read directly into Joinmap by either importing the file or simple copy and paste.
If your vcf file is huge, TASSEL may take some time to finish the import. You can either do some filtering of the vcf file before the import, or use the command line option to speed it up.
I hope this helps.
Mingzi
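If a scripted route is acceptable, the core re-coding step can also be sketched in a few lines of Python. For an F2-type population, VCF GT fields 0/0, 0/1, and 1/1 are commonly re-coded as a, h, and b (with "-" for missing) in JoinMap/MAPMAKER-style files; check the coding for your own population type, and note that real files need filtering and parent-phase checks first. The VCF snippet below is invented:

```python
# A minimal GT-only VCF fragment (invented) with three F2 individuals.
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3
chr1\t100\tm1\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1
chr1\t200\tm2\tC\tT\t50\tPASS\t.\tGT\t1/1\t./.\t0/1
"""

# Assumed F2 coding: hom-ref -> a, het -> h, hom-alt -> b, missing -> -
CODES = {"0/0": "a", "0/1": "h", "1/0": "h", "1/1": "b"}

def vcf_to_codes(text):
    """Return {marker_id: [code per individual]} from a simple GT-only VCF."""
    out = {}
    for line in text.strip().split("\n"):
        if line.startswith("#"):             # skip meta and header lines
            continue
        fields = line.split("\t")
        marker = fields[2]
        gts = [f.split(":")[0] for f in fields[9:]]   # GT is the first subfield
        out[marker] = [CODES.get(gt, "-") for gt in gts]
    return out

print(vcf_to_codes(vcf_text))
```

The resulting marker-by-individual table can then be pasted into a .loc template, though for anything beyond a toy file the TASSEL route above is safer.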
  • asked a question related to High Throughput Sequencing
Question
7 answers
Does anyone know of a DNA extraction method that allows sequencing with PacBio RS II technology? I am aiming for reads > 2 kb.
Relevant answer
Answer
The MasterPure™ Gram Positive DNA Purification Kit from Epicentre allows purification of 20-40 kbp bacterial DNA for PacBio sequencing. QIAGEN kits are also good. But as a general rule for long DNA purification: you cannot use any buffers with EDTA, so use water, not TE, to dissolve the DNA.
  • asked a question related to High Throughput Sequencing
Question
4 answers
I used adonis to test the difference between groups/categories based on the 16s high-throughput sequencing data. I got a very significant result (p = 0.001), but the R2 is very low (< 0.1). I know this means this model only explains less than 10 percent of my data. From the PCoA plot, there were some overlaps between the groups. Do these overlaps contribute to the low R2? I'm wondering if I can still say that the difference between groups is significant? 
I also did ANOSIM and the results are similar. The statistic value is very low (~0.1) but p-value is 0.001. Is this the same problem I got when doing adonis?
Some discussion groups say that there is no relationship between R2 and the p-value. Does it really not matter how small the R2 is?
Relevant answer
Answer
Haitao,
A low R-squared means that you have a lot of variation that is unexplained by your model. The p-value means that, despite the relative mediocrity of the model, the differences you observe still reject the null hypothesis. You should try to account for this unexplained variation and see whether your sampling design is, e.g., nested, or whether other (environmental) variables could be added. A first step is to plot residuals vs. variables not in the model (plot ID, altitude, block, etc.). Good luck.
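The point that effect size (R2) and significance (p) answer different questions can be demonstrated with a toy permutation test. adonis does this partitioning on a distance matrix; the sketch below does the same logic on an invented univariate response, where a small group shift with many samples gives a low R2 yet a tiny permutation p-value:

```python
import random

random.seed(1)
# Two groups with a small mean shift (0.4 sd) but large n.
group_a = [random.gauss(0.0, 1.0) for _ in range(500)]
group_b = [random.gauss(0.4, 1.0) for _ in range(500)]

def r_squared(a, b):
    """Between-group sum of squares over total sum of squares."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    allv = a + b
    gm = sum(allv) / len(allv)
    ss_total = sum((x - gm) ** 2 for x in allv)
    ss_within = (sum((x - mean_a) ** 2 for x in a)
                 + sum((x - mean_b) ** 2 for x in b))
    return (ss_total - ss_within) / ss_total

def perm_p(a, b, n_perm=199):
    """Permutation p-value: how often shuffled labels beat the observed R2."""
    obs = r_squared(a, b)
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if r_squared(pooled[:len(a)], pooled[len(a):]) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

print(round(r_squared(group_a, group_b), 3), perm_p(group_a, group_b))
```

The group difference is real and highly significant, but it explains only a small fraction of the total variance, which is exactly the situation described in the question.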
  • asked a question related to High Throughput Sequencing
Question
6 answers
We are doing 16S high throughput sequencing of endophytic bacteria. At the preliminary run we got few bacterial sequences and mostly plastid DNA.
Did anyone try to separate chloroplasts from bacterial genomes? Are there any suggestions for us as to how to increase the bacterial portion of the DNA?
Relevant answer
Answer
Please also check this discussion on RG to see whether it is useful for your project:
"How can I extract only bacterial (endophyte) DNA from plant tissue using kit?"
  • asked a question related to High Throughput Sequencing
Question
1 answer
I have written a book chapter of 3,500 words about how to carry out a successful High Throughput Sequencing experiment. Does anyone know anybody editing a book that may be interested?
Relevant answer
Answer
Hi Jose,
I received an invitation from the "International Journal of Hematology and Blood Disorders". Is your chapter relevant to hematology, could it be adapted, and are you interested?
If so, please send me an email (frederic.lepretre@univ-lille2.fr); I have a problem with the RG alerts.
fred
  • asked a question related to High Throughput Sequencing
Question
10 answers
Colleagues, I need help with Venn diagrams and transcriptomics. I have three lists of IDs (example: c58516_g4_i4), only IDs, not the sequences. I need to make a Venn diagram to know which IDs are shared among the three lists, which are shared between only two of them, and which are present only in their original list. I could do it manually, but it's a huge number of IDs. Can you suggest some software for Windows or a script for Linux? Thanks!
Relevant answer
Answer
You can try the following tool, called "Venny". Cheers!
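The overlaps themselves are just set operations, so a few lines of Python will give you the seven regions of a three-way Venn diagram directly. The IDs below are invented stand-ins for transcript IDs like c58516_g4_i4:

```python
# Three invented ID lists; in practice, read each from a file into a set.
list1 = {"c1_g1_i1", "c2_g1_i1", "c3_g1_i1", "c4_g1_i1"}
list2 = {"c2_g1_i1", "c3_g1_i1", "c5_g1_i1"}
list3 = {"c3_g1_i1", "c4_g1_i1", "c5_g1_i1"}

def venn3(a, b, c):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    return {
        "abc": a & b & c,            # shared by all three lists
        "ab_only": (a & b) - c,      # shared by exactly lists 1 and 2
        "ac_only": (a & c) - b,
        "bc_only": (b & c) - a,
        "a_only": a - b - c,         # unique to list 1
        "b_only": b - a - c,
        "c_only": c - a - b,
    }

for region, ids in venn3(list1, list2, list3).items():
    print(region, sorted(ids))
```

You can then paste the lists into Venny to draw the figure, or just report the region sizes computed here.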
  • asked a question related to High Throughput Sequencing
Question
8 answers
I've been trying to know more about pipelines for shotgun metagenomics data to use in my samples of water-borne bacteria, with a focus on rare cyanobacteria in particular.
Suggestions on what your favorite assemblers and functional annotation programs are most welcome!
Relevant answer
Answer
Hello Angelo,
I have been working with megahit and metaSpades recently, they seem to do the job. For further binning of contigs I highly recommend anvi'o (http://merenlab.org/software/anvio/).
Hope that helps a bit!
Cheers
PS: for quick and dirty functional and taxonomic profiling of metagenomic data (unassembled), there's always MG-RAST
  • asked a question related to High Throughput Sequencing
Question
17 answers
Does anyone know of a source for a carefully constructed fungal mock community to use as a control in high-throughput sequencing?
I know there is a bacterial mock community available from BEI Resources, but haven't found a corresponding product for fungi. Thanks!
Relevant answer
Answer
Here is a recent preprint that might be helpful.
  • asked a question related to High Throughput Sequencing
Question
1 answer
A recent run of a very short insert library (average insert size 32bp) on a lane of Hiseq 3000 yielded only 50 million reads. Many clusters were excluded by the chastity filter due to mixed populations. We suspect the kinetic exclusion amplification step in the patterned flow cells used in Hiseq 3000 and 4000 to work less well for such short inserts, hence more clusters are mixed and excluded. We wondered if lowering the input molarity could improve the yield for such libraries and if anybody had experience with this and could give advice as to what molarity works best. Any help is much appreciated.
Many thanks,
moe
From Illumina's patent: "Accordingly, kinetic exclusion can be achieved in such embodiments by using a relatively slow rate of transport. For example, a sufficiently low concentration of target nucleic acids can be selected to achieve a desired average transport rate, lower concentrations resulting in slower average rates of transport."
(aDNA, ancient DNA, HTS, high-throughput sequencing, NGS, next-generation sequencing)
Relevant answer
Answer
Dear Moritz Muschick,
Usually, submitting libraries at concentrations of 5 nM or higher is optimal. But in such cases the library preparation has a direct effect on the result. I think you should focus on the library preparation.
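Since loading decisions are made in nM, it is worth writing out the standard conversion from a Qubit-style mass concentration to molarity, assuming double-stranded DNA at roughly 660 g/mol per base pair. The example numbers are invented (a short-insert library is perhaps ~150 bp total including adapters):

```python
def ng_ul_to_nM(conc_ng_ul, mean_size_bp):
    """Convert dsDNA concentration to nM: (ng/µl × 10^6) / (660 × size in bp)."""
    return conc_ng_ul * 1e6 / (660.0 * mean_size_bp)

# e.g. 1 ng/µl of a ~150 bp library (insert + adapters)
print(round(ng_ul_to_nM(1.0, 150), 2))
```

Note that short libraries reach a given molarity at much lower mass concentration than long ones, which matters when titrating the input down to probe cluster mixing on patterned flow cells.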
  • asked a question related to High Throughput Sequencing
Question
4 answers
I sent a test plate to the seq facility we are working with for ddRAD sequencing of a non-model organism without a reference genome. These 50 samples came from 3 field sites from 1 region. They ran these 50 in one lane of an Illumina HISeq. When the seq data returned, it appeared fine in preliminary popgen analyses after deNovo assembly and I recovered around 8,000 SNPs after filtering.
I then sent a large batch of samples (300) to be sequenced following the same protocol. These were from ~15 sites across 2 regions (5 in one region, 10 in another). The facility ran my samples as 300 in a single Illumina HiSeq lane, but ran the single lane twice, as opposed to 100 in each of 3 lanes, once, as I thought they would do. They said 300 in one lane ran twice would provide better quality than 100 in each of 3 lanes, ran one time.
When I received the 300 back, I combined those samples with my 50 from the test plate and proceeded to filter and call SNPs together, getting significantly fewer SNPs, but that was expected with the reduced depth of coverage from running so many samples per lane.
The main issue, however, is that the combined 350-sample dataset produces two genetic clusters when using the program Structure and when using a Principal Coordinates Analysis (PCoA) in GenAlEx. One cluster is the 300 and the other cluster is the 50. It is biologically impossible that either of these clusters is real, as there are field sites in the 50 group that are very close to field sites in the 300 group. What could be causing the genetic "clusters" to align by sequencing run?
Relevant answer
Answer
If the two sample sets (50 and 300) are really biologically similar, then you are looking at artifacts from the sequencing itself.
So this would certainly include any of the variables from Tyler's suggestions of "lane/library specific biases ('lane effects' or 'batch effects'): variation in PCR amplification, specificity of size selection, depth of coverage, coverage of loci across individuals (which can be a consequence of the above), variation in recovered loci across preps, etc."
Again, if you measured any of these variables in the two analyses you should be able to determine which ones are responsible for the different clusters by looking at the first and second principal component coefficients.
Suppose the largest absolute values of the coefficients of your first principal component are those for the PCR amplification and lane effects variables and the depth of coverage  and batch effects are relative small. Then the interpretation of the 1st principal component should be that differences in PCR amplification and lane effects are responsible for the differences in the two clusters (if they are separated along the 1st principal component axis). 
If the largest coefficients have opposite signs (+ or -) then you may interpret them as "the PCR amplification is negatively correlated with lane effects".
Similarly for the second PC, which should have a completely different set of most influential variables: notice which variables have the largest absolute values among all variables.
Now, to specifically decide why they cluster, you need to determine which of the PCs is responsible for the gap between the clusters. It may be just the 1st component, in which case you might have a plot that does not vary much on the PC2 axis. If the second PC is responsible for the difference, then you may see little variation on the PC1 axis but lots on the PC2 axis. If the clusters are separated on both axes, then the largest variables associated with the PCs are most responsible for the differences.
If you intend to publish the data, then you will need to discuss the reasons for the clustering.
  • asked a question related to High Throughput Sequencing
Question
4 answers
I'm trying to figure out the best sequencing service to use in terms of price, service, quality of results, ease of sample collection, etc. What should I be considering when I select a sequencing service? 
Relevant answer
Answer
Hi Jordan,
Try submitting a request on Genohub - https://genohub.com/ngs/. They can help you find the best service provider according to your 16S rRNA sequencing project in terms of price, service, quality of results, ease of sample collection, etc.
Good luck!
  • asked a question related to High Throughput Sequencing
Question
3 answers
I am working with clinical strains isolated from the same hospital and from patients with similar backgrounds (all CF patients in a CF center).
We have the complete sequences of the strains and they are quite similar, but I could not find a statistic or value that can be used to demonstrate whether the strains are related or isogenic. How many SNPs must there be in a sequence for it to be considered a different strain and not an isogenic strain with some mutations?
Also, if anybody knows of an article or anything about this specific problem (elucidating whether strains are isogenic or closely related), please let me know!
Thanks!!
Relevant answer
Answer
Here is the article I was trying to add. I don't think there is a number specified anywhere; what I was saying is that you can have a number of SNPs and it will still be the same strain, unless you characterize it as a new one based on new properties such as the fact that it is resistant.
You can try determining what confers the resistance by looking into what is the common thing between the resistant mutants and identifying what is the molecular entity the drug normally interacts with (it probably targets a protein, but possibly the resistant strain doesn't let the drug inside, across its outer membrane).  
  • asked a question related to High Throughput Sequencing
Question
4 answers
Hi! I would like to apply a random sampling technique for the distribution of genes in a collection of bacterial samples. I would like to test 250 samples out of 500. Please suggest a suitable web-server-based tool for generating the random numbers. Also, is there a way to test whether the generated numbers are random? Thank you.
Relevant answer
Answer
It is not very difficult to generate reasonably good random numbers, especially if their cycle is supposed to be less than 5 million or so and you have a good seed (time of day?).
Try the Mersenne Twister method, for which I believe there are implementations in most/all languages. To test for randomness, there are also standard tests.
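For example, Python's standard `random` module uses the Mersenne Twister, and `random.sample` draws without replacement, so picking 250 of 500 sample IDs is one line. The ID format below is invented; a fixed seed makes the draw reproducible for your records:

```python
import random

random.seed(42)  # fix the seed so the draw can be reproduced and documented

# Hypothetical sample IDs: sample_001 .. sample_500
all_ids = [f"sample_{i:03d}" for i in range(1, 501)]

# 250 distinct IDs, drawn without replacement
chosen = random.sample(all_ids, 250)

print(len(chosen), len(set(chosen)))  # 250 picks, all unique
```

This sidesteps the need for a web-based generator entirely, and the seed value serves as the audit trail for how the subset was chosen.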
  • asked a question related to High Throughput Sequencing
Question
1 answer
I am a masters student working on extending a DNA repair model. I want to be able to pool multiple cells, use targeted MiSeq sequencing on a particular gene, and calculate the proportions of different repair events. How would I go about doing the data analysis for this? 
Relevant answer
Answer
I assume you need tens of thousands of mutations to test various repair events. Are you referring to single base substitutions or rearrangements? You may concentrate on few events, such as mutations at CpG due to cytosine methylation. If you need to treat cells with some mutagen, then you may have specific mutations spectra. It is unclear what you wish to do. 
  • asked a question related to High Throughput Sequencing
Question
5 answers
Hi all,
Analysing RNA-Seq data sets requires knowing the most accurate and reliable pipeline to go with. Could you suggest such a pipeline?
Note: I have good experience with the Tuxedo package (Bowtie, TopHat, and CummeRbund), in addition to edgeR.
Thanks
Relevant answer
Answer
This paper should answer most of your questions:
Anders, S., McCarthy, D., Chen, Y., Okoniewski, M., Smyth, G., Huber, W., & Robinson, M. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786. http://doi.org/10.1038/nprot.2013.099
If you want to look at some example scripts, you can see them in my github repository for a recent data set we analysed. https://github.com/uhkniazi/BRC_Organoid_Joana
Once you get the count matrix, you have various options for performing any sort of differential expression of genes - EdgeR, DESeq2 etc. 
  • asked a question related to High Throughput Sequencing
Question
1 answer
what genes are under study?
Relevant answer
Answer
Dear Martin,
if you have the information about the real size of the total target population (in your case Venezuelan mestizo population .....) you can calculate the  research sample size here:
But if you don't have info about the size of this specific population, then your research sample would be purposive, and then the number of respondents is not strictly defined (as many respondents as you can reach is OK).
In both cases, if the difference between the genes is important for the research, then you should have (nearly) equal numbers of respondents from each category (min. 30 respondents from each sub-sample).
  • asked a question related to High Throughput Sequencing
Question
2 answers
Identifying bacteria present in healthy versus diseased conditions in a given biological sample.
Relevant answer
Answer
You can approach Eurofins Genomics India Pvt. Ltd. for a faster turnaround time.
Contact:
Dr. Abishek Malakar 8375898192
Dr. Rudra Prasanna Panda 8884112396
  • asked a question related to High Throughput Sequencing
Question
1 answer
Recently, I came to know about a mobile DNA sequencer launched by Oxford Nanopore (MinION).
1. What about its efficiency and the reagents required for sequencing?
2. What data analysis tools are used once we generate output from the device?
Relevant answer
Answer
Hi Snijesh,
We are participants in the MinION Access Program and use the device quite regularly. The Access Program is now open to anyone after registration with Oxford Nanopore.
Regarding your questions:
  1. I don't know what you mean by 'efficiency'. For good runs, we get around 1 Gb of data, with the majority of fragments > 5 kb length. A weakness of the system is the relatively high error rate (up to 10% / base), but in the last year this has improved a lot, and hopefully will continue to get better.
For reagents/consumables: the sequencing is done on a flow cell that (practically) can only be used for one run. Library preparation is done with kits from Oxford Nanopore, supplemented with kits from NEB (end repair kit) and SPRI magnetic beads (Ampure XP beads). For a rough calculation: a run costs in the order of $1000.
  2. Oxford Nanopore provides a standard base caller for DNA. The technique is relatively new, so few other established analysis tools are available. There is a very lively community of users, though, that develops and makes available new open-access tools, for instance for base calling (including base modifications), DNA scaffolding, error correction, etc. If your question is non-standard, you may have to develop your own tools.
  • asked a question related to High Throughput Sequencing
Question
14 answers
I need to remove human DNA data from HiSeq metagenome sequencing data of the human gut. Is there any available script/software to use?
Relevant answer
Answer
Hey Katya!
I would recommend using the "bbmap" tool included in the "BBTools" suite for that task.
For an introduction to the "bbmap" tool: 
For a description of how to use bbmap to remove human contaminant DNA (also includes an already processed reference genome available for download):
  • asked a question related to High Throughput Sequencing
Question
5 answers
Dear all,
I would like to compare next generation sequencing (NGS) protocols for Ion Torrent, Illumina and Pacific Biosciences. It would much be appreciated if anyone can provide me with their laboratory protocols.
Kind regards,
Ali
Relevant answer
Answer
Thank you very much. Indeed very helpful.
  • asked a question related to High Throughput Sequencing
Question
4 answers
Hello,
I am using QIIME to analyse my sequencing data. I have used it before with Illumina data with no problems, but now I have Illumina MiSeq data and four raw data files: two index files and two raw reads files. I understand the raw reads files (one forward and one reverse that I can merge with join_paired_ends.py), but I do not know how to handle the two index files. How do I construct the mapping file with the two indexes? How do I use these index files as input for join_paired_ends.py and split_libraries_fastq.py? Should I use the extract_barcodes.py script before joining the paired end reads?
Thank you very much!
Relevant answer
Answer
example:
extract_barcodes.py --input_type barcode_paired_end -f index1.fastq -r index2.fastq --bc1_len 6 --bc2_len 6 -o parsed_barcodes/
I assume this is the one you need.
Best wishes,
Suparna
  • asked a question related to High Throughput Sequencing
Question
4 answers
After high-throughput sequencing of 16S rDNA, the sequencing depths of different samples usually vary a lot. Sequencing depth can affect alpha and beta diversity analyses; therefore, we usually use rarefaction (randomly sub-sampling sequences from each sample) to equalize the number of sequences per sample. But when we analyse the diversity of functional genes (e.g. the amoA gene of ammonia-oxidizing microorganisms), we often use a clone library method due to the read-length limitation of NGS. As a result, we only obtain very limited numbers of sequences in each sample (e.g. 50 to 100 sequences, varying among samples). If we randomly sub-sample as with the 16S rDNA data, we may lose nearly half of the sequences in some samples, and this would have a great influence on the alpha or beta diversity. So, in this case, can we calculate alpha and beta diversity based on the relative abundance of OTUs? That is, before calculating diversity, the counts in each sample are first normalized by dividing by the total sequence number of that sample. Is this transformation reliable and scientific? Is anyone using this method to calculate alpha or beta diversity? If you have related references, I would very much appreciate them.
Thank you!
Relevant answer
Answer
Hi Luca,
Thank you very much for your answer. I got it; I think you mean that relative-abundance-based diversity also cannot solve the low-coverage problem in some samples. In that case, it would be misleading to compare diversity indexes among different samples.
Regards,
Guangshan
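For reference, the two normalization strategies debated in this thread, rarefaction and relative abundance, can be sketched in a few lines of Python (illustrative only; real analyses would use mothur, QIIME or vegan, and the OTU counts below are invented):

```python
# Sketch of the two strategies discussed in this thread. Illustrative only.
import random

def rarefy(otu_counts, depth, seed=42):
    """Randomly subsample reads without replacement down to `depth` reads."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    subsample = rng.sample(pool, depth)
    return {otu: subsample.count(otu) for otu in set(subsample)}

def relative_abundance(otu_counts):
    """Divide each OTU count by the sample's total read count."""
    total = sum(otu_counts.values())
    return {otu: n / total for otu, n in otu_counts.items()}

sample = {"OTU1": 60, "OTU2": 30, "OTU3": 10}  # a 100-read clone library
rare = rarefy(sample, 50)            # discards half the reads at random
rel = relative_abundance(sample)     # keeps every read, loses depth info
```

The comments on the last two lines are the crux of the question: relative abundances keep all reads but, as the answer points out, they do not fix unequal coverage, since a 50-read and a 5000-read sample still detect rare OTUs with very different sensitivity.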
  • asked a question related to High Throughput Sequencing
Question
8 answers
After high-throughput sequencing, we obtained thousands of 16S rDNA reads for each sample. When we analyzed these sequences, we first performed quality filtering to remove low-quality sequences, then randomly resampled to even out the samples, and then defined OTUs at 97% similarity. Some OTUs have only one single sequence, and we cannot confirm whether they are very rare species or erroneous sequences of unknown origin. Some papers retain these single-sequence OTUs, while other papers remove them. So, should these single-sequence OTUs be removed? And why? I would very much appreciate any references on this question. Thank you very much.
Relevant answer
Answer
Hi Guangshan,
To further clean the read set, you can apply an error-correction step to the original reads; this may reduce the number of OTUs you find.
As a further step, did you screen for chimeras (obtained from spurious fusion of sequences during the PCR step)?
Another possible source for them is contamination (DNA extraction kits and so on are not necessarily bacteria-free).
Regarding your protocol, do you define OTUs (at 97%) across all the samples after evening out the number of reads? I prefer to call OTUs using all the reads (after quality filtering), so that any low-count OTU that is present in more than one sample will be identified more easily. Your way, low-abundance OTUs may not even reach the OTU selection step.
In general, I think removing or keeping single-sequence OTUs very much depends on what you are looking for: in many applications you will probably keep only OTUs above 1% abundance in any of the samples;
if you want to look at fine diversity or low-abundance species, you will probably need them.
In my opinion, as long as you know what you want and you treat all the samples in the same way, you can easily defend either choice in a paper.
Luca
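A minimal sketch of the pooled-samples approach Luca describes, where an OTU counts as a singleton only if it has a single read across the whole data set (illustrative only; OTU and sample names are invented):

```python
# Singleton filtering on a pooled OTU table: an OTU is dropped only if it
# has a single read in the whole data set, so per-sample singletons that
# recur across samples survive.

def drop_global_singletons(otu_table):
    """otu_table: {OTU: {sample: count}}; removes OTUs with total count 1."""
    return {otu: per_sample for otu, per_sample in otu_table.items()
            if sum(per_sample.values()) > 1}

table = {
    "OTU1": {"s1": 5, "s2": 3},
    "OTU2": {"s1": 1, "s2": 0},   # global singleton -> removed
    "OTU3": {"s1": 1, "s2": 1},   # singleton per sample, but kept overall
}
filtered = drop_global_singletons(table)
```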
  • asked a question related to High Throughput Sequencing
Question
4 answers
Now high-throughput sequencing is very popular in the study of microbial ecology. Here I have a question about 16S rDNA data processing:
When we analyze the 16S rDNA data after high-throughput sequencing, should the primer sequences be removed? And why?
I would be very grateful if you could recommend some reliable references. Thanks!
Relevant answer
Hi Guangshan, 
Primer sequences should always be removed from NGS sequencing data, because they are only a molecular tool for binding pieces of DNA to your sequencing technology's chips etc.
If you do not remove the primer and adapter sequences, the overall similarity of the sequences will be altered (the same sequence in all reads will increase similarity, and different adapters between samples will decrease similarity). This is very important because we use 97% similarity for assigning OTUs (species level) or other levels of similarity (for other taxonomic levels), and altering the level of similarity will result in excluding or including reads in OTU groups.
For more you can refer to the QIIME pipeline - Caporaso, J. Gregory, et al. "QIIME allows analysis of high-throughput community sequencing data." Nature methods 7.5 (2010): 335-336.
Alternatively you can refer to other publications by Greg Caporaso - https://scholar.google.co.za/citations?user=8wv9sLkAAAAJ&hl=en&oi=sra, who frequently touches on the topic of using 16S rDNA in microbial ecology.
Or you can refer to Pat Schloss' publications, another great scientist who frequently touches on using amplicon sequencing in microbial ecology. https://scholar.google.co.za/citations?user=xswWwaMAAAAJ&hl=en&oi=sra
Hope this helps, 
Andries
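For illustration, primer clipping can be sketched as below (real pipelines would use cutadapt or the QIIME library-splitting scripts; the primer sequence here is hypothetical):

```python
# Minimal primer-clipping sketch. The primer sequence is hypothetical and
# includes IUPAC ambiguity codes, as degenerate 16S primers usually do.

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "GC", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

FWD_PRIMER = "AYTGGGYDTAAAGNG"  # hypothetical degenerate forward primer

def strip_primer(read, primer=FWD_PRIMER):
    """Clip the primer off the 5' end if it matches (honouring IUPAC
    ambiguity codes); otherwise return the read unchanged."""
    if len(read) >= len(primer) and all(
            base in IUPAC[p] for base, p in zip(read, primer)):
        return read[len(primer):]
    return read
```

Reads whose 5' end does not match the primer are often discarded rather than kept unchanged; that choice is pipeline-specific.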
  • asked a question related to High Throughput Sequencing
Question
7 answers
I am mapping the raw reads to the contigs generated by de novo assembly of the same reads and want to represent the read coverage in the form of a graph.
Relevant answer
Answer
@Arjun Sahoo I have the exact same suggestion as Dong. Yes, IGV is a powerful tool for visualizing the BAM file to see the coverage profile of your reads. You need to input the reference genome FASTA file, a BAM file, and an index file at the same time.
  • asked a question related to High Throughput Sequencing
Question
1 answer
I am interested in looking at a gene for DNRA, nrfA.
Relevant answer
Answer
That depends upon how variable the gene is and what reference sequences are available for identifying the organisms that express it. If you are lucky, there will be enough sequence variability that you can produce phylogenies, or at least sequence similarity trees, where the tips are differentiated enough to identify species. These could be treated as MOTUs if there are no reference sequences and the diversity of organisms is interesting enough. If you are even more lucky, there will be reference sequences to identify the MOTUs.
  • asked a question related to High Throughput Sequencing
Question
3 answers
I'm looking to prepare a grant for some high-throughput work, but the Core Facility at my University does not offer the service. Any suggestions would help, as I know Cornell's GBS program is currently on hiatus (this is what my classmates have used in the past). Thanks in advance!
-Spencer
Relevant answer
Answer
Contact Seth Crosby at Washington University in St. Louis.   He can give you quotes on the services offered by GTAC.
  • asked a question related to High Throughput Sequencing
Question
9 answers
I have the evaluation of RNA sequencing (Illumina, Hi-Seq) in Excel format. P-value and adjusted p-value are stated. How can I convert p-value into adjusted p-value?
Relevant answer
Answer
The adjusted p-value is always the p-value multiplied by some factor:
adj.p = f * p
with f ≥ 1 (and the result capped at 1). The actual size of this factor f depends on the strategy used to correct for multiple testing. In the simplest case, f is taken to be the number of tests in the family (e.g. the number of genes tested for differential expression). This is known as "Bonferroni correction".
From your data you can calculate the factors for the given genes. They are not all equal, so the method for correction used here is not simply Bonferroni. Unfortunately, there are many other methods, and from a given subset of genes it is not possible to find out what method is used.
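The factor-recovery trick described above can be checked with a small Python sketch (the p-values are invented; note that RNA-seq tools more commonly report Benjamini-Hochberg adjusted values, where the factor varies with the rank of the p-value):

```python
# Checking the relationship adj.p = f * p on invented p-values. For
# Bonferroni, f is the number of tests and the result is capped at 1.

def bonferroni(pvalues):
    m = len(pvalues)                      # number of tests in the family
    return [min(1.0, p * m) for p in pvalues]

pvals = [0.001, 0.02, 0.4]
adjusted = bonferroni(pvals)
# Recovering the factor from the data, as suggested above (uncapped only):
factors = [adj / p for p, adj in zip(pvals, adjusted) if adj < 1.0]
```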
  • asked a question related to High Throughput Sequencing
Question
4 answers
"Illumina’s “indexing” system differs from other sample barcoding methods for high-throughput sequencing in that the barcodes (“indexes”) are placed within one of the adapters rather than being directly attached to the ends of template molecules"
Recently I have been reading about Illumina sequencing technology. My question is whether it is necessary to put the index in the middle of the primers when doing PCR.
If we send our PCR sample to sequencing companies with the index attached directly to the ends, will it harm the results?
Relevant answer
Answer
The within-adaptor index is sequenced separately from the DNA insert (and has an immediately preceding sequencing primer of its own). If we were to stick these indices into the DNA insert, then we would be consistently taking away high-quality bases from the primary sequencing effort. We can't place the index at the end of the insert, because per-base quality rapidly decreases at the ends of the read as individual fragments in the cluster become more and more asynchronous (which is why we tend to trim the ends). By giving the index its own sequencing primer we can guarantee that those base calls will be of higher quality while not shortening the amount of DNA insert we can effectively sequence, yet we can still assign those index reads to the sequence reads because they are from the same physical location on the flowcell.
You can of course also place barcodes at the end of your DNA insert, but be sure to place it on the P5 end so that sequence quality is maximized, although these won't be automatically demultiplexed. In our case, we multiplex more individuals than there are available Illumina indices, so we use combinatorial indexing with a barcode at the P5 end of the insert as well as the Illumina index in the adaptor at the P7 end.
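The combinatorial indexing described above can be sketched as a simple lookup (illustrative only; the barcode and index sequences and the sample names are invented):

```python
# Hedged sketch of combinatorial demultiplexing: an in-line barcode at the
# start of the read (P5 side) combined with the Illumina index identifies
# the sample. All sequences and names are invented.

SAMPLE_MAP = {
    ("ACGT", "TTAGGC"): "sample1",   # (in-read barcode, Illumina index)
    ("TGCA", "TTAGGC"): "sample2",
    ("ACGT", "CGATGT"): "sample3",
}

def assign_sample(read, index_read, barcode_len=4):
    """Return (sample or None, read with the in-line barcode clipped)."""
    key = (read[:barcode_len], index_read)
    return SAMPLE_MAP.get(key), read[barcode_len:]

sample, insert_seq = assign_sample("ACGTGGCCTTAA", "CGATGT")
```

Unexpected barcode/index combinations map to None; in practice those reads are usually set aside rather than discarded silently.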
  • asked a question related to High Throughput Sequencing
Question
11 answers
Hello! :) I have an idea of comparing two related but different species in order to learn if there are differences in their expression profiles. I used the BAM files after TopHat2 mapping in HTSeq to generate count tables and then chose the top most-expressed genes and compared the data. My question would be: do I need to normalize the HTSeq counts in some way (if so, how?), or is there no sense in that since the species are different? Thanks a lot in advance!
Relevant answer
Answer
Hey Victoria,
Comparing expression of two species makes sense, as long as you do not take it too seriously ;)
You'd need a table of homologous genes between the two species, which you may generate e.g. using BioMart (so preferably have your genes in count tables according to Ensembl). Then for each homolog pair you can try to compare - comparing fold change to fold change makes the most sense, as raw expression is more prone to batch effects. For an example, see e.g.: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0047107
Btw - TopHat/HTSeq is OK for mapping and counting, but STAR or subjunc + featureCounts (from the subread package) is way faster to compute.
Good luck! :)
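The fold-change-to-fold-change comparison suggested above can be sketched as follows (assuming a homolog table is already available, e.g. from BioMart; the gene IDs and counts are invented, and a pseudocount guards against log(0)):

```python
# Within-species fold changes for homolog pairs, so values are comparable
# across species despite different baselines. All data invented.
import math

def log2fc(ctrl, treat, pseudocount=1):
    """Within-species log2 fold change (treatment vs control)."""
    return math.log2((treat + pseudocount) / (ctrl + pseudocount))

# (ctrl, treat) expression per gene in each species:
species_a = {"geneA1": (10, 80)}
species_b = {"geneB1": (100, 800)}
homologs = [("geneA1", "geneB1")]   # one homolog pair

fold_changes = {(ga, gb): (log2fc(*species_a[ga]), log2fc(*species_b[gb]))
                for ga, gb in homologs}
```

Here the two species have very different baseline expression (10 vs 100), yet their fold changes are nearly identical, which is exactly why fold changes travel across species better than raw counts.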
  • asked a question related to High Throughput Sequencing
Question
6 answers
Hello, everyone.
Recently, I did ChIP-seq (H3K4me1, H3K4me3, H3K27ac, H3K27me3, Pol II, p300) from 1×10^7 insect cells. But I got a poor yield (about 4 ng), whereas standard Illumina library preparation needs 10 ng, and I have tried many times. The cells I use are hard to culture, so I want to try a modified ChIP-seq protocol in order to get a higher output. There are a lot of methods for doing ChIP-seq from small numbers of cells.
For example:
nano-ChIP-seq: Genome-wide chromatin maps derived from limited numbers of hematopoietic progenitors.
LinDA: Single-tube linear DNA amplification (LinDA) for robust ChIP-seq.
Small-scale ChIP: In vivo epigenomic profiling of germ cells reveals germ cell molecular signatures.
cChIP-seq: cChIP-seq: a robust small-scale method for investigation of histone modifications.
These methods may introduce PCR bias due to pre-amplification, and I don't know how to choose among them. cChIP-seq seems better, but it is not clear how the carrier works.
Should I use more cells to get a higher yield, or select one of the modified ChIP-seq protocols above? Are there any other better methods? Can anyone give some suggestions?
Thank you very much!
ChengDong
Relevant answer
Answer
Hello, we have optimized the True MicroChIP kit for ChIP-seq on 10,000 cells for histone marks, in combination with our MicroPlex library preparation kit v2, which allows starting libraries from 50 pg.
  • asked a question related to High Throughput Sequencing
Question
1 answer
Hi,
   I am trying to integrate a gene of interest in Kluyveromyces marxianus under the control of the GAL1 promoter from vector pYES2. The entire expression cassette would consist of the GAL1 promoter, the gene of interest, a terminator, and the marker gene URA3. I would like to know whether the GAL1 promoter would work in K. marxianus as well.
Relevant answer
Answer
Dear Diptarka, I have no experience in this topic.
Regards
  • asked a question related to High Throughput Sequencing
Question
5 answers
Hi all,
1- Has anyone here used the MinION?
2- What is the real error rate? I saw 30% in Madoui et al. (2015).
3- What is the hardest part of manipulating the sequences and of the downstream analysis?
Thanks in advance.
Relevant answer
Answer
It's getting much better. When you look at the error rate, remember it can read sequences >10 kb, compared to short ones for Illumina. Best of all, you can fit it into your pocket and plug it into your USB port!! The MinION also has a tool (MinKNOW) that comes along with it for downstream analysis. Look at poRe & poretools.
  • asked a question related to High Throughput Sequencing
Question
3 answers
How can I find raw data from sequencing machines on the internet?
Relevant answer
Answer
  • asked a question related to High Throughput Sequencing
Question
6 answers
Hello everyone, I'm doing ChIP in insect cells. I am following the Richard A. Young lab's 2006 Nature Protocols paper: "Chromatin immunoprecipitation and microarray-based analysis of protein location".
I tried to optimize sonication, and the result I got is attached (sonication.png). Can I continue with ChIP, or do I need to optimize my sonication further?
Besides, I used H3K4me3 and H3K27me3 antibodies (Merck Millipore) for ChIP and got poor results; the final concentration is 0.1~0.3 ng/µl (volume 50 µl). The Agilent result is also not ideal (figure below).
Is it caused by sonication, or some other reason?
Can anybody give me some suggestions?
Thank you very much.
Relevant answer
Answer
Dear Cheng,
Your sonication profile looks perfect; move on and test whether your IP worked by ChIP-PCR. The low yield at the end can be improved by other means, i.e., using more cells, purifying the IP'ed DNA with AMPure beads instead of column-based purifications, etc.
Good luck!
Antonio.
  • asked a question related to High Throughput Sequencing
Question
3 answers
We plan to work with total DNA obtained from reference samples (buccal swab).
Relevant answer
Answer
Hi Ruddy,
I suggest you go forward with DNA extraction (plenty of protocols are available), PCR using primer sets you may find in the literature, purification, and sequencing. You may then assemble the fragments into the complete mtDNA sequence using any suitable software (e.g. Geneious, CLC Bio, etc.).
I highly recommend to consult the following links for clearer overview:
Regards
  • asked a question related to High Throughput Sequencing
Question
5 answers
Hi all,
I would like to study the gene expression levels of 22 genes in 1200 different samples.
Since normal qPCR is humanly not doable here (26,400 reactions), I would like to know if anyone knows of a high-throughput platform that would allow us to study a loooot of samples on only a few genes...
I have heard of microfluidic platforms. Does anyone have experience with these? Would they fit our experiments, and what are the average costs of such an approach?
Thank you!
Relevant answer
Answer
I would personally use deep sequencing, for example the custom targeted RNA sequencing library prep kit from Illumina.
You can multiplex up to 384 samples in a single MiSeq run. It is usually more expensive than arrays or qRT-PCR, but given the sample number, prices should be competitive. Plus, it is streamlined for high multiplexing and directly provides read counts for easy analysis.
  • asked a question related to High Throughput Sequencing
Question
3 answers
I need to sequence many samples of human mitochondrial DNA. I do not know how much DNA (concentration, amount) I need for this.
Thank you
Relevant answer
Answer
Hi Ana. Are you just going to sequence the human genome and not use any type of capture? Because of the copy number difference between mtDNA and the nuclear genome, you should not have any problems getting enough coverage for the mtDNA. However, to see low-level heteroplasmy, you'll want at least 500x, preferably 1,000-fold coverage (or more). Maybe you could provide a bit more information on how much human genomic DNA you are starting with and what you are hoping to see?
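A back-of-the-envelope Lander-Waterman calculation illustrates why mtDNA coverage is rarely the problem (the run parameters below are hypothetical; the point is that mtDNA's high copy number gives deep coverage even without capture):

```python
# Expected coverage = reads x read length x on-target fraction / target size.

MT_GENOME_BP = 16_569   # length of the human mitochondrial genome in bp

def fold_coverage(n_reads, read_len, target_bp, fraction_on_target):
    """Expected fold coverage of the target region."""
    return n_reads * read_len * fraction_on_target / target_bp

# e.g. 10 million 150 bp reads, of which 1% derive from mtDNA:
cov = fold_coverage(10_000_000, 150, MT_GENOME_BP, 0.01)  # roughly 900x
```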
  • asked a question related to High Throughput Sequencing
Question
2 answers
I'm planning to start designing an in vivo assay to be used for an HTS. Does anyone have any advice or know of literature on how to choose an appropriate cell line? I will have to establish a stably transfected cell line with at least two constructs. The readout will be measuring changes in GFP expression.
Relevant answer
Answer
The attached article could be of help ... but maybe too late ...
Best regards
Robert
  • asked a question related to High Throughput Sequencing
Question
2 answers
Dear all
I am trying to determine the number of rare variants (MAF < 0.01-0.005) within a specific length of chromosome (bp, kbp or Mbp) in human or cattle sequencing data obtained via NGS. I would appreciate it if anyone has information regarding the number of rare variants in sequence data from human or cattle.
Regards
Sajjad 
Relevant answer
Answer
Dear Shilu Mathew
Thanks for the papers you offered; I had found some of them before. However, maybe my question was misleading: I only want to know the average proportion of common SNP variants and rare SNP variants in human sequencing data. For instance, if we want to simulate a part of human sequence data of length 500 kb, what is the proportion of rare and common SNP variants in that genome region?
Regards
Sajjad 
  • asked a question related to High Throughput Sequencing
Question
5 answers
I have been struggling with this problem, particularly for the similar-expression gene graph, where I can see only the tracking IDs (XLOC...), which do not even match any of the IDs I got after performing k-means clustering (which I did to choose possible candidate genes).
It would be useful to know how to change this for other graphs too.
Thank you!
Relevant answer
Answer
Hi,
I assume you are talking about the heatmap for digital gene expression. If so, there are two ways to do it. First, you can find the relevant R command in cummeRbund and choose the desired field from the expression count matrix, which is laborious and requires too much effort. Second, you can just create this plot via the R interface using a set of command lines.
Use these links for more details:
Hope you find it useful.
Good luck.
  • asked a question related to High Throughput Sequencing
Question
1 answer
Need a contract lab for high-throughput assays.
Please share leads and intros [email removed by admin]
US based preferred.
Relevant answer
Answer
Hello Douglas,
Our lab, the Integrated Microscopy Core at Baylor Houston, does high throughput screening and assay development. We have available a BioMek Span8 robot for plate processing and Vala IC-200 high throughput microscope. We are a Core facility and can accept samples from anyone, anywhere. Can you tell me more about your research? Please visit our web site for more information.                  
  • asked a question related to High Throughput Sequencing
Question
4 answers
I am currently analyzing some huge metagenomic data obtained with Illumina's HiSeq. However, I would first like to make a taxonomic classification of the paired-end reads in order to get only the sequences of certain organisms before assembly. So, I would like a recommendation for suitable software for this task, because I know there are several available in the repositories... Thanks!
Relevant answer
Answer
Thank you for the suggestions. I will try to use those programs. Also, I got the suggestion from a colleague to try GOTTCHA or Kraken.
Regards,
Alejandro
  • asked a question related to High Throughput Sequencing
Question
3 answers
I am going to evaluate the effect of a herbal medicine extract and a probiotic in TNBS-induced mouse colitis, via modulation of the gut microbiota structure, in order to find the species most influenced or changed, using high-throughput sequencing. The fecal samples are important.
I tend toward plan A, because the drug and the probiotic then have sufficient time to adjust the mouse gut microbiota before TNBS administration. But I am confused about the collection time points. Most colitis gut microbiome studies have just two time points, like plan B: before and after treatment. Plan A has three time points. From day -21 to day 0 it is easy to find the microbiota change under normal circumstances, but under TNBS, should I compare day -21 with day 3, or day 0 with day 3?
If I choose plan B, I wonder whether the short-term treatment in the acute colitis model might produce a significant change in the microbial distribution. So, can anyone give me some suggestions?
Relevant answer
Answer
Thank you very much. But at day 3 it is very hard to collect the fecal sample because of the TNBS administration. The cecum content may be more available, but in that case the microbiome is significantly different from day -14 and day 0 (natural excrement). I am looking forward to your advice!
  • asked a question related to High Throughput Sequencing
Question
11 answers
I'm wondering whether anyone has assembled a database of oomycetes and successfully used one of the high-throughput sequencing platforms to characterize the community structure of this group of organisms in soil.
Relevant answer
Answer
Hi. If you are still interested in this topic: in my former lab, we tried using the 28S primers from Arcate et al. 2006 (Microbial Ecology 51: 36-50) for pyrosequencing. They worked pretty well for recovering oomycete communities, which were otherwise not recovered by pyrosequencing with universal large-subunit primers (LROR-LR3). In the end we didn't explore this topic much more, but it worked well for the samples we tested.
  • asked a question related to High Throughput Sequencing
Question
2 answers
SeqTar users: is it right to calculate the probability of the valid peak using the binomial probability density function, as they state in the paper (http://nar.oxfordjournals.org/content/early/2011/12/02/nar.gkr1092.full)?
Considering that the binomial PDF for a certain value returns the probability of getting a peak of *exactly the same height* as the one under analysis, it seems to me that the right thing to do would instead be to calculate 1 - (the binomial cumulative distribution at (the height of the peak under analysis - 1)).
This would give the probability of getting a peak *the same height as the valid peak OR HIGHER*, given the same number of trials and the same probability. Indeed, using the probability density function might create problems: if a gene is heavily covered by PARE reads and the valid peak has far fewer reads than expected, the probability might be very low even though the peak is not outstanding with respect to the surrounding nucleotides.
Using the cumulative probability, the probability of getting higher peaks would approach one, as expected. Is that right?
Relevant answer
Answer
Great! Thank you.
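The PMF-versus-tail distinction raised in the question can be checked numerically with the standard library alone (scipy.stats.binom would be the usual tool; all numbers below are invented):

```python
# Numerical check: for a heavily covered gene, the PMF of a modest peak can
# be small even though the peak is unremarkable, while the upper-tail
# probability P(X >= k) correctly approaches 1.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k, n, p):
    """P(X >= k) = 1 - CDF(k - 1): chance of a peak at least this high."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

# n = 10000 reads, per-position probability p = 0.001, so the expected
# height is np = 10; consider a modest peak of height 2.
n, p, k = 10_000, 0.001, 2
pmf = binom_pmf(k, n, p)     # small, even though the peak is unremarkable
tail = binom_tail(k, n, p)   # close to 1, as argued in the question
```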
  • asked a question related to High Throughput Sequencing
Question
5 answers
We have used RDP data analysis for ribosomal RNA gene (rDNA) amplicon reads from high-throughput sequencing (Sanger sequencing). Almost half of my data was categorized as unclassified. I am thinking of moving to another, updated analysis database.
Relevant answer
Answer
You can try Greengenes or SILVA. A newer version seems to be in the offing (http://www.arb-silva.de/).
  • asked a question related to High Throughput Sequencing
Question
3 answers
Has anyone tried to use BAM files from the IonExpress (Life Technologies), which are mapped using the TMAP software? Just looking at the output, it seems like TMAP is not very splice-aware. I read that it does use Bowtie2 for aligning, but there is no mention of TopHat. Does anyone know if these files can be used for RNA-seq, or should I just start over from the FASTQ files?
Relevant answer
Answer
For a more thorough answer: mappers are generally not splice-aware; how they deal with splice junctions usually depends on parameter choices. In principle you should be able to use a BAM file from TMAP; however, not every BAM file will work with every piece of software (for instance, RSEM has very specific requirements). If it were me, I would probably start from the FASTQ files, because it doesn't take very long and I would know exactly how the mapping was done.
  • asked a question related to High Throughput Sequencing
Question
5 answers
I have a library of cells overexpressing tagged proteins, and I have total cellular extracts for all of them, but in small amounts (around 500 ng of total protein each), and I would like to avoid thawing the cell library to regrow the cells and harvest larger protein samples. I would like to set up a high-throughput interaction assay of all of these against a single protein. I have thought about GST pull-downs, ELISA, or AlphaScreen technology. Also, the protein samples are stored at -20°C; can the freezing alter their capacity to interact? I am looking for the fastest reliable technique.
Relevant answer
Answer
You could give it a try with MST (MicroScale Thermophoresis). To do so, you would need to label your interacting protein. Then you fill each of your cell extracts into a reaction tube (or a plate) and add the labeled protein. The reaction setups are filled into glass capillaries, and the measurement can be started.
During the measurement, a temperature gradient is induced which makes the molecules move (Thermophoresis). Their movement depends on different parameters that typically change when your molecule of interest (in this case your labeled protein) binds to an interaction partner (in your case this would be the overexpressed protein in the cell extract). You also need to perform negative controls with control cell extract (no protein overexpression) and in the best case scenario you also have a positive control.
In case you are not only interested if the two proteins are binding to each other, but you would also like to determine the affinity, you need to quantify the overexpressed protein. Then you need to prepare dilution series of the cell extracts. The change in thermophoresis is plotted against the ligand concentration; fitting the data points you will obtain the binding affinity.
Sure, freezing can change protein properties, depending on the freezing conditions (volume, speed of freezing, storage temperature, addition of glycerol etc.). We have best experience flash-freezing small aliquots (10µl) of pure protein in liquid nitrogen. For cell extracts, I guess this could be a different story.
  • asked a question related to High Throughput Sequencing
Question
5 answers
I have several sets of data but with different amounts of sequencing output. The data appear to contain the same microorganisms but in different numbers. Can anyone suggest an analysis for this? In addition, can anyone recommend any statistics to run on DGGE bands?
Relevant answer
Answer
I found this paper that describes a statistical method that addresses your problem, I think: http://www.cbcb.umd.edu/~niranjan/papers/WhiteNagarajanPopPLOSCompBio09.pdf
The authors propose a method to identify species which are differently abundant in pairs of metagenomic samples, and they provide a particular treatment for cases where data are sparse.
It has been cited around 100 times, so if you trace the citations forward I think you would find some interesting more recent developments.
  • asked a question related to High Throughput Sequencing
Question
1 answer
I have sequenced the 16S region (563F-802R) of the rhizobacterial communities of 88 soil samples in one experiment; 48 samples were sequenced using 454, the other 40 by MiSeq. I want to compare these data. What should I do, and how can I tell whether the results are equally representative?
Relevant answer
Answer
These sequencing technologies each have different biases and error profiles. No matter what, you'll have to take the results with a grain of salt. Maybe closer to 6...
Also a related caution: different 16S amplicon regions and extraction kits have far more profound effects on the same source samples.
In general terms, you'll want to perform OTU formation for both datasets against the same set of reference sequences, such as greengenes. This will help minimize the run-level effects.
If we _were_ being QIIME-specific, this would be done with QIIME's closed-reference OTU-picking workflow.
  • asked a question related to High Throughput Sequencing
Question
2 answers
I've done an HTS with siRNAs covering the whole human genome. I obtained lots of hits even though I'm being very stringent in my clean-up and selection process. I'm categorizing hits based on several criteria, following people's advice and some publications, but this is a slow and impractical process. Do you have experience with this? How do you know or choose which genes to follow up?
Relevant answer
Answer
Unfortunately, the nature of siRNA screening is that there will always be a lot of false positives due to off-target effects, especially if you are screening with just a single reagent (a pool of siRNAs or a single siRNA) per gene. In my experience, 60-90% of the "hits" from that type of screen will end up being false positives (for an example of how divergent the results can be from testing different siRNAs against the same gene, see figures 1A and 1C from our publication "Common Seed Analysis to Identify Off-Target Effects in siRNA Screens", http://jbx.sagepub.com/content/17/3/370.full). As a result, you will have to plan a follow-up strategy that is sufficiently "medium throughput" to allow for up to a 90% attrition rate. Doing pathway enrichment analysis and then selecting "hits" from the enriched pathways might marginally increase your validation rate, but at the expense of identifying things that are more likely to have been previously characterized. The most cost-efficient method is usually to get 3-4 novel siRNAs against the candidate genes and test them in your assay.
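The pathway enrichment step mentioned above usually boils down to a hypergeometric test: given N genes screened, n hits, and a pathway of K genes containing k hits, how surprising is k? A self-contained sketch with invented numbers (requires Python 3.8+ for math.comb):

```python
from math import comb

def hypergeom_enrichment_p(k, pathway_size, total_hits, genome_size):
    """Upper-tail hypergeometric p-value: the chance of seeing at least k
    screen hits inside the pathway purely by chance."""
    p = 0.0
    for i in range(k, min(pathway_size, total_hits) + 1):
        p += (comb(pathway_size, i)
              * comb(genome_size - pathway_size, total_hits - i)
              / comb(genome_size, total_hits))
    return p

# e.g. 8 of 200 screen hits fall in a 50-gene pathway; 20,000 genes screened
p_enriched = hypergeom_enrichment_p(8, 50, 200, 20_000)
p_trivial = hypergeom_enrichment_p(0, 50, 200, 20_000)  # full tail, always 1
```

Real tools (DAVID, g:Profiler, etc.) add multiple-testing correction across many pathways, which matters when hundreds of gene sets are tested.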
  • asked a question related to High Throughput Sequencing
Question
2 answers
I'm trying to build a genetic map with SLAF-seq for an F1 mapping population of a perennial woody plant. SLAF-seq is a genotyping-by-sequencing technique similar to GBS or ddRAD. But I now find that almost 70% of the F1 offspring have ~2% (1,000-4,000) of loci that do not match the corresponding loci of either parent. The average sequencing depth is ~4X.
For example, one locus is as follows:
ACAACCCAAGAACAAATAACGTTTATTTACACATGTTTCTTCAATACATCGGAGCCGCTCTTGTATTAGTCAATAAAAATXXXXXXXXXXATAGTTAAAAACTATCCGCCACGATCAAAAAACAGAGCTTTTGTTTCACCAGGTAGTCTCGCTGCACGGGTAGTGCTAGT ab(progeny),
ACAACCCAAGAACAAATAACGTTTATTTACACATGTTTCTTCAATACATCGGAGCCGCTCTTGTATTAGTCAATAAAAATXXXXXXXXXXATAGTTAAAACCTATCCGCCACGATCAAAAAACAGAGCTTTTGTTTCACCAGGTAGTCTCGCTGCACGGGTAGTGCTAGT M,P(parents)
I then began to suspect that the parents had been pollinated by other plants, but given the strict controls, contamination should not occur on such a large scale.
Is anyone here familiar with this technique? Are these mutations, sequencing errors, or something else? Is such a proportion normal? Has anyone encountered a similar situation with GBS or ddRAD data?
Your answers will be appreciated. Best regards.
Relevant answer
Answer
It could be that you are not identifying the full parental genotype at the 2% of loci. A heterozygous parental locus is called homozygous, but the uncalled allele is then seen in the F1s. If the sequencing depth in the parent libraries is 4X, then I am surprised you don't have more of these alleles appearing, since sampling two chromosomes 4 times gives you a 2 x 0.5^4 (about 12.5%) chance of sampling just one of the chromosomes in the 4 reads.
The usual causes of allele dropout in GBS/ddRAD data (different size selections, polymorphism in a diverse population affects a cut site) wouldn't apply here.
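The sampling argument above is easy to check numerically: at a heterozygous site covered by d reads, the chance that every read comes from the same chromosome (so one allele is never observed) is 2 x 0.5^d. A quick sketch:

```python
def allele_dropout_probability(depth):
    """Probability that all reads at a heterozygous diploid site sample the
    same chromosome, hiding one allele (binomial sampling, p = 0.5)."""
    if depth < 1:
        return 1.0  # no reads at all: the allele is certainly unobserved
    return 2 * 0.5 ** depth

# at 4X, roughly 12.5% of heterozygous parental sites will look homozygous,
# so the "novel" alleles later reappear in the offspring
rate_4x = allele_dropout_probability(4)
```

This treats read sampling as independent draws with equal coverage of both chromosomes; real coverage bias makes dropout somewhat worse.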
  • asked a question related to High Throughput Sequencing
Question
7 answers
My initial compound library had 348,276 compounds. After applying various filters (functional-group filters, physicochemical filters, duplicate removal, removal of Pan-Assay Interference Compounds, etc.), the size of my compound library has been reduced by about 90 percent. I applied these filters on the FAF-Drugs server (http://mobyle.rpbs.univ-paris-diderot.fr), using the lead-like soft physicochemical filters, the functional-group filters available there, and all three PAINS filters (A, B, and C).
But 34,000 compounds is still a huge number for virtual screening (VS). My question is: are there any other online filters (freely available would be good!!) that I can apply to reduce the number?
If I proceed with this number for VS, it will generate a huge amount of data. Therefore, I need your expert opinion on efficient post-virtual-screening analysis. What tools are available that can be used to analyze and interpret large virtual screening datasets? Names of software or links to papers describing such work would also be a great start for me.
Relevant answer
Answer
You are right to be concerned about the post-docking analysis. This has to be done well to generate the best possible results. Relying on the docking score alone to choose your final set for screening will not work well. I would recommend thinking about the following filtering strategies:
1) Removal of structures which have bad clashes with the protein or poor docked geometry
2) Promotion of structures that have good consensus scores when rescored with a second (and third/fourth) scoring function
3) Promotion of structures which make the interactions with the active site that you think are important
4) Clustering by chemistry (e.g. Murcko scaffolds) followed by chemical diversity analysis to generate a diverse subset
Post-docking analysis tools exist that are designed to do most of 1-3 (e.g. GoldMine, which comes with GOLD). I've used the free desktop pipelining tool KNIME to do step 4.
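For step 4, once scaffolds have been computed (e.g. Murcko scaffold SMILES via RDKit or a KNIME node, assumed here to be done upstream), the grouping itself is trivial. A minimal pure-Python sketch with hypothetical compound IDs, scaffold keys, and scores:

```python
from collections import defaultdict

def diverse_subset(compounds, max_per_scaffold=1):
    """Group docking hits by a precomputed scaffold key and keep only the
    top-scoring member(s) of each group, yielding a chemically diverse
    subset. compounds: iterable of (compound_id, scaffold_key, score)."""
    groups = defaultdict(list)
    for cid, scaffold, score in compounds:
        groups[scaffold].append((score, cid))
    picked = []
    for members in groups.values():
        members.sort(reverse=True)  # best (e.g. consensus) score first
        picked.extend(cid for _score, cid in members[:max_per_scaffold])
    return sorted(picked)

# hypothetical post-docking hit list: (id, scaffold key, consensus score)
hits = [("c1", "scafA", 62.1), ("c2", "scafA", 58.9),
        ("c3", "scafB", 71.4), ("c4", "scafC", 55.0)]
subset = diverse_subset(hits)  # one representative per scaffold
```

With real data, the scaffold key would come from something like RDKit's MurckoScaffold module rather than being hand-written strings.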
  • asked a question related to High Throughput Sequencing
Question
7 answers
It would be interesting to know the advantages and disadvantages of this software.
Relevant answer
Answer
iMir uses both miRanalyzer and miRDeep for miRNA analysis; it may be helpful for your purposes.
  • asked a question related to High Throughput Sequencing
Question
2 answers
As a follow-up to a study in which we analyzed several tag-SNPs in candidate genes, we sequenced the genomic regions around the tag-SNPs associated with the disease. The sequencing was done in pooled samples, but now we are having trouble finding a good method to determine the allele frequencies in these pooled samples. Does anyone have experience with this type of problem?
Relevant answer
Answer
The main problem in these types of analyses is the possibility of unequal pooling. At the data level, this can be assessed to some extent by evaluating informative SNPs, if available. Some useful experiments and comparisons are described in these papers:
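For the read-count side of the problem (separate from the unequal-pooling issue above), a simple point estimate of the pooled allele frequency plus a Wilson score interval gives a feel for the sequencing sampling error. A sketch with invented counts:

```python
import math

def pooled_allele_freq(alt_reads, total_reads, z=1.96):
    """Estimate the allele frequency in a pooled sample from read counts,
    with a 95% Wilson score interval for the binomial sampling noise.
    Note: unequal pooling of individuals adds error this cannot capture."""
    p = alt_reads / total_reads
    denom = 1 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total_reads
                                   + z * z / (4 * total_reads ** 2))
    return p, centre - half, centre + half

# 230 alternate-allele reads out of 1,000 pooled reads at one SNP
p, lo, hi = pooled_allele_freq(230, 1000)
```

If the interval is wider than the frequency differences you hope to detect between pools, deeper sequencing is needed before pooling bias even becomes the limiting factor.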
  • asked a question related to High Throughput Sequencing
Question
3 answers
When does the term "transcriptome" apply in the case of cancer cells cultured in vitro? If the aim is to determine how similar the cancer cells are to the original tumor tissue from which they were derived, early-passage cells are the ones considered for NGS, because there is a common hypothesis, demonstrated for some cancer cell lines, that genomic and transcriptomic alterations occur after a number of passages (5-15). I am not sure which of the early passages would be most suitable, since an NGS run is always associated with high costs.
Relevant answer
Answer
tl;dr: qRT-PCR first to see if contaminating cells are gone
If I am reading your question correctly, I'd suggest that you profile a cell line derived from a primary tumor at the earliest passage possible at which you no longer have contaminating cells (stromal cells, immune cells, etc.) in the culture.
Even a small amount of contaminating cell infiltrate will have highly abundant gene products that will complicate your classification against different tissues when comparing after your NGS run (I assume we are talking RNA-seq here).
Experimental suggestion: if you are taking a skin cancer biopsy that should only contain epidermal keratinocytes (the cell of origin for squamous cell carcinoma, let's say), design some qRT-PCR primers for genes expressed exclusively in immune cells, pigmented melanocytes, and the other cell types in the skin. When detection of these genes has fallen significantly compared with your original culture, I'd say you are ready to sequence the transcriptome of a relatively pure, early-passage cell line.
  • asked a question related to High Throughput Sequencing
Question
4 answers
Good-bye 454 in 2016! What is the next No. 1 platform: MiSeq, IonTorrent? Once you get sequences, what next? Which pipeline do you use for sequence clustering and identification?
What do you do with your sequences?
Relevant answer
Answer
We just released ITS reference databases for QIIME (http://unite.ut.ee/repository.php) and mothur (http://www.mothur.org/wiki/UNITE_ITS_database).
They are based on the UNITE database and its global key to the molecular identification of all fungi known from ITS sequence data (https://www.researchgate.net/publication/255483379_Towards_a_unified_paradigm_for_sequence-based_identification_of_Fungi?ev=prf_pub).
  • asked a question related to High Throughput Sequencing
Question
1 answer
In our company, we are working with HTS and HCS studies of natural ingredients, and we are evaluating commercial solutions to manage these data and to allow correlations with phytochemical profiles and botanical characteristics. We are currently evaluating five companies: IDBS, Accelrys, Genedata, PerkinElmer, and Dotmatics. If you have any experience with these platforms, could you share your impressions, advantages, and disadvantages?
Relevant answer
Answer
Patricia,
I wonder if you have chosen a solution for big data management, and if you have some experience to share to help others who are working in this area?
Thanks!
  • asked a question related to High Throughput Sequencing
Question
2 answers
I want to take mouse tissue homogenates containing bacteria and plate them on agar plates. For my experimental purposes I need to pool the bacteria from the plates. If I scrape off the bacteria, will there be DNA from the tissue along with them? Later, I want to isolate DNA from that bacterial pool for Illumina HiSeq sequencing.
Is there a way to get rid of the animal tissue DNA on the plates or downstream, or should I not worry about it?
Relevant answer
Answer
Add DNase I working solution to the plate, spread it uniformly, and incubate at 37 degrees. The extrinsic DNA (DNA from the tissue) will be degraded by the DNase I, but not the DNA inside the intact bacterial cells.