
Bioinformatics Analysis - Science topic

Questions related to Bioinformatics Analysis
Question
Introduction:
I am conducting research on the bacterial composition of fecal samples from both healthy and diseased individuals using 16S sequencing. I am seeking expert guidance on the appropriate bioinformatic analysis methods for my dataset.
Objective:
My goal is to analyze the bacterial communities in fecal samples from a diseased cohort and a control group of healthy individuals, using 16S rRNA gene sequencing.
Sequencing Method:
I have employed a Nanopore sequencer to acquire full-length 16S sequences.
Alignment Method:
For the alignment process, I have used the kraken2 tool.
Database:
The standard database provided by kraken2 has been utilized for the alignment.
Output Files:
I have generated 12 sets of output files, ranging from kraken2-report01 to kraken2-report12 and kraken-output01.txt to kraken-output12.txt.
Downstream Analysis:
I am contemplating two approaches for downstream analysis:
  1. Converting the output data into biom format using kraken-biom and then analyzing it on the QIIME2 platform.
  2. Converting the output data into either OTU or ASV format for analysis using MicrobiomeAnalyst.
Questions:
  1. Is there a specific method for converting the kraken2 output into biom format? If so, could you provide the steps for this conversion?
  2. If the conversion-based approach is not advisable, what are the recommended methods for diversity analysis and identification of variable species post-kraken2 analysis?
Answer
Follow these steps:
1. Install the necessary tools: Ensure that you have the required dependencies installed, including Kraken2, Krona, and biom-format. You can refer to the respective documentation for installation instructions.
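2. Generate Krona report: Run the following command to generate a Krona report from your Kraken2 output. In the per-read Kraken2 output, column 2 holds the read ID and column 3 the taxonomy ID, so pass `-q 2 -t 3`:
```
ktImportTaxonomy -q 2 -t 3 kraken-output01.txt kraken-output02.txt ... kraken-output12.txt -o krona_report.html
```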
This command will generate a Krona HTML report (`krona_report.html`) representing the taxonomic composition of your samples.
3. Convert the Kraken2 reports to biom format: Krona's `ktImportText` builds Krona charts and cannot write biom files. Instead, use kraken-biom, which reads the kraken2 report files (not the per-read output files):
```
kraken-biom kraken2-report01 kraken2-report02 ... kraken2-report12 --fmt hdf5 -o biom_file.biom
```
This command will generate a single biom file (`biom_file.biom`) containing all samples.
4. Import biom file into QIIME2: You can then import the biom file into QIIME2 for downstream analysis. Use the appropriate QIIME2 command to import the biom file based on your analysis requirements.
It's important to note that the biom format was the main table format in QIIME1, whereas QIIME2 stores data as artifacts (.qza) and visualizations (.qzv). QIIME2 does not analyze .biom files directly, so the biom table has to be imported into a .qza artifact first. You can do this using the QIIME2 command-line interface or Python API. Here's a general outline of the steps:
1. Install QIIME2: Follow the installation instructions provided by QIIME2 (https://docs.qiime2.org/2021.8/install/) to set up QIIME2 on your system.
2. Import the biom table into QIIME2: Kraken2's per-read output files are not a format that `qiime tools import` accepts, so import the biom feature table instead. Here's an example command, assuming an HDF5 (BIOM v2.1) table:
```
qiime tools import \
  --input-path biom_file.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV210Format \
  --output-path feature-table.qza
```
Because the biom table already holds all samples, a single import is enough; there is no need to repeat it for each Kraken2 output file.
3. Perform diversity analysis: Once your data are in QZA format, you can perform diversity analysis using various QIIME2 plugins. For example, you can use the `qiime diversity` plugin to calculate alpha and beta diversity metrics. Refer to the QIIME2 documentation and tutorials for more information on available plugins and analysis options.
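To show what an alpha-diversity metric such as Shannon actually computes, here is a minimal pure-Python sketch for one sample's taxon counts. The function name and the counts are hypothetical, and it uses the natural logarithm; some tools report Shannon with log base 2, so check which base your tool uses before comparing numbers.

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)  # natural log
    return h


# Hypothetical species counts for one sample
sample = {"Bacteroides fragilis": 50, "Escherichia coli": 30, "Faecalibacterium prausnitzii": 20}
print(round(shannon_index(sample), 3))  # 1.03
```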
Regarding the identification of variable species, Kraken2 assigns a taxon to each sequence read, and the kraken2 reports summarize these assignments as read counts per taxon. You can build a taxon-by-sample count table from them and test which taxonomic groups differ in abundance between your diseased cohort and control group, using statistical tools such as DESeq2, edgeR, or LEfSe.
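If you want such a count table outside QIIME2 (for example, to load into DESeq2 or edgeR), a minimal sketch is below. It assumes the default six-column kraken2 report layout (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name) and keeps species-level rows only; the function names and the report lines are hypothetical, and kraken-biom does the same job when you need biom output.

```python
from typing import Dict


def species_counts(report_text: str) -> Dict[str, int]:
    """Map species name -> clade read count from a standard 6-column kraken2 report."""
    counts: Dict[str, int] = {}
    for line in report_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        clade_reads, rank, name = int(fields[1]), fields[3].strip(), fields[5].strip()
        if rank == "S":  # species rows only; sub-species ranks (S1, S2, ...) are skipped
            counts[name] = clade_reads
    return counts


def merge_samples(reports: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Build a species x sample count table; species missing from a sample get 0."""
    per_sample = {sample: species_counts(text) for sample, text in reports.items()}
    species = sorted({sp for counts in per_sample.values() for sp in counts})
    return {sp: {s: per_sample[s].get(sp, 0) for s in reports} for sp in species}


# Hypothetical report excerpts for two samples
report_01 = "\n".join([
    " 80.00\t800\t0\tG\t816\t      Bacteroides",
    " 50.00\t500\t500\tS\t817\t        Bacteroides fragilis",
    " 30.00\t300\t300\tS\t818\t        Bacteroides thetaiotaomicron",
])
report_02 = " 40.00\t200\t200\tS\t817\t        Bacteroides fragilis"
table = merge_samples({"s01": report_01, "s02": report_02})
print(table)
```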
Alternatively, if you decide to convert your data to OTU or ASV format for downstream analysis, you can use tools like QIIME2, mothur, or DADA2 to cluster OTUs or infer ASVs. These tools provide options for denoising, quality filtering, chimera removal, and clustering to generate OTU or ASV tables that can be further analyzed for diversity, differential abundance, and other downstream analyses. Note, however, that ASV denoising methods such as DADA2 were developed for low-error reads (Illumina, PacBio HiFi), so they may not perform well on raw Nanopore reads.
Hope it helps
Question
Could someone explain to me why the p-value in the right column of the forest plot is different than the p-value in the test for effect in the subgroup?
I thought that these two p-values should be the same.
Answer
Now coming to your table: the p-value in the right column of the forest plot is the p-value for the overall test of the treatment effect across all subgroups. It is calculated by combining the results of the individual studies in the meta-analysis. In this case, the p-value is 0.56, which is not statistically significant.
The p-value for the test for effect in the subgroup is the p-value for the test of the null hypothesis that the treatment effect in the subgroup is equal to zero. It is calculated using only the data from the studies in the subgroup. In this case, it is 0.094035, which is not statistically significant at the conventional 0.05 level, although it is much smaller than 0.56 (it would be significant at the 0.10 level).
The two p-values differ first of all because they are based on different sets of studies (all studies versus the subgroup only), and also because of the heterogeneity between the studies in the meta-analysis. The heterogeneity statistic (0.5) indicates non-negligible variability in the treatment effects across studies (if it is I², a value of 50% is usually interpreted as moderate heterogeneity). This variability could be due to a number of factors, such as different study designs, different populations of patients, and different treatment regimens.
When there is heterogeneity in the treatment effects across studies, it is more difficult to detect a significant overall treatment effect. This is because the variability in the treatment effects across studies can mask the true effect of the treatment.
In this case, neither p-value is statistically significant at the 0.05 level, but the subgroup p-value (0.094) is much smaller than the overall one (0.56). This suggests that the treatment may have some effect in the subgroup, but it is not possible to draw a definitive conclusion without further research.
It is important to note that a statistically significant p-value for the test for effect in a subgroup does not necessarily mean that the treatment is clinically effective in that subgroup. It is possible that the difference in the treatment effect is small or that it is not clinically meaningful.
To determine whether the treatment is clinically effective in a subgroup, it is important to consider the magnitude of the difference in the treatment effect and the clinical implications of that difference.
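As a sketch of why an overall test and a subgroup test rarely give the same p-value: each is typically a z-test on a pooled estimate, and the two tests pool different sets of studies. The code assumes a common-effect (fixed-effect) inverse-variance model with made-up log risk ratios; whether your forest plot used this model or a random-effects model (which adds a between-study variance to each study's variance) is an assumption to check in your software's output.

```python
import math


def pooled_fixed_effect(estimates, std_errors):
    """Common-effect (inverse-variance) pooled estimate, its SE, and a two-sided z-test p-value."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pooled, se, p


# Hypothetical log risk ratios and SEs: two studies in the subgroup, three outside it
subgroup = ([-0.30, -0.25], [0.15, 0.20])
others = ([0.20, 0.10, 0.15], [0.20, 0.25, 0.30])

_, _, p_sub = pooled_fixed_effect(*subgroup)
_, _, p_all = pooled_fixed_effect(subgroup[0] + others[0], subgroup[1] + others[1])
print(f"subgroup p = {p_sub:.3f}, all studies p = {p_all:.3f}")
```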
Question
I read some papers mentioning that they used the HMP reference genomes for protein homology searches, and I've also read about the HUMAnN database elsewhere. I'm wondering what the difference is.
Answer
The HMP (Human Microbiome Project) database is a resource that focuses on characterizing microbial genomes and metagenomes in human body sites. It provides reference genomes for different microorganisms, aiding in the identification and study of microbial species. For example, HMP offers reference genomes for specific gut bacteria like Bacteroides fragilis, aiding in taxonomic classification and species-level analysis.
On the other hand, HUMAnN (HMP Unified Metabolic Analysis Network) is a software tool, with its own reference databases, geared toward functional profiling of microbial communities. Rather than relying on the HMP genomes alone, recent versions (HUMAnN 2 and 3) map reads to a pangenome database (ChocoPhlAn) and to UniRef protein clusters to identify and quantify gene families and metabolic pathways in metagenomic data. For instance, HUMAnN can determine the presence and abundance of pathways like glycolysis or nitrogen-cycle pathways, shedding light on the metabolic activities within a microbial community.
Question
Hello everyone, I am new to R programming. I want to calculate the Firmicutes to Bacteroidetes (F/B) ratio from my OTU table, but I couldn't find a command for it and don't know how to do it. Please guide me.
I have attached an example of my OTU table.
Answer
Thank you for this...
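The F/B ratio is the summed reads of Firmicutes OTUs divided by the summed reads of Bacteroidetes OTUs in each sample. A minimal Python sketch of that calculation follows; the input layout and names are hypothetical, and the phylum labels depend on your reference database (newer taxonomies use Bacillota and Bacteroidota), so adjust them to match your taxonomy table. In R, phyloseq's `tax_glom()` performs the same phylum-level aggregation.

```python
from collections import defaultdict


def fb_ratio(otu_counts, otu_phylum, firmicutes="Firmicutes", bacteroidetes="Bacteroidetes"):
    """Firmicutes/Bacteroidetes ratio for each sample.

    otu_counts: {otu_id: {sample: read count}}
    otu_phylum: {otu_id: phylum name from your taxonomy table}
    Returns {sample: ratio}; the ratio is None when a sample has no Bacteroidetes reads.
    """
    per_sample = defaultdict(lambda: defaultdict(int))  # sample -> phylum -> summed reads
    for otu, counts in otu_counts.items():
        phylum = otu_phylum.get(otu, "Unassigned")
        for sample, n in counts.items():
            per_sample[sample][phylum] += n
    ratios = {}
    for sample, phyla in per_sample.items():
        b = phyla.get(bacteroidetes, 0)
        ratios[sample] = phyla.get(firmicutes, 0) / b if b else None
    return ratios


# Hypothetical OTU table: three OTUs, two samples
otu_counts = {"OTU1": {"S1": 120, "S2": 30}, "OTU2": {"S1": 60, "S2": 90}, "OTU3": {"S1": 40, "S2": 0}}
otu_phylum = {"OTU1": "Firmicutes", "OTU2": "Bacteroidetes", "OTU3": "Firmicutes"}
print(fb_ratio(otu_counts, otu_phylum))
```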
Question
I know many websites have simple tools like transcription and translation available, but are there any analysis tools that researchers need that either do not exist or are not publicly available? It could be anything from algorithms to visuals. Thanks!
Answer
Abhijeet Singh Thank you for your response and for mentioning my earlier post! I believe researchers know which tools are missing because they run into such problems often during their research. If there is a manual analysis task that researchers could automate, I believe PeptiCloud can be the perfect platform to develop those tools and make them publicly available. (For instance, PeptiCloud has a unique feature that lets users further alter the codon sequence of each amino acid after codon optimization for a specific bacterial strain.) With that said, if you could check out PeptiCloud for yourself and see whether anything could be added or improved, that would be greatly appreciated!
Question
I have a sequence (genome.fasta), and I want to check the gene located at positions 400-500 nt.
What bash script should I use (I have WSL on my Windows machine), or are there any conda packages for this?
Thank you in advance
Answer
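To extract a sequence from a larger genome file based on a specific location, you can use command-line tools available in Bash, such as samtools and bedtools, both of which can be installed via conda. For example, `samtools faidx genome.fasta <seq_id>:400-500` prints positions 400 to 500 (1-based, inclusive) of the sequence named `<seq_id>` in the FASTA header. With bedtools, `bedtools getfasta -fi genome.fasta -bed region.bed` extracts the regions listed in a BED file, whose coordinates are 0-based, so 400-500 is written as 399 and 500. To find out which gene lies in that region, you also need the genome's annotation (for example, a GFF file).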
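If you would rather not install anything, the same extraction can be sketched in plain Python. This is a minimal illustration, not a replacement for samtools: it reads the whole FASTA into memory, uses the first word of each header as the record ID, and follows the 1-based, inclusive convention of `samtools faidx`. The function names and the record ID "chr1" are placeholders.

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the ID is the first word of each header."""
    records, current, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def extract_region(seq, start, end):
    """Return positions start..end using 1-based, inclusive coordinates (as in samtools faidx)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region lies outside the sequence")
    return seq[start - 1:end]


# Usage (replace "chr1" with the name in your FASTA header):
# with open("genome.fasta") as fh:
#     genome = read_fasta(fh.read())
# print(extract_region(genome["chr1"], 400, 500))  # 101 nt
```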
Question
Is there any server or tool (Bioconda, Java, etc.) to annotate only membrane proteins (similar to dbCAN for polysaccharides) in a bacterial genome?
Thank you in advance!
Question
Hi, I'm currently working with two RNA-Seq studies: one has RNA extracted from whole blood, the other from PBMCs. Eventually we want to combine these data and perform some cell-specific deconvolution to look at DEGs.
Are there any recommended methods for batch correcting these data from different sources?
Mari
Answer
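It is better to include batch as a factor in the design formula (for example, `~ batch + condition` in DESeq2) than to correct the data beforehand. The tximport + DESeq2 pipeline described by Michael Love himself lets you import the quantifications and model batch in this way, and offers the most useful solution. Please have a look.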
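One caveat worth checking before adding batch to the design: the batch and condition effects can only both be estimated if at least one condition appears in more than one batch. If every whole-blood sample comes from one study and every PBMC sample from the other, "study" and "sample type" are the same column of information and cannot be separated. The sketch below builds an intercept + batch + condition design matrix, similar to R's treatment-contrast coding, and tests whether it has full column rank; the function names and sample labels are hypothetical.

```python
from fractions import Fraction


def design_matrix(samples):
    """Intercept + batch + condition indicator columns (treatment coding).

    samples: list of (batch, condition) pairs; the first level of each factor
    in sorted order is the reference level and gets no column of its own.
    """
    batches = sorted({b for b, _ in samples})
    conditions = sorted({c for _, c in samples})
    return [
        [1] + [int(b == lvl) for lvl in batches[1:]] + [int(c == lvl) for lvl in conditions[1:]]
        for b, c in samples
    ]


def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals (no floating-point round-off)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def effects_separable(samples):
    """True if the design has full column rank, i.e. batch and condition effects are both estimable."""
    x = design_matrix(samples)
    return matrix_rank(x) == len(x[0])


# Hypothetical designs
confounded = [("studyA", "blood"), ("studyA", "blood"), ("studyB", "PBMC"), ("studyB", "PBMC")]
balanced = [("studyA", "control"), ("studyA", "case"), ("studyB", "control"), ("studyB", "case")]
print(effects_separable(confounded), effects_separable(balanced))  # False True
```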
Question
From the link https://gtexportal.org/home/datasets, under V7, I'm trying to do R/Python analyses on the Gene TPM and Transcript TPM files. In these files (to open them I had to use Universal Viewer, since they are too large to view with an app like Notepad), I see a bunch of sample IDs (e.g. GTEX-1117F-0226-SM-5GZZ7), followed by IDs like ENSG00000223972.4, and then lots of numbers like 0.02865 (which take up about 99% of these large files). Can someone help me decipher what the numbers mean, please? And is each number supposed to be assigned to a specific sample ID? (The numbers far outnumber the samples, by the way.) I tried opening these files as tables in R, but I do not think R is parsing the contents of the file correctly.
Answer
GTEX-1117F-0226-SM-5GZZ7 is a sample ID, and ENSG00000223972.4 is an Ensembl gene ID (the suffix .4 is the version number); it is not a HUGO (HGNC) gene symbol. The numbers you are referring to are gene expression values in TPM (Transcripts Per Million), a normalization that accounts for gene length and sequencing depth so that the expression of genes can be compared between samples. Each value belongs to one gene (row) and one sample (column), which is why the values vastly outnumber the sample IDs.
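Assuming the GTEx TPM files follow the GCT 1.2 layout (line 1 is `#1.2`, line 2 gives the numbers of genes and samples, line 3 is a header with Name, Description and one column per sample), the two leading lines are a likely reason R misreads them; in R, `read.delim(path, skip = 2)` skips them. Below is a minimal Python sketch of a GCT reader. The excerpt is hypothetical ("SAMPLE-2" and "GENE-B" are placeholders), and for the real gzipped file you would open it with `gzip.open(path, "rt")`.

```python
import csv
import io


def read_gct(handle):
    """Parse a GCT 1.2 expression matrix into (sample_ids, {gene_id: {sample_id: value}})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    n_genes, n_samples = (int(x) for x in handle.readline().split()[:2])
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[2:]  # column 1: Ensembl gene ID, column 2: gene symbol
    if len(sample_ids) != n_samples:
        raise ValueError("header does not match the declared number of samples")
    table = {}
    for row in reader:
        if row:
            table[row[0]] = {s: float(v) for s, v in zip(sample_ids, row[2:])}
    if len(table) != n_genes:
        raise ValueError("row count does not match the declared number of genes")
    return sample_ids, table


# Tiny hypothetical excerpt
example = io.StringIO(
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tGTEX-1117F-0226-SM-5GZZ7\tSAMPLE-2\n"
    "ENSG00000223972.4\tDDX11L1\t0.02865\t0.0\n"
    "GENE-B\tGENE-B\t12.5\t8.1\n"
)
samples, tpm = read_gct(example)
print(tpm["ENSG00000223972.4"]["GTEX-1117F-0226-SM-5GZZ7"])
```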
Question
I was using the fragbuilder module in Python to generate peptides of sizes 4, 6, and 10. However, the issue with fragbuilder is that some of the bond angles deviate from the standard values. For instance, the standard value of the C_alpha--C--N bond angle is 121 degrees, but fragbuilder assigns 111 degrees. This angle deviation causes a deviation in the distance between nearest-neighbor C_alpha---C_alpha atoms: its value is 3.721 Å, whereas the typical standard value is 3.8 Å. Another bond angle, C_alpha---C---N, also deviates from the standard value by about 6 degrees: its value is 111.4 degrees, while the typical standard value is 117 degrees. My question is: how much deviation is allowed for MD simulations of peptides (or proteins) when the bond lengths and bond angles are fixed?
Answer
Gary James Hunter Thanks for your reply.
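To see how directly the backbone angles feed into the C-alpha to C-alpha spacing, the sketch below places one planar trans peptide unit in 2D and computes the distance. The question quotes standard values of about 117 and 121 degrees; here they are assigned to the CA-C-N and C-N-CA angles respectively, which is my assumption about the intended atoms. The bond lengths are approximate textbook values chosen for illustration, not fragbuilder's parameters.

```python
import math


def ca_ca_distance(ca_c=1.52, c_n=1.33, n_ca=1.46, angle_ca_c_n=116.2, angle_c_n_ca=121.7):
    """Distance (angstrom) between consecutive C-alpha atoms across a planar trans peptide bond.

    Atoms CA(i), C, N, CA(i+1) lie in one plane: C at the origin, N on the +x axis, and the
    two C-alpha atoms on opposite sides of the C-N bond (omega = 180 degrees).
    Bond lengths in angstrom, angles in degrees.
    """
    a1 = math.radians(angle_ca_c_n)
    a2 = math.radians(angle_c_n_ca)
    ca_i = (ca_c * math.cos(a1), ca_c * math.sin(a1))            # angle measured from the C->N direction
    ca_next = (c_n - n_ca * math.cos(a2), -n_ca * math.sin(a2))  # angle measured from the N->C direction
    return math.dist(ca_i, ca_next)


print(f"{ca_ca_distance():.2f}")                    # 3.80
print(f"{ca_ca_distance(angle_ca_c_n=111.4):.2f}")  # 3.75
```

Narrowing a single angle by about 5 degrees already shortens the spacing by roughly 0.05 Å in this model.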
Question
I want to purchase a MacBook mainly for bioinformatics analysis, i.e. transcriptomics, small RNA, methylation, lncRNA and others. Could anyone please suggest the best affordable one?
Answer
I think a small server is a better choice for bioinformatics data analysis, as it is cheaper and more convenient. Many analyses take a long time to run, and MacBooks do not have good heat dissipation.
Question
I have two sequences: the predicted mRNA sequence (exons only, without introns) and the gDNA sequence (with introns). I aligned the sequences to confirm the positions of the exons in the DNA sequence. After that, I picked primers from the exon regions and checked their specificity with Primer-BLAST. However, I also designed primers from the predicted mRNA alone, without considering the exon positions on the DNA sequence. Which approach is more appropriate for amplifying full-length genes from a DNA template?
Answer
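Primers for a full-length gene are normally designed in exonic sequence (usually around the start and stop codons), so intronic sequence is not used for primer binding. However, the intron positions still matter: each primer must lie entirely within a single exon as mapped on the gDNA, because a primer designed from the mRNA that spans an exon-exon junction will not anneal to the genomic template. Also, PCR on a gDNA template amplifies everything between the two primers, introns included, so the product will be longer than the mRNA.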
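A quick way to catch mRNA-derived primers that straddle an exon-exon junction is to check that each primer (or its reverse complement) occurs in the gDNA as well as in the mRNA. The sketch below does only exact string matching and ignores mismatches and annealing thermodynamics, so it complements rather than replaces Primer-BLAST; the function names and the toy exon/intron sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an A/C/G/T/N DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def primer_sites(primer, template):
    """0-based start positions where the primer or its reverse complement occurs exactly."""
    template = template.upper()
    hits = []
    for probe in {primer.upper(), reverse_complement(primer)}:
        start = template.find(probe)
        while start != -1:
            hits.append(start)
            start = template.find(probe, start + 1)
    return sorted(hits)


def spans_junction(primer, mrna, gdna):
    """True if the primer matches the mRNA but not the gDNA, i.e. it likely crosses an exon-exon junction."""
    return bool(primer_sites(primer, mrna)) and not primer_sites(primer, gdna)


# Toy gene: two exons separated by one intron
exon1, intron, exon2 = "ATGGCTAGCTAGGA", "GTAAGTTTTTTCAG", "CCTTGGAACCTGA"
mrna, gdna = exon1 + exon2, exon1 + intron + exon2
print(spans_junction("ATGGCTAGC", mrna, gdna))     # primer inside exon 1
print(spans_junction("CTAGGACCTTGG", mrna, gdna))  # primer across the exon 1/exon 2 junction
```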
Question
When aligning virulence factors against the human proteome to exclude proteins with > 35% homology, which output should we use for the next step of predicting transmembrane helices and molecular weight for the chosen proteins?
Answer
Hi Mr David, thank you so much for your answer; I really appreciate your help. My question is whether we use the non-homologous protein sequences as aligned against Homo sapiens, or the original sequences we submitted (e.g. the virulence factors), for the further steps. And if we use the non-homologous protein sequences of my species of interest, where can we obtain the output FASTA sequences on the BLASTp platform?
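One common way to handle this, sketched here as an assumption rather than the only valid approach, is to run BLASTp with tabular output (`-outfmt 6`, where column 1 is the query ID and column 3 the percent identity), collect the queries with a hit above the cut-off, and keep the original query sequences of everything else. Filtering the original FASTA records also keeps queries with no human hit at all, which never appear in the BLAST table. Percent identity alone ignores alignment coverage, and the names and example rows below are hypothetical.

```python
import csv
import io


def homologous_queries(blast_tab, max_identity=35.0):
    """IDs of queries with at least one hit above max_identity percent identity (-outfmt 6 input)."""
    hits = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if float(row[2]) > max_identity:  # column 3 = pident
            hits.add(row[0])              # column 1 = qseqid
    return hits


def keep_non_homologous(query_records, blast_tab, max_identity=35.0):
    """Return the original query sequences with no human hit above the cut-off."""
    drop = homologous_queries(blast_tab, max_identity)
    return {qid: seq for qid, seq in query_records.items() if qid not in drop}


# Hypothetical example: three virulence factors searched against the human proteome
queries = {"vf1": "MKTAYIAK", "vf2": "MSLLTEVE", "vf3": "MGSSHHHH"}
blast_out = io.StringIO(
    "vf1\ths_prot1\t72.50\t100\t27\t0\t1\t100\t5\t104\t1e-40\t150\n"
    "vf2\ths_prot2\t30.10\t80\t55\t2\t3\t82\t10\t89\t0.002\t40\n"
)
print(sorted(keep_non_homologous(queries, blast_out)))
```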
Question
Hi everyone,
please bear with me, because I am a complete beginner with regard to any form of bioinformatics and I am trying to understand the best approach to my experiment.
I am currently trying to isolate cells and sequence them for further bioinformatic analysis, more precisely RNA-Sequencing.
We have, however, had issues with purity and while some samples we looked at reached a purity of >90% after isolation (we usually validate it by use of flow cytometry), some samples of different animal genotypes did not.
This leads me to my first question:
How important is cell purity for Bulk RNA-Seq?
Which purity should be reached for an adequate, reliable analysis?
If anyone has any recommendations for papers to look into regarding that subject, I would be most grateful, because I have no idea where to start and what to consider.
Further along in the story we surmised that maybe Single Cell RNA Sequencing might be the better option in cases of lower purity.
But again, the same question arose: how relevant is cell purity for the following analysis and is there a cut-off value not to be crossed?
Finally:
How advantageous would using both methods be?
Sure, Bulk gives a better general overview and Single Cell is more precise, but do they complement each other or is it essentially redundant information gained by doing both experiments?
And are there any disadvantages to using only SC, or do both methods complement each other when low purity levels are in question?
Thank you a lot in advance!!
Answer
Welcome to RNA-seq! It's a crazy and wild world. You will find that responses will depend a great deal on what you're aiming to achieve. So take my responses with this in mind...it depends!
Get as close to 100% as possible; otherwise you'll have to perform a set of validation experiments to ensure that any interesting findings are due to changes in your cell type of interest and not in a "contaminating" cell type.
Single-cell RNA-seq may be suitable here, since you'll get some cell-type resolution and can identify the populations in which the change is occurring; of course, you'll still need to validate. The strength of scRNA-seq is that you don't need to purify or enrich your population, since populations get resolved as part of the procedure and analysis. However, a drawback of scRNA-seq is that you will lose a lot of low-abundance transcripts; "dropout" is also a major issue. So, if you're comfortable losing some information on potentially valuable transcripts, scRNA-seq may be the way to go.
The two methods do potentially complement each other, especially because bulk RNA-seq may give you data on low-expressed transcripts. But a big caveat: it all depends! You may consider identifying and collaborating with someone with expertise in RNA-seq (sample prep and data analysis) at your local institution.
Which papers? It depends. Start with papers that answer a similar question to yours, then dig into what would be best for your study. You can also consider reaching out to representatives of companies like 10x Genomics and Miltenyi Biotec; that's also a good starting point. Good luck!
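A simple linear mixing model shows why purity matters so much for bulk RNA-seq, especially when purity differs between genotypes. The sketch below is a hypothetical illustration: it treats purity as the fraction of RNA coming from the target cells, which equals the cell fraction only if both cell types contain similar amounts of RNA.

```python
def bulk_expression(purity, expr_target, expr_contaminant):
    """Expected bulk signal from a two-population mixture under a linear mixing model.

    purity: fraction (0-1) of the RNA that comes from the target cell type.
    Expression values share the same units (e.g. TPM).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    return purity * expr_target + (1.0 - purity) * expr_contaminant


# Hypothetical gene: silent in the target cells, 200 TPM in the contaminating cells
for purity in (0.95, 0.90, 0.80):
    print(purity, round(bulk_expression(purity, 0.0, 200.0), 1))
```

For this gene, a sample at 80% purity shows four times the signal of one at 95% purity, with no real change in the target cells.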
Question
I have data from our experimental model, in which we analyze the immune response following BCG vaccination, and then the responses and clinical outcome following Mtb infection of our vaccinated models. Because we cannot experimentally follow the very same individual after evaluating the post-vaccination response for the post-vaccination plus post-infection studies, we have these data from different batches. Is it possible to correlate the post-vaccination responses of 5 replicates in one batch (for different vaccine candidates) with 4-5 replicates for vaccination and infection from another batch? I ask because we do not follow up the same replicates for the post-vaccination and post-infection measurements (as it is not experimentally feasible). If correlation is not the best method, are there other ways to analyze the patterns, such as the strength of association between the T cell response in BCG-vaccinated models and increased survival of BCG-vaccinated models (both measurements from different batches)? We have several groups like that, with a variety of parameters measured per group in different sets of experiments.
Thanks for your responses and help.
Answer
To make it a bit simpler:
say you have treatments A and B, and your experiment is done in two batches 1 and 2.
If treatment A is analyzed in batch 1 and B in batch 2, then treatment and batch are perfectly confounded, and you have no chance to disentangle batch effects from treatment effects.
If samples with treatment A are measured in batch 1 and 2, and also samples with treatment B are measured in both batches, then one can model the batch effect and reveal the treatment effect.
If you have treatments A+C in batch 1 and treatments B+C in batch 2, you might estimate the batch effect from treatment C and apply it to correct A and B as well (dangerous, if the batch effect also depends on the treatment, but better than nothing).
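The last scenario can be sketched in a few lines: estimate the batch offset from the treatment run in both batches (C), then shift batch-2 values onto the batch-1 scale before comparing A and B. This is a hypothetical illustration that assumes the batch effect is additive (often on a log scale) and identical for every treatment, which is exactly the assumption the answer flags as dangerous.

```python
from statistics import mean


def batch_offset(shared_in_batch1, shared_in_batch2):
    """Additive batch effect (batch 2 minus batch 1) estimated from the treatment run in both batches."""
    return mean(shared_in_batch2) - mean(shared_in_batch1)


def to_batch1_scale(values, offset):
    """Shift batch-2 values onto the batch-1 scale (assumes an additive, treatment-independent batch effect)."""
    return [v - offset for v in values]


# Hypothetical log-scale readouts: A + C in batch 1, B + C in batch 2
a_batch1 = [10.0, 12.0, 11.0]
c_batch1 = [5.0, 6.0, 7.0]
b_batch2 = [20.0, 22.0, 21.0]
c_batch2 = [9.0, 10.0, 11.0]

offset = batch_offset(c_batch1, c_batch2)  # 4.0
b_corrected = to_batch1_scale(b_batch2, offset)
print(mean(b_corrected) - mean(a_batch1))  # B - A after correction: 6.0
```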
Question
I created this R package to allow easy visual analysis of VCF files and to investigate mutation rates per chromosome, gene, and much more: https://github.com/cccnrc/plot-VCF
The package is divided into 3 main sections, based on analysis target:
  1. variant Manhattan-style plots: visualize all/specific variants in your VCF file. You can plot subgroups based on position, sample, gene and/or exon
  2. chromosome summary plots: plot the distribution of variants across (selectable) chromosomes in your VCF file
  3. gene summary plots: plot the distribution of variants across (selectable) genes in your VCF file
Take a look at how many different things you can achieve in just one line of code!
It is extremely easy to install and use, well documented on the GitHub page: https://github.com/cccnrc/plot-VCF
I'd love to hear your opinion, and about any bugs you might find.
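plot-VCF itself is an R package; as a rough idea of the per-chromosome tally that a chromosome summary needs, here is a hypothetical Python sketch (not the package's implementation). It assumes an uncompressed VCF and takes chromosome lengths from `##contig=<ID=...,length=...>` header lines when they are present, reporting variants per megabase.

```python
import re
from collections import Counter


def chromosome_summary(vcf_lines):
    """Return {chrom: (n_variants, variants_per_Mb or None)} from VCF text lines."""
    lengths, counts = {}, Counter()
    for line in vcf_lines:
        if line.startswith("##contig="):
            cid = re.search(r"ID=([^,>]+)", line)
            clen = re.search(r"length=(\d+)", line)
            if cid and clen:
                lengths[cid.group(1)] = int(clen.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        else:
            counts[line.split("\t", 1)[0]] += 1  # CHROM is the first column
    return {
        chrom: (n, n / lengths[chrom] * 1e6 if chrom in lengths else None)
        for chrom, n in counts.items()
    }


# Hypothetical VCF excerpt
vcf = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=2000000>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.",
    "chr2\t150\t.\tT\tC\t50\tPASS\t.",
]
summary = chromosome_summary(vcf)
print(summary)
```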
Answer
I use TASSEL software for genome analysis. It needs input in PLINK format (map and ped files). You can try and explore this software.
Question
Dear all,
I am performing analysis of 16S rRNA amplicon sequencing data. I have tested the effectiveness of two classifiers on the mock community, and the BLAST classifier shows the best result. However, I found out that BLAST uses a local sequence alignment, so I do not know if it is appropriate to use this classifier to assign a "mystery" sequence to a bacterial taxon. Is it possible that this approach will result in false positives? Is it better to use the VSEARCH classifier, which showed worse results but uses a global sequence alignment?
And a bonus question. Should I use rarefied representative sequences to perform a taxonomy classification or not? I use rarefied data for alpha diversity testing (and for beta diversity testing I do not).
Thank you all for answers!
Martin
Answer
  1. It is not rRNA amplicons but rRNA gene amplicons
  2. Your amplicons are probably 300-400 bp long, so why do you think global alignment is better in this case?
  3. For rarefaction, read the following and decide for yourself.
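For reference, rarefying means subsampling each sample's reads without replacement down to a common depth. It changes only the counts (and can drop rare features entirely); it does not change the representative sequences themselves. A minimal sketch with a fixed random seed, using hypothetical names and counts:

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a sample's feature counts to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the sample's total reads")
    pool = [feature for feature, n in counts.items() for _ in range(n)]  # one entry per read
    picked = random.Random(seed).sample(pool, depth)
    result = {feature: 0 for feature in counts}
    for feature in picked:
        result[feature] += 1
    return result


# Hypothetical ASV counts for one sample
counts = {"ASV1": 50, "ASV2": 30, "ASV3": 20}
print(rarefy(counts, 40, seed=1))
```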
Question
I uploaded a genome to check with BUSCO via the Galaxy server. It has now been 2 days and the result is still not finished.
Did I miss something, or is there a problem?
Thank you in advance
Answer
Dear Dr. Ryan Gourlie
In my case it takes 5 days to generate the BUSCO result
Question
I tried using Phaster.ca and PhiSpy for phage detection in the bacterial genome
They showed a completely different result for regions and the virus identified.
Do you have the same experiences and could you share your suggestions, please?
Thanks in advance!
Answer
Dear Dr Vikas Sharma
Thank you for the answers!
Question
Hello all,
I am having an issue with 16S PICRUSt data. After every PICRUSt run there is a warning that more than half of the sequences have been removed from further analysis. The reason might be that the ASV FASTA file contains a mix of sequences in both orientations (positive and negative strands). PICRUSt can only deal with positive-strand sequences, so the output is based on only about 50% of the sequences in the FASTA file. I am looking for suggestions (programming) on identifying the negative-strand sequences in the FASTA file, for example based on the NCBI BLASTn portal, and reverse-complementing them, since doing this manually would be difficult for 6000 sequences. I have limited knowledge of coding. Any help would be greatly appreciated.
I am running this PICRUST pipeline as mentioned here https://github.com/picrust/picrust2/wiki/Full-pipeline-script. The ASV file has been generated by using raw FASTq files on QIIME2.
Answer
Hi,
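To reverse complement sequences in a FASTA file, use the seqtk tool: `seqtk seq -r input.fasta > output.fasta` reverse-complements every record, so to flip only the misoriented ones, first extract them by ID with `seqtk subseq input.fasta ids.txt` (the file names here are placeholders).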
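To decide which records are misoriented without running BLAST on each one, one simple approach (an assumption, not the PICRUSt2 pipeline's own method) is a k-mer vote: compare each ASV and its reverse complement against k-mers from a few reference 16S sequences of known orientation, and keep whichever orientation shares more. The names and sequences below are hypothetical; if your vsearch version provides an `--orient` command, it does this at scale.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def kmers(seq, k=8):
    """Set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seq, reference_kmers, k=8):
    """Return (sequence in the orientation sharing more k-mers with the reference, was_flipped)."""
    rc = reverse_complement(seq)
    forward_hits = len(kmers(seq, k) & reference_kmers)
    reverse_hits = len(kmers(rc, k) & reference_kmers)
    if reverse_hits > forward_hits:
        return rc, True
    return seq, False


reference = "ATGGCATTACCGGAATTTCAGGCTTAAACGTCGATTTGCC"  # stand-in for real reference 16S sequences
ref_kmers = kmers(reference)
asv = reverse_complement(reference[5:30])  # a hypothetical ASV stored on the minus strand
print(orient(asv, ref_kmers))
```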
Question
Hello, I have some raw files with the extension .d (acquired on a Bruker instrument). Which platforms would you recommend for the bioinformatic analysis (preferably free to download)? I have experience using MaxQuant, but it does not recognize the .d files. Any recommendations? Thank you in advance
Answer
MaxQuant was already able to read Bruker data at the time you asked the question.
Question
Good day! The question is quite complex, since CRISPR arrays do not have an exact sequence. So the question is: what is the probability that a random sequence contains 2 repeat units, each 23-55 bp long, each containing a short palindromic sequence and differing by at most 20% mismatches, separated by a spacer sequence that is 0.6-2.5 times the repeat size and that doesn't match the left and right flanks of the whole sequence?
Answer
First, I'd restate the question to make sure that I understood it correctly. A nucleotide sequence of length l contains a palindrome with a unit of length k. The palindrome is not exact; there can be from kmin to k matches between units. The distance between the palindrome units can be from smin to smax. The first and last subsequences of length k are not exact matches of any palindrome unit.
My solution: let's omit the last condition for now. How do we search for a palindrome with a unit of length k? Take any subsequence of length k and search for a 'match'. Searching for a 'match' amounts to checking (l-k-smin) subsequences, because the unit itself occupies k nucleotides and a spacer can't be shorter than smin nucleotides. In each window the probability of a hit is (1/4)^kmin, if every nucleotide has equal probability of occurrence. The probability of having 1 or more hits is then the upper tail of the binomial distribution with l-k-smin trials and success probability (0.25)^kmin, i.e. the probability of more than 0 successes: 1 - (1 - 0.25^kmin)^(l-k-smin). For example, the GSL function gsl_cdf_binom_Q(0, pow(0.25, kmin), l-k-smin) gives the answer (GSL's argument order is number of successes, success probability, number of trials). The first argument is zero because the function computes the probability of more than that many successes, i.e. 1 or more in this case.
Now, let's include the last condition. It is important to define what 'does not match' means. I suppose it means that we can't find the second palindrome unit at positions 1 and l-k. So, the number of windows that we check has to be decreased by 2. The final answer would be:
Q(0; 0.25^kmin, l-k-smin-2) = 1 - (1 - 0.25^kmin)^(l-k-smin-2), where Q is the upper tail of the binomial distribution (the probability of more than 0 successes).
For varying lengths, the answer would be a weighted sum of those probabilities, with weights equal to the probability of observing a given length. So, if all lengths have equal probability, this is the mean.
I checked the answer on a synthetic set and it seems it is correct or close to being so.
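The formula above is easy to evaluate numerically; a hypothetical Python sketch follows. Like the answer, it treats the windows as independent trials. It also includes a second per-window probability for comparison: 0.25^kmin is the chance that a particular set of kmin positions all match, whereas if any kmin of the k positions may match, the per-window probability is the binomial tail sum, which is larger. Which one fits depends on how you define the allowed mismatches (with a 20% mismatch limit, kmin is about 0.8k).

```python
import math


def window_p_simplified(kmin):
    """Per-window hit probability used in the answer: kmin given positions all match."""
    return 0.25 ** kmin


def window_p_at_least(k, kmin):
    """Probability that at least kmin of k positions match by chance (each with p = 1/4)."""
    return sum(math.comb(k, j) * 0.25 ** j * 0.75 ** (k - j) for j in range(kmin, k + 1))


def p_at_least_one_hit(l, k, smin, p_window, exclude_flanks=True):
    """P(X > 0) for X ~ Binomial(n, p_window), with n = l - k - smin (minus 2 if flanks are excluded)."""
    n = l - k - smin - (2 if exclude_flanks else 0)
    if n <= 0:
        return 0.0
    if p_window >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p_window))  # 1 - (1 - p)^n, stable for tiny p


# Hypothetical example: 1000 nt sequence, 30 nt units, at least 24 matches, spacer >= 18 nt
l, k, kmin, smin = 1000, 30, 24, 18
print(p_at_least_one_hit(l, k, smin, window_p_simplified(kmin)))
print(p_at_least_one_hit(l, k, smin, window_p_at_least(k, kmin)))
```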
Question
Hi,
Could anyone suggest free software or websites used to generate unique DNA barcode sequences [5-10nt] to label XXX genes for library screening?
Thank you in advance
Answer
Hi my dear,
You can generate barcodes for your sequences at http://biorad-ads.com/DNABarcodeWeb/; it may be simple and helpful for you.
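If you prefer to generate barcodes locally, a common approach is to draw random candidates and keep those that pass simple filters and differ from all previously chosen barcodes by a minimum Hamming distance. The sketch below is a greedy, hypothetical implementation; the default constraints (minimum distance 3, GC content 40-60%, no homopolymer runs longer than 2) are assumptions you should adapt to your library design.

```python
import itertools
import random


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def has_homopolymer(seq, max_run):
    return any(len(list(group)) > max_run for _, group in itertools.groupby(seq))


def design_barcodes(n, length=8, min_dist=3, gc_range=(0.4, 0.6), max_run=2, seed=1, max_tries=100000):
    """Greedily pick n random barcodes that pass the GC/homopolymer filters and
    differ pairwise by at least min_dist substitutions."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if not gc_range[0] <= gc_fraction(cand) <= gc_range[1]:
            continue
        if has_homopolymer(cand, max_run):
            continue
        if all(hamming(cand, b) >= min_dist for b in chosen):
            chosen.append(cand)
    if len(chosen) < n:
        raise RuntimeError("could not find enough barcodes; relax the constraints")
    return chosen


barcodes = design_barcodes(20, length=8)
print(barcodes)
```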
Question
I designed two primer sets to target a gene of interest in gene expression studies (cDNA). The first primer pair (T1) gave an amplicon of 200 bp in in silico PCR, and targeting my gene of interest by PCR was successful (a band on the gel and successful sequencing). For the second primer set (T2), in silico PCR gave an expected amplicon size of 1200 bp, which was of interest to me (I'm carrying out bioinformatics analysis of the protein sequence), but PCR on the cDNA wasn't successful. I have troubleshot by varying different parameters, with no success. Could the longer amplicon be a hindrance in PCR, and how can I overcome this problem?
Answer
Dear Joseph Japhet, amplifying a 1200 bp fragment is not a difficult task. However, your explanation lacks some information, so it is difficult to troubleshoot your experiment. You need to check what kind of Taq polymerase you used for the PCR (I have used TaKaRa Xtaq to clone many genes ranging from 800 bp to 2500 bp in length). You also need to give the PCR cycling conditions. Taq polymerase needs time to read through 1200 bp and copy the gene of interest, so the extension time should be at least about 72 seconds (using a rule of thumb of 500 bp per 30 seconds), and around 90 seconds gives a safety margin. Perhaps you ran the same PCR program for both amplicons, which would be too short for the longer one. Please check the PCR parameters...
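The rule of thumb in the answer (500 bp per 30 s) can be turned into a quick calculation. The function below is a hypothetical helper; the right rate depends on the polymerase, so check its manual, and the optional margin adds a safety factor.

```python
import math


def extension_time_s(amplicon_bp, bp_per_interval=500, interval_s=30, margin=1.0):
    """Extension time in seconds from a rate rule of thumb (default: 500 bp per 30 s).

    margin > 1 adds a safety factor on top of the bare estimate.
    """
    return math.ceil(amplicon_bp * interval_s / bp_per_interval * margin)


print(extension_time_s(200))                # 12
print(extension_time_s(1200))               # 72
print(extension_time_s(1200, margin=1.25))  # 90
```

A program tuned for the 200 bp product (for example, a short extension step) would leave the 1200 bp product well short of the time it needs.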
Question
I have the whole genome of a bacterium. Do you know which program can detect virus genomes within the bacterial genome?
Should I use annotation (e.g. Prokka) and then look manually for viral genes/proteins? Or is it enough to check the assembly (e.g. Prokka) and BLAST the shorter contigs?
Thank you in advance
Answer
Use PHASTER, an online tool for detecting phage sequences in your genome. It will give you files with the specific regions; you can then BLAST those phage sequences at NCBI and do further analysis.
Question
hello
Please recommend companies that provide biotechnology services such as designing different types of primers, NGS, RNA-Seq, etc.
Answer
Following companies have offices in Tehran
Sanofi - Roche - Novo Nordisk - Novartis - Bayer - Johnson & Johnson - Merck KGaA - Zoetis - TCI - BioHorizons Implant Systems. In the US, many are located in Philadelphia, Pennsylvania; Boston, Massachusetts; and San Francisco, California. But Iran has made working with US companies nearly impossible, so for ease of working outside Iran I'd look to China. In general, use caution when sharing new ideas with companies you have not worked with in the past.
Question
- RNA-seq and bioinformatics were carried out by professionals.
- The gene in question shows ~700-fold differential regulation by qPCR in multiple independent cohorts of experiments, but not in RNA-seq.
Please advise.
Answer
The large fold-change indicates that the gene is likely not expressed to a high level. Under control conditions the expression can be almost zero, and a slightly larger expression under treatment conditions will result in a very large fold-change.
Low-expressed genes give only low or no counts in RNA seq. It might be that genes with no or very few counts are filtered out from the analysis, because the counts are not reliable. If the gene is not at all detected under control conditions (0 counts in all control samples), it is not possible to calculate any (finite) fold-change at all.
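The zero-count problem can be illustrated with a pseudocount-based log2 fold change. This is a simplified, hypothetical sketch: it ignores library-size normalization and dispersion modeling, which tools such as DESeq2 and edgeR handle, but it shows how unstable fold changes become for low-count genes.

```python
import math


def log2_fold_change(treated_counts, control_counts, pseudocount=0.5):
    """log2((mean treated + pseudocount) / (mean control + pseudocount)).

    Without the pseudocount, a gene with 0 counts in every control sample
    would give a division by zero (an infinite fold change).
    """
    treated = sum(treated_counts) / len(treated_counts) + pseudocount
    control = sum(control_counts) / len(control_counts) + pseudocount
    return math.log2(treated / control)


# Hypothetical low-count gene: absent in controls, a few reads after treatment
for pc in (0.5, 1.0):
    print(pc, round(log2_fold_change([3, 5, 4], [0, 0, 0], pseudocount=pc), 2))
```

Changing only the pseudocount moves the estimate from a 9-fold to a 5-fold change, which is one reason such genes are often filtered out or shrunk toward zero.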
  • asked a question related to Bioinformatics Analysis
Question
5 answers
Something like Reactome or STRING.
Relevant answer
Answer
Enrichr, WebGestalt, and WebGIVI are very good.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Thanks.
Relevant answer
Answer
Sure, Azka Saleem.
You can consider the following steps.
1. Download the SGD data as a FASTA file; if it is split into several files, you can merge them into a single FASTA file.
2. Make a list in MS Excel of the IDs whose sequences you need to extract.
3. Install TBtools on your desktop and search for the FASTA extract option, where you provide the database FASTA file and the IDs, then click OK.
4. The output file will contain the sequence data for your requested IDs.
If you still find it difficult, you can email me at nkrdas2@gmail.com
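If you prefer a script to a GUI, the same extraction can be sketched in a few lines of Python. This assumes the IDs are saved one per line (e.g., exported from Excel as plain text) and that each ID matches the first word of a FASTA header; the function names are hypothetical.

```python
def _record_id(header):
    """First whitespace-separated word of a FASTA header, without the '>'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def extract_by_ids(fasta_lines, wanted_ids):
    """Yield (header, sequence) for records whose ID is in wanted_ids, in file order."""
    wanted = set(wanted_ids)
    header, seq = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and _record_id(header) in wanted:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line.strip())
    if header is not None and _record_id(header) in wanted:
        yield header, "".join(seq)
```

Usage: read the ID list with `ids = [l.strip() for l in open("ids.txt") if l.strip()]`, then loop over `extract_by_ids(open("sgd.fasta"), ids)` and write each header and sequence to the output file (file names are placeholders).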
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I have lists of gene IDs from multiple species, and I want a compiled FASTA file for each species. Copying each accession and collecting the FASTA sequences one by one looks tedious.
Batch Entrez is giving me an error, maybe because the identifiers belong to another database.
Relevant answer
Answer
In my case, for the first gene list (TAIR identifiers), I got the FASTA sequence file from TAIR > Download > Bulk Data Retrieval > Sequences.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
I'm in the initial stages of planning a miRNA-seq experiment using cultured human cells and have decided on TRIzol extraction, the TruSeq Small RNA prep kit, and an Illumina HiSeq 2500. The Illumina webinar suggests 10-20 million reads for discovery, the Q&A support page suggests 2-5M, and when I wrote to tech support they suggested up to 100M reads for rare transcripts. The Exiqon guide to miRNA discovery says there is not really any benefit in going over 5M reads. I want to save money by pooling more samples in a lane, so I am hoping someone with experience can suggest a suitable number of reads.
Relevant answer
Answer
I am working on blood samples from cardiomyopathy patients and want to do miRNA sequencing. Can someone please suggest how many million reads I need per sample (20 million or 30 million) and also suggest a platform?
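Whatever depth is chosen, the pooling question is simple arithmetic: usable reads per lane divided by the target reads per sample. A minimal sketch; the lane yield and the usable fraction (an allowance for adapter dimers, low-quality and unmapped reads) are hypothetical placeholders to be replaced with your sequencing facility's actual figures.

```python
def samples_per_lane(lane_reads, reads_per_sample, usable_fraction=0.8):
    """Number of samples that fit in one lane at a target depth.

    usable_fraction is a hypothetical allowance for reads lost to adapter
    dimers, quality filtering, etc.
    """
    usable_reads = int(lane_reads * usable_fraction)
    return usable_reads // reads_per_sample


# Compare the depths quoted above for an assumed (hypothetical) 200M-read lane.
for depth in (2_000_000, 5_000_000, 20_000_000, 100_000_000):
    print(depth, samples_per_lane(200_000_000, depth))
```

The design choice is deliberately conservative: integer division rounds down, so the plan never overcommits a lane.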
  • asked a question related to Bioinformatics Analysis
Question
5 answers
Hi, I was hoping someone could recommend papers that discuss the impact of using averaged data in random forest analyses, or in building regression models with large ecological datasets.
For example, if I had 4,000 samples each from 40 sites and did a random forest analysis (looking at predictors of SOC, for example) using environmental metadata, how would that compare with doing a random forest of the averaged sample values from the 40 sites (so 40 rows of averaged data vs. 4,000 raw data points)?
I ask this because a lot of the 4,000 samples have missing sample-specific environmental data in the first place, but there are other samples within the same site that do have that data available.
I'm a little confused about:
1. the appropriateness of interpolating average values to fill missing data (best practices/warnings);
2. the drawbacks of using smaller, averaged sample sizes to deal with missingness vs. using incomplete datasets vs. using much smaller sample sizes from only "complete" data; and
3. the geospatial rules for linking environmental data with samples. If 50% of plots in a site have soil texture data and 50% don't, yet they're all within the same site/area, what would be the best route for analysis? (It could depend on the variable, but I have ~50 soil chemical/physical variables.)
Thank you for any advice, papers, or tutorial recommendations.
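The two options being compared (filling gaps with site means vs. collapsing to one row per site) can be sketched as below. This is only an illustration under simple assumptions: rows are dicts with a site label and `None` for missing values, and the function names are hypothetical. Site-mean imputation shrinks within-site variance and ignores spatial structure inside a site, which is why the sketch records an `_imputed` flag that can be used later to check or down-weight imputed values.

```python
from statistics import mean


def _observed_by_site(rows, site_key, var):
    """Group the non-missing values of one variable by site."""
    observed = {}
    for row in rows:
        if row[var] is not None:
            observed.setdefault(row[site_key], []).append(row[var])
    return observed


def impute_with_site_means(rows, site_key, variables):
    """Fill None with the same variable's mean at the same site; add '<var>_imputed' flags.
    Values stay None when no sample at that site has data for the variable."""
    means = {v: {s: mean(vals) for s, vals in _observed_by_site(rows, site_key, v).items()}
             for v in variables}
    filled = []
    for row in rows:
        new_row = dict(row)
        for var in variables:
            if new_row[var] is None:
                value = means[var].get(row[site_key])
                new_row[var] = value
                new_row[f"{var}_imputed"] = value is not None
            else:
                new_row[f"{var}_imputed"] = False
        filled.append(new_row)
    return filled


def site_averages(rows, site_key, variables):
    """Collapse samples to one row per site: the mean of observed values per variable."""
    out = {}
    for var in variables:
        for site, vals in _observed_by_site(rows, site_key, var).items():
            out.setdefault(site, {})[var] = mean(vals)
    return out
```

With 4,000 samples from 40 sites, `site_averages` produces the 40-row table, while `impute_with_site_means` keeps all 4,000 rows but repeats the same value within a site, so the imputed variable carries no within-site information.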
Relevant answer
Answer
Thank you!
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I am computing van der Waals interactions in Python for a 10-residue peptide in various conformations. The total number of conformations (i.e., PDB files) is 300,000. Since bonded (1-2) and 1-3 atom pairs are irrelevant for van der Waals interactions, is it possible to compute only the 1-4 atom distances using some Python module?
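One way to do this without a dedicated module is to derive bond separations from the bond list by breadth-first search, exclude pairs fewer than 3 bonds apart (1-2 and 1-3), and keep 1-4 and more distant pairs. Because all 300,000 conformations share the same topology, the pair list only needs to be built once and can be reused for every file. A minimal sketch, assuming a single Lennard-Jones epsilon/sigma for all atoms and an optional 1-4 scaling factor; real force fields use per-atom-type parameters with combining rules, so treat these parameters as hypothetical.

```python
import math
from collections import deque


def bond_separation(n_atoms, bonds, max_depth=3):
    """{(i, j): number of bonds on the shortest path} for pairs up to max_depth bonds apart."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    sep = {}
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            if dist[u] == max_depth:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for j, d in dist.items():
            if j > start:
                sep[(start, j)] = d
    return sep


def nonbonded_pairs(n_atoms, bonds):
    """Pairs (i, j, is_14) separated by 3 or more bonds; 1-2 and 1-3 pairs are excluded."""
    sep = bond_separation(n_atoms, bonds)
    pairs = []
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            d = sep.get((i, j))  # None: more than 3 bonds apart (or not connected)
            if d is None or d >= 3:
                pairs.append((i, j, d == 3))
    return pairs


def lj_energy(coords, pairs, epsilon, sigma, scale_14=1.0):
    """Lennard-Jones energy 4*eps*((s/r)^12 - (s/r)^6) summed over the given pairs."""
    energy = 0.0
    for i, j, is_14 in pairs:
        r = math.dist(coords[i], coords[j])
        sr6 = (sigma / r) ** 6
        e = 4.0 * epsilon * (sr6 * sr6 - sr6)
        energy += scale_14 * e if is_14 else e
    return energy
```

Usage: build `pairs = nonbonded_pairs(n_atoms, bonds)` once from the topology, then call `lj_energy(coords, pairs, ...)` for each conformation's coordinates. To keep only the 1-4 distances, filter `pairs` on the `is_14` flag.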
Relevant answer
Answer
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi, is there a way to extract the structures of alternatively spliced protein isoforms from the PDB? Also, can we map the structures to UniProt sequences so we know which structure belongs to which isoform sequence?
Relevant answer
Answer
Unfortunately, most databases contain 3D structures only for canonical isoforms. However, you could try Google Colab, a Python-based online notebook that can run AlphaFold2 to predict the structure of any custom sequence or non-canonical isoform. Check this out: https://youtu.be/le7NatFo8vI
  • asked a question related to Bioinformatics Analysis
Question
1 answer
After finishing the simulation of the cyclic peptide, I tried to find the most populated structure using the density peak clustering algorithm. Following the literature, the representative structure is the one with maximal $\rho_{\mathrm{sum}}$, the sum of the local densities of all residues in one structure: $\rho_{\mathrm{sum}} = \sum_{i=1}^{n_{\mathrm{res}}} \rho_i$. How can I extract the structure with the highest density summed over all residues?
Ref: Clustering by Fast Search and Find of Density Peaks. Science 2014, 344, 1492–1496
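Once per-residue local densities are available, picking the representative structure is an argmax over $\rho_{\mathrm{sum}}$. A minimal sketch, assuming you already have, for each residue, an n_structures × n_structures distance matrix between frames (e.g., from that residue's backbone dihedrals; the metric and the cutoff `d_c` are hypothetical choices) and using the cutoff kernel of Rodriguez & Laio (2014), where $\rho_k$ counts the other structures closer than $d_c$.

```python
def local_densities(dist, d_c):
    """Cutoff-kernel local density: rho_k = number of other structures within d_c of k."""
    n = len(dist)
    return [sum(1 for l in range(n) if l != k and dist[k][l] < d_c) for k in range(n)]


def most_representative(per_residue_dists, d_c):
    """per_residue_dists[i] is the frame-by-frame distance matrix for residue i.

    Returns (index of the structure with maximal rho_sum, list of rho_sum values).
    """
    n_structures = len(per_residue_dists[0])
    rho_sum = [0] * n_structures
    for dist in per_residue_dists:
        for k, rho in enumerate(local_densities(dist, d_c)):
            rho_sum[k] += rho
    best = max(range(n_structures), key=rho_sum.__getitem__)
    return best, rho_sum
```

The returned index is the frame number, so the structure itself can then be written out from the trajectory with whatever tool you used to read it. For long trajectories the pure-Python double loop is slow; vectorizing it with NumPy would be the natural next step.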
Relevant answer
Answer
Dear Sam Mohel,
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases when the grouping is not previously known. Because it is exploratory, it does not make any distinction between dependent and independent variables. The different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale (interval or ratio) data.
Regards,
Shafagat
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hello! I'm new to bioinformatics and cancer databases. I was exploring cBioPortal and analyzing the coexpression of different genes through scatter plots. I noticed that the axes are labeled "RSEM (Batch normalized from Illumina HiSeq_RNASeqV2)" (I attached an example so you can see). I know that RSEM is transcript quantification software, but what does "batch normalized" mean? Does it mean upper-quartile normalization? FPKM? Something else?
Thanks in advance!
Relevant answer
Answer
It's upper-quartile normalisation. See https://www.biostars.org/p/106127/.
Here is a paper comparing normalisation methods I personally find informative.
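For readers unfamiliar with the term, upper-quartile normalisation rescales each sample so that the 75th percentile of its non-zero expression values equals a common target. A minimal sketch of the general idea; the nearest-rank percentile and the target of 1000 are illustrative assumptions, and this is not a reproduction of the exact TCGA/cBioPortal pipeline, which may differ in details.

```python
import math


def upper_quartile(values):
    """75th percentile (nearest-rank) of the non-zero values in one sample."""
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        raise ValueError("sample has no non-zero values")
    rank = math.ceil(0.75 * len(nonzero))
    return nonzero[rank - 1]


def uq_normalize(samples, target=1000.0):
    """Scale each sample (a list of per-gene values) so its upper quartile equals target."""
    return [[v * target / upper_quartile(s) for v in s] for s in samples]
```

Zeros are ignored when computing the quartile because many genes are unexpressed in any given sample; including them would make the scaling factor depend mostly on how many genes are silent.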
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Suggestions for online databases/tools I can use to verify candidate genes?
Relevant answer
Answer
I want to verify a list of genes and find out whether they are related to the disease I am researching, Blaise Manga Enuh.
  • asked a question related to Bioinformatics Analysis
Question
16 answers
Apple's M1 Mac has been on the market since 2020, but its compatibility with bioinformatics analysis tools is rarely discussed. For example, is it possible to index a human genome on an M1 MacBook Air (16 GB)? If yes, how much time does it take? Is it possible to map reads to the reference genome? If yes, how much time does it take? Any heads-up about the Conda experience?
Please share your thoughts and experiences; it could be a great help.
Relevant answer
Answer
Yes, it is definitely possible. Bowtie has a very low memory requirement relative to other aligners, so 16 GB of RAM should be plenty. I've done it on a much older Mac with only 4 GB and it completed successfully (though I had to run it for ~48 hours straight, so I highly recommend avoiding this particular setup). How much time it takes depends a lot on your parameters, background CPU usage, etc.; you should do some benchmarking to figure that out.
As a more general answer to your question, 16 GB of RAM places you solidly in the "maybe" camp when asking "Does it have enough power to run <insert computational tool here>?" It is unlikely to be enough for STAR, more than enough for Bowtie, and far too little for memory-heavy software like Cell Ranger. In general, check the manual of any tool you'd like to use for its recommended and required system resources; that should tell you whether your setup will work.
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi,
GO and KEGG functional analysis of a gene set was performed using the DAVID database (https://david.ncifcrf.gov/). However, the adjusted p-values (Bonferroni and Benjamini) of the enriched GO terms and KEGG pathways were greater than 0.5. Meanwhile, a PPI network was constructed using the STRING database (https://string-db.org), with a confidence score of 0.4 as the cutoff and no more than ten interactors in the first shell. This step added a few more genes to the gene list, and genes with no interactions were removed. When the updated gene list was used for GO and KEGG functional analysis, the enriched GO terms and KEGG pathways became significant (p-value < 0.05). Is this workflow valid?
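To see why the expanded list behaves differently, it helps to look at how a term's enrichment p-value and its Benjamini-Hochberg adjustment are computed. A minimal sketch using a plain one-sided hypergeometric test; DAVID itself reports a modified Fisher exact p-value (the EASE score), so the numbers will not match DAVID exactly. Note that STRING neighbours are added precisely because they interact with, and therefore often share annotations with, the original genes, so enrichment of the expanded list is not independent evidence about the original gene set.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k): k of the n list genes are annotated
    to a term that has K annotated genes among N background genes."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of p-value
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Adding genes that share annotations with the list raises `k` for those terms much faster than it raises `n`, which drives the hypergeometric p-value down and can push BH-adjusted values below 0.05 without any new biological evidence.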
Relevant answer
Answer
Thank you, Dr Giovanni Colonna, for taking the time to answer the question. I concur with your explanation.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
Dear all, I am trying to use CD-HIT to remove duplicates from the output of Trinity (RNA-seq assembly).
I used the following parameters:
cd-hit-est -i in.fasta -o out_cdhit90.fasta -c 0.90 -n 9 -d 0 -M 0 -T 0
But the output file still contains many small or fragmented sequences in addition to the best one. How can I remove those small or fragmented duplicates by changing the parameters?
Thanks
ZQ
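As a complement to CD-HIT (whose user guide also describes alignment-coverage options such as `-aS` for the shorter sequence, worth checking for this purpose), fragments that are exact substrings of a longer transcript can be removed with a short script. A minimal sketch, assuming the sequences are already loaded into a dict of ID to nucleotide sequence; it only catches exact containment on either strand, not near-identical fragments, and its pairwise comparison is O(n²), so it suits modest assemblies.

```python
def remove_contained(seqs):
    """Drop sequences that are exact substrings (either strand) of a longer kept sequence.

    seqs: dict {id: nucleotide sequence}. Returns a dict of the kept records;
    longer sequences are considered first, and exact duplicates keep one copy.
    """
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")

    def revcomp(s):
        return s.translate(complement)[::-1]

    kept = {}
    for sid, s in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        rc = revcomp(s)
        if any(s in k or rc in k for k in kept.values()):
            continue
        kept[sid] = s
    return kept
```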
Relevant answer
Answer
Hello, do you know of any tool other than CD-HIT for filtering CDS unigenes?
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi, I want to predict post-translational modification sites for phosphorylation. I found many websites, like PHOSIDA and PhosphoSitePlus. I am curious whether there is any Python code for phosphorylation prediction. If so, could you share the GitHub link?
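The simplest thing Python code can do here is scan a sequence for kinase consensus motifs. A minimal sketch of motif matching, not a trained predictor, so it will produce many false positives compared with machine-learning tools. The motif used is the basophilic PKA-type consensus R-R-x-S/T, chosen only as an example; swap in other regular expressions for other kinases.

```python
import re

# Example motif (an assumption for illustration): R-R-x-S/T, capturing the S/T acceptor.
# The lookahead allows overlapping matches.
PKA_LIKE = re.compile(r"(?=(RR.([ST])))")


def find_motif_sites(protein, pattern=PKA_LIKE):
    """Return 1-based positions of the phospho-acceptor residue in each motif match."""
    return [m.start(2) + 1 for m in pattern.finditer(protein.upper())]
```

For example, `find_motif_sites("MRRASLGRRKT")` reports the serine at position 5 and the threonine at position 11.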
Relevant answer
Answer
Thank you, Shaban Ahmad.
  • asked a question related to Bioinformatics Analysis
Question
1 answer
Dear All,
I performed qPCR analysis of one microRNA, and it was upregulated in tumour tissues compared to normal tissues. Then I applied a bioinformatic analysis to predict its target genes, and the most important predicted targets were oncogenes (based on other studies).
I didn't do any further study on the target genes, and I need to keep the bioinformatic analysis only. How can I discuss the results? Is there any way I can discuss these results, knowing that it will be only an in-silico study?
Many thanks
Relevant answer
Answer
You analyzed the expression of a miRNA in normal and tumor tissue and found that it was upregulated. Then you used a target prediction tool to identify the miRNA's targets, which are oncogenes according to the literature. With this much analysis, you want to write up the results.
1. Based on the other contents of your manuscript, you yourself can decide whether this finding is enough or not.
2. How you did the qPCR analysis will be important. What is your normal (control) sample? How did you isolate the miRNA? Did you use miRNA-specific cDNA synthesis for qPCR? What was your endogenous control? If these parts are fine, then let's look at the other parts.
3. Put very simply, if the miRNA is upregulated and its target gene is an oncogene, then the miRNA may target the oncogene, and we could say the miRNA has an anti-cancer role. BUT, even if you analyzed it with a bioinformatic tool (I'm sure it was a target prediction tool like TargetScan or miRDB), you have to show that the target is actually targeted, using basic experiments such as PCR-based expression analysis of the target in normal and tumor tissue, western blotting, or, a little more advanced, a 3'UTR reporter assay. Otherwise, the targets available in databases/bioinformatic tools are just predictions. Without experimental validation, they are difficult to present.
4. Regarding your question heading, "Can upregulated microRNAs upregulate the target genes?": the answer is yes, it is possible. In our paper, we found that upon overexpression of a miRNA, several genes were upregulated. This probably happens because the miRNA inhibits certain genes, and suppression of those genes triggers the upregulation of other genes. Genes function in networks. You can check our paper.
A few more things could be discussed. I will wait for your reply.
Regards
Saurabh
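Point 2 above depends on how the relative expression was calculated from the Ct values. The widely used Livak 2^-ΔΔCt method can be sketched as below; it assumes roughly 100% amplification efficiency for both the target and the endogenous control (efficiency = 2), and the Ct values in the example are hypothetical.

```python
def fold_change_ddct(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal,
                     efficiency=2.0):
    """Relative expression (tumor vs. normal) by the 2^-ddCt method.

    Each dCt normalizes the target to the endogenous control in the same sample;
    ddCt compares tumor with normal. efficiency=2.0 assumes perfect doubling per cycle.
    """
    d_ct_tumor = ct_target_tumor - ct_ref_tumor
    d_ct_normal = ct_target_normal - ct_ref_normal
    dd_ct = d_ct_tumor - d_ct_normal
    return efficiency ** (-dd_ct)
```

For example, a target Ct of 24 against a control Ct of 20 in tumor (ΔCt = 4) and 27 against 20 in normal tissue (ΔCt = 7) gives ΔΔCt = -3, i.e., an 8-fold upregulation. This is why the choice and stability of the endogenous control matter so much.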
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi there,
I want to analyze differential gene expression in mice before and after treatment.
I have 6 mice and paired-end sequences, so how can I merge all my "before treatment" data to compare them with all my "after treatment" data via DESeq2?
Should I map/count/run DESeq2 on them separately?
Is there any way to combine (normalize) all replicates first and then perform the analysis, as we generally do in statistics?
Thank you in advance.
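Replicates are usually not merged before DESeq2: each sample is mapped and counted separately, and DESeq2 normalizes the count matrix internally and models the replicates (for before/after measurements on the same mice, a design such as `~ mouse + treatment` accounts for the pairing). To show what "normalize all replicates" means there, here is a minimal sketch of the median-of-ratios size factors that DESeq2 computes by default; it is an illustration only, and DESeq2 should still be used for the actual analysis.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped (their geometric mean is zero).
    """
    n_samples = len(counts[0])
    usable, log_geo_means = [], []
    for gene in counts:
        if all(c > 0 for c in gene):
            usable.append(gene)
            log_geo_means.append(sum(math.log(c) for c in gene) / n_samples)
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - lg for gene, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

Dividing each sample's counts by its size factor puts all samples on a common scale, which is the "normalization" step; the statistical comparison between conditions then happens across the individual normalized samples rather than on merged data.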
  • asked a question related to Bioinformatics Analysis
Question
5 answers
I have two VCF files corresponding to the results from healthy tissue and tumor tissue. I want to compare these VCF files and remove the variants they share; more specifically, I want to remove the healthy-tissue variants from the tumor file. Do you have any suggestions on which tool I should use, or any other way to do this analysis?
Thanks in advance.
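The basic operation, keeping tumor records whose variant is absent from the normal file, can be sketched in plain Python as below; `bcftools isec` performs this kind of set operation on real, indexed VCFs. This naive subtraction matches on CHROM, POS, REF and ALT only, does not normalize variant representation (e.g., left-alignment of indels), and treats "not called in normal" as "somatic," which fails at low normal coverage. Dedicated tumor-normal callers such as GATK Mutect2 with a matched normal handle these issues properly.

```python
def variant_keys(vcf_lines):
    """Set of (CHROM, POS, REF, ALT) keys; multi-allelic ALTs are split."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def tumor_only(tumor_lines, normal_lines):
    """Keep header lines plus tumor records with at least one ALT allele absent from normal."""
    normal = variant_keys(normal_lines)
    kept = []
    for line in tumor_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        if not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if any((chrom, int(pos), ref, a) not in normal for a in alt.split(",")):
            kept.append(line)
    return kept
```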
  • asked a question related to Bioinformatics Analysis
Question
13 answers
EDIT: Please see below for the edited version of this question first (02.04.22)
Hi,
I am searching for a reliable normalization method. I have two ChIP-seq datasets to be compared with a t-test, but the RPKM values are biased, so I need to fix this before the t-test. For instance, when a value is high, it doesn't mean it is high in reality; another factor can make the value appear high, and in reality I should see a value closer to the mean. Likewise, if a value is low and the factor is strong, that explains why we see the low value; we should have seen a value much closer to the mean. In brief, I want to eliminate the effect of this factor.
For this purpose, I have another dataset showing how strong this factor is for each value in the ChIP-seq datasets (again as RPKM values). Should I simply divide my RPKM values by the corresponding factor RPKM to get unbiased data? Or is it better to divide the RPKM values by the ratio RPKM/mean(RPKMs)?
Do you have any other suggestions? How should I eliminate the factor?
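The two options in the question can be compared directly. A minimal sketch with hypothetical function names: dividing by `factor / mean(factor)` is the same as dividing by `factor` and then multiplying by the constant `mean(factor)`, so both options rank fragments identically and differ only in overall scale (option 2 keeps values on roughly the original RPKM scale). Both also assume the factor acts multiplicatively, and fragments where the factor is zero need separate handling.

```python
from statistics import mean


def divide_by_factor(signal, factor):
    """Option 1: signal / factor, fragment by fragment."""
    return [s / f for s, f in zip(signal, factor)]


def divide_by_relative_factor(signal, factor):
    """Option 2: signal / (factor / mean(factor)), fragment by fragment."""
    m = mean(factor)
    return [s / (f / m) for s, f in zip(signal, factor)]
```

For example, with `signal = [10, 20, 30]` and `factor = [1, 2, 4]`, option 1 gives `[10, 10, 7.5]` and option 2 gives the same values multiplied by 7/3.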
Relevant answer
Answer
Actually, the log transformation in the figure I attached was done according to the formula log((#1+1)/(#2+1)). Later, I realized that I had added 1 to my values only to make the log transformation possible (to avoid zero values). So I thought it might be more correct to add 1 to the adjusted values just before the transformation.
Thanks again :) Jochen Wilhelm
  • asked a question related to Bioinformatics Analysis
Question
3 answers
What exactly is the role of HSP90 in the extracellular environment of the cell? I am wondering whether HSP90 is involved in translocating client proteins from outside to inside the cell. If anyone has references, please share them with me. I am very curious about this molecule.
Relevant answer
Answer
My article about that just got accepted; you will be able to see it soon. It includes tests before, immediately after, and 2 hours after exercise.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I have two ChIP-seq datasets for different proteins, and I have aligned them to a set of DNA fragments. Some of these fragments get zero read counts for one or both proteins. To be able to say that these fragments have much more of protein X than of protein Y, I use Student's t-test.
I wonder whether it would be better to remove the zero values from both datasets of per-fragment RPKM values. Moreover, the zeros pose a problem when I want to use a log scale in data visualization.
What would you suggest?
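Dropping zeros removes exactly the fragments where one protein is absent, which are often the most informative ones for an "X much more than Y" comparison, and it makes the two datasets cover different fragments. A common alternative is a pseudocount before taking logs. A minimal sketch; the pseudocount of 1 is a conventional but arbitrary choice, and it compresses differences between small values.

```python
import math


def log2_pseudocount(values, pseudocount=1.0):
    """log2(x + pseudocount): zeros map to 0 instead of -infinity."""
    return [math.log2(v + pseudocount) for v in values]


def paired_log_ratios(x, y, pseudocount=1.0):
    """Per-fragment log2((x + 1) / (y + 1)); defined even when one or both values are zero."""
    return [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(x, y)]
```

For example, fragments with RPKM (0, 0) and (3, 1) for proteins X and Y give log ratios of 0 and 1, so no fragment has to be discarded for the plot or the test.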
Relevant answer
Answer
Thank you so much for both your answer and your suggestion, David Eugene Booth.
  • asked a question related to Bioinformatics Analysis
Question
11 answers
I don't have much experience in bioinformatics, and I need to find the genes shared by several gene expression datasets; in other words, I need to find genes that match in all (or some) of my datasets. I am looking for a tool that gives me Venn diagrams of the shared genes. Any suggestion (free software, please) will be very much appreciated.
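Behind any Venn diagram tool, the underlying operation is a set intersection, which is easy to run directly on gene lists. A minimal sketch with a hypothetical function name; it upper-cases symbols so that "tp53" and "TP53" match, which suits human gene symbols but would merge symbols that differ only by case in other species.

```python
def common_genes(datasets, min_sets=None):
    """Genes present in at least min_sets of the datasets (default: all of them).

    datasets: list of gene-symbol lists; symbols are stripped and upper-cased.
    """
    sets = [set(g.strip().upper() for g in d if g.strip()) for d in datasets]
    if min_sets is None:
        min_sets = len(sets)
    counts = {}
    for s in sets:
        for gene in s:
            counts[gene] = counts.get(gene, 0) + 1
    return sorted(g for g, c in counts.items() if c >= min_sets)
```

Setting `min_sets` below the number of datasets answers the "all (or some)" part of the question, e.g., genes found in at least 2 of 3 datasets.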
Relevant answer
Answer
For a list of Venn diagram tools, their features, and references, you may check out the link below.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
"The development and validation of a medium density SNP genotyping assay in Shrimp" is a research proposal I'm currently working on. Given the limited budget allotted to the project (9,600 USD), I'd like to know in advance how much it is likely to cost.
Relevant answer
Answer
Sorry,
this is outside my area of expertise.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I have a file of 1,489 spike protein sequences. I want to extract the codon sequences of 6 amino acids from it, together with their respective headers. I don't know any programming, so can anyone help me with this?
Thank you very much in advance.
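Codons can only be recovered from nucleotide sequences, so this sketch assumes the file holds in-frame coding sequences (CDS) starting at the start codon; if it holds protein sequences, the matching nucleotide records are needed, or a 6-amino-acid peptide can be cut from the protein string in the same way without the factor of 3. The window position is a hypothetical parameter, and headers are passed through unchanged.

```python
def codon_window(cds, aa_start, n_aa=6):
    """Nucleotides encoding amino acids aa_start..aa_start + n_aa - 1 (1-based) of an in-frame CDS."""
    begin = 3 * (aa_start - 1)
    return cds[begin:begin + 3 * n_aa]


def extract_windows(records, aa_start, n_aa=6):
    """records: list of (header, cds) pairs. Returns (header, codon window) pairs."""
    return [(header, codon_window(seq, aa_start, n_aa)) for header, seq in records]
```

Usage: with records parsed from the FASTA file, `extract_windows(records, aa_start=484)` would return the 18 nucleotides for residues 484-489 of every sequence, each with its original header (484 is only a placeholder position).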
Relevant answer
Answer
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello. I am trying to run a haplotype analysis in PopART. It was going well until I realized that I cannot load previous work in PopART. I can only export the graphical output as .svg, .png, or .pdf, but not as a "network" file that I can reload or edit in the future. I noticed that it can be saved as a .nex file, and the new file actually had additional lines (the portion starting with "Begin NETWORK"). I think this is supposed to be read by PopART, but it fails to do so: I get parsing errors when I try to open the new file. I am not sure whether there is a way around this, as I am new to the software. Any help would be appreciated. Stay safe, anon!
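If the parse error comes from the added "Begin NETWORK" block (an assumption, not something confirmed here), one workaround is to strip that block so the file contains only the original data blocks again, which PopART can reload and recompute the network from. A minimal sketch; note that this will not restore a manually edited network layout.

```python
import re

# Matches 'BEGIN NETWORK; ... END;' case-insensitively, across lines, up to the first END;
NETWORK_BLOCK = re.compile(r"(?is)begin\s+network\s*;.*?\bend\s*;\s*")


def strip_network_block(nexus_text):
    """Return the NEXUS text with any NETWORK block removed; other blocks are untouched."""
    return NETWORK_BLOCK.sub("", nexus_text)
```

Usage: read the saved .nex file as text, pass it through `strip_network_block`, and write the result to a new file before opening it in PopART.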
Relevant answer
Answer
Great question, thanks for asking.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello,
I am new to this field. I am doing metagenome analysis with shotgun reads. All reads are single-end. The DNA was obtained from human airways. I just want to find taxon abundances in the samples. Then I will estimate diversity and identify the core microbes.
My mapping results are terrible. How can I handle bad mappings? Or should I change the tools I used in the analysis? Which tools are more accurate or sensitive for microbiome analysis? Any suggestions would help!
I followed this pipeline:
  1. Assembly was done using Megahit
  2. Short contigs (<200 bps) were removed using prinseq
  3. Read mapping against contigs was performed using BWA
  4. Similarity searches for GenBank, KEGG, , eggNOG were done using Diamond
  5. Binning was done using MaxBin2
You can find my mapping results in the attachment.
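For the diversity step that follows the abundance table, the standard alpha-diversity indices are short formulas over per-sample taxon counts. A minimal sketch of the Shannon index (natural log) and the Gini-Simpson index; it assumes you already have a count per taxon for each sample, whichever classification tool produced them.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p^2): the probability that two random reads differ in taxon."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)
```

Both indices are sensitive to sequencing depth, so samples are commonly compared at an equal depth (e.g., after rarefaction) or with depth-aware methods.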
Relevant answer
Answer
Dymphan Gonsalves, thank you very much; your answer is very helpful.
  • asked a question related to Bioinformatics Analysis
Question
13 answers
In R, I'm trying to install the MetaDE package. However, I get a warning that package 'MetaDE' is not available for this version of R ("A version of this package for your version of R might be available elsewhere"). How can I overcome this issue while using R version 4.1.0?
Relevant answer
Answer
Start with R 3.6.3 and then use devtools to acquire MetaDE.
  • asked a question related to Bioinformatics Analysis
Question
7 answers
I used WebMGA to cluster my NGS data (COG). I am having trouble analyzing the data provided in output.zip because the file format is unknown. Do I need specific software to open each of those files?
Relevant answer
Hi!
I wonder if someone knows how reliable WebMGA is. I would like to know your opinion.
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi everyone,
Can anybody help me analyze a density profile graph generated by a GROMACS simulation? I have attached the file for your reference.
I need a detailed explanation, as I am new to this. Please also suggest any research articles related to this topic.
Thank you so much in advance!!
Good day
Regards
Renu
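If the graph came from `gmx density`, the underlying data are in an .xvg text file, where lines starting with `#` are comments and lines starting with `@` are plotting instructions; the remaining columns are the box coordinate (nm) and the density (by default a mass density in kg m^-3, though this depends on the options used). A minimal sketch for reading the first two columns and locating the density maximum; the function names are hypothetical.

```python
def read_xvg(lines):
    """Return (x, y) lists from the first two numeric columns of an .xvg file."""
    xs, ys = [], []
    for line in lines:
        s = line.strip()
        if not s or s[0] in "#@":
            continue  # skip comments and Grace plotting directives
        cols = s.split()
        xs.append(float(cols[0]))
        ys.append(float(cols[1]))
    return xs, ys


def peak(xs, ys):
    """Coordinate and value of the highest density."""
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]
```

Usage: `xs, ys = read_xvg(open("density.xvg"))` (file name hypothetical). Comparing where the peaks of different groups lie along the axis (e.g., water vs. lipid headgroups) is usually the starting point for interpreting such a profile.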
Relevant answer
Answer
Thank you so much. I will check them out.
  • asked a question related to Bioinformatics Analysis
Question
6 answers
The project includes these steps:
1. raw data format transformation for five companies;
2. updating the positions of all SNPs to the GRCh37 (hg19) build;
3. quality control within companies;
4. pre-phasing (SHAPEIT2) and imputation (IMPUTE2) of all SNPs for each company;
5. GWAS using two logistic models for 27 phenotypes;
6. statistical and downstream bioinformatic analysis;
7. estimation of genetic parameters (rg and hg);
8. PRS analysis.
My dataset includes only a little over 1,000 people. With no background knowledge, how long would this take a bioinformatics master's student?
Relevant answer
Answer
More than 1,000? Please tell us the exact number of samples and the size of the data.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am writing to ask for bioinformatics ideas for a humanized antibody. I have been wondering what kind of bioinformatics analysis I can do on it. Although I can use docking and molecular dynamics to evaluate this antibody, I am looking for other ways to analyze it with structural bioinformatics. Please suggest how I can conduct a bioinformatics analysis of this antibody. Any relevant articles would be greatly appreciated.
Relevant answer
Answer
If you want to perform structural analysis, I suggest you take a look at quantum mechanical methods.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I have used several sources to write code, but I cannot read the dataset properly in RStudio.
Relevant answer
Answer
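If you have the GEO accession, you can retrieve the data using the getGEO function in R:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")  # GEOquery is a Bioconductor package, not on CRAN
library(GEOquery)
data <- getGEO("accessionhere", GSEMatrix = TRUE, getGPL = FALSE)  # replace with your GSE accession
expr <- exprs(data[[1]])  # expression matrix of the first series matrix (an ExpressionSet)
Let me know if you need any further help.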
  • asked a question related to Bioinformatics Analysis
Question
31 answers
Applications of bioinformatics in medicine are a key factor in technological advancement in modern medical technologies.
In which areas of medical technology are the technological achievements of bioinformatics used?
What are the applications of bioinformatics in medicine?
Please reply
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
Please have a look at our (Eminent Biosciences (EMBS)) collaborations, and let me know if you are interested in associating with us.
Our recent publications in collaboration with industry and academia in India and worldwide:
Our Lab EMBS's Publication In collaboration with Universidad Tecnológica Metropolitana, Santiago, Chile. Publication Link: https://pubmed.ncbi.nlm.nih.gov/33397265/
Our Lab EMBS's Publication In collaboration with Moscow State University , Russia. Publication Link: https://pubmed.ncbi.nlm.nih.gov/32967475/
Our Lab EMBS's Publication In collaboration with Icahn Institute of Genomics and Multiscale Biology,, Mount Sinai Health System, Manhattan, NY, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Our Lab EMBS's Publication In collaboration with University of Missouri, St. Louis, MO, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30457050
Our Lab EMBS's Publication In collaboration with Virginia Commonwealth University, Richmond, Virginia, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
Our Lab EMBS's Publication In collaboration with ICMR- NIN(National Institute of Nutrition), Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Our Lab EMBS's Publication In collaboration with University of Minnesota Duluth, Duluth MN 55811 USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
Our Lab EMBS's Publication In collaboration with University of Yaounde I, PO Box 812, Yaoundé, Cameroon. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Our Lab EMBS's Publication In collaboration with Federal University of Paraíba, João Pessoa, PB, Brazil. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30693065
Our Lab EMBS's Publication In collaboration with University of Yaoundé I, Yaoundé, Cameroon. Publication Link: https://pubmed.ncbi.nlm.nih.gov/31210847/
Our Lab EMBS's Publication In collaboration with University of the Basque Country UPV/EHU, 48080, Leioa, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852204
Our Lab EMBS's Publication In collaboration with King Saud University, Riyadh, Saudi Arabia. Publication Link: http://www.eurekaselect.com/135585
Our Lab EMBS's Publication In collaboration with NIPER , Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Our Lab EMBS's Publication In collaboration with Alagappa University, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Our Lab EMBS's Publication In collaboration with Jawaharlal Nehru Technological University, Hyderabad , India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Our Lab EMBS's Publication In collaboration with C.S.I.R – CRISAT, Karaikudi, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237676
Our Lab EMBS's Publication In collaboration with Karpagam academy of higher education, Eachinary, Coimbatore , Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Our Lab EMBS's Publication In collaboration with Ballets Olaeta Kalea, 4, 48014 Bilbao, Bizkaia, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Our Lab EMBS's Publication In collaboration with Hospital for Genetic Diseases, Osmania University, Hyderabad - 500 016, Telangana, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Our Lab EMBS's Publication In collaboration with School of Ocean Science and Technology, Kerala University of Fisheries and Ocean Studies, Panangad-682 506, Cochin, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27964704
Our Lab EMBS's Publication In collaboration with CODEWEL Nireekshana-ACET, Hyderabad, Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26770024
Our Lab EMBS's Publication In collaboration with Bharathiyar University, Coimbatore-641046, Tamilnadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27919211
Our Lab EMBS's Publication In collaboration with LPU University, Phagwara, Punjab, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/31030499
Our Lab EMBS's Publication In collaboration with Department of Bioinformatics, Kerala University, Kerala. Publication Link: http://www.eurekaselect.com/135585
Our Lab EMBS's Publication In collaboration with Gandhi Medical College and Osmania Medical College, Hyderabad 500 038, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27450915
Our Lab EMBS's Publication In collaboration with National College (Affiliated to Bharathidasan University), Tiruchirapalli, 620 001 Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27266485
Our Lab EMBS's Publication In collaboration with University of Calicut - 673635, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Our Lab EMBS's Publication In collaboration with King George's Medical University, (Erstwhile C.S.M. Medical University), Lucknow-226 003, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579575
Our Lab EMBS's Publication In collaboration with School of Chemical & Biotechnology, SASTRA University, Thanjavur, India Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579569
Our Lab EMBS's Publication In collaboration with Safi center for scientific research, Malappuram, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Our Lab EMBS's Publication In collaboration with Dept of Genetics, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25248957
Our Lab EMBS's Publication In collaboration with Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26229292
Sincerely,
Dr. Anuraj Nayarisseri
Principal Scientist & Director,
Eminent Biosciences.
Mob :+91 97522 95342
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hello,
I am looking to obtain global RNA-Seq data for either E. coli or P. putida. I assume RNA-seq data is publicly available for many microbes, but I am unsure where I can access this information. Does anyone have insight as to what website or database I can find this data?
Many thanks,
Shawn
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi, I am a beginner in cancer genomics. I am reading gene expression profiling papers in which researchers classify cancer samples into two groups based on the expression of a set of genes, for example a "high group" and a "low group", and do survival analysis; they then associate these groups with other molecular and clinical parameters, for example serum B2M levels, serum creatinine levels, 17p deletion, and trisomy 3. Some researchers classify the cancer samples into 10 groups. If I propose a cancer classification scheme and present a survival model based on 2 or 10 groups, how should I assess the predictive power of my proposed classification model, and how do I compare its predictive power with that of other survival models? Thank you in advance.
Relevant answer
Answer
The survAUC R package provides a number of ways to compare models; see https://stats.stackexchange.com/questions/181634/how-to-compare-predictive-power-of-survival-models
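One widely used measure that applies equally to a 2-group and a 10-group scheme is Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the patient with the higher predicted risk fails first. A minimal sketch, assuming each sample's group is converted to an ordinal risk score (e.g., high = 1, low = 0); tied survival times are skipped for simplicity, and tied risks count as half-concordant, which matters for coarse groupings.

```python
def concordance_index(times, events, risk):
    """Harrell's C. A pair (i, j) is comparable when times[i] < times[j] and
    sample i had an observed event; it is concordant when risk[i] > risk[j]."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored samples cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A C of 0.5 means no discrimination and 1.0 perfect ranking. Because a 10-group scheme has fewer ties than a 2-group scheme, it can score higher simply through finer resolution, so models should be compared on the same patients, ideally in an independent validation cohort.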
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hi all,
I would like to know if you have any information on these issues; more precisely, companies that could provide services for
1. genome sequencing and assembly;
2. whole-genome methylation sequencing for 20 samples, including bioinformatics analysis.
Thanks
Relevant answer
Answer
You can always reach out to Novogene for your sequencing projects. We have worked with them over the past few years, and you get excellent service at great prices.
For the data-analysis, feel free to reach out to BISC Global (www.biscglobal.com), a bioinformatics, statistics and machine learning services company with teams in Europe and the US. We have a lot of expertise in epigenomics and genomics data analysis.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
What script can I use for quantile normalization of a microarray dataset (GSE70970) with limma? Do I need to create a model matrix first before normalizing it? I'm very new to R.
Relevant answer
Answer
Yes: first you need to create a numeric matrix of expression values and store it in A (a model matrix is not needed for the normalization itself).
Then normalize it:
normalizeQuantiles(A, ties=TRUE)
With ties = TRUE, tied values within each column of A are handled carefully: they are normalized to the mean of the corresponding pooled quantiles.
Have fun and Happy Research!
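To show what quantile normalization does, here is a minimal NumPy sketch of the idea. It is not limma's implementation. It assumes a genes x samples matrix with no missing values. Each column is replaced, rank by rank, with the mean of the sorted columns (the pooled reference quantiles). Tied values within a column receive the mean of the reference quantiles they span, roughly mirroring how limma's documentation describes ties=TRUE.

```python
import numpy as np


def quantile_normalize(a):
    """Quantile-normalize the columns (samples) of a genes x samples matrix."""
    a = np.asarray(a, dtype=float)
    # Reference distribution: mean of the k-th smallest value across all columns.
    ref = np.sort(a, axis=0).mean(axis=1)
    out = np.empty_like(a)
    for j in range(a.shape[1]):
        order = np.argsort(a[:, j], kind="stable")
        sorted_col = a[order, j]
        normalized = ref.copy()
        # Tied values share the mean of the reference quantiles they occupy.
        _, starts, counts = np.unique(sorted_col, return_index=True,
                                      return_counts=True)
        for s, c in zip(starts, counts):
            if c > 1:
                normalized[s:s + c] = ref[s:s + c].mean()
        out[order, j] = normalized  # put values back in original gene order
    return out


# Hypothetical 4 genes x 3 arrays example.
A = [[5, 4, 3],
     [2, 1, 4],
     [3, 4, 6],
     [4, 2, 8]]
print(quantile_normalize(A))
```

After normalization every column without ties has exactly the same set of values, which is the defining property of the method.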
  • asked a question related to Bioinformatics Analysis
Question
7 answers
I am wondering whether these low amounts of total RNA in the samples are enough for RNA-seq. Has anyone already done this, or does anyone have suggestions for getting reliable data for bioinformatic analysis?
Relevant answer
Answer
  1. You are talking about total RNA, not depleted RNA; that in itself suggests these samples are not a good choice for RNA-Seq.
  2. Even with that amount of mRNA, sequencing would be doubtful.
  3. Data analysis cannot rescue this: if rubbish goes in, rubbish comes out.
I would not go for RNA-Seq with these samples unless I had a huge amount of funding that I wanted to spend no matter what.
  • asked a question related to Bioinformatics Analysis
Question
12 answers
I've tried to dock an enzyme (523 residues) with its amino acid substrate, but no docking server recognizes a single amino acid as a ligand. How can I dock these molecules?
Relevant answer
Answer
As I remember, you can use the HADDOCK web server, or, if you have BIOVIA, you can use ZDOCK and RDOCK.
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I have whole-genome sequencing data from fosmid DNA. I will do the bioinformatics analysis, and my main aim is to identify the sequence of my insert.
Could you recommend a cloud-based or desktop (preferably Windows) tool for analyzing whole-genome sequencing data from fosmid DNA?
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I have some files in BED and bedGraph format to analyze with IGV. My team and I tried to load them into IGV following the tutorials on the IGV site, but it hasn't worked. The bedGraph files are large (5157), so we converted them to the binary .tdf format using the IGVTools "Count" command, but that hasn't worked either. For some files we only see a single flat line on the IGV screen, without any information. With FilexT we can see that the BED and bedGraph files are not damaged.
We think the problem is at the step where we select "Load from File" in IGV. What can we do?
We are using IGV 2.10.3.
Relevant answer
Answer
Have a look at the link; it may be useful.
Regards,
Shafagat
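Before loading a file into IGV, it can help to sanity-check the bedGraph itself. The Python sketch below makes one assumption that has not been confirmed for these files: that the empty or flat tracks come from malformed lines, unsorted records, or chromosome names that do not match the loaded genome (for example "chr1" versus "1"). It checks the four-column bedGraph layout (chrom, start, end, value), skips track/browser header lines, and reports the chromosome names found so they can be compared with the genome's names.

```python
def summarize_bedgraph(lines):
    """Return (records per chromosome, list of problems) for bedGraph text lines."""
    chroms = {}
    problems = []
    last_start = {}
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # header/comment lines are not data records
        fields = line.split()  # bedGraph is whitespace/tab separated
        if len(fields) != 4:
            problems.append(f"line {n}: expected 4 columns, got {len(fields)}")
            continue
        chrom, start, end, value = fields
        try:
            start, end, value = int(start), int(end), float(value)
        except ValueError:
            problems.append(f"line {n}: non-numeric start/end/value")
            continue
        if end <= start:
            problems.append(f"line {n}: end <= start")
        if chrom in last_start and start < last_start[chrom]:
            problems.append(f"line {n}: not sorted by start within {chrom}")
        last_start[chrom] = start
        chroms[chrom] = chroms.get(chrom, 0) + 1
    return chroms, problems


# Usage with a real file (path is hypothetical):
# with open("sample.bedgraph") as fh:
#     print(summarize_bedgraph(fh))
```

If the chromosome names reported here do not appear in the genome selected in IGV, the track will load without showing any data.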
  • asked a question related to Bioinformatics Analysis
Question
1 answer
DNA barcoding is used to obtain taxonomic information about unidentified organisms. Apart from that, what other types of bioinformatics analysis can be performed with DNA barcode data? What bioinformatics resources are available for analyzing DNA barcoding data?
Relevant answer
Answer
If you mean bioinformatics resources/databases:
NCBI
EMBL-EBI
DDBJ
BOLD (for COI barcodes)
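One analysis commonly run on barcode data beyond identification is pairwise genetic distance. For example, within-species versus between-species distances are compared to look for a "barcoding gap". The Kimura 2-parameter (K2P) distance is the usual choice in barcoding studies. Its formula is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences. The Python sketch below assumes the two sequences are already aligned and of equal length, and it ignores sites with gaps or ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps/ambiguous bases
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    diffs = [(a, b) for a, b in pairs if a != b]
    # Transition: purine<->purine or pyrimidine<->pyrimidine substitution.
    transitions = sum(1 for a, b in diffs
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    transversions = len(diffs) - transitions
    p = transitions / n
    q = transversions / n
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for K2P")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
```

Computing this for all pairs of sequences gives a distance matrix that can feed a neighbour-joining tree or a barcoding-gap plot.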
  • asked a question related to Bioinformatics Analysis
Question
6 answers
I have been asked to examine the gene expression patterns of the cells in an RNA-seq dataset after making a principal component analysis (PCA) plot in MATLAB. I have a CSV file containing the principal component values, but I am not sure how to perform differential expression analysis using the PC values. Is there a MATLAB function for this? Kindly help me. Thanks in advance.
Relevant answer
Answer
I am preparing an E-Readiness Index separately for farmers, extension personnel and agricultural scientists, to measure an individual's ability to use ICT tools in agriculture. I have selected sub-groups as well as indicators, but I am stuck on how to obtain the e-readiness score. From my reading of the literature, I realized that PCA or factor analysis adds relevance and accuracy to the indicators, but I am confused about when to apply PCA: to the pre-testing data, to the final collected data, or without any collected data? Please guide me.
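On the original RNA-seq question: PC scores by themselves cannot give differential expression statistics. What PCA can show is which genes drive each component, through the loadings. The Python sketch below assumes a samples x genes matrix of normalized, log-transformed expression; it is not a MATLAB function. It computes PCA via SVD and ranks genes by the absolute loading on a chosen PC. Proper differential expression testing would still need the original count matrix and a count-based method.

```python
import numpy as np


def pca_scores_loadings(expr, n_components=2):
    """PCA via SVD on a samples x genes matrix; returns (scores, loadings)."""
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)  # center each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components]  # gene weights, one row per PC
    return scores, loadings


def top_genes(loadings, pc=0, k=10):
    """Indices of the k genes with the largest absolute loading on a PC."""
    return np.argsort(-np.abs(loadings[pc]))[:k]


# Hypothetical 4 samples x 3 genes; gene 0 separates the samples.
expr = [[0, 1, 0], [10, 1, 0], [0, 1, 1], [10, 1, 0]]
scores, loadings = pca_scores_loadings(expr)
print(top_genes(loadings, pc=0, k=1))
```

Mapping the returned indices back to gene names from the CSV header gives the genes most associated with the separation seen on the PCA plot.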
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I work with spruce, which means that we don't have high numbers of clonal replicates. In an RNA-seq experiment we had one clone with six individuals and five clones with two individuals each. For the first clone there are three control and three treated individuals; for the other clones there is only one individual per condition. I am trying to find a way to analyze this data. Is it possible to use the clone that has replicates as the reference and compare the other clones to it? Is there a test that can show whether the transcript counts of a clone without replicates fall within the 95% CI of the clone with replicates? I know there are some publications on single-subject transcriptomics in medicine, where researchers are developing methods for personalized medicine when only one individual is sequenced.
Relevant answer
Answer
Thanks for your suggestion.
Best,
Melissa
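For the question of whether a single unreplicated clone falls inside the range of the replicated clone, the relevant interval is a prediction interval for one new observation, not the 95% CI of the mean. The CI of the mean is much narrower. The Python sketch below works per gene and assumes that counts are already library-size normalized and that log2(count + 1) values are roughly normal. It hard-codes two-sided 95% t critical values for small sample sizes.

```python
import math
from statistics import mean, stdev

# Two-sided 95% t critical values (0.975 quantile) by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def prediction_interval(values):
    """95% prediction interval for one new observation from the same group."""
    n = len(values)
    if n - 1 not in T_975:
        raise ValueError("need 2 to 6 replicates for the built-in t table")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = T_975[n - 1] * s * math.sqrt(1 + 1 / n)
    return m - half_width, m + half_width


def single_sample_outside(rep_counts, single_count):
    """True if the single sample's log2(count + 1) lies outside the interval."""
    logs = [math.log2(c + 1) for c in rep_counts]
    lo, hi = prediction_interval(logs)
    x = math.log2(single_count + 1)
    return not (lo <= x <= hi)


# Hypothetical normalized counts: 3 replicated controls vs 1 other clone.
print(single_sample_outside([100, 110, 120], 5000))
```

With only three replicates the intervals are wide, and across thousands of genes about 5% will fall outside by chance alone. Any flagged genes should therefore be treated as candidates rather than results.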
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am using DAVID (https://david.ncifcrf.gov/home.jsp) to cluster some genes that I found upregulated in my RNA-seq data. I am using only the official gene symbols, without any quantitative data. However, the KEGG pathway results give me p-values that are extremely high, which does not make sense to me. How can a p-value be calculated without any numbers? Can the p-value be significant?
Relevant answer
Answer
DAVID uses Fisher's exact test to measure gene enrichment in annotation terms. It is just a matter of building a 2x2 contingency table, as in the example here: https://david.ncifcrf.gov/content.jsp?file=functional_annotation.html (section 2.2).
---------------------------------------
A Hypothetical Example
In the human genome background (30,000 genes total; Population Total (PT)), 40 genes are involved in the p53 signaling pathway (Population Hits (PH)). A given gene list has found that three genes (List Hits (LH)) out of 300 total genes in the list (List Total (LT)) belong to the p53 signaling pathway. Then we ask if 3/300 is more than a random chance compared to the human background of 40/30000. A 2 x 2 contingency table is built based on the above numbers:
List Hits (LH) = 3
List Total (LT) = 300
Population Hits (PH) = 40
Population Total (PT) = 30,000
Exact p-value = 0.007. Since p-value < 0.05, this user's gene list is specifically associated (enriched) in the p53 signaling pathway by more than random chance.
---------------------------------------
Hence, quantitative data are not considered in such enrichment analyses, unless you want to calculate an additional activation/inhibition score, such as the one computed by Ingenuity Pathway Analysis.
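The arithmetic behind the example can be reproduced with a one-sided Fisher's exact test, which is the upper tail of the hypergeometric distribution. The Python sketch below treats the gene list as a random draw from the background and assumes the list genes are a subset of that background; DAVID's exact table layout may differ slightly. By default DAVID reports the EASE score, a more conservative modification of this test, so its numbers will not match exactly.

```python
from math import comb


def enrichment_p(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when drawing list_total genes from pop_total,
    of which pop_hits belong to the pathway (hypergeometric upper tail)."""
    denom = comb(pop_total, list_total)
    upper = min(pop_hits, list_total)
    num = sum(comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
              for k in range(list_hits, upper + 1))
    return num / denom  # exact integer ratio, converted to float


# Numbers from the DAVID example: LH=3, LT=300, PH=40, PT=30000.
print(round(enrichment_p(3, 300, 40, 30000), 3))
```

Only the four counts enter the calculation, which is why a plain list of gene symbols is enough to obtain a p-value.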
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am trying to run CODEML through pamlX, but I cannot get the Run option to start. After I load all three files (.ctl, .phy and .tree), the pamlX program stands still and the RUN option does not work. How can I start the run and complete the analysis?
My aim is to identify lineages with accelerated evolution and to test various branch models in CODEML, with one to several ω ratios. If this analysis is possible in any other program, please suggest that too.
I have attached the test files.
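This does not address the pamlX Run problem itself. Once the branch models do run, they are usually compared with a likelihood ratio test. Examples are the one-ratio model (model = 0 in the control file) against a two-ratio or free-ratios model (model = 2 or model = 1). The test uses the lnL values CODEML reports for each run: 2ΔlnL is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The Python sketch below assumes integer degrees of freedom and uses closed-form chi-square tail probabilities, so it needs no SciPy.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus half-integer incomplete-gamma terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def branch_model_lrt(lnl_null, lnl_alt, df):
    """LRT statistic 2*(lnL_alt - lnL_null) and its chi-square p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical lnL values: one-ratio vs two-ratio model (1 extra omega).
print(branch_model_lrt(-1000.0, -998.0, df=1))
```

A small p-value supports a different ω on the foreground lineage. With many lineages tested, the p-values should be corrected for multiple testing.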
Relevant answer